NLP (Natural Language Processing) can be defined as the automatic manipulation of natural language (text or audio) using computer algorithms and software. As such, NLP has great potential in cognitive and artificial intelligence, and with increasing human-to-machine interaction and advances in machine learning, NLP is set to revolutionize the Voice over IP space.
Note: Natural Language Processing is sometimes confused with Neuro-Linguistic Programming, which shares the abbreviation NLP but is an unrelated approach from psychology.
NLP evolved from linguistics, which is the study of language including its semantics, phonetics and grammar. Every language has rules, and NLP uses mathematical formulations to model them. Discrete mathematical formalisms will be discussed later in this article.
Inputs for NLP usually come through conversation, speech, correspondence, reading, print, written composition, dictation, publishing, translation, lip reading, signing, etc.
Rule based vs statistical NLP – Rule based engines work on hard preset values, for example using a decision tree. Statistical models instead work in a probabilistic fashion, which tends to give more reliable results in unfamiliar scenarios.
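To make the contrast concrete, here is a minimal sketch that puts a hard keyword rule next to a small multinomial Naive Bayes model with add-one smoothing. The training sentences, the blocked keyword set and all function names are made up for illustration and are not from any particular library.

```python
import math
from collections import Counter

# Hypothetical toy training data (label, text); purely illustrative.
TRAIN = [
    ("spam", "win a free prize now"),
    ("spam", "free offer click now"),
    ("ham", "meeting notes for the voip call"),
    ("ham", "please review the call recording"),
]


def rule_based(text):
    """Hard preset rule: spam if any blocked keyword appears."""
    blocked = {"free", "prize"}
    return "spam" if blocked & set(text.lower().split()) else "ham"


def train_naive_bayes(examples):
    """Count words per label for a multinomial Naive Bayes model."""
    word_counts = {}
    label_counts = Counter()
    for label, text in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in text.lower().split():
            count = word_counts[label][word]  # Counter gives 0 for unseen words
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
print(rule_based("click now"), predict(model, "click now"))
```

The rule misses "click now" because neither word is on its keyword list, while the statistical model has seen both words in spam examples and scores the message accordingly.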
Linear classifier vs Convolutional Neural Nets – CNNs are a powerful supervised deep learning technique. A linear classifier's decision boundary in feature space is a linear function, whereas a CNN increases model complexity by adding more layers.
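As a rough illustration of what a linear decision boundary can and cannot do, the sketch below trains a classic perceptron (a simple linear classifier, used here as a stand-in; it is not a CNN) on two toy problems. The data sets and function names are assumptions made for this example.

```python
def predict(weights, bias, features):
    """Linear decision rule: class 1 if w.x + b > 0, else class 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, epochs=20):
    """Classic perceptron updates; `samples` is a list of (features, label)."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - predict(weights, bias, features)
            weights = [w + error * x for w, x in zip(weights, features)]
            bias += error
    return weights, bias


def accuracy(samples, weights, bias):
    """Fraction of samples the linear rule classifies correctly."""
    hits = sum(predict(weights, bias, f) == y for f, y in samples)
    return hits / len(samples)


# AND is linearly separable; XOR is not.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_model = train_perceptron(AND)
xor_model = train_perceptron(XOR)
print("AND accuracy:", accuracy(AND, *and_model))
print("XOR accuracy:", accuracy(XOR, *xor_model))
```

No single straight line separates the XOR points, so the linear model cannot classify all four correctly however long it trains. Stacking layers with non-linearities, as deep networks do, is one way around that limit.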
Common NLP tasks include:
Grammar induction, lemmatization, morphological segmentation, part of speech tagging, parsing, sentence breaking, stemming, word segmentation, terminology extraction
lexical semantics, distributional semantics, machine translation, Named entity recognition (NER), natural language understanding and generation, relationship extraction, sentiment analysis, word sense disambiguation, OCR (optical character recognition), recognizing textual entailment
speech recognition, speech segmentation, text to speech, dialogue
automatic summarization, coreference resolution, discourse analysis
Of the above, a few key techniques are worth pointing out.
Parts of speech (POS)
A primary task in NLP is to extract tokens and sentences, identify parts of speech (such as nouns, verbs and adjectives) and create parse trees.
POS tagging is the process of marking up each word in a corpus with its corresponding part of speech tag. The tags also help lemmatizers, which reduce a word to its root form.
POS methods differ significantly from Bag-of-words (BOW) methods, which disregard semantic relationships and only take into account words and their frequencies, whereas POS tagging takes context and definition into consideration.
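A small sketch of the bag-of-words idea shows what gets lost. The tokenizer below is a deliberately naive regular expression chosen for this example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Map a text to word -> frequency, discarding order and context."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


first = bag_of_words("The dog bit the man")
second = bag_of_words("The man bit the dog")

print(first)
print(first == second)
```

The two sentences mean different things, yet their bags are identical because only the words and their counts survive.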
POS tagging techniques include lexical, rule based, probabilistic and deep learning methods.
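The first two families can be sketched in a few lines: a lexical lookup first, then suffix and capitalization rules as a fallback. The tiny lexicon and the rules below are illustrative assumptions, far smaller than anything a real tagger would use. The tag names follow the Universal POS tag set.

```python
# Hypothetical mini lexicon: word -> Universal POS tag.
LEXICON = {
    "the": "DET",
    "a": "DET",
    "i": "PRON",
    "dog": "NOUN",
    "work": "VERB",
    "on": "ADP",
}


def tag(tokens):
    """Tag tokens by lexicon lookup, falling back to simple rules."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:                       # lexical method
            pos = LEXICON[word]
        elif word.endswith("ly"):                 # rule: -ly suggests an adverb
            pos = "ADV"
        elif word.endswith(("ing", "ed")):        # rule: verb-like suffixes
            pos = "VERB"
        elif token[:1].isupper():                 # rule: capitalized unknown word
            pos = "PROPN"
        else:                                     # default guess
            pos = "NOUN"
        tagged.append((token, pos))
    return tagged


print(tag("The dog barked loudly".split()))
```

Probabilistic and deep learning taggers replace these hand-written rules with tag probabilities learned from an annotated corpus.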
Named entity recognition (NER)
Given a stream of text, determine which items in the text map to proper names, such as people or places, and what type each one is, such as person, location or organization. An example on raw text using spaCy (spacy.io) is shown below.
“Hello! My name is Atanai and I work on Solution design and architecture, developed many custom WebRTC and SIP based solutions such as telecom applications, media stream integration into IOT, Unified communication-collaboration, signalling gateways, SBC etc. I passed out from Anna university with Betch degree in 2011 and currently stay in Bangalore India.”
The resulting analysis is:
Noun phrases: ['My name', 'Atanai', 'I', 'Solution design', 'architecture', 'many custom', 'WebRTC and SIP based solutions', 'telecom applications', 'media stream integration', 'IOT', 'Unified communication-collaboration', 'signalling gateways', 'I', 'Anna university', 'Betch degree', 'currently stay', 'Bangalore India']
Verbs: ['be', 'work', 'develop', 'base', 'signal', 'pass']
Atanai PERSON
WebRTC PRODUCT
SIP ORG
IOT ORG
Betch NORP
2011 DATE
Bangalore India LOC
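An analysis of this shape can be produced with spaCy's standard pipeline attributes (`doc.noun_chunks`, `token.lemma_`, `token.pos_`, `doc.ents`). The sketch below is one possible way to do it. The model name `en_core_web_sm` is an assumption, since the article does not say which model was used, and the exact labels vary with the model and its version.

```python
def analyze(doc):
    """Summarise a parsed spaCy Doc: noun phrases, verb lemmas, named entities."""
    return {
        "noun_phrases": [chunk.text for chunk in doc.noun_chunks],
        "verbs": [token.lemma_ for token in doc if token.pos_ == "VERB"],
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
    }


def analyze_text(text, model_name="en_core_web_sm"):
    """Load a spaCy pipeline and analyse raw text.

    Assumes spaCy and the named model are installed, for example via
    `pip install spacy` and `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported lazily so analyze() stays usable on its own

    nlp = spacy.load(model_name)
    return analyze(nlp(text))
```

Calling `analyze_text("I stay in Bangalore India.")` returns a dictionary whose `entities` entry holds `(text, label)` pairs like the ones listed above.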
Sentiment analysis
Understand the overall opinion, feeling, or attitude expressed in a given medium (speech, text or video).
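One simple way to estimate opinion in text is a lexicon-based score: count positive and negative words, and flip the polarity of a word that directly follows a negator. The word lists below are tiny, hypothetical examples. Trained models usually do better than this kind of counting.

```python
import re

# Hypothetical mini lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "happy", "love"}
NEGATIVE = {"bad", "poor", "terrible", "unhappy", "hate"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Return a signed score: above 0 positive, below 0 negative, 0 neutral."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for index, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Flip polarity when the previous token negates it ("not good").
        if polarity and index > 0 and tokens[index - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


print(sentiment_score("The call quality was great"))
print(sentiment_score("The support was not good and the audio was terrible"))
```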
NLP in action
Steps to obtain insights and relevant information from an unclassified document, raw text file or speech-to-text content such as a recording from a VOIP meeting:
Step 1 : upload a document, which could be an invoice, order, feedback, complaint or any other unstructured raw text
Step 2 : Collect the data from the document
- use OCR (optical character recognition) for handwritten or signed components
- perform search, indexing, duplicate detection etc
- can use MNIST database as
- phrase matching and vocabulary
- can use translation APIs to translate from other languages
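For the duplicate detection item in Step 2, one common approach is to compare documents by the overlap of their word shingles (Jaccard similarity). This is a minimal sketch. The shingle size and any threshold for "duplicate" are assumptions that would need tuning on real documents.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=3):
    """Jaccard similarity of two texts: shared shingles / all shingles."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    return len(set_a & set_b) / len(set_a | set_b)


print(jaccard("invoice number 42 is overdue", "invoice number 42 is paid"))
```

Scores near 1.0 indicate near-duplicates, while unrelated documents score close to 0.0.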
Step 3 : Collect meaningful data
- perform Part of Speech (POS) tagging and chunking
- topic discovery and modelling
- tokenization and text classification, to obtain domain specific entities from the document
- can use a standard language model to collect relevant, frequently used words
- NER (Named Entity Recognition) to validate names, places and locations
- can extract time and date from the mentioned entities
- build relationship graphs
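A relationship graph can start out as a simple co-occurrence count: two entities are linked whenever they appear in the same sentence. The sketch below assumes the per-sentence entity lists already come from an NER step. The sample entities and the function name are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentence_entities):
    """Build an undirected weighted graph: entity -> {neighbour: count}."""
    graph = defaultdict(lambda: defaultdict(int))
    for entities in sentence_entities:
        # Each unordered pair of distinct entities in a sentence adds one link.
        for left, right in combinations(sorted(set(entities)), 2):
            graph[left][right] += 1
            graph[right][left] += 1
    return graph


# Hypothetical NER output, one list of entities per sentence.
sentences = [
    ["Atanai", "Bangalore"],
    ["Atanai", "Anna university", "2011"],
    ["Atanai", "Bangalore"],
]
graph = build_cooccurrence_graph(sentences)
print(dict(graph["Atanai"]))
```

Edge weights then give a rough measure of how strongly two entities are related within the document.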
Step 4 : extract sentiments using a trained model
- utilize regular expressions for pattern searching
- sentiment analysis
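As an example of regular expressions for pattern searching, the sketch below pulls dates and times out of raw text. The accepted formats (ISO-style `YYYY-MM-DD` dates and 24-hour `HH:MM` times) are assumptions for illustration. Real documents usually need more patterns.

```python
import re

# Assumed formats: 2021-03-15 for dates, 09:30 for times.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
TIME_RE = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b")


def find_dates(text):
    """Return all substrings that look like YYYY-MM-DD dates."""
    return DATE_RE.findall(text)


def find_times(text):
    """Return all substrings that look like 24-hour HH:MM times."""
    return TIME_RE.findall(text)


note = "The VOIP meeting on 2021-03-15 started at 09:30 and ended at 11:05."
print(find_dates(note))
print(find_times(note))
```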
Applications of NLP find their way into many domains
1. VOIP platforms, media servers and automatic summarization of conferences / meetings, like “Minutes of Meetings”, to highlight the key takeaways from a VOIP session
2. Automatic essay assessment and scripting in education settings.
3. Image annotation using metadata describing digital images, for categorization and easy retrieval based on keywords.
4. Spam filtering
5. Building automatic assistants and chatbots with speech recognition, and auto suggest with sentence completion (Siri, Alexa, Google Voice etc)
6. Social media analytics, to track sentiment about a topic and figure out influencers, such as for movie or restaurant reviews.
NLP in VOIP system
This section builds on audio streams, whether captured or live. Background on the fundamental characteristics of analog sound waves, on analog wave modulation (how frequency, phase and amplitude are modulated to hold information for propagation) and on digital wave modulation (amplitude, frequency and phase shift keying) is covered in separate articles.
Classifying Call recordings
Sound waves bear multiple features such as
- Pitch – frequency of a sound wave.
Frequencies from 20 to 20000 Hz are audible to the human ear, while dogs can hear roughly 50 to 45000 Hz. Frequencies below 20 Hz are infrasound and frequencies above 20000 Hz are ultrasound.
- Loudness – amplitude of the sound wave
- Amplitude, frequency, wavelength and timbre
- Statistical – mean, variance, skewness
- Zero-crossing rate (ZCR) – number of times in a sound sample that the amplitude of the sound wave changes sign
- Root-mean-square (RMS) – square root of the mean of the squared sample amplitudes, a measure of signal energy
- Spectral Centroid
- Spectral Irregularity
- Spectral Flatness
- Spectral Tonality
- Spectral Crest
- Spectral Slope
- Spectral Rolloff
- Spectral Loudness
- Spectral Pitch
- Harmonic Odd Even Ratio
- Mel Frequency Cepstral Coefficient (MFCC)
- Bark Scale etc
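A few of the time-domain features above can be computed with the standard library alone. The sketch below uses a synthetic sine tone as a stand-in for decoded call audio. The sample rate and function names are assumptions. Spectral features such as the centroid or MFCCs would normally come from a signal-processing library instead.

```python
import math
from statistics import fmean, pvariance


def sine_wave(freq_hz, sample_rate=8000, duration_s=1.0):
    """Generate a synthetic test tone (stand-in for decoded call audio)."""
    count = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(samples, samples[1:]))
    return crossings / (len(samples) - 1)


def rms(samples):
    """Root-mean-square amplitude: sqrt of the mean squared sample value."""
    return math.sqrt(fmean(s * s for s in samples))


def skewness(samples):
    """Third standardized moment; 0.0 for a constant signal."""
    mu = fmean(samples)
    sigma = math.sqrt(pvariance(samples, mu))
    if sigma == 0:
        return 0.0
    return fmean(((s - mu) / sigma) ** 3 for s in samples)


def extract_features(samples):
    """Collect the statistical and time-domain features into one dict."""
    return {
        "mean": fmean(samples),
        "variance": pvariance(samples),
        "skewness": skewness(samples),
        "zcr": zero_crossing_rate(samples),
        "rms": rms(samples),
    }


tone = sine_wave(440)
print(extract_features(tone))
```

For a pure tone the zero-crossing rate is close to twice the frequency divided by the sample rate, which is why ZCR is often used as a cheap proxy for pitch.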
Using NLP and models trained on the extracted features, an unknown audio wave can be classified and possibly identified.
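As a very small illustration of classifying by extracted features, the sketch below uses a nearest-centroid classifier: each class is summarized by the mean of its training feature vectors, and an unknown sample takes the label of the closest centroid. The feature values (imagined here as ZCR and RMS pairs) and the class names are made up. A real system would use many more features, feature scaling and a stronger model.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def train(labelled_vectors):
    """Map each label to the centroid of its training vectors."""
    return {label: centroid(vectors) for label, vectors in labelled_vectors.items()}


def classify(model, features):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical (zcr, rms) feature vectors per class.
training = {
    "silence": [[0.02, 0.01], [0.03, 0.02]],
    "speech": [[0.10, 0.30], [0.12, 0.25]],
}
model = train(training)
print(classify(model, [0.11, 0.28]))
```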
Replacing auto attendants with IVR
- spaCy – https://spacy.io/
- Google Cloud Natural Language API – https://cloud.google.com/natural-language/docs/reference/rest/
- NLTK – https://www.nltk.org/
- Marsyas (Music Analysis, Retrieval and Synthesis for Audio Signals) framework – http://marsyas.info/