Audio signals are electronic representations of sound waves: longitudinal waves that travel through air as alternating compressions and rarefactions. Audio signal processing is the field of computational methods for intentionally altering such audio signals in order to achieve a particular goal.
Applications of audio signal processing in general
- data compression
- music information retrieval
- speech processing (emotion recognition/sentiment analysis, NLP)
- acoustic detection
- transmission/broadcasting – enhancing fidelity or optimizing for bandwidth or latency
- noise cancellation
- acoustic fingerprinting
- sound recognition (speaker identification, biometric speech verification, voice commands)
- synthesis – electronic generation of audio signals; speech synthesizers can generate human-like speech
- enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.)
Effects for audio stream processing
- delay or echo
To simulate a reverberation effect, one or several delayed signals are added to the original signal. To be perceived as a distinct echo, the delay has to be on the order of 35 milliseconds or more. Historically, this was implemented using tape delays or bucket-brigade devices.
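As a minimal sketch, the core delay/echo operation is just mixing a delayed, attenuated copy back into the signal (plain NumPy; function and parameter names are illustrative):

```python
import numpy as np

def add_echo(x, sr, delay_ms=350.0, decay=0.5):
    """Mix a delayed, attenuated copy of x back into the signal.

    A delay of roughly 35 ms or more is heard as a distinct echo
    rather than as reverberation.
    """
    x = np.asarray(x, dtype=float)
    d = int(sr * delay_ms / 1000.0)        # delay in samples
    y = np.concatenate([x, np.zeros(d)])   # extra room for the echo tail
    y[d:] += decay * x                     # delayed, attenuated copy
    return y
```

Feeding part of the output back into the delay line (instead of a single tap) would give repeating echoes.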
- flanger
A delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms). As the delay sweeps, the delayed signal falls out of phase with the original, producing a moving comb-filter effect, and then speeds up until it is back in phase with the master signal.
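This variable-delay comb-filter effect can be sketched with a fractional delay line swept by a sine LFO (linear interpolation; all parameter values are illustrative):

```python
import numpy as np

def flanger(x, sr, max_delay_ms=5.0, lfo_hz=0.25, depth=0.7):
    """Mix x with a copy whose delay sweeps between 0 and max_delay_ms."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    max_d = sr * max_delay_ms / 1000.0
    # sinusoidal LFO sweeps the delay; the sweep creates the moving comb filter
    delay = 0.5 * max_d * (1.0 + np.sin(2.0 * np.pi * lfo_hz * n / sr))
    idx = n - delay
    frac = idx - np.floor(idx)
    i0 = np.clip(np.floor(idx).astype(int), 0, len(x) - 1)
    i1 = np.clip(i0 + 1, 0, len(x) - 1)
    delayed = (1.0 - frac) * x[i0] + frac * x[i1]  # linear interpolation
    return x + depth * delayed
```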
- phaser
The signal is split, a portion is filtered with a variable all-pass filter to produce a phase shift, and then the unfiltered and filtered signals are mixed to produce a comb filter.
- chorus
A delayed version of the signal is added to the original signal; the delay has to be above roughly 5 ms to be audible. Often, the delayed signals are slightly pitch-shifted to more realistically convey the effect of multiple voices.
- equalization
The frequency response is adjusted using audio filter(s) to produce the desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass, or band-stop filters.
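For instance, simple first-order low- and high-pass filters can be built from a one-pole smoother (a sketch; the cutoff-to-coefficient formula is a standard first-order approximation):

```python
import numpy as np

def one_pole_lowpass(x, sr, cutoff_hz):
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sr)  # smoothing coefficient
    y = np.zeros(len(x))
    prev = 0.0
    for n, s in enumerate(np.asarray(x, dtype=float)):
        prev += a * (s - prev)
        y[n] = prev
    return y

def one_pole_highpass(x, sr, cutoff_hz):
    """High-pass as the input minus its low-pass component."""
    return np.asarray(x, dtype=float) - one_pole_lowpass(x, sr, cutoff_hz)
```

Cascading and weighting such band filters is the basis of graphic and parametric equalizers.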
- overdrive
Overdrive effects such as a fuzz box produce distorted sounds, e.g. to imitate robotic voices or to simulate distorted radiotelephone traffic.
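Soft clipping with tanh is one common way to get this kind of distortion (a sketch; the drive value is arbitrary):

```python
import numpy as np

def overdrive(x, drive=10.0):
    """Soft clipping: tanh flattens peaks, adding odd harmonics (distortion)."""
    x = np.asarray(x, dtype=float)
    # normalized so an input of 1.0 maps to an output of 1.0
    return np.tanh(drive * x) / np.tanh(drive)
```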
- pitch shift
Shifts a signal up or down in pitch; for example, a signal may be shifted an octave up or down. This is usually applied to the entire signal rather than to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice.
- time stretching
changing the speed of an audio signal without affecting its pitch.
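A naive overlap-add (OLA) scheme illustrates the idea: read overlapping windowed frames at one hop size and write them at another (a sketch; production stretchers such as WSOLA or phase vocoders also correct waveform/phase alignment, which plain OLA does not):

```python
import numpy as np

def time_stretch_ola(x, rate, frame=1024, hop_out=256):
    """Naive overlap-add time stretch; rate > 1 shortens, rate < 1 lengthens.

    Input must be at least one frame long.
    """
    x = np.asarray(x, dtype=float)
    hop_in = int(round(hop_out * rate))    # analysis hop
    win = np.hanning(frame)
    n_frames = (len(x) - frame) // hop_in + 1
    out_len = (n_frames - 1) * hop_out + frame
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_frames):
        a, b = i * hop_in, i * hop_out
        y[b:b + frame] += x[a:a + frame] * win
        norm[b:b + frame] += win
    norm[norm < 1e-8] = 1.0                # avoid division by zero at the edges
    return y / norm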
- resonators
Emphasize harmonic frequency content at specified frequencies. These may be created from parametric EQs or from delay-based comb filters.
- modulation
Changes the frequency or amplitude of a carrier signal in relation to a predefined signal.
- level compression
Reduction of the dynamic range of a sound to avoid unintentional fluctuations in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
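A feed-forward dynamic range compressor can be sketched as an envelope follower plus a gain computer (all parameter values are illustrative defaults, not a standard):

```python
import numpy as np

def compress(x, sr, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=50.0):
    """Reduce dynamic range: above the threshold, each dB of input
    yields only 1/ratio dB of output."""
    x = np.asarray(x, dtype=float)
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = 0.0
    y = np.empty_like(x)
    for n, s in enumerate(x):
        level = abs(s)
        # envelope follower: fast attack, slow release
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level
        level_db = 20.0 * np.log10(max(env, 1e-9))
        over_db = level_db - threshold_db
        # gain is reduced by (1 - 1/ratio) dB per dB of overshoot
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        y[n] = s * 10.0 ** (gain_db / 20.0)
    return y
```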
- 3D audio effects
place sounds outside the stereo basis
- reverse echo
swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse.
- wave field synthesis
spatial audio rendering technique for the creation of virtual acoustic environments
ASP applications in telephony and mobile phones, standardized by the ITU (International Telecommunication Union)
- Acoustic echo control
Aims to eliminate acoustic feedback, which is particularly problematic in the speakerphone use case during bidirectional voice calls.
- Noise control
A microphone doesn't pick up only the desired speech signal but often also unwanted background noise. Noise control tries to minimize those unwanted signals. Multi-microphone AASP has enabled the suppression of directional interferers.
- Gain control
Determines how loud a speech signal should be when leaving a telephony transmitter as well as when it is played back at the receiver. Implemented either statically during the handset design stage or automatically/adaptively in real time during operation.
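The adaptive variant (automatic gain control) can be sketched as a gain that tracks the running RMS toward a target level (target and window values are illustrative):

```python
import numpy as np

def agc(x, sr, target_rms=0.1, window_ms=100.0):
    """Automatic gain control: scale the signal toward a target RMS level."""
    x = np.asarray(x, dtype=float)
    n = max(1, int(sr * window_ms / 1000.0))
    # running RMS via a moving average of squared samples
    power = np.convolve(x ** 2, np.ones(n) / n, mode="same")
    rms = np.sqrt(np.maximum(power, 1e-12))
    gain = target_rms / rms
    gain = np.minimum(gain, 100.0)  # cap the gain so silence isn't blown up
    return x * gain
```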
- Linear filtering
ITU defines an acceptable timbre range for optimum speech intelligibility. AASP in the form of linear filtering can help the handset manufacturer to meet these requirements.
- Speech coding: the move from analog POTS-based calls to the G.711 narrowband (approximately 300 Hz to 3.4 kHz) speech coder was a big leap in terms of call capacity. Other speech coders with varying trade-offs between compression ratio, speech quality, and computational complexity have also been made available. AASP also provides higher-quality wideband speech (approximately 150 Hz to 7 kHz).
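The companding idea behind G.711's μ-law variant can be sketched with the continuous μ-law curve, which gives quiet samples finer quantization than loud ones (note: this is the textbook continuous curve, an approximation; actual G.711 uses a piecewise-linear 8-bit codeword table):

```python
import numpy as np

MU = 255.0  # mu-law companding constant

def mulaw_encode(x):
    """Compand x in [-1, 1] and quantize to 8 bits (continuous mu-law curve)."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

def mulaw_decode(q):
    """Invert the companding: expand 8-bit codes back to [-1, 1]."""
    y = q.astype(float) / 255.0 * 2.0 - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU
```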
ASP applications in music playback
AASP is used to provide audio post-processing and audio decoding capabilities for mobile media consumption needs such as listening to music, watching videos, and gaming.
Techniques such as equalization and filtering allow the user to adjust the timbre of the audio, e.g. bass boost and parametric equalization. Other techniques include adding reverberation, pitch shifting, and time stretching.
- Audio (de)coding: audio coding formats like MP3 and AAC define how music is distributed, stored, and consumed, including in online music streaming services.
ASP for virtual assistants
Virtual assistants include a variety of services such as Apple's Siri, Microsoft's Cortana, Google Now, and Amazon's Alexa. ASP is used in:
- Speech enhancement
multi-microphone speech pickup using beamforming and noise suppression to isolate the desired speech prior to forwarding it to the speech recognition engine.
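The simplest beamformer is delay-and-sum: delay each microphone's signal so that sound from the look direction adds coherently, then average (a 2-D far-field sketch; the array geometry and speed of sound are illustrative assumptions):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature (illustrative)

def delay_and_sum(signals, mic_positions, sr, azimuth):
    """Align and average mic signals for a far-field source at `azimuth` (radians).

    signals: (num_mics, num_samples); mic_positions: (num_mics, 2) in meters.
    """
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    # per-mic delay (in samples) so the look direction lines up across mics
    delays = mic_positions @ direction / SPEED_OF_SOUND * sr
    delays -= delays.min()
    t = np.arange(signals.shape[1])
    out = np.zeros(signals.shape[1])
    for sig, d in zip(signals, delays):
        out += np.interp(t - d, t, sig, left=0.0)  # fractional delay by interpolation
    return out / len(signals)
```

Sound from other directions arrives misaligned and partially cancels in the average, which is what suppresses directional interferers.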
- Speech recognition (speech-to-text): this draws on ideas from multiple disciplines, including linguistics, computer science, and AASP. Ongoing work in acoustic modeling is a major AASP contribution to improvements in recognition accuracy.
- Speech synthesis (text-to-speech): this technology has come a long way from its robotic-sounding introduction in the 1930s toward making synthesized speech sound more and more natural.
Other areas of ASP
- Virtual reality (VR) like VR headset / gaming simulators use three-dimensional soundfield acquisition and representation like Ambisonics (also known as B-format).
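First-order B-format encoding of a mono source is just four direction-dependent gains on the W, X, Y, Z channels (a sketch using the conventional 1/√2 weighting on W; angles in radians):

```python
import numpy as np

def encode_bformat(mono, azimuth, elevation):
    """First-order Ambisonic (B-format) panning of a mono source.

    Returns an array of shape (4, n): channels W, X, Y, Z.
    """
    mono = np.asarray(mono, dtype=float)
    w = mono / np.sqrt(2.0)                            # omnidirectional component
    x = mono * np.cos(azimuth) * np.cos(elevation)     # front-back
    y = mono * np.sin(azimuth) * np.cos(elevation)     # left-right
    z = mono * np.sin(elevation)                       # up-down
    return np.stack([w, x, y, z])
```

Because the soundfield is stored independently of any loudspeaker layout, the same B-format signal can later be decoded to headphones or to an arbitrary speaker array.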
Sources
- Wikipedia – https://en.wikipedia.org/wiki/Audio_signal_processing
- IEEE – https://signalprocessingsociety.org/publications-resources/blog/audio-and-acoustic-signal-processing%E2%80%99s-major-impact-smartphones