Audio signals are electronic representations of sound waves: longitudinal waves that travel through air as alternating compressions and rarefactions. Audio signal processing is the field of computational methods for intentionally altering such audio signals in order to achieve a particular goal.
Applications of audio signal processing in general
- data compression
- music information retrieval
- speech processing (emotion recognition/sentiment analysis, NLP)
- acoustic detection
- transmission/broadcasting – enhancing fidelity or optimizing for bandwidth or latency
- noise cancellation
- acoustic fingerprinting
- sound recognition (speaker identification, biometric speech verification, voice commands)
- synthesis – electronic generation of audio signals; speech synthesizers can generate human-like speech
- enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.)
Effects for audio stream processing
- delay or echo
To simulate a reverberation effect, one or several delayed signals are added to the original signal. To be perceived as a distinct echo, the delay has to be on the order of 35 milliseconds or more. Historically, this was implemented using tape delays or bucket-brigade devices.
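As a minimal sketch, the core delay/echo operation is just mixing a delayed, attenuated copy back into the signal (plain NumPy; function and parameter names are illustrative):

```python
import numpy as np

def add_echo(x, sr, delay_ms=350.0, decay=0.5):
    """Mix a delayed, attenuated copy of x back into the signal.

    A delay of roughly 35 ms or more is heard as a distinct echo
    rather than as reverberation.
    """
    x = np.asarray(x, dtype=float)
    d = int(sr * delay_ms / 1000.0)        # delay in samples
    y = np.concatenate([x, np.zeros(d)])   # extra room for the echo tail
    y[d:] += decay * x                     # delayed, attenuated copy
    return y
```

Feeding part of the output back into the delay line (instead of a single tap) would give repeating echoes.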
- flanger
A delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms). As the delay sweeps, the delayed signal falls out of phase with the original, producing a moving comb-filter effect, and then speeds up until it is back in phase with the master signal.
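This variable-delay comb-filter effect can be sketched with a fractional delay line swept by a sine LFO (linear interpolation; all parameter values are illustrative):

```python
import numpy as np

def flanger(x, sr, max_delay_ms=5.0, lfo_hz=0.25, depth=0.7):
    """Mix x with a copy whose delay sweeps between 0 and max_delay_ms."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    max_d = sr * max_delay_ms / 1000.0
    # sinusoidal LFO sweeps the delay; the sweep creates the moving comb filter
    delay = 0.5 * max_d * (1.0 + np.sin(2.0 * np.pi * lfo_hz * n / sr))
    idx = n - delay
    frac = idx - np.floor(idx)
    i0 = np.clip(np.floor(idx).astype(int), 0, len(x) - 1)
    i1 = np.clip(i0 + 1, 0, len(x) - 1)
    delayed = (1.0 - frac) * x[i0] + frac * x[i1]  # linear interpolation
    return x + depth * delayed
```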
- phaser
The signal is split, a portion is filtered with a variable all-pass filter to produce a phase shift, and then the unfiltered and filtered signals are mixed to produce a comb filter.
- chorus
A delayed version of the signal is added to the original signal; the delay has to be above roughly 5 ms to be audible. Often, the delayed signals are slightly pitch-shifted to more realistically convey the effect of multiple voices.
- equalization
The frequency response is adjusted using audio filter(s) to produce the desired spectral characteristics. Frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass, or band-stop filters.
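For instance, simple first-order low- and high-pass filters can be built from a one-pole smoother (a sketch; the cutoff-to-coefficient formula is a standard first-order approximation):

```python
import numpy as np

def one_pole_lowpass(x, sr, cutoff_hz):
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sr)  # smoothing coefficient
    y = np.zeros(len(x))
    prev = 0.0
    for n, s in enumerate(np.asarray(x, dtype=float)):
        prev += a * (s - prev)
        y[n] = prev
    return y

def one_pole_highpass(x, sr, cutoff_hz):
    """High-pass as the input minus its low-pass component."""
    return np.asarray(x, dtype=float) - one_pole_lowpass(x, sr, cutoff_hz)
```

Cascading and weighting such band filters is the basis of graphic and parametric equalizers.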
- overdrive
Overdrive effects such as a fuzz box produce distorted sounds, e.g. to imitate robotic voices or to simulate distorted radiotelephone traffic.
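Soft clipping with tanh is one common way to get this kind of distortion (a sketch; the drive value is arbitrary):

```python
import numpy as np

def overdrive(x, drive=10.0):
    """Soft clipping: tanh flattens peaks, adding odd harmonics (distortion)."""
    x = np.asarray(x, dtype=float)
    # normalized so an input of 1.0 maps to an output of 1.0
    return np.tanh(drive * x) / np.tanh(drive)
```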
- pitch shift
Shifts a signal up or down in pitch; for example, a signal may be shifted an octave up or down. This is usually applied to the entire signal rather than to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice.
- time stretching
changing the speed of an audio signal without affecting its pitch.
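A naive overlap-add (OLA) scheme illustrates the idea: read overlapping windowed frames at one hop size and write them at another (a sketch; production stretchers such as WSOLA or phase vocoders also correct waveform/phase alignment, which plain OLA does not):

```python
import numpy as np

def time_stretch_ola(x, rate, frame=1024, hop_out=256):
    """Naive overlap-add time stretch; rate > 1 shortens, rate < 1 lengthens.

    Input must be at least one frame long.
    """
    x = np.asarray(x, dtype=float)
    hop_in = int(round(hop_out * rate))    # analysis hop
    win = np.hanning(frame)
    n_frames = (len(x) - frame) // hop_in + 1
    out_len = (n_frames - 1) * hop_out + frame
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_frames):
        a, b = i * hop_in, i * hop_out
        y[b:b + frame] += x[a:a + frame] * win
        norm[b:b + frame] += win
    norm[norm < 1e-8] = 1.0                # avoid division by zero at the edges
    return y / norm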
- resonators
Emphasize harmonic frequency content at specified frequencies. These may be created from parametric EQs or from delay-based comb filters.
- modulation
Changes the frequency or amplitude of a carrier signal in relation to a predefined signal.
- level compression
Reduction of the dynamic range of a sound to avoid unintentional fluctuations in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
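A feed-forward dynamic range compressor can be sketched as an envelope follower plus a gain computer (all parameter values are illustrative defaults, not a standard):

```python
import numpy as np

def compress(x, sr, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=50.0):
    """Reduce dynamic range: above the threshold, each dB of input
    yields only 1/ratio dB of output."""
    x = np.asarray(x, dtype=float)
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = 0.0
    y = np.empty_like(x)
    for n, s in enumerate(x):
        level = abs(s)
        # envelope follower: fast attack, slow release
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level
        level_db = 20.0 * np.log10(max(env, 1e-9))
        over_db = level_db - threshold_db
        # gain is reduced by (1 - 1/ratio) dB per dB of overshoot
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        y[n] = s * 10.0 ** (gain_db / 20.0)
    return y
```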
- 3D audio effects
place sounds outside the stereo basis
- reverse echo
swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse.
- wave field synthesis
spatial audio rendering technique for the creation of virtual acoustic environments
ASP applications in telephony and mobile phones, standardized by the ITU (International Telecommunication Union)
- Acoustic echo control
Aims to eliminate acoustic feedback, which is particularly problematic in the speakerphone use case during bidirectional voice calls.
- Noise control
A microphone doesn't pick up only the desired speech signal but often also unwanted background noise. Noise control tries to minimize those unwanted signals. Multi-microphone AASP has enabled the suppression of directional interferers.
- Gain control
Determines how loud a speech signal should be when leaving a telephony transmitter as well as when it is played back at the receiver. Implemented either statically during the handset design stage or automatically/adaptively in real time during operation.
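The adaptive variant (automatic gain control) can be sketched as a gain that tracks the running RMS toward a target level (target and window values are illustrative):

```python
import numpy as np

def agc(x, sr, target_rms=0.1, window_ms=100.0):
    """Automatic gain control: scale the signal toward a target RMS level."""
    x = np.asarray(x, dtype=float)
    n = max(1, int(sr * window_ms / 1000.0))
    # running RMS via a moving average of squared samples
    power = np.convolve(x ** 2, np.ones(n) / n, mode="same")
    rms = np.sqrt(np.maximum(power, 1e-12))
    gain = target_rms / rms
    gain = np.minimum(gain, 100.0)  # cap the gain so silence isn't blown up
    return x * gain
```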
- Linear filtering
ITU defines an acceptable timbre range for optimum speech intelligibility. AASP in the form of linear filtering can help the handset manufacturer to meet these requirements.
- Speech coding: the move from analog POTS-based calls to the G.711 narrowband (approximately 300 Hz to 3.4 kHz) speech coder was a big leap in terms of call capacity. Other speech coders with varying trade-offs between compression ratio, speech quality, and computational complexity have also been made available. AASP also provides higher-quality wideband speech (approximately 150 Hz to 7 kHz).
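The companding idea behind G.711's μ-law variant can be sketched with the continuous μ-law curve, which gives quiet samples finer quantization than loud ones (note: this is the textbook continuous curve, an approximation; actual G.711 uses a piecewise-linear 8-bit codeword table):

```python
import numpy as np

MU = 255.0  # mu-law companding constant

def mulaw_encode(x):
    """Compand x in [-1, 1] and quantize to 8 bits (continuous mu-law curve)."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

def mulaw_decode(q):
    """Invert the companding: expand 8-bit codes back to [-1, 1]."""
    y = q.astype(float) / 255.0 * 2.0 - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU
```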
ASP applications in music playback
AASP is used to provide audio post-processing and audio decoding capabilities for mobile media consumption needs such as listening to music, watching videos, and gaming.
Techniques such as equalization and filtering allow the user to adjust the timbre of the audio, e.g. bass boost and parametric equalization. Other techniques include adding reverberation, pitch shifting, and time stretching.
- Audio (de)coding: audio coding formats like MP3 and AAC define how music is distributed, stored, and consumed, including in online music streaming services.
ASP for virtual assistants
Virtual assistants include a variety of services such as Apple's Siri, Microsoft's Cortana, Google Now, and Amazon's Alexa. ASP is used in:
- Speech enhancement
multi-microphone speech pickup using beamforming and noise suppression to isolate the desired speech prior to forwarding it to the speech recognition engine.
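The simplest beamformer is delay-and-sum: delay each microphone's signal so that sound from the look direction adds coherently, then average (a 2-D far-field sketch; the array geometry and speed of sound are illustrative assumptions):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature (illustrative)

def delay_and_sum(signals, mic_positions, sr, azimuth):
    """Align and average mic signals for a far-field source at `azimuth` (radians).

    signals: (num_mics, num_samples); mic_positions: (num_mics, 2) in meters.
    """
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    # per-mic delay (in samples) so the look direction lines up across mics
    delays = mic_positions @ direction / SPEED_OF_SOUND * sr
    delays -= delays.min()
    t = np.arange(signals.shape[1])
    out = np.zeros(signals.shape[1])
    for sig, d in zip(signals, delays):
        out += np.interp(t - d, t, sig, left=0.0)  # fractional delay by interpolation
    return out / len(signals)
```

Sound from other directions arrives misaligned and partially cancels in the average, which is what suppresses directional interferers.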
- Speech recognition (speech-to-text): this draws on ideas from multiple disciplines, including linguistics, computer science, and AASP. Ongoing work in acoustic modeling is a major AASP contribution to improvements in recognition accuracy.
- Speech synthesis (text-to-speech): this technology has come a long way from its robotic-sounding introduction in the 1930s toward making synthesized speech sound more and more natural.
Other areas of ASP
- Virtual reality (VR) like VR headset / gaming simulators use three-dimensional soundfield acquisition and representation like Ambisonics (also known as B-format).
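First-order B-format encoding of a mono source is just four direction-dependent gains on the W, X, Y, Z channels (a sketch using the conventional 1/√2 weighting on W; angles in radians):

```python
import numpy as np

def encode_bformat(mono, azimuth, elevation):
    """First-order Ambisonic (B-format) panning of a mono source.

    Returns an array of shape (4, n): channels W, X, Y, Z.
    """
    mono = np.asarray(mono, dtype=float)
    w = mono / np.sqrt(2.0)                            # omnidirectional component
    x = mono * np.cos(azimuth) * np.cos(elevation)     # front-back
    y = mono * np.sin(azimuth) * np.cos(elevation)     # left-right
    z = mono * np.sin(elevation)                       # up-down
    return np.stack([w, x, y, z])
```

Because the soundfield is stored independently of any loudspeaker layout, the same B-format signal can later be decoded to headphones or to an arbitrary speaker array.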
Sources
- Wikipedia – https://en.wikipedia.org/wiki/Audio_signal_processing
- IEEE – https://signalprocessingsociety.org/publications-resources/blog/audio-and-acoustic-signal-processing%E2%80%99s-major-impact-smartphones