VoIP API design

Public API endpoints
Internal API gateways
API Rate Limiter
- Token based Rate Limiting
- Token bucket filter
- Hierarchical Token Bucket (HTB)
- Fair Queuing
- CBQ (Class Based Queuing)
- Modular QoS Command-Line Interface (MQC) Shaping
Throttling

VoIP manages call setup and teardown over IP. APIs can expose public or internal endpoints to create and manage calls and conferences, add-on services such as recording and transcription, or even auth and heartbeat. This article lists some external programmable call control APIs, internal APIs for billing and health, and rate limiting.

Public API endpoints

Programmatic call control APIs

Making a Call

HTTP POST https://www.altteelcom.com/voice/call

Parameters

to: '+14155551212',
from: '+18668675310'

Callback params

statusCallback: 'https://www.myapp.com/events',
statusCallbackEvent: ['initiated', 'answered'],
statusCallbackMethod: 'POST'

Response

"from": "+9999999999"
"to": "+111111111",
"status": "ongoing"

Timestamps
"date_created": "Mon, 5 Sep 2020 20:36:28 +0000"
"start_time": "Mon, 5 Sep 2020 20:36:29 +0000"
"date_updated": "Mon, 5 Sep 2020 20:36:44 +0000"
"direction": "outbound",
"duration": ""
"end_time": ""

Price
"price": "-0.03000"
"price_unit": "USD"

The response can additionally carry the call SID, the app version, and URIs for recording, transcription, payment, and other services for this call.
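The call-creation request above can be sketched as follows. This is a minimal illustration that only builds the request and does not send it. The JSON body encoding and the helper name `build_call_request` are assumptions; the article does not say whether the endpoint expects JSON or form-encoded parameters, or how authentication is passed.

```python
import json
import urllib.request

CALL_URL = "https://www.altteelcom.com/voice/call"  # endpoint from the article


def build_call_request(to, from_, callback_url=None,
                       events=("initiated", "answered")):
    """Build (but do not send) the POST request that starts a call."""
    payload = {"to": to, "from": from_}
    if callback_url:
        # Optional status callback parameters, as listed in the article.
        payload["statusCallback"] = callback_url
        payload["statusCallbackEvent"] = list(events)
        payload["statusCallbackMethod"] = "POST"
    # Assumption: JSON body. A real service may expect form encoding
    # and an Authorization header.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        CALL_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_call_request(
    "+14155551212", "+18668675310", "https://www.myapp.com/events"
)
```

Passing `req` to `urllib.request.urlopen` would actually place the HTTP call; it is left out here so the sketch runs without network access.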

Ending an ongoing Call

HTTP PATCH https://www.altteelcom.com/voice/call/callid001

params

status: 'end'

This updates the end time of the call and sets the events for CDR processing.
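On the server side, ending a call roughly means filling in the empty `end_time` and `duration` fields of the call record before it is handed to CDR processing. The sketch below assumes the record uses the same field names and timestamp format as the response shown earlier; `close_call` and `call_duration_seconds` are hypothetical helper names.

```python
from email.utils import parsedate_to_datetime


def call_duration_seconds(start_time, end_time):
    """Whole seconds between two RFC 2822 style timestamps."""
    start = parsedate_to_datetime(start_time)
    end = parsedate_to_datetime(end_time)
    return int((end - start).total_seconds())


def close_call(record, end_time):
    """Return a copy of the call record marked as ended."""
    closed = dict(record)  # leave the caller's record untouched
    closed["status"] = "end"
    closed["end_time"] = end_time
    closed["duration"] = str(
        call_duration_seconds(record["start_time"], end_time)
    )
    return closed


record = {
    "status": "ongoing",
    "start_time": "Mon, 5 Sep 2020 20:36:29 +0000",
    "end_time": "",
    "duration": "",
}
closed = close_call(record, "Mon, 5 Sep 2020 20:36:44 +0000")
```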

Services API

Call Recording
Call transcription

Conference APIs
HTTP POST https://www.altteelcom.com/voice/conferences

creating a conference
fetching a conference by date or room name
updating an ongoing conference
ending a conference
setting an IVR announcement on an ongoing conference

Auth API

CDR APIs

HTTP POST https://www.altteelcom.com/cdr

get CDR (filtered per call, or by specific date or account)
bulk export of CDR

Internal API gateways

API Rate Limiter

The noisy neighbour problem occurs when one client monopolizes bandwidth, I/O, CPU, or other resources, which degrades performance for other users. Throttling is a good way to solve this problem by limiting how much each client can consume.

Auto scaling	Load balancer	Rate limiter
Horizontal or vertical scaling can counter incoming traffic.	The LB can limit the number of simultaneous requests. It can reject excess requests or queue them for later processing.	Can intelligently understand the cost of each operation and throttle accordingly.
(-) Takes time to scale out, so it cannot solve the noisy neighbour problem immediately.	(-) The LB's behaviour is indiscriminate (it cannot distinguish between the costs of different operations). (-) The LB cannot ensure a uniform distribution of operations among all servers.

A rate limiter should be low-latency, accurate, and scalable.

Rate limiter inside the service process	Rate limiter as its own process (daemon) outside the service
(+) Faster, no IPC (+) Resistant to interprocess call failures	(+) Programming-language-agnostic daemon (+) Uses its own memory space, more predictable
(-) Service memory needs to allocate space for rate limiters
	Widely used for auto discovery of service hosts

Token based Rate Limiting

Provides admission control.

Token bucket filter

Defines a user's quota in terms of average rate and burst capacity.
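A token bucket with an average rate and a burst capacity can be sketched as below. This is a minimal single-process illustration, not a production limiter: the class name `TokenBucket`, the injectable `clock`, and the per-request `cost` parameter are assumptions made for the example.

```python
import time


class TokenBucket:
    """Admit a request only if enough tokens have accumulated."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens added per second (average rate)
        self.burst = float(burst)    # bucket capacity (largest allowed burst)
        self.clock = clock           # injectable so tests can fake time
        self.tokens = float(burst)   # start with a full bucket
        self.last = clock()

    def _refill(self):
        now = self.clock()
        earned = (now - self.last) * self.rate
        # Tokens never exceed the burst capacity.
        self.tokens = min(self.burst, self.tokens + earned)
        self.last = now

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request fits."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=2, burst=3)   # 2 requests/s on average, bursts of 3
first_burst = [bucket.allow() for _ in range(3)]
```

A gateway would typically keep one such bucket per account or API key, and could pass a larger `cost` for expensive operations.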

Hierarchical Token Bucket (HTB)

Uses the deficit round-robin algorithm for fair queuing.
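The deficit round-robin idea can be illustrated on its own, outside of HTB's class hierarchy. In this sketch each queue earns a fixed `quantum` of bytes per round and may send packets as long as its accumulated deficit covers the packet at the head of the queue. The function name and the data layout (a dict of deques holding packet sizes) are assumptions made for the example.

```python
from collections import deque


def deficit_round_robin(queues, quantum, rounds):
    """Serve `queues` (name -> deque of packet sizes in bytes) fairly.

    Returns the transmission order as a list of (queue_name, size).
    """
    deficit = {name: 0 for name in queues}
    sent = []
    for _ in range(rounds):
        for name, q in queues.items():
            if not q:
                deficit[name] = 0   # idle queues do not hoard credit
                continue
            deficit[name] += quantum
            # Send while the head packet fits in the accumulated deficit.
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if not q:
                deficit[name] = 0
    return sent


queues = {
    "a": deque([300, 300]),            # few large packets
    "b": deque([100, 100, 100, 100]),  # many small packets
}
order = deficit_round_robin(queues, quantum=200, rounds=2)
```

Queue `a` sends nothing in the first round because its 300-byte packet exceeds the quantum, but it keeps the credit and sends in the second round, so large-packet senders are neither starved nor favoured.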

Fair Queuing

Give paying users a bandwidth fraction, for example 25%.

Priority queuing

Allow, for example, 1 packet/ms for free or reduced-rate users.

When a sender does not use its full share, fair queuing distributes that sender's bandwidth among the other senders.
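The redistribution of unused bandwidth can be sketched as a weighted max-min fair allocation. This is one common way to formalize fair sharing, swapped in here for illustration; the article does not name a specific algorithm. The function name `fair_share` and the example senders are hypothetical.

```python
def fair_share(capacity, weights, demands):
    """Weighted max-min fair allocation.

    weights: sender -> relative share; demands: sender -> requested rate.
    Senders that need less than their share are fully served and the
    leftover is split among the remaining senders by weight.
    """
    alloc = {s: 0.0 for s in weights}
    active = {s for s in weights if demands.get(s, 0) > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weights[s] for s in active)
        share = {s: remaining * weights[s] / total_w for s in active}
        satisfied = {s for s in active if demands[s] <= share[s]}
        if not satisfied:
            # Everyone wants more than their share: split what is left.
            for s in active:
                alloc[s] = share[s]
            break
        for s in satisfied:
            alloc[s] = float(demands[s])
            remaining -= demands[s]
        active -= satisfied
    return alloc


# Four senders with equal weights, i.e. a nominal 25% share each.
weights = {"paying": 1, "b": 1, "c": 1, "d": 1}
demands = {"paying": 50, "b": 10, "c": 50, "d": 0}
result = fair_share(100, weights, demands)
```

Here `d` is idle and `b` needs only 10 units, so their unused shares flow to `paying` and `c`, which end up with 45 units each instead of 25.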

CBQ (Class Based Queuing)

Shaping is performed using link idle time calculations based on the timing of dequeue events and the underlying link bandwidth. Input classes that try to send too much are restricted, unless the node is permitted to "borrow" bandwidth from a sibling.

Modular QoS Command-Line Interface (MQC) Shaping

Implement traffic shaping for a specific type of traffic using a traffic policy.

When the rate of packets matching the specified traffic classifier exceeds the rate limit, the device buffers the excess packets.
When there are sufficient tokens in the token bucket, the device forwards the buffered packets at an even rate.
When the buffer queue is full, the device discards the buffered packets.

Throttling

delay the packet until the bucket is ready / shaping
drop the packet / Policing
mark the packet as non-compliant
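The three throttling actions differ only in what happens to a packet that arrives when no tokens are left. The sketch below processes one time slot at a time; the function name `throttle_tick`, the packet-count token budget, and the tail-drop behaviour when the shaping buffer is full are simplifying assumptions.

```python
from collections import deque


def throttle_tick(arrivals, tokens, mode, backlog, backlog_limit=3):
    """Process one time slot with a budget of `tokens` packets.

    mode is "shape" (delay), "police" (drop) or "mark" (forward, flagged).
    `backlog` is a deque updated in place; it carries delayed packets
    into the next slot. Returns (forwarded, dropped).
    """
    forwarded, dropped = [], []
    pending = deque(backlog)   # delayed packets go first
    pending.extend(arrivals)
    backlog.clear()
    for pkt in pending:
        if tokens > 0:
            tokens -= 1
            forwarded.append((pkt, "ok"))
        elif mode == "shape" and len(backlog) < backlog_limit:
            backlog.append(pkt)                       # delay until tokens exist
        elif mode == "mark":
            forwarded.append((pkt, "non-compliant"))  # forward but flag
        else:
            dropped.append(pkt)   # policing, or shaping buffer is full
    return forwarded, dropped


backlog = deque()
shaped = throttle_tick(["p1", "p2", "p3", "p4"], 2, "shape", backlog,
                       backlog_limit=1)
```

With a buffer limit of 1, `p3` is delayed to the next slot and `p4` is dropped, which mirrors the shaping behaviour of buffering excess packets and discarding once the buffer queue is full.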

Failure management on Rate Limiter

Node crash: just fewer requests are throttled
Leaky bucket
Tokens can go negative

System Design for API gateway

Important points for designing an API gateway

Serialize data in a compact binary format
Allocate a buffer in memory, build a frequency-count hash table, and flush it once full or after a time interval to calculate counters
Aggregate on the API gateway on the fly
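The in-memory frequency-count buffer described above can be sketched as follows. The class name `CountBuffer`, the `flush` callback, and the `max_keys` and `max_age` thresholds are hypothetical; in a real gateway the callback would serialize the counts and push them to the aggregation backend.

```python
import time
from collections import Counter


class CountBuffer:
    """Aggregate per-key counts in memory and flush them in batches."""

    def __init__(self, flush, max_keys=1000, max_age=1.0,
                 clock=time.monotonic):
        self.flush = flush        # callback receiving a dict of counts
        self.max_keys = max_keys  # flush once this many distinct keys exist
        self.max_age = max_age    # or once the buffer is this old (seconds)
        self.clock = clock
        self.counts = Counter()
        self.started = clock()

    def record(self, key):
        self.counts[key] += 1
        full = len(self.counts) >= self.max_keys
        stale = self.clock() - self.started >= self.max_age
        if full or stale:
            self.flush_now()

    def flush_now(self):
        if self.counts:
            self.flush(dict(self.counts))
        self.counts = Counter()
        self.started = self.clock()


batches = []
buf = CountBuffer(batches.append, max_keys=2, max_age=60)
for path in ["/voice/call", "/voice/call", "/cdr"]:
    buf.record(path)
```

Aggregating before flushing means the backend receives one counter per key per batch instead of one event per request.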

Frontend service	Partitioned service	Backend service
Lightweight web service, stateless, request validation, auth / authorization, TLS (SSL) termination, server-side encryption, caching, rate limiting (throttling), request deduplication	Caching layer between frontend and backend	Replication, leader selection + quorum

Distributed messaging system (fast and slow paths) for API

A distributed messaging system such as Apache Kafka or AWS Kinesis internally splits a message stream across several partitions, where each partition can be placed on a single shard on a separate machine in a clustered system.
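Key-based partitioning can be sketched with a stable hash, so that all messages for the same key (for example a call ID) land on the same partition. This is only an illustration of the idea: the MD5-based hash and the function name `partition_for` are assumptions and are not the partitioners that Kafka or Kinesis actually use.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a message key to a partition index in [0, num_partitions)."""
    # A stable hash is needed; Python's built-in hash() is randomized
    # per process for strings, so it would not agree across machines.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


events = ["callid001", "callid002", "callid001", "callid003"]
placement = [(key, partition_for(key, 4)) for key in events]
```

Because the mapping depends only on the key, per-call ordering is preserved within a partition while different calls are processed in parallel.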

Applications of this system design

Find heavy hitters ( Top K problem )
Popular products / trends
Volatile stocks
DDoS Attack Prevention
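For the heavy hitters case, one well-known streaming approach is the Misra-Gries summary, which finds frequent items with a bounded number of counters. It is offered here as one possible technique, not as the method this design prescribes. With parameter `k` it keeps at most `k - 1` counters, and any item that occurs more than `n / k` times in a stream of `n` items is guaranteed to be among the surviving keys (the stored counts are lower bounds, not exact frequencies).

```python
def heavy_hitters(stream, k):
    """Misra-Gries summary using at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and evict those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of caller IDs hitting the gateway.
stream = list("abacadaa")
summary = heavy_hitters(stream, 3)
```

A second pass over the data (or the exact counters from the slow path) can then confirm the true frequencies of the candidates.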
