VoIP API design

  • Public API endpoints
  • Internal API gateways
  • API Rate Limiter
    • Token based Rate Limiting
    • Token bucket filter
    • Hierarchical Token Bucket (HTB)
    • Fair Queuing
    • CBQ (Class Based Queuing)
    • Modular QoS Command-Line Interface (MQC) Shaping
  • Throttling

VoIP manages call setup and teardown over IP. APIs can expose public or internal endpoints to create and manage calls and conferences, to add services such as recording and transcription, or even to handle auth and heartbeat. This article lists some external programmable call control APIs, internal APIs for billing and health, and rate limiting.

Public API endpoints

Programmatic call control APIs

  1. Making a Call

HTTP POST https://www.altteelcom.com/voice/call


to: '+14155551212',
from: '+18668675310'

Callback params

statusCallback: 'https://www.myapp.com/events',
statusCallbackEvent: ['initiated', 'answered'],
statusCallbackMethod: 'POST'


"from": "+9999999999"
"to": "+111111111",
"status": "ongoing"

"date_created": "Mon, 5 Sep 2020 20:36:28 +0000"
"start_time": "Mon, 5 Sep 2020 20:36:29 +0000"
"date_updated": "Mon, 5 Sep 2020 20:36:44 +0000"
"direction": "outbound",
"duration": ""
"end_time": ""

"price": "-0.03000"
"price_unit": "USD"

The response can additionally include a SID, the app version, and other URIs for recording, transcription, payment and other services for this call.
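As a rough illustration, the request above could be assembled like this in Python. This is only a sketch: the JSON body encoding and the helper name `build_call_request` are assumptions (the article does not say how parameters are encoded), and the request is built but never sent.

```python
import json
import urllib.request

CALL_URL = "https://www.altteelcom.com/voice/call"  # endpoint from the example above


def build_call_request(to, from_, status_callback=None, events=None):
    """Build (but do not send) a POST request that starts a call.

    Assumption: the API accepts a JSON body; the article does not confirm this.
    """
    payload = {"to": to, "from": from_}
    if status_callback:
        # Optional callback parameters, named as in the article.
        payload["statusCallback"] = status_callback
        payload["statusCallbackEvent"] = events or ["initiated", "answered"]
        payload["statusCallbackMethod"] = "POST"
    return urllib.request.Request(
        CALL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_call_request(
    to="+14155551212",
    from_="+18668675310",
    status_callback="https://www.myapp.com/events",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out here on purpose.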

  2. Ending an ongoing Call

HTTP PATCH https://www.altteelcom.com/voice/call/callid001


status: 'end'

This updates the end time of the call and triggers the events for CDR processing.
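One way a server might handle this update is sketched below. The `Call` record, the `end_call` helper and the shape of the CDR event are all hypothetical; the article only states that the end time is updated and that events are produced for CDR processing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Call:
    call_id: str
    start_time: datetime
    status: str = "ongoing"
    end_time: Optional[datetime] = None


def end_call(call: Call, now: datetime) -> dict:
    """Mark the call as ended and return a (hypothetical) CDR event."""
    if call.status != "ongoing":
        raise ValueError(f"call {call.call_id} is not ongoing")
    call.status = "end"
    call.end_time = now
    duration = int((call.end_time - call.start_time).total_seconds())
    return {"call_id": call.call_id, "event": "call_ended", "duration_s": duration}


start = datetime(2020, 9, 5, 20, 36, 29, tzinfo=timezone.utc)
call = Call("callid001", start)
event = end_call(call, start + timedelta(seconds=15))
print(event)
```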

Services API

  • Call Recording
  • Call transcription

Conference APIs
HTTP POST https://www.altteelcom.com/voice/conferences

  • creating a conf
  • fetching conf based on date or room name
  • updating an ongoing conf
  • ending a conf
  • set IVR announcement on ongoing conf

Auth API


CDR API

HTTP POST https://www.altteelcom.com/cdr

  • get CDRs (filtered per call, or by specific date or account)
  • bulk export of CDR

Internal API gateways

API Rate Limiter

A noisy neighbour is a client that monopolizes bandwidth, I/O, CPU or other resources, which can degrade performance for other users. Throttling is a good way to solve this problem, by limiting how much each client can consume.

  • Auto scaling
    • Horizontal or vertical scaling can counter incoming traffic.
    • (-) Takes time to scale out, so it cannot solve the noisy neighbour problem immediately.
  • Load balancer
    • The LB can limit the number of simultaneous requests. It can reject excess requests or send them to a queue for later processing.
    • (-) The LB's behaviour is indiscriminate (it cannot distinguish between the cost of different operations).
    • (-) The LB cannot ensure a uniform distribution of operations among all servers.
  • Rate limiter
    • Can intelligently understand the cost of each operation and perform throttling.

A rate limiter should be low-latency, accurate and scalable.

  • Rate limiter inside the service process
    • (+) Faster, no IPC.
    • (+) Resistant to interprocess call failures.
    • (-) Service memory needs to allocate space for rate limiters.
  • Rate limiter as its own process outside the service, as a daemon
    • (+) Programming-language-agnostic daemon.
    • (+) Uses its own memory space, more predictable.
    • Widely used for auto discovery of service hosts.

Token based Rate Limiting

Provides admission control.

Token bucket filter

Defines a user's quota in terms of average rate and burst capacity.
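To make the "average rate plus burst capacity" idea concrete, here is a minimal single-process token bucket. The class name, the `cost` parameter and the injectable `clock` are illustrative choices made here, not part of any particular library; a production limiter would also need locking and shared state across nodes.

```python
import time


class TokenBucket:
    """Token bucket: `rate` tokens/second on average, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if the request is admitted."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Deterministic demo with a fake clock (an assumption made for testability).
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=5, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(6)]          # 5 admitted, 6th rejected
fake_now[0] += 1.0                                  # one second later: 2 new tokens
after_refill = [bucket.allow() for _ in range(3)]   # 2 admitted, 3rd rejected
print(burst, after_refill)
```

The `cost` argument lets expensive operations spend more than one token, which is one simple way to account for differing operation costs.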

Hierarchical Token Bucket (HTB)

Uses the deficit round-robin algorithm for fair queuing.
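The deficit round-robin idea can be sketched on its own as follows. This is not a full HTB (there is no class hierarchy or borrowing here), and the function name, the per-flow `quantum` values and the fixed number of rounds are assumptions made for the example.

```python
from collections import deque


def deficit_round_robin(queues, quantum, rounds):
    """Serve per-flow packet queues with deficit round robin.

    queues:  dict flow -> deque of packet sizes (bytes)
    quantum: dict flow -> bytes credited to the flow each round
    Returns the list of (flow, size) in service order.
    """
    deficit = {flow: 0 for flow in queues}
    served = []
    for _ in range(rounds):
        for flow, q in queues.items():
            if not q:
                deficit[flow] = 0   # idle flows do not accumulate credit
                continue
            deficit[flow] += quantum[flow]
            # Send packets while the head packet fits in the accumulated deficit.
            while q and q[0] <= deficit[flow]:
                size = q.popleft()
                deficit[flow] -= size
                served.append((flow, size))
            if not q:
                deficit[flow] = 0
    return served


queues = {
    "a": deque([300, 300, 300]),
    "b": deque([100, 100, 100, 100, 100, 100]),
}
order = deficit_round_robin(queues, quantum={"a": 300, "b": 300}, rounds=3)
print(order)
```

With equal quanta, flow "a" sends one 300-byte packet per round while flow "b" sends three 100-byte packets, so both get the same number of bytes per round regardless of packet size.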

Fair Queuing

  • Give paying users a guaranteed bandwidth fraction, for example 25%.
  • Priority queuing.
  • Decide on a rate, such as 1 packet/ms, for free or reduced-rate users.
  • When a sender uses less than its share, distribute that sender's bandwidth among the other senders.
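The redistribution of unused bandwidth can be illustrated with a small max-min fair allocation. This sketch assumes equal weights for all senders (a paid tier with a fixed fraction is not modelled), and the function name and sample demands are made up for the example.

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` among senders; unused share goes to the others.

    demands: dict sender -> requested bandwidth
    Returns dict sender -> allocated bandwidth (max-min fair, equal weights).
    """
    alloc = {s: 0.0 for s in demands}
    active = set(demands)
    remaining = float(capacity)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        # Senders whose leftover demand fits within the equal share are satisfied.
        satisfied = {s for s in active if demands[s] - alloc[s] <= share}
        if not satisfied:
            # Everyone wants more than the equal share: split what is left evenly.
            for s in active:
                alloc[s] += share
            remaining = 0.0
            break
        for s in satisfied:
            remaining -= demands[s] - alloc[s]
            alloc[s] = float(demands[s])
        active -= satisfied
    return alloc


shares = max_min_fair_share(100, {"a": 10, "b": 50, "c": 80})
print(shares)  # {'a': 10.0, 'b': 45.0, 'c': 45.0}
```

Sender "a" only needs 10 units, so the 23.3 units it leaves unused from an equal three-way split are shared between "b" and "c".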

CBQ (Class Based Queuing)

Shaping is performed using link idle time calculations based on the timing of dequeue events and the underlying link bandwidth. Input classes that try to send too much are restricted, unless the node is permitted to “borrow” bandwidth from a sibling.

Modular QoS command-Line interface (MQC) Shaping

Implement traffic shaping for a specific type of traffic using a traffic policy:

  • When the rate of packets matching the specified traffic classifier exceeds the rate limit, the device buffers the excess packets.
  • When there are sufficient tokens in the token bucket, the device forwards the buffered packets at an even rate.
  • When the buffer queue is full, the device discards the buffered packets.
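The three rules above can be approximated with a token bucket in front of a bounded buffer. This is a simplified sketch, not a model of any vendor's MQC implementation: it assumes one token per packet, discrete time ticks, and tail drop (the newly arriving packet is discarded when the buffer is full), and the `Shaper` class is hypothetical.

```python
from collections import deque


class Shaper:
    """Buffer excess packets and release them as tokens become available."""

    def __init__(self, rate, burst, max_queue):
        self.rate = rate            # tokens added per tick
        self.burst = burst          # bucket capacity
        self.tokens = burst
        self.queue = deque()
        self.max_queue = max_queue
        self.dropped = 0

    def enqueue(self, packet):
        # Assumption: tail drop, i.e. the newly arriving packet is discarded.
        if len(self.queue) >= self.max_queue:
            self.dropped += 1
            return False
        self.queue.append(packet)
        return True

    def tick(self):
        """Advance one time unit: refill tokens, forward what they allow."""
        self.tokens = min(self.burst, self.tokens + self.rate)
        sent = []
        while self.queue and self.tokens >= 1:
            self.tokens -= 1
            sent.append(self.queue.popleft())
        return sent


shaper = Shaper(rate=2, burst=2, max_queue=4)
accepted = [shaper.enqueue(n) for n in range(6)]   # 6 packets arrive at once
first = shaper.tick()
second = shaper.tick()
print(accepted, first, second, shaper.dropped)
```

Four packets fit in the buffer and leave at an even two per tick; the last two arrivals are dropped.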


When a packet arrives and the bucket does not hold enough tokens, the rate limiter can:

  • delay the packet until the bucket is ready (shaping)
  • drop the packet (policing)
  • mark the packet as non-compliant

Failure management on Rate Limiter

  • Node crash: fewer requests are throttled
  • Leaky bucket
  • Tokens can go negative

System Design for API gateway

Important points for designing an API gateway

  • Serialize data in a compact binary format
  • Allocate a buffer in memory, build a frequency count hash table, and flush it once full or on a timer to calculate counters
  • aggregation on API gateway on the fly
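The in-memory buffering and on-the-fly aggregation described above might look roughly like this. The `CountBuffer` class, its thresholds and the `sink` callback are illustrative assumptions; a real gateway would serialize the flushed counts and send them downstream.

```python
import time
from collections import Counter


class CountBuffer:
    """In-memory frequency table that flushes when full or after an interval."""

    def __init__(self, max_keys, flush_interval, sink, clock=time.monotonic):
        self.max_keys = max_keys
        self.flush_interval = flush_interval
        self.sink = sink            # callable that receives a dict of counts
        self.clock = clock
        self.counts = Counter()
        self.last_flush = clock()

    def record(self, key):
        self.counts[key] += 1
        too_big = len(self.counts) >= self.max_keys
        too_old = self.clock() - self.last_flush >= self.flush_interval
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.counts:
            self.sink(dict(self.counts))
        self.counts = Counter()
        self.last_flush = self.clock()


flushed = []
buf = CountBuffer(max_keys=3, flush_interval=60, sink=flushed.append, clock=lambda: 0.0)
for api in ["/call", "/call", "/cdr", "/call", "/conferences", "/cdr"]:
    buf.record(api)
buf.flush()   # flush whatever is left at shutdown
print(flushed)
```

Aggregating before sending means the downstream system sees one record per key per flush instead of one record per request.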

The gateway is split into a frontend service, a partitioned service and a backend service. Responsibilities across these tiers include:

  • Lightweight web service
  • Request validation
  • Auth / authorization
  • TLS (SSL) termination
  • Server-side encryption
  • Rate limiting (throttling)
  • Request deduplication
  • Caching layer between frontend and backend
  • Leader selection + quorum

Distributed messaging system (fast and slow paths) for API

A distributed messaging system such as Apache Kafka or AWS Kinesis internally splits a stream of messages across several partitions, where each partition can be placed on a single shard on a separate machine in a clustered system.
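A minimal sketch of key-based partitioning is shown below. The SHA-256-modulo-N scheme and the `partition_for` helper are assumptions chosen for illustration; real messaging systems ship their own partitioners and hash functions.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition using a stable hash.

    Assumption: SHA-256 modulo N is used here only for illustration.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


partitions = {p: [] for p in range(4)}
for call_id in ["callid001", "callid002", "callid003", "callid001"]:
    partitions[partition_for(call_id, 4)].append(call_id)
print(partitions)
```

Because the hash is stable, every event for the same call ID lands on the same partition, which keeps per-key ordering and lets each partition be counted independently.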

Applications of this system design

  • Find heavy hitters ( Top K problem )
  • Popular products / trends
  • Volatile stocks
  • DDoS Attack Prevention
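For small inputs, the heavy hitters (Top K) problem can be solved exactly with a frequency table and a heap, as in the sketch below. It assumes all events fit in memory on one machine; at gateway scale the counts would come from partitioned, pre-aggregated data, and the sample phone prefixes are made up.

```python
import heapq
from collections import Counter


def top_k(events, k):
    """Return the k most frequent items as (item, count), most frequent first."""
    counts = Counter(events)
    # nlargest avoids sorting the whole table when k is small.
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])


calls = ["+1415", "+1866", "+1415", "+4420", "+1415", "+1866"]
heavy = top_k(calls, 2)
print(heavy)  # [('+1415', 3), ('+1866', 2)]
```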
