- Public API endpoints
- Internal API gateways
- API Rate Limiter
- Token based Rate Limiting
- Token bucket filter
- Hierarchical Token Bucket (HTB)
- Fair Queuing
- CBQ (Class Based Queuing)
- Modular QoS Command-Line Interface (MQC) Shaping
- Throttling
VoIP manages call setup and teardown over the IP protocol. APIs can expose public or internal endpoints to create and manage calls and conferences, add on services such as recording and transcription, or handle auth and heartbeat. This article lists some external programmable call control APIs, internal APIs for billing and health, and rate limiting.
Public API endpoints
Programmatic call control APIs
1. Making a Call
HTTP POST https://www.altteelcom.com/voice/call
Parameters
to: '+14155551212',
from: '+18668675310'
Callback params
statusCallback: 'https://www.myapp.com/events',
statusCallbackEvent: ['initiated', 'answered'],
statusCallbackMethod: 'POST'
Response
"from": "+9999999999",
"to": "+111111111",
"status": "ongoing"
Timestamps
"date_created": "Mon, 5 Sep 2020 20:36:28 +0000",
"start_time": "Mon, 5 Sep 2020 20:36:29 +0000",
"date_updated": "Mon, 5 Sep 2020 20:36:44 +0000",
"direction": "outbound",
"duration": "",
"end_time": ""
Price
"price": "-0.03000",
"price_unit": "USD"
The response can additionally include the SID, the app version, and URIs for recording, transcription, payment, and other services for this call.
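As a rough illustration of the request described above, the sketch below builds (but does not send) the POST using only the Python standard library. The URL and parameter names come from the listing; the JSON body encoding and the helper name `build_call_request` are assumptions, since the article does not say how parameters are encoded.

```python
import json
import urllib.request

# URL taken from the article; the JSON encoding of the body is an assumption.
CALL_URL = "https://www.altteelcom.com/voice/call"

def build_call_request(to, from_, callback_url=None, events=("initiated", "answered")):
    """Build (but do not send) the POST request that would start a call."""
    payload = {"to": to, "from": from_}
    if callback_url:
        # Callback parameters are only attached when a callback URL is given.
        payload["statusCallback"] = callback_url
        payload["statusCallbackEvent"] = list(events)
        payload["statusCallbackMethod"] = "POST"
    return urllib.request.Request(
        CALL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_call_request("+14155551212", "+18668675310", "https://www.myapp.com/events")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because the endpoint is only an example.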
2. Ending an ongoing Call
HTTP PATCH https://www.altteelcom.com/voice/call/callid001
Parameters
status: 'end'
This updates the end time of the call and sets the events for CDR processing.
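What the server might do on such an update can be sketched as follows. The record layout, the `end_call` helper, and the `call.ended` event shape are all hypothetical; the article only states that the end time is updated and an event is set for CDR processing.

```python
from datetime import datetime, timezone

def end_call(call, now=None):
    """Mark an in-memory call record as ended and return a CDR event.

    Field names and the event shape are assumptions for illustration.
    """
    now = now or datetime.now(timezone.utc)
    call["status"] = "end"
    call["end_time"] = now
    call["duration"] = int((now - call["start_time"]).total_seconds())
    # Event that a CDR pipeline could consume later.
    return {"type": "call.ended", "call_id": call["id"], "duration": call["duration"]}

start = datetime(2020, 9, 5, 20, 36, 29, tzinfo=timezone.utc)
call = {"id": "callid001", "status": "ongoing", "start_time": start, "end_time": None}
event = end_call(call, now=datetime(2020, 9, 5, 20, 38, 29, tzinfo=timezone.utc))
print(event)
```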
Services API
- Call Recording
- Call transcription
Conference APIs
HTTP POST https://www.altteelcom.com/voice/conferences
- creating a conference
- fetching a conference by date or room name
- updating an ongoing conference
- ending a conference
- setting an IVR announcement on an ongoing conference
Auth API
CDR APIs
HTTP POST https://www.altteelcom.com/cdr
- get CDRs (filtered per call, or by specific date or account)
- bulk export of CDR
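The filtering behind a CDR lookup could look roughly like this. The `CDR` fields and the `filter_cdrs` helper are illustrative assumptions; the article does not define the record schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CDR:
    # Hypothetical minimal record; real CDRs carry many more fields.
    call_id: str
    account: str
    day: date
    duration_s: int

def filter_cdrs(cdrs, call_id=None, account=None, day=None):
    """Return the records matching every filter that was supplied."""
    return [
        c for c in cdrs
        if (call_id is None or c.call_id == call_id)
        and (account is None or c.account == account)
        and (day is None or c.day == day)
    ]

records = [
    CDR("callid001", "acct-a", date(2020, 9, 5), 15),
    CDR("callid002", "acct-b", date(2020, 9, 5), 60),
    CDR("callid003", "acct-a", date(2020, 9, 6), 30),
]
by_account = filter_cdrs(records, account="acct-a")
by_day = filter_cdrs(records, day=date(2020, 9, 5))
print([c.call_id for c in by_account], [c.call_id for c in by_day])
```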
Internal API gateways
API Rate Limiter
A noisy neighbour is a client that monopolizes bandwidth, I/O, CPU, or other resources, which degrades performance for other users. Throttling is a good way to solve this problem by limiting how much each client can consume.
Auto scaling | Load balancer | Rate limiter |
Horizontal or vertical scaling can counter incoming traffic | The LB can limit the number of simultaneous requests. It can reject excess requests or queue them for later processing | Can understand the cost of each operation and throttle accordingly |
(-) Takes time to scale out, so it cannot solve the noisy neighbour problem immediately | (-) The LB's behaviour is indiscriminate (it cannot distinguish between the costs of different operations) (-) The LB cannot ensure uniform distribution of operations among all servers | |
A rate limiter should be low-latency, accurate, and scalable.
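The idea that a rate limiter can weigh the cost of each operation, rather than just counting requests, can be sketched with a per-client budget. The operation names, their costs, and the `CostBudgetLimiter` class are assumptions made up for this example; a real system would measure costs.

```python
from collections import defaultdict

# Hypothetical per-operation costs; not taken from any real API.
OPERATION_COST = {"get_cdr": 1, "make_call": 5, "bulk_export": 20}

class CostBudgetLimiter:
    """Fixed-window limiter that charges each client by operation cost."""

    def __init__(self, budget_per_window):
        self.budget = budget_per_window
        self.spent = defaultdict(int)

    def allow(self, client, operation):
        cost = OPERATION_COST[operation]
        if self.spent[client] + cost > self.budget:
            return False  # throttle: this client has used up its share
        self.spent[client] += cost
        return True

    def reset_window(self):
        """Call at the start of each window (for example, every second)."""
        self.spent.clear()

limiter = CostBudgetLimiter(budget_per_window=25)
results = [
    limiter.allow("noisy", "bulk_export"),  # 20 of 25 spent
    limiter.allow("noisy", "bulk_export"),  # would reach 40, rejected
    limiter.allow("noisy", "make_call"),    # exactly 25 spent
    limiter.allow("quiet", "get_cdr"),      # separate budget per client
]
print(results)
```

Because each client has its own budget, the expensive requests from one client do not consume the allowance of another.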
Rate limiter inside the service process | Rate limiter as its own process (daemon) outside the service |
(+) faster, no IPC (+) resistant to interprocess call failures | (+) programming-language-agnostic daemon (+) uses its own memory space, more predictable |
(-) service memory needs to allocate space for rate limiters | |
| widely used for auto-discovery of service hosts |

Token based Rate Limiting
provides admission control
Token bucket filter
defines a user's quota in terms of average rate and burst capacity
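A minimal token bucket sketch, assuming `rate` is the average rate in tokens per second and `burst` is the bucket capacity. The class name and the injectable `clock` parameter are illustrative choices, the latter added so the example can be run deterministically.

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the elapsed time, capped at the burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

fake_now = [0.0]  # fake clock so the example does not need to sleep
bucket = TokenBucket(rate=2, burst=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]
fake_now[0] = 1.0  # one second later, 2 tokens have been refilled
after_refill = [bucket.allow() for _ in range(3)]
print(burst, after_refill)
```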
Hierarchical Token Bucket (HTB)
uses the deficit round-robin algorithm for fair queuing
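The deficit round-robin step on its own can be sketched as below. This covers only the DRR scheduling among flows, not the class hierarchy or token accounting of HTB; the function name and the `quantum` parameter (bytes of credit added per round) are assumptions for the example.

```python
from collections import deque

def deficit_round_robin(queues, quantum):
    """Return (flow, packet_size) pairs in DRR order until all queues drain.

    queues: dict mapping flow name to a deque of packet sizes in bytes.
    """
    deficit = {flow: 0 for flow in queues}
    order = []
    while any(queues.values()):
        for flow, q in queues.items():
            if not q:
                deficit[flow] = 0  # idle flows do not accumulate credit
                continue
            deficit[flow] += quantum
            # Send packets while the head packet fits in the current credit.
            while q and q[0] <= deficit[flow]:
                size = q.popleft()
                deficit[flow] -= size
                order.append((flow, size))
            if not q:
                deficit[flow] = 0
    return order

queues = {"a": deque([300, 300]), "b": deque([100, 100, 100, 100])}
order = deficit_round_robin(queues, quantum=200)
print(order)
```

Flow "a" has to wait a round to save up enough credit for its large packets, while flow "b" sends two small packets per round, so both get a similar share of bytes.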
Fair Queuing
give paying users a bandwidth fraction of 25%
priority queuing
decide on 1 packet/ms for free or reduced-rate users
distributes an idle sender's bandwidth among the other senders
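One way to picture how unused bandwidth is redistributed is a weighted max-min fair allocation. The sketch below is an assumption about how the notes above fit together: four senders with equal weights each start with a 25% share, and whatever a sender does not need is split among the rest. The function name and all numbers are illustrative.

```python
def fair_shares(capacity, demands, weights):
    """Weighted max-min fair allocation of `capacity` among senders."""
    alloc = {s: 0.0 for s in demands}
    active = {s for s in demands if demands[s] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weights[s] for s in active)
        satisfied = set()
        spent = 0.0
        for s in active:
            offer = remaining * weights[s] / total_w  # this round's share
            give = min(offer, demands[s] - alloc[s])  # never exceed demand
            alloc[s] += give
            spent += give
            if alloc[s] >= demands[s] - 1e-9:
                satisfied.add(s)
        remaining -= spent
        if not satisfied:
            break  # everyone took the full offer, capacity is used up
        active -= satisfied
    return alloc

demands = {"a": 10, "b": 50, "c": 50, "d": 50}
weights = {"a": 1, "b": 1, "c": 1, "d": 1}  # equal weights: 25% each
alloc = fair_shares(100, demands, weights)
print(alloc)
```

Sender "a" only needs 10 of its 25 units, so the spare 15 is divided among the other three senders.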
CBQ (Class Based Queuing)
Shaping is performed using link idle time calculations based on the timing of dequeue events and the underlying link bandwidth. Input classes that try to send too much are restricted, unless the node is permitted to “borrow” bandwidth from a sibling.
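The borrowing rule can be illustrated with a deliberately simplified model. This sketch replaces CBQ's idle-time estimator with a plain per-interval byte allotment, so it shows only the restrict-or-borrow decision; the `TrafficClass` name, its fields, and the numbers are assumptions.

```python
class TrafficClass:
    """One class with a per-interval byte allotment (simplified, not real CBQ)."""

    def __init__(self, name, allotment, may_borrow=False):
        self.name = name
        self.allotment = allotment
        self.used = 0
        self.may_borrow = may_borrow
        self.sibling = None  # set once both classes exist

    def spare(self):
        return max(0, self.allotment - self.used)

    def try_send(self, size):
        own = self.spare()
        if size <= own:
            self.used += size
            return "sent"
        lendable = self.sibling.spare() if (self.may_borrow and self.sibling) else 0
        if size <= own + lendable:
            self.used += own                 # exhaust our own allotment first
            self.sibling.used += size - own  # charge the rest to the sibling
            return "sent (borrowed)"
        return "restricted"

voice = TrafficClass("voice", allotment=1000, may_borrow=True)
bulk = TrafficClass("bulk", allotment=1000)
voice.sibling, bulk.sibling = bulk, voice

results = [
    voice.try_send(800),
    voice.try_send(500),  # 200 own + 300 borrowed from bulk
    bulk.try_send(800),   # only 700 left and bulk may not borrow
    bulk.try_send(700),
]
print(results)
```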
Modular QoS Command-Line Interface (MQC) Shaping
implement traffic shaping for a specific type of traffic using a traffic policy
- When the rate of packets matching the specified traffic classifier exceeds the rate limit, the device buffers the excess packets.
- When there are sufficient tokens in the token bucket, the device forwards the buffered packets at an even rate.
- When the buffer queue is full, the device discards the buffered packets.
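The three rules above can be sketched as a small shaper with a bounded buffer. This is not MQC configuration, only a model of the behaviour; the `Shaper` class, the tick-based token refill, and the choice to drop the newly arriving packet when the buffer is full (tail drop) are assumptions.

```python
from collections import deque

class Shaper:
    """Buffers excess packets and releases them as tokens become available."""

    def __init__(self, rate_per_tick, queue_limit):
        self.rate = rate_per_tick
        self.queue_limit = queue_limit
        self.tokens = rate_per_tick
        self.queue = deque()
        self.dropped = []

    def arrive(self, packet):
        """Return 'forwarded', 'buffered' or 'dropped' for an arriving packet."""
        if not self.queue and self.tokens > 0:
            self.tokens -= 1
            return "forwarded"
        if len(self.queue) < self.queue_limit:
            self.queue.append(packet)  # rate exceeded: buffer the excess
            return "buffered"
        self.dropped.append(packet)    # buffer full: discard (tail drop)
        return "dropped"

    def tick(self):
        """Refill tokens, then forward buffered packets in arrival order."""
        self.tokens = self.rate
        sent = []
        while self.queue and self.tokens > 0:
            sent.append(self.queue.popleft())
            self.tokens -= 1
        return sent

shaper = Shaper(rate_per_tick=2, queue_limit=2)
outcomes = [shaper.arrive(p) for p in ["p1", "p2", "p3", "p4", "p5"]]
drained = shaper.tick()
print(outcomes, drained)
```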
Throttling
- delay the packet until the bucket is ready (shaping)
- drop the packet (policing)
- mark the packet as non-compliant
Failure management on Rate Limiter
- Node crash: just fewer requests throttled
- Leaky bucket
- tokens can go negative
System Design for API gateway
Important points for designing an API gateway
- Serialize data in a compact binary format
- allocate a buffer in memory, build a frequency-count hash table, and flush it once full or at a time interval to calculate counters
- aggregation on API gateway on the fly
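The in-memory counting buffer from the list above can be sketched as follows. The `CountingBuffer` class, its size and time thresholds, and the `sink` callback that receives each flushed batch are assumptions for illustration; serialization into a binary format is left out.

```python
from collections import Counter

class CountingBuffer:
    """Aggregates per-key counts in memory and flushes on size or time."""

    def __init__(self, max_keys, flush_interval_s, sink, clock):
        self.max_keys = max_keys
        self.flush_interval_s = flush_interval_s
        self.sink = sink    # callable receiving a dict of counts
        self.clock = clock  # callable returning the current time in seconds
        self.counts = Counter()
        self.last_flush = clock()

    def record(self, key):
        self.counts[key] += 1
        full = len(self.counts) >= self.max_keys
        stale = self.clock() - self.last_flush >= self.flush_interval_s
        if full or stale:
            self.flush()

    def flush(self):
        if self.counts:
            self.sink(dict(self.counts))
        self.counts.clear()
        self.last_flush = self.clock()

now = [0.0]  # fake clock so the example is deterministic
flushed = []
buf = CountingBuffer(max_keys=3, flush_interval_s=10, sink=flushed.append,
                     clock=lambda: now[0])
for path in ["/voice/call", "/voice/call", "/cdr", "/voice/conferences"]:
    buf.record(path)  # the third distinct key triggers a size-based flush
now[0] = 11.0
buf.record("/cdr")    # more than 10 seconds since last flush: time-based flush
print(flushed)
```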

Frontend Service | Partitioned Service | Backend Service |
Lightweight web service, stateless, request validation, auth / authorization, TLS (SSL) termination, server-side encryption, caching, rate limiting (throttling), request deduplication | Caching layer between frontend and backend | Replication, leader selection + quorum |

Distributed messaging system (fast and slow paths) for API
A distributed messaging system such as Apache Kafka or AWS Kinesis internally splits messages across several partitions, where each partition can be placed on a single shard on a separate machine in a clustered system.
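Key-based partitioning can be sketched as hashing a message key to a partition number. This is a generic illustration, not the partitioner used by Kafka or Kinesis; the SHA-256 hash, the `partition_for` helper, and the sample keys are assumptions.

```python
import hashlib
from collections import defaultdict

def partition_for(key, num_partitions):
    """Map a message key to a partition with a stable hash (illustrative)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

NUM_PARTITIONS = 4
messages = [("callid001", "initiated"), ("callid002", "initiated"),
            ("callid001", "answered"), ("callid001", "ended")]

partitions = defaultdict(list)
for key, event in messages:
    # The same key always lands on the same partition, preserving its order.
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, event))

for p, items in sorted(partitions.items()):
    print(p, items)
```

Keeping all events for one call on one partition means a consumer of that partition sees them in the order they were produced.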

Applications of this system design
- Find heavy hitters ( Top K problem )
- Popular products / trends
- Volatile stocks
- DDoS Attack Prevention
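The heavy-hitters (Top K) case can be sketched on a single machine with exact counts. A distributed deployment would more likely merge per-partition counts or use an approximate structure; the `top_k` helper and the sample client names here are assumptions.

```python
from collections import Counter

def top_k(events, k):
    """Return the k most frequent items with their counts."""
    return Counter(events).most_common(k)

# Hypothetical stream of client identifiers seen at the gateway.
events = ["client-a"] * 5 + ["client-b"] * 3 + ["client-c"]
heavy = top_k(events, 2)
print(heavy)
```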
References :