
Modern applications live and die by their APIs. Mobile apps, web apps, partners, and even internal services constantly hit APIs — often thousands or millions of times per minute.
Without protection, a single bug, bot, or malicious actor can bring an entire system down.
This is where rate limiting comes in.
In this post, we’ll break down, in practical terms, what rate limiting is, why it matters, how it works internally, and how real companies use it to survive abuse and traffic spikes.
1. The Core Problem: APIs Are Fragile at Scale
APIs look simple:
- Request comes in
- Server processes it
- Response goes out
But at scale:
- Servers have CPU limits
- Databases have connection limits
- Third-party services have quotas
If requests spike uncontrollably, systems don’t slow down gracefully — they collapse.
2. Real-World API Abuse Scenarios
Scenario 1: Buggy Client
A mobile app releases an update with a bug:
- API is called every second instead of once per minute
- Millions of users update the app
Result:
- Backend overload
- Database connection exhaustion
- System outage
Scenario 2: Malicious Scraping
- Bots aggressively scrape public APIs
- No authentication, no limits
Result:
- Infrastructure cost explodes
- Real users face latency and errors
Scenario 3: Accidental DDoS
Not all attacks are intentional.
- A retry loop without backoff
- Network timeout triggers infinite retries
Result:
- Traffic multiplies itself
- System attacks itself
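The self-amplifying retry loop described above is usually tamed with exponential backoff plus jitter and a bounded retry budget. Here is a minimal sketch, not a definitive client: the names `backoff_delays` and `call_with_retries`, the 0.5-second base, the 30-second cap, and the five-attempt budget are all illustrative assumptions, and `TimeoutError` stands in for whatever error a real client raises.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one randomized delay per retry ("full jitter").

    The upper bound doubles on every retry and is clamped to `cap`.
    """
    rng = rng or random.Random()
    for attempt in range(retries):
        upper = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, upper)


def call_with_retries(fn: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on TimeoutError at most `max_attempts - 1` times."""
    delays = backoff_delays(retries=max_attempts - 1)
    while True:
        try:
            return fn()
        except TimeoutError:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the original error
            sleep(delay)  # wait before the next attempt
```

The jitter matters as much as the doubling: if every client waits exactly the same interval, they all retry at the same moment and the spike simply repeats.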
3. What Rate Limiting Actually Does
Rate limiting defines how many requests are allowed within a given time window. Limits are typically applied:
- Per user
- Per IP
- Per API key
- Per service
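If the limit is exceeded:
- Requests are rejected (over HTTP, typically with status 429 Too Many Requests)
- Or delayed (queued and served later)
- Or downgraded (for example, served a cached or cheaper response)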
This protects core systems.
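To make the "N requests per key per window" idea concrete, here is a minimal fixed-window counter sketch. It is only one of several ways to count requests, and everything in it is an assumption for illustration: the class name `FixedWindowLimiter`, the in-memory dictionary (a single-process stand-in for real storage), and the choice to return a status code and headers as a tuple. The 429 status and the `Retry-After` header are standard HTTP.

```python
import math
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed time window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # key -> (window index, requests counted in that window)
        self._state: Dict[str, Tuple[int, int]] = {}

    def check(self, key: str) -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, headers) for one request from `key`."""
        now = self.clock()
        window_index = int(now // self.window)
        index, count = self._state.get(key, (window_index, 0))
        if index != window_index:
            index, count = window_index, 0  # a new window started: reset
        if count >= self.limit:
            # Tell the client how long until the current window ends.
            retry_after = (window_index + 1) * self.window - now
            return 429, {"Retry-After": str(max(1, math.ceil(retry_after)))}
        self._state[key] = (index, count + 1)
        return 200, {}
```

The key can be a user ID, an IP address, an API key, or a service name, which is how the same mechanism covers all four scopes. A known weakness of fixed windows is that a client can send up to twice the limit across a window boundary; the bucket algorithms in the following sections avoid that.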
4. Why Blocking Users Is a Bad Idea
Naively blocking traffic leads to:
- Legitimate users getting locked out
- Poor user experience
- Support escalations
Modern rate limiting focuses on:
- Fair usage
- Gradual throttling
- Graceful degradation
5. Token Bucket Algorithm (Most Popular)
How it works
- A bucket holds tokens, up to a fixed capacity
- Tokens are added at a fixed rate; once the bucket is full, extra tokens are discarded
- Each request consumes one token
If a token is available → request allowed
If the bucket is empty → request rejected
Why companies love it
- Allows short bursts
- Enforces long-term limits
- Simple to implement
Real-world example
- Sustained limit: 100 requests per minute (the token refill rate)
- Burst: up to 20 requests back-to-back (the bucket capacity)
Perfect for mobile apps and user-facing APIs.
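The token bucket steps can be sketched in a few lines. This is a minimal single-process illustration under stated assumptions, not a production limiter: the class name `TokenBucket`, the lazy refill on each call (instead of a background timer), the injectable `clock`, and starting with a full bucket are all choices made for this example.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket that refills lazily whenever a request arrives."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec    # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock
        self.tokens = capacity      # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return whether the request may pass."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

One way to read the example numbers is `TokenBucket(rate_per_sec=100 / 60, capacity=20)`: a client can fire 20 requests back-to-back, after which it is held to roughly 100 requests per minute as tokens trickle back in. In practice there would be one bucket per user, IP, or API key.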
6. Leaky Bucket Algorithm (Smooth Traffic)
How it works
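- Requests enter a bucket (in practice, a fixed-size queue)
- The bucket leaks at a constant rate: queued requests are released for processing at a steady pace
- When the bucket is full, excess requests overflow and are dropped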
Why it’s useful
- Keeps traffic smooth
- Prevents sudden spikes
Tradeoff
- Less flexible than token bucket
- Not ideal for bursty user behavior
Often used in network-level traffic shaping.
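For comparison with the token bucket, here is a minimal sketch of the leaky bucket in its queue form. The names `LeakyBucketQueue`, `offer`, and `drain` are hypothetical, and the blocking `drain` loop is a simplification; a real service would typically run the drain step in a worker thread or task.

```python
import time
from collections import deque
from typing import Any, Callable, Deque


class LeakyBucketQueue:
    """Bounded queue that releases requests at a fixed interval."""

    def __init__(self, leak_interval: float, capacity: int) -> None:
        self.leak_interval = leak_interval  # seconds between released requests
        self.capacity = capacity            # maximum queued requests
        self.queue: Deque[Any] = deque()

    def offer(self, request: Any) -> bool:
        """Queue a request; return False if the bucket overflows (request dropped)."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self, handle: Callable[[Any], None],
              sleep: Callable[[float], None] = time.sleep) -> int:
        """Hand queued requests to `handle` one at a time, at a constant pace."""
        released = 0
        while self.queue:
            handle(self.queue.popleft())
            released += 1
            sleep(self.leak_interval)  # the constant "leak" rate
        return released
```

However bursty the arrivals are, the handler sees at most one request per `leak_interval`. That is the smoothing property, and also why bursty clients see either added latency or dropped requests.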
7. Where Rate Limiting Is Implemented
At API Gateway
- First line of defense
- Stops traffic before hitting backend
At Load Balancer
- IP-based protection
- Coarse-grained limits
At Application Level
- User-aware logic
- Business rules
Large systems use multiple layers.
8. How Companies Stop DDoS Without Blocking Users
Modern systems:
- Identify abnormal patterns
- Apply stricter limits dynamically
- Allow normal behavior to pass
Techniques include:
- Adaptive rate limiting
- Progressive throttling
- CAPTCHA challenges (last resort)
Goal: Protect the system, not punish users.
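Progressive throttling can be sketched as an escalation ladder: the response gets stricter only as a client keeps exceeding its limit, and relaxes again once behavior returns to normal. The sketch below is purely illustrative. The `Action` names, the `ClientState` record, and the strike thresholds (2 and 5) are assumptions, not values taken from any particular system.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    DELAY = "delay"          # slow the client down
    REJECT = "reject"        # refuse requests for this window
    CHALLENGE = "challenge"  # e.g. CAPTCHA, as a last resort


@dataclass
class ClientState:
    strikes: int = 0  # recent windows in which the client exceeded its limit


def next_action(state: ClientState, requests_in_window: int, limit: int) -> Action:
    """Pick a response for the window that just ended and update the strike count."""
    if requests_in_window <= limit:
        # Normal behavior: forgive gradually instead of resetting at once.
        state.strikes = max(0, state.strikes - 1)
        return Action.ALLOW
    state.strikes += 1
    if state.strikes <= 2:
        return Action.DELAY
    if state.strikes <= 5:
        return Action.REJECT
    return Action.CHALLENGE
```

A user who briefly overshoots is only slowed down, while a client that stays over the limit window after window works its way up to a challenge.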
9. Rate Limiting and Distributed Systems
Challenges:
- Multiple servers
- Shared limits
Solutions:
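- Centralized counters (Redis): every server updates the same counter, at the cost of a network round trip per request
- Consistent hashing: requests for the same client key are always routed to the same node, so that key's counter lives in one place
- Approximate algorithms: each server counts locally and synchronizes periodically
Exact global counts cost latency, so some accuracy is usually traded for performance.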
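As an illustration of the consistent hashing option, the sketch below maps each client key to one limiter node on a hash ring. The class name `HashRing`, the node names, the use of SHA-256, and the choice of 100 virtual nodes per server are assumptions made for this example.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    """Consistent hash ring that assigns each key to one node."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            # Several virtual points per node spread the load more evenly.
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        if not ring:
            raise ValueError("at least one node is required")
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        """Return the node owning `key`: the first ring point clockwise from its hash."""
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["limiter-a", "limiter-b", "limiter-c"])
owner = ring.node_for("user:42")  # every server computes the same owner
```

Because every server computes the same owner for a given key, that key's counter can live on a single node. When a node is removed, only the keys it owned move elsewhere; all other keys keep their owner and their counters.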
10. Common Mistakes Teams Make
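- Using the same fixed limit for every user, regardless of usage pattern
- No backoff strategy on the client side
- Applying limits too late in the stack, after expensive work has already been done
- No monitoring of rejected requests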
Rate limiting without observability is dangerous.
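Even a very small amount of bookkeeping makes rejections visible. The sketch below is a hypothetical in-memory stand-in for a real metrics system; the class name `LimiterMetrics` and the per-route grouping are assumptions for illustration.

```python
from collections import Counter


class LimiterMetrics:
    """Count allowed and rejected requests per route."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, route: str, allowed: bool) -> None:
        """Record the limiter's decision for one request."""
        outcome = "allowed" if allowed else "rejected"
        self.counts[(route, outcome)] += 1

    def rejection_ratio(self, route: str) -> float:
        """Fraction of requests to `route` that were rejected (0.0 if none seen)."""
        allowed = self.counts[(route, "allowed")]
        rejected = self.counts[(route, "rejected")]
        total = allowed + rejected
        return rejected / total if total else 0.0
```

A rejection ratio that suddenly rises on one route is often the first sign of a buggy client release, or of a limit set too low for legitimate traffic.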
11. Interview Perspective
If asked:
“How do you protect APIs?”
A strong answer covers:
- Rate limiting
- Token bucket vs leaky bucket
- Graceful degradation
- Multi-layer defense
This shows real-world thinking.
Final Thoughts
Rate limiting is not about blocking traffic.
It’s about:
- Fairness
- Stability
- Cost control
- User trust
Behind every reliable API is a carefully designed rate limiting strategy.
Without it, even the best system eventually destroys itself.