Introduction to Rate Limiting
Rate limiting is a technique used to control the amount of incoming and outgoing traffic to or from a network, server, or service. In the context of APIs, rate limiting restricts how many requests a client can make in a given time period.
Think of rate limiting like a nightclub with a maximum capacity. The bouncer (rate limiter) only allows a certain number of people in at a time, and once the club reaches capacity, newcomers have to wait until someone leaves before they can enter. This ensures the club doesn't get overcrowded and everyone inside has a good experience.
Why Rate Limiting Matters
Rate limiting serves several important purposes in API management:
- Preventing Abuse: Protects against malicious users attempting to overwhelm your service
- Resource Protection: Ensures that a single user or client can't consume all available resources
- Cost Control: Helps manage infrastructure costs by preventing excessive usage
- Traffic Shaping: Maintains consistent service performance during traffic spikes
- Compliance: Helps meet SLAs (Service Level Agreements) and regulatory requirements
- Fair Usage: Provides equitable access to all users of your API
Without rate limiting, your API is vulnerable to various issues like:
- Denial of Service (DoS): Malicious actors can overload your service with requests
- Degraded Performance: Excessive requests can slow down your service for all users
- "Noisy Neighbor" Problem: One aggressive client impacts service quality for others
- Resource Exhaustion: Heavy traffic can deplete server resources like memory, CPU, or database connections
- Cascading Failures: Overloaded services can trigger failures in dependent systems
Rate Limiting vs Throttling
The terms "rate limiting" and "throttling" are often used interchangeably, but there are subtle differences in how they're applied:
- Rate Limiting: Sets hard limits on the number of requests allowed in a time period. Requests exceeding the limit are rejected (with a 429 Too Many Requests HTTP status code).
- Throttling: Controls the rate at which requests are processed, often by queuing and delaying rather than rejecting them outright.
Using our nightclub analogy:
- Rate Limiting: The bouncer refuses entry to anyone when the club is at capacity ("Come back later")
- Throttling: The bouncer creates a queue and lets people in gradually as others leave ("Wait in line")
In practice, many APIs implement a combination of both approaches - using rate limiting to set maximum thresholds and throttling to smooth out traffic patterns.
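To make the distinction concrete, here is a minimal sketch of both behaviors against the same counter. The names `createWindow`, `rateLimit`, and `throttle` are hypothetical helpers invented for this illustration, and the clock and sleep functions are injectable only so the behavior can be exercised without real delays.

```javascript
// Hypothetical helper: a counter that allows `limit` requests per window.
function createWindow(limit, windowMs, now = Date.now) {
  let windowStart = now();
  let count = 0;
  return {
    // Returns 0 if a slot is free right now, otherwise ms until the window resets.
    msUntilSlot() {
      const t = now();
      if (t - windowStart >= windowMs) {
        windowStart = t;
        count = 0;
      }
      return count < limit ? 0 : windowStart + windowMs - t;
    },
    take() {
      count += 1;
    }
  };
}

// Rate limiting: reject immediately when the limit is reached ("Come back later").
function rateLimit(window) {
  if (window.msUntilSlot() > 0) return { status: 429 };
  window.take();
  return { status: 200 };
}

// Throttling: hold the request until a slot frees up ("Wait in line").
async function throttle(window, sleep = ms => new Promise(r => setTimeout(r, ms))) {
  let wait;
  while ((wait = window.msUntilSlot()) > 0) {
    await sleep(wait);
  }
  window.take();
  return { status: 200 };
}
```

With a limit of one request per second, a second immediate call to `rateLimit` returns a 429, while the same call through `throttle` resolves with a 200 once the window has rolled over.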
Rate Limiting vs Quota Management
It's also worth distinguishing rate limiting from quota management:
- Rate Limiting: Controls the frequency of requests over short time periods (seconds, minutes, hours)
- Quota Management: Controls the total number of requests over longer periods (days, months) - often aligned with billing cycles
For example, an API might have both:
- A rate limit of 100 requests per minute (for traffic smoothing)
- A monthly quota of 100,000 requests (for billing purposes)
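As a rough sketch of how the two checks can sit side by side, the class below keeps one counter per minute and one per calendar month in memory. The class name `RateAndQuota`, its option names, and the `reason` strings are assumptions made for this example; a production system would keep these counters in a shared store.

```javascript
// Hypothetical in-memory limiter enforcing a short-term rate limit and a long-term quota.
class RateAndQuota {
  constructor({ perMinute, perMonth }) {
    this.perMinute = perMinute;
    this.perMonth = perMonth;
    this.minute = { id: null, count: 0 };
    this.month = { id: null, count: 0 };
  }

  check(now = new Date()) {
    // Identify the current minute and the current (UTC) calendar month
    const minuteId = Math.floor(now.getTime() / 60000);
    const monthId = `${now.getUTCFullYear()}-${now.getUTCMonth()}`;

    // Start fresh counters when a new minute or month begins
    if (this.minute.id !== minuteId) this.minute = { id: minuteId, count: 0 };
    if (this.month.id !== monthId) this.month = { id: monthId, count: 0 };

    // The quota is checked first: once it is spent, waiting a minute will not help
    if (this.month.count >= this.perMonth) {
      return { allowed: false, reason: 'quota_exceeded' };
    }
    if (this.minute.count >= this.perMinute) {
      return { allowed: false, reason: 'rate_limited' };
    }

    this.minute.count += 1;
    this.month.count += 1;
    return { allowed: true, reason: null };
  }
}

// Matches the example figures above
const limiter = new RateAndQuota({ perMinute: 100, perMonth: 100000 });
```

Returning a distinct reason lets the API tell a client whether to retry shortly (rate limit) or wait for the next billing period (quota).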
Rate Limiting Algorithms
There are several common algorithms used to implement rate limiting, each with its own advantages and use cases:
Fixed Window Counter
The simplest rate limiting approach: count requests in fixed time windows (e.g., 100 requests per hour) and reject anything over the limit until the next window starts.
[Diagram: time is divided into fixed windows (Window 1: 0-60 mins, Window 2: 60-120 mins, Window 3: 120-180 mins). Each request increments the counter for the current window, and a limit check accepts the request if the count is under the limit or rejects it if over.]
How it works:
- Divide time into fixed windows (e.g., 1-hour blocks)
- Count requests in the current window
- Reset the counter at the beginning of each window
- Reject requests if the counter exceeds the limit
// Simple fixed window rate limiter in Redis
// Assumes `redis` is a connected, promise-based client (for example ioredis)
async function checkRateLimit(userId, limitPerHour) {
  // Get the current hour as a window index (floor to hour)
  const currentHour = Math.floor(Date.now() / 3600000);
  // Create a key that includes the user and the time window
  const key = `ratelimit:${userId}:${currentHour}`;
  try {
    // INCR is atomic, so concurrent requests cannot both read a stale count
    const count = await redis.incr(key);
    // Set expiry when the key is first created (to clean up old keys)
    if (count === 1) {
      await redis.expire(key, 3600); // Expire after 1 hour
    }
    // Allow the request only while the count is within the limit
    return count <= limitPerHour;
  } catch (err) {
    // Handle error, perhaps default to allowing the request
    return true;
  }
}
Advantages:
- Simple to understand and implement
- Low memory usage
- Works well for predictable traffic patterns
Disadvantages:
- Vulnerable to "burst" traffic at window boundaries ("edge spike" problem)
- A user could make 100 requests at 1:59 PM and then another 100 at 2:01 PM (200 requests in 2 minutes)
- Does not account for the distribution of requests within the window
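The boundary problem described above can be reproduced with a small simulation. This is an illustrative in-memory counter with an injectable clock, not the Redis implementation shown earlier; `createFixedWindowLimiter` and `fakeTime` are names made up for the sketch.

```javascript
// Minimal in-memory fixed window counter with an injectable clock (illustrative only;
// old window entries are never cleaned up here).
function createFixedWindowLimiter(limit, windowMs, now = Date.now) {
  const counts = new Map(); // window index -> request count
  return function allow() {
    const windowIndex = Math.floor(now() / windowMs);
    const count = (counts.get(windowIndex) || 0) + 1;
    counts.set(windowIndex, count);
    return count <= limit;
  };
}

// Simulate a client sending 100 requests just before and just after a window boundary.
let fakeTime = 59 * 60 * 1000; // 59 minutes into the first hour
const allow = createFixedWindowLimiter(100, 60 * 60 * 1000, () => fakeTime);

let accepted = 0;
for (let i = 0; i < 100; i++) {
  if (allow()) accepted++;
}

fakeTime = 61 * 60 * 1000; // 1 minute into the second hour
for (let i = 0; i < 100; i++) {
  if (allow()) accepted++;
}

console.log(accepted); // 200 requests accepted within a 2-minute span
```

Each window individually respects the limit of 100, yet the client gets twice that through in two minutes because the counter resets at the boundary.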
Sliding Window Counter
An improved version of the fixed window approach that smooths out the boundaries between windows.
[Diagram: the counts for the current window and the previous window are each weighted and combined into a weighted sum; a limit check on that sum accepts the request if under the limit or rejects it if over.]
How it works:
- Track requests in the current and previous fixed windows
- Calculate a weighted sum based on how far into the current window we are
- Reject requests if the weighted sum exceeds the limit
// Sliding window rate limiter in Redis
// Assumes `redis` is a connected, promise-based client (for example ioredis)
async function checkRateLimit(userId, limitPerHour) {
  const now = Date.now();
  // Get the current and previous hour
  const currentHour = Math.floor(now / 3600000);
  const previousHour = currentHour - 1;
  // Keys for the current and previous windows
  const currentKey = `ratelimit:${userId}:${currentHour}`;
  const previousKey = `ratelimit:${userId}:${previousHour}`;
  try {
    // Get counts for both windows
    const [current, previous] = await redis.mget(currentKey, previousKey);
    const currentCount = parseInt(current, 10) || 0;
    const previousCount = parseInt(previous, 10) || 0;
    // Calculate how far into the current window we are (0 to 1)
    const windowPosition = (now % 3600000) / 3600000;
    // Current window counts fully, previous window counts less as we progress
    const weightedSum = currentCount + previousCount * (1 - windowPosition);
    if (weightedSum >= limitPerHour) {
      return false; // Reject request
    }
    // Note: this read-then-increment is not atomic; under heavy concurrency
    // a few extra requests can slip through
    await redis.incr(currentKey);
    // Set expiry for automatic cleanup
    await redis.expire(currentKey, 7200); // 2 hours (we need previous window too)
    return true; // Allow request
  } catch (err) {
    return true; // Default to allowing on error
  }
}
Advantages:
- Smooths out traffic at window boundaries
- Largely mitigates the "edge spike" problem
- Still relatively simple to implement
Disadvantages:
- Requires tracking two windows
- Slightly higher computational cost
- Still allows some bursting (but less than fixed window)
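The weighted sum used by the sliding window counter is easy to check by hand. The sketch below uses made-up numbers (80 requests in the previous hour, 30 so far in the current one) and a hypothetical helper name, `slidingWindowEstimate`.

```javascript
// Weighted estimate used by the sliding window counter:
// the current window counts fully, the previous window fades out as time passes.
function slidingWindowEstimate(currentCount, previousCount, windowPosition) {
  return currentCount + previousCount * (1 - windowPosition);
}

// Hypothetical numbers: 80 requests last hour, 30 so far this hour,
// and we are 15 minutes (25%) into the current hour.
const estimate = slidingWindowEstimate(30, 80, 0.25);
console.log(estimate); // 90, which is under a limit of 100, so the request is allowed

// Boundary case: a client that used all 100 requests last hour is still
// counted at roughly 98 one minute into the new hour.
const atBoundary = slidingWindowEstimate(0, 100, 1 / 60);
console.log(atBoundary.toFixed(1));
```

The boundary case shows why the approach blunts the edge spike: the previous window's usage still weighs against the client right after the reset instead of dropping to zero.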
Token Bucket Algorithm
A flexible algorithm that models rate limiting as tokens in a bucket, with tokens being refilled at a constant rate.
How it works:
- Maintain a "bucket" of tokens (the maximum burst capacity)
- Add new tokens to the bucket at a fixed rate (the sustained rate limit)
- Each request consumes one or more tokens
- Reject or delay requests when no tokens are available
// Token bucket implementation in JavaScript
class TokenBucket {
constructor(capacity, fillRate) {
this.capacity = capacity; // Maximum tokens the bucket can hold
this.fillRate = fillRate; // Tokens added per second
this.tokens = capacity; // Start with a full bucket
this.lastFilled = Date.now(); // Last time we refilled the bucket
}
// Refill the bucket based on elapsed time
refill() {
const now = Date.now();
const elapsedSeconds = (now - this.lastFilled) / 1000;
// Calculate new tokens to add
const newTokens = elapsedSeconds * this.fillRate;
if (newTokens > 0) {
this.tokens = Math.min(this.capacity, this.tokens + newTokens);
this.lastFilled = now;
}
}
// Try to consume tokens
tryConsume(tokensToConsume = 1) {
this.refill();
if (this.tokens >= tokensToConsume) {
this.tokens -= tokensToConsume;
return true; // Request allowed
}
return false; // Request rejected
}
// Get the waiting time until enough tokens are available
getWaitTime(tokensToConsume = 1) {
this.refill();
if (this.tokens >= tokensToConsume) {
return 0; // No need to wait
}
// Calculate time to wait in milliseconds
const tokensNeeded = tokensToConsume - this.tokens;
return (tokensNeeded / this.fillRate) * 1000;
}
}
// Usage example
const rateLimiter = new TokenBucket(10, 1); // 10 tokens capacity, 1 token per second
function handleRequest(req, res) {
if (rateLimiter.tryConsume()) {
// Process the request
processRequest(req, res);
} else {
// Get wait time for the next available token
const waitTime = rateLimiter.getWaitTime();
// Set retry-after header (in seconds)
res.setHeader('Retry-After', Math.ceil(waitTime / 1000));
// Return 429 Too Many Requests
res.status(429).json({
error: 'Rate limit exceeded',
retryAfter: Math.ceil(waitTime / 1000)
});
}
}
Advantages:
- Allows for bursts of traffic (up to the bucket capacity)
- Enforces a consistent long-term rate
- Very flexible and can be adjusted for different traffic patterns
- Idle clients accumulate tokens up to the bucket capacity, so infrequent users can burst when they return
Disadvantages:
- More complex to implement
- Requires more state (token count and last update time)
- May need fine-tuning of capacity and refill rate
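When tuning the two parameters, it can help to compute the most requests a single bucket could admit over a given interval: a full bucket's worth of burst plus whatever is refilled during that interval. The helper below is a back-of-the-envelope sketch that assumes each request costs one token; `maxRequestsInInterval` is a name invented here.

```javascript
// Upper bound on requests a token bucket admits in any interval of `seconds`,
// assuming one token per request: the full burst plus the tokens refilled meanwhile.
function maxRequestsInInterval(capacity, fillRate, seconds) {
  return Math.floor(capacity + fillRate * seconds);
}

// Using the bucket from the usage example above (capacity 10, 1 token per second):
console.log(maxRequestsInInterval(10, 1, 0));  // 10: the instantaneous burst
console.log(maxRequestsInInterval(10, 1, 60)); // 70: burst plus one minute of refill
```

Raising the capacity only changes the burst term, while raising the fill rate changes the sustained throughput, which is why the two are usually tuned separately.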
Leaky Bucket Algorithm
Models rate limiting as a bucket with a constant "leak" rate, smoothing out bursts of traffic.
How it works:
- Maintain a queue (the "bucket") of incoming requests
- Process requests from the queue at a fixed rate
- If the bucket is full, new requests are rejected
- Otherwise, they're added to the queue for processing
// Leaky bucket implementation
class LeakyBucket {
  constructor(capacity, leakRate) {
    this.capacity = capacity;   // Maximum queue size
    this.leakRate = leakRate;   // Requests processed per second
    this.queue = [];            // Request queue
    this.processing = false;    // Flag to prevent multiple processing loops
  }
  // Try to add a request to the bucket
  addRequest(request) {
    // If the bucket is full, reject the request
    if (this.queue.length >= this.capacity) {
      return false; // Request rejected (bucket full)
    }
    this.queue.push(request);
    // Start processing if not already running
    if (!this.processing) {
      this.processQueue();
    }
    return true; // Request accepted
  }
  // Drain the queue at the fixed leak rate. Requests leave the queue only
  // here, so every accepted request is actually processed.
  async processQueue() {
    this.processing = true;
    const intervalMs = 1000 / this.leakRate;
    while (this.queue.length > 0) {
      const request = this.queue.shift();
      try {
        // Actually process the request (asynchronously)
        await processRequest(request);
      } catch (err) {
        console.error('Request processing failed:', err);
      }
      // Wait before taking the next request so the outflow never exceeds leakRate
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
    this.processing = false;
  }
}
// Usage example
const rateLimiter = new LeakyBucket(100, 10); // 100 requests capacity, 10 requests per second
function handleRequest(req, res) {
if (rateLimiter.addRequest({ req, res })) {
// Request accepted into the queue
// The actual processing will happen in the processQueue method
} else {
// Return 429 Too Many Requests
res.status(429).json({
error: 'Rate limit exceeded, try again later'
});
}
}
async function processRequest({ req, res }) {
// Process the request and send the response
const result = await someAsyncOperation(req.body);
res.json(result);
}
Advantages:
- Ensures a consistent outflow rate of requests
- Great for smoothing out traffic and preventing downstream overload
- Works well for queue-based processing systems
- Can be implemented as a true queue (FIFO) for fair processing
Disadvantages:
- More complex to implement
- Can introduce latency (requests wait in the queue)
- Requires more resources to maintain the queue
- May not be suitable for real-time systems with low latency requirements
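The latency cost can be estimated up front. As a rough upper bound that ignores the time spent processing each request, a request admitted into an almost full bucket waits about capacity divided by leak rate. The helper name `worstCaseQueueDelayMs` is made up for this sketch.

```javascript
// Rough upper bound on queueing delay in a leaky bucket, ignoring per-request processing time:
// a request at the back of a full queue waits for everything ahead of it to drain.
function worstCaseQueueDelayMs(capacity, leakRate) {
  return (capacity / leakRate) * 1000;
}

// Using the bucket from the usage example above (capacity 100, 10 requests per second):
console.log(worstCaseQueueDelayMs(100, 10)); // 10000 ms
```

If a ten-second wait is unacceptable for the API in question, a smaller capacity (rejecting sooner) or a higher leak rate is the lever to pull.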
Adaptive Rate Limiting
An advanced approach that dynamically adjusts rate limits based on system health, current load, or other factors.
How it works:
- Monitor system metrics (CPU, memory, error rates, response times)
- Adjust rate limits dynamically based on current conditions
- May incorporate machine learning to predict capacity
- Can prioritize certain users or request types during high load
// Simplified adaptive rate limiter example
class AdaptiveRateLimiter {
constructor(baseLimit, minLimit, maxLimit) {
this.baseLimit = baseLimit; // Normal request limit
this.minLimit = minLimit; // Minimum limit during high load
this.maxLimit = maxLimit; // Maximum limit during low load
this.currentLimit = baseLimit;// Current active limit
// Start monitoring system resources
this.startMonitoring();
}
// Monitor system health and adjust limits
startMonitoring() {
setInterval(() => {
// Get current system metrics
const cpuUsage = this.getCpuUsage();
const memoryUsage = this.getMemoryUsage();
const errorRate = this.getErrorRate();
const responseTime = this.getAverageResponseTime();
// Adjust rate limit based on system health
this.adjustRateLimit(cpuUsage, memoryUsage, errorRate, responseTime);
}, 5000); // Check every 5 seconds
}
// Adjust rate limit based on system metrics
adjustRateLimit(cpuUsage, memoryUsage, errorRate, responseTime) {
// Create a "health score" from 0 to 1 (0 = unhealthy, 1 = very healthy)
const healthScore = this.calculateHealthScore(cpuUsage, memoryUsage, errorRate, responseTime);
// Adjust the rate limit based on health score
const range = this.maxLimit - this.minLimit;
this.currentLimit = Math.floor(this.minLimit + (range * healthScore));
console.log(`System health: ${healthScore.toFixed(2)}, New rate limit: ${this.currentLimit} req/min`);
}
// Calculate a health score based on multiple metrics
calculateHealthScore(cpuUsage, memoryUsage, errorRate, responseTime) {
// This is a simplified example - real implementations would be more sophisticated
// Convert each metric to a score between 0 and 1
const cpuScore = 1 - (cpuUsage / 100); // 0% CPU = 1, 100% CPU = 0
const memScore = 1 - (memoryUsage / 100); // 0% Mem = 1, 100% Mem = 0
// Error rate (e.g., 0% = 1, 5%+ = 0)
const errorScore = Math.max(0, 1 - (errorRate / 5));
// Response time (e.g., 100ms or faster = 1, 1000ms or slower = 0), clamped to the 0-1 range
const responseScore = Math.min(1, Math.max(0, 1 - ((responseTime - 100) / 900)));
// Weighted average of all scores
return (cpuScore * 0.4) + (memScore * 0.2) + (errorScore * 0.2) + (responseScore * 0.2);
}
// Check if a request should be allowed
checkRateLimit(userId) {
// Get the current count for this user
const userCount = this.getUserRequestCount(userId);
// Check against the current adaptive limit
if (userCount < this.currentLimit) {
this.incrementUserCount(userId);
return true; // Allow request
}
return false; // Reject request
}
// Example methods to get system metrics
getCpuUsage() {
// In a real implementation, this would get actual CPU usage
// For this example, we'll simulate varying load
return Math.random() * 60 + 20; // 20% to 80%
}
getMemoryUsage() {
// Simulated memory usage
return Math.random() * 50 + 30; // 30% to 80%
}
getErrorRate() {
// Simulated error rate (percentage of requests)
return Math.random() * 3; // 0% to 3%
}
getAverageResponseTime() {
// Simulated response time in milliseconds
return Math.random() * 500 + 200; // 200ms to 700ms
}
// User tracking methods would be implemented here
getUserRequestCount(userId) { /* ... */ }
incrementUserCount(userId) { /* ... */ }
}
// Usage
const adaptiveLimiter = new AdaptiveRateLimiter(100, 50, 200); // Base: 100, Min: 50, Max: 200 req/min
function handleRequest(req, res) {
const userId = req.user.id;
if (adaptiveLimiter.checkRateLimit(userId)) {
// Process the request
processRequest(req, res);
} else {
// Return 429 Too Many Requests
res.status(429).json({
error: 'Rate limit exceeded, please slow down'
});
}
}
Advantages:
- Automatically adjusts to changing system conditions
- Can maximize throughput while preventing overload
- More efficient use of resources during varying load
- Can incorporate business priorities (e.g., prioritize paying customers)
Disadvantages:
- Significantly more complex to implement and maintain
- Requires monitoring infrastructure
- Harder to predict and communicate limits to API consumers
- May require tuning and calibration
Implementing Rate Limiting in Different Environments
Rate limiting can be implemented at various levels in your architecture. Let's look at different approaches:
Application-Level Rate Limiting
Implementing rate limiting directly in your application code.
Express.js (Node.js)
// Using express-rate-limit package
const express = require('express');
const rateLimit = require('express-rate-limit');
const app = express();
// Create a rate limiter middleware
const apiLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // limit each IP to 100 requests per windowMs
standardHeaders: true, // Return rate limit info in the `RateLimit-*` headers
legacyHeaders: false, // Disable the `X-RateLimit-*` headers
message: 'Too many requests from this IP, please try again after 15 minutes'
});
// Apply rate limiting to all API routes
app.use('/api/', apiLimiter);
// Different rate limits for different endpoints
const authLimiter = rateLimit({
windowMs: 60 * 60 * 1000, // 1 hour window
max: 5, // start blocking after 5 requests
message: 'Too many login attempts, please try again after an hour'
});
// Apply to authentication endpoints
app.use('/api/auth/', authLimiter);
Django (Python)
# Using django-ratelimit
from django.http import JsonResponse
from django.shortcuts import render
from django_ratelimit.decorators import ratelimit
# Rate limit based on IP address
@ratelimit(key='ip', rate='10/m')
def my_view(request):
# View logic here
return render(request, 'template.html')
# Rate limit based on user ID (authenticated users)
@ratelimit(key='user', rate='100/h')
def user_api_view(request):
# API logic here
return JsonResponse({'data': 'result'})
# Custom rate limiting key (e.g., by API key)
# django-ratelimit calls callable keys with (group, request)
def get_api_key(group, request):
    return request.META.get('HTTP_X_API_KEY', '')
@ratelimit(key=get_api_key, rate='1000/d')
def api_endpoint(request):
# API logic here
return JsonResponse({'status': 'success'})
Laravel (PHP)
// Using Laravel's built-in rate limiting
Route::middleware('throttle:60,1')->group(function () {
Route::get('/api/endpoint', function () {
// Endpoint logic
});
});
// Rate limiting with different limits for guests vs. authenticated users
Route::middleware('throttle:10|60,1')->group(function () {
Route::post('/api/comments', 'CommentController@store');
});
// Custom rate limiter in RouteServiceProvider.php
public function boot()
{
RateLimiter::for('api', function (Request $request) {
return Limit::perMinute(60)->by(optional($request->user())->id ?: $request->ip());
});
// Custom limiter with a daily cap and a custom 429 response
RateLimiter::for('uploads', function (Request $request) {
return Limit::perDay(100)
->by($request->user()->id)
->response(function () {
return response('Upload limit exceeded, please try again tomorrow.', 429);
});
});
}
Database-Level Rate Limiting
Using a database to track and enforce rate limits, useful for distributed applications.
Redis-Based Implementation
// Redis-based rate limiting using the fixed window algorithm
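const redis = require('redis');
const client = redis.createClient();
// node-redis v4 and later require an explicit connection before issuing commands
client.connect().catch(console.error);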
async function checkRateLimit(userId, limit, windowSizeInSeconds) {
const key = `ratelimit:${userId}:${Math.floor(Date.now() / (windowSizeInSeconds * 1000))}`;
// Use Redis MULTI to make this atomic
try {
const result = await client.multi()
.incr(key) // Increment the counter
.expire(key, windowSizeInSeconds) // Set expiration
.exec(); // Execute as transaction
const count = result[0];
// If count exceeds limit, rate limit is hit
return count <= limit;
} catch (error) {
console.error('Redis error:', error);
// In case of error, default to allowing the request
return true;
}
}
// Example usage in Express
app.use(async (req, res, next) => {
const userId = req.user ? req.user.id : req.ip;
const allowed = await checkRateLimit(userId, 100, 60); // 100 requests per minute
if (allowed) {
next();
} else {
res.status(429).json({ error: 'Rate limit exceeded' });
}
});
PostgreSQL-Based Implementation
-- Create a table for rate limiting
CREATE TABLE rate_limits (
id SERIAL PRIMARY KEY,
key VARCHAR(255) NOT NULL,
count INTEGER NOT NULL DEFAULT 1,
window_start TIMESTAMP NOT NULL,
UNIQUE(key, window_start)
);
-- Create index for efficient lookups
CREATE INDEX rate_limits_key_window ON rate_limits(key, window_start);
-- Function to check and increment rate limit
CREATE OR REPLACE FUNCTION check_rate_limit(
    p_key VARCHAR,
    p_limit INTEGER,
    p_window_size_minutes INTEGER
) RETURNS BOOLEAN AS $$
DECLARE
    v_count INTEGER;
    v_window_start TIMESTAMP;
BEGIN
    -- Calculate the start of the current window
    -- (assumes p_window_size_minutes divides 60 evenly, e.g. 1, 5, 15, 60)
    v_window_start := date_trunc('minute', now())
        - (date_part('minute', now())::integer % p_window_size_minutes) * interval '1 minute';
    -- Try to increment the counter
    INSERT INTO rate_limits (key, count, window_start)
    VALUES (p_key, 1, v_window_start)
    ON CONFLICT (key, window_start) DO UPDATE
    SET count = rate_limits.count + 1
    RETURNING count INTO v_count;
    -- Clean up old entries (optional, can also be done by a separate job)
    DELETE FROM rate_limits
    WHERE window_start < now() - interval '1 day';
    -- Return whether the request is allowed
    RETURN v_count <= p_limit;
END;
$$ LANGUAGE plpgsql;
-- Example usage
SELECT check_rate_limit('user:123', 100, 60); -- 100 requests per 60-minute window
Infrastructure-Level Rate Limiting
Implementing rate limiting at the infrastructure level, such as in an API gateway, load balancer, or reverse proxy.
Nginx Rate Limiting
# Nginx configuration for rate limiting
http {
# Define a limit zone based on client IP address
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
server {
location /api/ {
# Apply rate limiting with a burst of 20 requests
limit_req zone=api_limit burst=20 nodelay;
# Proxy to your application
proxy_pass http://backend_server;
}
# Different limits for different endpoints
location /api/auth/ {
limit_req zone=api_limit burst=5 nodelay;
proxy_pass http://backend_server;
}
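# Return 429 instead of the default 503 when a request is rejected
limit_req_status 429;
# Advertise the configured limit. Stock nginx does not expose a variable for the
# remaining allowance, so a "remaining" header cannot be set from limit_req alone.
add_header X-RateLimit-Limit 10;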
}
}
AWS API Gateway
// AWS CloudFormation template for API Gateway with rate limiting
{
"Resources": {
"MyApi": {
"Type": "AWS::ApiGateway::RestApi",
"Properties": {
"Name": "Rate Limited API"
}
},
"MyApiStage": {
"Type": "AWS::ApiGateway::Stage",
"Properties": {
"DeploymentId": { "Ref": "ApiDeployment" },
"RestApiId": { "Ref": "MyApi" },
"StageName": "prod",
"MethodSettings": [
{
"ResourcePath": "/*",
"HttpMethod": "*",
"ThrottlingRateLimit": 100,
"ThrottlingBurstLimit": 50
}
]
}
},
"ApiUsagePlan": {
"Type": "AWS::ApiGateway::UsagePlan",
"Properties": {
"ApiStages": [
{
"ApiId": { "Ref": "MyApi" },
"Stage": { "Ref": "MyApiStage" }
}
],
"Description": "Rate limits for API",
"Quota": {
"Limit": 10000,
"Period": "MONTH"
},
"Throttle": {
"RateLimit": 10,
"BurstLimit": 20
}
}
}
}
}
Kong API Gateway
# Kong rate limiting plugin configuration
curl -X POST http://kong:8001/services/my-service/plugins \
--data "name=rate-limiting" \
--data "config.minute=100" \
--data "config.hour=1000" \
--data "config.policy=local"
# Advanced rate limiting with Redis
curl -X POST http://kong:8001/services/my-service/plugins \
--data "name=rate-limiting" \
--data "config.minute=100" \
--data "config.limit_by=credential" \
--data "config.policy=redis" \
--data "config.redis_host=redis-server" \
--data "config.redis_port=6379" \
--data "config.redis_database=0"
Rate Limiting Design Patterns and Best Practices
Granularity of Rate Limits
Choose the appropriate level of granularity for your rate limits:
- By IP Address:
- Simplest to implement
- Problematic for users behind shared IPs (NAT, proxies)
- Can be circumvented by IP rotation
- By User/Account:
- More accurate for authenticated users
- Doesn't affect anonymous users sharing IPs
- Requires user authentication
- By API Key:
- Ideal for B2B APIs
- Allows for different tiers of service
- Good for monetization models
- By Resource Type:
- Different limits for different API endpoints
- Can protect specific critical resources
- More complex to implement and maintain
- By Combination:
- Combining multiple factors (e.g., user + endpoint)
- Most flexible and precise
- Highest implementation complexity
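In code, the granularity choice usually comes down to how the counter key is built. The sketch below shows one possible set of key builders; the `keyBy` object, the `combinedKey` function, and the `x-api-key` header name are assumptions for illustration, and `req` is assumed to look like an Express request.

```javascript
// Hypothetical key builders, one per granularity level.
const keyBy = {
  // By IP address
  ip: req => `ip:${req.ip}`,
  // By user/account, falling back to IP for anonymous traffic
  user: req => (req.user ? `user:${req.user.id}` : `ip:${req.ip}`),
  // By API key (header name is an assumption)
  apiKey: req => `key:${req.headers['x-api-key'] || 'anonymous'}`,
  // By resource type (method + path)
  endpoint: req => `route:${req.method}:${req.path}`
};

// By combination: one counter per (user, endpoint) pair
function combinedKey(req) {
  return `${keyBy.user(req)}|${keyBy.endpoint(req)}`;
}

// Example with a request-shaped object
const exampleReq = {
  ip: '203.0.113.7',
  user: { id: 42 },
  headers: {},
  method: 'GET',
  path: '/api/items'
};
console.log(combinedKey(exampleReq)); // user:42|route:GET:/api/items
```

A function like `combinedKey` could be passed wherever a limiter accepts a key generator, so the same counting logic serves every granularity.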
Multiple Tiers of Rate Limits
Implement multiple tiers of rate limiting for defense in depth:
// Example of multi-tier rate limiting in Express
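const globalLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 1000, // 1000 requests per minute across all routes
keyGenerator: () => 'global' // Shared key so the limit covers all clients (the default key is the client IP)
});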
const apiLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 500, // 500 requests per minute for API routes
keyGenerator: (req) => req.ip // Limit by IP
});
const userLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 100, // 100 requests per minute per user
keyGenerator: (req) => req.user ? req.user.id : req.ip // Limit by user ID if authenticated
});
// Apply limits in sequence
app.use(globalLimiter); // Applied to all routes
app.use('/api', apiLimiter); // Applied to all API routes
app.use('/api/user', userLimiter); // Applied to user-specific routes
Rate Limiting Response Headers
Include standardized rate limit information in response headers:
// Example of setting rate limit headers in Express
// Seconds from now until `reset` (a Unix timestamp in seconds), never negative
function secondsUntil(reset) {
  return Math.max(0, reset - Math.floor(Date.now() / 1000));
}
function setRateLimitHeaders(req, res, limit, remaining, reset) {
  res.setHeader('RateLimit-Limit', limit); // Total requests allowed in window
  res.setHeader('RateLimit-Remaining', remaining); // Requests remaining in current window
  res.setHeader('RateLimit-Reset', reset); // Unix timestamp (seconds) when the window resets
  // Include Retry-After once the allowance is used up
  if (remaining <= 0) {
    res.setHeader('Retry-After', secondsUntil(reset));
  }
}
// Usage in a middleware
app.use(async (req, res, next) => {
  const userId = req.user ? req.user.id : req.ip;
  // Get rate limit info (`reset` is expected in Unix seconds)
  const { allowed, limit, remaining, reset } = await getRateLimitInfo(userId);
  // Set headers
  setRateLimitHeaders(req, res, limit, remaining, reset);
  if (allowed) {
    next();
  } else {
    res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: secondsUntil(reset)
    });
  }
});
Handling Exceeded Rate Limits
When a client exceeds the rate limit, follow these best practices:
- Use Correct Status Code: Return HTTP 429 Too Many Requests
- Include Retry-After Header: Indicate when the client can retry
- Provide Clear Error Messages: Explain why the request was rejected
- Log Rate Limit Violations: Monitor for abuse patterns
- Consider Graduated Response: More aggressive throttling for repeated violators
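A graduated response can be as simple as remembering how often a client has tripped the limit and lengthening the block each time. The sketch below is one possible in-memory approach; the class name `ViolationTracker`, its option names, and the doubling policy are assumptions chosen for illustration.

```javascript
// Hypothetical violation tracker: each repeated violation doubles the block duration, up to a cap.
class ViolationTracker {
  constructor({ baseBlockMs = 60000, maxBlockMs = 3600000, forgetAfterMs = 86400000 } = {}) {
    this.baseBlockMs = baseBlockMs;     // Block length for a first violation
    this.maxBlockMs = maxBlockMs;       // Longest block ever applied
    this.forgetAfterMs = forgetAfterMs; // Strikes reset after this much good behavior
    this.entries = new Map();           // key -> { strikes, lastViolation, blockedUntil }
  }

  // Record a violation and return the number of seconds to send in Retry-After
  recordViolation(key, now = Date.now()) {
    const entry = this.entries.get(key);
    const recent = entry && now - entry.lastViolation < this.forgetAfterMs;
    const strikes = recent ? entry.strikes + 1 : 1;
    const blockMs = Math.min(this.maxBlockMs, this.baseBlockMs * 2 ** (strikes - 1));
    this.entries.set(key, { strikes, lastViolation: now, blockedUntil: now + blockMs });
    return Math.ceil(blockMs / 1000);
  }

  // Check whether a client is still serving a block
  isBlocked(key, now = Date.now()) {
    const entry = this.entries.get(key);
    return Boolean(entry) && now < entry.blockedUntil;
  }
}
```

With the defaults, a first violation blocks for 60 seconds, a second within a day for 120, a third for 240, and so on up to one hour. Logging each call to `recordViolation` also covers the monitoring point above.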
// Example of a 429 Too Many Requests response
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 1618884000
Retry-After: 60
{
"error": {
"code": "rate_limit_exceeded",
"message": "You have exceeded the rate limit of 100 requests per hour",
"retryAfter": 60,
"documentation": "https://api.example.com/docs/rate-limits"
}
}
Distributed Rate Limiting
For applications running on multiple servers, use a centralized store for rate limiting:
// Distributed rate limiting with Redis
const Redis = require('ioredis');
const redis = new Redis({
host: process.env.REDIS_HOST,
port: process.env.REDIS_PORT
});
async function checkDistributedRateLimit(key, limit, windowSizeInSeconds) {
// Use Redis Lua script for atomic operations
const lua = `
local current = redis.call("INCR", KEYS[1])
if current == 1 then
redis.call("EXPIRE", KEYS[1], ARGV[1])
end
return current
`;
const windowIndex = Math.floor(Date.now() / (windowSizeInSeconds * 1000));
const redisKey = `ratelimit:${key}:${windowIndex}`;
// The fixed window ends when the next one starts (Unix timestamp in seconds)
const reset = (windowIndex + 1) * windowSizeInSeconds;
try {
// Execute the Lua script
const result = await redis.eval(lua, 1, redisKey, windowSizeInSeconds);
const count = parseInt(result, 10);
// Calculate the rate limit info
return {
allowed: count <= limit,
limit: limit,
remaining: Math.max(0, limit - count),
reset: reset
};
} catch (error) {
console.error('Redis error:', error);
// In case of error, default to allowing the request
return {
allowed: true,
limit: limit,
remaining: limit - 1,
reset: reset
};
}
}
Bypass Mechanisms for Critical Operations
Create mechanisms to bypass rate limits for certain scenarios:
- Emergency operations
- System administrators
- Critical business functions
- Health checks and monitoring
// Example of rate limit bypass for certain users/operations
function shouldBypassRateLimit(req) {
// Bypass for internal system users
if (req.user && req.user.role === 'system') {
return true;
}
// Bypass for health checks
if (req.path === '/health' && req.ip === process.env.MONITORING_SERVER_IP) {
return true;
}
// Bypass for critical operations with special header
if (
req.headers['x-critical-operation'] === process.env.CRITICAL_OPERATION_KEY &&
req.path.startsWith('/api/critical/')
) {
// Log the bypass for auditing
logger.info('Rate limit bypassed for critical operation', {
userId: req.user ? req.user.id : 'anonymous',
path: req.path,
ip: req.ip
});
return true;
}
return false;
}
// Usage in rate limiting middleware
app.use((req, res, next) => {
if (shouldBypassRateLimit(req)) {
return next();
}
// Apply normal rate limiting
rateLimiter(req, res, next);
});
Advanced Rate Limiting Techniques
Client-Side Rate Limiting
Implement rate limiting on the client-side to prevent unnecessary requests:
// JavaScript client-side throttling example
class APIClient {
constructor(baseUrl, requestsPerMinute) {
this.baseUrl = baseUrl;
this.tokenBucket = {
tokens: requestsPerMinute,
capacity: requestsPerMinute,
lastRefill: Date.now(),
refillRate: requestsPerMinute / 60000 // Tokens per millisecond
};
this.requestQueue = [];
this.processing = false;
}
async request(endpoint, options = {}) {
return new Promise((resolve, reject) => {
// Add to queue
this.requestQueue.push({ endpoint, options, resolve, reject });
// Start processing if not already running
if (!this.processing) {
this.processQueue();
}
});
}
async processQueue() {
this.processing = true;
while (this.requestQueue.length > 0) {
// Refill tokens
this.refillTokens();
// If we have tokens, process the next request
if (this.tokenBucket.tokens >= 1) {
const { endpoint, options, resolve, reject } = this.requestQueue.shift();
// Consume a token
this.tokenBucket.tokens -= 1;
try {
// Make the actual request
const response = await fetch(`${this.baseUrl}${endpoint}`, options);
// Check for 429 to potentially backoff
if (response.status === 429) {
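const retryAfter = response.headers.get('Retry-After');
// Retry-After may be missing or an HTTP date; fall back to 60 seconds if it is not a number
const retrySeconds = parseInt(retryAfter, 10);
const waitTime = Number.isFinite(retrySeconds) ? retrySeconds * 1000 : 60000;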
console.warn(`Rate limited by server, waiting ${waitTime}ms before next request`);
await new Promise(r => setTimeout(r, waitTime));
}
resolve(response);
} catch (error) {
reject(error);
}
} else {
// Calculate time until next token is available
const waitTime = this.getWaitTime();
await new Promise(r => setTimeout(r, waitTime));
}
}
this.processing = false;
}
refillTokens() {
const now = Date.now();
const timePassed = now - this.tokenBucket.lastRefill;
if (timePassed > 0) {
// Calculate new tokens based on time passed
const newTokens = timePassed * this.tokenBucket.refillRate;
// Add tokens up to capacity
this.tokenBucket.tokens = Math.min(
this.tokenBucket.capacity,
this.tokenBucket.tokens + newTokens
);
// Update last refill time
this.tokenBucket.lastRefill = now;
}
}
getWaitTime() {
const tokensNeeded = 1 - this.tokenBucket.tokens;
return Math.ceil(tokensNeeded / this.tokenBucket.refillRate);
}
}
// Usage
const api = new APIClient('https://api.example.com', 60); // 60 requests per minute
// Make requests that will be automatically rate limited
async function fetchUserData(userId) {
return api.request(`/users/${userId}`);
}
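The 429 handling above treats Retry-After as a number of seconds, but the header may also carry an HTTP-date. The following is a minimal sketch of a helper that accepts both forms. The name parseRetryAfterMs and the 60-second fallback are assumptions chosen for illustration, not part of the client above.

```javascript
// Hypothetical helper: convert a Retry-After header value to milliseconds.
// The header may be a delay in seconds ("120") or an HTTP-date.
function parseRetryAfterMs(headerValue, nowMs = Date.now(), fallbackMs = 60000) {
  if (headerValue === null || headerValue === undefined || headerValue.trim() === '') {
    return fallbackMs;
  }
  const value = headerValue.trim();
  // Delay-seconds form: a non-negative integer
  if (/^\d+$/.test(value)) {
    return parseInt(value, 10) * 1000;
  }
  // HTTP-date form: wait until that instant, never a negative duration
  const dateMs = Date.parse(value);
  if (!Number.isNaN(dateMs)) {
    return Math.max(0, dateMs - nowMs);
  }
  // Anything else: use the fallback
  return fallbackMs;
}
```

Inside processQueue, the wait time could then be computed as parseRetryAfterMs(response.headers.get('Retry-After')).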
Rate Limiting with Priority Queues
Implement a priority system for requests during high load:
// Priority-based rate limiting example
class PriorityRateLimiter {
constructor() {
// Separate buckets for different priority levels
this.highPriorityBucket = new TokenBucket(100, 10); // 100 capacity, 10 tokens/sec
this.mediumPriorityBucket = new TokenBucket(50, 5); // 50 capacity, 5 tokens/sec
this.lowPriorityBucket = new TokenBucket(20, 2); // 20 capacity, 2 tokens/sec
// Priority queues
this.highPriorityQueue = [];
this.mediumPriorityQueue = [];
this.lowPriorityQueue = [];
// Start processing
this.processQueues();
}
enqueueRequest(request, priority = 'medium') {
switch (priority) {
case 'high':
this.highPriorityQueue.push(request);
break;
case 'medium':
this.mediumPriorityQueue.push(request);
break;
case 'low':
this.lowPriorityQueue.push(request);
break;
default:
this.mediumPriorityQueue.push(request);
}
}
async processQueues() {
while (true) {
// Process high priority first
if (this.highPriorityQueue.length > 0 && this.highPriorityBucket.tryConsume()) {
const request = this.highPriorityQueue.shift();
this.processRequest(request);
}
// Then medium priority
else if (this.mediumPriorityQueue.length > 0 && this.mediumPriorityBucket.tryConsume()) {
const request = this.mediumPriorityQueue.shift();
this.processRequest(request);
}
// Finally low priority
else if (this.lowPriorityQueue.length > 0 && this.lowPriorityBucket.tryConsume()) {
const request = this.lowPriorityQueue.shift();
this.processRequest(request);
}
// No requests or no tokens, wait a bit
else {
await new Promise(resolve => setTimeout(resolve, 10));
}
}
}
processRequest(request) {
// Process the request
console.log(`Processing ${request.priority} priority request: ${request.id}`);
// In a real implementation, you'd handle the actual request here
request.process();
}
}
// Token bucket implementation (from earlier)
class TokenBucket {
// ... (same as before)
}
// Usage in an API server
const priorityLimiter = new PriorityRateLimiter();
app.use((req, res, next) => {
// Determine priority based on request or user
let priority = 'medium'; // Default
if (req.user && req.user.isPremium) {
priority = 'high';
} else if (req.path.includes('/admin/')) {
priority = 'high';
} else if (req.path.includes('/metrics/')) {
priority = 'low';
}
// Create a request object
const request = {
id: req.id, // Assumes a request-ID middleware has set req.id; Express does not set it by default
priority,
process: () => {
// Continue with the request
next();
}
};
// Enqueue the request with appropriate priority
priorityLimiter.enqueueRequest(request, priority);
});
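The priority limiter above relies on a TokenBucket class whose body is elided here. As a rough sketch of what it needs, the version below assumes a (capacity, refillPerSecond) constructor and a tryConsume() method, both inferred from how PriorityRateLimiter calls the class. The injectable clock is an addition for testing only.

```javascript
// Minimal token bucket sketch. The constructor signature and tryConsume()
// are assumptions inferred from how PriorityRateLimiter uses the class.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;          // injectable clock, handy for tests
    this.tokens = capacity;  // start with a full bucket
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last refill, up to capacity.
  refill() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;
  }

  // Take `count` tokens if available and report whether the request may proceed.
  tryConsume(count = 1) {
    this.refill();
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }
}
```

Because tryConsume() returns false instead of waiting, the polling loop in processQueues can fall through to a lower-priority queue whenever a higher-priority bucket is empty.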
Intelligent/Dynamic Rate Limiting
Implement dynamic rate limiting based on user behavior patterns:
- Reputation-Based: Users with good history get higher limits
- Anomaly Detection: Detect and block suspicious request patterns
- Machine Learning: Predict and adjust limits based on historical usage
- Time-of-Day Adjustments: Different limits during peak vs. off-peak hours
// Example of reputation-based rate limiting
class ReputationRateLimiter {
constructor(redisClient) {
this.redis = redisClient;
this.baseLimit = 100; // Base requests per hour
this.maxLimit = 500; // Maximum possible limit
this.minLimit = 20; // Minimum possible limit
}
async getUserLimit(userId) {
// Get user reputation score (0-100)
const reputationScore = await this.getUserReputation(userId);
// Calculate limit based on reputation
// 0 reputation = minLimit, 100 reputation = maxLimit
const reputationFactor = reputationScore / 100;
const limit = Math.round(this.minLimit + (this.maxLimit - this.minLimit) * reputationFactor);
return Math.min(Math.max(limit, this.minLimit), this.maxLimit);
}
async checkRateLimit(userId) {
// Get the dynamic limit for this user
const userLimit = await this.getUserLimit(userId);
// Get the current hour key
const hourKey = `ratelimit:${userId}:${Math.floor(Date.now() / 3600000)}`;
// Increment and check
const count = await this.redis.incr(hourKey);
// Set expiry if new key
if (count === 1) {
await this.redis.expire(hourKey, 3600);
}
// Return rate limit info
return {
allowed: count <= userLimit,
limit: userLimit,
remaining: Math.max(0, userLimit - count),
reset: (Math.floor(Date.now() / 3600000) + 1) * 3600 // Unix time (seconds) at which the current hour window ends
};
}
async updateUserReputation(userId, event) {
const reputationKey = `reputation:${userId}`;
// Different events affect reputation differently
switch (event) {
case 'rate_limit_exceeded':
// Reduce reputation when user hits limits
await this.redis.decrby(reputationKey, 5);
break;
case 'successful_request':
// Slowly build reputation with successful requests
await this.redis.incr(reputationKey);
break;
case 'api_abuse_detected':
// Significantly reduce reputation for abuse
await this.redis.decrby(reputationKey, 20);
break;
case 'payment_success':
// Reward paying customers
await this.redis.incrby(reputationKey, 10);
break;
}
// Ensure reputation stays between 0 and 100 (Redis returns strings, so parse first)
const score = parseInt(await this.redis.get(reputationKey), 10);
if (score < 0) await this.redis.set(reputationKey, 0);
if (score > 100) await this.redis.set(reputationKey, 100);
}
async getUserReputation(userId) {
const score = await this.redis.get(`reputation:${userId}`);
if (!score) {
// New user starts with a moderate reputation
await this.redis.set(`reputation:${userId}`, 50);
return 50;
}
return parseInt(score, 10);
}
}
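// Usage in an API server
// Assumes redisClient is an already-connected Redis client with promise-based commands
const rateLimiter = new ReputationRateLimiter(redisClient);
app.use(async (req, res, next) => {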
const userId = req.user ? req.user.id : req.ip;
const { allowed, limit, remaining, reset } = await rateLimiter.checkRateLimit(userId);
// Set rate limit headers
res.setHeader('RateLimit-Limit', limit);
res.setHeader('RateLimit-Remaining', remaining);
res.setHeader('RateLimit-Reset', reset);
if (allowed) {
// Update reputation on successful request
await rateLimiter.updateUserReputation(userId, 'successful_request');
next();
} else {
// Update reputation on rate limit exceeded
await rateLimiter.updateUserReputation(userId, 'rate_limit_exceeded');
// Set retry-after header
const retryAfter = Math.ceil(reset - (Date.now() / 1000));
res.setHeader('Retry-After', retryAfter);
res.status(429).json({
error: 'Rate limit exceeded',
retryAfter: retryAfter
});
}
});
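The time-of-day adjustment listed above can be sketched in a similar way. The peak window (09:00 to 17:00 UTC) and both multipliers below are hypothetical values chosen for illustration; real values would come from your own traffic data.

```javascript
// Hypothetical peak window and multipliers; tune these to your own traffic.
const PEAK_START_HOUR = 9;        // inclusive, UTC
const PEAK_END_HOUR = 17;         // exclusive, UTC
const PEAK_MULTIPLIER = 0.5;      // tighter limits while the system is busy
const OFF_PEAK_MULTIPLIER = 1.5;  // looser limits when there is spare capacity

// Scale a base limit according to the hour of the given date.
function limitForTime(baseLimit, date = new Date()) {
  const hour = date.getUTCHours();
  const isPeak = hour >= PEAK_START_HOUR && hour < PEAK_END_HOUR;
  const multiplier = isPeak ? PEAK_MULTIPLIER : OFF_PEAK_MULTIPLIER;
  // Never drop below one request per window
  return Math.max(1, Math.round(baseLimit * multiplier));
}
```

One way to combine the two ideas is to pass the result of getUserLimit as baseLimit, so reputation sets the baseline and the time of day scales it.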
Monitoring and Analytics for Rate Limiting
Implement monitoring to understand rate limiting effectiveness and optimize limits.
Key Metrics to Track
- Rate Limit Hit Ratio: Percentage of requests that hit rate limits
- Near-Limit Users: Users consistently approaching their limits
- Limit Utilization: How much of allocated capacity is being used
- Throttled Request Patterns: Temporal patterns of rate limit hits
- Response Time Impact: How rate limiting affects overall response times
// Example of rate limit monitoring with Prometheus
const promClient = require('prom-client');
// Create metrics
const rateLimitHits = new promClient.Counter({
name: 'api_rate_limit_hits_total',
help: 'Total number of requests that hit rate limits',
labelNames: ['endpoint', 'user_type']
});
const rateLimitApproaches = new promClient.Counter({
name: 'api_rate_limit_approaches_total',
help: 'Total number of requests that approached rate limits (>80%)',
labelNames: ['endpoint', 'user_type']
});
const rateLimitUtilization = new promClient.Gauge({
name: 'api_rate_limit_utilization_ratio',
help: 'Ratio of used capacity to total capacity',
labelNames: ['endpoint', 'user_type']
});
const requestDuration = new promClient.Histogram({
name: 'api_request_duration_seconds',
help: 'Duration of API requests',
labelNames: ['endpoint', 'rate_limited'],
buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5]
});
// Middleware to track rate limit metrics
app.use((req, res, next) => {
const endpoint = req.path;
const userType = req.user ? req.user.type : 'anonymous';
// Start timer for request duration
const end = requestDuration.startTimer({ endpoint });
// Track original end method
const originalEnd = res.end;
// Override end method to capture metrics
res.end = function(...args) {
// Complete the request duration timer
end({ rate_limited: res.statusCode === 429 });
// Get rate limit info from headers
const limit = parseInt(res.getHeader('RateLimit-Limit') || 0, 10);
const remaining = parseInt(res.getHeader('RateLimit-Remaining') || 0, 10);
if (limit > 0) {
// Label keys must match the labelNames declared above (user_type, not userType)
const labels = { endpoint, user_type: userType };
// Calculate utilization
const utilization = (limit - remaining) / limit;
rateLimitUtilization.set(labels, utilization);
// Check if approaching limit
if (utilization >= 0.8) {
rateLimitApproaches.inc(labels);
}
// Check if hit limit
if (res.statusCode === 429) {
rateLimitHits.inc(labels);
}
}
originalEnd.apply(res, args);
};
next();
});
// Expose metrics endpoint for Prometheus
// register.metrics() returns a Promise in recent prom-client versions, so await it
app.get('/metrics', async (req, res) => {
res.set('Content-Type', promClient.register.contentType);
res.end(await promClient.register.metrics());
});
Visualizing Rate Limit Data
Create dashboards that chart the metrics above over time, for example rate limit hits per endpoint and limit utilization per user type.
Using Analytics to Optimize Limits
Use data to optimize your rate limiting strategy:
- Identify users who consistently hit limits (may need higher tier)
- Find endpoints with excessive rate limit hits
- Adjust limits based on actual usage patterns
- Create custom plans for power users
- Implement progressive rate limiting for new users
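The first item in the list above, finding users who consistently hit their limits, can be sketched as a small aggregation. The record shape ({ userId, used, limit } per user per window) and the default thresholds are assumptions. In practice the records would come from your logs or metrics store.

```javascript
// Hypothetical input: one record per user per rate limit window,
// e.g. { userId: 'u1', used: 98, limit: 100 }.
function findUpgradeCandidates(records, { threshold = 0.9, minWindows = 3 } = {}) {
  const hotWindows = new Map(); // userId -> number of windows at or above the threshold
  for (const { userId, used, limit } of records) {
    if (limit > 0 && used / limit >= threshold) {
      hotWindows.set(userId, (hotWindows.get(userId) || 0) + 1);
    }
  }
  // Keep only users who were near their limit in enough windows
  return [...hotWindows.entries()]
    .filter(([, count]) => count >= minWindows)
    .map(([userId]) => userId)
    .sort();
}

// Usage
const candidates = findUpgradeCandidates([
  { userId: 'u1', used: 95, limit: 100 },
  { userId: 'u1', used: 100, limit: 100 },
  { userId: 'u1', used: 92, limit: 100 },
  { userId: 'u2', used: 99, limit: 100 },
  { userId: 'u2', used: 10, limit: 100 }
]);
```

Requiring several near-limit windows rather than one helps separate sustained heavy use from a single spike.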
Case Studies and Real-World Examples
GitHub API Rate Limiting
The GitHub API implements a tiered rate limiting approach:
- Unauthenticated requests: 60 requests per hour
- Authenticated requests: 5,000 requests per hour
- GitHub Apps: 5,000 requests per hour per installation
- Search API: 30 requests per minute
They include detailed rate limit information in response headers:
HTTP/1.1 200 OK
Date: Sat, 01 Jul 2023 17:27:06 GMT
Status: 200 OK
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1688231226
X-RateLimit-Used: 1
X-RateLimit-Resource: core
GitHub also provides a dedicated rate limit endpoint (/rate_limit) where users can check their current rate limit status without using up a request.
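A client can use these headers to decide when to pause. The sketch below reads the three core headers shown above from any object with a get(name) method, such as the headers of a fetch() response. The function name readRateLimit and the shape of the returned object are assumptions for illustration.

```javascript
// Read GitHub-style rate limit headers. `headers` is any object with a
// get(name) method, such as the headers of a fetch() Response.
function readRateLimit(headers, nowMs = Date.now()) {
  const limit = parseInt(headers.get('X-RateLimit-Limit'), 10);
  const remaining = parseInt(headers.get('X-RateLimit-Remaining'), 10);
  const resetEpochSeconds = parseInt(headers.get('X-RateLimit-Reset'), 10);
  if ([limit, remaining, resetEpochSeconds].some(Number.isNaN)) {
    return null; // headers missing or malformed
  }
  return {
    limit,
    remaining,
    // Milliseconds until the window resets, never negative
    msUntilReset: Math.max(0, resetEpochSeconds * 1000 - nowMs),
    exhausted: remaining === 0
  };
}
```

When exhausted is true, waiting msUntilReset before the next call avoids spending requests that would be rejected.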
Twitter API Rate Limiting
Twitter (now X) uses a complex, endpoint-specific rate limiting strategy:
- Timeline endpoints: 15-900 requests per 15-minute window
- Tweet posting endpoints: 200-300 tweets per 3-hour window
- DM endpoints: 15-1000 requests per 15-minute window
- Different limits for free vs. paid developer accounts
- Rate limiting is per user access token, not per application
Twitter's API returns specific HTTP status codes and headers when rate limits are hit:
HTTP/1.1 429 Too Many Requests
content-length: 181
content-type: application/json
x-rate-limit-limit: 15
x-rate-limit-remaining: 0
x-rate-limit-reset: 1616559502
{
"errors": [
{
"code": 88,
"message": "Rate limit exceeded"
}
]
}
Stripe API Rate Limiting
Stripe uses an adaptive rate limiting approach:
- No hard published limits (they adapt based on account history)
- Sustained rate limits for regular API usage
- Burst rate limits for short-term spikes
- Per-account rather than per-key rate limiting
- Different limits for live vs. test mode
When a rate limit is hit, Stripe responds with a 429 status code and a specific error type:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
{
"error": {
"type": "rate_limit_error",
"message": "Too many requests in a period of time",
"code": "rate_limit"
}
}
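A common way for clients to handle 429 responses like this one is to retry with exponential backoff plus random jitter. The following is a minimal sketch. The function names, default delays, and attempt count are assumptions, and operation stands for any hypothetical function that returns a fetch()-like response with a status property.

```javascript
// Exponential backoff with "full jitter": wait a random time between 0 and
// min(maxDelayMs, baseDelayMs * 2^attempt). Default values are assumptions.
function backoffDelayMs(
  attempt,
  { baseDelayMs = 500, maxDelayMs = 30000, random = Math.random } = {}
) {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry an async operation while it reports a 429 status.
// The last response is returned even if it is still a 429.
async function withRetryOn429(
  operation,
  {
    maxAttempts = 5,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    ...backoffOptions
  } = {}
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await operation();
    if (response.status !== 429 || attempt === maxAttempts - 1) {
      return response;
    }
    await sleep(backoffDelayMs(attempt, backoffOptions));
  }
}
```

The jitter spreads retries from many clients over time, so they do not all return at the same instant and trip the limit again.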
Cloudflare Rate Limiting
Cloudflare offers rate limiting at the edge for websites:
- Block or challenge visitors exceeding thresholds
- Custom response options (status code, body, headers)
- Configurable count keys (IP, cookies, tokens, etc.)
- Supports both rate limiting and concurrency limiting
- Multiple configurable rules with different actions
Example of a Cloudflare-style rate limiting rule (illustrative; field names are simplified):
// Cloudflare rate limiting rule
{
"description": "Login rate limiting",
"disabled": false,
"expressions": [
"(http.request.uri.path eq \"/login\")"
],
"period_seconds": 60,
"requests_per_period": 5,
"action": {
"mode": "challenge"
},
"characteristics": [
"cf.client.ip"
]
}
Practical Activities
Activity 1: Implementing Basic Rate Limiting
Implement a basic rate limiting middleware for an Express application:
- Create a simple Node.js/Express API with a few endpoints
- Implement the fixed window algorithm for rate limiting
- Use an in-memory store to track request counts
- Configure different limits for different endpoints
- Include appropriate headers in responses
- Test with a tool like Apache Bench or Postman
Activity 2: Redis-Based Distributed Rate Limiting
Extend your rate limiting implementation to work in a distributed environment:
- Set up a Redis server or use a cloud-based Redis service
- Modify your rate limiting middleware to use Redis as the store
- Implement atomic operations using Lua scripts or transactions
- Test rate limiting across multiple server instances
- Visualize rate limit metrics using a monitoring tool
Activity 3: Implementing Token Bucket Algorithm
Implement a more sophisticated rate limiting solution:
- Create a token bucket implementation
- Configure different bucket sizes and refill rates
- Implement request queuing for handling traffic spikes
- Add support for different bucket sizes based on user tier
- Create a simple dashboard to visualize token bucket state
Activity 4: Rate Limiting Strategy Design
Design a comprehensive rate limiting strategy for a hypothetical application:
- Choose a real-world application type (e.g., social media, e-commerce, SaaS)
- Identify different user personas and their usage patterns
- Design appropriate rate limits for different endpoints
- Create a tiered plan structure with different limits
- Design monitoring and analytics to track rate limit effectiveness
- Document your rate limiting strategy and communication plan
Additional Resources
Libraries and Tools
- express-rate-limit - Rate limiting middleware for Express
- rate-limiter-flexible - Flexible rate limiter for Node.js
- limiter - Dead simple rate limit middleware for Go
- django-ratelimit - Rate limiting for Django applications
- resilience4j - Rate limiting and fault tolerance library for Java
Articles and Documentation
- Google Cloud: Rate-Limiting Strategies and Techniques
- Cloudflare: What is Rate Limiting?
- Kong: How to Design a Scalable Rate Limiting Algorithm
- Stripe: Scaling your API with rate limiters
- NGINX: Rate Limiting with NGINX
Books
- "Web Scalability for Startup Engineers" by Artur Ejsmont (Chapter on Rate Limiting)
- "Systems Performance: Enterprise and the Cloud" by Brendan Gregg (Performance aspects of throttling)
- "Cloud Native Patterns" by Cornelia Davis (Resilience patterns including rate limiting)
- "Release It!: Design and Deploy Production-Ready Software" by Michael T. Nygard (Stability patterns)