How to Rate Limit an API Endpoint (Algorithms, Code & Best Practices)
Learn the most common rate limiting algorithms — token bucket, sliding window, fixed window — with practical code examples and tips for protecting your API from abuse.
Every API that faces the internet needs rate limiting. Without it, a single misbehaving client — or a bot — can exhaust your compute, inflate your cloud bill, and degrade the experience for every other user. Rate limiting is the seatbelt of API design: you hope you never need it, but you should always have it.
This guide covers the main algorithms, how to implement them, and how managed platforms like API Snap handle rate limiting automatically so you can skip the plumbing entirely.
Why Rate Limiting Matters
Rate limiting isn't just about stopping abuse. It serves several critical functions:
- Cost control — cloud services charge per request or per compute-second. An uncapped endpoint is an uncapped bill.
- Fair access — without limits, one noisy tenant can starve everyone else of resources in a shared system.
- Stability — back-pressure from rate limits prevents cascading failures when downstream services slow down.
- Security — rate limits make brute-force attacks, credential stuffing, and scraping dramatically harder.
The Three Most Common Algorithms
1. Fixed Window
The simplest approach. Pick a time window (e.g., 1 minute), count requests within it, and reject anything over the limit. When the window resets, the counter resets.
// Fixed window — in-memory, single-process example
const windows = new Map<string, { count: number; resetAt: number }>();
function fixedWindowCheck(key: string, limit: number, windowMs: number): boolean {
const now = Date.now();
const entry = windows.get(key);
if (!entry || now >= entry.resetAt) {
windows.set(key, { count: 1, resetAt: now + windowMs });
return true; // allowed
}
if (entry.count < limit) {
entry.count++;
return true;
}
return false; // rate limited
}
Downside: burst traffic at the boundary of two windows can allow 2× the intended limit in a short period (the "boundary burst" problem).
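To make the boundary burst concrete, here's a small simulation. It's a sketch of the fixed-window check above, rewritten with an injectable `now` parameter (an assumption made purely so time can be controlled in a test) rather than calling `Date.now()`:

```typescript
// Fixed window check with an injectable clock, so we can simulate time.
const sim = new Map<string, { count: number; resetAt: number }>();

function check(key: string, limit: number, windowMs: number, now: number): boolean {
  const entry = sim.get(key);
  if (!entry || now >= entry.resetAt) {
    sim.set(key, { count: 1, resetAt: now + windowMs });
    return true; // first request of a fresh window
  }
  if (entry.count < limit) {
    entry.count++;
    return true;
  }
  return false; // over the cap for this window
}

// Intended limit: 100 requests per 60-second window.
const LIMIT = 100;
const WINDOW = 60_000;

let allowed = 0;
// 100 requests land at t = 59s, opening a window that runs [59s, 119s).
for (let i = 0; i < 100; i++) if (check("client", LIMIT, WINDOW, 59_000)) allowed++;
// 100 more land at t = 119s — the instant the window rolls over.
for (let i = 0; i < 100; i++) if (check("client", LIMIT, WINDOW, 119_000)) allowed++;

// allowed === 200: double the intended cap inside a single window-length span.
```

All 200 requests get through even though any 60-second span was supposed to admit at most 100 — exactly the weakness the sliding window approaches below are designed to close.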
2. Sliding Window Log
Instead of fixed boundaries, store the timestamp of every request and count how many fall within the trailing window. This eliminates the boundary burst problem but uses more memory — you're keeping a log per key.
// Sliding window log
const logs = new Map<string, number[]>();
function slidingWindowCheck(key: string, limit: number, windowMs: number): boolean {
const now = Date.now();
const timestamps = logs.get(key) ?? [];
// Remove expired entries
const valid = timestamps.filter((t) => now - t < windowMs);
if (valid.length < limit) {
valid.push(now);
logs.set(key, valid);
return true;
}
logs.set(key, valid);
return false;
}
3. Token Bucket
Each client gets a bucket of tokens. Every request costs one token. Tokens refill at a steady rate. This naturally smooths traffic while still allowing short bursts — the bucket can hold more tokens than the per-second rate.
// Token bucket
const buckets = new Map<string, { tokens: number; lastRefill: number }>();
function tokenBucketCheck(
key: string,
maxTokens: number,
refillRate: number // tokens per second
): boolean {
const now = Date.now();
const bucket = buckets.get(key) ?? { tokens: maxTokens, lastRefill: now };
// Refill tokens based on elapsed time
const elapsed = (now - bucket.lastRefill) / 1000;
bucket.tokens = Math.min(maxTokens, bucket.tokens + elapsed * refillRate);
bucket.lastRefill = now;
if (bucket.tokens >= 1) {
bucket.tokens -= 1;
buckets.set(key, bucket);
return true;
}
buckets.set(key, bucket);
return false;
}
Token bucket is the most widely used algorithm in production API gateways, including AWS API Gateway and Stripe.
Rate Limiting in Express.js
If you're building an Express API, the express-rate-limit package gives you a fixed-window limiter in a few lines:
import rateLimit from "express-rate-limit";
const limiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 100, // 100 requests per window
standardHeaders: true,
legacyHeaders: false,
});
app.use("/api/", limiter);
For distributed systems (multiple server instances), swap the default in-memory store for Redis using rate-limit-redis.
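The wiring looks roughly like this. This is configuration-level sketch code, not a definitive recipe: it assumes express-rate-limit v7, rate-limit-redis v4, and node-redis v4+, plus a Redis server at the URL shown — check the current package docs for exact option names before relying on it.

```typescript
import rateLimit from "express-rate-limit";
import { RedisStore } from "rate-limit-redis";
import { createClient } from "redis";

// Assumed: a Redis instance reachable at this URL.
const client = createClient({ url: "redis://localhost:6379" });
await client.connect();

const limiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  // Counters live in Redis, so every server instance sees the same totals
  // instead of each process keeping its own in-memory map.
  store: new RedisStore({
    sendCommand: (...args: string[]) => client.sendCommand(args),
  }),
});
```

The key design point: with a shared store, a client hitting instance A and instance B draws down the same counter, so your limit holds across the whole fleet rather than per process.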
What Headers to Return
Good rate limiting is transparent. Return 429 Too Many Requests (RFC 6585) and use the standard RateLimit headers (draft-ietf-httpapi-ratelimit-headers) so clients can adapt:
- RateLimit-Limit — the maximum requests per window
- RateLimit-Remaining — how many requests are left
- RateLimit-Reset — when the window resets (Unix epoch seconds)
- Retry-After — seconds until the client should retry (on 429 responses)
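As a sketch of how these fit together, here's a small framework-agnostic helper (the `rateLimitHeaders` function is hypothetical, not from any library) that turns limiter state into the header map an HTTP layer would attach to the response:

```typescript
// Hypothetical helper: map limiter state to the standard header fields.
function rateLimitHeaders(
  limit: number,
  remaining: number,
  resetAtMs: number, // absolute reset time, ms since epoch
  nowMs: number
): Record<string, string> {
  const headers: Record<string, string> = {
    "RateLimit-Limit": String(limit),
    "RateLimit-Remaining": String(Math.max(0, remaining)),
    // Reset expressed as Unix epoch seconds, per the list above.
    "RateLimit-Reset": String(Math.ceil(resetAtMs / 1000)),
  };
  if (remaining <= 0) {
    // On a 429, also tell the client how many seconds to back off.
    headers["Retry-After"] = String(Math.max(0, Math.ceil((resetAtMs - nowMs) / 1000)));
  }
  return headers;
}

// A limited client one second into a minute-long window:
const h = rateLimitHeaders(100, 0, 120_000, 61_000);
// h["Retry-After"] === "59"
```

Note that Retry-After is only meaningful alongside a 429; on successful responses the three RateLimit headers alone let clients pace themselves before they ever hit the wall.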
Skip the Plumbing: Use a Managed API Platform
Building rate limiting from scratch means implementing the algorithm, wiring up Redis, deciding on per-key vs per-IP limits, returning the right headers, and handling edge cases like clock skew in distributed deployments.
Or you can skip all of that. Platforms like API Snap include rate limiting out of the box — every API key gets a quota based on the pricing tier, and usage is tracked per key with proper headers in every response. You focus on the logic your API provides; the platform handles throttling, metering, and abuse prevention.
Best Practices Checklist
- Rate limit by API key, not just IP — IP-based limits break for users behind shared NATs or corporate proxies. Key-based limits are more precise and fair.
- Return 429 with a Retry-After header — don't just drop the connection. A proper 429 response lets well-behaved clients back off gracefully.
- Log rate limit events — track which keys hit limits and how often. This data reveals abuse patterns and helps you tune your thresholds.
- Set different limits for different endpoints — a search endpoint is more expensive than a health check. Weight your limits accordingly.
- Test under load — use tools like hey or k6 to verify your rate limiter actually works before production traffic hits it.
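One way to weight limits per endpoint is to charge a different token cost per route on a single bucket. The sketch below builds on the token bucket from earlier; the `ROUTE_COST` table, the `cost` charging logic, and the injectable `now` clock are all illustrative additions, not part of the original example:

```typescript
// Token bucket where each endpoint charges a different token cost.
type Bucket = { tokens: number; lastRefill: number };
const costBuckets = new Map<string, Bucket>();

// Assumed weights: a search costs 5 tokens, a health check 1.
const ROUTE_COST: Record<string, number> = { "/search": 5, "/health": 1 };

function weightedCheck(
  key: string,
  route: string,
  maxTokens: number,
  refillRate: number, // tokens per second
  now: number // injectable clock (ms) so the behavior is testable
): boolean {
  const cost = ROUTE_COST[route] ?? 1;
  const bucket = costBuckets.get(key) ?? { tokens: maxTokens, lastRefill: now };
  // Refill based on elapsed time, capped at the bucket size.
  const elapsed = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(maxTokens, bucket.tokens + elapsed * refillRate);
  bucket.lastRefill = now;
  const allowed = bucket.tokens >= cost;
  if (allowed) bucket.tokens -= cost; // expensive routes drain faster
  costBuckets.set(key, bucket);
  return allowed;
}
```

With a 10-token bucket refilling at 1 token/second, a client can fire two searches back to back (5 tokens each) and is then throttled on everything, including health checks, until tokens refill — one shared budget, weighted by endpoint cost.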
Get Started
If you're building an API and don't want to implement rate limiting yourself, create a free API Snap account and let the platform handle it. Browse the API docs to see how rate limiting is built into every endpoint, or jump into the playground to see rate limit headers in action on live requests.