Do you have a public API? If you haven't implemented rate limiting, someone is probably using it for scraping, DDoS, or brute force. It's not a matter of 'if' but 'when'. At Meteora Web, we see it every day in projects coming to us: unprotected APIs collapsing under malicious load or burning third-party budgets because a misconfigured client spams requests.
In this hands-on guide, we'll show you what rate limiting is, how to implement it in Node.js, Express, and NGINX, and which throttling strategies to use to protect your APIs without penalizing legitimate users.
What are rate limiting and throttling, and why are they essential for APIs?
Rate limiting controls the number of requests a client can make within a given time window. For example: max 100 requests every 15 minutes per IP. Throttling is a softer variant: instead of blocking, it slows down the response or reduces priority. Analogy: rate limiting is a traffic light that turns red (block), throttling is a speed bump that forces slowing down.
Why do you need them? Protection from abuse, security, fairness, cost. An e-commerce client we worked with had a public catalog API with no limits. A competitor scraped the entire database every 10 seconds for days. Simple rate limiting would have stopped it all. Instead, the server crashed and the client lost sales for hours.
An API without rate limiting is like a bank without doors: sooner or later someone walks in. If you integrate paid third-party APIs (e.g., OpenAI, Stripe), rate limiting prevents you from exceeding your budget due to an infinite loop.
Action for you: Check if your API has a limit right now. Open a terminal and fire 50 consecutive requests with curl. If you always get 200, you're vulnerable.
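If you would rather script the same check in Node.js, a minimal sketch could look like the following. The helper names `probeRateLimit` and `looksUnprotected` are made up for this example, and it assumes Node.js 18+ (for the global `fetch`) and an API listening on localhost:3000.

```javascript
// Hypothetical helper: fires `count` sequential requests and tallies status codes.
// `fetchImpl` defaults to the global fetch available in Node.js 18+.
async function probeRateLimit(url, count = 50, fetchImpl = fetch) {
  const tally = {};
  for (let i = 0; i < count; i++) {
    const res = await fetchImpl(url);
    tally[res.status] = (tally[res.status] || 0) + 1;
  }
  return tally;
}

// A tally containing only 200s suggests no limit was hit within `count` requests.
function looksUnprotected(tally) {
  const codes = Object.keys(tally);
  return codes.length === 1 && codes[0] === '200';
}

// Usage (assumes an API listening on localhost:3000):
// probeRateLimit('http://localhost:3000/api/')
//   .then((tally) => console.log(tally, looksUnprotected(tally)));
```

The fetch function is injectable so the helper can be exercised against a stub before pointing it at a real endpoint.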
How to implement rate limiting in a REST API with Node.js and Express?
The fastest and most robust way to add rate limiting to an Express API is the express-rate-limit middleware. By default it counts requests per client in a fixed window, it accepts external stores such as Redis, and it can emit standard rate limit headers. We use it in production for dozens of projects.
Install the package:
npm install express-rate-limit
Basic configuration:
const express = require('express');
const rateLimit = require('express-rate-limit');
const app = express();
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // limit each IP to 100 requests per window
  standardHeaders: true, // send the RateLimit-* headers
  legacyHeaders: false, // omit the X-RateLimit-* headers
  message: { error: 'Too many requests. Please try again later.' }
});
// Apply to all /api/ routes
app.use('/api/', limiter);
You can customize per route. For authentication (prone to brute force), use stricter limits:
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  message: { error: 'Too many login attempts. Try again in 15 minutes.' }
});
app.use('/api/auth/login', authLimiter);
Important: In multi-server environments, use a shared store like Redis. Example with rate-limit-redis:
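// rate-limit-redis v4+ exposes RedisStore as a named export; older versions export it as the default.
const { RedisStore } = require('rate-limit-redis');
const redisClient = require('./redis'); // assumed to export an ioredis client
const limiter = rateLimit({
  store: new RedisStore({
    // ioredis form; with node-redis use (...args) => redisClient.sendCommand(args)
    sendCommand: (...args) => redisClient.call(...args),
  }),
  windowMs: 15 * 60 * 1000,
  max: 100,
});
Action for you: Add this middleware to your project. Test with a simple script: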
for i in {1..110}; do curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/api/; done | tail -n 5
You should see 429 after the hundredth attempt.
Which throttling strategies should you choose for your use case?
Not all APIs have the same traffic pattern. Here are the most common strategies with pros and cons:
- Fixed Window – counts requests in a fixed time interval. Simple but allows bursts at the window boundary. Good to start.
- Sliding Window – tracks timestamps of requests, more precise. We use it almost always in production with Redis.
- Token Bucket – accumulates tokens over time, allows controlled bursts. Ideal for APIs with occasional bursts (e.g., push notifications).
- Leaky Bucket – requests enter a queue and are processed at a constant rate. Useful for backend with limited resources.
At Meteora Web, we chose sliding window with Redis for a client integrating a billing API with usage-based pricing. With a simple token bucket we would exceed the budget during peaks. With sliding window we granularly limited requests, saving hundreds of euros each month.
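To make the token bucket idea concrete, here is a minimal in-memory sketch. It is only an illustration of the technique for a single process, not the implementation used in the project described above; `createTokenBucket`, `capacity`, `refillPerSec`, and the injectable `now` clock are names chosen for this example.

```javascript
// Minimal in-memory token bucket. `capacity`, `refillPerSec`, and the
// injectable `now` clock are illustrative names, not part of any library.
function createTokenBucket({ capacity, refillPerSec, now = () => Date.now() }) {
  let tokens = capacity; // start full so an initial burst is allowed
  let last = now();

  return function tryRemoveToken() {
    const t = now();
    // Refill in proportion to elapsed time, capped at capacity.
    tokens = Math.min(capacity, tokens + ((t - last) / 1000) * refillPerSec);
    last = t;
    if (tokens < 1) return false; // bucket empty: reject (or delay) the request
    tokens -= 1;
    return true;
  };
}

// Usage: allow bursts of up to 5 requests, refilled at 1 request per second.
// const allow = createTokenBucket({ capacity: 5, refillPerSec: 1 });
// if (!allow()) { /* respond with 429 */ }
```

The capacity sets how large a burst can be, while the refill rate sets the sustained throughput. This is why a token bucket tolerates peaks that a strict window would cut off.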
How to implement sliding window with Redis manually? The Redis store shown above shares counters across servers, but for a sliding window you can write custom logic on a sorted set with ZADD and ZREMRANGEBYSCORE. Simplified example (using ioredis):
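const crypto = require('node:crypto');
const WINDOW = 60; // seconds
const MAX = 10; // requests allowed per window
// Assumes redisClient is the ioredis client from the store example above.
async function checkRateLimit(clientId) {
  const now = Date.now();
  const key = `ratelimit:${clientId}`;
  const multi = redisClient.multi();
  // Drop hits older than the window
  multi.zremrangebyscore(key, 0, now - WINDOW * 1000);
  // Unique member, so hits in the same millisecond are not merged
  multi.zadd(key, now, `${now}:${crypto.randomUUID()}`);
  // Hits currently in the window, including this one
  multi.zcard(key);
  // Let keys of idle clients expire
  multi.expire(key, WINDOW);
  const results = await multi.exec(); // ioredis returns [error, result] pairs
  const count = results[2][1];
  if (count > MAX) {
    throw new Error('Rate limit exceeded');
  }
}
Action for you: If you already have an API in production, move from in-memory store to Redis sliding window. Clients using fixed window often report unexpected bursts. Test the new configuration by simulating load with Artillery.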
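Before wiring Redis in, it can help to see the same sliding window logic in plain JavaScript. The sketch below keeps timestamps in memory, so it only suits a single process. Unlike the Redis snippet above, it does not record rejected requests. `createSlidingWindow` and `slidingWindowMiddleware` are hypothetical names for this example.

```javascript
// In-memory sliding window log: one array of timestamps per client.
// Suitable for a single process only; all names here are illustrative.
function createSlidingWindow({ windowMs, max, now = () => Date.now() }) {
  const hits = new Map(); // clientId -> ascending timestamps

  return function check(clientId) {
    const t = now();
    // Keep only the hits that are still inside the window.
    const recent = (hits.get(clientId) || []).filter((ts) => ts > t - windowMs);
    const allowed = recent.length < max;
    if (allowed) recent.push(t); // only accepted requests consume quota
    hits.set(clientId, recent);
    // Time until the oldest recorded hit leaves the window.
    const retryAfterMs = allowed ? 0 : recent[0] + windowMs - t;
    return { allowed, remaining: max - recent.length, retryAfterMs };
  };
}

// Express-style middleware built on the checker above.
function slidingWindowMiddleware(options) {
  const check = createSlidingWindow(options);
  return (req, res, next) => {
    const { allowed, retryAfterMs } = check(req.ip);
    if (allowed) return next();
    res.set('Retry-After', String(Math.ceil(retryAfterMs / 1000)));
    res.status(429).json({ error: 'Too many requests. Please try again later.' });
  };
}

// Usage (assumes an Express app):
// app.use('/api/', slidingWindowMiddleware({ windowMs: 60 * 1000, max: 10 }));
```

Because the window moves with each request, a client cannot double its quota by straddling a window boundary, which is the main weakness of a fixed window.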
How to protect APIs with NGINX or an API Gateway?
You don't always want to modify your application code. At the reverse proxy or API Gateway level, you can implement rate limiting transparently. NGINX, which many already use, offers the limit_req module. Configuration example:
http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            proxy_pass http://backend;
        }
    }
}
Here we allow 10 requests per second per IP, with a burst of 20. Excess requests get 503 (by default). Advantages: no backend code changes, centralized configuration. Disadvantage: less flexibility than application logic (e.g., different limits for authenticated vs anonymous users).
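To get a feel for how rate and burst interact, the sketch below models the accounting in JavaScript. It is a deliberately simplified leaky-bucket model written for this article, not NGINX's actual implementation. `createLeakyLimiter` and its parameters are hypothetical names.

```javascript
// Simplified leaky-bucket model of a "rate + burst, no delay" limiter.
// Each key tracks an `excess` counter that drains at `ratePerSec`.
function createLeakyLimiter({ ratePerSec, burst, now = () => Date.now() }) {
  const state = new Map(); // key -> { excess, last }

  return function allow(key) {
    const t = now();
    const entry = state.get(key);
    if (!entry) {
      // The first request from a key is always accepted.
      state.set(key, { excess: 0, last: t });
      return true;
    }
    // Drain for the elapsed time, then account for this request.
    const drained = ((t - entry.last) / 1000) * ratePerSec;
    const excess = Math.max(0, entry.excess - drained + 1);
    if (excess > burst) return false; // rejected requests leave the state untouched
    entry.excess = excess;
    entry.last = t;
    return true;
  };
}

// Usage: same numbers as the configuration above.
// const allow = createLeakyLimiter({ ratePerSec: 10, burst: 20 });
// allow('203.0.113.7'); // true until the burst allowance is used up
```

In this model, a client that sends everything at once gets the first request plus 20 burst requests through. After that, it regains capacity at 10 requests per second.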
More structured alternatives are API Gateways like Kong or AWS API Gateway. They're useful for complex architectures, but watch out for vendor lock-in. We prefer solutions that keep control with the client – a well-configured NGINX with Lua scripting can do everything an enterprise gateway does.
Action for you: If you already have an NGINX proxy, add the limit_req directive to your API block. Then test with wrk or siege. Example with wrk:
wrk -t2 -c20 -d10s http://localhost/api/
Observe how many 503s you get (wrk reports them as "Non-2xx or 3xx responses"). Tweak the parameters until you find the sweet spot.
How to monitor and test rate limiting in production?
Rate limiting should never be left unmonitored. Three things to do immediately:
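- Add response headers. express-rate-limit adds them automatically if standardHeaders: true. You'll get RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset, plus Retry-After on 429s (the X-RateLimit-* variants are the legacy headers, controlled by legacyHeaders). NGINX does not add these headers itself, but limit_req_status lets you return 429 instead of 503, and rejected requests are written to the error log.
- Log every block. Example with Express: use a custom handler.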
- Set alerts. If 429s exceed a threshold, something is wrong (attack or false positive).
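As a sketch of the logging step, express-rate-limit accepts a handler option that it calls, with the arguments (req, res, next, options), when a client exceeds the limit. The factory below builds such a handler. `createBlockLogger`, the log field names, and the injected `logger` are assumptions made for this example.

```javascript
// Handler with the (req, res, next, options) signature that express-rate-limit
// passes to its `handler` option. `logger` is a hypothetical injected logger.
function createBlockLogger(logger = console) {
  return function onLimitReached(req, res, next, options) {
    logger.warn(JSON.stringify({
      event: 'rate_limit_block',
      ip: req.ip,
      method: req.method,
      path: req.originalUrl,
      at: new Date().toISOString(),
    }));
    // Mirror the default behaviour: reply with the configured status and message.
    res.status(options.statusCode).send(options.message);
  };
}

// Wiring it in (assumes express-rate-limit is installed):
// const limiter = rateLimit({
//   windowMs: 15 * 60 * 1000,
//   max: 100,
//   handler: createBlockLogger(),
// });
```

Emitting one structured line per block makes it straightforward to count 429s per IP or per route later on.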
Test the system with a load script. Simple command to see behavior locally:
for i in {1..120}; do curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/api/; done | sort | uniq -c
You should see a mix of 200 and 429. If you get only 200, rate limiting isn't working.
Action for you: Set an alert in Grafana or via webhook when the percentage of 429 exceeds 5% of total requests per minute. If you have an external API, use logs to analyze blocked IPs.
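The alert rule itself is a simple ratio. A minimal sketch is shown below, assuming you can collect the status codes served in the last minute. `shouldAlert` is a hypothetical helper, and its 5% default mirrors the threshold suggested above.

```javascript
// Decide whether the share of 429 responses in a batch of status codes
// exceeds a threshold (5% by default).
function shouldAlert(statusCodes, threshold = 0.05) {
  if (statusCodes.length === 0) return false; // no traffic, nothing to alert on
  const blocked = statusCodes.filter((code) => code === 429).length;
  return blocked / statusCodes.length > threshold;
}

// Usage: pass the status codes observed in the last minute.
// if (shouldAlert(lastMinuteStatusCodes)) { /* call your webhook */ }
```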
What to do now
Here's your operational checklist to secure your API:
- Identify critical routes (login, registration, search, checkout).
- Choose your strategy – fixed for simplicity, sliding window for production.
- Implement with express-rate-limit (Node.js) or limit_req (NGINX).
- Configure headers and logging for debugging and monitoring.
- Test with load tools (wrk, artillery, autocannon).
- Monitor in production – alert on too many 429s.
If you don't have time or skills to implement it, reach out. At Meteora Web we do this for our clients every week, integrating it into custom architectures. Rate limiting is not a luxury – it's a deadbolt for your API. Don't wait for someone to break it.