Functional Requirements

What is a Rate Limiter?#

A rate limiter is an infrastructure component — not a product, not a user-facing feature. Its job is to sit in the request path and make a single decision on every incoming request: allow it through, or reject it.

Most case studies ask you to design a product. Rate limiter is different. You're designing a piece of machinery that every other system in your architecture depends on. The API gateway calls it. Individual services call it. It has no UI, no users in the traditional sense — just callers who need a fast, consistent allow/block answer.

The why behind it is protecting the system from being overwhelmed — whether by a single misbehaving user hammering an endpoint, a traffic spike that blows past capacity, or a DDoS attempting to take the service down. But the what is simpler: evaluate every request, decide allow or block, tell the caller the result.

Scope decisions made during requirements#

The core action: evaluate every incoming request

Before talking about dimensions or algorithms, the fundamental FR is this: for every request that arrives, the rate limiter must make a binary decision — allow or reject. Everything else (what you're counting, over what time window, with what algorithm) is implementation detail that comes later.

Multiple limiting dimensions

This is where most candidates only say "per user" and stop. A production rate limiter needs to support several dimensions, because different problems require different granularity:

Per user (by user ID or IP) — the most obvious one. One user is hammering your API. You want to limit that user without affecting anyone else. User ID is preferred when the user is authenticated. IP address is the fallback for unauthenticated requests, though IP-based limiting has problems with shared IPs (corporate NAT, university networks where hundreds of users share one IP).
Per endpoint — /login and /search have completely different risk profiles. Login should be heavily limited (5 attempts per minute) to prevent brute-force attacks. Search can be more generous (100 per minute). A global limit can't capture this — you need per-endpoint rules.
Per user per endpoint — the combination. User A can hit /search 100 times per minute, but User A can only hit /login 5 times per minute. This is the most granular and most useful dimension for protecting specific sensitive endpoints per user.
Global (all users, all endpoints) — a ceiling for total system throughput. If your backend can handle 50,000 req/sec, you want a global limit that kicks in before the system falls over, regardless of which users or endpoints are generating the load.

Retry-after: blocking must include when to retry

When a request is rejected, the caller needs to know when they can try again. Without this, clients either give up immediately or retry in a tight loop — both bad outcomes. The rate limiter must return the time-until-retry alongside the rejection decision. This becomes a header in the API response (Retry-After), but the behavior itself is a functional requirement — the system promises to communicate recovery time, not just block silently.

Rules are configurable from outside

The rate limiter can't have hardcoded limits — that would require a deployment every time you want to change a threshold. Admins need to be able to configure rules: "for endpoint /login, limit each user to 5 requests per minute." Different endpoints get different rules, different user tiers might get different limits (free vs paid users), and these need to change at runtime without restarting the service.

Where it lives: outside the stateless app servers

Rate limiting state (how many requests has this user made in the last minute?) cannot live inside individual app servers. App servers are stateless and horizontally scaled — if User A hits Server 1 for 3 requests and Server 2 for 3 requests, neither server alone sees a limit violation. The state must be centralized, outside the app servers, so every server checks the same counter. This points directly toward a standalone rate limiter service backed by a shared store (Redis), sitting between the API gateway and the app servers.

Final Functional Requirements#

1. Evaluate every incoming request — allow it or reject it

2. Support multiple limiting dimensions:
     - Per user (by user ID or IP)
     - Per endpoint
     - Per user per endpoint
     - Global (all users, all endpoints)

3. When a request is rejected, return the time until
   the client can retry (Retry-After)

4. Rules are configurable — admins can set different limits
   per dimension
   (e.g. /login → 5 req/min per user, /search → 100 req/min per user)

Interview framing

"Core FR: evaluate every incoming request and make an allow/block decision. Support multiple limiting dimensions — per user, per endpoint, per user-per-endpoint, and global. When blocking, return retry-after so clients know when to try again. Rules must be admin-configurable at runtime — different limits for different endpoints and user tiers. The rate limiter must live outside stateless app servers because the state (request counts) has to be shared across all server instances."