
There’s a number that quietly runs most platforms: the average request rate. It’s the figure that ends up in capacity spreadsheets, the one autoscaling targets settle around, the one everyone feels comfortable with. It’s also the wrong number to design for.
The traffic that hurts you isn’t the average. It’s the launch, the promotion, the match kicking off, the moment everyone arrives at once. Those are the minutes your business actually depends on — and they look nothing like a Tuesday afternoon.
If your system only works at the average, it doesn’t work. It just hasn’t been tested yet.
Start from the peak, work backwards
We size systems from the worst plausible minute, not the typical one. That means answering a few uncomfortable questions early:
- What’s the largest concurrent audience we can realistically expect in the next 12 months?
- How fast does it arrive — a smooth ramp, or a wall of traffic in 30 seconds?
- Which single component sees the sharpest multiplier when that happens?
The last question is the most important one. Load is rarely uniform. A 5x jump in active users can be a 20x jump on the one endpoint that fans out to everyone, and a 50x jump on the database row everybody reads. The spike concentrates. Capacity planning that treats the system as one big average misses exactly where it will break.
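To make the concentration effect concrete, here is a minimal sketch that applies the illustrative 5x, 20x, and 50x figures above to a set of components. The component names and baseline rates are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical baseline request rates (requests/second) on an ordinary day.
BASELINE_RPS = {
    "web_frontend": 2_000,
    "fanout_endpoint": 300,
    "hot_db_row": 50,
}

# Illustrative per-component multipliers for a 5x jump in active users.
SPIKE_MULTIPLIER = {
    "web_frontend": 5,
    "fanout_endpoint": 20,
    "hot_db_row": 50,
}


def peak_rps(baseline, multipliers):
    """Project the peak request rate for each component."""
    return {name: rps * multipliers[name] for name, rps in baseline.items()}


def sharpest_component(multipliers):
    """Return the component whose load grows the most during the spike."""
    return max(multipliers, key=multipliers.get)


projected = peak_rps(BASELINE_RPS, SPIKE_MULTIPLIER)
hotspot = sharpest_component(SPIKE_MULTIPLIER)
```

In this made-up example the hot database row still has the smallest absolute rate, but it grows ten times faster than the front end, so it is the component to size first.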
Make headroom a number, not a feeling
“We have plenty of headroom” is not a plan. We turn it into a measured budget:
- Load test to failure, not to target. You don’t know your ceiling until you’ve hit it on purpose, in a controlled environment, before your users find it for you.
- Express capacity as a multiple of current peak — “we hold 10x today’s busiest minute” — so it stays meaningful as traffic grows.
- Re-measure after every significant change. Headroom decays. A new feature, a heavier query, an extra synchronous call, and yesterday’s 10x is today’s 4x.
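The budget above reduces to one division: the ceiling measured in a load test over the busiest minute observed in production. A minimal sketch follows; the function names and all request rates are hypothetical, and the 10x and 4x results simply mirror the example figures in the list.

```python
def headroom_multiple(ceiling_rps: float, current_peak_rps: float) -> float:
    """Express capacity as a multiple of today's busiest minute."""
    if current_peak_rps <= 0:
        raise ValueError("current_peak_rps must be positive")
    return ceiling_rps / current_peak_rps


def meets_budget(multiple: float, required_multiple: float) -> bool:
    """Check a measured multiple against the agreed headroom budget."""
    return multiple >= required_multiple


# Hypothetical measurements: the load test broke at 50,000 rps while
# production's busiest minute was 5,000 rps.
before_change = headroom_multiple(ceiling_rps=50_000, current_peak_rps=5_000)

# After a release adds a heavier query, the same test breaks at 20,000 rps.
after_change = headroom_multiple(ceiling_rps=20_000, current_peak_rps=5_000)

# Assumed budget for illustration: hold at least 8x current peak.
still_ok = meets_budget(after_change, required_multiple=8.0)
```

Running a check like this after each significant change is one way to notice decay before the next big day does.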
Absorb, shed, degrade — in that order
When the spike exceeds even your planned headroom (and one day it will), the system should fail in a designed sequence rather than fall over:
- Absorb with elasticity — stateless services and horizontal autoscaling that add capacity faster than traffic climbs.
- Shed the non-essential — rate limit, queue, and drop low-value work to protect the critical path.
- Degrade gracefully — serve a slightly staler cache, defer a non-urgent write, show a lighter page. A degraded experience beats an outage every time.
The platforms that survive their big day aren’t the ones that never hit a limit. They’re the ones that hit it gently.
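The absorb, shed, degrade sequence can be sketched as a single admission decision per request. This is an illustration under stated assumptions, not a production policy: the `Action` enum, the `decide` function, and both thresholds are hypothetical, and `utilization` stands in for whatever saturation signal you trust (CPU, queue depth, connection pool usage).

```python
from enum import Enum


class Action(Enum):
    SERVE = "serve"      # normal handling; elasticity absorbs the load
    SHED = "shed"        # reject, rate limit, or queue low-value work
    DEGRADE = "degrade"  # serve critical work in a lighter form


def decide(utilization: float, critical: bool,
           shed_at: float = 0.80, degrade_at: float = 0.95) -> Action:
    """Pick an action for one request given current utilization (0.0 to 1.0).

    The thresholds are assumed values for illustration only.
    """
    if utilization < shed_at:
        # Stage 1: absorb. Everything is served normally.
        return Action.SERVE
    if not critical:
        # Stage 2: shed. Non-essential work is dropped first.
        return Action.SHED
    if utilization < degrade_at:
        # Critical work is still served in full while shedding buys room.
        return Action.SERVE
    # Stage 3: degrade. Critical work gets a staler cache or a lighter page.
    return Action.DEGRADE
```

The ordering is the point of the sketch: non-critical work is given up before the critical path is touched, and the critical path degrades only after shedding has stopped being enough.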
Rehearse before it’s real
The last piece is cultural. We run game days — deliberate, scheduled exercises where we drive synthetic load at production-like systems and watch what bends. The goal isn’t to pass. It’s to find the surprise while it’s cheap: the connection pool that saturates, the retry storm that amplifies a blip into an outage, the dashboard that goes blind at exactly the wrong moment.
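The retry storm is worth a quick calculation, because the amplification is easy to underestimate. The sketch below assumes a simplified worst case that is not taken from any particular system: a chain of service layers, each of which retries a failed downstream call a fixed number of times, with every attempt failing. The function name and parameters are hypothetical.

```python
def worst_case_attempts(retries_per_layer: int, layers: int) -> int:
    """Calls reaching the bottom layer for one user request.

    Assumes every attempt fails and each layer retries independently,
    so each layer multiplies the traffic by (1 + retries_per_layer).
    """
    if retries_per_layer < 0 or layers < 0:
        raise ValueError("retries_per_layer and layers must be non-negative")
    return (1 + retries_per_layer) ** layers


# Three layers, each retrying three times: one request becomes 64 calls.
storm = worst_case_attempts(retries_per_layer=3, layers=3)

# No retries anywhere: one request stays one call.
calm = worst_case_attempts(retries_per_layer=0, layers=3)
```

Under these assumptions a brief failure at the bottom of the stack multiplies its own load, which is the kind of surprise a game day is meant to surface while it is cheap.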
By the time the real spike arrives, it should feel familiar. That’s the whole point of engineering for the moment that breaks other systems: when it comes, it’s just another rehearsal you’ve already run.
