Watching a race condition in slow motion

Two people open the seat map at the same instant. Both see seat 12 as free. Both pay. Both get a confirmation email. One seat, two owners, and nowhere in the logs is there a single error to find. That collision is the entire reason this project exists: to take an invisible distributed-systems bug and make it something you can watch happen.

01 The silent oversell

Most bugs announce themselves. A null dereferences, a request 500s, a queue backs up. Something, somewhere, goes red. The read-then-write race does none of that. Under a single Redis key tracking a seat, the check (is it free?) and the write (take it) are two separate round-trips, and in the gap between them another buyer can slip in. Both buyers passed the check honestly. Both writes succeeded honestly. The system did exactly what the code told it to, and the result is wrong anyway.

It stays hidden because a normal test never reproduces it. You click the seat, it works. You write an integration test, it passes. The bug only surfaces when thousands of requests arrive in the same few milliseconds, all reaching for the same few hundred seats, and by then your customers are the ones discovering it, at the turnstile. The sandbox manufactures exactly that moment on demand.

01 The sandbox doesn't ramp up to load. It arrives at it. Every virtual buyer is released in the same instant, all racing for the same small block of seats. Contention is the experiment, not an accident.

02 A read-then-write race

Here is the whole bug in four steps. Buyer A reads seat 12 and sees free. Buyer B reads seat 12 and also sees free, because A hasn't written anything yet. A writes sold. B writes sold. Nothing in that sequence is illegal in isolation; the failure is in the interleaving. The window between A's read and A's write is unguarded, and B walked straight through it.

02 The interleaving that sells one seat twice. Both buyers read free before either has written, so both believe they won. Nothing locked the seat between the read and the write.

In the sandbox this is the naive strategy, and it's deliberately about twenty lines. It is also the only one of the four that is wrong, which is exactly why it ships in the rig: you cannot trust the fixes until you have watched the bug they fix.

strategies/naive.ts

// Naive claim: a check and a write, with a gap in between.
const taken = await conn.get(`seat:${id}`);   // round-trip #1
if (taken) return "rejected";
await conn.set(`seat:${id}`, buyerId);          // round-trip #2
return "claimed";   // two buyers can both reach this line for one seat
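
To watch that gap open without a Redis server, here is a minimal sketch that runs the same two-step claim against an in-memory stand-in. `FakeRedis`, `claimNaive` and `raceTwoBuyers` are names invented for illustration, not the rig's code, and the one assumption is that every call yields once, the way a network round-trip would.

```typescript
// In-memory stand-in for Redis (hypothetical helper, not a real client).
// Each call yields to the event loop once, like a network round-trip.
class FakeRedis {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    await Promise.resolve(); // simulated round-trip
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    await Promise.resolve(); // simulated round-trip
    this.data.set(key, value);
  }
}

type Outcome = "claimed" | "rejected";

// The naive strategy: check, then write, with a gap in between.
async function claimNaive(conn: FakeRedis, id: number, buyerId: string): Promise<Outcome> {
  const taken = await conn.get(`seat:${id}`); // round-trip #1
  if (taken) return "rejected";
  await conn.set(`seat:${id}`, buyerId); // round-trip #2
  return "claimed";
}

// Release two buyers at the same seat in the same instant.
async function raceTwoBuyers(): Promise<Outcome[]> {
  const conn = new FakeRedis();
  return Promise.all([claimNaive(conn, 12, "A"), claimNaive(conn, 12, "B")]);
}
```

Both reads complete before either write starts, so awaiting `raceTwoBuyers()` hands back two "claimed" outcomes for one seat, and nothing throws along the way.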

A check and a write that aren't the same instruction aren't a check: they're a polite suggestion the next request is free to ignore.
Operating principle, the seat-inventory core

03 Measure it, don't assume

The trap with concurrency bugs is reasoning about them. It is easy to argue that a strategy is correct, ship it, and never find out it isn't, because the race is rare and the failure is silent. So the sandbox refuses to take any strategy's word for it. Every claim, from every worker, increments an atomic per-seat counter held centrally. After the herd drains, any seat whose counter went above one is an oversell, counted exactly. Correctness is measured, not asserted.

That single decision is what turns the project from a demo into an instrument. The naive strategy doesn't “probably” double-book; it double-books a specific, reproducible number of seats on every run. And the other three don't “seem” safe: they post a hard zero, run after run, and you can see why.
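
The counting rule is small enough to sketch. This version tallies in memory from a list of successful claims; the `Claim` shape and the `tallyOversells` name are assumptions for illustration. In the rig the counter is atomic and held centrally, which I would expect to be a server-side increment such as Redis `INCR`, but that detail is also an assumption here.

```typescript
// Hypothetical shape: one entry per successful claim reported by a worker.
type Claim = { seat: number; buyerId: string };

// Count successful claims per seat, then list every seat claimed more than once.
function tallyOversells(claims: Claim[]): { perSeat: Map<number, number>; oversold: number[] } {
  const perSeat = new Map<number, number>();
  for (const claim of claims) {
    perSeat.set(claim.seat, (perSeat.get(claim.seat) ?? 0) + 1);
  }

  const oversold: number[] = [];
  perSeat.forEach((count, seat) => {
    if (count > 1) oversold.push(seat); // a counter above one is an oversell
  });
  oversold.sort((a, b) => a - b);

  return { perSeat, oversold };
}
```

The point of the design is that the tally sits outside every strategy: a strategy can believe whatever it likes about its own correctness, and the counter still reports what actually happened.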

04 Four ways to claim a seat

The rig pits four claim strategies against the identical herd. They differ only in which Redis primitive guards the seat, and the differences are stark:

Naive, GET → SET. The check and the write are separate round-trips; everyone who reads before the first write slips through. Oversells, by construction.

Optimistic, WATCH → MULTI/EXEC. Commit inside a transaction that aborts if the watched seat changed under you. The first writer wins; the losers get a null EXEC and retry. Correct, at the cost of retries when contention is high.

Pessimistic, SET NX PX → DEL. Take a per-seat distributed mutex, do the work, release it. Everyone else queues on the lock. Correct, but the queue shows up as lock-wait latency.

Atomic, EVAL (Lua). Run the check and the claim as one indivisible script on the server. There is no window between read and write because there is no read-then-write. It's a single operation. Correct, and the fastest: one round-trip, zero retries.

03 Oversold seats in one representative herd: roughly five thousand buyers contending for 250 seats at 20× over-subscription. Naive is the lone white bar; the other three hold a hard zero. White is the emphasis colour, spent here on the strategy that fails.

The atomic version is the punchline of the whole sandbox. It is the shortest strategy, the only one that needs neither a retry loop nor a lock, and the fastest under load, all because it stops trying to coordinate a read and a write and instead makes them one thing the database can't be interrupted in the middle of.

strategies/claim.lua

-- claim.lua: check-and-claim in one indivisible server-side script.
-- KEYS[1] = seat:{id}   ARGV[1] = buyerId        run with: EVAL
if redis.call('GET', KEYS[1]) then
  return 0            -- already sold
end
redis.call('SET', KEYS[1], ARGV[1])
return 1              -- claimed; there is no window to race into
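
On the calling side, the strategy shrinks to a single awaited call. The sketch below assumes a client that exposes `eval(script, numKeys, ...keysAndArgs)`; the `EvalClient` interface is a made-up structural type for illustration, so check your own client's real signature before borrowing it. `claimAtomic` is likewise a hypothetical name.

```typescript
// Assumed minimal shape of "a client that can run EVAL". Illustrative only:
// real clients differ in how they take keys and arguments.
interface EvalClient {
  eval(script: string, numKeys: number, ...keysAndArgs: string[]): Promise<unknown>;
}

// The same check-and-claim script as claim.lua, inlined as a string.
const CLAIM_SCRIPT = `
if redis.call('GET', KEYS[1]) then
  return 0
end
redis.call('SET', KEYS[1], ARGV[1])
return 1
`;

// One round-trip, no retry loop, no lock: the server runs check and claim together.
async function claimAtomic(conn: EvalClient, id: number, buyerId: string): Promise<"claimed" | "rejected"> {
  const result = await conn.eval(CLAIM_SCRIPT, 1, `seat:${id}`, buyerId);
  return result === 1 ? "claimed" : "rejected";
}
```

There is nothing for the caller to coordinate, which is the whole argument: the only decision left on the client is how to interpret a 1 or a 0.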

05 Two subtle ways to fail

Two of the “correct” strategies are correct only if you get a detail right that is very easy to get wrong, and both produce demos that look fine while being silently broken. The sandbox gets them right on purpose, because a teaching tool that quietly cheats teaches the wrong lesson.

WATCH is per-connection

Optimistic locking is meaningless if every buyer shares one socket. WATCH tracks a key on the connection it was issued on; pipe the whole herd through a single shared connection and the transaction can't isolate one buyer from another. So the rig runs the herd through a fixed connection pool, one connection per worker, exactly the way a real web server fronts Redis. Get this wrong and optimistic locking appears to work in a single-threaded test and falls apart in production.
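
A toy model makes the failure concrete. The sketch below is not a Redis client; `ToyStore`, `ToyConnection` and `raceOptimistic` are invented for illustration and mirror only two rules: EXEC aborts if a watched key changed, and EXEC clears that connection's watches whether it commits or not.

```typescript
// Toy, in-memory model of WATCH / MULTI / EXEC (illustrative, not a Redis client).
class ToyStore {
  private data = new Map<string, string>();
  private versions = new Map<string, number>();

  get(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  version(key: string): number {
    return this.versions.get(key) ?? 0;
  }
  set(key: string, value: string): void {
    this.data.set(key, value);
    this.versions.set(key, this.version(key) + 1); // every write bumps the version
  }
}

class ToyConnection {
  private store: ToyStore;
  private watched = new Map<string, number>(); // watch state belongs to the connection

  constructor(store: ToyStore) {
    this.store = store;
  }
  watch(key: string): void {
    this.watched.set(key, this.store.version(key));
  }
  get(key: string): string | null {
    return this.store.get(key);
  }
  // MULTI + SET + EXEC folded into one step; null mirrors an aborted EXEC.
  execSet(key: string, value: string): "OK" | null {
    let dirty = false;
    this.watched.forEach((seen, k) => {
      if (this.store.version(k) !== seen) dirty = true;
    });
    this.watched.clear(); // EXEC forgets this connection's watches either way
    if (dirty) return null;
    this.store.set(key, value);
    return "OK";
  }
}

// Two buyers interleave WATCH, GET, EXEC on seat 12. Pass two connections to model
// one connection per worker, or the same connection twice to model a shared socket.
function raceOptimistic(connA: ToyConnection, connB: ToyConnection): Array<"OK" | null> {
  const key = "seat:12";
  connA.watch(key);
  connB.watch(key);
  const bothSeeFree = connA.get(key) === null && connB.get(key) === null;
  if (!bothSeeFree) return [null, null];
  return [connA.execSet(key, "A"), connB.execSet(key, "B")];
}
```

With two connections the second buyer gets a null and has to retry. With one shared connection the first EXEC wipes the watch the second buyer was relying on, so both commits go through and the seat is sold twice.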

A lock TTL shorter than the work is a bug

The pessimistic lock needs a TTL so a crashed worker can't wedge a seat forever. But if that TTL expires while the holder is still inside the critical section, a second worker acquires the “free” lock and you oversell anyway, now with a lock in place, which is worse, because you'll trust it. The TTL is sized to outlive the hold, and releasing it is a compare-and-delete Lua script so a worker can never delete a lock that has already rolled over to someone else.

strategies/release.lua

-- release.lua: drop the pessimistic lock, but only if it is still ours.
-- An unconditional DEL could free a lock a second worker already holds.
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
end
return 0
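
The rollover is easier to believe once you have stepped through it. This is a toy lock table with a manual clock, standing in for SET NX PX and for the two ways of releasing; `ToyLocks`, `demoRollover` and the `lock:seat:12` key are hypothetical names, and the 50 ms TTL is chosen only to make the overrun obvious.

```typescript
// Toy lock table with a fake clock (illustrative only, not a Redis client).
class ToyLocks {
  private locks = new Map<string, { token: string; expiresAt: number }>();
  now = 0; // fake clock, in milliseconds

  // Mirrors SET key token NX PX ttl: succeeds only if no live lock exists.
  acquire(key: string, token: string, ttlMs: number): boolean {
    const held = this.locks.get(key);
    if (held && held.expiresAt > this.now) return false;
    this.locks.set(key, { token, expiresAt: this.now + ttlMs });
    return true;
  }

  // Unconditional DEL: frees the lock no matter who holds it now.
  releaseUnsafe(key: string): void {
    this.locks.delete(key);
  }

  // Compare-and-delete: frees the lock only if it is still ours.
  releaseIfOwner(key: string, token: string): boolean {
    const held = this.locks.get(key);
    if (!held || held.expiresAt <= this.now || held.token !== token) return false;
    this.locks.delete(key);
    return true;
  }

  // Token of the current live holder, or null if the lock is free.
  holder(key: string): string | null {
    const held = this.locks.get(key);
    return held && held.expiresAt > this.now ? held.token : null;
  }
}

// Worker A overruns its TTL, worker B takes the lock, then A finishes and releases.
function demoRollover(safeRelease: boolean): string | null {
  const locks = new ToyLocks();
  const key = "lock:seat:12";
  locks.acquire(key, "worker-A", 50);
  locks.now = 60; // A is still working, but its 50 ms TTL has expired
  locks.acquire(key, "worker-B", 50); // B acquires the expired lock
  if (safeRelease) locks.releaseIfOwner(key, "worker-A");
  else locks.releaseUnsafe(key);
  return locks.holder(key); // who holds the lock after A "releases"
}
```

With the compare-and-delete release, B still holds the lock after A finishes. With the unconditional delete, the lock is free while B is mid-claim, and a third worker can walk in. Note that the safe release only stops A from making things worse; A and B were both inside the critical section the moment the TTL expired, which is why the TTL has to outlive the hold in the first place.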

06 The thundering herd

The engine lives in one file, lib/loadtest.ts. Its job is to generate a worst-case-shaped load and run it through the same machinery a real server would use, so the contention is honest. A single run walks a deliberately short path:

Shape the demand

Each virtual buyer is assigned a target seat under one of three demand models: uniform, hotspot, or Zipf. Skew is the point: real sales pile onto the good seats, and that's where races concentrate.

Hand out a connection

Every worker draws its own connection from a fixed pool, never a shared socket, so per-connection primitives like WATCH behave exactly as they would under a real web server.

Fire the claim

The worker runs the chosen strategy against its seat, retrying only where that strategy demands it. The whole herd is released at once.

Record the attempt

Each try appends a compact replay event, [t, seat, outcome, retries], to an in-memory log. That log is what makes the slow-motion replay possible later.

Tally centrally

An atomic per-seat counter is the source of truth for oversell; latency percentiles, a histogram, lock-hold times and throughput aggregate alongside it.
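
The first four steps can be sketched in a few dozen lines. This is not lib/loadtest.ts: `zipfSampler`, `runHerd` and the `ClaimResult` shape are assumptions made for illustration, the Zipf exponent `s` is a free parameter, and the strategy is passed in as a plain function so the sketch stays independent of any Redis client.

```typescript
type Outcome = "claimed" | "rejected";
type ClaimResult = { outcome: Outcome; retries: number };
type ReplayEvent = [number, number, Outcome, number]; // [t, seat, outcome, retries]

// Zipf demand: seat index k (0-based, best seat first) is picked with
// probability proportional to 1 / (k + 1)^s.
function zipfSampler(seats: number, s: number, rand: () => number = Math.random): () => number {
  const cumulative: number[] = [];
  let total = 0;
  for (let k = 0; k < seats; k++) {
    total += 1 / Math.pow(k + 1, s);
    cumulative.push(total);
  }
  return () => {
    const u = rand() * total;
    // Binary search for the first index whose cumulative weight exceeds u.
    let lo = 0;
    let hi = seats - 1;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (cumulative[mid] > u) hi = mid;
      else lo = mid + 1;
    }
    return lo;
  };
}

// Shape the demand, release every buyer at once, and log one replay event per attempt.
async function runHerd(
  buyers: number,
  pickSeat: () => number,
  claim: (seat: number, buyerId: string) => Promise<ClaimResult>,
): Promise<ReplayEvent[]> {
  const targets: number[] = [];
  for (let i = 0; i < buyers; i++) targets.push(pickSeat());

  const log: ReplayEvent[] = [];
  const start = Date.now();
  await Promise.all(
    // map() starts every claim before any of them is awaited: no ramp-up.
    targets.map(async (seat, i) => {
      const { outcome, retries } = await claim(seat, `buyer-${i}`);
      log.push([Date.now() - start, seat, outcome, retries]);
    }),
  );
  return log;
}
```

Connection handling is left out on purpose: in this sketch the `claim` function would own its connection, whereas the rig draws one per worker from a fixed pool.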

Because the rig measures the correct strategies side by side, it also prices them. Atomic stays flat and lowest: one round-trip, no retries, no queue. Pessimistic climbs as buyers pile up on each seat's lock and wait their turn. Optimistic sits between the two, paying in retries that grow with contention rather than in lock-wait.

Legend: atomic (EVAL), optimistic (WATCH), pessimistic (lock)

04 p99 claim latency as the herd grows, across the three correct strategies. Atomic (white) is both correct and the flattest: being right and being fast turn out to be the same decision.
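
For reference, a p99 can be computed from raw latency samples like this. The write-up doesn't say which percentile definition the rig uses, so nearest-rank is an assumption here, and `percentile` is a hypothetical helper name.

```typescript
// Nearest-rank percentile: the smallest sample that at least p% of samples are <= to.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b); // copy, then numeric sort
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}
```

A tail percentile is the right lens for the pessimistic strategy in particular: the median buyer barely waits, and it is the buyers at the back of a hot seat's queue who pay.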

07 The race in slow motion

A number that says “238 oversells” is convincing. Watching the 238th happen is visceral. The replay layer takes the compact event log the engine recorded and turns it back into time. usePlayback derives a timeline from those [t, seat, outcome, retries] tuples and drives a requestAnimationFrame scrubber you can slow right down; SeatGrid is the venue, where each seat lights up as it's claimed and flashes the moment a second buyer claims one that was already sold.

Run the naive strategy at quarter-speed and you can point at the collisions as they land. Run atomic and the grid fills in cleanly, seat by seat, never once flashing. Same herd, same seats, two completely different outcomes: the difference is four lines of Lua.
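
The core of the replay is a pure function from the event log and a point in time to a grid of seat states. The sketch below is an assumption about how that derivation could look, not the actual usePlayback hook: the outcome strings, 0-based seat indices and both function names are invented for illustration.

```typescript
type Outcome = "claimed" | "rejected";
type ReplayEvent = [number, number, Outcome, number]; // [t, seat, outcome, retries]
type SeatState = "free" | "sold" | "double-booked";

// Rebuild the grid as it looked at run time `t` (ms since the herd was released).
function seatStatesAt(events: ReplayEvent[], seats: number, t: number): SeatState[] {
  const claims = new Array<number>(seats).fill(0);
  for (const [at, seat, outcome] of events) {
    if (at <= t && outcome === "claimed") claims[seat] += 1;
  }
  return claims.map((n) => (n === 0 ? "free" : n === 1 ? "sold" : "double-booked"));
}

// Convert wall-clock time spent in the scrubber to run time; speed 0.25 is quarter-speed.
function playbackTime(elapsedWallMs: number, speed: number): number {
  return elapsedWallMs * speed;
}
```

Because the state at any instant is recomputed from the log rather than mutated frame by frame, scrubbing backwards costs nothing extra: ask for an earlier `t` and the collisions un-happen.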

NAIVE · GET → SET: 4 double-booked

ATOMIC · EVAL: 0 double-booked

Legend: free, sold once, double-booked

05 The same sold-out venue under two strategies. Left: naive leaves white, double-booked seats scattered through the block. Right: atomic sells the identical seats with not one collision.

06 The sandbox in use: pick a strategy and run, or race all four against the identical herd and compare oversells, throughput and p99 side by side. First image: the sandbox dashboard mid-run, with the strategy picker, live oversell and throughput counters, and the seat grid filling in. Second image: the slow-motion scrubber, with play and reset controls, speed multipliers, and the race-condition-triggered banner.

08 What the race costs

The payoff isn't a benchmark leaderboard: it's a single, stark comparison run on the identical herd, where the only variable is the line of Redis that guards the seat:

238 seats double-booked by naive in one 5,000-buyer herd

0 oversells once the check and write are the same instruction

1 Redis round-trip, zero retries, and atomic is the fastest path

What I keep coming back to is how little the fix has to do with cleverness. There is no exotic datastore, no consensus protocol, no queue. It is one indivisible operation pushed down to the layer that can enforce it for free. The hard part was never inventing a mechanism: it was the discipline to measure the failure first, so the fix had something real to be measured against.

09 Teaching the sandbox to learn

The obvious next step is just more rows. Point the same herd at multi-seat orders, where one buyer claims four seats at once and a partial failure has to roll back cleanly, and you get the same race with several rows in flight instead of one. Harder, but the same shape. The question I actually keep poking at is stranger: what if the sandbox stopped being a fixed test bench and started to learn?

Herds that move like real crowds

Today the demand is hand-drawn: uniform, hotspot, Zipf. Those are caricatures of how people actually stampede a sale. Train a generative model on the real thing, the per-second seat heatmap from an actual on-sale, and it can produce synthetic herds that move like the genuine crowd: the front blocks going first, the slow drift to the cheap seats, the second spike when a resale drops. The rig stops guessing at a worst case and starts replaying a learned one.

The sandbox as a reinforcement-learning gym

This is the part I actually want to build. The rig already emits everything a reinforcement-learning loop needs. There is an observation (live contention per seat, queue depth, retry counts), an action space (which primitive to use, what TTL, how much retry budget to spend), and a reward it computes anyway (throughput and p99), with oversell as a hard constraint that voids the reward the instant it is broken. Wrap that as a Gymnasium environment and you can train an agent to choose a strategy per seat instead of committing to one for the whole venue: a cheap atomic claim for the cold seats, something more defensive for the three blocks everyone is fighting over. The strategy stops being a decision you make up front and becomes a policy learned against measured contention.

07 The sandbox as a gym. It already produces an observation, an action space and a reward on every run; the only missing piece is the agent in the left-hand box.
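
None of this exists yet, so the sketch below is speculative. It shows one way the reward could be scalarised, with the hard constraint written first. `RunMetrics`, `reward` and `latencyWeight` are hypothetical, and dividing throughput by latency is just one of many reasonable ways to trade the two off.

```typescript
// Hypothetical per-run metrics the rig would hand to a learning loop.
type RunMetrics = { throughput: number; p99Ms: number; oversells: number };

// Reward throughput, discount it by tail latency, and void it on any oversell.
function reward(m: RunMetrics, latencyWeight = 1): number {
  if (m.oversells > 0) return 0; // hard constraint: one oversell voids the run
  return m.throughput / (1 + latencyWeight * m.p99Ms);
}
```

Keeping the reward non-negative means zero is the floor, so no amount of throughput can ever buy back an oversold seat.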

Calling the fire before it starts

A small time-series model watching the live claim stream could flag the hotspots before they fully form, pre-warming or pre-sharding the seats about to catch fire. The same signal runs in reverse as a safety net: a model that has watched a thousand clean runs knows the shape of a healthy sale, so it can spot a strategy starting to leak and raise a flag in the replay stream before the oversell counter has even confirmed it. And because every run already produces a structured replay log, it is a short hop to handing that log to a language model and getting back a plain-English postmortem: naive oversold 238 seats; the collisions clustered in block A between twelve and forty milliseconds, right at peak arrival. The instrument writes its own incident report.

None of this loosens the one rule the whole project is built on. A learned policy earns trust for exactly the same reason the hand-written one does: at the end of the run, the same atomic counter still has to read zero.

Machine learning gets to choose the strategy. It does not get a vote on whether the seat was sold twice.
The one rule that does not get to learn

Gaurav Joshi

Software Engineer & Curious Technologist

I build scalable products from the ground up: ticketing infrastructure, government platforms, and the backend systems that hold them together under load. I write up the ones with interesting failure modes.