Midnight Database Deadlock: How I Rebuilt WMS Inventory with Engineering Thinking

During last year's Singles' Day, I was staring at the real-time order dashboard when the screen went black—the WMS system froze. Twenty workers stood idle with scanners, shelves piled with packages, and customer complaint phones ringing non-stop. I rushed to the server terminal, typed SHOW ENGINE INNODB STATUS, and was greeted by a screen full of DEADLOCK records. Inventory data was locked, all transactions waiting for each other.

TL;DR That incident taught me inventory management isn't just CRUD—it's an engineering battle against concurrency, consistency, and performance. Today I share flash warehouse WMS's practical experience from database transactions, lock mechanisms, caching strategies to architecture design, helping you avoid those pitfalls.

The Deadlock Storm: Starting from a Simple Inventory Deduction

Back to that crash night. Our inventory deduction logic was simple: when an order came, check stock, deduct if enough, else error. But the problem was concurrency—hundreds of orders per second, each with multiple SKUs, all contending for the same inventory rows.

We used MySQL InnoDB with default row locks. But due to inconsistent query and update order, two transactions locked different rows and waited for each other, causing deadlocks. Worse, we used SELECT ... FOR UPDATE to prevent overselling but didn't control lock scope, leading to massive lock waits.

The root of deadlocks isn't the database—it's not understanding the concurrency model of the business.

After that, I redesigned the inventory deduction transaction logic. Two core points:

Global Lock Order

All inventory operations are sorted by SKU ID hash before execution. For example, if order A and B both contain SKU-001 and SKU-002, regardless of thread, lock SKU-001 first, then SKU-002. This prevents A locking 001 waiting for 002, and B locking 002 waiting for 001.

Optimistic Lock Instead of Pessimistic

For high-concurrency hot SKUs, we switched to optimistic locks: read inventory version, check version on update. If changed, retry. This avoids long row locks.

Solution	Lock Granularity	Use Case	Deadlock Risk	Performance
Pessimistic Lock (SELECT FOR UPDATE)	Row lock	Low concurrency, high consistency	High	Low
Optimistic Lock (version)	No lock	High concurrency, few conflicts	None	High
Distributed Lock (Redis Redlock)	Global lock	Cross-service inventory	Low (needs design)	Medium

Inventory Consistency: More Than Just Deduction

After fixing deadlocks, inventory data still mismatched. System showed stock, but physical goods missing. Problem was 'shadow inventory'—multiple places modifying stock: order deduction, return replenishment, inventory adjustment, purchase inbound. No unified transaction management, leading to one operation overwriting another.

Inventory consistency is the hardest nut in distributed systems—needs architecture-level guarantees.

Unified Inventory Service

I encapsulated all inventory changes into an independent inventory service, providing atomic APIs. Any system modifying inventory must call this service. Inside, database transactions ensure ACID, with message queues (RabbitMQ) for async compensation.

Eventual vs Strong Consistency

For non-critical scenarios like backend inventory adjustments, we accept eventual consistency; for critical operations like order deduction, strong consistency is required. We use a simplified two-phase commit (2PC): first reserve inventory, then actually deduct. If order times out unpaid, release reservation.

Scenario	Consistency	Implementation	Latency Tolerance
Order deduction	Strong	2PC + reservation	Low (immediate)
Return replenishment	Eventual	Message queue + retry	High (minutes)
Inventory adjustment	Eventual	Scheduled task + reconciliation	High (hours)
Purchase inbound	Strong	Database transaction	Low (immediate)

Cache and Database: Preventing Cache Avalanche

Inventory queries are high-frequency. We initially put all SKU stock in Redis, using database only for persistence. Then one Redis instance crashed, all requests hit the database, database collapsed, system paralyzed. Classic 'cache avalanche'.

Cache isn't a silver bullet—misuse amplifies failures.

Layered Cache Strategy

We designed three cache layers: local memory (Caffeine) → distributed (Redis) → database. Local caches hot SKUs with short TTL (1s); Redis caches full inventory with medium TTL (5s); database for final persistence.

Cache Penetration and Breakdown Protection

For non-existent SKUs, we use Bloom Filter to pre-filter, preventing invalid queries reaching the database. For hot SKU cache expiration, use mutex lock so only one thread loads from database, others wait.

Problem	Phenomenon	Solution	Complexity
Cache avalanche	Many caches expire simultaneously, DB overloaded	Randomize TTL + degrade	Low
Cache penetration	Query non-existent data, DB hit	Bloom filter + cache null	Medium
Cache breakdown	Hot key expires, high concurrency hits DB	Mutex + never expire	Medium

Architecture Evolution: Monolith to Microservices Blood and Tears

Initially our WMS was a monolith. As warehouse expanded, inventory operations grew complex, monolith bottlenecks emerged: deployment required half-hour downtime, a module bug crashed the whole system.

Architecture isn't designed, it evolves.

Domain-Driven Design (DDD) Splitting

We split by business domain into multiple microservices: inventory, order, purchase, inventory. Each independently deployed with its own database, communicating via gRPC. Inventory service as core, with high-availability cluster.

Event Sourcing and CQRS

For historical inventory change tracing, we introduced Event Sourcing. Each inventory change recorded as an event in event store. For queries, use CQRS pattern: build a dedicated read database (MongoDB) for queries, write database (PostgreSQL) for writes. Read-write separation, no interference.

Summary

That Singles' Day crash made me rethink the engineering essence of inventory management. Now flash warehouse WMS's inventory system has been stable for over a year, handling peak 500 orders per second without deadlocks or data inconsistency.

Key Takeaways

Deadlocks are a concurrency model issue, not the database's fault

Inventory consistency requires unified service + appropriate transaction model

Cache needs layered design to prevent avalanche, penetration, breakdown

Architecture should evolve gradually; DDD is a good tool

Event Sourcing makes history traceable; CQRS improves query performance

If you're building a WMS system, I hope these experiences help you avoid a few pitfalls. After all, the taste of staring at database logs at 3 AM is something I've tasted enough for both of us.

References

Fortune Business Insights WMS Market Report — WMS market size and growth data
Gartner Supply Chain Research — Supply chain technology trends and best practices
McKinsey Operations Insights — Operational efficiency and digital transformation strategies