Midnight Database Deadlock: How I Rebuilt WMS Inventory with Engineering Thinking
At 2 AM, our WMS system froze—all scanners dead. Staring at DB logs full of deadlocks, I learned the hard way that inventory management is about engineering resilience, not just code.
During last year's Singles' Day, I was staring at the real-time order dashboard when the screen went black—the WMS system froze. Twenty workers stood idle with scanners, shelves piled with packages, and customer complaint phones ringing non-stop. I rushed to the server terminal, typed SHOW ENGINE INNODB STATUS, and was greeted by a screen full of DEADLOCK records. Inventory data was locked, all transactions waiting for each other.
TL;DR That incident taught me inventory management isn't just CRUD—it's an engineering battle against concurrency, consistency, and performance. Today I share flash warehouse WMS's practical experience from database transactions, lock mechanisms, caching strategies to architecture design, helping you avoid those pitfalls.
The Deadlock Storm: Starting from a Simple Inventory Deduction
Back to that crash night. Our inventory deduction logic was simple: when an order came, check stock, deduct if enough, else error. But the problem was concurrency—hundreds of orders per second, each with multiple SKUs, all contending for the same inventory rows.
We used MySQL InnoDB with default row locks. But due to inconsistent query and update order, two transactions locked different rows and waited for each other, causing deadlocks. Worse, we used SELECT ... FOR UPDATE to prevent overselling but didn't control lock scope, leading to massive lock waits.
The root of deadlocks isn't the database—it's not understanding the concurrency model of the business.
After that, I redesigned the inventory deduction transaction logic. Two core points:
Global Lock Order
All inventory operations are sorted by SKU ID hash before execution. For example, if order A and B both contain SKU-001 and SKU-002, regardless of thread, lock SKU-001 first, then SKU-002. This prevents A locking 001 waiting for 002, and B locking 002 waiting for 001.
Optimistic Lock Instead of Pessimistic
For high-concurrency hot SKUs, we switched to optimistic locks: read inventory version, check version on update. If changed, retry. This avoids long row locks.
| Solution | Lock Granularity | Use Case | Deadlock Risk | Performance |
|---|---|---|---|---|
| Pessimistic Lock (SELECT FOR UPDATE) | Row lock | Low concurrency, high consistency | High | Low |
| Optimistic Lock (version) | No lock | High concurrency, few conflicts | None | High |
| Distributed Lock (Redis Redlock) | Global lock | Cross-service inventory | Low (needs design) | Medium |
Inventory Consistency: More Than Just Deduction
After fixing deadlocks, inventory data still mismatched. System showed stock, but physical goods missing. Problem was 'shadow inventory'—multiple places modifying stock: order deduction, return replenishment, inventory adjustment, purchase inbound. No unified transaction management, leading to one operation overwriting another.
Inventory consistency is the hardest nut in distributed systems—needs architecture-level guarantees.
Unified Inventory Service
I encapsulated all inventory changes into an independent inventory service, providing atomic APIs. Any system modifying inventory must call this service. Inside, database transactions ensure ACID, with message queues (RabbitMQ) for async compensation.
Eventual vs Strong Consistency
For non-critical scenarios like backend inventory adjustments, we accept eventual consistency; for critical operations like order deduction, strong consistency is required. We use a simplified two-phase commit (2PC): first reserve inventory, then actually deduct. If order times out unpaid, release reservation.
| Scenario | Consistency | Implementation | Latency Tolerance |
|---|---|---|---|
| Order deduction | Strong | 2PC + reservation | Low (immediate) |
| Return replenishment | Eventual | Message queue + retry | High (minutes) |
| Inventory adjustment | Eventual | Scheduled task + reconciliation | High (hours) |
| Purchase inbound | Strong | Database transaction | Low (immediate) |
Cache and Database: Preventing Cache Avalanche
Inventory queries are high-frequency. We initially put all SKU stock in Redis, using database only for persistence. Then one Redis instance crashed, all requests hit the database, database collapsed, system paralyzed. Classic 'cache avalanche'.
Cache isn't a silver bullet—misuse amplifies failures.
Layered Cache Strategy
We designed three cache layers: local memory (Caffeine) → distributed (Redis) → database. Local caches hot SKUs with short TTL (1s); Redis caches full inventory with medium TTL (5s); database for final persistence.
Cache Penetration and Breakdown Protection
For non-existent SKUs, we use Bloom Filter to pre-filter, preventing invalid queries reaching the database. For hot SKU cache expiration, use mutex lock so only one thread loads from database, others wait.
| Problem | Phenomenon | Solution | Complexity |
|---|---|---|---|
| Cache avalanche | Many caches expire simultaneously, DB overloaded | Randomize TTL + degrade | Low |
| Cache penetration | Query non-existent data, DB hit | Bloom filter + cache null | Medium |
| Cache breakdown | Hot key expires, high concurrency hits DB | Mutex + never expire | Medium |
Architecture Evolution: Monolith to Microservices Blood and Tears
Initially our WMS was a monolith. As warehouse expanded, inventory operations grew complex, monolith bottlenecks emerged: deployment required half-hour downtime, a module bug crashed the whole system.
Architecture isn't designed, it evolves.
Domain-Driven Design (DDD) Splitting
We split by business domain into multiple microservices: inventory, order, purchase, inventory. Each independently deployed with its own database, communicating via gRPC. Inventory service as core, with high-availability cluster.
Event Sourcing and CQRS
For historical inventory change tracing, we introduced Event Sourcing. Each inventory change recorded as an event in event store. For queries, use CQRS pattern: build a dedicated read database (MongoDB) for queries, write database (PostgreSQL) for writes. Read-write separation, no interference.
Summary
That Singles' Day crash made me rethink the engineering essence of inventory management. Now flash warehouse WMS's inventory system has been stable for over a year, handling peak 500 orders per second without deadlocks or data inconsistency.
Key Takeaways
- Deadlocks are a concurrency model issue, not the database's fault
- Inventory consistency requires unified service + appropriate transaction model
- Cache needs layered design to prevent avalanche, penetration, breakdown
- Architecture should evolve gradually; DDD is a good tool
- Event Sourcing makes history traceable; CQRS improves query performance
If you're building a WMS system, I hope these experiences help you avoid a few pitfalls. After all, the taste of staring at database logs at 3 AM is something I've tasted enough for both of us.
References
- Fortune Business Insights WMS Market Report — WMS market size and growth data
- Gartner Supply Chain Research — Supply chain technology trends and best practices
- McKinsey Operations Insights — Operational efficiency and digital transformation strategies