Chapter 7 – Consistency, Availability, and Latency in Practice
Translating Theory Into Milliseconds and Dollars
Every distributed systems textbook introduces the CAP theorem: in the presence of network partitions, you must choose between Consistency and Availability[1]. It’s elegant, provable, and almost useless for making real-world architectural decisions.
The problem isn’t that CAP is wrong—it’s that it’s too abstract. What does “consistency” actually mean for your shopping cart? What does “availability” cost in terms of infrastructure? And the theorem says nothing about the common case: when the network is working fine (which is 99.9% of the time), you’re not choosing between C and A at all; you’re choosing between consistency, latency, and operational complexity.
This is PACELC: during a Partition you trade off Availability vs Consistency, and Else (when there is no partition) you trade off Latency vs Consistency[2]. Better, but still theoretical.
In this chapter, we’re going to make it concrete. We’ll quantify what each consistency level costs in milliseconds and dollars. We’ll examine real production incidents where consistency choices led to outages. We’ll explore hybrid models that try to give you the best of multiple worlds. And we’ll provide a decision framework for choosing consistency based on your actual workload characteristics.
Because here’s the reality: there’s no one “correct” consistency level. There’s only the level that matches your requirements and your budget.
The Consistency Spectrum: What You Actually Get
Let’s define what different consistency levels mean in practice, not theory.
Level 1: Eventual Consistency
Promise: All replicas will converge to the same value if writes stop.
What this actually means:
Reads might be stale
Two clients reading simultaneously might see different values
Write conflicts might occur and must be resolved (last-write-wins, vector clocks, or application logic)
Latency characteristics:
Writes: 1-5ms (local node only, async replication)
Reads: 1-5ms (local replica, might be stale)
Replication lag: typically 100-1000ms, can spike to seconds during failures
Example systems: DynamoDB (default), Cassandra (CL=ONE), MongoDB with read preference secondary[3][4][5].
Real-world behavior:
T=0ms: User A (NYC) writes: item_stock = 10
T=1ms: NYC replica persists write, ACK to user
T=100ms: User B (London) reads: sees item_stock = 15 (stale)
T=150ms: Replication reaches London replica
T=200ms: User B reads again: sees item_stock = 10 (fresh)
User B’s first read returns 15 even though the stock dropped to 10 at T=1ms: for roughly 150ms, London serves a value from the past. This window of stale reads is normal for eventual consistency.
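When two replicas accept conflicting writes during that window, something has to merge them. Below is a minimal sketch of the last-write-wins strategy mentioned above, in Python; the VersionedValue record and merge function are illustrative, not any particular database’s API:

from dataclasses import dataclass

@dataclass
class VersionedValue:
    value: int          # e.g., item_stock
    timestamp_ms: int   # wall-clock time assigned by the writing node

def last_write_wins(local: VersionedValue, incoming: VersionedValue) -> VersionedValue:
    # Whichever write carries the later timestamp survives; the other is silently
    # discarded. This is why LWW can lose updates when clocks skew or two writes
    # land with the same timestamp.
    return incoming if incoming.timestamp_ms > local.timestamp_ms else local

# London replica still holds the pre-write value; NYC's write arrives ~150ms later.
london = VersionedValue(value=15, timestamp_ms=0)
nyc_write = VersionedValue(value=10, timestamp_ms=1)
print(last_write_wins(london, nyc_write).value)   # 10 -- the newer write wins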
Cost model (100 nodes, 10k writes/sec):
Write latency: 2ms average
Cross-region bandwidth: Minimal immediate cost (async replication)
Infrastructure: ~$20k/month (no synchronous coordination overhead)
Level 2: Read Your Writes (Session Consistency)
Promise: A client will always see its own writes.
What this actually means:
Your writes are immediately visible to you
Other clients might not see your writes yet
No guarantee about seeing other clients’ writes
Latency characteristics:
Writes: 1-5ms (local, but session tracking adds overhead)
Reads: 1-5ms (must route to nodes with your writes)
Sticky sessions required (client must query same node or replica set)
Example systems: DynamoDB with ConsistentRead (a strongly consistent read, which is stronger than strictly needed), MongoDB causally consistent sessions with “majority” read and write concerns[3][5].
Real-world behavior:
T=0ms: User A writes: profile_picture = “new.jpg”
T=1ms: Write persists to NYC replica
T=2ms: User A refreshes page, reads profile
→ System routes to NYC replica
→ Sees profile_picture = “new.jpg” ✓
T=100ms: User B (different session) reads User A’s profile
→ Routes to London replica
→ Sees profile_picture = “old.jpg” (stale)
User A always sees their own updates. User B might see stale data. This prevents the jarring “my change disappeared” experience while keeping latency low.
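Here is a minimal sketch of how that routing can work, assuming the client carries a session token recording its latest write position and the system knows how far each replica has caught up; the Replica and SessionToken shapes are illustrative, not a real driver’s API:

from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    applied_lsn: int = 0                       # highest write position this replica has applied
    data: dict = field(default_factory=dict)

@dataclass
class SessionToken:
    last_write_lsn: int = 0                    # highest write position this client has produced

def write(primary: Replica, session: SessionToken, key: str, value: str) -> None:
    primary.applied_lsn += 1
    primary.data[key] = value
    session.last_write_lsn = primary.applied_lsn   # remember our own write

def read(replicas: list, session: SessionToken, key: str):
    # Serve the read from any replica that has caught up to this client's own writes.
    for r in replicas:
        if r.applied_lsn >= session.last_write_lsn:
            return r.data.get(key)
    raise RuntimeError("no replica has our writes yet; retry or go to the primary")

nyc = Replica("nyc", data={"profile_picture": "old.jpg"})
london = Replica("london", data={"profile_picture": "old.jpg"})
session = SessionToken()
write(nyc, session, "profile_picture", "new.jpg")
# London hasn't replicated yet (applied_lsn still 0), so the read is routed to nyc.
print(read([london, nyc], session, "profile_picture"))   # "new.jpg"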
Cost model:
Write latency: 2-3ms (session tracking adds ~1ms)
Infrastructure: ~$22k/month (+10% for session management)
Level 3: Monotonic Reads
Promise: If a client reads value X, subsequent reads will never return a value older than X.
What this actually means:
Time doesn’t go backwards from a client’s perspective
You might see stale data, but staleness only decreases, never increases
Different clients might see data at different points in time
Latency characteristics:
Writes: 1-5ms (local)
Reads: 1-5ms (but must track read timestamps per client)
Requires version vectors or read timestamps
Example systems: Riak with monotonic reads, Cassandra with client-side timestamp tracking[6].
Real-world behavior:
T=0ms: Item stock = 10
T=50ms: User A reads: stock = 10
T=100ms: User B writes: stock = 5
T=150ms: User A reads again
→ Must see stock = 5 OR stock = 10
→ NEVER stock = 15 (an older, pre-T=0 value served by a lagging replica)
This prevents the “inventory went from 10 to 15 to 5” confusion that pure eventual consistency allows.
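A sketch of the client-side tracking this requires: the client remembers the newest version it has observed and refuses any replica response that would move it backwards. The MonotonicClient class and version numbers below are illustrative:

class MonotonicClient:
    # Tracks the highest version this client has observed and rejects
    # reads that would appear to move backwards in time.

    def __init__(self):
        self.highest_seen_version = 0

    def read(self, replica_version: int, replica_value: int):
        if replica_version < self.highest_seen_version:
            # This replica is behind what we've already seen; retry elsewhere.
            return None
        self.highest_seen_version = replica_version
        return replica_value

client = MonotonicClient()
print(client.read(replica_version=2, replica_value=10))   # 10 (stock at version 2)
print(client.read(replica_version=1, replica_value=15))   # None: older value rejected
print(client.read(replica_version=3, replica_value=5))    # 5 (newer write visible)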
Cost model:
Infrastructure: ~$23k/month (+15% for timestamp tracking)
Level 4: Causal Consistency
Promise: If operation A causally affects operation B, all nodes see A before B.
What this actually means:
Writes that depend on each other are ordered correctly
Independent writes can be seen in any order
Prevents reading effects before causes
Latency characteristics:
Writes: 5-20ms (must track causality metadata)
Reads: 1-5ms (local)
Metadata overhead: ~50-200% storage increase (vector clocks, version vectors)
Example systems: MongoDB with causal consistency, COPS, Eiger[5][7][8].
Real-world behavior:
T=0ms: User A writes: post_id = 123, content = “Hello”
T=50ms: User A writes: comment = “First!”, post_id = 123
T=100ms: User B reads:
→ Sees either:
a) Nothing yet (writes haven’t propagated)
b) Post only
c) Post + Comment
→ NEVER: Comment without Post
This prevents breaking referential integrity even with eventual consistency.
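Under the hood this is a happens-before check. A minimal vector-clock sketch follows, assuming one logical counter per node; the clock layout and the can_apply helper are illustrative rather than how MongoDB, COPS, or Eiger actually encode dependencies:

def happened_before(a: dict, b: dict) -> bool:
    # True if vector clock a happened strictly before b.
    nodes = set(a) | set(b)
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))

post_clock = {"nyc": 1}             # User A writes the post
comment_clock = {"nyc": 2}          # User A writes the comment after the post
unrelated_clock = {"london": 1}     # some independent write elsewhere

print(happened_before(post_clock, comment_clock))     # True: comment depends on post
print(happened_before(post_clock, unrelated_clock))   # False: concurrent, any order is fine

def can_apply(write_deps: dict, applied: dict) -> bool:
    # A replica may apply a write only once everything it causally
    # depends on has already been applied locally.
    return all(applied.get(n, 0) >= c for n, c in write_deps.items())

comment_deps = post_clock           # the comment causally depends on the post
applied_on_london = {"nyc": 0}
print(can_apply(comment_deps, applied_on_london))     # False: the post hasn't arrived yet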
Cost model:
Write latency: 10-15ms (causality tracking)
Storage overhead: +50-200% (vector clocks)
Infrastructure: ~$30k/month (+50% for causality metadata and processing)
Level 5: Sequential Consistency
Promise: All operations appear to execute in some sequential order, and operations of each individual process appear in order.
What this actually means:
There’s a global order that all nodes agree on
Your writes appear in the order you made them
Other clients’ writes might interleave with yours
Latency characteristics:
Writes: 50-150ms (requires coordination across replicas)
Reads: 1-5ms (can read from local replica)
Coordination: Requires leader election and log replication
Example systems: etcd, Consul, ZooKeeper (for metadata)[9][10].
Real-world behavior:
Client A writes: x = 1, then y = 2
Client B writes: x = 3, then y = 4
All nodes see one of these orderings:
a) x=1, y=2, x=3, y=4
b) x=1, x=3, y=2, y=4
c) x=3, y=4, x=1, y=2
Never: y=2, x=1, x=3, y=4 (A’s operations out of order)
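A small checker makes the rule explicit: a global order is sequentially consistent here exactly when each client’s operations appear as a subsequence in its own program order. The operation encoding below is illustrative:

def respects_program_order(global_order: list, program: dict) -> bool:
    # global_order: list of (client, op) in the order all nodes observe.
    # program: per-client list of ops in the order that client issued them.
    positions = {client: 0 for client in program}
    for client, op in global_order:
        if positions[client] >= len(program[client]):
            return False
        if op != program[client][positions[client]]:
            return False                      # this client's ops appear out of order
        positions[client] += 1
    return all(positions[c] == len(program[c]) for c in program)

program = {"A": ["x=1", "y=2"], "B": ["x=3", "y=4"]}
print(respects_program_order(
    [("A", "x=1"), ("B", "x=3"), ("A", "y=2"), ("B", "y=4")], program))   # True: ordering (b)
print(respects_program_order(
    [("A", "y=2"), ("A", "x=1"), ("B", "x=3"), ("B", "y=4")], program))   # False: A out of order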
Cost model:
Write latency: 80-150ms (cross-region coordination)
Infrastructure: ~$45k/month (+125% for consensus protocols)
Level 6: Linearizability (Strict Serializability)
Promise: Operations appear to execute instantaneously at some point between invocation and response, respecting real-time ordering.
What this actually means:
Strongest possible consistency
Database behaves like a single machine
Every read sees the most recent write globally
Transactions appear atomic and isolated
Latency characteristics:
Writes: 100-300ms (global quorum consensus)
Reads: 1-5ms (stale read from replica) OR 100-300ms (linearizable read from leader)
Transaction latency: 200-500ms for multi-region transactions
Example systems: Google Spanner, CockroachDB (default), etcd (for metadata)[11][12][9].
Real-world behavior:
T=0ms: Client A writes: balance = $1000
T=150ms: Write completes, balance committed
T=151ms: Client B reads: balance = $1000 (guaranteed)
T=151ms: Client C (anywhere in world) reads: balance = $1000
No client can ever read balance < $1000 after T=150ms
This is what you need for financial transactions, inventory management, and any scenario where stale reads cause correctness problems.
Cost model:
Write latency: 150-250ms average
Transaction latency: 300-500ms
Infrastructure: ~$60k/month (+200% for global consensus)
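For intuition, one classical way to get linearizable single-register reads and writes without synchronized clocks is an ABD-style quorum protocol: write with a higher version to a majority, and on read, take the freshest value from a majority and write it back before returning. The sketch below is a toy in-memory version of that idea; it is not how Spanner or CockroachDB actually work (they layer TrueTime and hybrid logical clocks over Paxos/Raft):

class Replica:
    def __init__(self):
        self.version = 0          # monotonically increasing write version
        self.value = None

    def store(self, version, value):
        # Accept only versions newer than what we already hold.
        if version > self.version:
            self.version, self.value = version, value

REPLICAS = [Replica() for _ in range(5)]
QUORUM = len(REPLICAS) // 2 + 1   # 3 of 5; any two majorities overlap in at least one replica

def quorum_write(value):
    # A real client would contact any majority; we use the first three for brevity.
    # Phase 1: learn the highest version held by a majority, then go one higher.
    highest = max(r.version for r in REPLICAS[:QUORUM])
    # Phase 2: persist the new version to a majority before acknowledging.
    for r in REPLICAS[:QUORUM]:
        r.store(highest + 1, value)

def quorum_read():
    # Read a majority and take the freshest value...
    freshest = max(REPLICAS[:QUORUM], key=lambda r: r.version)
    # ...then write it back to a majority so no later read can observe an older value.
    for r in REPLICAS[:QUORUM]:
        r.store(freshest.version, freshest.value)
    return freshest.value

quorum_write(1000)       # balance = $1000
print(quorum_read())     # 1000; once the write has returned, every read sees at least this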
The Latency Tax: Quantified
Let’s model a concrete application: e-commerce checkout flow with 1M transactions/day.
User flow:
Read cart (1 query)
Check inventory (10 queries, one per item)
Create order (1 write)
Update inventory (10 writes)
Charge payment (1 external API call)
Confirm order (1 write)
Total: 11 reads, 12 writes per transaction
Scenario 1: Eventual Consistency (All Operations)
Reads: 11 × 2ms = 22ms
Writes: 12 × 2ms = 24ms
External API: 100ms
Total: 146ms per transaction
P99: ~200ms
Problem: User adds item to cart. Inventory shows “10 available.” They checkout. Order fails because actual inventory was 0 (stale read). User frustrated.
Failure rate: ~2-5% of transactions fail due to stale reads at high load.
Scenario 2: Linearizable Reads + Eventual Writes
Reads: 11 × 150ms = 1,650ms
Writes: 12 × 2ms = 24ms
External API: 100ms
Total: 1,774ms per transaction
P99: ~2,500ms
Problem: Checkout takes nearly 2 seconds. Conversion rate drops. Users abandon carts.
Business impact: 2-second delay = 20-30% conversion drop[13].
Scenario 3: Hybrid (Smart Consistency Selection)
Cart reads: Eventual (2ms × 1 = 2ms)
Inventory reads: Bounded staleness, max 1 second old (5ms × 10 = 50ms)
Order creation: Linearizable (150ms)
Inventory updates: Serializable transaction (200ms)
Payment: External (100ms)
Order confirmation: Async (2ms)
Total: 504ms per transaction
P99: ~800ms
Result:
No stale cart data issues (mild staleness acceptable)
Inventory checks are recent enough (1-second bound acceptable)
Order creation is strongly consistent (required for correctness)
Inventory updates are transactional (prevents overselling)
Confirmation is async (user doesn’t wait)
This is roughly 3.5× faster than the all-linearizable-reads flow (504ms vs 1,774ms) and, unlike the all-eventual flow, it never creates orders against badly stale inventory data.
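The arithmetic behind all three scenarios fits in a few lines, which makes it easy to plug in your own per-operation latencies. A sketch using this chapter’s illustrative numbers (they are estimates, not measurements):

def checkout_latency_ms(cart_read, inventory_read_each, order_write,
                        inventory_update_total, confirm_write,
                        items=10, payment_api=100):
    # Sequential latency of the checkout flow above: 1 cart read + N inventory
    # reads + 1 order write + inventory updates + 1 payment call + 1 confirmation.
    return (cart_read
            + items * inventory_read_each
            + order_write
            + inventory_update_total
            + payment_api
            + confirm_write)

# Scenario 1: everything eventual (12 writes at 2ms each)
print(checkout_latency_ms(2, 2, 2, 10 * 2, 2))       # 146
# Scenario 2: linearizable reads, eventual writes
print(checkout_latency_ms(150, 150, 2, 10 * 2, 2))   # 1774
# Scenario 3: hybrid (bounded-staleness inventory reads, one 200ms serializable
# transaction for the inventory updates, async confirmation)
print(checkout_latency_ms(2, 5, 150, 200, 2))        # 504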
Real-World Incidents: When Consistency Choices Fail
Let’s examine production failures caused by consistency trade-offs.
Incident 1: Slack Outage (February 2022)
What happened: Slack experienced a multi-hour outage affecting message delivery[14].
Root cause:
Slack uses eventual consistency for message delivery
A deployment introduced a bug in the conflict resolution logic
Two users sent messages simultaneously to the same channel
Conflict resolution failed, causing a cascade of retries
Retry storm amplified, overwhelming message queues
Message delivery degraded cluster-wide
Consistency choice impact:
Eventual consistency allowed the conflict to occur
No coordination to prevent simultaneous writes
Application-level conflict resolution was the failure point
What linearizability would have prevented:
Writes would coordinate, preventing conflicts
But: Message latency would increase from ~50ms to ~200ms
And: Write throughput would drop to roughly a fifth of its current level
Slack’s trade-off: They chose eventual consistency for performance, accepted occasional conflict resolution complexity, but this time the complexity broke.
Incident 2: Cloudflare Durable Objects Latency Spike (2021)
What happened: Durable Objects experienced P99 latency spike from 50ms to 5,000ms[15].
Root cause:
Durable Objects provide single-writer strong consistency
Each object has a designated leader datacenter
Network congestion caused some objects to migrate between datacenters
During migration, writes blocked waiting for state transfer
P99 latency spiked as ~1% of objects were in migration state
Consistency choice impact:
Strong consistency requires single-writer semantics
Single-writer means objects can’t be accessed during migration
Migration is unavoidable during network events
What eventual consistency would have provided:
Reads could continue from stale replicas during migration
But: Would violate the strong consistency guarantee users depend on
Cloudflare’s trade-off: They chose strong consistency for correctness, accepted migration latency risk, and are investing in faster migration protocols.
Incident 3: DynamoDB Global Table Replication Lag (Ongoing)
What happens: DynamoDB Global Tables occasionally experience replication lag spikes to 10-60 seconds[16].
Root cause:
Global Tables use eventual consistency with async replication
Cross-region replication competes with application traffic for bandwidth
During traffic spikes, replication falls behind
Lag accumulates, taking minutes to drain
Consistency choice impact:
Eventual consistency means replication can lag
Applications reading from non-primary regions see stale data
“Last write wins” conflict resolution can lose updates
Real-world impact example:
Gaming leaderboard updates in us-east
European users read from eu-west
Leaderboard shows stale rankings for 30 seconds
Users complain about incorrect rankings
What strong consistency would provide:
Immediate global visibility of updates
But: Write latency increases from 5ms to 150ms
And: Cross-region bandwidth costs increase dramatically
DynamoDB’s trade-off: They provide eventual consistency by default, with an option for strongly consistent reads (but only within a single region).
Hybrid Models: The Practical Middle Ground
Most production systems don’t use a single consistency level. They use different levels for different data based on requirements.
Bounded Staleness
Guarantee: Reads are at most N seconds or K versions stale.
Example: Azure Cosmos DB’s bounded staleness (1-5 second lag guaranteed)[17].
Use case: Analytics dashboards. Users understand “data as of 5 seconds ago” is acceptable. You get local-read performance with bounded inconsistency.
Implementation:
Track timestamp on all writes
Reads compare timestamp: if too old, redirect to primary or wait for replication
Requires synchronized clocks (loose synchronization acceptable, ~1 second skew)
Cost: 10-20% overhead for timestamp tracking, occasional redirects add latency spikes.
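A sketch of that read path: writes carry timestamps, the replica’s copy is compared against the primary’s, and the read is redirected when the gap exceeds the bound. The store layout and 5-second bound are illustrative, not Cosmos DB’s implementation:

import time

STALENESS_BOUND_SECONDS = 5

class BoundedStalenessStore:
    # Every write carries a timestamp; a read is served from the local replica
    # only if its copy is within the staleness bound of the primary's copy,
    # otherwise it is redirected to the primary.

    def __init__(self):
        self.primary = {}    # key -> (value, write_timestamp)
        self.replica = {}    # same shape, lags behind

    def write(self, key, value):
        self.primary[key] = (value, time.time())

    def replicate(self, key):
        # Invoked by the async replication stream, possibly seconds later.
        self.replica[key] = self.primary[key]

    def read(self, key):
        primary_value, primary_ts = self.primary[key]
        replica_entry = self.replica.get(key)
        if replica_entry is not None:
            replica_value, replica_ts = replica_entry
            if primary_ts - replica_ts <= STALENESS_BOUND_SECONDS:
                return replica_value          # bounded-stale local read
        return primary_value                  # too stale (or never replicated): redirect

store = BoundedStalenessStore()
store.write("dashboard:active_users", 1200)
store.replicate("dashboard:active_users")
store.write("dashboard:active_users", 1350)   # replica now holds the older 1200
print(store.read("dashboard:active_users"))   # 1200 while within the bound, else 1350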
Session Consistency Within Region, Eventual Across Regions
Guarantee: Your writes visible to you immediately in your region. Visible globally eventually.
Example: Instagram likes and comments[18].
Use case: Social feeds. You must see your own likes/comments immediately. Others seeing them 500ms later is fine.
Implementation:
Writes go to local region’s primary
Session tracks which writes belong to which user
Reads check: “do I need to wait for this user’s writes?” If yes, query primary. If no, query any replica.
Cost: Minimal—just session tracking overhead (~5-10ms per request).
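A compact sketch of that routing rule: reads in the writer’s own region honor the session and may be sent to the regional primary, while reads elsewhere go to the nearest replica and accept eventual consistency. Region names and the session shape are illustrative:

def route_read(reader_region: str, session: dict,
               regional_primaries: dict, replicas: dict) -> str:
    # session example: {"region": "us-east", "has_unreplicated_writes": True}
    if (session["has_unreplicated_writes"]
            and reader_region == session["region"]):
        # Read-your-writes only has to hold where the user is writing from:
        # go to that region's primary until replication catches up.
        return regional_primaries[reader_region]
    # Everyone else (and this user once replication catches up) reads locally.
    return replicas[reader_region]

primaries = {"us-east": "us-east-primary"}
replicas = {"us-east": "us-east-replica", "eu-west": "eu-west-replica"}
session = {"region": "us-east", "has_unreplicated_writes": True}
print(route_read("us-east", session, primaries, replicas))   # us-east-primary
print(route_read("eu-west", session, primaries, replicas))   # eu-west-replica (eventual)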
Transactional Consistency for Critical Data, Eventual for Everything Else
Guarantee: Some tables/keys get strong consistency, others get eventual.
Example: E-commerce (discussed above).
Critical data (strong consistency):
Inventory counts
Order state
Payment records
User account balances
Non-critical data (eventual consistency):
Product descriptions
User reviews
Recommendations
Analytics events
Implementation:
Tag tables/keys with consistency level
Router directs queries to appropriate consistency service
Critical path uses serializable transactions (~200ms)
Non-critical path uses local reads (~2ms)
Cost: Dual infrastructure—you must run both a strongly consistent system and an eventually consistent one. But you don’t pay the strong consistency tax for the bulk of your data.
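A sketch of the tagging-and-routing idea, assuming a static table-to-consistency map and two backing stores; every name here (ConsistencyRouter, FakeClient, the table list) is illustrative:

from enum import Enum

class Consistency(Enum):
    EVENTUAL = "eventual"
    STRONG = "strong"

# Static policy: which tables need the expensive path.
TABLE_CONSISTENCY = {
    "inventory": Consistency.STRONG,
    "orders": Consistency.STRONG,
    "payments": Consistency.STRONG,
    "product_descriptions": Consistency.EVENTUAL,
    "reviews": Consistency.EVENTUAL,
    "analytics_events": Consistency.EVENTUAL,
}

class ConsistencyRouter:
    def __init__(self, strong_client, eventual_client):
        self.strong_client = strong_client        # e.g., serializable SQL cluster
        self.eventual_client = eventual_client    # e.g., async-replicated KV store

    def client_for(self, table: str):
        level = TABLE_CONSISTENCY.get(table, Consistency.EVENTUAL)
        return self.strong_client if level is Consistency.STRONG else self.eventual_client

    def read(self, table: str, key: str):
        return self.client_for(table).read(table, key)

    def write(self, table: str, key: str, value):
        return self.client_for(table).write(table, key, value)

class FakeClient:
    def __init__(self):
        self.store = {}
    def read(self, table, key):
        return self.store.get((table, key))
    def write(self, table, key, value):
        self.store[(table, key)] = value

router = ConsistencyRouter(FakeClient(), FakeClient())
router.write("inventory", "sku-42", 10)          # goes to the strong path
router.write("reviews", "sku-42", "5 stars")     # goes to the eventual path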
The Decision Framework: Choosing Consistency Per Workload
Here’s a framework for deciding consistency requirements:
Question 1: What Happens If the Data Is Stale?
If stale data causes incorrect behavior:
User sees wrong inventory → tries to buy → order fails → frustration
Requires: Strong consistency (linearizable or serializable)
If stale data causes degraded experience:
User sees cached product description → slightly outdated info → minor confusion
Requires: Bounded staleness (5-60 second bound acceptable)
If stale data is irrelevant:
User sees yesterday’s analytics report → no impact on decisions
Requires: Eventual consistency (hours/days of lag acceptable)
Question 2: What’s the Write Rate?
Low write rate (<100 writes/second):
Strong consistency overhead is negligible
Use: Linearizable or serializable
Medium write rate (100-10k writes/second):
Strong consistency adds 50-100ms per write
Use: Hybrid—strong for critical, eventual for non-critical
High write rate (>10k writes/second):
Strong consistency may not be achievable
Use: Eventual with conflict resolution
Question 3: What’s the Read Rate vs Write Rate?
Read-heavy (100:1 read:write ratio):
Can afford slower writes for fast reads
Use: Async replication, read from replicas
Balanced (1:1 read:write):
Trade-offs matter more
Use: Hybrid consistency based on data criticality
Write-heavy (1:10 read:write):
Cannot afford write coordination overhead
Use: Eventual consistency with good conflict resolution
Question 4: What’s Your Budget?
Cost of consistency levels (relative, normalized to eventual = 1.0×):
Eventual: 1.0× infrastructure cost
Read your writes: 1.1×
Monotonic reads: 1.15×
Causal: 1.5×
Sequential: 2.25×
Linearizable: 3.0×
If budget is constrained, eventual consistency might be forced regardless of requirements.
Question 5: What’s Your Operational Complexity Tolerance?
Simple operations (small team, limited experience):
Avoid causal consistency (complex metadata management)
Avoid hybrid models (multiple consistency levels increase cognitive load)
Use: Single consistency level—either eventual or strong
Complex operations (large team, distributed systems expertise):
Can handle multiple consistency levels
Can build custom conflict resolution
Use: Hybrid models optimized per data type
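The five questions can be folded into a rough rule of thumb. The sketch below encodes the thresholds used in this chapter; treat it as a starting point for discussion, not a substitute for analyzing your own workload:

def recommend_consistency(stale_data_impact: str,     # "incorrect", "degraded", or "irrelevant"
                          writes_per_second: float,
                          read_write_ratio: float,    # reads per write
                          relative_budget: float,     # 1.0 = eventual-only baseline
                          small_team: bool) -> str:
    # Question 1: correctness needs dominate everything else.
    if stale_data_impact == "incorrect":
        wanted = "linearizable"
    elif stale_data_impact == "degraded":
        wanted = "bounded staleness"
    else:
        wanted = "eventual"

    # Question 2: very high write rates make global coordination impractical.
    if writes_per_second > 10_000 and wanted == "linearizable":
        wanted = "hybrid (strong for critical keys, eventual elsewhere)"

    # Question 3: write-heavy workloads can't absorb per-write coordination.
    if read_write_ratio < 1 and wanted == "linearizable":
        wanted = "eventual with conflict resolution"

    # Question 4: linearizable runs roughly 3x the infrastructure cost of eventual.
    if relative_budget < 3.0 and wanted == "linearizable":
        wanted = "hybrid (strong for critical keys, eventual elsewhere)"

    # Question 5: small teams should avoid operating multiple consistency levels.
    if small_team and wanted.startswith("hybrid"):
        wanted = "single strong store for everything, scaled to budget"

    return wanted

print(recommend_consistency("incorrect", 500, 100, 3.0, small_team=False))
# -> "linearizable": a checkout/inventory style workload with budget for it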
The Cost-Latency-Consistency Triangle
Here’s the fundamental trade-off visualized as cost vs latency at different consistency levels:
Consistency Level          | Latency | Infrastructure Cost | Use Case
---------------------------|---------|---------------------|--------------------
Eventual                   | 2ms     | $20k/month          | Logs, metrics
Read Your Writes           | 3ms     | $22k/month          | Social feeds
Monotonic Reads            | 3ms     | $23k/month          | User preferences
Causal                     | 15ms    | $30k/month          | Collaborative apps
Sequential                 | 100ms   | $45k/month          | Metadata stores
Linearizable (stale reads) | 3ms     | $60k/month          | Financial (reads)
Linearizable (fresh reads) | 150ms   | $60k/month          | Financial (writes)
For a 100-node, 3-region cluster handling 10k writes/sec, 100k reads/sec.
Key insight: Moving from eventual to linearizable:
Increases write latency by 75× (2ms → 150ms)
Increases infrastructure cost by 3× ($20k → $60k/month)
But: Eliminates entire classes of bugs and edge cases
Whether that trade-off is worth it depends entirely on your application requirements and budget.
The Path Forward
We’ve now established the full spectrum of consistency models and their real-world costs. We’ve seen that:
Eventual consistency is fast and cheap but requires careful conflict resolution
Linearizability is correct and simple but slow and expensive
Hybrid models are practical but operationally complex
The question becomes: can we build systems that adapt consistency based on access patterns? That provide strong consistency for hot, critical data and eventual consistency for cold, non-critical data? That automatically migrate data between consistency tiers based on observed behavior?
This is part of the “Intelligent Data Plane” vision we’ll explore in Part III—systems that don’t force you to choose a single consistency level upfront, but rather optimize consistency per data item based on its characteristics.
In Chapter 8, we’ll examine how security and compliance intersect with data locality and consistency. Because it turns out that where your data lives and how strongly consistent it is directly impacts your security posture, compliance requirements, and regulatory exposure.
References
[1] S. Gilbert and N. Lynch, “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services,” ACM SIGACT News, vol. 33, no. 2, pp. 51-59, 2002.
[2] D. J. Abadi, “Consistency Tradeoffs in Modern Distributed Database System Design,” IEEE Computer, vol. 45, no. 2, pp. 37-42, 2012.
[3] G. DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store,” Proc. 21st ACM Symposium on Operating Systems Principles, pp. 205-220, 2007.
[4] A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,” ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35-40, 2010.
[5] MongoDB, “Read Concern,” MongoDB Manual, 2024. [Online]. Available: https://docs.mongodb.com/manual/reference/read-concern/
[6] Basho Technologies, “Riak KV Documentation,” 2024. [Online]. Available: https://docs.riak.com/
[7] W. Lloyd et al., “Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS,” Proc. 23rd ACM Symposium on Operating Systems Principles, pp. 401-416, 2011.
[8] W. Lloyd et al., “Stronger Semantics for Low-Latency Geo-Replicated Storage,” Proc. 10th USENIX Symposium on Networked Systems Design and Implementation, pp. 313-328, 2013.
[9] etcd, “etcd Documentation,” 2024. [Online]. Available: https://etcd.io/docs/
[10] HashiCorp, “Consul Documentation,” 2024. [Online]. Available: https://www.consul.io/docs
[11] J. C. Corbett et al., “Spanner: Google’s Globally-Distributed Database,” ACM Transactions on Computer Systems, vol. 31, no. 3, pp. 8:1-8:22, 2013.
[12] R. Taft et al., “CockroachDB: The Resilient Geo-Distributed SQL Database,” Proc. 2020 ACM SIGMOD International Conference on Management of Data, pp. 1493-1509, 2020.
[13] Google, “The Impact of Page Speed on Conversion Rates,” Google Research, 2018. [Online]. Available: https://web.dev/
[14] Slack Engineering, “Slack Outage Postmortem - February 22, 2022,” Slack Engineering Blog, 2022. [Online]. Available: https://slack.engineering/
[15] Cloudflare, “Durable Objects Performance Improvements,” Cloudflare Blog, 2021. [Online]. Available: https://blog.cloudflare.com/
[16] AWS, “Amazon DynamoDB Global Tables,” AWS Documentation, 2024. [Online]. Available: https://aws.amazon.com/dynamodb/global-tables/
[17] Microsoft, “Consistency Levels in Azure Cosmos DB,” Azure Documentation, 2024. [Online]. Available: https://docs.microsoft.com/azure/cosmos-db/consistency-levels
[18] Instagram Engineering, “Scaling Instagram Infrastructure,” Instagram Engineering Blog, 2014. [Online]. Available: https://instagram-engineering.com/
Next in this series: Chapter 8 - Security and Compliance Across Regions, where we’ll explore how data locality intersects with encryption, tokenization, regulatory requirements, and the challenge of building secure systems that span geographies.

