Appendix B: Glossary of Distributed Data Terms
Concise Definitions for Engineers and Executives
This glossary provides clear, practical definitions of key terms used throughout the series. Definitions prioritize clarity over academic precision.
A
Adaptive Storage: Storage systems that automatically move data between tiers (hot/warm/cold) based on observed access patterns rather than static rules. Example: Redpanda automatically demoting cold topics to object storage.
Availability: The percentage of time a system is operational and accessible. Measured as uptime divided by total time. See “Five Nines” and “SLA.”
Availability Zone (AZ): An isolated datacenter within a cloud region, typically with independent power, cooling, and networking. AZs in the same region have low latency between them (1-2ms) while still protecting against single-datacenter failures.
B
Bandwidth: The rate at which data can be transferred between systems, measured in bits or bytes per second. Network bandwidth determines how quickly you can move data between locations.
Blast Radius: The scope of impact when a component fails. Smaller blast radius (through sharding, isolation) means failures affect fewer users.
Byzantine Fault: A failure where a component behaves arbitrarily or maliciously, potentially sending conflicting information to different parts of the system. Harder to handle than simple crash failures.
C
CAP Theorem: States that a distributed system can provide at most two of: Consistency (all nodes see the same data), Availability (the system responds to requests), and Partition tolerance (the system works despite network failures). In practice, partition tolerance is mandatory, so the real choice is between consistency and availability during network partitions.
Causal Consistency: Consistency model guaranteeing that causally related operations are seen in order by all nodes. Weaker than sequential consistency but stronger than eventual consistency. Example: If you post a message then edit it, everyone sees the edit after the original.
CDN (Content Delivery Network): Geographically distributed network of servers that cache and serve content from locations near users. Reduces latency and bandwidth costs for static assets.
Cold Data: Data that is rarely accessed (e.g., monthly or less). Typically stored in cheaper, slower storage tiers like object storage or archival systems.
Consensus Algorithm: Protocol for getting distributed nodes to agree on a value despite failures. Examples: Paxos, Raft. Required for strong consistency but adds latency overhead.
Consistency Level: The guarantee about how current and synchronized data reads are across replicas. Ranges from eventual (may be stale) to linearizable (always current). See Chapter 7.
Controller (IDP): Component of the Intelligent Data Plane that makes placement and optimization decisions based on telemetry. Examples: Placement Controller, Cost Controller, Compliance Controller.
Cross-Region Replication: Copying data between geographically distant datacenters, typically across continents. Adds significant latency (50-200ms) but improves availability and performance for global users.
D
Data Gravity: The concept that data and compute mutually attract each other; large datasets attract compute workloads, and heavy compute workloads attract data. The system should optimize placement of both.
Data Residency: Legal or regulatory requirement that certain data must be stored in specific geographic locations. Common in GDPR (EU data in the EU) and other privacy regulations.
Data Temperature: Classification of how frequently data is accessed: hot (frequent), warm (occasional), cold (rare). Determines optimal storage tier.
Durability: Guarantee that once data is written, it will not be lost even if systems fail. Typically achieved through replication and persistent storage.
Durable Execution: Execution model where application state automatically persists across failures, allowing workflows to pause and resume. Implemented by systems like Temporal and Durable Objects.
E
Edge Computing: Running computation close to data sources or users, typically in distributed mini-datacenters rather than centralized cloud regions. Reduces latency but increases operational complexity.
Egress Cost: Charges for data leaving a cloud provider’s network, typically to the internet. Often the largest bandwidth cost component ($0.05-$0.12/GB).
Embedded Database: Database library that runs within an application process (e.g., SQLite, RocksDB) rather than as a separate server. Eliminates network latency but limits sharing between applications.
Eventual Consistency: Consistency model where replicas may temporarily diverge but will eventually converge to the same state if writes stop. Provides high availability but requires application-level conflict resolution.
F
Failover: Process of switching to a backup system when the primary fails. Can be automatic or manual. Fast failover is critical for high availability.
Five Nines (99.999%): Availability level allowing only 5.26 minutes of downtime per year. Expensive to achieve, requiring redundancy and automatic failover.
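The downtime budget follows directly from the availability target; a quick back-of-the-envelope sketch (assuming a 365-day year):

```python
# Downtime budget per year for common availability targets (assumes a 365-day year).
MINUTES_PER_YEAR = 365 * 24 * 60

for label, availability in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    downtime_minutes = MINUTES_PER_YEAR * (1 - availability)
    print(f"{label}: {downtime_minutes:.2f} minutes of downtime per year")
# five nines -> 5.26 minutes, matching the figure above
```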
Follower (Replica): In leader-follower replication, nodes that receive and apply writes from the leader but don’t directly serve writes. May serve reads depending on consistency requirements.
G
GDPR (General Data Protection Regulation): European Union privacy law requiring strict controls on personal data, including data residency (EU data stays in the EU), right to erasure, and explicit consent.
Gossip Protocol: Communication pattern where nodes randomly share information with neighbors, and information spreads through the cluster. Used for cluster membership and eventual consistency.
Graceful Degradation: Design principle where systems continue providing reduced functionality when components fail, rather than failing completely. Example: Serve stale cache if the database is slow.
H
Heartbeat: Periodic signal sent between nodes to indicate they’re alive and functioning. Missed heartbeats trigger failover or rerouting.
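A minimal failure-detector sketch in Python; the node names, timeout, and recorded timestamps are illustrative assumptions:

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds of silence before a node is suspected dead (assumed)

# Illustrative state: node id -> timestamp of its most recent heartbeat
last_seen = {"node-a": time.time(), "node-b": time.time() - 10}

def suspected_failures(last_seen: dict, now: float) -> list:
    """Return nodes whose heartbeats have not arrived within the timeout."""
    return [node for node, ts in last_seen.items() if now - ts > HEARTBEAT_TIMEOUT]

print(suspected_failures(last_seen, time.time()))  # ['node-b']
```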
HIPAA (Health Insurance Portability and Accountability Act): US law governing healthcare data, requiring encryption, access controls, audit logging, and specific handling of Protected Health Information (PHI).
Homeostasis: In systems theory, the property of maintaining stable internal conditions despite external changes through feedback loops. Applied to distributed systems in Chapter 14.
Hot Data: Frequently accessed data (e.g., accessed daily or hourly) that should live in fast storage tiers for optimal performance.
Hot Spot: Situation where one shard or node receives disproportionately high load, becoming a bottleneck. Often caused by poor shard key selection.
I
IDP (Intelligent Data Plane): Control layer that orchestrates data placement across the locality spectrum using telemetry, prediction, and continuous optimization. Central concept of Chapters 9-12.
Idempotency: Property where applying an operation multiple times has the same effect as applying it once. Critical for retry safety in distributed systems.
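A sketch of the common idempotency-key pattern; the payment scenario and all names here are hypothetical:

```python
processed = {}  # idempotency key -> result of the first successful application

def apply_payment(key: str, account: dict, amount: int) -> dict:
    """Applying the same request twice has the same effect as applying it once."""
    if key in processed:           # duplicate or retried request: replay the result
        return processed[key]
    account["balance"] -= amount   # the one real application
    processed[key] = {"status": "ok", "balance": account["balance"]}
    return processed[key]

acct = {"balance": 100}
apply_payment("req-42", acct, 30)
apply_payment("req-42", acct, 30)  # a retry is safe: no double charge
print(acct["balance"])             # 70
```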
IOPS (Input/Output Operations Per Second): Measure of storage performance, indicating how many read or write operations can be performed per second. SSDs: 10k-500k IOPS, HDDs: 100-200 IOPS.
J
Jitter: Variability in latency. High jitter means unpredictable response times, which can be worse for user experience than consistently higher latency.
L
Latency: Time delay between request and response. Composed of propagation delay (distance), transmission delay (bandwidth), processing delay (computation), and queueing delay (congestion).
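The four components simply add; an illustrative decomposition (every figure below is an assumption, not a measurement):

```python
# Illustrative latency decomposition; all figures are assumptions.
distance_km = 5000                    # roughly New York to London
propagation_ms = distance_km / 200    # light in fiber covers ~200 km per ms
transmission_ms = 8 / 100 * 1000      # 1 MB (8 Mbit) over a 100 Mbps link
processing_ms = 2                     # assumed server compute time
queueing_ms = 5                       # assumed congestion delay

total_ms = propagation_ms + transmission_ms + processing_ms + queueing_ms
print(f"one-way latency is about {total_ms:.0f} ms")  # ~112 ms
```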
Leader (Primary): In leader-follower replication, the node that receives all writes and coordinates replication to followers. Single point of write coordination.
Linearizability: Strongest consistency model, guaranteeing that operations appear to occur atomically at some point between invocation and completion. Expensive in terms of latency and coordination.
Load Balancer: Component that distributes incoming requests across multiple servers. Can be hardware or software, Layer 4 (TCP) or Layer 7 (HTTP).
Locality: Property of data being close (in network terms) to the computation or users that need it. Better locality means lower latency and bandwidth costs.
LSM-Tree (Log-Structured Merge Tree): Storage structure used by many databases (Cassandra, RocksDB) that optimizes for write throughput by appending to logs and periodically merging. Compaction causes write amplification.
M
Multi-Tenancy: Architecture where a single system serves multiple customers (tenants) with logical isolation. More efficient than per-tenant infrastructure but requires careful isolation.
MVCC (Multi-Version Concurrency Control): Technique where the database maintains multiple versions of data to allow reads without blocking writes. Used by PostgreSQL, CockroachDB.
N
Network Partition: Failure where some nodes can communicate with each other but not with other nodes, splitting the cluster. Forces choice between consistency and availability (CAP theorem).
Nines: Shorthand for availability. “Three nines” = 99.9%, “four nines” = 99.99%, “five nines” = 99.999%. Each additional nine is exponentially harder to achieve.
O
Object Storage: Storage service providing key-value access to data objects (files) over HTTP APIs. Examples: AWS S3, Google Cloud Storage. Cheaper than block storage but higher latency.
P
PACELC Theorem: Extension of the CAP theorem. If a Partition occurs, choose Availability or Consistency; Else (no partition), choose Latency or Consistency. Acknowledges that trade-offs exist even without failures.
Partition (Shard): Subset of data assigned to a specific node or group of nodes. Partitioning (sharding) distributes data across multiple nodes for scalability.
P99 Latency (99th Percentile): Latency value where 99% of requests are faster. More indicative of user experience than average latency, as tail latencies affect actual users.
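A minimal nearest-rank percentile sketch; the sample latencies are made up:

```python
def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the value that p% of samples fall at or below."""
    ranked = sorted(samples)
    index = max(0, int(len(ranked) * p / 100) - 1)
    return ranked[index]

# 100 illustrative request latencies in ms: mostly fast, with a slow tail
latencies = [10.0] * 97 + [250.0, 400.0, 900.0]
print(percentile(latencies, 50))  # 10.0  (the median looks great)
print(percentile(latencies, 99))  # 400.0 (the tail that averages hide)
```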
Predictive Placement: Data placement strategy that anticipates future demand patterns and pre-migrates data before spikes occur. Core concept of Vector Sharding (Chapter 11).
Primary Key: Unique identifier for a database record. Often used as the shard key in distributed databases.
Q
Query Planner: Component of a database that determines the optimal way to execute a query (which indexes to use, join order, etc.). In distributed databases, also determines which shards to query.
Quorum: Minimum number of nodes that must agree for an operation to succeed. Typical quorum: majority (e.g., 2 of 3, 3 of 5). Balances consistency and availability.
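The standard overlap condition is R + W > N: read and write quorums must intersect. A sketch:

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """A read is guaranteed to see the latest write when R + W > N,
    because any read quorum then intersects any write quorum."""
    return r + w > n

n = 3                  # replication factor: data lives on 3 nodes
majority = n // 2 + 1  # 2 of 3

print(quorums_overlap(n, w=majority, r=majority))  # True: consistent reads
print(quorums_overlap(n, w=1, r=1))                # False: a read may miss a write
```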
R
Rack Awareness: Configuration where the system knows which physical rack each node is on, allowing it to place replicas on different racks for fault tolerance.
Read Replica: Copy of data used only for serving reads, not writes. Can be asynchronously updated (eventual consistency) for lower overhead than synchronous replication.
Read Your Writes Consistency: Guarantee that after you write data, your subsequent reads will see that write. May not guarantee others see your writes immediately.
Region: Geographic location containing one or more datacenters (availability zones). Cloud providers have regions on different continents. Cross-region latency: 50-200ms.
Replication: Maintaining copies of data on multiple nodes for durability and availability. Key trade-off: more replicas = higher availability but higher cost and write amplification.
Replication Factor: Number of copies of data maintained. Factor of 3 means data exists on 3 nodes. Higher factor improves durability and read scalability but increases write cost.
Replication Lag: Time delay between a write occurring on the primary and appearing on replicas. In eventual consistency, lag can be seconds to minutes.
S
Serverless: Execution model where the cloud provider manages infrastructure and charges per-request rather than per-server. Examples: AWS Lambda, Cloudflare Workers.
Shard (Partition): See Partition.
Shard Key: Attribute used to determine which shard a piece of data belongs to. A critical design decision; poor shard keys cause hot spots. Examples: user_id, geography.
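A common hash-based mapping from shard key to shard; the key format and shard count are illustrative:

```python
import hashlib

NUM_SHARDS = 16  # illustrative shard count

def shard_for(shard_key: str) -> int:
    """Map a shard key to a shard using a stable hash (not Python's built-in
    hash(), which differs between processes)."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user_18423"))  # the same key always lands on the same shard
print(shard_for("user_18424"))  # nearby keys scatter, avoiding hot spots
```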
SLA (Service Level Agreement): Contract specifying minimum service levels (availability, latency, etc.). Violations may incur penalties. Example: 99.9% uptime SLA.
Split-Brain: Failure scenario where a network partition causes multiple nodes to each believe they’re the leader, potentially leading to data divergence. Prevented by quorum mechanisms.
Stale Read: Read operation that returns outdated data because it’s served from a replica that hasn’t received recent writes yet. Trade-off for lower latency in eventual consistency systems.
Strong Consistency: General term for consistency models (linearizability, sequential consistency) that guarantee recent writes are visible. Opposed to eventual consistency.
T
Tail Latency: Latency experienced by the slowest requests (P95, P99, P99.9). Often 3-10× higher than median due to queuing, retries, and stragglers.
Telemetry: Automated collection and transmission of measurements from distributed systems. Foundation of observable and self-managing systems.
Throughput: Amount of work a system can handle per unit time. Measured in queries per second (QPS), transactions per second (TPS), or requests per second (RPS).
Tiering: Strategy of placing data in different storage layers (tiers) based on access patterns: hot (SSD), warm (HDD), cold (object storage), archive (e.g., AWS S3 Glacier).
Topology: Physical and logical arrangement of nodes in a distributed system. Affects latency, fault tolerance, and operational complexity.
V
Vector Sharding: Predictive data placement approach modeling data distribution as multidimensional vectors and using learned patterns to anticipate optimal placement. Original contribution introduced in Chapter 11.
Versioning: Maintaining multiple versions of data, either for conflict resolution (vector clocks) or time-travel queries (temporal databases).
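A minimal vector clock sketch showing how versioning detects conflicting writes; the node names and counters are illustrative:

```python
def vc_merge(a: dict, b: dict) -> dict:
    """Combine two vector clocks by taking the element-wise maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in sorted(a.keys() | b.keys())}

def vc_descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has (a dominates b)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

x = {"node-a": 2, "node-b": 1}  # version written via node-a
y = {"node-a": 1, "node-b": 3}  # version written via node-b

# Neither dominates the other: the writes were concurrent, so this is a conflict
print(vc_descends(x, y), vc_descends(y, x))  # False False
print(vc_merge(x, y))                        # {'node-a': 2, 'node-b': 3}
```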
W
WAL (Write-Ahead Log): Log of all writes before they’re applied to the database. Enables durability and replication. Also called redo log or commit log.
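A minimal write-ahead logging sketch; the file path and record format are assumptions:

```python
import json
import os

LOG_PATH = "wal.log"  # illustrative log file

def put(table: dict, key: str, value: str) -> None:
    """Append and fsync the change BEFORE applying it, so a crash
    mid-update can be repaired by replaying the log."""
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # durable on disk before we acknowledge
    table[key] = value          # now apply to the actual state

def recover() -> dict:
    """Rebuild state after a crash by replaying the log from the start."""
    table = {}
    if os.path.exists(LOG_PATH):
        with open(LOG_PATH) as log:
            for line in log:
                record = json.loads(line)
                table[record["key"]] = record["value"]
    return table
```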
Warm Data: Data accessed occasionally (e.g., weekly or monthly). Optimal storage: mid-tier options like HDD or infrequent-access object storage.
Write Amplification: Phenomenon where a single logical write causes multiple physical writes due to replication, compaction, or journaling. Major cost factor in distributed systems. Explored in Chapter 5.
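The individual factors multiply; an illustrative calculation (every factor below is an assumption):

```python
# Illustrative write amplification arithmetic; every factor is an assumption.
logical_kb = 1           # the application writes 1 KB
replication_factor = 3   # 3 copies across nodes
journal_factor = 2       # each node writes the WAL, then the data file
compaction_factor = 4    # LSM compaction rewrites the data over its lifetime

physical_kb = logical_kb * replication_factor * journal_factor * compaction_factor
print(f"1 KB logical write -> {physical_kb} KB physical ({physical_kb}x amplification)")
```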
Write-Ahead Log: See WAL.
Z
Zero-Copy: Technique where data is transferred without copying between memory regions, reducing CPU overhead and latency. Used in high-performance networking.
Zone: See Availability Zone.
Acronym Reference
AZ - Availability Zone
CAP - Consistency, Availability, Partition tolerance
CDN - Content Delivery Network
CRUD - Create, Read, Update, Delete
DB - Database
EC2 - Elastic Compute Cloud (AWS)
GDPR - General Data Protection Regulation
HIPAA - Health Insurance Portability and Accountability Act
IDP - Intelligent Data Plane
IOPS - Input/Output Operations Per Second
LSM - Log-Structured Merge
MVCC - Multi-Version Concurrency Control
PACELC - Partition-Availability-Consistency, Else-Latency-Consistency
QPS - Queries Per Second
Raft - Consensus algorithm (a proper name, not an acronym)
RPO - Recovery Point Objective
RPS - Requests Per Second
RTO - Recovery Time Objective
RTT - Round-Trip Time
SLA - Service Level Agreement
SSD - Solid State Drive
TCP - Transmission Control Protocol
TLS - Transport Layer Security
TPS - Transactions Per Second
TTL - Time To Live
VM - Virtual Machine
WAL - Write-Ahead Log
Usage Notes
For Engineers: These definitions prioritize practical understanding over academic precision. When implementing systems, always consult specific documentation for your chosen technologies.
For Executives: These terms represent key decision points in distributed system architecture. Understanding the trade-offs (cost vs. latency, consistency vs. availability) is more important than technical details.
For Further Reading: Each term connects to detailed discussions in the main chapters. Chapter references are provided where particularly relevant.
This glossary is a living document. Distributed systems terminology evolves as the field advances.

