A TPM’s Guide to Availability vs Consistency
Let’s kick off a series that dives deep into various systems design topics from the perspective of a TPM. In this article, we jump right into availability and consistency, terms you will no doubt come across in interviews as well as when solving real-world problems. The two concepts are often at odds with one another in the world of scaling micro-services. Let’s start by defining each term separately.
Availability
Availability refers to a system’s capacity to remain operational under duress, such as partial failures and load fluctuations. An available system guarantees that it responds to requests (even if some components are failing), aiming for minimal downtime. In technical terms, availability is the probability that a system is operational at a given time. Mathematically, availability can be represented as:
Availability = Uptime ÷ (Uptime + Downtime)
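To make the formula concrete, here is a minimal Python sketch. The function names and the 30-day observation window are assumptions chosen for illustration; they do not come from any particular monitoring tool.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return availability as a fraction between 0 and 1."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def allowed_downtime_minutes(target: float, period_hours: float = 30 * 24) -> float:
    """Minutes of downtime that a target availability permits over a period."""
    return (1.0 - target) * period_hours * 60


# Hypothetical month: 30 days observed, with 43.2 minutes of outage.
period_hours = 30 * 24
outage_hours = 43.2 / 60
print(f"{availability(period_hours - outage_hours, outage_hours):.4%}")  # 99.9000%
print(f"{allowed_downtime_minutes(0.999):.1f} minutes allowed per 30 days at 99.9%")
```

Read the other way around, the same arithmetic turns an availability target into a downtime budget, which is often the more useful number when discussing targets with engineering teams.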
Availability is also one of the three guarantees in the CAP theorem, which states that a distributed system can simultaneously satisfy only two of the three: 1) Consistency, 2) Availability, and 3) Partition Tolerance.
Going back to availability, a highly available system ensures that every request gets a response, even if the response may not reflect the latest state, thereby sacrificing strict consistency in some cases.
Let’s look at examples of tools and services that support availability in micro-services:
- Load Balancers & Redundancy. Load Balancers (LB) distribute requests across multiple service replicas. If one replica becomes unresponsive, the LB redirects traffic to a healthy replica.
- Circuit Breaker Patterns. If a micro-service is experiencing latency or is otherwise unresponsive, a circuit breaker prevents further calls to the failing service and instead returns a default fallback response, maintaining availability for the overall system.
- Database Eventual Consistency. Databases may settle for eventual consistency in order to stay available during network partitions. For example, a shopping cart service may allow users to add items to their cart even if the inventory database is delayed.
- Partition-Tolerant Systems. During downtime, services may rely on cached data to continue to be responsive to user requests. A good example of that is a content delivery network (CDN) serving static files.
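The circuit breaker pattern from the list above can be sketched in a few lines of Python. This is a simplified illustration, not a production implementation: the class name, the thresholds, and the `flaky_recommendations` service are all hypothetical, and real libraries add features such as per-error policies and metrics.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after N consecutive failures, retries after a cooldown."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cooldown in seconds
        self.clock = clock              # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the failing service until the cooldown has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cooldown is over: let one trial call through (half-open state).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def flaky_recommendations():
    # Hypothetical downstream service that is currently timing out.
    raise TimeoutError("recommendation service timed out")


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
for _ in range(3):
    # The third call never reaches the failing service; the breaker is open.
    print(breaker.call(flaky_recommendations, fallback=lambda: ["bestsellers"]))
```

The caller always gets an answer, which is the availability win. The cost is that the answer may be a generic default rather than the real result.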
Examples of high-availability systems include Netflix and Amazon’s shopping cart, both of which prioritize availability to maintain the user experience.
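The load balancer behavior described earlier in this section, routing around an unresponsive replica, can be illustrated with a small round-robin sketch. The class and the replica names (`cart-1` and so on) are assumptions made up for this example; real load balancers detect failures through health checks rather than manual `mark_down` calls.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin load balancer that skips replicas marked unhealthy."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.healthy = set(self.replicas)
        self._ring = cycle(self.replicas)

    def mark_down(self, replica):
        self.healthy.discard(replica)

    def mark_up(self, replica):
        if replica in self.replicas:
            self.healthy.add(replica)

    def next_replica(self):
        # Look at each replica at most once per request.
        for _ in range(len(self.replicas)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy replicas available")


lb = RoundRobinBalancer(["cart-1", "cart-2", "cart-3"])
lb.mark_down("cart-2")  # simulate a failed health check
print([lb.next_replica() for _ in range(4)])
# ['cart-1', 'cart-3', 'cart-1', 'cart-3']
```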
Consistency
In contrast to availability, consistency ensures that a system behaves predictably and that data is delivered up to date and synchronized to all users. That is, all users/nodes see the same data at the same time after a transaction completes, which avoids stale or conflicting data. Consistency ensures atomic writes (either all parts of a write operation succeed or none do, preventing partial or corrupted data) and synchronized reads (any read reflects the most recent write across all nodes in the system).
As another of the three CAP theorem guarantees mentioned above, a consistent system guarantees that: 1) all read operations immediately reflect the results of the most recent write operation, and 2) atomicity is maintained, ensuring that all nodes agree on the same data state.
Consistency can be implemented at several levels:
- Strong Consistency. Guarantees that any read operation reflects the latest write operation, even if that requires waiting for all replicas to synchronize.
- Eventual Consistency. Allows temporary inconsistencies, but guarantees that all nodes will converge to the same state eventually.
- Causal Consistency. Ensures consistency for operations that are causally related, but does not guarantee order for unrelated operations.
Here are some examples of consistency in the real world:
- Financial Systems (Strong Consistency). When you bank, you want your account balances to always be correct, especially within a distributed transactional network. After a transfer between accounts, both balances should reflect it at the same time.
- Social Media App (Eventual Consistency). In a distributed social media network, posts can appear on a user’s timeline before the data has fully propagated to other nodes. Here we are prioritizing availability over strict consistency.
- Database Replication. Synchronous replication across databases ensures consistency by writing updates to all replicas before marking the transaction complete.
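The difference between synchronous replication (strong consistency) and asynchronous replication (eventual consistency) can be shown with a toy in-memory store. Everything here is a simplified assumption for illustration: the class names are hypothetical, and the `propagate` method stands in for background replication that a real database would run on its own.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Toy key-value store with one primary and several replicas."""

    def __init__(self, replica_count=2):
        self.primary = Replica("primary")
        self.replicas = [Replica(f"replica-{i}") for i in range(replica_count)]
        self.pending = []  # writes not yet applied to the replicas

    def write_sync(self, key, value):
        # Strong consistency: apply the write everywhere before acknowledging.
        self.primary.data[key] = value
        for replica in self.replicas:
            replica.data[key] = value
        return "ack"

    def write_async(self, key, value):
        # Eventual consistency: acknowledge first, replicate later.
        self.primary.data[key] = value
        self.pending.append((key, value))
        return "ack"

    def propagate(self):
        # Stand-in for background replication catching up.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

    def read_from_replica(self, index, key):
        return self.replicas[index].data.get(key)


store = ReplicatedStore()
store.write_sync("balance", 100)
print(store.read_from_replica(0, "balance"))  # 100: replicas are already in sync

store.write_async("balance", 80)
print(store.read_from_replica(0, "balance"))  # 100: stale read before propagation

store.propagate()
print(store.read_from_replica(0, "balance"))  # 80: replicas have converged
```

In a real system the synchronous path is slower because the write waits on every replica, which is the latency cost of strong consistency.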
Key Tradeoffs
According to the CAP theorem, distributed systems must choose between availability and consistency during network partitions. Highly available systems sacrifice immediate data correctness, while strongly consistent systems may delay responses to ensure data synchronization.
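As a rough illustration of this tradeoff, the sketch below models a single node that is cut off from its peers. The `Node` class and its `"CP"` and `"AP"` modes are hypothetical labels invented for this example: a consistency-first node refuses to answer during the partition, while an availability-first node answers with whatever it has locally, which may be stale.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk returning stale data."""


class Node:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True when the node is cut off from its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse instead of possibly returning stale data.
            raise Unavailable("cannot confirm the latest value during a partition")
        # Availability first (or no partition): answer with the local copy.
        return self.value


for mode in ("CP", "AP"):
    node = Node(mode)
    node.value = "cart: 2 items"  # last value seen before the partition
    node.partitioned = True
    try:
        print(mode, "->", node.read())
    except Unavailable as exc:
        print(mode, "-> error:", exc)
```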
Tools
According to The Geeky Minds, the following tools help achieve high availability in micro-services: 1) container orchestration platforms such as Kubernetes, 2) load-balancing tools such as HAProxy, which provide sophisticated algorithms for distributing workloads, and 3) monitoring and logging tools such as Prometheus and the ELK Stack, which provide the ability to monitor and log system activities.