How Distributed Systems Power Netflix, Amazon, and Global Apps

A distributed system is a network of independent computers (nodes) that work together and appear to users as a single system. Such systems let companies like Netflix, Amazon, Google, and Facebook run services that are scalable, reliable, and fast.
Key Principles of Distributed Systems:
Decentralization: Work is shared across nodes to avoid single points of failure.
Scalability: Easily grow by adding more nodes or splitting tasks.
Fault Tolerance: The system keeps running even if some nodes or network parts fail.
Consistency: Nodes coordinate to agree on the data state; consistency can vary by application needs (e.g., strong, eventual).
Performance Optimization: The system balances response time against resource use so that answers stay quick and reliable under load.
The CAP theorem is a fundamental concept in distributed system design that describes the trade-off a system faces during a network partition. It states that a distributed system cannot guarantee all three of the following properties at the same time; because partitions cannot be ruled out, a system must give up either consistency or availability while a partition lasts:
Consistency (C): Every read returns the most recent write or an error, so all nodes appear to hold the same data.
Availability (A): Every request to a working node receives a non-error response, even if the data returned is out-of-date or not consistent across nodes.
Partition Tolerance (P): The system continues to operate even if nodes cannot reliably communicate due to network failures.

Why It Matters
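Network partitions (communication breakdowns) are inevitable in distributed environments, so partition tolerance is not optional in practice. When a partition happens, designers must choose between consistency and availability: either refuse some requests to keep data correct, or keep answering and accept that some answers may be stale.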
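The choice can be made concrete with a toy model. The sketch below is only an illustration: `Replica`, `TwoNodeStore`, and the `partitioned` flag are hypothetical names invented for this example and do not correspond to any real database. It models two replicas and shows how a CP configuration refuses work during a partition while an AP configuration keeps answering and lets the replicas diverge.

```python
class Replica:
    """One node holding a single value."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Hypothetical two-replica store configured as either CP or AP."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, value):
        """Write through replica a; copy to b only when the link is up."""
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write rather than let the replicas diverge.
                raise RuntimeError("unavailable: cannot reach replica b")
            # AP: accept the write locally; replica b stays stale for now.
            self.a.value = value
            return
        self.a.value = value
        self.b.value = value

    def read_from_b(self):
        """Read from replica b, which may be cut off from replica a."""
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: replica b may be stale")
        return self.b.value


cp = TwoNodeStore("CP")
cp.write("v1")
cp.partitioned = True
try:
    cp.write("v2")
except RuntimeError as err:
    print("CP:", err)

ap = TwoNodeStore("AP")
ap.write("v1")
ap.partitioned = True
ap.write("v2")
print("AP replica a:", ap.a.value, "| replica b:", ap.b.value)
```

In the CP run the second write is rejected, so both replicas still hold "v1". In the AP run the write succeeds, but replica b keeps serving "v1" until the partition heals and the replicas are reconciled.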
CAP Properties in the world of AWS
Partition Tolerance Is Essential
All distributed AWS services are designed to handle network failures, as these can happen across servers, regions, and continents.
Consistency + Partition Tolerance (CP)
Services that choose CP focus on data accuracy, even if it means being temporarily unavailable.
Amazon RDS (Multi-AZ): Synchronously replicates each write to a standby instance in a different Availability Zone to keep data consistent. If a network issue separates the two, writes may pause to maintain accuracy, which is crucial for workloads such as financial transactions.
Amazon Elastic Block Store (EBS): Can stall reads/writes when a volume loses contact with its replicas, favoring data integrity over availability to prevent corruption.
Availability + Partition Tolerance (AP)
Services that choose AP prioritize staying online, allowing some temporary data inconsistency for user convenience or scalability.
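Amazon DynamoDB (default mode): Continues processing reads/writes during network issues. Reads are eventually consistent by default, so a read issued right after a write may return the older value. Ideal for shopping carts or IoT logs where immediate accuracy isn't critical.
Amazon S3: Keeps objects highly available. Within a single Region, S3 has provided strong read-after-write consistency since December 2020, so the AP trade-off now applies mainly to cross-Region replication, which is asynchronous: copies in other Regions might take time to reflect an update, but objects remain available.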
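The temporary inconsistency that AP systems accept has to be resolved once nodes can communicate again. One common approach is last-writer-wins reconciliation, sketched below. This is an illustrative assumption and not a description of how any particular AWS service works internally; the function name `merge_last_writer_wins` and the `(timestamp, value)` layout are hypothetical.

```python
def merge_last_writer_wins(replica_a, replica_b):
    """Return the converged state of two replicas.

    Each replica maps key -> (timestamp, value). For every key, the entry
    with the larger timestamp wins; on a tie, replica_a's entry is kept.
    """
    merged = dict(replica_a)
    for key, (timestamp, value) in replica_b.items():
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, value)
    return merged


# Two replicas that accepted different writes during a partition.
replica_a = {"cart:42": (1, ["book"])}
replica_b = {"cart:42": (2, ["book", "pen"]), "cart:7": (1, ["lamp"])}

converged = merge_last_writer_wins(replica_a, replica_b)
print(converged)
```

After the merge, both replicas can adopt `converged`, so they agree again. The cost of this simple rule is that the older write to "cart:42" is silently discarded, which is acceptable for some data and not for others.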
Adjusting AWS Services for CAP Tradeoffs
Many AWS services let engineers adjust settings based on CAP needs:
DynamoDB offers “strongly consistent reads” if requested (by setting the ConsistentRead parameter on a read). These reads may fail or be delayed during network issues, so DynamoDB acts as CP for those specific queries.
Amazon RDS mode selection: Read replicas use asynchronous replication and can return slightly stale data (eventual consistency), so RDS can act as AP for reads served from them; for primary databases with synchronous replication, it acts as CP.
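As a minimal sketch of the DynamoDB setting, the helper below builds a GetItem request with the `ConsistentRead` parameter, which is part of the DynamoDB API. Everything else here is an assumption made for illustration: the table name "Carts", the key attribute "cart_id", and the helper names `build_get_item_request` and `get_cart` are hypothetical. The `client` argument is expected to behave like the object returned by `boto3.client("dynamodb")`.

```python
def build_get_item_request(table_name, key, strongly_consistent=False):
    """Build keyword arguments for the DynamoDB GetItem operation."""
    return {
        "TableName": table_name,
        "Key": key,
        # False (the default) asks for an eventually consistent read;
        # True asks for a strongly consistent read.
        "ConsistentRead": strongly_consistent,
    }


def get_cart(client, cart_id, strongly_consistent=False):
    """Fetch one cart item with the requested read consistency.

    The table name "Carts" and key attribute "cart_id" are hypothetical.
    Returns the item, or None if no item has that key.
    """
    request = build_get_item_request(
        "Carts", {"cart_id": {"S": cart_id}}, strongly_consistent
    )
    response = client.get_item(**request)
    return response.get("Item")
```

With boto3 installed and AWS credentials configured, a call such as `get_cart(boto3.client("dynamodb"), "42", strongly_consistent=True)` would request the CP-style read, while leaving the flag at its default keeps the AP-style behavior.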
