SYSTEM DESIGN - EX3
1. Amazon's sales rank by category feature
1.1. Requirements & scope
// Use cases
User: views last week's most popular products by category (lwmppbc)
Service: calculates lwmppbc
Service: high availability
// Out of scope
other e-commerce components
// Assumptions
Traffic is not evenly distributed
Items can be in multiple categories
Items cannot change categories
No subcategories
Results updated hourly; more popular products may need more frequent updates
1.2. Calculation
10M products
1K categories
1b transactions / month
100b read / month
100:1 read to write ratio
40GB of transactions / month = 1b transactions * 40B each
1.44TB of transactions / 3 years
400 transactions / second
40,000 read / second
5B created_at
8B product_id
4B category_id
8B seller_id
8B buyer_id
4B quantity
5B total_price
total size ~ 40B
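// A quick sanity check of the figures above (a rough sketch; assumes ~2.5M seconds per month):
SECONDS_PER_MONTH = 2.5 * 10**6               # ~30 days, rounded for easy math
transactions_per_month = 10**9
reads_per_month = 100 * 10**9
bytes_per_transaction = 40                    # ~ sum of the field sizes above (42B, rounded)
print(transactions_per_month / SECONDS_PER_MONTH)                    # 400 transactions/second
print(reads_per_month / SECONDS_PER_MONTH)                           # 40,000 reads/second
print(transactions_per_month * bytes_per_transaction / 10**9)        # 40 GB/month
print(transactions_per_month * bytes_per_transaction * 36 / 10**12)  # 1.44 TB over 3 years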
1.3. Design
1.4. Core
1.4.1 Use case: Service calculates lwmppbc
Save the Sales API server log files to S3
// sample log
timestamp product_id category_id qty total_price seller_id buyer_id
t1 product1 category1 2 20.00 1 1
t2 product1 category2 2 20.00 2 2
t2 product1 category2 1 10.00 2 3
Sales Rank Service -> runs MapReduce over the sales log files -> writes the aggregated results to the sales_rank table in the SQL DB
-> MapReduce step 1: map each transaction to (category_id, product_id) -> sum(quantity)
-> MapReduce step 2: distributed sort by category and total sold (see the sketch after the sorted list below)
// sorted list
(category1, 1), product4
(category1, 2), product1
(category1, 3), product2
(category2, 3), product1
(category2, 7), product3
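// A minimal pure-Python sketch of the two MapReduce steps, run locally over the sample log (a real job would run distributed over the S3 log files):
from collections import defaultdict

# Parsed rows from the sample log above: (product_id, category_id, quantity)
transactions = [
    ("product1", "category1", 2),
    ("product1", "category2", 2),
    ("product1", "category2", 1),
]

# Step 1: map each transaction to ((category_id, product_id), quantity),
# then reduce by summing quantities per key.
totals = defaultdict(int)
for product_id, category_id, quantity in transactions:
    totals[(category_id, product_id)] += quantity

# Step 2: sort by category, then by total sold (a distributed sort in production),
# yielding entries like "(category2, 3), product1" as in the sorted list above.
for (category_id, product_id), total_sold in sorted(
        totals.items(), key=lambda kv: (kv[0][0], kv[1])):
    print(f"({category_id}, {total_sold}), {product_id}")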
// sales_rank table
id int NOT NULL AUTO_INCREMENT
category_id int NOT NULL
product_id int NOT NULL
total_sold int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(category_id) REFERENCES Categories(id)
FOREIGN KEY(product_id) REFERENCES Products(id)
// Index on id, category_id, and product_id: lookups go from O(N) scans to O(logN)
1.4.2 Use case: User views lwmppbc
Client -> Web Server -> Read API server -> reads from the sales_rank table in the SQL Database (query sketch below)
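// A sketch of the Read API's query; sqlite3 stands in for MySQL, and the table/column names follow the sales_rank schema above:
import sqlite3

# In-memory stand-in for the MySQL sales_rank table defined above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_rank (id INTEGER PRIMARY KEY, "
             "category_id INT NOT NULL, product_id INT NOT NULL, "
             "total_sold INT NOT NULL)")

def top_products(category_id, limit=10):
    # Most popular products in a category, highest total_sold first.
    return conn.execute(
        "SELECT product_id, total_sold FROM sales_rank "
        "WHERE category_id = ? ORDER BY total_sold DESC LIMIT ?",
        (category_id, limit),
    ).fetchall()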
1.5. Scale
Analytics DB = Amazon Redshift or Google BigQuery
-> keep only a limited time period of data in the DB -> long-term storage in S3
40,000 reads / second
-> serve popular content from a cache (read-through sketch below)
-> SQL Read Replicas serve cache misses
-> SQL scaling patterns
400 writes / second
-> SQL scaling patterns
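// A minimal read-through cache sketch for the popular-content path (an in-memory dict stands in for Memcached/Redis; db_query is a hypothetical helper hitting a read replica):
cache = {}

def get_sales_rank(category_id, db_query):
    # Serve popular content from the cache; on a miss, fall through to a
    # SQL Read Replica (db_query) and populate the cache.
    key = f"sales_rank:{category_id}"
    if key in cache:
        return cache[key]
    result = db_query(category_id)
    cache[key] = result   # in production, expire hourly to match the update cadence
    return result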
2. Scales to millions of users on AWS
2.1. Requirements & scope
// Use cases
User: makes a read or write request
Service: processes the request, stores user data, then returns results
Service: scale & high availability
// Assumptions
Traffic is not evenly distributed
Users+ -> Users++ -> Users+++ (the design evolves as the user base grows)
2.2. Calculation
10M users
1b writes / month
100b reads / month
100:1 read to write ratio
1KB content per write
1TB of content / month = 1KB per write * 1b writes
36 TB content / 3 years
400 writes / second
40,000 reads / second
2.3. Design
2.4. Core
2.4.1 User makes a read or write request
(1) Step 1
// Start simple -> scale vertically when needed -> monitor for bottlenecks
Web server on a single EC2 instance + MySQL DB (relational data assumed)
Monitor = CPU, memory, I/O, network, etc // CloudWatch, top, Nagios, StatsD, Graphite, etc
// Assign a public static IP
Elastic IPs => a public endpoint whose IP doesn't change on reboot
Failover => point domain to a new IP
// Use a DNS
Route 53 => map domain to the instance's public IP.
// Secure the web server
Open only: HTTP:80 / HTTPS:443 / SSH:22 (SSH restricted to whitelisted IPs; security group sketch below)
Prevent the web server from initiating outbound connections
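// A hedged boto3 sketch of the ingress rules (the security group ID and whitelist CIDR are placeholders):
import boto3

ec2 = boto3.client("ec2")

# Open HTTP:80 and HTTPS:443 to the world; restrict SSH:22 to a whitelisted CIDR.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # placeholder security group
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},  # whitelisted IPs only
    ],
)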
(2) Step 2
MySQL DB consumes more and more memory and CPU; disk space is running out
// Spread out components // trade-offs: cost, complexity, security concerns
1. Use S3 for static files (server-side encryption) (JS/CSS/images/videos/user files)
2. Move MySQL to a separate machine (Amazon Relational Database Service (RDS)) (multiple availability zones)
// Security
Virtual Private Cloud
-> public subnet for the single Web Server -> can send & receive traffic from the internet
-> private subnet for everything else -> prevents outside access
-> Only open ports from whitelisted IPs for each component
(3) Step 3
The single web server is slow under load
// Horizontal scaling handles increasing load and removes single points of failure
Load Balancer = Amazon's ELB or HAProxy
1. Terminate SSL at the load balancer = (1) reduces backend server load (2) simplifies certificate administration
2. Multiple Web Servers across multiple zones
3. Multiple MySQL instances in Master-Slave Failover mode across multiple availability zones to improve redundancy
// Separate Web Servers from App Servers (Read/Write API)
// CDN: static (and some dynamic) content (CloudFront)
(4) Step 4
Read-heavy workload (100:1 reads to writes) & degrading DB performance
// spread read load
1. Cache = (1) configure the MySQL cache if sufficient (2) move frequently accessed data to a cache (ElastiCache)
2. MySQL Read Replicas (Load Balancers in front of the Read Replicas; read/write routing sketch below)
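// A sketch of read/write splitting at the app layer (connection objects are assumed; reads spread across the replicas behind the load balancer):
import random

class RoutingDB:
    # Sends writes to the master and reads to a random read replica.
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def execute_write(self, sql, params=()):
        return self.master.execute(sql, params)

    def execute_read(self, sql, params=()):
        return random.choice(self.replicas).execute(sql, params)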
(5) Step 5
Heavy load during office hours => autoscaling
// Automate provisioning with DevOps tooling: Chef, Puppet, Ansible
// monitoring
Host level = Review a single EC2 instance
Aggregate level = Review load balancer stats
Log analysis = CloudWatch, CloudTrail
External site performance = Pingdom or New Relic
Handle notifications and incidents = PagerDuty
Error Reporting = Sentry
// AWS Autoscaling in multiple zones
one group for each Web Server
one group for each Application Server type
Set a min and max number of instances
Trigger to scale up and down through CloudWatch = CPU load / latency / network traffic (alarm sketch below)
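// A hedged boto3 sketch of a CPU-based scale-up alarm (alarm name, group name, and policy ARN are placeholders):
import boto3

cloudwatch = boto3.client("cloudwatch")

# Scale up when average CPU across the group stays above 60% for 5 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="web-asg-cpu-high",  # placeholder
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=60.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:region:account:scalingPolicy/placeholder"],
)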
(6) Step 6
Continue addressing bottlenecks as usage grows
// MySQL
1. Store only a limited time period of data in the database
2. Store the rest in a data warehouse such as Redshift (easily handles 1TB/month) (async)
3. Cache absorbs the 40K reads/sec and the uneven traffic
4. SQL scaling patterns handle the 400 writes/sec
// Move very high read/write data to NoSQL
// Move heavy computation to asynchronous processing
// example = photo service
Client upload -> App Server -> Queue (SQS) -> Worker Service on EC2
-> Creates thumbnail / Updates DB / Stores in S3 (worker sketch below)
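// A sketch of the worker loop using boto3 SQS (the queue URL is a placeholder; make_thumbnail, update_db, and store_s3 are hypothetical helpers):
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/photo-jobs"  # placeholder

def run_worker(make_thumbnail, update_db, store_s3):
    while True:
        # Long-poll for upload jobs enqueued by the App Server.
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            photo_key = msg["Body"]
            store_s3(make_thumbnail(photo_key))  # create and store the thumbnail
            update_db(photo_key)                 # mark the photo as processed
            # Delete only after successful processing (SQS is at-least-once).
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])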