SYSTEM DESIGN - Question
System Design
Harvard YouTube course
https://www.youtube.com/watch?v=-W9F__D3oY4
1. Trade off
Performance vs scalability
Latency vs throughput
Availability vs consistency
1.1. Performance vs scalability
scalable = added resources ~ increased performance.
performance problem = slow for a single user.
scalability problem = fast for a single user / slow under heavy load.
1.2. Latency vs throughput
Latency = time to perform an action or produce a result
Throughput = requests / time
Aim = maximal throughput with acceptable latency.
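A minimal sketch of the two definitions above, assuming a load test that records one latency per request plus the total wall-clock time. The names `LoadSummary` and `summarize` are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadSummary:
    mean_latency_s: float  # average time to complete one request
    throughput_rps: float  # completed requests per second of wall-clock time


def summarize(latencies_s: list[float], wall_time_s: float) -> LoadSummary:
    """Summarize a load test from per-request latencies and total elapsed time."""
    if not latencies_s or wall_time_s <= 0:
        raise ValueError("need at least one request and a positive wall time")
    mean_latency = sum(latencies_s) / len(latencies_s)
    throughput = len(latencies_s) / wall_time_s
    return LoadSummary(mean_latency, throughput)


# Four requests of 0.5 s each, served two at a time, finish in 1 s of wall time.
summary = summarize([0.5, 0.5, 0.5, 0.5], wall_time_s=1.0)
print(summary.mean_latency_s, summary.throughput_rps)  # 0.5 4.0
```

Latency stays at 0.5 s per request, while serving requests concurrently raises throughput to 4 requests per second.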
1.3. Availability vs consistency
Consistency = every read receives [1] the latest write or [2] an error
Availability = [1] every request receives a response, [2] with no guarantee that it contains the latest data
Partition Tolerance = [1] The system continues to operate [2] despite partitioning due to network failures
1.3.1 Consistency patterns
1.3.2 Availability patterns
CONS: [1] added hardware & complexity, [2] potential loss of data if a node fails before its recent writes have been replicated
1.3.3 Availability in parallel vs in sequence
multiple components prone to failure
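A small sketch of the standard availability arithmetic, assuming component failures are independent. In sequence every component must be up, so availabilities multiply. In parallel only one component must be up, so the unavailabilities multiply. The function names are hypothetical.

```python
from math import prod


def availability_in_sequence(components: list[float]) -> float:
    """All components must be up: multiply the individual availabilities."""
    return prod(components)


def availability_in_parallel(components: list[float]) -> float:
    """At least one component must be up: 1 minus the chance that all are down."""
    return 1 - prod(1 - a for a in components)


# Two components, each 99.9% available.
print(round(availability_in_sequence([0.999, 0.999]), 6))  # 0.998001
print(round(availability_in_parallel([0.999, 0.999]), 6))  # 0.999999
```

Chaining components in sequence lowers total availability, while redundant components in parallel raise it.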
2. DNS
hierarchical & caching
EX = CloudFlare and Route 53
CONS = [1] delay (mitigated by caching), [2] complex to manage (generally run by ISPs, governments, or large companies), [3] vulnerable to DDoS attacks
3. CDN
static files = HTML/CSS/JS, photos
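improve performance = [1] users receive content from data centers close to them, [2] your own servers do not have to serve the requests that the CDN fulfills
CONS: [1] cost, [2] content may be stale if it is updated before its TTL expires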
4. Load Balancer
[1] receive a request from the client [2] pick a worker and forward the request [3] wait for the response [4] forward the response to the client
EX = HAProxy
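The request flow in this section can be sketched as follows. This is an illustration only: it assumes round-robin as the picking rule (the notes do not name one) and models workers as plain in-process functions rather than network servers. `RoundRobinBalancer` and `make_worker` are hypothetical names.

```python
from itertools import cycle
from typing import Callable

Worker = Callable[[str], str]  # a worker takes a request and returns a response


class RoundRobinBalancer:
    def __init__(self, workers: list[Worker]) -> None:
        if not workers:
            raise ValueError("at least one worker is required")
        self._next_worker = cycle(workers)  # endless rotation over the workers

    def handle(self, request: str) -> str:
        worker = next(self._next_worker)  # [2] pick a worker
        response = worker(request)        # [2]-[3] forward the request and wait
        return response                   # [4] forward the response to the client


def make_worker(name: str) -> Worker:
    """Build a stand-in worker that tags each response with its own name."""
    return lambda request: f"{name} handled {request}"


balancer = RoundRobinBalancer([make_worker("w1"), make_worker("w2")])
print([balancer.handle(f"req{i}") for i in range(3)])
# ['w1 handled req0', 'w2 handled req1', 'w1 handled req2']
```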
5. Reverse proxy
5.1 Load balancer vs reverse proxy
load balancer = useful when there are multiple servers
reverse proxy = useful even with a single server
6. Application layer
CONS: complexity
7. Database
7.1 RDBMS
PROS of federation & sharding = [1] less read and write traffic [2] more cache hits [3] reduce index size
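A minimal sketch of how a request might be routed to a shard, assuming hash-modulo routing on a string key. The notes do not say which sharding scheme is meant, so this is only one possible choice, and `shard_for` is a hypothetical helper.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) with a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # sha256 is stable across processes, unlike Python's built-in hash() for str.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# The same key always lands on the same shard, so each shard holds
# only a slice of the data and of the index.
print(shard_for("user:42", 4) == shard_for("user:42", 4))  # True
```

With this scheme, changing `shard_count` remaps most keys, which is one reason real systems often prefer other routing schemes.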
7.2 NoSQL
8. Cache
cache update strategy
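As one example of a cache update strategy, here is a sketch of cache-aside (lazy loading). The notes do not say which strategies are meant, so picking cache-aside is an assumption. Two in-memory dicts stand in for the cache and the database, and `CacheAside` is a hypothetical class.

```python
class CacheAside:
    def __init__(self, database: dict[str, str]) -> None:
        self._database = database          # stand-in for the real data store
        self._cache: dict[str, str] = {}   # stand-in for Redis or Memcached
        self.db_reads = 0                  # counts cache misses, for illustration

    def get(self, key: str) -> str:
        if key in self._cache:             # cache hit: skip the database
            return self._cache[key]
        self.db_reads += 1                 # cache miss: load from the database
        value = self._database[key]
        self._cache[key] = value           # populate the cache for later reads
        return value

    def update(self, key: str, value: str) -> None:
        self._database[key] = value        # write to the database first
        self._cache.pop(key, None)         # then invalidate the cached copy


store = CacheAside({"greeting": "hello"})
store.get("greeting")
store.get("greeting")
print(store.db_reads)  # 1
```

Only the first read reaches the database. After `update`, the next read misses the cache and reloads the fresh value.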
8.1 MongoDB vs Redis vs Memcached
http://stackoverflow.com/questions/7888880/what-is-redis-and-what-do-i-use-it-for
When to use Redis: when your use case maps onto Redis and the data set is not at risk of outgrowing RAM.
MongoDB = disk-based document store.
Redis = key-value store + built-in persistence.
Memcached = Redis - built-in persistence.
Persistence to disk = you can use Redis as a real DB instead of a volatile cache. The data won't disappear on restart, as it does with Memcached.
9. Asynchronism
Reduce request times for expensive operations
The user is not blocked, and the job is processed in the background.
Back pressure: limiting the queue size
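A minimal sketch of back pressure with a bounded queue, using the standard-library `queue.Queue`. The `submit` helper and the queue size of 2 are assumptions for illustration. In a web service the rejected case would typically become a "server busy" response that asks the client to retry later.

```python
import queue


def submit(jobs: queue.Queue, job: str) -> bool:
    """Try to enqueue a job without blocking; report whether it was accepted."""
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        # Back pressure: the queue is at its limit, so the caller is told
        # to retry later instead of letting the backlog grow without bound.
        return False


jobs: queue.Queue = queue.Queue(maxsize=2)  # hypothetical limit of 2 pending jobs
print([submit(jobs, f"job{i}") for i in range(3)])  # [True, True, False]
```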
10. Communication
10.1 HTTP
10.2 REST & RPC
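|                | RPC                                   | REST                                                      |
|----------------|---------------------------------------|-----------------------------------------------------------|
|                | request-response                      | request-response                                          |
| API            | private                               | public                                                    |
| syntax         | customized                            | general                                                   |
| implementation | tight coupling                        | minimizes coupling; stateless and cacheable               |
| PROS           | better performance, customized syntax | general purpose, less coupled                             |
| CONS           | difficult to debug, tightly coupled   | few verbs only, difficult to name for nested hierarchies  |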
10.3 TCP & UDP
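|                      | TCP/IP                              | UDP                                                             |
|----------------------|-------------------------------------|-----------------------------------------------------------------|
|                      | connection-oriented                 | connectionless                                                  |
| order                | received in order                   | order not guaranteed                                            |
|                      | IP, handshake = ack + checksum      | broadcast to subnet (used by DHCP, since there is no IP yet)    |
| no correct response  | [1] resend the packets [2] timeouts | X                                                               |
| data loss vs latency | intact data, less time critical     | data may be lost, real-time                                     |
| example              | web servers, database, and SSH      | video chat                                                      |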
11. Security
Appendix
A1. power of 2
A2. metrics
A3. per seconds
2.5M seconds per month
400 requests per second = 1 billion requests per month
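The two rules of thumb above can be checked with a few lines of arithmetic, assuming a 30-day month. `SECONDS_PER_MONTH` and `requests_per_month` are hypothetical names.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def requests_per_month(requests_per_second: int) -> int:
    """Convert a steady per-second request rate into a monthly total."""
    return requests_per_second * SECONDS_PER_MONTH


print(SECONDS_PER_MONTH)        # 2592000, roughly 2.5 million
print(requests_per_month(400))  # 1036800000, roughly 1 billion
```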
A4. varchar & char