SYSTEM DESIGN - Question

System Design

Harvard YouTube course

https://www.youtube.com/watch?v=-W9F__D3oY4

1. Trade-offs

Performance vs scalability
Latency vs throughput
Availability vs consistency

1.1. Performance vs scalability

scalable = performance increases in proportion to the resources added.

  • performance problem = the system is slow for a single user.

  • scalability problem = the system is fast for a single user but slow under heavy load.

1.2. Latency vs throughput

  • Latency = time to perform an action or produce a result

  • Throughput = number of actions or results per unit of time (e.g. requests / second)

Aim = maximal throughput with acceptable latency.
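A small Python sketch of the two definitions, assuming request start/end timestamps have already been collected. The `Request` record and the 95th-percentile latency budget are illustrative choices, not from the source. Latency is measured per request while throughput is measured over the whole window, which is why one can improve while the other gets worse.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    start: float  # seconds since the start of the measurement
    end: float    # seconds since the start of the measurement


def latencies(requests: List[Request]) -> List[float]:
    # Latency: how long each individual request took
    return [r.end - r.start for r in requests]


def throughput(requests: List[Request]) -> float:
    # Throughput: completed requests per second over the observed window
    if not requests:
        return 0.0
    window = max(r.end for r in requests) - min(r.start for r in requests)
    return len(requests) / window if window > 0 else float("inf")


def percentile(values: List[float], p: float) -> float:
    # Nearest-rank percentile, 0 < p <= 100
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_latency_budget(requests: List[Request], budget_s: float, p: float = 95) -> bool:
    # "Acceptable latency" expressed as a percentile target (an assumption)
    return percentile(latencies(requests), p) <= budget_s
```

Using a percentile rather than the mean keeps a few very slow requests from hiding behind many fast ones.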

1.3. Availability vs consistency

  • Consistency = every read receives [1] the latest write or [2] an error

  • Availability = [1] every request receives a response, [2] with no guarantee that it contains the latest write

  • Partition Tolerance = [1] the system continues to operate [2] despite network partitions (messages lost or delayed between nodes)
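A toy Python model of the choice a replica faces during a partition. `Replica` and `StaleReadError` are hypothetical names, not any real database's API, and the model assumes one replica that simply stops receiving writes while partitioned. In "CP" mode it returns an error rather than a possibly stale value; in "AP" mode it always answers, possibly with stale data.

```python
class StaleReadError(Exception):
    """Raised by a CP replica that cannot confirm it has the latest write."""


class Replica:
    def __init__(self, mode: str):
        # mode is "CP" (prefer consistency) or "AP" (prefer availability)
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True while cut off from the primary

    def apply_write(self, value) -> None:
        # Replicated writes only arrive while the network is healthy
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency: refuse rather than risk returning a stale value
            raise StaleReadError("partitioned: cannot guarantee the latest write")
        # Availability: always answer, possibly with stale data
        return self.value
```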

1.3.1 Consistency patterns

1.3.2 Availability patterns

Fail-over has 2 CONS: [1] more hardware and added complexity, [2] potential loss of data if the active system fails before newly written data can be replicated to the passive one

1.3.3 Availability in parallel vs in sequence

Overall availability decreases when components prone to failure are in sequence and increases when they are in parallel.
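A quick way to check the arithmetic in Python, assuming component failures are independent: in sequence the total is the product of the availabilities; in parallel the system is down only when every component is down.

```python
import math


def in_sequence(*availabilities: float) -> float:
    # Every component must be up: multiply the availabilities
    return math.prod(availabilities)


def in_parallel(*availabilities: float) -> float:
    # Down only if every component is down: 1 - product of unavailabilities
    return 1.0 - math.prod(1.0 - a for a in availabilities)
```

For example, two 99.9% components give about 99.8% in sequence but about 99.9999% in parallel.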

2. DNS

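  • hierarchical (a few authoritative servers at the top level) & cached (by lower-level DNS servers, the OS, and the browser, for a time set by the TTL)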

EX = Cloudflare and Route 53

CONS = [1] slight delay on each lookup (mitigated by caching), [2] complex to manage (generally managed by governments, ISPs, and large companies), [3] DNS servers can be targeted by DDoS attacks
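A minimal Python sketch of the caching half of DNS. It assumes a hypothetical `upstream` function that stands in for the rest of the hierarchy (root, TLD, and authoritative servers) and returns an IP address plus a TTL; `CachingResolver` is likewise an illustrative name. Repeat lookups are answered locally until the TTL expires, which is how caching hides most of the lookup delay.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical upstream lookup: name -> (ip_address, ttl_seconds)
Upstream = Callable[[str], Tuple[str, float]]


class CachingResolver:
    def __init__(self, upstream: Upstream, clock: Callable[[], float] = time.monotonic):
        self._upstream = upstream
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]              # hit: no extra lookup delay
        ip, ttl = self._upstream(name)    # miss or expired: ask upstream
        self._cache[name] = (ip, now + ttl)
        return ip
```

The flip side is staleness: if the record changes upstream, this cache keeps returning the old address until the TTL runs out.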

3. CDN

  • static files = HTML/CSS/JS, photos

  • improve performance = [1] users receive content from data centers close to them, [2] the CDN's servers, not yours, serve the requests it fulfills

CONS: [1] cost, [2] content might be stale if it is updated before the TTL expires

4. Load Balancer

[1] receive the request from the client [2] pick a worker and forward the request to it [3] wait for the response [4] forward the response to the client

EX = HAProxy
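The four steps can be sketched as a minimal round-robin balancer in Python. Workers here are plain functions (hypothetical stand-ins for backend servers), and round-robin is only one possible selection policy; production balancers such as HAProxy also add health checks, timeouts, and other policies.

```python
import itertools
from typing import Callable, List

Handler = Callable[[str], str]  # a worker: request -> response


class RoundRobinBalancer:
    def __init__(self, workers: List[Handler]):
        if not workers:
            raise ValueError("need at least one worker")
        self._next_worker = itertools.cycle(workers)

    def handle(self, request: str) -> str:
        # [1] receive: the client calls handle() with its request
        worker = next(self._next_worker)   # [2] pick a worker
        response = worker(request)         # [3] forward and wait for the response
        return response                    # [4] return the response to the client
```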

5. Reverse proxy

5.1 Load balancer vs reverse proxy

load balancer = useful when you have multiple servers

reverse proxy = useful even with a single server

6. Application layer

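CONS: added complexity (loosely coupled services need a different architecture, operations, and process approach than a monolith)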

7. Database

7.1 RDBMS

PROS of federation & sharding = [1] less read and write traffic per database [2] more cache hits [3] smaller indexes (faster queries)
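A minimal Python sketch of hash-based sharding, assuming a string key (e.g. a user ID) and in-memory dicts standing in for separate database servers. `ShardedStore` and `shard_for` are hypothetical names. Each shard holds only a fraction of the rows, which is where the lighter traffic and smaller indexes per database come from.

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    # Stable hash: the built-in hash() of a str is randomized per process
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Each dict stands in for a separate database server (hypothetical)."""

    def __init__(self, num_shards: int):
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Plain `hash % N` moves most keys to a different shard when N changes; consistent hashing is a common way to reduce that reshuffling.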

7.2 NoSQL

8. Cache

cache update strategy
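One common cache update strategy is cache-aside (lazy loading): the application reads from the cache, falls back to the database on a miss, then populates the cache. A minimal Python sketch, assuming a hypothetical `load_from_db` function and a plain dict as the cache (`CacheAside` is an illustrative name):

```python
from typing import Callable, Dict, Optional


class CacheAside:
    def __init__(self, load_from_db: Callable[[str], Optional[str]]):
        self.cache: Dict[str, str] = {}
        self.load_from_db = load_from_db  # hypothetical database accessor

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]        # cache hit
        value = self.load_from_db(key)    # cache miss: read from the database
        if value is not None:
            self.cache[key] = value       # populate the cache for next time
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing to the database so the next read reloads fresh data
        self.cache.pop(key, None)
```

Only requested data ends up in the cache, but every miss costs an extra round trip, and data can go stale if a database write is not followed by an invalidation.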

8.1 MongoDB vs Redis vs Memcached

http://stackoverflow.com/questions/7888880/what-is-redis-and-what-do-i-use-it-for

When to use Redis: if you can map a use case to Redis and you aren't at risk of running out of RAM.

  • MongoDB = document store (JSON-like documents) + disk-based storage.

  • Redis = in-memory key-value store + built-in persistence to disk.

  • Memcached = Redis - built-in persistence (in-memory only).

Persistence to disk = you can use Redis as a real DB instead of a volatile cache: the data won't disappear when you restart, unlike with Memcached.

9. Asynchronism

  • Reduce request times for expensive operations

  • The user is not blocked, and the job is processed in the background.

Back pressure: limit the queue size; when the queue is full, clients get a server-busy response (e.g. HTTP 503) and retry later.
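A minimal Python sketch of both ideas, using the standard library's `queue.Queue` with a `maxsize` for back pressure and one background thread as the worker. `JobQueue`, `submit`, and the 202/503 return codes are illustrative assumptions (mirroring HTTP Accepted and Service Unavailable), not a specific framework's API.

```python
import queue
import threading
from typing import Any, Callable


class JobQueue:
    """Bounded job queue processed by a background worker thread."""

    def __init__(self, handler: Callable[[Any], None], max_size: int):
        self._q: queue.Queue = queue.Queue(maxsize=max_size)  # bounded = back pressure
        self._handler = handler
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._worker.start()

    def submit(self, job: Any) -> int:
        # Never blocks the caller; returns an HTTP-like status code
        try:
            self._q.put_nowait(job)
            return 202  # accepted: will be processed in the background
        except queue.Full:
            return 503  # server busy: client should retry later

    def wait_until_idle(self) -> None:
        self._q.join()  # blocks until every queued job is marked done

    def _run(self) -> None:
        while True:
            job = self._q.get()
            try:
                self._handler(job)  # the expensive operation
            finally:
                self._q.task_done()
```

Rejecting work when the queue is full keeps it from growing without bound; clients can retry, for example with exponential backoff.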

10. Communication

10.1 HTTP

10.2 REST & RPC

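|                | RPC                                   | REST                                                  |
|----------------|---------------------------------------|-------------------------------------------------------|
| style          | request-response                      | request-response                                      |
| API            | private                               | public                                                |
| syntax         | customized                            | general                                               |
| implementation | tight coupling                        | minimizes coupling; stateless and cacheable           |
| PROS           | better performance, customized syntax | general purpose, less coupling                        |
| CONS           | difficult to debug, tightly coupled   | few verbs only, hard to name resources in nested hierarchies |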

10.3 TCP & UDP

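|                      | TCP/IP                                                      | UDP                                                              |
|----------------------|-------------------------------------------------------------|------------------------------------------------------------------|
| connection           | connection-oriented                                         | connectionless                                                   |
| order                | received in order                                           | order not guaranteed                                             |
| how it works         | over IP; handshake; ACKs + checksums                        | can broadcast to the subnet (used by DHCP, since the client has no IP yet) |
| no correct response  | [1] resend the packets [2] drop the connection after multiple timeouts | X (no retransmission)                                 |
| data loss vs latency | intact data, less time-critical                             | data may be lost, real-time                                      |
| example              | web servers, databases, SSH                                 | video chat                                                       |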
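A minimal Python sketch of the connection difference, using only the standard `socket` module on the loopback interface (the function names are hypothetical). The UDP side just sends a datagram to an address; the TCP side must `connect()`/`accept()` first, and the kernel handles ordering, acknowledgements, and retransmission for it.

```python
import socket
import threading

HOST = "127.0.0.1"  # loopback only, so the example needs no real network


def udp_send_and_receive(message: bytes) -> bytes:
    # Connectionless: no handshake, sendto() fires one datagram at an address
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind((HOST, 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)  # no delivery guarantee, so never wait forever
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(65535)
        return data


def tcp_echo(message: bytes) -> bytes:
    # Connection-oriented: connect()/accept() complete a handshake before data flows
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((HOST, 0))
        server.listen(1)

        def serve_once() -> None:
            conn, _addr = server.accept()
            with conn:
                chunks = []
                while chunk := conn.recv(4096):  # read until the client half-closes
                    chunks.append(chunk)
                conn.sendall(b"".join(chunks))   # echo back; TCP delivers it in order

        worker = threading.Thread(target=serve_once)
        worker.start()
        with socket.create_connection(server.getsockname(), timeout=2.0) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)      # tell the server no more data is coming
            received = b""
            while chunk := client.recv(4096):
                received += chunk
        worker.join()
        return received
```

On loopback, UDP datagrams are practically never lost, so this does not demonstrate loss; it only shows the connectionless vs connection-oriented shape of the two APIs.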

11. Security

Appendix

A1. power of 2

A2. metrics

A3. Requests per second

~2.5M seconds per month (30 days x 86,400 s = 2,592,000 s, rounded down for easy math)

400 requests per second ~ 1 billion requests per month
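The same back-of-the-envelope conversion in Python, using the rounded 2.5M seconds per month (a deliberate approximation, so the results are estimates; the function names are illustrative).

```python
SECONDS_PER_MONTH = 2.5e6  # rounded; 30 days * 86,400 s = 2,592,000 s


def per_second_to_per_month(rate_per_second: float) -> float:
    return rate_per_second * SECONDS_PER_MONTH


def per_month_to_per_second(count_per_month: float) -> float:
    return count_per_month / SECONDS_PER_MONTH
```

For example, 400 requests per second gives 1e9 requests per month, and 100 million requests per month is about 40 requests per second.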

A4. varchar & char
