# SYSTEM DESIGN - EX2

## 1. Mint.com

### 1.1. Requirement & scope

```
// Use cases
User: connects to a financial account
Service: extracts transactions from the account
Service: recommends a budget
Service: high availability

// Out of scope
Service performs additional logging and analytics
```

```
// Assumptions
Not even traffic
Update for active accounts (30 days)
Asynchronous budget notifications

10M users
30M financial accounts
100M budget items = 10 budget categories per user
50K sellers (determine transaction category)
5b transactions / month
500M read / month
10:1 write to read ratio (write-heavy)
```

### 1.2. Calculation

| 8B user\_id | 5B created\_at | 32B seller | 5B amount |
| ----------- | -------------- | ---------- | --------- |

Total size = \~50B

```
250G transaction content per month = 50B * 5b
9 TB content in 3 years = 250G * 36
2,000 transactions / second
200 read / second
```

### 1.3. Design

| ![](/files/-Lq83UMoi6h1NxQwfA9X) | ![](/files/-Lq83UMrw1Mt3BG3WInp) |
| -------------------------------- | -------------------------------- |

### 1.4. Core

#### 1.4.1. Use case: User connects to a financial account

```
10M User infor in RDBMS

Client -> Web Server
1. -> Accounts API server
        (1) update newly entered account info
```

```
id int NOT NULL AUTO_INCREMENT
created_at datetime NOT NULL
modified_at datetime NOT NULL

user_id int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)

index id, user_id, created_at // O(logn) instead of O(n)
```

```
$ curl --request POST 
    --data '{ "user_id": "foo", "account_url": "bar", "account_login": "baz", "account_password": "qux" }'
    https://mint.com/api/v1/account
```

#### 1.4.2. Use case: Service extracts transactions from the account

```
user first links the account / manually refreshes / daily automatic for active users

Client -> Web Server -> Accounts API server -> Queue (Amazon SQS or RabbitMQ) + async
1. ->  Transaction Extraction Service
        (1) pull from queue & extracts transactions
        (2) save raw log files in the Object Store
2. -> Category Service
        (1) categorize each transaction
        (2) update SQL transactions table
3. -> Budget Service
        (1) calculate aggregate monthly spending by category
        (2) update SQL monthly_spending table
        (2) -> Notification Service: exceeded budget -> send notification (Queue)
```

```
// transaction table
id int NOT NULL AUTO_INCREMENT
created_at datetime NOT NULL
seller varchar(32) NOT NULL
amount decimal NOT NULL
user_id int NOT NULL

PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```

```
// monthly_spending table
id int NOT NULL AUTO_INCREMENT
month_year date NOT NULL
category varchar(32)
amount decimal NOT NULL
user_id int NOT NULL

PRIMARY KEY(id)
FOREIGN KEY(user_id) REFERENCES users(id)
```

**Category service**

seller-to-category dictionary = 50,000 sellers (each < 255 bytes) = 12MB

#### 1.4.3. Use case: Service recommends a budget

```
budget template = based on income tiers
customed budget = budget_overrides table
[1] SOLUTION-1: SQL queries on transactions table -> monthly_spending aggregate table (less rows)
[2] SOLUTION-2: MapReduce [(user, category, month) amount]
[3] analyses on transaction files could significantly reduce the load on the database.
```

### 1.5. Scale

#### 1.5.1 Recent transactions in Cache

```
Client -> Web Server -> Read API 
    -> Object Store (S3, CDN) static content
    -> Cache content or SQL
```

```
[1] Analytics database = monthly_spending aggregate table SQL DB -> Amazon Redshift or Google BigQuery.
[2] Keep monthly sum data & all content in S3

[3] 2,000 read requests /s => Memory Cache keep popular content + SQL read replica for cache misses
[4] 200 writes / second = tough for single SQL Write Master-Slave => SQL scaling pattern
```

## 2. Data structures for social network

### 2.1. Requirement & scope

```
// Use cases
User: searches for someone & sees the shortest path to it.
Service: high availability

// Assumptions
Not even traffic
Some searches are popular.
Graph data won't fit on single machine; unweighted Graph
```

### 2.2. Calculation

```
100M users (50 friends / user)
5b relationships = 100M * 50 friends
1b friend searches / month
400 search requests per second
```

### 2.3. Design

| ![](/files/-Lq83UMxB-tIioxtAupK) | ![](/files/-Lq83UMzKq4-Cz5w4HOq) |
| -------------------------------- | -------------------------------- |

### 2.4. Core

#### 2.4.1 User searches for someone & the shortest path.

BFS: solve unweighted shortest path

shard users @ user DB

```
Client -> Web Server -> Search API server
    -> User Graph Service
        -> Lookup Service = 
            (1) find the current user's info @ Person Server
            (2) current user's friend_ids
            (3) BFS: (source = current user), (adjacent_node = current user's friend_ids)
                loop adjacent_node to get the target
```

### 2.5. Scale

400 read / second = \[1] person data @ Cache \[2] relation @ Cache \[3] BFS in Cache

```
// BFS improvement
(1) offline process & save partial BFS traversals in NoSQL
(2) reduce machine jumps = Shard Person Servers by geo-location
(3) 2 BFS searches at the same time => source & target => merge paths
(4) BFS search from people with many friends => likely to reduce degrees of separation
```

A search limit => asking the user if they want to continue => search or stop

Graph DB: Neo4j or GraphQL language

## 3. key-value cache to save the results of the most recent web server queries

### 3.1. Requirement & scope

```
// use cases
User: request resulting in a cache hit
User: request resulting in a cache miss
Service: high availability

// Assumptions
Not even traffic
Fast lookups in Cache
Limited cache memory
Determine keep/remove
Determine expire/refresh
```

### 3.2. Calculation

```
10M users
10b query / month

2.7TB cache / month = 270B * 10b
4000 query / second
```

| Key       | Value     |              |
| --------- | --------- | ------------ |
| 50B query | 20B title | 200B snippet |

Total size = \~270B

### 3.3. Design

| ![](/files/-Lq83UN0E6H1PWjoRM13) | ![](/files/-Lq83UN26gSbsVdjyj-J) |
| -------------------------------- | -------------------------------- |

### 3.4. Core

#### 3.4.1 User request -> cache hit

reduce read latency & avoid overloading DB

least recently used (LRU) to expire older entries.

```
Client -> Web Server -> Query API server
    -> PRBFN -> boolean fingerprint -> chech Cache for matching
        -> hit -> update entry's position to the front(LRU) -> return
        -> miss
            -> Reverse Index Service query = return top ones
            -> Document Service = return titles and snippets
            -> update cache & place at front
```

```
Cache implementation
    -> doubly-linked list
    -> added to the head
    -> expired from the tail
    -> hash table = fast lookups
```

When to update the cache (1) contents change (2) added / removed page (3) rank changes => TTL

### 3.5. Scale

```
Expanding Cache to many machines (1) handle heavy request (2) great memory needed
    -> Each machine = own cach => low cache hit rate
    -> Each machine = copy cach => efficient memory usage
    -> sharding = use hashing to find cache location
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://huang-jason.gitbook.io/complat-software-training-101/system-design-ex2.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
