CP = [1] atomic reads & writes [2] waiting for a response from the partitioned node may result in a timeout error
AP = [1] eventual consistency [2] responses return the most readily available data, which may not be the latest
Weak consistency = [1] reads may not see the latest write; data may be lost [2] e.g. video chat
Eventual consistency = [1] data replicated asynchronously; reads see it eventually [2] e.g. email
Strong consistency = [1] data replicated synchronously; reads always see the latest write [2] e.g. RDBMSes (replication sketch below)
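A minimal sketch of the synchronous vs. asynchronous replication behind strong and eventual consistency, using plain Python dicts as stand-in replicas (all names and delays here are hypothetical):

```python
import queue
import threading
import time

primary, replicas = {}, [{}, {}]     # toy primary plus two replicas
replication_log = queue.Queue()

def write_strong(key, value):
    """Strong consistency: replicate synchronously before acking the write."""
    primary[key] = value
    for r in replicas:
        r[key] = value               # every replica sees the write before we return
    return "ack"

def write_eventual(key, value):
    """Eventual consistency: ack immediately, replicate in the background."""
    primary[key] = value
    replication_log.put((key, value))
    return "ack"

def replicator():
    while True:
        key, value = replication_log.get()
        time.sleep(0.1)              # simulated replication lag
        for r in replicas:
            r[key] = value

threading.Thread(target=replicator, daemon=True).start()

write_strong("plan", "pro")
print(replicas[0]["plan"])           # 'pro' immediately: replicated before the ack

write_eventual("balance", 100)
print(replicas[0].get("balance"))    # likely None: replicas lag behind
time.sleep(0.3)
print(replicas[0].get("balance"))    # 100 once replication catches up
```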
Load balancer distribution methods (routing sketch after this list):
1. Random
2. Round Robin
3. Least loaded
4. Session/cookies/request params
5. Layer 4 = ip (faster)
6. Layer 7 = content (direct video traffic to video servers)
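A minimal sketch of the first four distribution methods (backend names and the connection counter are hypothetical; a real LB tracks live connection counts):

```python
import hashlib
import itertools
import random
from collections import Counter

SERVERS = ["app1:8080", "app2:8080", "app3:8080"]   # hypothetical backends
rr = itertools.cycle(SERVERS)                        # round-robin iterator
active = Counter()                                   # open connections per server

def pick_random():
    return random.choice(SERVERS)

def pick_round_robin():
    return next(rr)

def pick_least_loaded():
    return min(SERVERS, key=lambda s: active[s])

def pick_sticky(session_id):
    # Session/cookie affinity: hash the session so a client keeps hitting one server.
    digest = hashlib.md5(session_id.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print(pick_round_robin(), pick_round_robin())   # app1:8080 app2:8080
print(pick_least_loaded())                      # server with fewest open connections
print(pick_sticky("user-42"))                   # same server on every call
```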
Reverse proxy = a web server that [1] centralizes internal services [2] provides unified public interfaces (sketch after the CONS list)
PROS:
1. Increased security (Hide backend information & IPs)
2. Increased scalability (Clients only see the reverse proxy's IP)
3. SSL termination
4. Caching
5. Serve Static content
CONS:
1. complexity
2. SPOF
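A toy reverse-proxy sketch using only the standard library, forwarding GET requests to a hypothetical internal backend (addresses and ports are made up):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BACKEND = "http://127.0.0.1:9000"   # hypothetical internal app server, never exposed

class ReverseProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Clients only see this proxy's address; the backend IP stays hidden.
        with urlopen(BACKEND + self.path) as upstream:
            status, body = upstream.status, upstream.read()
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # A single proxy instance is itself a SPOF, per the CONS above.
    HTTPServer(("0.0.0.0", 8080), ReverseProxy).serve_forever()
```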
Microservices = independently deployable, small services
Service Discovery = [1] Consul, Zookeeper [2] helps services find each other by keeping track of registered names and addresses (registry sketch below)
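A toy in-memory registry in the spirit of Consul/Zookeeper, not their actual APIs; the service name, address, and TTL below are made up:

```python
import time

_registry = {}   # service name -> (address, last_heartbeat)
TTL = 10         # seconds without a heartbeat before an instance is considered dead

def register(name, address):
    _registry[name] = (address, time.time())

def heartbeat(name):
    address, _ = _registry[name]
    _registry[name] = (address, time.time())

def lookup(name):
    address, seen = _registry.get(name, (None, 0.0))
    return address if time.time() - seen < TTL else None

register("payments", "10.0.0.7:8443")   # hypothetical service instance
print(lookup("payments"))                # '10.0.0.7:8443' while heartbeats stay fresh
```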
ACID RDBMS transactions (atomicity sketch after this list):
1. Atomicity = transaction is all or nothing
2. Consistency = transaction brings DB from one state to another
3. Isolation = executing transactions concurrently has the same result as executing them serially
4. Durability = Once committed, it will remain so
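A small sqlite3 sketch of atomicity: the whole transfer rolls back when the transaction fails partway (table and amounts are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 70 WHERE name = 'bob'")
        raise RuntimeError("crash before commit")   # simulate a failure mid-transaction
except RuntimeError:
    pass

# All or nothing: neither UPDATE is visible, alice still has 100.
print(conn.execute("SELECT name, balance FROM accounts").fetchall())
```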
Scale RDBMS
1. master-slave (replicate writes to additional read slaves) (CONS: [1] added logic to promote a slave if the master fails [2] see CONS replication)
2. master-master (CONS: [1] added logic, [2] see CONS replication, [3] needs LBs, [4] sync & conflict resolution)
3. federation (split DBs by function) (CONS: complex joins & added logic)
4. sharding (split data across DBs by key range, e.g. users by last name initial) (CONS: complex joins & added logic) (sharding sketch after this list)
5. denormalization (redundant copies across tables to avoid expensive joins) (CONS: [1] keeping duplicates in sync, [2] more write load)
6. SQL tuning (benchmark and profile)
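A minimal sharding sketch: route each user to one of several databases by hashing the shard key (shard names are hypothetical):

```python
import hashlib

SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]   # hypothetical DBs

def shard_for(user_id: str) -> str:
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("alice"))   # every read/write for 'alice' routes to the same DB
print(shard_for("bob"))     # may land on a different shard
# The CONS shows up here: joining users that live on different shards
# requires querying multiple DBs and merging in application code.
```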
CONS replication
1. potential data loss if the master fails before new writes are replicated
2. more slaves -> more replication lag
3. slaves can get bogged down replaying writes and serve fewer reads
4. complexity
BASE
1. Basically available = guarantees availability.
2. Soft state = DB state may change over time, even without input.
3. Eventual consistency = data becomes consistent asynchronously, over time
NoSQL types (document-store sketch after this list):
1. Key-value store = O(1) reads & writes by key; keys can be kept in lexicographic order
2. Document store = JSON, MongoDB (collections of key-value)
3. Wide column store = ColumnFamily (Bigtable, HBase, Cassandra)
4. Graph database = complex relationships, such as a social network
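A minimal document-store sketch, assuming a local MongoDB instance and the pymongo driver (database, collection, and document are made up):

```python
from pymongo import MongoClient   # assumes MongoDB is running on the default local port

client = MongoClient("mongodb://localhost:27017")
users = client["app"]["users"]    # hypothetical database and collection names

# Documents are schemaless, JSON-like objects; fields can differ per document.
users.insert_one({"name": "ada", "langs": ["python", "c"], "cart": {"items": 2}})
print(users.find_one({"name": "ada"}))
```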
Reasons for SQL:
1. Structured data
2. Strict schema
3. complex joins
4. Transactions
Reasons for NoSQL:
1. Semi-structured data
2. flexible schema
3. big data
4. Very data intensive / high throughput workload
Well-suited for NoSQL:
1. log data
2. Temporary data, such as a shopping cart
3. Frequently accessed ('hot') tables
Caching = [1] look up the cache [2] if found, return [3] otherwise go to the DB, save to the cache, and return
CONS cache: [1] keeping the cache consistent with the DB [2] deciding when to expire/invalidate entries [3] added application logic
Cache-aside = cache <- client -> DB (sketch after this block)
STEP: [1] look in the cache [2] on a miss, read from the DB [3] save to the cache and return
CONS: [1] three trips on a cache miss [2] stale data if the DB is updated directly [3] a newly added cache node starts cold (empty)
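A cache-aside sketch with a dict standing in for Memcached/Redis and sqlite3 standing in for the DB (table and data are made up):

```python
import sqlite3

cache = {}   # stand-in for Memcached/Redis
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")

def get_user(user_id):
    if user_id in cache:                 # [1] look in the cache
        return cache[user_id]
    row = db.execute(                    # [2] miss: read from the DB
        "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    cache[user_id] = row                 # [3] save to the cache and return
    return row

print(get_user(1))   # miss: cache + DB + cache = three trips
print(get_user(1))   # hit: served from the cache; goes stale if the DB row changes
```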
Write-through = client -> cache -> DB (sketch after this block)
STEP: [1] client reads & writes through the cache [2] cache writes synchronously to the DB [3] return
CONS: [1] slow writes, fast reads [2] a newly added cache node starts cold (empty)
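A write-through sketch in the same spirit: every write goes to the cache and synchronously on to the DB before returning (names are made up):

```python
import sqlite3

cache = {}
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def save_user(user_id, name):
    cache[user_id] = name                     # [1] write to the cache
    with db:                                  # [2] synchronous write to the DB
        db.execute("INSERT OR REPLACE INTO users VALUES (?, ?)", (user_id, name))
    return "ok"                               # [3] return only after both succeed

def get_user(user_id):
    return cache.get(user_id)                 # reads hit the cache, which mirrors the DB

save_user(1, "alice")
print(get_user(1))   # never stale, but every write pays the DB round trip
```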
Write-behind = client -> cache ~~> DB
STEP: [1] client reads & writes to the cache [2] cache writes asynchronously to the DB & returns immediately
CONS: [1] data loss if the cache fails before the write reaches the DB [2] more complexity
Refresh-ahead = automatically refresh hot entries before they expire
CONS: [1] difficult to predict which entries will be needed; mispredictions hurt performance
Message queues = RabbitMQ
Task queues = Celery; run slow work in background workers (sketch below)
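A minimal Celery sketch, assuming a Redis broker on its default local URL (task name and module layout are hypothetical):

```python
# tasks.py
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")   # assumed broker URL

@app.task
def send_welcome_email(user_id):
    # Slow work runs in a worker process, not inside the web request.
    print(f"emailing user {user_id}")

# From the web app: enqueue and return immediately.
#   send_welcome_email.delay(42)
# Start a worker with:
#   celery -A tasks worker
```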
HTTP = [1] a method for encoding and transporting data between a client and a server [2] a request/response protocol [3] a request = verb (method) + resource (endpoint) [4] an application-layer (layer 7) protocol on top of TCP/IP (example below)
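A request example with the standard library, showing the verb + resource shape (httpbin.org is just a placeholder host):

```python
from http.client import HTTPSConnection

conn = HTTPSConnection("httpbin.org")   # placeholder public echo service
conn.request("GET", "/get", headers={"Accept": "application/json"})   # verb + resource
resp = conn.getresponse()
print(resp.status, resp.reason)         # e.g. 200 OK
print(resp.read()[:80])                 # start of the response body
conn.close()
```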
1. XSS & SQL injection = sanitize all user inputs
2. SQL injection = use parameterized queries (sketch below)
3. Principle of least privilege
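A parameterized-query sketch with sqlite3 (table, data, and the hostile input are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, role TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"   # hostile input

# Vulnerable: string concatenation lets the input rewrite the query.
#   db.execute("SELECT * FROM users WHERE name = '" + user_input + "'")

# Safe: the parameter is bound as data and never parsed as SQL.
rows = db.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)   # [] - the injection string matches no user
```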
Power  Exact Value         Approx Value  Bytes
-----------------------------------------------
8      256
10     1,024               1 thousand    1 KB
16     65,536                            64 KB
20     1,048,576           1 million     1 MB
30     1,073,741,824       1 billion     1 GB
40     1,099,511,627,776   1 trillion    1 TB
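The table is mainly for back-of-envelope estimates; a quick worked example with made-up workload numbers:

```python
users = 10_000_000        # ~10 million users (hypothetical)
bytes_per_user = 1_024    # ~1 KB of profile data each (hypothetical)

total = users * bytes_per_user
print(f"{total:,} bytes")            # 10,240,000,000 bytes
print(f"{total / 2**30:.1f} GiB")    # ~9.5 GiB, i.e. roughly 10 GB of storage
```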
Read sequentially from disk at 40 MB/s (100x slower than memory)
Read sequentially from 1 Gbps Ethernet at 100 MB/s (40x)
Read sequentially from SSD at 1 GB/s (4x)
Read sequentially from main memory at 4 GB/s (1x)
Reading 1 MB sequentially from memory takes about 250 us
Reading 1 MB sequentially from SSD takes about 1 ms (4x memory)
Reading 1 MB sequentially from disk takes about 20 ms (80x memory)
6-7 world-wide round trips per second
2,000 round trips per second within a data center
L1 cache reference                  0.5 ns
L2 cache reference                  7 ns                                    14x L1 cache
Compress 1 KB with Zippy            10,000 ns       10 us
Round trip within same datacenter   500,000 ns      500 us
Disk seek                           10,000,000 ns   10,000 us   10 ms      20x datacenter round trip
Send packet CA->Netherlands->CA     150,000,000 ns  150,000 us  150 ms