System Design

Posted on 2023-02-28 Edited on 2023-03-02 In Programming , Design

Knowledge about system design.

Steps

Requirements engineering
- Functional requirements
- Non-functional requirements
Capacity estimation
- Requests (Write/Read, RPS)
- Bandwidth: (KB/sec)
- Storage size: (TB)
Data model
- entities, attributes, relations
API design
- functions, arguments, response
System design
- System components
  - Client application
  - Service: provides a clear function
    - Multiple instances
    - Load Balancer
  - Relational database (SQL)
    - Federation (functional partitioning)
  - Message queue
  - File storage
  - Cache
- Techniques
  - Pull/push
  - Chunks
Design discussion

Requirements engineering

Initial analysis

Read- or write- heavy?
Single server or distributed service?
Availability or data consistency?
What is the actual scale of the system?

Functional requirements

What is the system supposed to do, specifically from the user's perspective? Define the:

Core features
Support features

Non-functional requirements

How is the system expected to work? The following are commonly considered non-functional requirements:

Availability: the share of time per year that the system is up and running
- e.g. 99.999% means at most about 5.26 minutes of downtime a year
- requires reliability, redundancy, and fault tolerance
Data consistency (linearizability): data appears to be the same on all corresponding nodes at the same time
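
The downtime budget behind an availability percentage can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a 365-day year; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for availability in (99.0, 99.9, 99.99, 99.999):
    print(f"{availability}% -> {max_downtime_minutes(availability):.2f} min/year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the redundancy needed grows quickly with the target.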

Scalability: the ability of a system to handle a fast-growing user base and temporary load spikes seamlessly

~	Vertical scalability	Horizontal scalability
Meaning	Add more hardware resources to a single machine	Add more servers to the cluster
Advantages	- Fast inter-process communication - Data consistency	- Higher availability - Scales linearly
Disadvantages	- Single point of failure - Economic limit on scalability	- Risk of data inconsistency - Complex architecture

Latency: the time between requesting data and receiving it
- experienced latency = network latency + system latency
- To reduce latency
  - Optimize the query performance of the database and data model
  - Use caching
Compatibility: the ability of a system to operate seamlessly with other software, hardware, and systems

Capacity estimation

Given the following numbers:

Daily Active Users (DAU) or Monthly Active Users (MAU)
- Key metric to describe scale
Peak Active Users
- What triggers the peak?
- How many times the DAU does it reach?
Interactions per user
- How many times each user interacts with the system
- Read-write ratio for core features
Request size
- Size of the payload of these requests
Replication factor
- The database replicates data to multiple nodes
- Increases read scalability
- Creates redundancy
- Takes up more storage capacity

From these we can estimate the following metrics:

Throughput, RPS
Bandwidth, Kbps, Mbps
- Typical values: 80 Kbps for VoIP calling, 150 Kbps for screen sharing, 0.5 Mbps for live streaming, 3 Mbps for 720p video, 25 Mbps for 4K video
Storage: GB, TB

Commonly used rough numbers

Seconds per day: 10^5 (86,400 rounded up)
- Requests per day -> requests per second: /10^5
Days per year: 4 * 10^2 (365 rounded up)
- Storage per day -> storage in 5 years: *(2 * 10^3)
Thousand, KB: 10^3
Million, MB: 10^6
Billion, GB: 10^9
Trillion, TB: 10^12
Quadrillion, PB: 10^15
Average image: 200 KB
Standard streaming video: 50 MB per minute
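
As a worked example, the rough numbers above can be combined into a back-of-the-envelope estimate. The input values (10 million DAU, 10 reads and 1 write per user per day, 1 KB payload, replication factor 3, peak factor 2) are assumptions chosen only to make the arithmetic easy to follow, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 10**5     # rounded up from 86,400
DAYS_PER_YEAR = 4 * 10**2   # rounded up from 365


def estimate_capacity(dau, reads_per_user, writes_per_user, payload_bytes,
                      replication_factor=3, years=5, peak_factor=2):
    """Back-of-the-envelope throughput, bandwidth, and storage estimate."""
    reads_per_day = dau * reads_per_user
    writes_per_day = dau * writes_per_user
    read_rps = reads_per_day / SECONDS_PER_DAY
    write_rps = writes_per_day / SECONDS_PER_DAY
    return {
        "read_rps": read_rps,
        "write_rps": write_rps,
        # Assumed: peak load is a fixed multiple of the average load.
        "peak_rps": (read_rps + write_rps) * peak_factor,
        "read_bandwidth_bytes_per_s": read_rps * payload_bytes,
        "write_bandwidth_bytes_per_s": write_rps * payload_bytes,
        # Only writes add data; every replica stores a full copy.
        "storage_bytes": (writes_per_day * payload_bytes
                          * DAYS_PER_YEAR * years * replication_factor),
    }


result = estimate_capacity(dau=10**7, reads_per_user=10, writes_per_user=1,
                           payload_bytes=10**3)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

With these inputs the estimate comes to 1,000 read RPS, 100 write RPS, 1 MB/s of read bandwidth, and 60 TB of storage over 5 years.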

Data model

Entities: each entity needs a separate table
Attributes: what attributes each entity requires
Connections: 1-to-1, 1-to-many

e.g.

User
- UserId
- ResourceId
Resources
- Key
- Resource content
- Due date
Pre-defined assets
- Key range
- Whether in Use
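
The example entities could be written down as plain data classes before choosing a database. This is only a sketch: the field types, the reading of ResourceId as a reference to a resource key, and the representation of the key range as a (first, last) pair are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass
class Resource:
    key: str          # primary key of the resource
    content: str
    due_date: date


@dataclass
class User:
    user_id: int
    resource_id: str  # assumed to reference Resource.key


@dataclass
class PredefinedAsset:
    key_range: Tuple[str, str]  # assumed (first_key, last_key) pair
    in_use: bool


resource = Resource(key="abc123", content="hello", due_date=date(2023, 3, 2))
owner = User(user_id=1, resource_id=resource.key)
asset = PredefinedAsset(key_range=("abc000", "abc999"), in_use=True)
```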

Choose the database based on the data model. Consider the following:

Is ACID needed?
Is scalability a concern?
Are data queries needed?

API Design

Create a binding contract between clients and servers.

Function signature from requirements
Arguments
Response body
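
A contract of this kind can be sketched as typed function signatures with an explicit response shape. The operation names, arguments, and status codes below are hypothetical, and a dictionary stands in for the real database.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Response:
    status: int            # HTTP-style status code
    body: Optional[dict]   # response payload, None on error


# In-memory stand-in for the database (assumption for this sketch).
_store: Dict[str, dict] = {}


def create_resource(user_id: int, key: str, content: str) -> Response:
    """Create a resource owned by user_id; reject duplicate keys."""
    if key in _store:
        return Response(status=409, body=None)
    _store[key] = {"owner": user_id, "content": content}
    return Response(status=201, body={"key": key})


def get_resource(key: str) -> Response:
    """Return the resource stored under key, or 404 if it does not exist."""
    if key not in _store:
        return Response(status=404, body=None)
    return Response(status=200, body={"key": key, **_store[key]})


created = create_resource(user_id=1, key="abc123", content="hello")
fetched = get_resource("abc123")
missing = get_resource("does-not-exist")
```

Writing the signatures first makes it easy to check each one against a functional requirement before any component is designed.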

System components

Database

~	Relational database	Non-relational database
Supported operations	create, read, update, delete (CRUD)	same
Features	- Supports complex queries - Transactions simplify error handling - Writes block until all nodes have committed	Stores key-value pairs
Retrieval methods	SQL query	Retrieve by key: keys starting with a certain letter, within a range of numbers, less than or greater than a certain number, or within a certain period of time if the key is a timestamp
Principles	ACID - Atomicity: a transaction is the smallest unit of work; it either completes fully or not at all - Consistency: only data that satisfies the defined rules and formats is accepted - Isolation: concurrent transactions do not interfere with each other - Durability: once a transaction is committed, it persists in the database	BASE - Basically available: it is always possible to read and write data, even though it might not be consistent - Soft state: data state can change without interaction from the application - Eventual consistency: data eventually becomes consistent once input stops. Allows for high availability and scalability
Emphasis on	Consistency	Scalability
Use cases	- Well-structured data - Requires lots of complex querying - Cannot tolerate any data inconsistency	- A very large amount of data that does not easily fit into a tabular form - Requires high availability and performance, and can tolerate temporary inconsistency - In-memory use cases (building a caching layer, extreme access speed) - Handling lots of small continuous reads and writes (shopping carts, per-user preferences)
Disadvantages	Requires more effort to scale	- Temporarily inconsistent data across nodes - No standard query language (queries need to be implemented in the application layer) - No data normalization (leads to duplicate data)
Examples	Oracle, MySQL	LevelDB (Google), RocksDB and ZippyDB (Meta), DynamoDB (AWS)
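
The key-based retrieval methods listed in the table (prefix and range lookups) rely on keys being kept in sorted order. The toy class below illustrates the idea with Python's bisect module; it is an in-memory sketch with made-up names, not how any of the listed databases is implemented.

```python
import bisect


class SortedKVStore:
    """Toy key-value store that keeps its keys sorted for range lookups."""

    def __init__(self):
        self._keys = []     # sorted list of keys
        self._values = {}   # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def range(self, low, high):
        """Return (key, value) pairs with low <= key < high."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return [(k, self._values[k]) for k in self._keys[start:end]]

    def prefix(self, prefix):
        """Return (key, value) pairs whose key starts with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for k in self._keys[start:]:
            if not k.startswith(prefix):
                break
            matches.append((k, self._values[k]))
        return matches


store = SortedKVStore()
for i, name in enumerate(["banana", "apple", "cherry", "avocado"]):
    store.put(name, i)

print(store.prefix("a"))      # keys starting with "a"
print(store.range("b", "c"))  # keys from "b" up to, but excluding, "c"
```

Timestamp keys work the same way: a time-period query is a range lookup between two timestamps.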

Message queue

A message broker provides asynchronous operations, which has the following benefits:

Decouples internal services
Avoids single points of failure
Reduces latency since services are not blocking
Very useful for long-running jobs

Messaging patterns

Point-to-point messaging
- each message is delivered to a single receiver, once
- the queue can hold the message if the receiver is unavailable
Publish/subscribe messaging
- each message is distributed to all listeners
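
The difference between the two patterns can be shown with Python's standard queue module. This is an in-process sketch with invented class and variable names; a real broker adds persistence, acknowledgements, and network transport.

```python
import queue

# Point-to-point: one shared queue, each message is taken by exactly one consumer.
task_queue = queue.Queue()
task_queue.put("resize image 42")
handled_by_worker_a = task_queue.get()      # worker A takes the message
nothing_left_for_worker_b = task_queue.empty()


class PubSubBroker:
    """Publish/subscribe: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._subscriber_queues = []

    def subscribe(self) -> queue.Queue:
        q = queue.Queue()
        self._subscriber_queues.append(q)
        return q

    def publish(self, message) -> None:
        for q in self._subscriber_queues:
            q.put(message)


broker = PubSubBroker()
email_service = broker.subscribe()
analytics_service = broker.subscribe()
broker.publish("user 7 signed up")

email_copy = email_service.get()
analytics_copy = analytics_service.get()
```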

Examples

RabbitMQ: messages are not sent to queues directly but published to exchanges
Kafka: handles massive loads for a long period of time
Redis: in-memory data store
Azure Service Bus
Amazon MQ: AWS-managed message broker (RabbitMQ or ActiveMQ)
Cloud Tasks (Google)

File storage

File storage
- Centralized, highly accessible location for files. Lower cost
- File systems on Windows, Linux, macOS
Object storage
- Unstructured data, scalable. Provides context (metadata) about the data. Objects can only be updated as a whole
- AWS S3, Azure Blob Storage, Google Cloud Storage
Block storage
- Stores chunks of data, highly structured, low data transfer overhead, low latency
- Amazon EBS, Azure Disk Storage, Google Cloud Persistent Disk

Cache & CDN

Hard disk -> (15x faster) SSD -> (300x faster) RAM
A cache keeps frequently used data in RAM
Methods to keep the cache in sync
- Set an expiration time
- Invalidation service
Use cases
- Handling many small continuous reads and writes
- Temporarily storing basic information
- Data that doesn't require frequent updates
Examples
- Redis
- Memcached: in-memory key-value store for small data chunks
- Hazelcast: distributed in-memory object store
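
The two sync methods above, expiration time and explicit invalidation, can be sketched in a small in-memory cache. The class name and the injectable clock are assumptions made so the example is easy to test; they are not taken from any of the listed products.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock     # injectable so tests can control time
        self._items = {}        # key -> (value, expires_at)

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None         # cache miss
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def invalidate(self, key):
        """Explicit invalidation, e.g. called when the source data changes."""
        self._items.pop(key, None)


cache = TTLCache(ttl_seconds=60)
cache.set("user:1:name", "Ada")
print(cache.get("user:1:name"))   # served from RAM until it expires
```

On a miss the caller would fall back to the database and write the fresh value back into the cache.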

Content delivery network (CDN)

A CDN server holds a cache close to the clients after the first query
To make sure newly modified data gets cached
- Don't store time-sensitive data
- Set a maximum cache duration in case the browser caches the data
- Cache busting: invalidate all cached data and cache it again
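
One common way to achieve cache busting is to put a fingerprint of the content into the URL, so that changed content gets a new URL and stale copies are simply never requested again. The sketch below assumes a query-parameter scheme and a hypothetical helper name; it is one option, not the only approach.

```python
import hashlib


def busted_url(path: str, content: bytes) -> str:
    """Append a short content hash so a changed file gets a new URL."""
    fingerprint = hashlib.sha256(content).hexdigest()[:8]
    return f"{path}?v={fingerprint}"


old_url = busted_url("/static/app.css", b"body { color: black; }")
new_url = busted_url("/static/app.css", b"body { color: navy; }")
print(old_url)
print(new_url)
```

Because the URL only changes when the content does, unchanged files can keep a long maximum cache duration.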

Search engine database

Search engines are NoSQL databases that excel at providing relevant results even with typos
Indexing allows for faster search performance
Components
- Document: represents a specific entity, containing data to return to clients
- Inverted index: splits each document into individual search terms and maps each term to the documents that contain it
- Ranking algorithm: produces a ranked list of documents
Examples
- Lucene
- Elasticsearch
- Atlas Search (MongoDB)
- RediSearch
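
The three components above can be illustrated in a few lines. This is a deliberately naive sketch: tokenization is a whitespace split, ranking counts matching terms, and there is no typo tolerance. The sample documents and function names are made up.

```python
from collections import defaultdict

# Documents: id -> text returned to clients.
documents = {
    1: "redis is an in-memory cache",
    2: "kafka is a message queue",
    3: "redis can act as a message queue",
}


def tokenize(text):
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Rank documents by how many query terms they contain (naive ranking)."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


index = build_inverted_index(documents)
print(search(index, "redis message queue"))
```

Lookups touch only the postings for the query terms instead of scanning every document, which is where the speed-up from indexing comes from.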

Some techniques

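Base64 encoding: allows binary data to be transmitted as text characters. Every 3 bytes become 4 characters (about 33% larger).
MD5 hash: maps a string to a fixed-size (128-bit) digest
WebSocket: real-time, bidirectional communication between a client and a server over a long-lived connection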
- Chat apps
- Live-collaboration apps
- Social feeds
- Tracking apps
- IoT solutions
Polling: request -> response -> wait before requesting again
- Updates might get delayed by the time period between polls
Long polling: request -> server holds the request until an update is available or it times out -> client requests again immediately
- Updates are sent to clients immediately
Push: server-sent events
- Twitter updates
- Status updates
- Push notifications
- Newsletters
- Sports results
- Disadvantages
  - Unidirectional communication
  - Server doesn't know when the connection is lost
Chunk: break down a whole file into several chunks
- Easier to transmit
- Reduces the impact of network failures, since only failed chunks need to be re-sent
Streaming service pipeline
- File chunker
- Content filtering
- Transcoder: transcode into target format
- Quality conversion: choose resolution
TCP: for reliable data delivery. Suitable for media streaming
UDP: for fast data delivery. Suitable for online gaming, video chat
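
The Base64, MD5, and chunking entries above can be tried out with the standard library. The chunk size and sample data are arbitrary values chosen for illustration, and the helper name is hypothetical.

```python
import base64
import hashlib

# Base64: every 3 input bytes become 4 output characters.
raw = b"abc"                         # 3 bytes
encoded = base64.b64encode(raw)      # 4 ASCII characters
decoded = base64.b64decode(encoded)  # round-trips to the original bytes

# MD5: any input maps to a 128-bit digest (32 hexadecimal characters).
digest = hashlib.md5(b"hello world").hexdigest()


def split_into_chunks(data: bytes, chunk_size: int):
    """Break data into pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


file_bytes = b"0123456789"
chunks = split_into_chunks(file_bytes, chunk_size=4)
# A per-chunk checksum lets the receiver detect and re-request a bad chunk.
checksums = [hashlib.md5(chunk).hexdigest() for chunk in chunks]
reassembled = b"".join(chunks)

print(len(raw), "bytes ->", len(encoded), "Base64 characters")
print(len(digest) * 4, "bit digest")
print(len(chunks), "chunks")
```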

Reference

https://www.udemy.com/course/the-bigtech-system-design-interview-bootcamp/