Data Structure and Algorithm Summary
This post summarizes the common data structure and algorithm knowledge and questions.
Data Structure
There are several data structure and some structure which may not belongs to any data structure but very essential. This part will include the following content:
- Linked List (single, cycle, LRU)
- String (palindrome, substring, subsequence)
- Array (sliding window, subarray)
- Summary: Two Pointers
- Stack, Queue (priority queue, double-ended queue), heap
- Tree (BST, Trie Tree)
- Graph
- Two pointers
- Union Find
Linked List
- Features:
- new insert / handle item is in the end
- $O(1)$ insert and delete
- dummy node, tail pointer
- double-linked list
- Traverse
- $k$-th node from the tail $\Rightarrow$ Fast pointer traverses $k$ firstly
- Delete a node
- Record the previous node. e.g.
keyToPrevNode
- Record the previous node. e.g.
- Reverse
- Iteration
- Recursion
- Copy
- Circle Linked list
Question Answer Check existed Fast Slow pointers Check the entry 2 pointers at bg and encounter Joseph ring Construct a cycle linked list - Y Shape linked list
- Common node $\Rightarrow$ Traverse $\text{abs}(m - n)$ nodes in longer list firstly
- Flat multi-level doubly linked list
- LRU (Least Recently Used) cache implement $\Rightarrow$ linked list + map (keyToPrevNode)
String
Palindrome
Question Answer Longest Palindrome Substring, code Recursion from the center or 2-d DP Valid Palindrome remove characters II, III Left right pointers + Recursion or 2-d DP Substring
Question Answer Longest substring without repeating characters Fast slow pointers + set Longest substring with at most $K$ distinct characters Fast slow pointers + dict Longest Substring with at Least $K$ Repeating Characters Fast slow pointers + dict + recursion Subsequence
Question Answer LPS (Longest Palindromic subsequence), code 2-d DP LCS (Longest Common Subsequence) 2-d DP Parentheses - Backtracking
Question Answer Generate parentheses Count the number of left and right parentheses Remove invalid parentheses Count the number of left and right invalid parentheses
Array (List, Nums)
Consecutive Subsequence, sliding window
Question Answer Find a sliding windows of consecutive sequence sums to $K$ Fast slow pointers Find a consecutive sequence has max sum DP Find a consecutive sequence has max product DP, keep tracking min/max ahead Calculate max value in moving fixed-length sliding window dq, store the idx Subsequence
Question Answer LIS (Longest increasing subsequence) DP 2-d Matrix
Question Answer Clockwise printing matrix 2-d Scan
Question Answer Build product array Two scan Contain water Two scan Index - value
Question Answer Find missing value from $1$ to $N$ in array Use the value as the index, change the sign of the index. OR Use the binary form of an integer to be the represents
Summary: Two Pointers
- Features:
- Not a data structure or a algorithm, just a kind of method to solve string, array related problems
- Left, right pointers
- Valid palindromic I, II, III
- 2 sum
Type Question Equals a target value 2 Sum, 3 Sum Max value smaller than $K$ 2 Sum smaller than $K$ Closest solution to $K$ 2 Sum closest, 3 Sum closest Count of solutions Count of 2 Sum smaller than $K$, Count of 3 Sum smaller than $K$
- Fast, slow pointers
Question Solution Move zeros replace fast and slow Sliding window, substring move fast to right, then move left to right Linked List related problem fast moves 2 steps, slow moves 1 step - From middle to two sides
- Palindromic string
- Find K Closest Elements $\Rightarrow$ find the smallest, then from middle to two sides
- Others, 3 pointers
Set, Map
- Features
- Considered in all $O(1)$ scenarios
- Count the frequency of each elements
- The first unrepeated element
- Use map with other data structure
Question Answer LRU Cache + Linked List, Key to prev nodes First Unique Number in Data Stream + Linked List, Key to prev nodes Insert, delete and getRandom in $O(1)$ + List, Key to index
Stack, Queue (dq) and Heap (PriorityQueue)
- Double-ended (dq) queue
deque([])
,pop()
,popleft()
- Calculate max value of sliding window
- Stack
Question Answer Stack with getMin()
use another stack store smaller elements than top The push and pop list of stack - Top $K$, $O(n\text{log(k)})$
- $K$ smallest elements
- $K$ Closest Points to Origin
- Top $K$ Frequent Elements
- High five
- Heap
Question Answer The median in data stream $1$ min Heap + $1$ max Heap Sliding window median Similar to median in data stream, solution Ugly Number II Iterate k times Merge K Sorted Lists make pair of value and ListNode
Tree
- Features
- usually consider recursion
- recursion can have a
return value
and modify theself.ans
at the same time to solve some problem that need to regardeach node as the root
node of subtree - can transfer to a graph question
Categories of trees
- Binary tree
- Complete binary tree
- There is $2^{k-1}$ nodes in the $k$ layer
- There is $1\sim2^{k-1}$ nodes in the last layer
- Full binary tree
Binary search tree (BST)
- AVL tree
- (Balanced) The difference of two sub-tree is less than $1$ for each node
- Search, Add, Delete a node. Time complexity: $O(log(n))$
- Rotate to keep balanced when insert a new node
- Single Rotate Right(left) when new node is in the left(right) subtree of left(right) subtree
- Double Rotate
- AVL tree
- Complete binary tree
- Huffman tree
- Nodes with larger weights are near the top
- Hash tree
- Value -> Value
- Use prime number to mod the num, use mod results to divide the tree children. Each node has a attribute, which can represent whether this is a real number
Tire tree
- String -> Value
- From root to each node represent a word, each node has a attribute of
is_word
andword
, andchildren {}
. Use symbol to find the next children
- B tree
- Binary tree
- Traverse a tree
- BFS, Level order traversal
- DFS, Pre-order, In-order, Post-order traversal
- Recursion
Question Answer Get the min/max depth of tree Change self.ans
at leaf nodeFind $k$-th smallest element in BST in-order traversal Flattern Binary tree to Linked List in-order traversal Check whether list is the post-order of BST - Iteration
- Pre-order, In-order, Post-order $\Rightarrow$ Use stack, iterate to push left child
- Tree iterator
- Recursion
- Print Vertically (Column by column) $\Rightarrow$ traverse with a map, key is column idx
- Serialize and Deserialize
Question Answer Serialize and deserialize a tree BFS Use $2$ traverse lists to construct a tree, I, II Select a current root and use Recursion construct sub-tree
- Paths
Question Direction Answer Count of paths From root to leaf Pass a string parameters to deeper recursion Count of paths suming to a target value From root to leaf Pass a string and sum parameters to deeper recursion Count of paths suming to a target value From any node going downwards Start recursion from each node Longest path Between any two nodes Return a value and change the self.ans at each node Path suming to maximum Between any two nodes Return a value and change the self.ans at each node - Value Related
- Maximum average subtree $\Rightarrow$ Return value of sum number and count of nodes of subtree
- LCA (Lowest Common Ancestor)
Question Answer Binary Tree Recursion, check whether return current node or answers of left subtree or answers of right subtree Binary Search Tree Recursion, check the values with current value - Mirror Tree
- BST
Question Answer Validate Binary Search Tree Return the min and max value and change self.ans
at each nodeSearch / Insert a node Recursion with a parent node Delete a node Recursion, turn the problem to delete a leaf node 2nd/Kth smallest element in BST Iteration in-order traversal Closest 1/K binary tree value Binary search OR
In-order traversalNext larger element in BST
Graph
- Features
- This part only talks about the basic graph problems
- Graph is always related to serach problem. Some problems is implicit graph problems (can convert to a graph problem). See more of these problems in BFS / DFS part.
Implement a graph
- Adjacency Matrix: Used for dense graphs
- Implement: 2-dimensional array; Replace
1
by weights if needed; Map alphabet vertexes to index(integer) - Easy, simple
- Good at checking wether there is a path between two nodes
- Waste memory
- Implement: 2-dimensional array; Replace
- Adjacency list: Used for sparse graphs
- Implement: [{1, 3}, {5, 3, 0, 2} …] or {0:{1, 3}, 1:{5, 3, 0, 2},…} or {0:{1: xx, 3:xx}, 1:{5: xx, 3: xx, 0: xx, 2: xx}}
- Good at checking the neighbors of one node, Save memory
- Not good at checking wether there is a path between two nodes
- Vertexes and edges: Often used as a middle tools
- Vertexes: set, Edges: set of <Weight, Src, Dest>
- Adjacency Matrix: Used for dense graphs
- Traverse a graph
- BFS, Queue (open list), Set (closed list)
- DFS, Recursion with Set (closed list)
- Time Complexity: Adjacency matrix: $O(v^2)$, Adjacency list: $O(v + e)$
Question Answer Graph whether a tree BFS with a parent dictionary for each node Clone graph Clone nodes and then clone edges
Min sum of edges that connected nodes together (MST)
- code
- Prim
- Input: Adjacency Matrix / Adjacency list
- Get the Vertex + Edges set
- Select a start_base and put start_base in the visited set, set the edges set into null
- For each vertex in visited set, check the other vertex not in visited set, and check wether they have a edge
- Find the min weight, put that edge into edges set, and put that vertex into visited set, until visited set has all the vertexes
- Output: Edges set
- Time Complexity: Graph Matrix: $O(v^2)$, Adjacency list: $O(elog_2 v)$
- Kruskal
- Input: Adjacency Matrix / Adjacency list
- Get the Vertex + Edges set
- Sorted all the edges with their weight
- Added the edge to a set if the two vertex haven’t been connected together
- Use
Union Find
can be quicker to check wether two nodes are already connected. Time Complexity $O(\alpha (n))$, where $\alpha (n)$ is the inverse function of $n = f(x) = A(x, x)$ - Time Complexity: $O(elog_2 e)$
Find the shortest path from one nodes to others
- Dijstra
- Input: Adjacency Matrix / Adjacency List + start_base
- Initialize a dist list to represent the distance to other vertexes, put start_base into visited set
- Find the direct edges between start_base and other nodes, and put that value into dist list
- Find a min value, put that vertex into visited vertex
- Use the new vertex as middle node to update the distance from start_base to other vertexes
- Keep until no vertexes available or all vertexes are put into visited set
- Floyd
- Initialize two array. D[i][j] means the min path from node i to node j. P[i][j] means the last node before j in the path from node i to node j (default -1)
- $D^k[i][j] = min(D^{k-1}[i][j], D^{k-1}[i][k] + D^{k-1}[k][j])$, $P[i][j] = k$
- Time Complexity: $O(n^3)$, Space: $O(n^2)$
- Dijstra
- Tarjan Algorithm
- Based on DFS, Stack, find the strong connected component in a graph
- Find the cutting edge (critical connections) in a graph
Union Find
1 | def get_parent(n): |
Algorithm
This part will include the following content:
- Binary search
- BFS (Breathe first search)
- DFS (Depth first search) and Backtracking
- Sort (traditional, topological sort)
- Dynamic programming (top down with memoization, bottom top)
- Summary: DP Array
- Greedy
Below is a table showing the search problems and the common used algorithms
Algorithm | whether exist | all solutions count | all solutions | min/max distance | min/max solution |
---|---|---|---|---|---|
BFS | $\surd$ | $\surd$ | $\surd$ | $\surd$ | |
DFS | $\surd$ | ||||
DP | $\surd$ | $\surd$ | $\surd$ |
Binary search
- Features
- $log(n)$ time complexity
- for sorted input
- Template
- Potential field:
[left, right]
- Loop condition:
left + 1 < right
- After stop: check both
left
andright
- Potential field:
- Search for
- target value
- min/max value
- closest value / $k-th$ closest value
Question Answer Search value/frequency of value in sorted array Find k closest value(s) in sorted array Binary search + two pointers Search value in rotated array I, II compare with the right element, right minus 1 when equals Search smallest value in rotated array I, II compare with the right element, right minus 1 when equals Find in Mountain Array Fight the highest index, then do two binary search
- Implicit binary search
Question Answer Median of two sorted array Convert to a binary search problem Smallest Rectangle Enclosing Black Pixels Squeeze 2-d array to 1-d array in two dimensional - Top $K$ Questions
- $K$ Largest Element in an Array $\Rightarrow$ sort + binary search
OR
divide and conquer
- $K$ Largest Element in an Array $\Rightarrow$ sort + binary search
BFS
- Features
- Tree traversal, layer order traversal
- Graph traversal
- Implicit graph questions focusing on
shortest distance/path
,whether existed
- Find
all solutions
with anon-recursion method
- Select one condition from the given conditions
OR
Generate the next condition yourself
- Grid matrix
Question Type The maze I Whether could reach destination The maze II Shortest distance to destination The maze III Shortest path to destination (BFS + DFS) Number of Island BFS is better than DFS - Implicit Graph
Question Type Word Ladder I Shortest distance, Generate the next condition yourself Word Ladder II All shortest paths, BFS + DFS
DFS / Backtracking
- Features
- Tree, Graph Traversal, use recursion
- Implicit graph questions focusing on
all solutions
- Could use memo to reduce time complexity, see Dynamic Programming part
- Needs a lot of calculation, may StackOverFlow
- Backtracking
- A kind of though that requires to return to previous state after a deeper recursive search
- Each step has several choices. Choose one and modify that parameter
before and after
calling the deeper recursion function. Then choose other choices.
- Requirements
Searching area
,Search target
,Visited / Prev
- Terminate condition
- How to select
OR
generate next condition according to search target - How to make a choice and modify searching area OR visited
before and after
get into the deeper recursion toavoid duplicate
searching
- Grid matrix
Question Type Word Search I Whether exist, Pattern search, some points may be searched several times, thus needs Add
andRemove
values from closed list (Backtracking)Word Search II All solutions. Use Trie Tree to get next conditions N queens, code All solutions. Modify the prev_rows
andcur_row
after each deeper recursionN queens II Count of all solutions Sudoku Solver Iterate every available choices of current position Available area of robot Count of solutions - Implicit Graph
Question Answer Word Ladder II All shortest paths. Use BFS to generate a distance dict served as the directions of DFS Letter Combinations of a Phone Number All solutions - Combinations, Time Complexity $2^n$
Question Answer Subsets I In each recursion layer, add remained nodes into prev one by one (remove the previous one before add a new one), and do the deeper recursion for each one Subsets II Skip the same num in for loop of recursion Combination Sum Template of combination DFS N numbers sum to K Template of combination DFS, bag problem (put or not put current, then recursion) - Permutations, Time Complexity $n!$
Question Answer Permutation I, II DFS with a visited, see the permutation template Permutations of string
Sort
Basic Sort
Topological Sort:
Used for the list of tasks with order
Steps
- Establish the graph
- Calculate the in degree for each node, put the $0$ in degree node into queue
- Put the $0$ in degree node into answer, decrease the in degree for its neighbor nodes, put new $0$ in degree node into queue
- Check the length of answer with number of graph nodes
Dynamic Programming
- Features:
- From divide and conquer but have overlapping sub-problems and optimal substructure
- Implicit graph questions focusing on find the
max/min
distance, or thecount
of all solutions, orwhether valid
- If the graph can be divided into several layers (to get into a position i, you must travel from several specific positions)
Not good at
find all solutions, non-ordered input (sometimes needsort the input firstly
) except backpack problem- Bottom-up or Top-down (memoization search)
Memoization Search
- Format
dfs(sub_s, memo) -> sub_ans
- Search some points several times $\Rightarrow$
add
andremove
points from the closed list, back-tracking - Update memo before return
- Category
- 1 list, 2 lists
- Interval of 1 list
- Backpack
- Coordinate 2-d
- Whether Valid
Question Answer Wildcard Matching 2 lists, $memo[(i, j)]$ means whether $s[i:]$ can match $p[j:]$ Regular Expression Matching 2 lists, memo is similar to above, check $p[j+1]$ whether is $*$ Word Break 1 list, $memo[i]$ means whether $s[i:]$ can be broke - All solutions (DFS + Memoization Search)
Question Answer Word Break II 1 list, $memo[i]$ means the broken answers of $s[i:]$ Palindrome Partitioning 1 list, $memo[i]$ means the palindrome partitioning of $s[i:]$ - Max/min distance
Question Answer Triangle Coordinate, $memo[i]$ means the minimum distance from i to the end Edit Distance (Distance between strs) $memo[(str1, str2)]$ means distance between str1 and str2 Minimum cost to merge stones Interval, $memo[i][j][k]$ means merge from $i$ to $j$ with result of $k$ groups
Bottom-up DP
- Category
- 1 list, 2 lists
- Interval of 1 list
- Backpack
- Coordinate 2-d
- Whether Valid
Question Answer Word Break 1 list, $memo[i]$ means whether $s[i:]$ can be broke - Count of all solutions
Question Answer Unique Paths I, II How many paths can be found from A to B in a grid with only right, down move. Follow up: With obstacle, must visit some positions Coordinate, DP[i][j]
means the count of paths from beginning to point $(i,j)$One’s subsequence is another str 2 lists, DP[i][j]
means the count of subsequence in T[..j] appearing in S[..i]Combination Sum IV 1 list, DP[i]
means the count of solutions summing to numberi
Rectangular cover $\Rightarrow$ use $N$ $2\times 1$ squares to cover a $2\times N$ big square TBA - Max/min distance
Question Answer Triangle; Min paths sum Coordinate, $dp[i][j] = \min(dp[i-1][j], dp[i-1][j-1])$ Longest Increasing Subsequence (LIS) 1 list, $dp[i] = \max(dp[j]) + 1, (j < i)$, including $i$th itself OR
greedy + binary searchRussian Doll Envelopes (LIS) 1 list, including $i$th itself OR
greedy + binary searchLongest Common Subsequence (LCS) 2 sequence, $dp[i][j] = \max(dp[i][j-1], dp[i-1][j]) \ or \ dp[i-1][j-1] + 1$ Longest palindromic subsequence (LPS) Interval, $dp[i][j] = \max(dp[i][j-1], dp[i-1][j]) \ or \ dp[i-1][j-1] + 2$ Largest Divisible Subset 1 list, return solution, sort firstly. DP[i]
is the max count from 0 toi
position. Useprev
to store the previous oneMax profit of buy and sell stocks single time; at most $k$ times 1 list, $dp[i][k] = \max(dp[i-1][k], dp[m][k-1] + num[i] - num[m]) (m < i)$ House robber 1 list, maximum money could get from beginning to position i
Max product of a consecutive subsequence 2 dp arrays Jump stairs 1 list Fib list 1 list
Summary: DP Array
DP[i]
Traget | Range |
---|---|
Largest sum, largest product, minimum product, num of increasing elements and including i-th element | list[..i] |
Count of solutions sum to number i |
|
Count of solutions when reaching at position i |
list[..i] |
Can be broken into words, broken answers, palindromic brokens | list[i…] |
DP[i][j]
Traget | Range |
---|---|
is palindromic, longest subsequence, cnt need to remove to make palindrome | string[i…j] |
longest common subsequence | str1[…i] and str2[…j], first i elements of str1 and first j elements of str2 |
Count of solutions, Min/max cost when arriving position $(i,j)$ | From $(0, 0)$ to $(i, j)$ |
Max profix at day $i$ if do at most $k$ transactions | list[..i] |
Greedy
- Features:
- Always keeps optimal substructure
- Buy and sell stocks
- As many as possible
- With fee (TBA)
- Exchange coins