Data Structure and Algorithm Summary

This post summarizes the common data structure and algorithm knowledge and questions.

Data Structure

There are several data structure and some structure which may not belongs to any data structure but very essential. This part will include the following content:

  • Linked List (single, cycle, LRU)
  • String (palindrome, substring, subsequence)
  • Array (sliding window, subarray)
  • Summary: Two Pointers
  • Stack, Queue (priority queue, double-ended queue), heap
  • Tree (BST, Trie Tree)
  • Graph
  • Two pointers
  • Union Find

Linked List

String

Array (List, Nums)

Summary: Two Pointers

  • Features:
    • Not a data structure or a algorithm, just a kind of method to solve string, array related problems
  • Left, right pointers
  • Fast, slow pointers
    Question Solution
    Move zeros replace fast and slow
    Sliding window, substring move fast to right, then move left to right
    Linked List related problem fast moves 2 steps, slow moves 1 step
  • From middle to two sides
  • Others, 3 pointers

Set, Map

  • Features
    • Considered in all $O(1)$ scenarios
  • Count the frequency of each elements
    • The first unrepeated element
  • Use map with other data structure
    Question Answer
    LRU Cache + Linked List, Key to prev nodes
    First Unique Number in Data Stream + Linked List, Key to prev nodes
    Insert, delete and getRandom in $O(1)$ + List, Key to index

Stack, Queue (dq) and Heap (PriorityQueue)

Tree

  • Features
    • usually consider recursion
    • recursion can have a return value and modify the self.ans at the same time to solve some problem that need to regard each node as the root node of subtree
    • can transfer to a graph question
  • Categories of trees

    • Binary tree
      • Complete binary tree
        • There is $2^{k-1}$ nodes in the $k$ layer
        • There is $1\sim2^{k-1}$ nodes in the last layer
        • Full binary tree
      • Binary search tree (BST)
        • AVL tree
          • (Balanced) The difference of two sub-tree is less than $1$ for each node
          • Search, Add, Delete a node. Time complexity: $O(log(n))$
          • Rotate to keep balanced when insert a new node
            • Single Rotate Right(left) when new node is in the left(right) subtree of left(right) subtree
            • Double Rotate
    • Huffman tree
      • Nodes with larger weights are near the top
    • Hash tree
      • Value -> Value
      • Use prime number to mod the num, use mod results to divide the tree children. Each node has a attribute, which can represent whether this is a real number
    • Tire tree
      • String -> Value
      • From root to each node represent a word, each node has a attribute of is_word and word, and children {}. Use symbol to find the next children
    • B tree
  • Traverse a tree
  • Paths
    Question Direction Answer
    Count of paths From root to leaf Pass a string parameters to deeper recursion
    Count of paths suming to a target value From root to leaf Pass a string and sum parameters to deeper recursion
    Count of paths suming to a target value From any node going downwards Start recursion from each node
    Longest path Between any two nodes Return a value and change the self.ans at each node
    Path suming to maximum Between any two nodes Return a value and change the self.ans at each node
  • Value Related
  • LCA (Lowest Common Ancestor)
    Question Answer
    Binary Tree Recursion, check whether return current node or answers of left subtree or answers of right subtree
    Binary Search Tree Recursion, check the values with current value
  • Mirror Tree
  • BST
    Question Answer
    Validate Binary Search Tree Return the min and max value and change self.ans at each node
    Search / Insert a node Recursion with a parent node
    Delete a node Recursion, turn the problem to delete a leaf node
    2nd/Kth smallest element in BST Iteration in-order traversal
    Closest 1/K binary tree value Binary search OR In-order traversal
    Next larger element in BST

Graph

  • Features
    • This part only talks about the basic graph problems
    • Graph is always related to serach problem. Some problems is implicit graph problems (can convert to a graph problem). See more of these problems in BFS / DFS part.
  • Implement a graph

    • Adjacency Matrix: Used for dense graphs
      • Implement: 2-dimensional array; Replace 1 by weights if needed; Map alphabet vertexes to index(integer)
      • Easy, simple
      • Good at checking wether there is a path between two nodes
      • Waste memory
    • Adjacency list: Used for sparse graphs
      • Implement: [{1, 3}, {5, 3, 0, 2} …] or {0:{1, 3}, 1:{5, 3, 0, 2},…} or {0:{1: xx, 3:xx}, 1:{5: xx, 3: xx, 0: xx, 2: xx}}
      • Good at checking the neighbors of one node, Save memory
      • Not good at checking wether there is a path between two nodes
    • Vertexes and edges: Often used as a middle tools
      • Vertexes: set, Edges: set of <Weight, Src, Dest>
  • Traverse a graph
    • BFS, Queue (open list), Set (closed list)
    • DFS, Recursion with Set (closed list)
    • Time Complexity: Adjacency matrix: $O(v^2)$, Adjacency list: $O(v + e)$
      Question Answer
      Graph whether a tree BFS with a parent dictionary for each node
      Clone graph Clone nodes and then clone edges
  • Min sum of edges that connected nodes together (MST)

    • code
    • Prim
      • Input: Adjacency Matrix / Adjacency list
      • Get the Vertex + Edges set
      • Select a start_base and put start_base in the visited set, set the edges set into null
      • For each vertex in visited set, check the other vertex not in visited set, and check wether they have a edge
      • Find the min weight, put that edge into edges set, and put that vertex into visited set, until visited set has all the vertexes
      • Output: Edges set
      • Time Complexity: Graph Matrix: $O(v^2)$, Adjacency list: $O(elog_2 v)$
    • Kruskal
      • Input: Adjacency Matrix / Adjacency list
      • Get the Vertex + Edges set
      • Sorted all the edges with their weight
      • Added the edge to a set if the two vertex haven’t been connected together
      • Use Union Find can be quicker to check wether two nodes are already connected. Time Complexity $O(\alpha (n))$, where $\alpha (n)$ is the inverse function of $n = f(x) = A(x, x)$
      • Time Complexity: $O(elog_2 e)$
  • Find the shortest path from one nodes to others

    • Dijstra
      • Input: Adjacency Matrix / Adjacency List + start_base
      • Initialize a dist list to represent the distance to other vertexes, put start_base into visited set
      • Find the direct edges between start_base and other nodes, and put that value into dist list
      • Find a min value, put that vertex into visited vertex
      • Use the new vertex as middle node to update the distance from start_base to other vertexes
      • Keep until no vertexes available or all vertexes are put into visited set
    • Floyd
      • Initialize two array. D[i][j] means the min path from node i to node j. P[i][j] means the last node before j in the path from node i to node j (default -1)
      • $D^k[i][j] = min(D^{k-1}[i][j], D^{k-1}[i][k] + D^{k-1}[k][j])$, $P[i][j] = k$
      • Time Complexity: $O(n^3)$, Space: $O(n^2)$
  • Tarjan Algorithm
    • Based on DFS, Stack, find the strong connected component in a graph
    • Find the cutting edge (critical connections) in a graph

Union Find

1
2
3
4
5
6
7
8
9
10
11
12
def get_parent(n):
if parent[n] != n:
parent[n] = get_parent(parent[n])
return parent[n]

def connected(a, b):
return get_parent(a) == get_parent(b)

def add(a, b):
parent_a = get_parent(a)
parent_b = get_parent(b)
parent[parent_a] = parent_b

Algorithm

This part will include the following content:

  • Binary search
  • BFS (Breathe first search)
  • DFS (Depth first search) and Backtracking
  • Sort (traditional, topological sort)
  • Dynamic programming (top down with memoization, bottom top)
  • Summary: DP Array
  • Greedy

Below is a table showing the search problems and the common used algorithms

Algorithm whether exist all solutions count all solutions min/max distance min/max solution
BFS $\surd$ $\surd$ $\surd$ $\surd$
DFS $\surd$
DP $\surd$ $\surd$ $\surd$
  • Features
    • $log(n)$ time complexity
    • for sorted input
  • Template
    • Potential field: [left, right]
    • Loop condition: left + 1 < right
    • After stop: check both left and right
  • Search for
    • target value
    • min/max value
    • closest value / $k-th$ closest value
      Question Answer
      Search value/frequency of value in sorted array
      Find k closest value(s) in sorted array Binary search + two pointers
      Search value in rotated array I, II compare with the right element, right minus 1 when equals
      Search smallest value in rotated array I, II compare with the right element, right minus 1 when equals
      Find in Mountain Array Fight the highest index, then do two binary search
  • Implicit binary search
    Question Answer
    Median of two sorted array Convert to a binary search problem
    Smallest Rectangle Enclosing Black Pixels Squeeze 2-d array to 1-d array in two dimensional
  • Top $K$ Questions

BFS

  • Features
    • Tree traversal, layer order traversal
    • Graph traversal
    • Implicit graph questions focusing on shortest distance/path, whether existed
    • Find all solutions with a non-recursion method
    • Select one condition from the given conditions OR Generate the next condition yourself
  • Grid matrix
    Question Type
    The maze I Whether could reach destination
    The maze II Shortest distance to destination
    The maze III Shortest path to destination (BFS + DFS)
    Number of Island BFS is better than DFS
  • Implicit Graph
    Question Type
    Word Ladder I Shortest distance, Generate the next condition yourself
    Word Ladder II All shortest paths, BFS + DFS

DFS / Backtracking

  • Features
    • Tree, Graph Traversal, use recursion
    • Implicit graph questions focusing on all solutions
    • Could use memo to reduce time complexity, see Dynamic Programming part
    • Needs a lot of calculation, may StackOverFlow
    • Backtracking
      • A kind of though that requires to return to previous state after a deeper recursive search
      • Each step has several choices. Choose one and modify that parameter before and after calling the deeper recursion function. Then choose other choices.
  • Requirements
    • Searching area, Search target, Visited / Prev
    • Terminate condition
    • How to select OR generate next condition according to search target
    • How to make a choice and modify searching area OR visited before and after get into the deeper recursion to avoid duplicate searching
  • Grid matrix
    Question Type
    Word Search I Whether exist, Pattern search, some points may be searched several times, thus needs Add and Remove values from closed list (Backtracking)
    Word Search II All solutions. Use Trie Tree to get next conditions
    N queens, code All solutions. Modify the prev_rows and cur_row after each deeper recursion
    N queens II Count of all solutions
    Sudoku Solver Iterate every available choices of current position
    Available area of robot Count of solutions
  • Implicit Graph
    Question Answer
    Word Ladder II All shortest paths. Use BFS to generate a distance dict served as the directions of DFS
    Letter Combinations of a Phone Number All solutions
  • Combinations, Time Complexity $2^n$
    Question Answer
    Subsets I In each recursion layer, add remained nodes into prev one by one (remove the previous one before add a new one), and do the deeper recursion for each one
    Subsets II Skip the same num in for loop of recursion
    Combination Sum Template of combination DFS
    N numbers sum to K Template of combination DFS, bag problem (put or not put current, then recursion)
  • Permutations, Time Complexity $n!$
    Question Answer
    Permutation I, II DFS with a visited, see the permutation template
    Permutations of string

Sort

Basic Sort

Topological Sort:

Used for the list of tasks with order

Steps

  • Establish the graph
  • Calculate the in degree for each node, put the $0$ in degree node into queue
  • Put the $0$ in degree node into answer, decrease the in degree for its neighbor nodes, put new $0$ in degree node into queue
  • Check the length of answer with number of graph nodes

Course Schedule I, II

Dynamic Programming

  • Features:
    • From divide and conquer but have overlapping sub-problems and optimal substructure
    • Implicit graph questions focusing on find the max/min distance, or the count of all solutions, or whether valid
      • If the graph can be divided into several layers (to get into a position i, you must travel from several specific positions)
    • Not good at find all solutions, non-ordered input (sometimes need sort the input firstly) except backpack problem
    • Bottom-up or Top-down (memoization search)
  • Format
    • dfs(sub_s, memo) -> sub_ans
    • Search some points several times $\Rightarrow$ add and remove points from the closed list, back-tracking
    • Update memo before return
  • Category
    • 1 list, 2 lists
    • Interval of 1 list
    • Backpack
    • Coordinate 2-d
  • Whether Valid
    Question Answer
    Wildcard Matching 2 lists, $memo[(i, j)]$ means whether $s[i:]$ can match $p[j:]$
    Regular Expression Matching 2 lists, memo is similar to above, check $p[j+1]$ whether is $*$
    Word Break 1 list, $memo[i]$ means whether $s[i:]$ can be broke
  • All solutions (DFS + Memoization Search)
    Question Answer
    Word Break II 1 list, $memo[i]$ means the broken answers of $s[i:]$
    Palindrome Partitioning 1 list, $memo[i]$ means the palindrome partitioning of $s[i:]$
  • Max/min distance
    Question Answer
    Triangle Coordinate, $memo[i]$ means the minimum distance from i to the end
    Edit Distance (Distance between strs) $memo[(str1, str2)]$ means distance between str1 and str2
    Minimum cost to merge stones Interval, $memo[i][j][k]$ means merge from $i$ to $j$ with result of $k$ groups

Bottom-up DP

  • Category
    • 1 list, 2 lists
    • Interval of 1 list
    • Backpack
    • Coordinate 2-d
  • Whether Valid
    Question Answer
    Word Break 1 list, $memo[i]$ means whether $s[i:]$ can be broke
  • Count of all solutions
    Question Answer
    Unique Paths I, II How many paths can be found from A to B in a grid with only right, down move. Follow up: With obstacle, must visit some positions Coordinate, DP[i][j] means the count of paths from beginning to point $(i,j)$
    One’s subsequence is another str 2 lists, DP[i][j] means the count of subsequence in T[..j] appearing in S[..i]
    Combination Sum IV 1 list, DP[i] means the count of solutions summing to number i
    Rectangular cover $\Rightarrow$ use $N$ $2\times 1$ squares to cover a $2\times N$ big square TBA
  • Max/min distance
    Question Answer
    Triangle; Min paths sum Coordinate, $dp[i][j] = \min(dp[i-1][j], dp[i-1][j-1])$
    Longest Increasing Subsequence (LIS) 1 list, $dp[i] = \max(dp[j]) + 1, (j < i)$, including $i$th itself OR greedy + binary search
    Russian Doll Envelopes (LIS) 1 list, including $i$th itself OR greedy + binary search
    Longest Common Subsequence (LCS) 2 sequence, $dp[i][j] = \max(dp[i][j-1], dp[i-1][j]) \ or \ dp[i-1][j-1] + 1$
    Longest palindromic subsequence (LPS) Interval, $dp[i][j] = \max(dp[i][j-1], dp[i-1][j]) \ or \ dp[i-1][j-1] + 2$
    Largest Divisible Subset 1 list, return solution, sort firstly. DP[i] is the max count from 0 to i position. Use prev to store the previous one
    Max profit of buy and sell stocks single time; at most $k$ times 1 list, $dp[i][k] = \max(dp[i-1][k], dp[m][k-1] + num[i] - num[m]) (m < i)$
    House robber 1 list, maximum money could get from beginning to position i
    Max product of a consecutive subsequence 2 dp arrays
    Jump stairs 1 list
    Fib list 1 list

Summary: DP Array

DP[i]

Traget Range
Largest sum, largest product, minimum product, num of increasing elements and including i-th element list[..i]
Count of solutions sum to number i
Count of solutions when reaching at position i list[..i]
Can be broken into words, broken answers, palindromic brokens list[i…]

DP[i][j]

Traget Range
is palindromic, longest subsequence, cnt need to remove to make palindrome string[i…j]
longest common subsequence str1[…i] and str2[…j], first i elements of str1 and first j elements of str2
Count of solutions, Min/max cost when arriving position $(i,j)$ From $(0, 0)$ to $(i, j)$
Max profix at day $i$ if do at most $k$ transactions list[..i]

Greedy

Reference

[1] Data Structure and Algorithm Overview