SLAM Learning Resources

Posted on 2022-01-19 Edited on 2023-07-16 In SLAM , 3. Theory

This post organizes the resources that I used to learn SLAM and also serves as a reference for other posts in this blog.

SLAM Knowledge Graph

Book

English Version	Chinese Version	Description
Basic Knowledge on Visual SLAM: From Theory to Practice	视觉SLAM十四讲	Visual SLAM architecture, including frontend, backend, loop closure detection, and mapping; Lie groups and Lie algebras; optimization
SLAM in Autonomous Driving book	自动驾驶中的SLAM技术	IMU Kinematics, pre-integration, ESKF, IEKF, GINS, LO, Loosely-coupled / Tightly-coupled LIO, High-fi map
Visual Inertial SLAM: Theory and Source Code	视觉惯性SLAM：理论与源码解析	SLAM engineering details based on ORB SLAM 2 & 3. VIO, multi-map theory
State Estimation for Robotics	机器人学的状态估计	Mathematical theory of the SLAM backend. Recursive and batch methods for MAP estimation. Lie groups and Lie algebras for optimization in 3D space
Multiple View Geometry in Computer Vision	计算机视觉中的多视图几何	2D/3D Euclidean/projective transformations, camera projection model, monocular, 2-view, 3-view and multi-view geometry
Convex Optimization	凸优化	Methods to analyze and solve optimization problems
OpenCV related books	~	Books that focus on applications of OpenCV such as AR, SfM, and object detection

Course

Course Name	Description
Online Lectures: Basics for Robotics & Photogrammetric Computer Vision (Cyrill Stachniss, 2020)	Introduces motion in 3D space and the basic mathematical tools used in SLAM, such as homogeneous coordinates and RANSAC
Online Course: Photogrammetric Computer Vision Block - Course Introduction (Cyrill Stachniss, 2020)	Covers the details of different CV algorithms, including feature extraction and pose calculation from images
Operations research courses	Cover a lot of material; for learning SLAM, it is enough to focus on how to solve nonlinear optimization problems

Classic Paper

For visual SLAM, refer to this blog
For visual inertial SLAM, refer to this blog
Unified Inverse Depth Parametrization for Monocular SLAM

https://psarlin.com/orienternet/

Blogs

Knowledge Index

Describe camera pose in space

Pose transformation matrix $T$
SO(3), SE(3), so(3), se(3)
Homogeneous coordinate
Different transformation types
- 3D - in homogeneous coordinate
  - Euclidean: rotation + translation, 6DoF
  - Similarity: rotation + translation + scale, 7DoF
  - Affine: 12DoF
  - Projective: 15DoF
- 2D - in homogeneous coordinate
  - Rigid / Euclidean: rotation + translation, 3DoF
  - Linear: rotation + scale, 2DoF
  - Similarity: rotation + translation + scale, 4DoF
  - Affine: rotation + translation + scale + shear, 6DoF
  - Projective: 8DoF
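
As a small illustration of the 2D rigid / Euclidean case listed above (3DoF: one angle plus a 2D translation), the following sketch builds the 3x3 homogeneous matrix and applies it to a point. The helper names `rigid_transform_2d`, `matmul` and `apply` are made up for this example and do not come from any library.

```python
import math

def rigid_transform_2d(theta, tx, ty):
    """3x3 homogeneous matrix: rotation by theta, then translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx],
            [s,  c, ty],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(T, point):
    """Lift (x, y) to (x, y, 1), multiply by T, and divide by the last entry."""
    x, y = point
    h = [T[i][0] * x + T[i][1] * y + T[i][2] for i in range(3)]
    return (h[0] / h[2], h[1] / h[2])

# Rotate (1, 0) by 90 degrees and shift by (1, 0).
T = rigid_transform_2d(math.pi / 2, 1.0, 0.0)
p = apply(T, (1.0, 0.0))

# Chaining two motions is a single matrix product in homogeneous coordinates.
p_twice = apply(matmul(T, T), (1.0, 0.0))
```

The same pattern extends to 3D with a 4x4 matrix, where the upper-left block is a rotation in SO(3) and the whole matrix is an element of SE(3).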

Camera imaging model

Pinhole camera model
- Focal lengths $f_x$, $f_y$, optical center, and skew coefficient, all in pixels
- Coordinates in the camera frame, on the normalized image plane, and in the pixel frame
Lens distortion
- radial distortion, tangential distortion
Calibration - Zhang’s method
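
The chain from camera frame to normalized image plane to pixel frame, with radial and tangential distortion applied in between, can be sketched as follows. This is a minimal example that assumes zero skew and the common two-radial, two-tangential coefficient model; the function name `project` and the parameter names `k1`, `k2`, `p1`, `p2`, `cx`, `cy` are chosen for illustration.

```python
def project(point_cam, fx, fy, cx, cy, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Project a 3D point in the camera frame to pixel coordinates.

    Assumes zero skew. k1, k2 are radial and p1, p2 tangential
    distortion coefficients.
    """
    X, Y, Z = point_cam
    # Camera frame -> normalized image plane.
    x, y = X / Z, Y / Z
    r2 = x * x + y * y
    # Radial and tangential distortion on the normalized plane.
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    # Normalized plane -> pixel frame.
    u = fx * x_d + cx
    v = fy * y_d + cy
    return u, v

# Without distortion the model reduces to u = fx * X / Z + cx, v = fy * Y / Z + cy.
u, v = project((1.0, 2.0, 4.0), 500.0, 500.0, 320.0, 240.0)
```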

Frontend - Feature-based method

Feature extraction
- Keypoints and descriptors
- ORB feature (Oriented FAST and Rotated BRIEF)
Feature matching
- Brute force
- FLANN (Fast Library for Approximate Nearest Neighbors)
- Optical flow
  - Pixel patch
2D-2D
- Fundamental and essential matrix (epipolar constraint: the two viewing rays and the baseline lie in the same plane)
  - 8 point pairs, 8 unknowns
  - Singular values of $E$: $(\sigma, \sigma, 0)^T$
  - Recover $R$ and $t$ from $E$
- Homography matrix (relationship between pixels, assuming the points lie on the same plane)
  - 4 point pairs, 8 unknowns
- Triangulation
3D-2D
- DLT
  - 6 point pairs, 12 unknown
- P3P
  - 3 points + 1 point for validation
- Bundle adjustment (similar to backend)
3D-3D
- ICP, SVD
- Bundle adjustment (similar to backend)
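
For the 2D-2D case, the coplanarity behind the essential matrix can be checked numerically. The sketch below assumes the convention $P_2 = R P_1 + t$ and $E = t^{\wedge} R$, so that $x_2^T E x_1 = 0$ for matching points on the normalized image planes. All helper names are made up for this example.

```python
import math

def skew(t):
    """Skew-symmetric matrix t^ such that (t^) v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_y(angle):
    """Rotation about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def essential(R, t):
    """E = t^ R for the motion P2 = R P1 + t."""
    return matmul(skew(t), R)

def epipolar_residual(E, x1, x2):
    """x2^T E x1, which is zero for a correct match."""
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))

R, t = rot_y(0.1), [0.5, 0.0, 0.1]
P1 = [0.3, -0.2, 4.0]                                  # point in camera 1
P2 = [a + b for a, b in zip(matvec(R, P1), t)]          # same point in camera 2
x1 = [P1[0] / P1[2], P1[1] / P1[2], 1.0]                # normalized coordinates
x2 = [P2[0] / P2[2], P2[1] / P2[2], 1.0]

E = essential(R, t)
residual_good = epipolar_residual(E, x1, x2)
# Shifting the second observation off its epipolar line breaks the constraint.
residual_bad = epipolar_residual(E, x1, [x2[0], x2[1] + 0.1, 1.0])
```

In practice the direction is reversed: $E$ is estimated from matched points (for example with the eight-point method), and $R$ and $t$ are then recovered from it.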

Frontend - Direct method

Minimizing photometric error
Similar to backend optimization

Backend - Filter (MAP, Incremental)

Kalman filter
Extended Kalman filter
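
A minimal scalar sketch of one predict/update cycle of the Kalman filter is given below. It assumes a linear model $x_k = F x_{k-1} + w$, $z_k = H x_k + v$ with noise variances $Q$ and $R$; the function name `kalman_step` is illustrative only. The extended Kalman filter follows the same two steps after linearizing the motion and observation models.

```python
def kalman_step(x, P, z, Q, R, F=1.0, H=1.0):
    """One predict + update cycle for a scalar state.

    x, P: previous mean and variance; z: new measurement;
    Q, R: process and measurement noise variances.
    """
    # Predict: propagate mean and variance through the motion model.
    x_pred = F * x
    P_pred = F * P * F + Q
    # Update: blend the prediction and the measurement with the Kalman gain.
    K = P_pred * H / (H * P_pred * H + R)
    x_new = x_pred + K * (z - H * x_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Estimate a constant value from repeated measurements (no process noise).
x, P = 0.0, 1.0
for _ in range(9):
    x, P = kalman_step(x, P, z=2.0, Q=0.0, R=1.0)
```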

Backend - Optimization (Usually MLE, Batch)

Optimization target
- Put poses and map points together as the state vector
- Derivative of the error term with respect to the pose
- Derivative of the error term with respect to the map point's position

$$
\frac{\partial e_{ij}}{\partial \xi_i} = -\left[\begin{matrix}
\frac{f_x}{Z_j^{\prime}} & 0 & -\frac{f_x X_j^{\prime}}{(Z_j^{\prime})^2} & -\frac{f_x X_j^{\prime} Y_j^{\prime}}{(Z_j^{\prime})^2} & f_x+\frac{f_x(X_j^{\prime})^2}{(Z_j^{\prime})^2} & -\frac{f_x Y_j^{\prime}}{Z_j^{\prime}} \\
0 & \frac{f_y}{Z_j^{\prime}} & -\frac{f_y Y_j^{\prime}}{(Z_j^{\prime})^2} & -f_y - \frac{f_y(Y_j^{\prime})^2}{(Z_j^{\prime})^2} & \frac{f_y X_j^{\prime} Y_j^{\prime}}{(Z_j^{\prime})^2} & \frac{f_y X_j^{\prime}}{Z_j^{\prime}}
\end{matrix}\right]
$$

$$
\frac{\partial e_{ij}}{\partial p_j} = -\left[\begin{matrix}
\frac{f_x}{Z_j^{\prime}} & 0 & -\frac{f_x X_j^{\prime}}{(Z_j^{\prime})^2} \\
0 & \frac{f_y}{Z_j^{\prime}} & -\frac{f_y Y_j^{\prime}}{(Z_j^{\prime})^2}
\end{matrix}\right] R_i
$$
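
The Jacobian with respect to the map point can be sanity-checked against finite differences. The sketch below assumes the error is defined as observation minus projection, $e = z - \pi(R p + t)$, with a distortion-free pinhole model; the function names are made up for this example.

```python
import math

def transform(R, t, p):
    """P' = R p + t, the point expressed in the camera frame."""
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]

def reprojection_error(R, t, p, obs, fx, fy, cx, cy):
    """e = observation - projection (assumed sign convention)."""
    X, Y, Z = transform(R, t, p)
    return [obs[0] - (fx * X / Z + cx), obs[1] - (fy * Y / Z + cy)]

def jacobian_point(R, t, p, fx, fy):
    """Analytic 2x3 Jacobian of e with respect to p, matching the formula above."""
    X, Y, Z = transform(R, t, p)
    J_proj = [[fx / Z, 0.0, -fx * X / Z ** 2],
              [0.0, fy / Z, -fy * Y / Z ** 2]]
    return [[-sum(J_proj[i][k] * R[k][j] for k in range(3)) for j in range(3)]
            for i in range(2)]

def numeric_jacobian(R, t, p, obs, fx, fy, cx, cy, eps=1e-6):
    """Central-difference approximation of the same Jacobian."""
    J = [[0.0] * 3 for _ in range(2)]
    for j in range(3):
        p_plus, p_minus = list(p), list(p)
        p_plus[j] += eps
        p_minus[j] -= eps
        e_plus = reprojection_error(R, t, p_plus, obs, fx, fy, cx, cy)
        e_minus = reprojection_error(R, t, p_minus, obs, fx, fy, cx, cy)
        for i in range(2):
            J[i][j] = (e_plus[i] - e_minus[i]) / (2.0 * eps)
    return J

# Example pose (rotation about z) and map point, chosen arbitrarily.
c, s = math.cos(0.2), math.sin(0.2)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
t = [0.1, -0.2, 0.3]
p = [0.5, 0.4, 3.0]
J_analytic = jacobian_point(R, t, p, 500.0, 500.0)
J_numeric = numeric_jacobian(R, t, p, (320.0, 240.0), 500.0, 500.0, 320.0, 240.0)
```

The two matrices should agree up to the finite-difference error, which is a quick way to catch sign or index mistakes before the Jacobian goes into an optimizer.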

Solve for the increment $\Delta \mathbf{x}$ in $\mathbf{H} \Delta \mathbf{x} + \mathbf{b} = 0$
- Schur complement
Iterative solution
- Gradient descent - first derivative of $F(\mathbf{x})$
- Newton - second derivative of $F(\mathbf{x})$
- Gauss-Newton - first derivative of $f(\mathbf{x})$
- Levenberg-Marquardt - combination of gradient descent and Gauss-Newton
Acceleration
- Sliding window
- Marginalization using Schur complement
- Pose graph
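
To make the Gauss-Newton iteration concrete, here is a minimal one-parameter sketch that fits $y = \exp(a x)$ to noise-free samples. It assumes the cost $F(a) = \frac{1}{2} \sum_i f_i(a)^2$ with residual $f_i(a) = y_i - \exp(a x_i)$, so $\mathbf{H} = \sum_i J_i^2$ and $\mathbf{b} = \sum_i J_i f_i$ are scalars here. The toy problem and the function name `gauss_newton_exp` are chosen for illustration.

```python
import math

def gauss_newton_exp(xs, ys, a0, iterations=20, tol=1e-12):
    """Fit y = exp(a * x) by Gauss-Newton, starting from a0."""
    a = a0
    for _ in range(iterations):
        H = 0.0  # J^T J, approximates the second derivative of F
        b = 0.0  # J^T f, the gradient of F
        for x, y in zip(xs, ys):
            pred = math.exp(a * x)
            f = y - pred        # residual f(a)
            J = -x * pred       # first derivative of f with respect to a
            H += J * J
            b += J * f
        delta = -b / H          # solves H * delta + b = 0
        a += delta
        if abs(delta) < tol:
            break
    return a

xs = [0.1 * i for i in range(1, 11)]
ys = [math.exp(1.0 * x) for x in xs]   # generated with a = 1
a_est = gauss_newton_exp(xs, ys, a0=0.0)
```

Only the first derivative of the residual is needed. Levenberg-Marquardt would replace `H` with `H + lam` for a damping factor `lam` (a hypothetical name here) that is adjusted between iterations.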

Loop Closure

Use k-means to build the dictionary, which is a k-ary tree with d levels
Represent each frame by a vector of the frequencies of the words that appear in it
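
The frame descriptor can be sketched as a normalized word histogram. The example below assumes each feature has already been assigned a word id by descending the dictionary tree, and it uses plain frequencies with an L1-based score as a simplification; practical systems usually add TF-IDF weighting. The names `bow_vector` and `l1_similarity` are made up for this example.

```python
from collections import Counter

def bow_vector(word_ids, vocab_size):
    """Normalized frequency of each dictionary word in one frame."""
    total = len(word_ids)
    if total == 0:
        return [0.0] * vocab_size
    counts = Counter(word_ids)
    return [counts[w] / total for w in range(vocab_size)]

def l1_similarity(v1, v2):
    """1 for identical histograms, 0 for histograms with no word in common."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(v1, v2))

# Word ids of the features in two frames (made-up data, vocabulary of 4 words).
frame_a = bow_vector([0, 2, 2, 3], vocab_size=4)
frame_b = bow_vector([2, 2, 3, 3], vocab_size=4)
score = l1_similarity(frame_a, frame_b)
```

A loop candidate is then a past frame whose score against the current frame is high compared with the scores of other frames.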