How to design a SLAM system

SLAM algorithms based on traditional computer-vision geometry and state estimation are mature. There are many open-source SLAM frameworks, and different commercial products have their own implementations. Every SLAM implementation has its pros and cons. This post provides high-level guidance on how to design a practical SLAM system for products.

Is SLAM needed?

The first question to ask is whether SLAM is really needed. A SLAM algorithm has two parts: localization and mapping. It can provide a very accurate 6DoF pose relative to the starting point and build a 3D point cloud of the nearby environment without any prior information. However, a SLAM system usually costs a lot of compute resources and memory.

Depending on the requirements and available resources, SLAM may not be needed. For example, if the system only needs coarse outdoor localization, GNSS might be enough. If there is an outside-in tracking system, such as the base stations of the HTC Vive, SLAM is not required either.

A SLAM system is needed if

  • A long-term, realtime, accurate 6DoF pose is needed
  • And there is not enough prior knowledge of the environment
  • And there is no outside-in tracking support, or not enough of it

Usually, SLAM focuses on realtime localization, and the mapping mainly serves the localization. Tasks that focus on mapping, such as large-scale reconstruction, are usually offline, where a similar technique called Structure from Motion may be used instead of SLAM.

Metrics: Accuracy, Efficiency, Robustness

If SLAM is needed, the next step is to decide which two of the three metrics matter most to your system, since achieving all three at once is difficult and may not be necessary.

  • Accuracy + Efficiency: you optimize the system for specific environments, so it may degrade in other environments
  • Accuracy + Robustness: you can afford larger-scale computation, so the output is less realtime
  • Efficiency + Robustness: you only perform very fundamental computations, so the accuracy is relatively low

Sensor Choices

Based on the requirements and the target metrics, you can choose which general types of sensors to use.

  • IMU: usually needed to provide motion information, but the price range is very wide, so the accuracy varies
  • Cameras: low cost but not very robust. You can also choose how many cameras you need
  • Lidar: high cost but accurate, and it can reduce the compute needs. You can also choose how many lidars you need

For a specific product, you can also add other sensors. Some example choices for different products are as follows.

  • Autonomous vehicle: IMU, Cameras, Lidars, GNSS, Speedometer, Radars
  • AR/VR devices: IMU, Cameras
  • Robot vacuum: IMU, Lidar, Camera
  • Drone: IMU, Cameras

Every sensor has its own processing methods.

  • IMU
    • Usually modeled as orientation, position, velocity, accelerometer bias, and gyroscope bias
    • Uses kinematic equations to propagate states
    • Can be fused with other information via an ESKF (Error State Kalman Filter), or via graph optimization with pre-integration techniques
  • Camera
    • Has different imaging models to choose from; usually a pinhole model plus some distortion correction
    • Usually needs feature extraction (with an image pyramid) and feature matching to build data association
    • Optimizes by minimizing re-projection errors
    • Triangulates to get 3D landmarks
  • Lidar
    • Stores the point cloud in a k-d tree or voxel grid for nearest-neighbor search
    • Optimizes by minimizing point-to-point, point-to-line, and point-to-plane errors
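As a concrete illustration of the IMU bullets above, here is a minimal sketch of propagating one IMU sample through the kinematic equations. The function name, state layout, and gravity convention are assumptions for illustration, not any particular library's API:

```python
import numpy as np

def propagate_imu(p, v, R, b_a, b_g, acc, gyro, dt, g=np.array([0.0, 0.0, -9.81])):
    """Propagate position p, velocity v, and rotation matrix R with one IMU sample.

    acc and gyro are the raw accelerometer/gyroscope readings; b_a and b_g are
    the current bias estimates (held constant under a random-walk bias model).
    """
    a_world = R @ (acc - b_a) + g        # bias-corrected acceleration in the world frame
    w = gyro - b_g                       # bias-corrected angular rate in the body frame
    # First-order integration of the kinematic equations
    p_new = p + v * dt + 0.5 * a_world * dt ** 2
    v_new = v + a_world * dt
    # Rotation update via the exponential map (Rodrigues' formula)
    theta = np.linalg.norm(w * dt)
    if theta < 1e-12:
        dR = np.eye(3)
    else:
        k = (w * dt) / theta             # unit rotation axis
        K = np.array([[0, -k[2], k[1]],
                      [k[2], 0, -k[0]],
                      [-k[1], k[0], 0]])
        dR = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    return p_new, v_new, R @ dR, b_a, b_g
```

A stationary, world-aligned IMU reads a specific force of +9.81 on the z axis; after bias correction and adding gravity back, the propagated velocity and position stay at zero, which is a quick sanity check for the sign conventions.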

Optimization Choices

The optimization choices are a tradeoff between accuracy and efficiency.

What states to optimize

Depending on the requirements and the quality of the sensors, we can choose how many past states (poses and landmark positions) to optimize. If we choose not to optimize all states, the older states need to be marginalized.

The more states optimized, the more accurate the system, but the more computing required.

Choices include

  • Optimize only the current state: filter methods (EKF, IEKF). All past information needs to be marginalized
    • If all 3D landmarks are also marginalized, an MSCKF needs to be used
  • Optimize the current state and some past states: sliding-window optimization. The oldest information needs to be marginalized
  • Optimize the current state and all past states: full optimization. No marginalization.
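Marginalization is usually done with a Schur complement on the Gauss-Newton system. Here is a minimal sketch (the function and variable names are illustrative) that removes the first m variables while keeping their information as a prior on the remaining ones:

```python
import numpy as np

def marginalize(H, b, m):
    """Eliminate the first m variables from a Gauss-Newton system H x = b
    via the Schur complement, folding their information into the rest."""
    H_mm = H[:m, :m]                       # block of the states being marginalized
    H_mr = H[:m, m:]
    H_rm = H[m:, :m]
    H_rr = H[m:, m:]
    H_mm_inv = np.linalg.inv(H_mm)
    H_new = H_rr - H_rm @ H_mm_inv @ H_mr  # Schur complement of H_mm
    b_new = b[m:] - H_rm @ H_mm_inv @ b[:m]
    return H_new, b_new
```

The key property: solving the reduced system gives exactly the same estimate for the remaining states as solving the full system, which is why marginalization preserves past information instead of discarding it.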

How to choose linearization point

The linearization point is the state value at which the Jacobian matrix is computed. We can choose which states should use the updated value to recompute the Jacobian, and which states can just use FEJ (First Estimate Jacobian), meaning the Jacobian is computed only at the initial linearization point.

The more relinearization, the more accurate the system, but the more computing required.

  • Marginalized states use FEJ
  • For the states being optimized, we can choose to fix their linearization points dynamically
    • Stop relinearizing some landmarks after a few iterations, since the gain from relinearizing them is small
    • For state updates: EKF vs IEKF
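A toy 1D example makes the tradeoff concrete: estimate x from a measurement z = x² with Gauss-Newton, either recomputing the Jacobian each iteration or freezing it at the first estimate. This only illustrates the idea, not a full SLAM solver:

```python
def gauss_newton_sq(z, x0, iters=20, fej=False):
    """Estimate x from a scalar measurement z = x**2 by Gauss-Newton.

    With fej=True the Jacobian is evaluated once at the initial estimate
    (First Estimate Jacobian); otherwise it is recomputed every iteration.
    """
    x = x0
    J_fixed = 2 * x0                     # Jacobian of x**2 at the first estimate
    for _ in range(iters):
        r = z - x ** 2                   # measurement residual
        J = J_fixed if fej else 2 * x    # choice of linearization point
        x = x + r / J                    # scalar Gauss-Newton update
    return x
```

With a reasonable initial guess, both variants converge to the same answer, but the relinearized version gets there in fewer iterations, which is exactly the accuracy-vs-computation tradeoff described above.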

Implementation Tricks

  • Base on practice: to determine which result is best, evaluate the candidates on some metrics and choose the best one
    • Examples: choosing between the H and F matrix during initialization; using RANSAC to get results from inliers
  • Relative best: choosing the best result needs both an absolute and a relative threshold
    • Examples: determining loop closure; relocalization
  • Coarse to fine: step 1 calculates a rough result quickly, then step 2 refines it
    • Examples: 2-stage tracking; adding lots of mappoints and keyframes, then culling them
  • Image pyramid: extract features at different scales, and always match features across scales
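The "base on practice" trick maps directly to RANSAC: propose candidate models from minimal samples, score each by its inlier count, and keep the best. A minimal line-fitting sketch, where the threshold and iteration count are arbitrary illustrative choices:

```python
import random

def ransac_line(points, iters=100, thresh=0.5, seed=0):
    """Fit y = a*x + b by RANSAC: sample minimal sets, score each hypothesis
    by its inlier count, and keep the one the data supports best."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample: 2 points
        if x1 == x2:
            continue                                  # degenerate sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers
```

In a SLAM front end the same scheme is applied to harder models (homography H, fundamental matrix F, PnP pose), but the score-and-keep-the-best loop is identical.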

Specific Methods

  1. Active feature search for fast feature matching: active feature search means searching an area of the image for a feature that matches a previous mappoint. Compared to feature extraction followed by brute-force matching, active feature search is more efficient.
    1. Search in a region using a grid
    2. Search with a BoW tree
  2. Set a bad flag for mappoints and keyframes
    1. Decide when to set it
    2. Always check the flag when using them
  3. Mappoint properties
    1. A distinctive descriptor
    2. A normalized observation direction (helps deal with occlusion)
  4. Mappoint fusion
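The grid-based active search in item 1 can be sketched as follows: bucket the extracted features into grid cells once per frame, then, for each projected mappoint, collect candidates only from the cells around the predicted location instead of matching against every feature. The names and cell size here are illustrative:

```python
from collections import defaultdict

def build_grid(features, cell=32):
    """Bucket 2D feature locations (x, y) into grid cells for fast local lookup."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(features):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid

def search_near(grid, x, y, cell=32, radius=1):
    """Return indices of features in the cells around the predicted
    projection (x, y), instead of scanning every feature in the image."""
    cx, cy = int(x // cell), int(y // cell)
    hits = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            hits.extend(grid[(cx + dx, cy + dy)])
    return hits
```

Descriptor comparison then runs only on the returned candidates, which is why active search beats brute-force matching when the motion prediction is decent.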
