Visual Inertial SLAM Introduction

This post talks about the theory for Visual Inertial SLAM. Notes of book SLAM in Autonomous Driving.

IMU

In Visual SLAM, only observation equation exists due to the lack of input signal. Though constant velocity or constant accelerations could be used for initial guess of the pose at new frame time, it cannot provide the uncertainty/covariance propagation, so the Visual SLAM usually degrades to Bundle Adjustment, or MLE estimation.

Visual Inertial SLAM could use the IMU measurements as the input signal, which could provide the control equation to propagate the system state and covariance. This could turn the SLAM as MAP estimation and could provide higher accuracy.

Since IMU measurements also contains noise like accel (accelerometer) bias, gyro (gyroscope) bias, and random noise, the system state needs to be augmented to not only include 6DoF pose, but also the IMU parameters. This section talks about the IMU Kinematics model, and how to construct the system state in Visual Inertial SLAM system.

IMU Kinematics

An IMU outputs the angular velocity from the gyro and the specific force from the accelerometer, both in the body (IMU) frame and both corrupted by a slowly-varying bias and white noise. Let $R = R_{wb} \in SO(3)$ be the body to world rotation, $p, v$ the position and velocity of the body in the world frame, $g$ the gravity vector in the world frame (e.g. $[0, 0, -9.81]^T$), and $\omega, a$ the true body angular velocity and the body acceleration in the world frame.

Measurement model

$$
\begin{aligned}
\tilde\omega &= \omega + b_g + \eta_g \\
\tilde a &= R^T(a - g) + b_a + \eta_a
\end{aligned}
$$
where

  • $\tilde\omega, \tilde a$ are the raw gyro / accel readings
  • $b_g, b_a$ are the gyro / accel biases, modeled as random walks $\dot b_g = \eta_{bg}$, $\dot b_a = \eta_{ba}$
  • $\eta_g, \eta_a$ are zero-mean white noises

The accelerometer senses the specific force $a - g$, not the acceleration. At rest $a = 0$, so $\tilde a = -R^T g + b_a$, i.e. it measures the reaction to gravity expressed in the body frame. This coupling to gravity is what lets the accel observe roll and pitch.

Continuous kinematics

Inverting the measurement model gives the true rates that drive the state. The gyro integrates the orientation, and the gravity-compensated accel integrates the velocity and position
$$
\begin{aligned}
\dot p &= v \\
\dot v &= R(\tilde a - b_a - \eta_a) + g \\
\dot R &= R(\tilde\omega - b_g - \eta_g)^\wedge
\end{aligned}
$$

Discrete integration

Treating the readings as constant over a small interval $\Delta t$ (zero-order hold) and writing the bias-corrected mean values $\hat a = \tilde a - b_a$, $\hat\omega = \tilde\omega - b_g$
$$
\begin{aligned}
R_{t+1} &= R_t\mathrm{Exp}(\hat\omega\Delta t) \\
v_{t+1} &= v_t + (R_t\hat a + g)\Delta t \\
p_{t+1} &= p_t + v_t\Delta t + \tfrac{1}{2}(R_t\hat a + g)\Delta t^2
\end{aligned}
$$

The orientation is propagated first by right-multiplying the body rotation increment (the same body to world convention used in the attitude filter), then the updated rotation maps the body specific force back into the world frame so that velocity and position can be integrated. Using the midpoint rotation between $R_t$ and $R_{t+1}$ in the velocity / position terms gives a more accurate integration than the plain $R_t$.

Visual Inertial System Equations

TBD

IMU Pre-integration

TBD

Loosely Coupled

Loosely coupled method are usually recursive estimation like EKF. Where the IMU measurements are used to predict the system state with the system equations, and camera measurements are used to perform Visual SLAM, then the pose is used to update the system state.

Attitude Filter with ESKF

IMU could be used to estimate orientation. It is usually used together with magnetometer to reduce the yaw drifting. Usually ESKF (error state Kalman filter) is used to fuse the measurements from different sensors. ESKF shares the predict–update recursion of the EKF, but it tracks a small error state on the manifold rather than the full state directly.

In terms of the attitude filter with IMU and mag, the real states $\hat{x}$ could be modeled as following
$$
\hat{x} = [\hat\phi, \hat{b}_g, \hat{s}, \hat{b}_a, \hat{b}_m, \hat{f}]^T \in \mathbb{R}^{18}
$$
where

  • $\hat\phi$ is the transformation of world from IMU, i.e. the orientation of IMU in world space. It’s in Lie Algebra form and $\hat R_{wi} = \exp(\hat{\phi}^{\wedge}) = \mathrm{Exp}(\hat\phi)$
  • $\hat{b}_g$ is the gyro bias
  • $\hat{s}$ is the gyro scale factor
  • $\hat{b}_a$ is the accel bias
  • $\hat{b}_m$ is the mag bias
  • $\hat{f}$ is the mag field in IMU reference frame

The real state $\hat{x}$ is constructed from the nominal state $x$ and the error state $\delta x$ as $\hat{x} = x \oplus \delta x$. Component-wise we have
$$
\begin{aligned}
\hat\phi &= \mathrm{Log}((\mathrm{Exp}(\delta\phi)) \mathrm{Exp}(\phi)) \\
\hat{b}_g &= b_g + \delta b_g \\
\hat{s} &= s + \delta s \\
\hat{b}_a &= b_a + \delta b_a \\
\hat{b}_m &= b_m + \delta b_m \\
\hat{f} &= f + \delta f
\end{aligned}
$$

where the nominal state is
$$
x = [\phi, b_g, s, b_a, b_m, f]^T \in \mathbb{R}^{18}
$$

and the error state is
$$
\delta x = [\delta\phi, \delta b_g, \delta s, \delta b_a, \delta b_m, \delta f]^T \in \mathbb{R}^{18}
$$

Orientation convention

The orientation uses a left perturbation so that the error of the orientation is expressed in world coordinate space.

The remaining five components are ordinary vectors, so their errors are additive. The error state $\delta x$ stays small and near zero, which is exactly where the Gaussian assumption and the covariance $P$ are well defined.

The ESKF steps are as follows

Predict
$$
x_k^- = f(x_{k-1}^+, \tilde \omega_k), \quad \delta x_k^- = 0, \quad P_k^- = F_k P_{k-1}^+ F_k^T + R_k
$$

Update
$$
K_k = P_k^- H_k^T (H_k P_k^- H_k^T + Q_k )^{-1}
$$
$$
\delta x_k^+ = K_k (\tilde z_k - h(x_k^-)), \quad P_k^+ = (I - K_k H_k) P_k^-
$$

Merge and reset
$$
x_k^+ = x_k^- \oplus \delta x_k^+, \quad \delta x_k^+ \leftarrow 0
$$

Explanation

  • Gyro propagates the nominal state and the error-state covariance
  • Accel and mag update the error state through the Kalman gain; the corrected error state is then injected into the nominal state and reset back to zero
  • $F_k = \dfrac{\partial (f)}{\partial (\delta x)}\Big|_{x_{k-1}^+}$ is the error-state transition Jacobian: the predict $f$ linearized with respect to the error state and evaluated at the nominal state
  • $H_k = \dfrac{\partial (h)}{\partial (\delta x)}\Big|_{x_k^-}$ is the measurement Jacobian: the observation $h$ linearized with respect to the error state and evaluated at the nominal state

Predict with gyro

Gyro measures the angular velocity in IMU reference frame. With the measurement $\tilde\omega_{k-1} \in \mathbb{R}^3$, we propagate the state from step $k-1$ to step $k$ over the interval $\Delta t$.

Nominal state

Only the orientation is driven by the gyro; the remaining five components carry over unchanged.
$$
\begin{aligned}
\phi_k &= \mathrm{Log}\big(\mathrm{Exp}(\phi_{k-1})\mathrm{Exp}(S_{k-1}(\tilde\omega_{k-1} - b_{g,k-1}) \Delta t)\big) \\
b_{g,k} &= b_{g,k-1} \\
s_k &= s_{k-1} \\
b_{a,k} &= b_{a,k-1} \\
b_{m,k} &= b_{m,k-1} \\
f_k &= f_{k-1}
\end{aligned}
$$
where $S_{k-1} = \mathrm{diag}(1 + s_{k-1})$ is the diagonal gyro scale-factor matrix.

Error state

Component-wise, only the gyro and accel biases carry random-walk noise; the scale factor, mag bias and mag field stay constant, while the orientation error couples to the gyro bias and scale errors
$$
\begin{aligned}
\delta\phi_k &= \delta\phi_{k-1} + R_{k-1}J_l(a_{k-1})\big(-S_{k-1}\delta b_{g,k-1} + \mathrm{diag}(\tilde\omega_{k-1} - b_{g,k-1})\delta s_{k-1} - S_{k-1}\eta_g\big)\Delta t \\
\delta b_{g,k} &= \delta b_{g,k-1} + \eta_{bg} \\
\delta s_k &= \delta s_{k-1} \\
\delta b_{a,k} &= \delta b_{a,k-1} + \eta_{ba} \\
\delta b_{m,k} &= \delta b_{m,k-1} \\
\delta f_k &= \delta f_{k-1}
\end{aligned}
$$
where

  • $\eta_g \sim \mathcal{N}(0, \Sigma_g)$ is the gyro white noise
  • $\eta_{bg} \sim \mathcal{N}(0, \Sigma_{bg})$ is the gyro bias random-walk noise
  • $\eta_{ba} \sim \mathcal{N}(0, \Sigma_{ba})$ is the accel bias random-walk noise
  • $J_l(a_{k-1})$, with $a_{k-1} = S_{k-1}(\tilde\omega_{k-1} - b_{g,k-1})\Delta t$, is the left Jacobian of $SO(3)$ evaluated at the nominal rotation increment (derived below)

How to compute the left and right Jacobians

For a rotation vector $\phi$ with $\theta = |\phi|$, the left and right Jacobians of $SO(3)$ are
$$
\begin{aligned}
J_l(\phi) &= I + \frac{1-\cos\theta}{\theta^2}\phi^\wedge + \frac{\theta - \sin\theta}{\theta^3}(\phi^\wedge)^2 \\
J_r(\phi) &= I - \frac{1-\cos\theta}{\theta^2}\phi^\wedge + \frac{\theta - \sin\theta}{\theta^3}(\phi^\wedge)^2
\end{aligned}
$$
They differ only in the sign of the $\phi^\wedge$ term, and are related by
$$
J_l(\phi) = J_r(-\phi) = J_r(\phi)^T = \mathrm{Exp}(\phi)J_r(\phi)
$$
The first two equalities follow from $(\phi^\wedge)^T = -\phi^\wedge$. The last is the adjoint relation, which comes from the defining property of the two Jacobians: a perturbation $\delta$ of the tangent vector maps through $\mathrm{Exp}$ either from the left or from the right,
$$
\mathrm{Exp}(\phi + \delta) = \mathrm{Exp}\big(J_l(\phi)\delta\big)\mathrm{Exp}(\phi) = \mathrm{Exp}(\phi)\mathrm{Exp}\big(J_r(\phi)\delta\big)
$$
to first order in $\delta$. Equating the two right-hand sides and isolating the left factor,
$$
\begin{aligned}
\mathrm{Exp}\big(J_l(\phi)\delta\big) &= \mathrm{Exp}(\phi)\mathrm{Exp}\big(J_r(\phi)\delta\big)\mathrm{Exp}(-\phi) \\
&= \mathrm{Exp}\big(\mathrm{Ad}(\mathrm{Exp}(\phi))J_r(\phi)\delta\big) \\
& = \mathrm{Exp}\big(\mathrm{Exp}(\phi)J_r(\phi)\delta\big)
\end{aligned}
$$

The last line is because of conjugation identity
$$
\mathrm{Ad}(\mathrm{Exp}(\phi)) = \mathrm{Exp}(\phi)
$$
for $\mathrm{Exp}(\phi) \in SO(3)$

At typical IMU rates the per-sample rotation $|a_{k-1}|$ is small, so
$$
J_l(a_{k-1}) = I + \tfrac12 a_{k-1}^\wedge + O(|a_{k-1}|^2) \approx I
$$
and the factor is often dropped.

Therefore, by stacking the Jacobian of $\delta \phi_k$ on $\delta \phi_{k-1}, \delta b_{g, k-1}, \delta s_{k-1}, \eta_g$ into matrix
$$
F_k = \begin{bmatrix}
I & -R_{k-1} J_l(a_{k-1}) S_{k-1} \Delta t & R_{k-1} J_l(a_{k-1}) \mathrm{diag}(\tilde\omega_{k-1} - b_{g,k-1})\Delta t & 0 & 0 & 0 \\
0 & I & 0 & 0 & 0 & 0 \\
0 & 0 & I & 0 & 0 & 0 \\
0 & 0 & 0 & I & 0 & 0 \\
0 & 0 & 0 & 0 & I & 0 \\
0 & 0 & 0 & 0 & 0 & I
\end{bmatrix} \in \mathbb{R}^{18 \times 18}
$$

$$
R_k = \begin{bmatrix}
(R_{k-1} J_l(a_{k-1}) S_{k-1} \Delta t)\Sigma_g(R_{k-1} J_l(a_{k-1}) S_{k-1} \Delta t)^T & 0 & 0 & 0 & 0 & 0 \\
0 & \Sigma_{bg}\Delta t & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \Sigma_{ba}\Delta t & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \in \mathbb{R}^{18 \times 18}
$$

Proof

The nominal orientation advances by the increment $\mathrm{Exp}(S_{k-1}(\tilde\omega_{k-1} - b_{g,k-1})\Delta t)$; the true orientation advances by the same increment perturbed by the angular-velocity error
$$
\delta\omega = -S_{k-1}\delta b_{g,k-1} + \mathrm{diag}(\tilde\omega_{k-1} - b_{g,k-1})\delta s_{k-1} - S_{k-1}\eta_g
$$
from the gyro bias, scale, and white-noise errors, so the true and nominal propagations are
$$
\begin{aligned}
\hat R_k &= \hat R_{k-1}\mathrm{Exp}\big((S_{k-1}(\tilde\omega_{k-1} - b_{g,k-1}) + \delta\omega)\Delta t\big) = \hat R_{k-1}\mathrm{Exp}\big(a_{k-1} + \delta\omega\Delta t\big) \\
R_k &= R_{k-1}\mathrm{Exp}\big(S_{k-1}(\tilde\omega_{k-1} - b_{g,k-1})\Delta t\big) = R_{k-1}\mathrm{Exp}(a_{k-1})
\end{aligned}
$$

Inverting the nominal propagation gives
$$
R_k^{-1} = \mathrm{Exp}(- a_{k-1})R_{k-1}^{-1}
$$
Also we have
$$
\begin{aligned}
& \hat R = \mathrm{Exp}(\delta\phi)R$ \\
\Rightarrow & \mathrm{Exp}(\delta\phi_k) = \hat R_k R_k^{-1}
\end{aligned}
$$

Therefore
$$
\begin{aligned}
\mathrm{Exp}(\delta\phi_k)
&= \hat R_k R_k^{-1} \\
&= \mathrm{Exp}(\delta\phi_{k-1}) R_{k-1}\mathrm{Exp}\big(a_{k-1} + \delta\omega\Delta t\big)\mathrm{Exp}(-a_{k-1})R_{k-1}^{-1} \\
&= \mathrm{Exp}(\delta\phi_{k-1}) R_{k-1}\mathrm{Exp}(a_{k-1})\mathrm{Exp}\big(J_r(a_{k-1})\delta\omega\Delta t\big)\mathrm{Exp}(-a_{k-1})R_{k-1}^{-1} \\
&= \mathrm{Exp}(\delta\phi_{k-1}) R_{k-1}\mathrm{Exp}\big(J_l(a_{k-1})\delta\omega\Delta t\big)R_{k-1}^{-1} \\
&= \mathrm{Exp}(\delta\phi_{k-1})\mathrm{Exp}(R_{k-1}J_l(a_{k-1})\delta\omega\Delta t) \\
&\approx \mathrm{Exp}(\delta\phi_{k-1} + R_{k-1}J_l(a_{k-1})\delta\omega\Delta t)
\end{aligned}
$$
using the nominal increment $a_{k-1}$ defined above. The two middle lines are where the Jacobian appears, so it is worth unpacking them:

  • Right Jacobian (second line). For a small perturbation $\delta$, the exponential does not split additively; the first-order correction is the right Jacobian, $\mathrm{Exp}(a + \delta) = \mathrm{Exp}(a)\mathrm{Exp}\big(J_r(a)\delta\big)$. This factors the perturbed increment into the nominal increment $\mathrm{Exp}\big(a_{k-1} + \delta\omega\Delta t\big) = \mathrm{Exp}(a_{k-1}) \mathrm{Exp}\big(J_r(a_{k-1})\delta\omega\Delta t\big)$
  • Adjoint turns $J_r$ into $J_l$ (third line). The nominal $\mathrm{Exp}(a_{k-1})$ cancels the $\mathrm{Exp}(-a_{k-1})$ coming from $R_k^{-1}$, but only after conjugating the small rotation sitting between them. For any Lie group $g\mathrm{Exp}(\xi)g^{-1} = \mathrm{Exp}(\mathrm{Ad}(g)\xi)$, and in $SO(3)$ the adjoint is the rotation itself, $\mathrm{Ad}(\mathrm{Exp}(a)) = \mathrm{Exp}(a)$. Therefore,
    $$
    \mathrm{Exp}(a_{k-1})\mathrm{Exp}\big(J_r(a_{k-1})\delta\omega\Delta t\big)\mathrm{Exp}(-a_{k-1}) = \mathrm{Exp}\big(\mathrm{Exp}(a_{k-1})J_r(a_{k-1})\delta\omega\Delta t\big)
    $$
    Also we have
    $$
    J_l(a) = \mathrm{Exp}(a)J_r(a)
    $$
    so the increment collapses to $\mathrm{Exp}\big(J_l(a_{k-1})\delta\omega\Delta t\big)$.
  • The fourth line applies the same adjoint once more, now with $R_{k-1}$, to rotate the body-frame error into the world frame, $R_{k-1}\mathrm{Exp}(\xi)R_{k-1}^{-1} = \mathrm{Exp}(R_{k-1}\xi)$
  • The last line is the first-order BCH approximation $\mathrm{Exp}(x)\mathrm{Exp}(y) \approx \mathrm{Exp}(x + y)$. The left Jacobian (rather than the right) survives precisely because the perturbation is defined on the left, $\hat R = \mathrm{Exp}(\delta\phi)R$.

Substituting $\delta\omega$ gives the orientation error row

$$
\delta\phi_k = \delta\phi_{k-1} + R_{k-1}J_l(a_{k-1})\big(-S_{k-1}\delta b_{g,k-1} + \mathrm{diag}(\tilde\omega_{k-1} - b_{g,k-1})\delta s_{k-1} - S_{k-1}\eta_g\big)\Delta t
$$

whose three terms fill the gyro-bias and scale blocks of the orientation row of $F_k$, and the gyro white-noise $\eta_g$ contribution to $R_k$.

Update with accel

Accel measures the specific force in the IMU frame. At step $k$, its measurement $\tilde a_k$ has following observation equation.
$$
\tilde a_k = \mathrm{Exp}(\hat\phi_k)^T(a_k - g) + \hat b_{a,k} + \eta_a
$$
where

  • $a_k$ is the acceleration under world space
  • $\eta_a \sim \mathcal{N}(0, \Sigma_a)$ is the random noise of accel

When the IMU is quasi-static ($a_k \approx 0$), the observation equation reduces to
$$
\tilde a_k = -\mathrm{Exp}(\hat\phi_k)^T g + \hat b_{a,k} + \eta_a
$$
so the measured gravity direction pins down roll and pitch.

The Jacobian are as follows
$$
\begin{aligned}
H_k &= \begin{bmatrix} -\mathrm{Exp}(\hat \phi_k)^T g^\wedge & 0 & 0 & I_{3\times 3} & 0 & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 18} \\
Q_k &= \Sigma_a \in \mathbb{R}^{3 \times 3}
\end{aligned}
$$

Proof

$$
\begin{aligned}
\frac{\partial (\tilde a)}{\partial (\delta\phi)}
&= \lim_{\psi \to 0} \frac{-(\mathrm{Exp}(\psi)\hat R)^T g + \hat{R}^T g}{\psi} \\
&= \lim_{\psi \to 0} \frac{-((I_{3\times 3} + \psi^\wedge)\hat R)^T g + \hat{R}^T g}{\psi} \\
&= \lim_{\psi \to 0} \frac{-(\psi^\wedge \hat R)^T g}{\psi} \\
&= \lim_{\psi \to 0} \frac{-\hat R^T (\psi^\wedge)^T g}{\psi} \\
&= \lim_{\psi \to 0} \frac{\hat R^T (\psi^\wedge) g}{\psi} \\
&= \lim_{\psi \to 0} \frac{-\hat R^T g^\wedge \psi}{\psi} \\
&= - \mathrm{Exp}(\hat \phi)^T g^\wedge
\end{aligned}
$$

$$
\frac{\partial (\tilde a)}{\partial (\delta b_a)} = \frac{\partial (\tilde a)}{\partial (\delta \hat b_a)} \frac{\partial (\delta \hat b_a)}{\partial (\delta b_a)} = I_{3 \times 3}
$$

Update with mag

Mag measures the local geomagnetic field in the IMU frame. At step $k$, its measurement $\tilde m_k$ has following observation equation.
$$
\tilde m_k = \mathrm{Exp}(\hat\phi_k)^T \hat f_{k} + \hat b_{m,k} + \eta_m
$$
where

  • $\hat f_{k}$ is the geomagnetic field in world reference frame
  • $\eta_m \sim \mathcal{N}(0, \Sigma_m)$ is the random noise of mag

Because the field has a horizontal (north-pointing) component, the measured field direction pins down the yaw that gravity alone cannot observe.

The Jacobian are as follows
$$
\begin{aligned}
H_k &= \begin{bmatrix} \mathrm{Exp}(\hat\phi_k)^T \hat f_{k}^\wedge & 0 & 0 & 0 & I_{3\times 3} & \mathrm{Exp}(\hat\phi_k)^T \end{bmatrix} \in \mathbb{R}^{3 \times 18} \\
Q_k &= \Sigma_m \in \mathbb{R}^{3 \times 3}
\end{aligned}
$$

Proof

$$
\begin{aligned}
\frac{\partial (\tilde m)}{\partial (\delta\phi)}
&= \lim_{\psi \to 0} \frac{(\mathrm{Exp}(\psi)\hat R)^T \hat f - \hat{R}^T \hat f}{\psi} \\
&= \lim_{\psi \to 0} \frac{((I_{3\times 3} + \psi^\wedge)\hat R)^T \hat f - \hat{R}^T \hat f}{\psi} \\
&= \lim_{\psi \to 0} \frac{(\psi^\wedge \hat R)^T \hat f}{\psi} \\
&= \lim_{\psi \to 0} \frac{\hat R^T (\psi^\wedge)^T \hat f}{\psi} \\
&= \lim_{\psi \to 0} \frac{-\hat R^T \psi^\wedge \hat f}{\psi} \\
&= \lim_{\psi \to 0} \frac{\hat R^T \hat f^\wedge \psi}{\psi} \\
&= \mathrm{Exp}(\hat \phi)^T \hat f^\wedge
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial (\tilde m)}{\partial (\delta b_m)} &= \frac{\partial (\tilde m)}{\partial (\delta \hat b_m)} \frac{\partial (\delta \hat b_m)}{\partial (\delta b_m)} = I_{3 \times 3} \\
\frac{\partial (\tilde m)}{\partial (\delta f)} &= \frac{\partial (\tilde m)}{\partial (\delta \hat f)} \frac{\partial (\delta \hat f)}{\partial (\delta f)} = \mathrm{Exp}(\hat\phi)^T
\end{aligned}
$$

Inertial Navigation with ESKF

TBD

Visual + IMU Fusion

TBD

Tightly Coupled

Tightly coupled method could be recursive estimation like IEKF, batch estimation. To achieve the realtime, sliding window batch estimation with IMU pre-integration are usually used.

Sliding Window Management

  • Which frames and mappoints to remove
  • How to marginalize

Backend Optimization

The mappoint position could be optimized in frontend, or it could be optimized in backend to leave frontend only optimize poses.

Frontend processing needs to estimate each frame pose, while the backend needs to do more processing on the keyframes, to reconstruct map and optimize more to get accurate pose.

Map Management

  • How to construct backend map
  • How to optimize the mappoint positions