A large part of robotics is about developing machines that perceive and
interact with the environment. For that, robots use sensors to collect and
process data, and knowing what the data describes is important for the robot
to navigate and interact with the environment. The following notes use the
frame notation described by Paul Furgale (see link). The aim is to mitigate the
ambiguity that arises when describing robot poses, sensor data and more.
A vector expressed in the world frame, $\frame_{W}$, is written as
$\pos_{W}$. More precisely, if the vector describes the position of the
camera frame, $\frame_{C}$, expressed in $\frame_{W}$, the vector can be
written as $\pos_{WC}$. The left-hand subscript indicates the coordinate
system the vector is expressed in, while the right-hand subscripts indicate
the start and end points. For brevity, if the vector's start point coincides
with the frame it is expressed in, the vector can simply be written as
$\pos_{WC}$. Similarly a transformation of a point from $\frame_{C}$ to
$\frame_{W}$ can be represented by a homogeneous transform matrix,
$\tf_{WC}$, where its rotation matrix component is written as $\rot_{WC}$ and
the translation component written as $\pos_{WC}$. A rotation matrix
parametrized by the quaternion $\quat_{WC}$ is written as
$\rot\{\quat_{WC}\}$, corresponding to the Hamilton convention. The quaternion
can be written as a 4-element vector consisting of a \textit{real}
(\textit{scalar}) part, $q_{w}$, and an \textit{imaginary} (\textit{vector})
part, $\quat_{v}$, as

$$
\quat =
\begin{bmatrix} q_{w} \\ \quat_{v} \end{bmatrix} =
\begin{bmatrix} q_{w} \\ q_{x} \\ q_{y} \\ q_{z} \end{bmatrix}.
$$
There are other quaternion conventions, for example, the JPL convention. A
more detailed discussion between Hamiltonian and JPL quaternion convention is
discussed in \cite{Sola2017}.
Main Quaternion Properties
Sum
Let $\vec{p}$ and $\vec{q}$ be two quaternions; the sum of both
quaternions is

$$
\vec{p} \pm \vec{q} =
\begin{bmatrix} p_{w} \\ \vec{p}_{v} \end{bmatrix} \pm
\begin{bmatrix} q_{w} \\ \vec{q}_{v} \end{bmatrix} =
\begin{bmatrix} p_{w} \pm q_{w} \\ \vec{p}_{v} \pm \vec{q}_{v} \end{bmatrix}.
$$

Product

The quaternion product is

$$
\vec{p} \otimes \vec{q} =
\begin{bmatrix}
  p_{w} q_{w} - \vec{p}_{v}^{T} \vec{q}_{v} \\
  p_{w} \vec{q}_{v} + q_{w} \vec{p}_{v} + \vec{p}_{v} \times \vec{q}_{v}
\end{bmatrix}.
$$
The quaternion product is not commutative in the general case. There
are exceptions to the general non-commutative rule: when either $\vec{p}$ or
$\vec{q}$ is real, such that $\vec{p}_{v} \times \vec{q}_{v} = \vec{0}$, or
when both $\vec{p}_v$ and $\vec{q}_v$ are parallel, $\vec{p}_{v} \parallel
\vec{q}_{v}$. Only in these circumstances is the quaternion product
commutative.
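As a quick numerical check, here is a minimal NumPy sketch of the quaternion
sum and Hamilton product (the $[q_w, q_x, q_y, q_z]$ storage order is an
assumption; the notes do not fix one):

```python
import numpy as np

def quat_mul(p, q):
    # Hamilton product of quaternions stored as [w, x, y, z]
    pw, pv = p[0], p[1:]
    qw, qv = q[0], q[1:]
    w = pw * qw - pv @ qv
    v = pw * qv + qw * pv + np.cross(pv, qv)
    return np.hstack([w, v])

p = np.array([1.0, 2.0, 3.0, 4.0])
q = np.array([5.0, 6.0, 7.0, 8.0])
print(p + q)           # quaternion sum is element-wise
print(quat_mul(p, q))  # [-60, 12, 30, 24]
print(quat_mul(q, p))  # [-60, 20, 14, 32]: the product is not commutative
```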
The axis-angle representation of the rotation between two vectors can be
derived using the properties of the cross and dot product
$$
\begin{align}
\vec{u} \cdot \vec{v} &=
\norm{\vec{u}} \norm{\vec{v}} \cos \theta \\
\norm{\vec{u} \times \vec{v}} &=
\norm{\vec{u}} \norm{\vec{v}} |\sin \theta| ,
\end{align}
$$
the axis-angle, $\boldsymbol{\theta} \in \real^{3}$, can be obtained from
$\vec{u}$ and $\vec{v}$ with

$$
\boldsymbol{\theta} = \theta \vec{e},
\quad
\theta = \cos^{-1} \left(
  \dfrac{\vec{u} \cdot \vec{v}}{\norm{\vec{u}} \norm{\vec{v}}}
\right),
\quad
\vec{e} = \dfrac{\vec{u} \times \vec{v}}{\norm{\vec{u} \times \vec{v}}}
$$

where $\vec{e}$ is the unit vector that defines the rotation axis and
$\theta$ is the rotation angle about $\vec{e}$. Once the axis-angle,
$\boldsymbol{\theta}$, is obtained, a quaternion can be formed with

$$
\quat = \begin{bmatrix}
  \cos(\theta / 2) \\
  \vec{e} \sin(\theta / 2)
\end{bmatrix}.
$$
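A minimal sketch of this construction in NumPy, assuming non-parallel inputs
(the cross product, and hence the axis, is undefined for parallel $\vec{u}$
and $\vec{v}$):

```python
import numpy as np

def quat_from_two_vectors(u, v):
    # Rotation angle from the dot product, rotation axis from the cross product
    theta = np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    axis = np.cross(u, v)
    axis = axis / np.linalg.norm(axis)
    # Quaternion [w, x, y, z] from the axis-angle
    return np.hstack([np.cos(theta / 2.0), axis * np.sin(theta / 2.0)])

print(quat_from_two_vectors(np.array([0.0, 0.0, 1.0]),
                            np.array([1.0, 0.0, 1.0])))
```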
Example: Attitude from gravity and accelerometer vectors
In robotics, knowing the attitude of the system is often required. An
Inertial Measurement Unit (IMU) is commonly used to obtain this information.
Using the method described previously, a gravity vector along with an
accelerometer measurement vector can be used to obtain an attitude in the
form of a quaternion.
Let $\vec{g} \in \real^{3}$ be the gravity vector, and $\vec{a}_{m} \in
\real^{3}$ be the accelerometer measurement from an IMU. With the two vectors
$\vec{g}$ and $\vec{a}_{m}$, a quaternion $\quat_{WS}$ expressing the rotation
of the IMU sensor frame, $\frame_{S}$, with respect to the world frame,
$\frame_{W}$, can be calculated given that the values of $\vec{g}$ and
$\vec{a}_{m}$ are known. For example, let
taken from the first measurement of the imu_april calibration
sequence of the EuRoC MAV dataset.
Before calculating the axis-angle, however, it should be noted that when an
accelerometer is at rest, the measurement reading in the z-axis is positive
instead of negative. The reason is that accelerometers measure acceleration by
measuring the displacement of a proof mass that is suspended with springs. For
example, if gravity is ignored and the accelerometer moves upwards, the proof
mass will be displaced towards the bottom of the accelerometer. This is
interpreted as an acceleration in the upwards direction, and so when the
accelerometer is at rest on a flat surface, gravity pulls on the proof mass,
yielding a positive measurement in the upwards direction. To resolve this
issue the gravity vector is negated, and so $\vec{u} = -\vec{g}$ and
$\vec{v} = \vec{a}_{m}$. Using the axis-angle equations above, the axis-angle
can then be computed and converted to the quaternion $\quat_{WS}$.
A quaternion can also be recovered directly from a rotation matrix with
entries $m_{ij}$:

$$
\begin{align}
  q_w &= \dfrac{\sqrt{1 + m_{11} + m_{22} + m_{33}}}{2} \\
  q_x &= \dfrac{m_{32} - m_{23}}{4 q_w} \\
  q_y &= \dfrac{m_{13} - m_{31}}{4 q_w} \\
  q_z &= \dfrac{m_{21} - m_{12}}{4 q_w}
\end{align}
$$

Note that while the equations seem straightforward, in practice the trace of
the rotation matrix needs to be checked in order to guarantee correctness.
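A hedged sketch of the usual trace-branching conversion (Shepperd-style);
branching on the trace and the largest diagonal element avoids dividing by a
small $q_w$:

```python
import numpy as np

def rot2quat(m):
    # Convert a 3x3 rotation matrix to a quaternion [w, x, y, z]
    tr = m[0, 0] + m[1, 1] + m[2, 2]
    if tr > 0:
        s = 2.0 * np.sqrt(tr + 1.0)                            # s = 4 * q_w
        return np.array([0.25 * s,
                         (m[2, 1] - m[1, 2]) / s,
                         (m[0, 2] - m[2, 0]) / s,
                         (m[1, 0] - m[0, 1]) / s])
    if m[0, 0] > m[1, 1] and m[0, 0] > m[2, 2]:
        s = 2.0 * np.sqrt(1.0 + m[0, 0] - m[1, 1] - m[2, 2])   # s = 4 * q_x
        return np.array([(m[2, 1] - m[1, 2]) / s, 0.25 * s,
                         (m[0, 1] + m[1, 0]) / s,
                         (m[0, 2] + m[2, 0]) / s])
    if m[1, 1] > m[2, 2]:
        s = 2.0 * np.sqrt(1.0 + m[1, 1] - m[0, 0] - m[2, 2])   # s = 4 * q_y
        return np.array([(m[0, 2] - m[2, 0]) / s,
                         (m[0, 1] + m[1, 0]) / s, 0.25 * s,
                         (m[1, 2] + m[2, 1]) / s])
    s = 2.0 * np.sqrt(1.0 + m[2, 2] - m[0, 0] - m[1, 1])       # s = 4 * q_z
    return np.array([(m[1, 0] - m[0, 1]) / s,
                     (m[0, 2] + m[2, 0]) / s,
                     (m[1, 2] + m[2, 1]) / s, 0.25 * s])

print(rot2quat(np.eye(3)))  # identity rotation -> [1, 0, 0, 0]
```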
LU Decomposition
Lower–Upper (LU) decomposition or factorization factors a matrix as the
product of a lower triangular matrix and an upper triangular matrix. The
product sometimes includes a permutation matrix as well. LU decomposition can
be viewed as the matrix form of Gaussian elimination. Computers usually solve
square systems of linear equations using LU decomposition, and it is also a key
step when inverting a matrix or computing the determinant of a matrix.
Let $\mat{A}$ be a square matrix. An LU factorization refers to the
factorization of $\mat{A}$, with proper row and/or column orderings or
permutations, into two factors, a lower triangular matrix $\mat{L}$ and an
upper triangular matrix $\mat{U}$:
$$
\mat{A} = \mat{L} \mat{U}
$$
In the lower triangular matrix all elements above the diagonal are zero; in
the upper triangular matrix, all the elements below the diagonal are zero.
For example, for a $3 \times 3$ matrix $\mat{A}$, its LU decomposition looks
like this:

$$
\begin{bmatrix}
  a_{11} & a_{12} & a_{13} \\
  a_{21} & a_{22} & a_{23} \\
  a_{31} & a_{32} & a_{33}
\end{bmatrix} =
\begin{bmatrix}
  l_{11} & 0 & 0 \\
  l_{21} & l_{22} & 0 \\
  l_{31} & l_{32} & l_{33}
\end{bmatrix}
\begin{bmatrix}
  u_{11} & u_{12} & u_{13} \\
  0 & u_{22} & u_{23} \\
  0 & 0 & u_{33}
\end{bmatrix}.
$$
Without a proper ordering or permutations in the matrix, the factorization may
fail to materialize. For example, it is easy to verify (by expanding the matrix
multiplication) that $a_{11} = l_{11} u_{11}$. If $a_{11} = 0$,
then at least one of $l_{11}$ and $u_{11}$ has to be zero, which
implies that either $\mat{L}$ or $\mat{U}$ is singular. This is
impossible if $\mat{A}$ is non-singular (invertible). This is a
procedural problem. It can be removed by simply reordering the rows of
$\mat{A}$ so that the first element of the permuted matrix is non-zero. The
same problem in subsequent factorization steps can be removed the same way.
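For example, with SciPy, which performs LU with partial pivoting so the row
reordering described above is handled automatically:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# scipy returns a permutation matrix P such that A = P @ L @ U
P, L, U = lu(A)
print(np.allclose(A, P @ L @ U))  # True
```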
Cholesky Decomposition
Cholesky decomposition or Cholesky factorization is a decomposition of a
Hermitian, positive-definite matrix into the product of a lower triangular
matrix and its conjugate transpose.
$$
\mat{A} = \mat{L} \mat{L}^{\ast}
$$
where $\mat{L}$ is a lower triangular matrix with real and positive diagonal
entries, and $\mat{L}^{\ast}$ is the conjugate transpose of $\mat{L}$.
Explicitly inverting a matrix in linear algebra is often avoidable, and
avoiding it is generally recommended.
Inverting a Matrix with Cholesky Decomposition
Let matrix $\mat{A}$ be symmetric positive-definite, so that it is invertible
and admits a Cholesky factorization. Using the identity

$$
\mat{A} \mat{A}^{-1} = \eye
$$

we can solve for the inverse of $\mat{A}$ using Cholesky decomposition. First
decompose $\mat{A} = \mat{L} \mat{L}^{T}$, then rearrange the terms so that
we can take advantage of forward and back substitution:

$$
\begin{align}
  (\mat{L} \mat{L}^{T}) \mat{A}^{-1} &= \eye \\
  \mat{L}^{T} \mat{A}^{-1} &= \mat{L}^{-1}.
\end{align}
$$

Solving $\mat{L} \mat{Y} = \eye$ column by column via forward substitution
gives $\mat{Y} = \mat{L}^{-1}$, and back substitution on
$\mat{L}^{T} \mat{A}^{-1} = \mat{Y}$ then yields $\mat{A}^{-1}$.
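A small sketch of this recipe using SciPy's triangular solver:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

A = np.array([[4.0, 2.0], [2.0, 3.0]])   # symmetric positive-definite
L = cholesky(A, lower=True)              # A = L @ L.T

# Forward substitution: L @ Y = I  =>  Y = inv(L)
Y = solve_triangular(L, np.eye(2), lower=True)
# Back substitution: L.T @ Ainv = Y
Ainv = solve_triangular(L.T, Y, lower=False)
print(np.allclose(A @ Ainv, np.eye(2)))  # True
```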
Pseudo Inverse of a Matrix with SVD
Given the singular value decomposition (SVD) of a matrix $\mat{A}$,

$$
\mat{A} = \mat{U} \mat{\Sigma} \mat{V}^{T},
$$

the pseudo-inverse $\mat{A}^{\dagger}$ can be derived as follows (assuming
$\mat{A}$ has full column rank):
$$
\begin{align}
\mat{A}^{\dagger}
&= (\mat{A}^{T} \mat{A})^{-1} \mat{A}^{T} \\
&= ((\mat{U} \mat{\Sigma} \mat{V}^{T})^{T}
(\mat{U} \mat{\Sigma} \mat{V}^{T}))^{-1}
(\mat{U} \mat{\Sigma} \mat{V}^{T})^{T} \\
&= (\mat{V} \mat{\Sigma} \mat{U}^{T}
\mat{U} \mat{\Sigma} \mat{V}^{T})^{-1}
(\mat{U} \mat{\Sigma} \mat{V}^{T})^{T} \\
&= (\mat{V} \mat{\Sigma}^{2} \mat{V}^{T})^{-1}
\mat{V} \mat{\Sigma} \mat{U}^{T} \\
&= (\mat{V}^{T})^{-1} \mat{\Sigma}^{-2} \mat{V}^{-1}
\mat{V} \mat{\Sigma} \mat{U}^{T} \\
&= \mat{V} \mat{\Sigma}^{-2} \mat{\Sigma} \mat{U}^{T} \\
&= \mat{V} \mat{\Sigma}^{-1} \mat{U}^{T} \\
\end{align}
$$
where $\mat{\Sigma}^{-1}$ is the pseudo inverse of $\mat{\Sigma}$,
which is formed by replacing every non-zero diagonal entry by its reciprocal
and transposing the resulting matrix.
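A sketch of the pseudo-inverse via SVD; the tolerance used to decide which
singular values count as non-zero is an arbitrary choice here:

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    # Reciprocate only the non-zero singular values
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > tol, 1.0 / s, 0.0)
    return Vt.T @ np.diag(s_inv) @ U.T

A = np.random.randn(5, 3)
print(np.allclose(pinv_svd(A), np.linalg.pinv(A)))  # True
```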
Invert Lower Triangular Matrix
If we have a lower triangular matrix $\mat{L}$ and our objective is to
find $\mat{L}^{-1}$, we can use the property that the inverse of a lower
triangular matrix is itself lower triangular; it can be obtained by solving
$\mat{L} \mat{L}^{-1} = \eye$ column by column via forward substitution.

Condition Number
There are different condition numbers. In the following the condition number
for the problem $\mat{A} \vec{x} = \vec{b}$ and matrix inversion are
discussed. In general, the condition number, $\kappa(\cdot)$, for a
matrix, $\mat{A}$, or computational task such as $\mat{A} \vec{x} =
\vec{b}$ measures how sensitive the output is to perturbations in the input
data and to round-off errors. If the condition number is large, even a small
error in $\vec{b}$ may cause a large error in $\vec{x}$. On the
other hand, if the condition number is small then the error in $\vec{x}$
will not be much bigger than the error in $\vec{b}$.
The condition number is defined more precisely to be the maximum ratio of the
relative error in $\vec{x}$ to the relative error in $\vec{b}$.
Let $\vec{e}$ be the error in $\vec{b}$. Assuming that
$\mat{A}$ is a nonsingular matrix, the error in the solution
$\mat{A}^{-1} \vec{b}$ is $\mat{A}^{-1} \vec{e}$. The ratio of the
relative error in the solution to the relative error in $\vec{b}$ is
When the condition number is exactly one (which can only happen if $\mat{A}$
is a scalar multiple of a linear isometry), then a solution algorithm can find
(in principle, meaning if the algorithm introduces no errors of its own) an
approximation of the solution whose precision is no worse than that of the
data.
However, it does not mean that the algorithm will converge rapidly to this
solution, just that it won't diverge arbitrarily because of inaccuracy on the
source data (backward error), provided that the forward error introduced by the
algorithm does not diverge as well because of accumulating intermediate
rounding errors.
The condition number may also be infinite, but this implies that the problem
is ill-posed (does not possess a unique, well-defined solution for each choice
of data -- that is, the matrix is not invertible), and no algorithm can be
expected to reliably find a solution.
The definition of the condition number depends on the choice of norm, as can
be illustrated by two examples.
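For instance, the norm-dependence can be seen numerically:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])          # nearly singular

print(np.linalg.cond(A, 2))           # 2-norm condition number (large)
print(np.linalg.cond(A, np.inf))      # infinity-norm gives a different value
print(np.linalg.cond(np.eye(2), 2))   # identity: condition number exactly 1
```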
Rank
The rank $\rho(\mat{A})$ of a matrix $\mat{A}$ of $n$ rows
and $m$ columns is defined as **the number of independent rows or
columns**. Rank is thus a measure of the "non-degenerateness" of the system of
linear equations and linear transformation encoded by $\mat{A}$.
Full rank (non-singular): $\rho(\mat{A}) = \min(n, m)$
Not full rank (singular): $\rho(\mat{A}) < \min(n, m)$
Properties
Let $\mat{A}$ be an $m \times n$ matrix, $\mat{B}$ an
$n \times k$ matrix, and $\mat{C}$ a matrix of compatible dimensions; then
$$
\begin{align}
\rank{\mat{A}} &\leq \Min{m}{n} \\
\rank{\mat{AB}} &\leq \Min{\rank{\mat{A}}}{\rank{\mat{B}}} \\
\rank{\mat{AB}} + \rank{\mat{BC}}
&\leq \rank{\mat{B}} + \rank{\mat{ABC}} \\
\rank{\mat{A} + \mat{B}} &\leq \rank{\mat{A}} + \rank{\mat{B}} \\
\rank{\mat{A}^{T} \mat{A}}
&= \rank{\mat{A} \mat{A}^{T}}
= \rank{\mat{A}}
= \rank{\mat{A}^{T}} \\
\rank{\mat{0}} &= 0
\end{align}
$$
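A few of these properties can be spot-checked numerically:

```python
import numpy as np

A = np.random.randn(4, 3)
B = np.random.randn(3, 5)

r = np.linalg.matrix_rank
print(r(A) <= min(4, 3))                # True
print(r(A @ B) <= min(r(A), r(B)))      # True
print(r(A.T @ A) == r(A) == r(A.T))     # True
```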
Trace

The trace of a matrix $\mat{A}$ is simply the sum of its diagonal elements,

$$
\text{tr}(\mat{A}) = \sum_{i} \mat{A}_{ii}.
$$

For example, for a $2 \times 2$ matrix $\mat{A}$,
$\text{tr}(\mat{A}) = a_{11} + a_{22}$.
Linear problems generally have the form
$$
\mat{A} \vec{x} = \vec{b}
$$
If $\mat{A}$ is skinny (the number of rows is larger than the number of
columns) the problem is over-constrained and there is no *unique* solution.
Instead, the problem can be solved by minimizing the squared error between
$\mat{A} \vec{x}$ and $\vec{b}$. The linear least squares problem
is then defined as,
$$
\min_{\vec{x}} || \mat{A} \vec{x} - \vec{b} ||^{2}_{2}
$$
where the goal is to find an *approximate* solution.
The local minimum can be found where the derivative of the squared error is
zero. First the squared error is expanded to give

$$
(\mat{A} \vec{x} - \vec{b})^{T} (\mat{A} \vec{x} - \vec{b})
= \vec{x}^{T} \mat{A}^{T} \mat{A} \vec{x}
- 2 \vec{b}^{T} \mat{A} \vec{x}
+ \vec{b}^{T} \vec{b},
$$

where the constant term $\vec{b}^{T} \vec{b}$ will vanish under
differentiation;
then by differentiating the expanded squared error with respect to
$\vec{x}$, setting the derivative to zero, and rearranging the equation
with respect to $\vec{x}$ gives the following:
$$
\begin{align}
2 \vec{x}^{T} \mat{A}^{T} \mat{A} - 2 \vec{b}^{T} \mat{A} &= 0 \\
\vec{x}^{T} \mat{A}^{T} \mat{A} &= \vec{b}^{T} \mat{A} \\
\mat{A}^{T} \mat{A} \vec{x} &= \mat{A}^{T} \vec{b} \\
\vec{x} &= \left( \mat{A}^{T} \mat{A} \right)^{-1} \mat{A}^{T} \vec{b} \\
\vec{x} &= \mat{A}^{\dagger} \vec{b} \enspace,
\end{align}
$$
where $\left( \mat{A}^{T} \mat{A} \right)^{-1}
\mat{A}^{T}$ is known as the pseudo inverse $\mat{A}^{\dagger}$.
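A toy line-fitting sketch; forming $\mat{A}^{T} \mat{A}$ explicitly is fine
for small, well-conditioned problems, though `np.linalg.lstsq` is preferred
in practice:

```python
import numpy as np

# Over-constrained system: 100 noisy samples of the line y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
b = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(100)
A = np.column_stack([x, np.ones(100)])

# Normal equations: x = (A^T A)^-1 A^T b
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
# NumPy's least-squares solver
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_ne, x_ls)   # both close to [2, 1]
```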
For reference, solving an upper triangular system $\mat{U} \vec{x} = \vec{b}$
via back-substitution (used repeatedly below) amounts to evaluating, from the
last row upwards,

$$
\boxed{
  x_{i} = \dfrac{b_{i} - \sum_{j=i+1}^{n} u_{ij} x_{j}}{u_{ii}}
  \quad
  \text{where} \; i = n, n - 1, \cdots, 1
}.
$$
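A direct implementation of the boxed recursion:

```python
import numpy as np

def back_substitution(U, b):
    # Solve U x = b for upper triangular U, starting from the last row
    n = b.size
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([5.0, 6.0])
print(back_substitution(U, b))          # [1.5, 2.0]
```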
Solve Least Squares with SVD
To solve $\mat{A} \vec{x} = \vec{b}$ with non-singular $\mat{A} \in
\real^{n \times n}$, let us factor $\mat{A}$ using SVD and rearrange the
terms to give,
$$
\begin{align}
\mat{A} \vec{x} &= \vec{b} \\
\mat{U} \mat{\Sigma} \mat{V}^{T} \vec{x} &= \vec{b} \\
\mat{\Sigma} \mat{V}^{T} \vec{x} &= \mat{U}^{T} \vec{b}
\end{align}
$$
Note: $\mat{U}$ and $\mat{V}$ are orthogonal matrices, therefore
the inverse of each is its transpose. Let $\vec{y} = \mat{V}^{T}
\vec{x}$; substituting into the above gives,

$$
\mat{\Sigma} \vec{y} = \mat{U}^{T} \vec{b}.
$$

Since $\mat{\Sigma}$ is diagonal, $\vec{y}$ is obtained by dividing each
element of $\mat{U}^{T} \vec{b}$ by the corresponding singular value. Once
$\vec{y}$ is known, solve for $\vec{x}$ in

$$
\mat{V}^{T} \vec{x} = \vec{y}
$$

by using the orthogonality of $\mat{V}$ again: $\vec{x} = \mat{V} \vec{y}$.
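Putting the SVD recipe together:

```python
import numpy as np

def solve_svd(A, b):
    U, s, Vt = np.linalg.svd(A)
    y = (U.T @ b) / s        # Sigma is diagonal: element-wise division
    return Vt.T @ y          # x = V y, using the orthogonality of V

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
print(solve_svd(A, b))       # [2, 3]
```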
Solve Least Squares with QR
To solve $\mat{A} \vec{x} = \vec{b}$ with non-singular $\mat{A} \in
\real^{n \times n}$, let us factor $\mat{A}$ using QR decomposition and
rearrange the terms to give,
$$
\begin{align}
\mat{A} \vec{x} &= \vec{b} \\
\mat{Q} \mat{R} \vec{x} &= \vec{b} \\
\mat{R} \vec{x} &= \mat{Q}^{T} \vec{b}.
\end{align}
$$
Note: $\mat{Q}$ is an orthogonal matrix, therefore the inverse of
$\mat{Q}$ is its transpose. The R.H.S. of the last equation is simply the
matrix-vector product of $\mat{Q}^{T}$ and $\vec{b}$, both of which are
known. Once the R.H.S. is computed, $\vec{x}$ can be solved for using
back-substitution, since $\mat{R}$ is upper triangular.
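The QR recipe in code:

```python
import numpy as np
from scipy.linalg import solve_triangular

def solve_qr(A, b):
    Q, R = np.linalg.qr(A)
    return solve_triangular(R, Q.T @ b, lower=False)  # back-substitution

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
print(solve_qr(A, b))        # [2, 3]
```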
Solve Least Squares with Cholesky Decomposition
To solve $\mat{A} \vec{x} = \vec{b}$ with symmetric positive-definite
$\mat{A} \in \real^{n \times n}$, factoring $\mat{A}$ using Cholesky
decomposition gives,
$$
\begin{align}
\mat{A} \vec{x} &= \vec{b} \\
\mat{L} \mat{L}^{T} \vec{x} &= \vec{b},
\end{align}
$$
let $\vec{y} = \mat{L}^{T} \vec{x}$; substituting into the above,
$$
\mat{L} \vec{y} = \vec{b}.
$$
Solve for $\vec{y}$ using forward-substitution, and then solve for
$\vec{x}$ in
$$
\mat{L}^{T} \vec{x} = \vec{y}
$$
using backward-substitution.
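And the Cholesky recipe:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def solve_cholesky(A, b):
    L = cholesky(A, lower=True)                    # A = L @ L.T
    y = solve_triangular(L, b, lower=True)         # forward substitution
    return solve_triangular(L.T, y, lower=False)   # back substitution

A = np.array([[4.0, 2.0], [2.0, 3.0]])             # must be SPD
b = np.array([10.0, 8.0])
print(solve_cholesky(A, b), np.linalg.solve(A, b))  # same answer
```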
Non-linear Least Squares
Gauss Newton
$$
\argmin_{\vec{x}} J(\vec{x})
=
\dfrac{1}{2}
\sum_{i}
\vec{e}_{i}^{T}(\vec{x}) \mat{W} \vec{e}_{i}(\vec{x})
$$
where the error function, $\vec{e}_{i}(\cdot)$, depends on the optimization
parameter, $\vec{x} \in \real^{n}$. The error function has the form

$$
\vec{e}_{i} = \vec{z}_{i} - \vec{h}_{i}(\vec{x}),
$$

the difference between the measured value, $\vec{z}_{i}$, and the estimated
value calculated using the measurement function, $\vec{h}_{i}(\cdot)$. Since
the error function, $\vec{e}(\vec{x})$, is
non-linear, it is approximated with a first-order Taylor series,
non-linear, it is approximated with the first-order Taylor series,
$$
\vec{e}(\vec{x})
\approx
\vec{e}(\bar{\vec{x}}) +
\mat{E}(\bar{\vec{x}}) \Delta\vec{x}
$$
where $\mat{E}(\bar{\vec{x}}) =
\left. \dfrac{\partial\vec{e}(\vec{x})}{\partial\vec{x}} \right\rvert_{\bar{\vec{x}}}$
is the Jacobian of the error function evaluated at the linearization point
$\bar{\vec{x}}$, and $\Delta{\vec{x}} = \vec{x} - \bar{\vec{x}}$.
The minimum is found where the derivative of the cost w.r.t. $\vec{x}$ is
zero. Applying the chain rule,

$$
\dfrac{\partial{J}}{\partial{\vec{x}}} =
\dfrac{\partial{J}}{\partial{\vec{e}}}
\dfrac{\partial{\vec{e}}}{\partial{\vec{x}}}
$$
$$
\begin{align}
  \dfrac{\partial{J}}{\partial{\vec{e}}} &=
  \dfrac{\partial}{\partial{\vec{e}}} \left(
    \dfrac{1}{2} \vec{e}^{T}(\vec{x}) \mat{W} \vec{e}(\vec{x})
  \right) =
  \vec{e}^{T}(\vec{x}) \mat{W} \\
  %
  \dfrac{\partial{\vec{e}}}{\partial{\vec{x}}} &=
  \dfrac{\partial}{\partial{\vec{x}}} \left(
    \vec{e}(\bar{\vec{x}}) +
    \mat{E}(\bar{\vec{x}}) \Delta\vec{x}
  \right) =
  \mat{E}(\bar{\vec{x}})
\end{align}
$$
$$
\begin{align}
\dfrac{\partial{J}}{\partial{\vec{x}}}
&=
(\vec{e}^{T}(\vec{x}) \mat{W}) (\mat{E}(\bar{\vec{x}})) \\
% Line 2
&=
(
\vec{e}(\bar{\vec{x}}) + \mat{E}(\bar{\vec{x}}) \Delta\vec{x}
)^{T} \mat{W}
\mat{E}(\bar{\vec{x}}) \\
% Line 3
&=
\vec{e}^{T}(\bar{\vec{x}}) \mat{W} \mat{E}(\bar{\vec{x}})
+ \Delta\vec{x}^{T}
\mat{E}(\bar{\vec{x}})^{T} \mat{W} \mat{E}(\bar{\vec{x}})
= 0 \\
\end{align}
$$
$$
\begin{align}
% Line 4
\Delta\vec{x}^{T}
\mat{E}(\bar{\vec{x}})^{T} \mat{W} \mat{E}(\bar{\vec{x}})
&=
- \vec{e}^{T}(\bar{\vec{x}}) \mat{W} \mat{E}(\bar{\vec{x}}) \\
% Line 5
\underbrace{
\mat{E}(\bar{\vec{x}})^{T} \mat{W} \mat{E}(\bar{\vec{x}})
}_{\mat{H}}
\Delta\vec{x}
&=
\underbrace{
- \mat{E}(\bar{\vec{x}})^{T} \mat{W} \vec{e}(\bar{\vec{x}})
}_{\vec{b}}
\end{align}
$$
Solve the normal equations $\mat{H}\Delta\vec{x} = \vec{b}$ for $\Delta\vec{x}$
using Cholesky or QR decomposition. Once $\Delta\vec{x}$ is found, the best
estimate $\bar{\vec{x}}$ can be updated via,
$$
\bar{\vec{x}}_{k + 1} = \bar{\vec{x}}_{k} + \Delta\vec{x}.
$$
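A compact Gauss-Newton sketch with $\mat{W} = \eye$, fitting a toy
exponential model (the model and the starting point are illustrative
assumptions, not from the notes):

```python
import numpy as np

def gauss_newton(h, jac_h, z, t, x0, iters=10):
    # Minimizes 0.5 * sum_i e_i^2 with e = z - h(x, t) and W = I
    x = x0.astype(float).copy()
    for _ in range(iters):
        e = z - h(x, t)                 # residuals
        E = -jac_h(x, t)                # de/dx = -dh/dx
        H = E.T @ E                     # Gauss-Newton Hessian approximation
        b = -E.T @ e
        x += np.linalg.solve(H, b)      # normal equations: H dx = b
    return x

# Toy model: h(x, t) = x0 * exp(x1 * t)
h = lambda x, t: x[0] * np.exp(x[1] * t)
jac_h = lambda x, t: np.column_stack([np.exp(x[1] * t),
                                      x[0] * t * np.exp(x[1] * t)])

t = np.linspace(0.0, 1.0, 50)
z = h(np.array([2.0, -1.5]), t)                  # noiseless measurements
print(gauss_newton(h, jac_h, z, t, np.array([1.0, -1.0])))  # ~[2.0, -1.5]
```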
Pinhole Camera Model
The pinhole camera model describes how 3D scene points are projected onto the
2D image plane of an ideal pinhole camera. The model makes the assumption that
light rays emitted from an object in the scene pass through the pinhole of the
camera and are projected onto the image plane.
A 3D point $\vec{p}_{C} = [p_x \quad p_y \quad p_z]^{T}$
expressed in the camera frame, $\frame_{C}$, projected on to the camera's 2D
image plane $(u, v)$ is written as,
$$
u = \dfrac{p_{x} \cdot f_{x}}{p_{z}} + c_{x}
\quad \quad
v = \dfrac{p_{y} \cdot f_{y}}{p_{z}} + c_{y}
$$
where $f_{x}$ and $f_{y}$ denote the focal lengths, and $c_{x}$
and $c_{y}$ represent the principal point offsets in the $x$ and
$y$ directions. Or, in matrix form,

$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix}
  f_{x} & 0 & c_{x} \\
  0 & f_{y} & c_{y} \\
  0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
  p_{x} / p_{z} \\ p_{y} / p_{z} \\ 1
\end{bmatrix}.
$$
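A minimal projection sketch (the intrinsic values below are arbitrary
placeholders, not calibrated values):

```python
import numpy as np

def project_pinhole(K, p_C):
    # Perspective division followed by the intrinsics matrix K
    x = p_C[:2] / p_C[2]
    u = K[0, 0] * x[0] + K[0, 2]
    v = K[1, 1] * x[1] + K[1, 2]
    return np.array([u, v])

K = np.array([[458.0,   0.0, 367.0],
              [  0.0, 457.0, 248.0],
              [  0.0,   0.0,   1.0]])
print(project_pinhole(K, np.array([0.1, -0.2, 2.0])))
```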
In practice, the pinhole camera model only serves as an approximation to
modern cameras. The assumptions made in the model are often violated with
factors such as large camera apertures (pinhole size), distortion effects in
camera lenses, and other factors. That is why the pinhole camera model is often
used in combination with a distortion model in the hope of minimizing
projection errors from 3D to 2D. A common distortion model used in
conjunction with the pinhole camera model is the radial-tangential model
discussed below.
Lens distortion generally exists in all camera lenses, therefore it is vital
that the observed distortions are modelled. The most common distortion model
is the radial-tangential (or simply radtan) distortion model. The two main
distortion components, as the name suggests, are radial and tangential
distortion.

Radial distortion occurs due to the shape of the lens: light passing
through the center undergoes no refraction, while light passing through the
edges of the lens undergoes more severe bending, causing the radial
distortion.
Tangential distortion, on the other hand, is mainly due to camera sensor
mis-alignment during the manufacturing process. It occurs when the camera
sensor is not parallel to the lens.
The combined radial-tangential distortion is modelled using a polynomial
approximation with parameters $k_{1}, k_{2}$ and $p_{1}, p_{2}$ respectively.
To apply the distortion, the observed 3D point $\vec{p} = [X \quad Y \quad
Z]^{T}$ is first projected to the normalized image plane, distorted, and
finally scaled and offset in the image plane $(u, v)$.
$$
\begin{align}
x &= X / Z \\
y &= Y / Z \\
r^2 &= x^2 + y^2 \\ \\
x' &= x \cdot (1 + (k_1 r^2) + (k_2 r^4)) \\
y' &= y \cdot (1 + (k_1 r^2) + (k_2 r^4)) \\
x'' &= x' + (2 p_1 x y + p_2 (r^2 + 2 x^2)) \\
y'' &= y' + (p_1 (r^2 + 2 y^2) + 2 p_2 x y)
\end{align}
$$
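A sketch of the distortion equations above (the parameter values are
arbitrary placeholders):

```python
import numpy as np

def distort_radtan(k1, k2, p1, p2, p):
    # Project to the normalized image plane, then apply the radtan model
    x, y = p[0] / p[2], p[1] / p[2]
    r2 = x**2 + y**2
    radial = 1.0 + k1 * r2 + k2 * r2**2
    x_dist = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x**2)
    y_dist = y * radial + p1 * (r2 + 2.0 * y**2) + 2.0 * p2 * x * y
    return np.array([x_dist, y_dist])

# Distorted normalized coordinates; scale by fx, fy and offset by cx, cy
# afterwards to obtain pixel coordinates
print(distort_radtan(-0.28, 0.07, 1e-4, -1e-5, np.array([0.1, -0.2, 2.0])))
```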
There are various methods for triangulating a 3D point observed from at
least two camera views. The linear triangulation method [Hartley2003] is
frequently used. This method assumes a pair of homogeneous points $\vec{x}$ and
$\vec{x}' \in \real^{3}$ in the image plane that observe the same 3D point,
$\vec{X} \in \real^{4}$, in homogeneous coordinates from two different camera
frames. The homogeneous projection from 3D to 2D with a known camera matrix
$\mat{P} \in \real^{3 \times 4}$ for each measurement is given as

$$
\vec{x} = \mat{P} \vec{X}, \quad \vec{x}' = \mat{P}' \vec{X}.
$$

Taking advantage of the fact that if two vectors $\vec{x}$ and
$\mat{P}\vec{X}$ have the same direction then $\vec{x} \times \mat{P} \vec{X}
= \vec{0}$, these equations can be combined to form a system of equations of
the form $\mat{A} \vec{x} = \vec{0}$. To eliminate the homogeneous scale
factor we apply a cross product to give three equations for each image point,
for example $\vec{x} \times (\mat{P} \vec{X}) = \vec{0}$; writing this out
gives
$$
x (\vec{p}^{3T} \vec{X}) - (\vec{p}^{1T} \vec{X}) = 0 \\
y (\vec{p}^{3T} \vec{X}) - (\vec{p}^{2T} \vec{X}) = 0 \\
x (\vec{p}^{2T} \vec{X}) - y (\vec{p}^{1T} \vec{X}) = 0
$$
where $\vec{p}^{iT}$ is the $i^{\text{th}}$ row of $\mat{P}$.
Note that the third line in the above equation is a linear combination of the
first two ($c_1$ times the first line plus $c_2$ times the second line equals
the third line); as such, the third line lies in the span of the first two
equations and is therefore redundant.
From the above, an equation of the form $\mat{A} \vec{x} = \vec{0}$ for
each image point can be formed, where $\vec{x}$ represents the unknown
homogeneous feature location to be estimated, and $\mat{A}$ is given as

$$
\mat{A} =
\begin{bmatrix}
  x (\vec{p}^{3T}) - \vec{p}^{1T} \\
  y (\vec{p}^{3T}) - \vec{p}^{2T} \\
  x' (\vec{p}'^{3T}) - \vec{p}'^{1T} \\
  y' (\vec{p}'^{3T}) - \vec{p}'^{2T}
\end{bmatrix}
$$

giving a total of four equations in four homogeneous unknowns. Solving
$\mat{A} \vec{x} = \vec{0}$ using SVD, where the solution is the right
singular vector corresponding to the smallest singular value of $\mat{A}$,
allows us to estimate the initial feature location.
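A sketch of the linear triangulation method; the two camera matrices below
are synthetic placeholders:

```python
import numpy as np

def triangulate_linear(x1, x2, P1, P2):
    # x1, x2: image measurements (2,); P1, P2: 3x4 projection matrices
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    # Null-space of A: right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]            # de-homogenize

# Two cameras with identity intrinsics, second shifted 1m along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = np.array([0.5, 0.2, 4.0, 1.0])
x1 = (P1 @ X)[:2] / (P1 @ X)[2]
x2 = (P2 @ X)[:2] / (P2 @ X)[2]
print(triangulate_linear(x1, x2, P1, P2))   # ~[0.5, 0.2, 4.0]
```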
In an ideal world, the position of 3D points can be solved as a system of
equations using the linear triangulation method. In reality, however, errors
are present in the camera poses and pixel measurements. The pixel measurements
observing the same 3D point are generally noisy. In addition, the camera models
and distortion models used often do not model the camera projection or
distortion observed perfectly. Therefore, to obtain the best results, an
iterative method should be used. This problem is generally formulated as a
non-linear least squares problem and can be solved with numerical methods
such as the Gauss-Newton algorithm.
References
[Hartley2003]: Hartley, Richard, and Andrew Zisserman. Multiple view geometry
in computer vision. Cambridge university press, 2003.
Optical Flow
Optical flow estimates the velocity of each image feature in successive
images of a scene. It makes the following explicit assumptions:
Pixel intensity does not change between consecutive frames
Displacement of features is small
Features are within the same local neighbourhood
Let us consider a pixel, $p$, in the first frame with intensity
$I(x, y, t)$, a function of the pixel location, $x$ and $y$, and
time, $t$. Applying the aforementioned assumptions, we can say that the
intensity of said pixel does not change from the first frame to the second.
Additionally, if there was a small displacement, $dx$ and $dy$, and a small
time difference, $dt$, between images, this can be written in mathematical
form as,
$$
I(x, y, t) = I(x + dx, y + dy, t + dt).
$$
This is known as the brightness constancy equation. To obtain the image
gradient and velocity of the pixel, we can use Taylor series approximation of
right-hand side of the above to get,
$$
\begin{align}
&I(x + dx, y + dy, t + dt) \\
&= I(x, y, t)
+ \dfrac{\partial{I}}{\partial{x}} dx
+ \dfrac{\partial{I}}{\partial{y}} dy
+ \dfrac{\partial{I}}{\partial{t}} dt
+ \dots
\end{align}
$$
Removing common terms and dividing by $dt$ we get

$$
I_{x} v_{x} + I_{y} v_{y} + I_{t} = 0.
$$

The image gradients along the x and y directions are $I_{x}$ and $I_{y}$,
where $I_{t}$ is the image gradient along time; finally, $v_{x}$ and $v_{y}$
are the pixel velocities in the $x$ and $y$ directions, which are unknown.
The problem with the above is that it provides a single constraint with two
degrees of freedom, and as such requires at least one additional constraint
to identify a solution.
The Lucas-Kanade method solves the aperture problem by introducing
additional conditions. This method assumes all pixels within a window
centered around a pixel $p$ have similar motion, where the window size is
configurable. For example, with a window size of $3 \times 3$ around the
pixel $p$, the $9$ points within the window should have a similar motion.
The intensities inside the window must therefore satisfy

$$
\begin{align}
  I_{x}(p_1) v_{x} + I_{y}(p_1) v_{y} &= -I_{t}(p_1) \\
  I_{x}(p_2) v_{x} + I_{y}(p_2) v_{y} &= -I_{t}(p_2) \\
  &\vdots \\
  I_{x}(p_9) v_{x} + I_{y}(p_9) v_{y} &= -I_{t}(p_9).
\end{align}
$$
The linear system of equations above is over-determined, therefore there is
no exact solution. To address this issue, a least squares method can be used
to approximate the solution. For the system
$\mathbf{A} \mathbf{x} = \mathbf{b}$, the least squares solution is obtained
by minimizing $\norm{\mathbf{A}\mathbf{x} - \mathbf{b}}^{2}_{2}$, giving
$\mathbf{x} = (\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{b}$.
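A sketch of the Lucas-Kanade least squares solve on synthetic gradients
(constructed so the true flow is known):

```python
import numpy as np

def lucas_kanade_flow(Ix, Iy, It):
    # Ix, Iy, It: image gradients inside the window, flattened to vectors
    A = np.column_stack([Ix.ravel(), Iy.ravel()])
    b = -It.ravel()
    # Ordinary least squares: v = (A^T A)^-1 A^T b
    return np.linalg.solve(A.T @ A, A.T @ b)

# Synthetic 3x3 window whose gradients are consistent with v = (0.5, -0.2)
rng = np.random.default_rng(1)
Ix, Iy = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
It = -(Ix * 0.5 + Iy * -0.2)
print(lucas_kanade_flow(Ix, Iy, It))   # [0.5, -0.2]
```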
Illumination Invariant Transform

The illumination invariant transform [Maddern2014] maps three image channels
to a single channel that is, ideally, insensitive to scene illumination:

$$
I = \log(R_{2}) - \alpha \log(R_{1}) - (1 - \alpha) \log(R_{3})
$$

where $R_{1}, R_{2}, R_{3}$ are sensor responses (or image channels)
corresponding to peak sensitivities at ordered wavelengths $\lambda_{1} <
\lambda_{2} < \lambda_{3}$, and $\alpha$ is determined by the following
equation,

$$
\dfrac{1}{\lambda_{2}} =
\dfrac{\alpha}{\lambda_{1}} + \dfrac{(1 - \alpha)}{\lambda_{3}}.
$$
This transform, however, has a non-intuitive effect on black and white
targets, as the three channels tend to be equally over- and under-exposed in
RGB images. As a result, the transform leads to similar values for white and
black pixels, eliminating the ability of computer vision algorithms to detect
edges.
References
[Maddern2014]: Maddern, Will, et al. "Illumination invariant imaging:
Applications in robust vision-based localisation, mapping and classification
for autonomous vehicles." Proceedings of the Visual Place Recognition in
Changing Environments Workshop, IEEE International Conference on Robotics and
Automation (ICRA), Hong Kong, China. Vol. 2. 2014.
Marginalization
As a reminder, marginalization is about having a joint density $p(x, y)$
over two variables $x$ and $y$, and wanting to marginalize out or
"eliminate a variable", let's say $y$ in this case:

$$
p(x) = \int_{y} p(x, y) \, dy
$$

resulting in a density $p(x)$ over the remaining variable $x$.
Now, if the density was in covariance form with mean $\boldsymbol{\mu}$
and covariance $\mathbf{\Sigma}$, partitioned as follows:

$$
\boldsymbol{\mu} =
\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix},
\quad
\mathbf{\Sigma} =
\begin{bmatrix}
  \mathbf{\Sigma}_{xx} & \mathbf{\Sigma}_{xy} \\
  \mathbf{\Sigma}_{yx} & \mathbf{\Sigma}_{yy}
\end{bmatrix},
$$

marginalizing out $y$ would simply be a matter of keeping
$\boldsymbol{\mu}_{x}$ and $\mathbf{\Sigma}_{xx}$.
In the nonlinear-least squares formulation, however, we can only obtain the
covariance $\mathbf{\Sigma}$ with the following property,
$$
\mathbf{\Sigma} = \mat{H}^{-1}
$$
where $\mat{H}$ is the Hessian matrix in a Gauss-Newton system
($\mat{H}\delta\vec{x} = \vec{b}$).
Using Schur's Complement for Marginalization

First let $\state_1$ be the states to be marginalized out, $\state_{2}$ be the
set of states related to those by error terms, and $\state_{\rho}$ be the set
of remaining states. Partitioning the Hessian, error state and R.H.S. of the
Gauss-Newton system gives:

$$
\begin{bmatrix}
  \mat{H}_{11} & \mat{H}_{12} \\
  \mat{H}_{21} & \mat{H}_{22}
\end{bmatrix}
\begin{bmatrix}
  \delta\state_{1} \\
  \delta\state_{2}
\end{bmatrix}
=
\begin{bmatrix}
  \vec{b}_{1} \\
  \vec{b}_{2}
\end{bmatrix}
$$

and applying the Schur complement operation yields:

$$
\begin{align}
  \mat{H}^{\ast}_{22} &=
    \mat{H}_{22} - \mat{H}_{21} \mat{H}_{11}^{-1} \mat{H}_{12} \\
  \vec{b}^{\ast}_{2} &=
    \vec{b}_{2} - \mat{H}_{21} \mat{H}_{11}^{-1} \vec{b}_{1}
\end{align}
$$

where $\vec{b}^{\ast}_{2}$ and
$\mat{H}^{\ast}_{22}$ are non-linear functions of
$\state_2$ and $\state_1$.
Derivation of Schur's Complement for Marginalization
From the Gauss-Newton system, $\mat{H} \delta\vec{x} = \vec{b}$, we can
derive the marginalization of the old states in $\delta\vec{x}$ algebraically.
Let us decompose the system as:
$$
\begin{align}
% H
\begin{bmatrix}
\mat{H}_{11} & \mat{H}_{12} \\
\mat{H}_{21} & \mat{H}_{22}
\end{bmatrix}
% x
\begin{bmatrix}
\delta\vec{x}_{1} \\
\delta\vec{x}_{2}
\end{bmatrix}
=
% b
\begin{bmatrix}
\vec{b}_{1} \\
\vec{b}_{2}
\end{bmatrix}
\end{align}
$$
Multiplying out the block matrices and vectors gives:

$$
\begin{align}
  \mat{H}_{11} \delta\vec{x}_{1} + \mat{H}_{12} \delta\vec{x}_{2} &= \vec{b}_{1} \\
  \mat{H}_{21} \delta\vec{x}_{1} + \mat{H}_{22} \delta\vec{x}_{2} &= \vec{b}_{2}.
\end{align}
$$

Solving the second equation for $\delta\vec{x}_{2}$,

$$
\delta\vec{x}_{2} =
\mat{H}_{22}^{-1} (\vec{b}_{2} - \mat{H}_{21} \delta\vec{x}_{1}),
$$

substituting $\delta\vec{x}_{2}$ back into $\mat{H}_{11} \delta\vec{x}_{1} +
\mat{H}_{12} \delta\vec{x}_{2} = \vec{b}_{1}$, and rearranging the terms so
it is w.r.t. $\delta\vec{x}_{1}$ gives:

$$
(\mat{H}_{11} - \mat{H}_{12} \mat{H}_{22}^{-1} \mat{H}_{21})
\delta\vec{x}_{1}
=
\vec{b}_{1} - \mat{H}_{12} \mat{H}_{22}^{-1} \vec{b}_{2}.
$$

If you want to marginalize out $\delta\vec{x}_{1}$ instead, follow the same
process, but solve the first equation for $\delta\vec{x}_{1}$ and substitute
it into the second.
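A numerical sketch of the marginalization, with a sanity check that the
reduced system yields the same solution for the remaining states:

```python
import numpy as np

def marginalize(H, b, m):
    # Marginalize out the first m states of the Gauss-Newton system H dx = b
    H11, H12 = H[:m, :m], H[:m, m:]
    H21, H22 = H[m:, :m], H[m:, m:]
    b1, b2 = b[:m], b[m:]
    H11_inv = np.linalg.inv(H11)          # small block; invert directly
    H_marg = H22 - H21 @ H11_inv @ H12    # Schur complement of H11
    b_marg = b2 - H21 @ H11_inv @ b1
    return H_marg, b_marg

H = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
b = np.array([1.0, 2.0, 3.0])
H_m, b_m = marginalize(H, b, 1)
print(np.linalg.solve(H, b)[1:], np.linalg.solve(H_m, b_m))  # equal
```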
First Estimate Jacobians (FEJ)
In the context of real time state-estimation, a fixed-lag smoother provides a
way to bound the optimization problem in order to operate in real time. The
method to remove old states from the state vector is called
*marginalization*. To perform marginalization, the Schur complement is used
to marginalize out old states.
Simply performing marginalization, however, introduces estimator
inconsistencies.
Let us consider the following scenario. A state vector, $\state$, during the
time interval $[0, \, k]$ will contain $m$ old states to be marginalized
out and $r$ remaining states which we wish to keep, i.e. $\state =
[\state_{m}^{T} \quad \state_{r}^{T}]^{T}$. Then the cost function,
$c(\cdot)$, can be written as a function of $\state$ at time $k$ as,
The intuition behind the above is since the state at time $k$ can be
partitioned into $m$ and $r$, the cost can also be decomposed. Utilizing this
property, the multivariate optimization can also be decomposed as follows,
The equation above shows the minimization problem can be solved by first
optimizing for the states $\state_{m}$, and then forming a prior towards
the problem of solving for $\state_{r}$. The reformulation of the
minimization problem entails no approximation.
Covariance Recovery
The Hessian matrix $\mat{H}$ is known to be related to the marginal
covariance matrix $\covar$ by $\covar = \mat{H}^{-1}$. However,
inverting $\mat{H}$ can be expensive and impossible if it is not
well-conditioned. The objective in the following is to recover the marginal
covariance matrix without explicitly inverting $\mat{H}$.
This can be done by decomposing the Hessian $\mat{H}$ into a lower and
upper triangular matrix, $\mat{R}^{T} \mat{R}$, using either
Cholesky or QR factorization. Then let us write

$$
(\mat{R}^{T} \mat{R}) \covar = \eye
\quad \Rightarrow \quad
\mat{R} \covar = (\mat{R}^{T})^{-1},
$$

and using back-substitution on the last equation to solve for $\covar$
results in the following two general equations for any diagonal and
off-diagonal values of $\covar$:
$$
\boxed{
\sigma_{ii} =
\dfrac{1}{u_{ii}}
\left(
l_{ii}
-\sum_{j=i+1}^{n} u_{i,j} \sigma_{j,i}
\right)
}
$$
$$
\boxed{
\sigma_{il} =
\dfrac{1}{u_{ii}}
\left(
-\sum_{j=i+1}^{l} u_{i,j} \sigma_{j,l}
-\sum_{j=l+1}^{n} u_{i,j} \sigma_{j,l}
\right)
}
$$
Note that the summations only apply to non-zero entries of single
columns or rows of the sparse matrix $\mat{R}$.
Derivation of Covariance Recovery using Square Root Matrix
The trick to solving for $\covar$ is the fact that the term
$(\mat{R}^{T})^{-1}$ is never actually evaluated. Instead, we take advantage
of the structure of the lower triangular matrix to solve a system of
equations via back-substitution. For one, we know for a fact that the inverse
of the diagonals in $(\mat{R}^{T})^{-1}$ is the reciprocal of itself, i.e.
$l_{ii} = 1 / u_{ii}$ (a known property of inverting diagonal matrices).
Secondly, by performing back-substitution the lower triangle values of
$(\mat{R}^{T})^{-1}$ are not required.

Let's see an example. Suppose $\mat{R}$, $\covar$, and
$(\mat{R}^{T})^{-1}$ are $4 \times 4$ matrices;
to work out $\covar$ we only need to find the values of the diagonals
and the upper triangle of $\covar$ (because a covariance matrix is
symmetric). If we write out the matrix multiplication, and rearrange w.r.t.
values of $\covar$ for each column in $\covar$, we get:
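A sketch of the recovery using two triangular solves, which avoids inverting
$\mat{H}$ directly (here the full dense $\covar$ is recovered for checking;
the boxed recursions above exploit sparsity instead):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

H = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])          # Gauss-Newton Hessian (SPD)
R = cholesky(H, lower=False)             # H = R^T R, R upper triangular

# R^T Y = I (forward substitution), then R Sigma = Y (back substitution)
Y = solve_triangular(R, np.eye(3), trans='T', lower=False)
Sigma = solve_triangular(R, Y, lower=False)
print(np.allclose(Sigma, np.linalg.inv(H)))   # True
```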
Gauge Freedom

Accurate structure from motion or vision-based state estimation is hard. One
hurdle is addressing the accuracy quantitatively. There are two main problems
that arise:

Inherent Physical Indeterminacy: caused by loss of information while
projecting 3D objects onto a 2D image plane.
Overparameterized Problem: e.g. a shape model parameterized by a vector whose
components represent the absolute position and orientation of the object,
which can themselves be indeterminate.
It is well known that a vision-only bundle adjustment has 7 unobservable
degrees-of-freedom (DoF), while for a visual-inertial system, the global
position and global yaw are not observable, a total of four unobservable DoFs.
These unobservable DoFs (a.k.a. gauge freedoms) have to be handled properly.
There are three main approaches to address the unobservability in
state estimation:
Gauge fixation
Gauge prior
Free gauge
Gauge fixation
The gauge fixation method works by decreasing the number of optimization
parameters so that there are no unobservable states left for the optimization
problem to optimize. This ensures the Hessian is well-conditioned and
invertible. This approach enforces hard constraints on the solution.
The standard method to update orientation variables such as a rotation,
$\rot$, during the iterations of a non-linear least squares solver is to
use local coordinates, where at the $k$-th iteration, the update is
Setting the $z$ component of $\boldsymbol{\phi}^{k}$ to 0 allows
fixating the yaw with respect to $\rot^{k}$. However, concatenating
several such updates over $K$-iterations,
does not fixate the yaw with respect to the initial rotation
$\rot^{0}$, and therefore, this parameterization cannot be used to fix
the yaw-value of $\rot^{K}$ to that of the initial value
$\rot^{0}$.
Although pose fixation or prior can be applied to any camera pose, it is
common practice to fixate the first camera.
where $\pos^{0}_{0}$ is the initial position of the first camera. This
is equivalent to setting the corresponding columns of the Jacobian of the
residual vector to zero, namely $\mat{J}_{\pos_0} = 0$, $\mat{J}_{\Delta
\phi_{0 z}} = 0$. Thus, for rotations of the other camera poses, the standard
iterative update above is used, and, for the first
camera rotation, $\rot_{0}$, a more convenient parameterization is used.
Instead of directly using $\rot_{0}$, a left-multiplicative increment is
used.
Free gauge

The free gauge approach is the most general; it lets the optimization
parameters evolve freely. To deal with the singularity of the Hessian, the
pseudo inverse or some preconditioning method is used in order to make the
Hessian well-conditioned and invertible.
MAV Position Tracking

Position tracking is achieved by means of a cascaded connection of a Model
Predictive Controller (MPC) for the MAV position and a PID controller for its
attitude. This approach is motivated by the fact that the majority of
commercially available flight controllers come with a pre-implemented
attitude controller which requires little or even no tuning, enabling easy
adaptation to a wide range of platforms.
Linear Model
The linear model of the MAV in navigation frame $\frame_{N}$ (or body
frame) is,
where $b_{(\cdot)}$ are constants of the first order system. The
values of these constants are obtained by performing system identification of
the MAV. Yaw is omitted above because the reference yaw command
$\psi^{r}$ will be passed directly to the inner-loop attitude control,
therefore it is not considered here in the outer-loop linear MPC
controller.
The state vector $\state$ is,
$$
\state = \begin{bmatrix}
  x & \dot{x} & \theta & \dot{\theta} &
  y & \dot{y} & \phi & \dot{\phi} &
  z & \dot{z}
\end{bmatrix}^{T} \in \real^{10}
$$
The input vector $\vec{u}$ to the linear model contains the reference roll
$\phi^{r}$, pitch $\theta^{r}$ and thrust $T^{r}$.
Since the controller is implemented in discrete time, the equations are
discretized using a zero-order hold for the input $\vec{u}$. The discrete
equivalents of the matrices $\mat{A}$, $\mat{B}$ and $\mat{C}$ can be
obtained accordingly.
The discretization step $dt$ coincides with the control update rate and was set
to 50ms.
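For example, with SciPy's zero-order-hold discretization (using a double
integrator as a stand-in for one axis of the model):

```python
import numpy as np
from scipy.signal import cont2discrete

# 1D double-integrator toy example standing in for one axis of the MAV model
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))

dt = 0.05   # 50 ms control update rate
Ad, Bd, Cd, Dd, _ = cont2discrete((A, B, C, D), dt, method='zoh')
print(Ad, Bd, sep="\n")
```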
MPC Formulated as a QP Problem
Our goal is to obtain an optimal input sequence $\bar{\vec{u}}^{\ast} =
\vec{u}_{0}^{\ast} \dots \vec{u}_{N-1}^{\ast}$ which is the solution of the following
optimization problem:
The problem with the current formulation is that the equality constraints
$\vec{x}_{k + 1}$, $\vec{y}_{k}$ and $\state_{0}$ may not be
valid in practice due to an imperfect model and/or sensor measurement noise.
If the equality constraints are invalid, the optimized solution will not be
feasible. Instead, the equality constraints can be eliminated by rewriting
$\bar{\vec{y}}$ to depend only on the initial state $\state_{0}$
instead of $\state_{k - 1}$. In other words, from this,
and substituting it into the cost function $J$, we collect the
$\bar{\vec{u}}$ terms and rearrange so that it is in the form
$\mat{A}\vec{\state} - \vec{b}$,