A large part of robotics is about developing machines that perceive and
interact with the environment. For that, robots use sensors to collect and
process data, and knowing what the data describes is important for the robot
to navigate and interact with the environment. The following notes use the
frame notation described by Paul Furgale (see link). The aim is to mitigate the
ambiguity that arises when describing robot poses, sensor data and more.
A vector expressed in the world frame, $\frame_{W}$, is written as
$\pos_{W}$. More precisely, if the vector describes the position of the
camera frame, $\frame_{C}$, expressed in $\frame_{W}$, the vector can be
written as $\pos_{WC}$. The left-hand subscript indicates the coordinate
system the vector is expressed in, while the right-hand subscripts indicate
the start and end points. For brevity, if the vector's start point coincides
with the frame it is expressed in, the vector can simply be written as
$\pos_{WC}$. Similarly a transformation of a point from $\frame_{C}$ to
$\frame_{W}$ can be represented by a homogeneous transform matrix,
$\tf_{WC}$, where its rotation matrix component is written as $\rot_{WC}$ and
the translation component written as $\pos_{WC}$. A rotation matrix
parametrized by the quaternion $\quat_{WC}$ is written as
$\rot\{\quat_{WC}\}$, corresponding to the Hamilton convention. The quaternion
can be written as a 4-element vector consisting of a \textit{real}
(\textit{scalar}) part, $q_{w}$, and an \textit{imaginary} (\textit{vector})
part, $\quat_{v}$, as

$$
\quat =
\begin{bmatrix} q_{w} \\ \quat_{v} \end{bmatrix} =
\begin{bmatrix} q_{w} \\ q_{x} \\ q_{y} \\ q_{z} \end{bmatrix}.
$$
There are other quaternion conventions, for example, the JPL convention. A
more detailed discussion between Hamiltonian and JPL quaternion convention is
discussed in \cite{Sola2017}.
Main Quaternion Properties
Sum
Let $\vec{p}$ and $\vec{q}$ be two quaternions; the sum of both
quaternions is

$$
\vec{p} \pm \vec{q} =
\begin{bmatrix} p_{w} \\ \vec{p}_{v} \end{bmatrix} \pm
\begin{bmatrix} q_{w} \\ \vec{q}_{v} \end{bmatrix} =
\begin{bmatrix} p_{w} \pm q_{w} \\ \vec{p}_{v} \pm \vec{q}_{v} \end{bmatrix}.
$$

Product

The quaternion product is

$$
\vec{p} \otimes \vec{q} =
\begin{bmatrix}
  p_{w} q_{w} - \vec{p}_{v}^{T} \vec{q}_{v} \\
  p_{w} \vec{q}_{v} + q_{w} \vec{p}_{v} + \vec{p}_{v} \times \vec{q}_{v}
\end{bmatrix}.
$$
The quaternion product is not commutative in the general case. There
are exceptions to the general non-commutative rule: when either $\vec{p}$ or
$\vec{q}$ is real, such that $\vec{p}_{v} \times \vec{q}_{v} = \vec{0}$, or
when both $\vec{p}_v$ and $\vec{q}_v$ are parallel, $\vec{p}_{v} \parallel
\vec{q}_{v}$. Only in these circumstances is the quaternion product
commutative.
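As a quick numerical check, here is a minimal NumPy sketch of the quaternion
sum and Hamilton product (the $[q_w, q_x, q_y, q_z]$ storage order is an
assumption; the notes do not fix one):

```python
import numpy as np

def quat_mul(p, q):
    # Hamilton product of quaternions stored as [w, x, y, z]
    pw, pv = p[0], p[1:]
    qw, qv = q[0], q[1:]
    w = pw * qw - pv @ qv
    v = pw * qv + qw * pv + np.cross(pv, qv)
    return np.hstack([w, v])

p = np.array([1.0, 2.0, 3.0, 4.0])
q = np.array([5.0, 6.0, 7.0, 8.0])
print(p + q)           # quaternion sum is element-wise
print(quat_mul(p, q))  # [-60, 12, 30, 24]
print(quat_mul(q, p))  # [-60, 20, 14, 32]: the product is not commutative
```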
The axis-angle representation of the rotation between two vectors can be
derived using the properties of the cross and dot product
$$
\begin{align}
\vec{u} \cdot \vec{v} &=
\norm{\vec{u}} \norm{\vec{v}} \cos \theta \\
\norm{\vec{u} \times \vec{v}} &=
\norm{\vec{u}} \norm{\vec{v}} |\sin \theta| ,
\end{align}
$$
the axis-angle, $\boldsymbol{\theta} \in \real^{3}$, can be obtained from
$\vec{u}$ and $\vec{v}$ with

$$
\boldsymbol{\theta} = \theta \vec{e},
\quad
\theta = \cos^{-1} \left(
  \dfrac{\vec{u} \cdot \vec{v}}{\norm{\vec{u}} \norm{\vec{v}}}
\right),
\quad
\vec{e} = \dfrac{\vec{u} \times \vec{v}}{\norm{\vec{u} \times \vec{v}}}
$$

where $\vec{e}$ is the unit vector that defines the rotation axis and
$\theta$ is the rotation angle about $\vec{e}$. Once the axis-angle,
$\boldsymbol{\theta}$, is obtained, a quaternion can be formed with

$$
\quat = \begin{bmatrix}
  \cos(\theta / 2) \\
  \vec{e} \sin(\theta / 2)
\end{bmatrix}.
$$
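A minimal sketch of this construction in NumPy, assuming non-parallel inputs
(the cross product, and hence the axis, is undefined for parallel $\vec{u}$
and $\vec{v}$):

```python
import numpy as np

def quat_from_two_vectors(u, v):
    # Rotation angle from the dot product, rotation axis from the cross product
    theta = np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    axis = np.cross(u, v)
    axis = axis / np.linalg.norm(axis)
    # Quaternion [w, x, y, z] from the axis-angle
    return np.hstack([np.cos(theta / 2.0), axis * np.sin(theta / 2.0)])

print(quat_from_two_vectors(np.array([0.0, 0.0, 1.0]),
                            np.array([1.0, 0.0, 1.0])))
```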
Example: Attitude from gravity and accelerometer vectors
In robotics, knowing the attitude of the system is often required. An
Inertial Measurement Unit (IMU) is commonly used to obtain this information.
Using the method described previously, a gravity vector along with an
accelerometer measurement vector can be used to obtain an attitude in the
form of a quaternion.
Let $\vec{g} \in \real^{3}$ be the gravity vector, and $\vec{a}_{m} \in
\real^{3}$ be the accelerometer measurement from an IMU. With the two vectors
$\vec{g}$ and $\vec{a}_{m}$, a quaternion $\quat_{WS}$ expressing the rotation
of the IMU sensor frame, $\frame_{S}$, with respect to the world frame,
$\frame_{W}$, can be calculated given that the values of $\vec{g}$ and
$\vec{a}_{m}$ are known. For example, let
taken from the first measurement of the imu_april calibration
sequence of the EuRoC MAV dataset.
Before calculating the axis-angle, however, it should be noted that when an
accelerometer is at rest, the measurement reading in the z-axis is positive
instead of negative. The reason is that accelerometers measure acceleration by
measuring the displacement of a proof mass that is suspended with springs. For
example, if gravity is ignored and the accelerometer moves upwards, the proof
mass will be displaced towards the bottom of the accelerometer. This is
interpreted as an acceleration in the upwards direction, and so when the
accelerometer is at rest on a flat surface, gravity pulls on the proof mass,
yielding a positive measurement in the upwards direction. To resolve this
issue the gravity vector is negated, and so $\vec{u} = -\vec{g}$ and
$\vec{v} = \vec{a}_{m}$. Using the axis-angle equations above, the axis-angle
can then be computed and converted to the quaternion $\quat_{WS}$.
A quaternion can also be recovered directly from a rotation matrix with
entries $m_{ij}$:

$$
\begin{align}
  q_w &= \dfrac{\sqrt{1 + m_{11} + m_{22} + m_{33}}}{2} \\
  q_x &= \dfrac{m_{32} - m_{23}}{4 q_w} \\
  q_y &= \dfrac{m_{13} - m_{31}}{4 q_w} \\
  q_z &= \dfrac{m_{21} - m_{12}}{4 q_w}
\end{align}
$$

Note that while the equations seem straightforward, in practice the trace of
the rotation matrix needs to be checked in order to guarantee correctness.
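A hedged sketch of the usual trace-branching conversion (Shepperd-style);
branching on the trace and the largest diagonal element avoids dividing by a
small $q_w$:

```python
import numpy as np

def rot2quat(m):
    # Convert a 3x3 rotation matrix to a quaternion [w, x, y, z]
    tr = m[0, 0] + m[1, 1] + m[2, 2]
    if tr > 0:
        s = 2.0 * np.sqrt(tr + 1.0)                            # s = 4 * q_w
        return np.array([0.25 * s,
                         (m[2, 1] - m[1, 2]) / s,
                         (m[0, 2] - m[2, 0]) / s,
                         (m[1, 0] - m[0, 1]) / s])
    if m[0, 0] > m[1, 1] and m[0, 0] > m[2, 2]:
        s = 2.0 * np.sqrt(1.0 + m[0, 0] - m[1, 1] - m[2, 2])   # s = 4 * q_x
        return np.array([(m[2, 1] - m[1, 2]) / s, 0.25 * s,
                         (m[0, 1] + m[1, 0]) / s,
                         (m[0, 2] + m[2, 0]) / s])
    if m[1, 1] > m[2, 2]:
        s = 2.0 * np.sqrt(1.0 + m[1, 1] - m[0, 0] - m[2, 2])   # s = 4 * q_y
        return np.array([(m[0, 2] - m[2, 0]) / s,
                         (m[0, 1] + m[1, 0]) / s, 0.25 * s,
                         (m[1, 2] + m[2, 1]) / s])
    s = 2.0 * np.sqrt(1.0 + m[2, 2] - m[0, 0] - m[1, 1])       # s = 4 * q_z
    return np.array([(m[1, 0] - m[0, 1]) / s,
                     (m[0, 2] + m[2, 0]) / s,
                     (m[1, 2] + m[2, 1]) / s, 0.25 * s])

print(rot2quat(np.eye(3)))  # identity rotation -> [1, 0, 0, 0]
```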
LU Decomposition
Lower–Upper (LU) decomposition or factorization factors a matrix as the
product of a lower triangular matrix and an upper triangular matrix. The
product sometimes includes a permutation matrix as well. LU decomposition can
be viewed as the matrix form of Gaussian elimination. Computers usually solve
square systems of linear equations using LU decomposition, and it is also a key
step when inverting a matrix or computing the determinant of a matrix.
Let $\mat{A}$ be a square matrix. An LU factorization refers to the
factorization of $\mat{A}$, with proper row and/or column orderings or
permutations, into two factors, a lower triangular matrix $\mat{L}$ and an
upper triangular matrix $\mat{U}$:
$$
\mat{A} = \mat{L} \mat{U}
$$
In the lower triangular matrix all elements above the diagonal are zero; in
the upper triangular matrix, all the elements below the diagonal are zero.
For example, for a $3 \times 3$ matrix $\mat{A}$, its LU decomposition looks
like this:

$$
\begin{bmatrix}
  a_{11} & a_{12} & a_{13} \\
  a_{21} & a_{22} & a_{23} \\
  a_{31} & a_{32} & a_{33}
\end{bmatrix} =
\begin{bmatrix}
  l_{11} & 0 & 0 \\
  l_{21} & l_{22} & 0 \\
  l_{31} & l_{32} & l_{33}
\end{bmatrix}
\begin{bmatrix}
  u_{11} & u_{12} & u_{13} \\
  0 & u_{22} & u_{23} \\
  0 & 0 & u_{33}
\end{bmatrix}.
$$
Without a proper ordering or permutations in the matrix, the factorization may
fail to materialize. For example, it is easy to verify (by expanding the matrix
multiplication) that $a_{11} = l_{11} u_{11}$. If $a_{11} = 0$,
then at least one of $l_{11}$ and $u_{11}$ has to be zero, which
implies that either $\mat{L}$ or $\mat{U}$ is singular. This is
impossible if $\mat{A}$ is non-singular (invertible). This is a
procedural problem. It can be removed by simply reordering the rows of
$\mat{A}$ so that the first element of the permuted matrix is non-zero. The
same problem in subsequent factorization steps can be removed the same way.
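For example, with SciPy, which performs LU with partial pivoting so the row
reordering described above is handled automatically:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# scipy returns a permutation matrix P such that A = P @ L @ U
P, L, U = lu(A)
print(np.allclose(A, P @ L @ U))  # True
```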
Cholesky Decomposition
Cholesky decomposition or Cholesky factorization is a decomposition of a
Hermitian, positive-definite matrix into the product of a lower triangular
matrix and its conjugate transpose.
$$
\mat{A} = \mat{L} \mat{L}^{\ast}
$$
where $\mat{L}$ is a lower triangular matrix with real and positive diagonal
entries, and $\mat{L}^{\ast}$ is the conjugate transpose of $\mat{L}$.
Explicitly inverting a matrix in linear algebra is often avoidable, and
avoiding it is generally recommended.
Inverting a Matrix with Cholesky Decomposition
Let matrix $\mat{A}$ be symmetric positive-definite, so that it is invertible
and admits a Cholesky factorization. Using the identity

$$
\mat{A} \mat{A}^{-1} = \eye
$$

we can solve for the inverse of $\mat{A}$ using Cholesky decomposition. First
decompose $\mat{A} = \mat{L} \mat{L}^{T}$, then rearrange the terms so that
we can take advantage of forward and back substitution:

$$
\begin{align}
  (\mat{L} \mat{L}^{T}) \mat{A}^{-1} &= \eye \\
  \mat{L}^{T} \mat{A}^{-1} &= \mat{L}^{-1}.
\end{align}
$$

Solving $\mat{L} \mat{Y} = \eye$ column by column via forward substitution
gives $\mat{Y} = \mat{L}^{-1}$, and back substitution on
$\mat{L}^{T} \mat{A}^{-1} = \mat{Y}$ then yields $\mat{A}^{-1}$.
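A small sketch of this recipe using SciPy's triangular solver:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

A = np.array([[4.0, 2.0], [2.0, 3.0]])   # symmetric positive-definite
L = cholesky(A, lower=True)              # A = L @ L.T

# Forward substitution: L @ Y = I  =>  Y = inv(L)
Y = solve_triangular(L, np.eye(2), lower=True)
# Back substitution: L.T @ Ainv = Y
Ainv = solve_triangular(L.T, Y, lower=False)
print(np.allclose(A @ Ainv, np.eye(2)))  # True
```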
Pseudo Inverse of a Matrix with SVD
Given the singular value decomposition (SVD) of a matrix $\mat{A}$,

$$
\mat{A} = \mat{U} \mat{\Sigma} \mat{V}^{T},
$$

the pseudo-inverse $\mat{A}^{\dagger}$ can be derived as follows (assuming
$\mat{A}$ has full column rank):
$$
\begin{align}
\mat{A}^{\dagger}
&= (\mat{A}^{T} \mat{A})^{-1} \mat{A}^{T} \\
&= ((\mat{U} \mat{\Sigma} \mat{V}^{T})^{T}
(\mat{U} \mat{\Sigma} \mat{V}^{T}))^{-1}
(\mat{U} \mat{\Sigma} \mat{V}^{T})^{T} \\
&= (\mat{V} \mat{\Sigma} \mat{U}^{T}
\mat{U} \mat{\Sigma} \mat{V}^{T})^{-1}
(\mat{U} \mat{\Sigma} \mat{V}^{T})^{T} \\
&= (\mat{V} \mat{\Sigma}^{2} \mat{V}^{T})^{-1}
\mat{V} \mat{\Sigma} \mat{U}^{T} \\
&= (\mat{V}^{T})^{-1} \mat{\Sigma}^{-2} \mat{V}^{-1}
\mat{V} \mat{\Sigma} \mat{U}^{T} \\
&= \mat{V} \mat{\Sigma}^{-2} \mat{\Sigma} \mat{U}^{T} \\
&= \mat{V} \mat{\Sigma}^{-1} \mat{U}^{T} \\
\end{align}
$$
where $\mat{\Sigma}^{-1}$ is the pseudo inverse of $\mat{\Sigma}$,
which is formed by replacing every non-zero diagonal entry by its reciprocal
and transposing the resulting matrix.
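A sketch of the pseudo-inverse via SVD; the tolerance used to decide which
singular values count as non-zero is an arbitrary choice here:

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    # Reciprocate only the non-zero singular values
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > tol, 1.0 / s, 0.0)
    return Vt.T @ np.diag(s_inv) @ U.T

A = np.random.randn(5, 3)
print(np.allclose(pinv_svd(A), np.linalg.pinv(A)))  # True
```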
Invert Lower Triangular Matrix
If we have a lower triangular matrix $\mat{L}$ and our objective is to
find $\mat{L}^{-1}$, we can use the property that the inverse of a lower
triangular matrix is itself lower triangular; it can be obtained by solving
$\mat{L} \mat{L}^{-1} = \eye$ column by column via forward substitution.

Condition Number
There are different condition numbers. In the following the condition number
for the problem $\mat{A} \vec{x} = \vec{b}$ and matrix inversion are
discussed. In general, the condition number, $\kappa(\cdot)$, for a
matrix, $\mat{A}$, or computational task such as $\mat{A} \vec{x} =
\vec{b}$ measures how sensitive the output is to perturbations in the input
data and to round-off errors. If the condition number is large, even a small
error in $\vec{b}$ may cause a large error in $\vec{x}$. On the
other hand, if the condition number is small then the error in $\vec{x}$
will not be much bigger than the error in $\vec{b}$.
The condition number is defined more precisely to be the maximum ratio of the
relative error in $\vec{x}$ to the relative error in $\vec{b}$.
Let $\vec{e}$ be the error in $\vec{b}$. Assuming that
$\mat{A}$ is a nonsingular matrix, the error in the solution
$\mat{A}^{-1} \vec{b}$ is $\mat{A}^{-1} \vec{e}$. The ratio of the
relative error in the solution to the relative error in $\vec{b}$ is
When the condition number is exactly one (which can only happen if $\mat{A}$
is a scalar multiple of a linear isometry), then a solution algorithm can find
(in principle, meaning if the algorithm introduces no errors of its own) an
approximation of the solution whose precision is no worse than that of the
data.
However, it does not mean that the algorithm will converge rapidly to this
solution, just that it won't diverge arbitrarily because of inaccuracy on the
source data (backward error), provided that the forward error introduced by the
algorithm does not diverge as well because of accumulating intermediate
rounding errors.
The condition number may also be infinite, but this implies that the problem
is ill-posed (does not possess a unique, well-defined solution for each choice
of data -- that is, the matrix is not invertible), and no algorithm can be
expected to reliably find a solution.
The definition of the condition number depends on the choice of norm, as can
be illustrated by two examples.
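For instance, the norm-dependence can be seen numerically:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])          # nearly singular

print(np.linalg.cond(A, 2))           # 2-norm condition number (large)
print(np.linalg.cond(A, np.inf))      # infinity-norm gives a different value
print(np.linalg.cond(np.eye(2), 2))   # identity: condition number exactly 1
```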
Rank
The rank $\rho(\mat{A})$ of a matrix $\mat{A}$ of $n$ rows
and $m$ columns is defined as **the number of independent rows or
columns**. Rank is thus a measure of the "non-degenerateness" of the system of
linear equations and linear transformation encoded by $\mat{A}$.
Full rank (non-singular): $\rho(\mat{A}) = \min(n, m)$
Not full rank (singular): $\rho(\mat{A}) < \min(n, m)$
Properties
Let $\mat{A}$ be an $m \times n$ matrix, $\mat{B}$ an
$n \times k$ matrix, and $\mat{C}$ a matrix of compatible dimensions; then
$$
\begin{align}
\rank{\mat{A}} &\leq \Min{m}{n} \\
\rank{\mat{AB}} &\leq \Min{\rank{\mat{A}}}{\rank{\mat{B}}} \\
\rank{\mat{AB}} + \rank{\mat{BC}}
&\leq \rank{\mat{B}} + \rank{\mat{ABC}} \\
\rank{\mat{A} + \mat{B}} &\leq \rank{\mat{A}} + \rank{\mat{B}} \\
\rank{\mat{A}^{T} \mat{A}}
&= \rank{\mat{A} \mat{A}^{T}}
= \rank{\mat{A}}
= \rank{\mat{A}^{T}} \\
\rank{\mat{0}} &= 0
\end{align}
$$
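A few of these properties can be spot-checked numerically:

```python
import numpy as np

A = np.random.randn(4, 3)
B = np.random.randn(3, 5)

r = np.linalg.matrix_rank
print(r(A) <= min(4, 3))                # True
print(r(A @ B) <= min(r(A), r(B)))      # True
print(r(A.T @ A) == r(A) == r(A.T))     # True
```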
Trace

The trace of a matrix $\mat{A}$ is simply the sum of its diagonal elements,

$$
\text{tr}(\mat{A}) = \sum_{i} \mat{A}_{ii}.
$$

For example, for a $2 \times 2$ matrix $\mat{A}$,
$\text{tr}(\mat{A}) = a_{11} + a_{22}$.
Linear problems generally have the form
$$
\mat{A} \vec{x} = \vec{b}
$$
If $\mat{A}$ is skinny (the number of rows is larger than the number of
columns) the problem is over-constrained and there is no *unique* solution.
Instead, the problem can be solved by minimizing the squared error between
$\mat{A} \vec{x}$ and $\vec{b}$. The linear least squares problem
is then defined as,
$$
\min_{\vec{x}} || \mat{A} \vec{x} - \vec{b} ||^{2}_{2}
$$
where the goal is to find an *approximate* solution.
The local minimum can be found where the derivative of the squared error is
zero. First the squared error is expanded to give

$$
(\mat{A} \vec{x} - \vec{b})^{T} (\mat{A} \vec{x} - \vec{b})
= \vec{x}^{T} \mat{A}^{T} \mat{A} \vec{x}
- 2 \vec{b}^{T} \mat{A} \vec{x}
+ \vec{b}^{T} \vec{b},
$$

where the constant term $\vec{b}^{T} \vec{b}$ will vanish under
differentiation;
then by differentiating the expanded squared error with respect to
$\vec{x}$, setting the derivative to zero, and rearranging the equation
with respect to $\vec{x}$ gives the following:
$$
\begin{align}
2 \vec{x}^{T} \mat{A}^{T} \mat{A} - 2 \vec{b}^{T} \mat{A} &= 0 \\
\vec{x}^{T} \mat{A}^{T} \mat{A} &= \vec{b}^{T} \mat{A} \\
\mat{A}^{T} \mat{A} \vec{x} &= \mat{A}^{T} \vec{b} \\
\vec{x} &= \left( \mat{A}^{T} \mat{A} \right)^{-1} \mat{A}^{T} \vec{b} \\
\vec{x} &= \mat{A}^{\dagger} \vec{b} \enspace,
\end{align}
$$
where $\left( \mat{A}^{T} \mat{A} \right)^{-1}
\mat{A}^{T}$ is known as the pseudo inverse $\mat{A}^{\dagger}$.
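A toy line-fitting sketch; forming $\mat{A}^{T} \mat{A}$ explicitly is fine
for small, well-conditioned problems, though `np.linalg.lstsq` is preferred
in practice:

```python
import numpy as np

# Over-constrained system: 100 noisy samples of the line y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
b = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(100)
A = np.column_stack([x, np.ones(100)])

# Normal equations: x = (A^T A)^-1 A^T b
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
# NumPy's least-squares solver
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_ne, x_ls)   # both close to [2, 1]
```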
For reference, solving an upper triangular system $\mat{U} \vec{x} = \vec{b}$
via back-substitution (used repeatedly below) amounts to evaluating, from the
last row upwards,

$$
\boxed{
  x_{i} = \dfrac{b_{i} - \sum_{j=i+1}^{n} u_{ij} x_{j}}{u_{ii}}
  \quad
  \text{where} \; i = n, n - 1, \cdots, 1
}.
$$
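A direct implementation of the boxed recursion:

```python
import numpy as np

def back_substitution(U, b):
    # Solve U x = b for upper triangular U, starting from the last row
    n = b.size
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([5.0, 6.0])
print(back_substitution(U, b))          # [1.5, 2.0]
```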
Solve Least Squares with SVD
To solve $\mat{A} \vec{x} = \vec{b}$ with non-singular $\mat{A} \in
\real^{n \times n}$, let us factor $\mat{A}$ using SVD and rearrange the
terms to give,
$$
\begin{align}
\mat{A} \vec{x} &= \vec{b} \\
\mat{U} \mat{\Sigma} \mat{V}^{T} \vec{x} &= \vec{b} \\
\mat{\Sigma} \mat{V}^{T} \vec{x} &= \mat{U}^{T} \vec{b}
\end{align}
$$
Note: $\mat{U}$ and $\mat{V}$ are orthogonal matrices, therefore
the inverse of each is its transpose. Let $\vec{y} = \mat{V}^{T}
\vec{x}$; substituting into the above gives,

$$
\mat{\Sigma} \vec{y} = \mat{U}^{T} \vec{b}.
$$

Since $\mat{\Sigma}$ is diagonal, $\vec{y}$ is obtained by dividing each
element of $\mat{U}^{T} \vec{b}$ by the corresponding singular value. Once
$\vec{y}$ is known, solve for $\vec{x}$ in

$$
\mat{V}^{T} \vec{x} = \vec{y}
$$

by using the orthogonality of $\mat{V}$ again: $\vec{x} = \mat{V} \vec{y}$.
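Putting the SVD recipe together:

```python
import numpy as np

def solve_svd(A, b):
    U, s, Vt = np.linalg.svd(A)
    y = (U.T @ b) / s        # Sigma is diagonal: element-wise division
    return Vt.T @ y          # x = V y, using the orthogonality of V

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
print(solve_svd(A, b))       # [2, 3]
```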
Solve Least Squares with QR
To solve $\mat{A} \vec{x} = \vec{b}$ with non-singular $\mat{A} \in
\real^{n \times n}$, let us factor $\mat{A}$ using QR decomposition and
rearrange the terms to give,
$$
\begin{align}
\mat{A} \vec{x} &= \vec{b} \\
\mat{Q} \mat{R} \vec{x} &= \vec{b} \\
\mat{R} \vec{x} &= \mat{Q}^{T} \vec{b}.
\end{align}
$$
Note: $\mat{Q}$ is an orthogonal matrix, therefore the inverse of
$\mat{Q}$ is its transpose. The R.H.S. of the last equation is simply the
matrix-vector product of $\mat{Q}^{T}$ and $\vec{b}$, both of which are
known. Once the R.H.S. is computed, $\vec{x}$ can be solved for using
back-substitution, since $\mat{R}$ is upper triangular.
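The QR recipe in code:

```python
import numpy as np
from scipy.linalg import solve_triangular

def solve_qr(A, b):
    Q, R = np.linalg.qr(A)
    return solve_triangular(R, Q.T @ b, lower=False)  # back-substitution

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
print(solve_qr(A, b))        # [2, 3]
```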
Solve Least Squares with Cholesky Decomposition
To solve $\mat{A} \vec{x} = \vec{b}$ with symmetric positive-definite
$\mat{A} \in \real^{n \times n}$, factoring $\mat{A}$ using Cholesky
decomposition gives,
$$
\begin{align}
\mat{A} \vec{x} &= \vec{b} \\
\mat{L} \mat{L}^{T} \vec{x} &= \vec{b},
\end{align}
$$
let $\vec{y} = \mat{L}^{T} \vec{x}$; substituting into the above,
$$
\mat{L} \vec{y} = \vec{b}.
$$
Solve for $\vec{y}$ using forward-substitution, and then solve for
$\vec{x}$ in
$$
\mat{L}^{T} \vec{x} = \vec{y}
$$
using backward-substitution.
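And the Cholesky recipe:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def solve_cholesky(A, b):
    L = cholesky(A, lower=True)                    # A = L @ L.T
    y = solve_triangular(L, b, lower=True)         # forward substitution
    return solve_triangular(L.T, y, lower=False)   # back substitution

A = np.array([[4.0, 2.0], [2.0, 3.0]])             # must be SPD
b = np.array([10.0, 8.0])
print(solve_cholesky(A, b), np.linalg.solve(A, b))  # same answer
```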
Non-linear Least Squares
Gauss Newton
$$
\argmin_{\vec{x}} J(\vec{x})
=
\dfrac{1}{2}
\sum_{i}
\vec{e}_{i}^{T}(\vec{x}) \mat{W} \vec{e}_{i}(\vec{x})
$$
where the error function, $\vec{e}_{i}(\cdot)$, depends on the optimization
parameter, $\vec{x} \in \real^{n}$. The error function has the form

$$
\vec{e}_{i} = \vec{z}_{i} - \vec{h}_{i}(\vec{x}),
$$

the difference between the measured value, $\vec{z}_{i}$, and the estimated
value calculated using the measurement function, $\vec{h}_{i}(\cdot)$. Since
the error function, $\vec{e}(\vec{x})$, is
non-linear, it is approximated with a first-order Taylor series,
non-linear, it is approximated with the first-order Taylor series,
$$
\vec{e}(\vec{x})
\approx
\vec{e}(\bar{\vec{x}}) +
\mat{E}(\bar{\vec{x}}) \Delta\vec{x}
$$
where $\mat{E}(\bar{\vec{x}}) =
\left. \dfrac{\partial\vec{e}(\vec{x})}{\partial\vec{x}} \right\rvert_{\bar{\vec{x}}}$
is the Jacobian of the error function evaluated at the linearization point
$\bar{\vec{x}}$, and $\Delta{\vec{x}} = \vec{x} - \bar{\vec{x}}$.
The minimum is found where the derivative of the cost w.r.t. $\vec{x}$ is
zero. Applying the chain rule,

$$
\dfrac{\partial{J}}{\partial{\vec{x}}} =
\dfrac{\partial{J}}{\partial{\vec{e}}}
\dfrac{\partial{\vec{e}}}{\partial{\vec{x}}}
$$
$$
\begin{align}
  \dfrac{\partial{J}}{\partial{\vec{e}}} &=
  \dfrac{\partial}{\partial{\vec{e}}} \left(
    \dfrac{1}{2} \vec{e}^{T}(\vec{x}) \mat{W} \vec{e}(\vec{x})
  \right) =
  \vec{e}^{T}(\vec{x}) \mat{W} \\
  %
  \dfrac{\partial{\vec{e}}}{\partial{\vec{x}}} &=
  \dfrac{\partial}{\partial{\vec{x}}} \left(
    \vec{e}(\bar{\vec{x}}) +
    \mat{E}(\bar{\vec{x}}) \Delta\vec{x}
  \right) =
  \mat{E}(\bar{\vec{x}})
\end{align}
$$
$$
\begin{align}
\dfrac{\partial{J}}{\partial{\vec{x}}}
&=
(\vec{e}^{T}(\vec{x}) \mat{W}) (\mat{E}(\bar{\vec{x}})) \\
% Line 2
&=
(
\vec{e}(\bar{\vec{x}}) + \mat{E}(\bar{\vec{x}}) \Delta\vec{x}
)^{T} \mat{W}
\mat{E}(\bar{\vec{x}}) \\
% Line 3
&=
\vec{e}^{T}(\bar{\vec{x}}) \mat{W} \mat{E}(\bar{\vec{x}})
+ \Delta\vec{x}^{T}
\mat{E}(\bar{\vec{x}})^{T} \mat{W} \mat{E}(\bar{\vec{x}})
= 0 \\
\end{align}
$$
$$
\begin{align}
% Line 4
\Delta\vec{x}^{T}
\mat{E}(\bar{\vec{x}})^{T} \mat{W} \mat{E}(\bar{\vec{x}})
&=
- \vec{e}^{T}(\bar{\vec{x}}) \mat{W} \mat{E}(\bar{\vec{x}}) \\
% Line 5
\underbrace{
\mat{E}(\bar{\vec{x}})^{T} \mat{W} \mat{E}(\bar{\vec{x}})
}_{\mat{H}}
\Delta\vec{x}
&=
\underbrace{
- \mat{E}(\bar{\vec{x}})^{T} \mat{W} \vec{e}(\bar{\vec{x}})
}_{\vec{b}}
\end{align}
$$
Solve the normal equations $\mat{H}\Delta\vec{x} = \vec{b}$ for $\Delta\vec{x}$
using Cholesky or QR decomposition. Once $\Delta\vec{x}$ is found, the best
estimate $\bar{\vec{x}}$ can be updated via,
$$
\bar{\vec{x}}_{k + 1} = \bar{\vec{x}}_{k} + \Delta\vec{x}.
$$
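A compact Gauss-Newton sketch with $\mat{W} = \eye$, fitting a toy
exponential model (the model and the starting point are illustrative
assumptions, not from the notes):

```python
import numpy as np

def gauss_newton(h, jac_h, z, t, x0, iters=10):
    # Minimizes 0.5 * sum_i e_i^2 with e = z - h(x, t) and W = I
    x = x0.astype(float).copy()
    for _ in range(iters):
        e = z - h(x, t)                 # residuals
        E = -jac_h(x, t)                # de/dx = -dh/dx
        H = E.T @ E                     # Gauss-Newton Hessian approximation
        b = -E.T @ e
        x += np.linalg.solve(H, b)      # normal equations: H dx = b
    return x

# Toy model: h(x, t) = x0 * exp(x1 * t)
h = lambda x, t: x[0] * np.exp(x[1] * t)
jac_h = lambda x, t: np.column_stack([np.exp(x[1] * t),
                                      x[0] * t * np.exp(x[1] * t)])

t = np.linspace(0.0, 1.0, 50)
z = h(np.array([2.0, -1.5]), t)                  # noiseless measurements
print(gauss_newton(h, jac_h, z, t, np.array([1.0, -1.0])))  # ~[2.0, -1.5]
```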
Pinhole Camera Model
The pinhole camera model describes how 3D scene points are projected onto the
2D image plane of an ideal pinhole camera. The model makes the assumption that
light rays emitted from an object in the scene pass through the pinhole of the
camera and are projected onto the image plane.
A 3D point $\vec{p}_{C} = [p_x \quad p_y \quad p_z]^{T}$
expressed in the camera frame, $\frame_{C}$, projected on to the camera's 2D
image plane $(u, v)$ is written as,
$$
u = \dfrac{p_{x} \cdot f_{x}}{p_{z}} + c_{x}
\quad \quad
v = \dfrac{p_{y} \cdot f_{y}}{p_{z}} + c_{y}
$$
where $f_{x}$ and $f_{y}$ denote the focal lengths, and $c_{x}$
and $c_{y}$ represent the principal point offsets in the $x$ and
$y$ directions. Or, in matrix form,

$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix}
  f_{x} & 0 & c_{x} \\
  0 & f_{y} & c_{y} \\
  0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
  p_{x} / p_{z} \\ p_{y} / p_{z} \\ 1
\end{bmatrix}.
$$
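A minimal projection sketch (the intrinsic values below are arbitrary
placeholders, not calibrated values):

```python
import numpy as np

def project_pinhole(K, p_C):
    # Perspective division followed by the intrinsics matrix K
    x = p_C[:2] / p_C[2]
    u = K[0, 0] * x[0] + K[0, 2]
    v = K[1, 1] * x[1] + K[1, 2]
    return np.array([u, v])

K = np.array([[458.0,   0.0, 367.0],
              [  0.0, 457.0, 248.0],
              [  0.0,   0.0,   1.0]])
print(project_pinhole(K, np.array([0.1, -0.2, 2.0])))
```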
In practice, the pinhole camera model only serves as an approximation to
modern cameras. The assumptions made in the model are often violated with
factors such as large camera apertures (pinhole size), distortion effects in
camera lenses, and other factors. That is why the pinhole camera model is often
used in combination with a distortion model in the hope of minimizing
projection errors from 3D to 2D. A common distortion model used in
conjunction with the pinhole camera model is the radial-tangential model
discussed below.
Lens distortion generally exists in all camera lenses, therefore it is vital
that the observed distortions are modelled. The most common distortion model
is the radial-tangential (or simply radtan) distortion model. The two main
distortion components, as the name suggests, are radial and tangential
distortion.

Radial distortion occurs due to the shape of the lens: light passing
through the center undergoes no refraction, while light passing through the
edges of the lens undergoes more severe bending, causing the radial
distortion.
Tangential distortion, on the other hand, is mainly due to camera sensor
mis-alignment during the manufacturing process. It occurs when the camera
sensor is not parallel to the lens.
The combined radial-tangential distortion is modelled using a polynomial
approximation with parameters $k_{1}, k_{2}$ and $p_{1}, p_{2}$ respectively.
To apply the distortion, the observed 3D point $\vec{p} = [X \quad Y \quad
Z]^{T}$ is first projected to the normalized image plane, distorted, and
finally scaled and offset in the image plane $(u, v)$.
$$
\begin{align}
x &= X / Z \\
y &= Y / Z \\
r^2 &= x^2 + y^2 \\ \\
x' &= x \cdot (1 + (k_1 r^2) + (k_2 r^4)) \\
y' &= y \cdot (1 + (k_1 r^2) + (k_2 r^4)) \\
x'' &= x' + (2 p_1 x y + p_2 (r^2 + 2 x^2)) \\
y'' &= y' + (p_1 (r^2 + 2 y^2) + 2 p_2 x y)
\end{align}
$$
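A sketch of the distortion equations above (the parameter values are
arbitrary placeholders):

```python
import numpy as np

def distort_radtan(k1, k2, p1, p2, p):
    # Project to the normalized image plane, then apply the radtan model
    x, y = p[0] / p[2], p[1] / p[2]
    r2 = x**2 + y**2
    radial = 1.0 + k1 * r2 + k2 * r2**2
    x_dist = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x**2)
    y_dist = y * radial + p1 * (r2 + 2.0 * y**2) + 2.0 * p2 * x * y
    return np.array([x_dist, y_dist])

# Distorted normalized coordinates; scale by fx, fy and offset by cx, cy
# afterwards to obtain pixel coordinates
print(distort_radtan(-0.28, 0.07, 1e-4, -1e-5, np.array([0.1, -0.2, 2.0])))
```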
There are various methods for triangulating a 3D point observed from at
least two camera views. The linear triangulation method [Hartley2003] is
frequently used. This method assumes a pair of homogeneous points $\vec{x}$ and
$\vec{x}' \in \real^{3}$ in the image plane that observe the same 3D point,
$\vec{X} \in \real^{4}$, in homogeneous coordinates from two different camera
frames. The homogeneous projection from 3D to 2D with a known camera matrix
$\mat{P} \in \real^{3 \times 4}$ for each measurement is given as

$$
\vec{x} = \mat{P} \vec{X}, \quad \vec{x}' = \mat{P}' \vec{X}.
$$

Taking advantage of the fact that if two vectors $\vec{x}$ and
$\mat{P}\vec{X}$ have the same direction then $\vec{x} \times \mat{P} \vec{X}
= \vec{0}$, these equations can be combined to form a system of equations of
the form $\mat{A} \vec{x} = \vec{0}$. To eliminate the homogeneous scale
factor we apply a cross product to give three equations for each image point,
for example $\vec{x} \times (\mat{P} \vec{X}) = \vec{0}$; writing this out
gives
$$
x (\vec{p}^{3T} \vec{X}) - (\vec{p}^{1T} \vec{X}) = 0 \\
y (\vec{p}^{3T} \vec{X}) - (\vec{p}^{2T} \vec{X}) = 0 \\
x (\vec{p}^{2T} \vec{X}) - y (\vec{p}^{1T} \vec{X}) = 0
$$
where $\vec{p}^{iT}$ is the $i^{\text{th}}$ row of $\mat{P}$.
Note that the third line in the above equation is a linear combination of the
first two ($c_1$ times the first line plus $c_2$ times the second line equals
the third line); as such, the third line lies in the span of the first two
equations and is therefore redundant.
From the above, an equation of the form $\mat{A} \vec{x} = \vec{0}$ for
each image point can be formed, where $\vec{x}$ represents the unknown
homogeneous feature location to be estimated, and $\mat{A}$ is given as

$$
\mat{A} =
\begin{bmatrix}
  x (\vec{p}^{3T}) - \vec{p}^{1T} \\
  y (\vec{p}^{3T}) - \vec{p}^{2T} \\
  x' (\vec{p}'^{3T}) - \vec{p}'^{1T} \\
  y' (\vec{p}'^{3T}) - \vec{p}'^{2T}
\end{bmatrix}
$$

giving a total of four equations in four homogeneous unknowns. Solving
$\mat{A} \vec{x} = \vec{0}$ using SVD, where the solution is the right
singular vector corresponding to the smallest singular value of $\mat{A}$,
allows us to estimate the initial feature location.
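A sketch of the linear triangulation method; the two camera matrices below
are synthetic placeholders:

```python
import numpy as np

def triangulate_linear(x1, x2, P1, P2):
    # x1, x2: image measurements (2,); P1, P2: 3x4 projection matrices
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    # Null-space of A: right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]            # de-homogenize

# Two cameras with identity intrinsics, second shifted 1m along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = np.array([0.5, 0.2, 4.0, 1.0])
x1 = (P1 @ X)[:2] / (P1 @ X)[2]
x2 = (P2 @ X)[:2] / (P2 @ X)[2]
print(triangulate_linear(x1, x2, P1, P2))   # ~[0.5, 0.2, 4.0]
```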
In an ideal world, the position of 3D points can be solved as a system of
equations using the linear triangulation method. In reality, however, errors
are present in the camera poses and pixel measurements. The pixel measurements
observing the same 3D point are generally noisy. In addition, the camera models
and distortion models used often do not model the camera projection or
distortion observed perfectly. Therefore, to obtain the best results, an
iterative method should be used. This problem is generally formulated as a
non-linear least squares problem and can be solved with numerical methods
such as the Gauss-Newton algorithm.
References
[Hartley2003]: Hartley, Richard, and Andrew Zisserman. Multiple view geometry
in computer vision. Cambridge university press, 2003.
Optical Flow
Optical flow estimates the velocity of each image feature in successive
images of a scene. It makes the following explicit assumptions:
Pixel intensity does not change between consecutive frames
Displacement of features is small
Features are within the same local neighbourhood
Let us consider a pixel, $p$, in the first frame with intensity
$I(x, y, t)$, a function of the pixel location, $x$ and $y$, and
time, $t$. Applying the aforementioned assumptions, we can say that the
intensity of said pixel does not change from the first frame to the second.
Additionally, if there was a small displacement, $dx$ and $dy$, and a small
time difference, $dt$, between images, this can be written in mathematical
form as,
$$
I(x, y, t) = I(x + dx, y + dy, t + dt).
$$
This is known as the brightness constancy equation. To obtain the image
gradient and velocity of the pixel, we can use Taylor series approximation of
right-hand side of the above to get,
$$
\begin{align}
&I(x + dx, y + dy, t + dt) \\
&= I(x, y, t)
+ \dfrac{\partial{I}}{\partial{x}} dx
+ \dfrac{\partial{I}}{\partial{y}} dy
+ \dfrac{\partial{I}}{\partial{t}} dt
+ \dots
\end{align}
$$
Removing common terms and dividing by $dt$ we get

$$
I_{x} v_{x} + I_{y} v_{y} + I_{t} = 0.
$$

The image gradients along the x and y directions are $I_{x}$ and $I_{y}$,
where $I_{t}$ is the image gradient along time; finally, $v_{x}$ and $v_{y}$
are the pixel velocities in the $x$ and $y$ directions, which are unknown.
The problem with the above is that it provides a single constraint with two
degrees of freedom, and as such requires at least one additional constraint
to identify a solution.
The Lucas-Kanade method solves the aperture problem by introducing
additional conditions. This method assumes all pixels within a window
centered around a pixel $p$ have similar motion, where the window size is
configurable. For example, with a window size of $3 \times 3$ around the
pixel $p$, the $9$ points within the window should have a similar motion.
The intensities inside the window must therefore satisfy

$$
\begin{align}
  I_{x}(p_1) v_{x} + I_{y}(p_1) v_{y} &= -I_{t}(p_1) \\
  I_{x}(p_2) v_{x} + I_{y}(p_2) v_{y} &= -I_{t}(p_2) \\
  &\vdots \\
  I_{x}(p_9) v_{x} + I_{y}(p_9) v_{y} &= -I_{t}(p_9).
\end{align}
$$
The linear system of equations above is over-determined, therefore there is
no exact solution. To address this issue, a least squares method can be used
to approximate the solution. For the system
$\mathbf{A} \mathbf{x} = \mathbf{b}$, the least squares solution is obtained
by minimizing $\norm{\mathbf{A}\mathbf{x} - \mathbf{b}}^{2}_{2}$, giving
$\mathbf{x} = (\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{b}$.
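A sketch of the Lucas-Kanade least squares solve on synthetic gradients
(constructed so the true flow is known):

```python
import numpy as np

def lucas_kanade_flow(Ix, Iy, It):
    # Ix, Iy, It: image gradients inside the window, flattened to vectors
    A = np.column_stack([Ix.ravel(), Iy.ravel()])
    b = -It.ravel()
    # Ordinary least squares: v = (A^T A)^-1 A^T b
    return np.linalg.solve(A.T @ A, A.T @ b)

# Synthetic 3x3 window whose gradients are consistent with v = (0.5, -0.2)
rng = np.random.default_rng(1)
Ix, Iy = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
It = -(Ix * 0.5 + Iy * -0.2)
print(lucas_kanade_flow(Ix, Iy, It))   # [0.5, -0.2]
```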
Illumination Invariant Transform

The illumination invariant transform [Maddern2014] maps three image channels
to a single channel that is, ideally, insensitive to scene illumination:

$$
I = \log(R_{2}) - \alpha \log(R_{1}) - (1 - \alpha) \log(R_{3})
$$

where $R_{1}, R_{2}, R_{3}$ are sensor responses (or image channels)
corresponding to peak sensitivities at ordered wavelengths $\lambda_{1} <
\lambda_{2} < \lambda_{3}$, and $\alpha$ is determined by the following
equation,

$$
\dfrac{1}{\lambda_{2}} =
\dfrac{\alpha}{\lambda_{1}} + \dfrac{(1 - \alpha)}{\lambda_{3}}.
$$
This transform, however, has a non-intuitive effect on black and white
targets, as the three channels tend to be equally over- and under-exposed in
RGB images. As a result, the transform leads to similar values for white and
black pixels, eliminating the ability of computer vision algorithms to detect
edges.
References
[Maddern2014]: Maddern, Will, et al. "Illumination invariant imaging:
Applications in robust vision-based localisation, mapping and classification
for autonomous vehicles." Proceedings of the Visual Place Recognition in
Changing Environments Workshop, IEEE International Conference on Robotics and
Automation (ICRA), Hong Kong, China. Vol. 2. 2014.
Marginalization
As a reminder, marginalization is about having a joint density $p(x, y)$
over two variables $x$ and $y$, and wanting to marginalize out or
"eliminate a variable", let's say $y$ in this case:

$$
p(x) = \int_{y} p(x, y) \, dy
$$

resulting in a density $p(x)$ over the remaining variable $x$.
Now, if the density was in covariance form with mean $\boldsymbol{\mu}$
and covariance $\mathbf{\Sigma}$, partitioned as follows:

$$
\boldsymbol{\mu} =
\begin{bmatrix} \boldsymbol{\mu}_{x} \\ \boldsymbol{\mu}_{y} \end{bmatrix},
\quad
\mathbf{\Sigma} =
\begin{bmatrix}
  \mathbf{\Sigma}_{xx} & \mathbf{\Sigma}_{xy} \\
  \mathbf{\Sigma}_{yx} & \mathbf{\Sigma}_{yy}
\end{bmatrix},
$$

marginalizing out $y$ would simply be a matter of keeping
$\boldsymbol{\mu}_{x}$ and $\mathbf{\Sigma}_{xx}$.
In the nonlinear-least squares formulation, however, we can only obtain the
covariance $\mathbf{\Sigma}$ with the following property,
$$
\mathbf{\Sigma} = \mat{H}^{-1}
$$
where $\mat{H}$ is the Hessian matrix in a Gauss-Newton system
($\mat{H}\delta\vec{x} = \vec{b}$).
Using Schur's Complement for Marginalization

First let $\state_1$ be the states to be marginalized out, $\state_{2}$ be the
set of states related to those by error terms, and $\state_{\rho}$ be the set
of remaining states. Partitioning the Hessian, error state and R.H.S. of the
Gauss-Newton system gives:

$$
\begin{bmatrix}
  \mat{H}_{11} & \mat{H}_{12} \\
  \mat{H}_{21} & \mat{H}_{22}
\end{bmatrix}
\begin{bmatrix}
  \delta\state_{1} \\
  \delta\state_{2}
\end{bmatrix}
=
\begin{bmatrix}
  \vec{b}_{1} \\
  \vec{b}_{2}
\end{bmatrix}
$$

and applying the Schur complement operation yields:

$$
\begin{align}
  \mat{H}^{\ast}_{22} &=
    \mat{H}_{22} - \mat{H}_{21} \mat{H}_{11}^{-1} \mat{H}_{12} \\
  \vec{b}^{\ast}_{2} &=
    \vec{b}_{2} - \mat{H}_{21} \mat{H}_{11}^{-1} \vec{b}_{1}
\end{align}
$$

where $\vec{b}^{\ast}_{2}$ and
$\mat{H}^{\ast}_{22}$ are non-linear functions of
$\state_2$ and $\state_1$.
Derivation of Schur's Complement for Marginalization
From the Gauss-Newton system, $\mat{H} \delta\vec{x} = \vec{b}$, we can
derive the marginalization of the old states in $\delta\vec{x}$ algebraically.
Let us decompose the system as:
$$
\begin{align}
% H
\begin{bmatrix}
\mat{H}_{11} & \mat{H}_{12} \\
\mat{H}_{21} & \mat{H}_{22}
\end{bmatrix}
% x
\begin{bmatrix}
\delta\vec{x}_{1} \\
\delta\vec{x}_{2}
\end{bmatrix}
=
% b
\begin{bmatrix}
\vec{b}_{1} \\
\vec{b}_{2}
\end{bmatrix}
\end{align}
$$
Multiplying out the block matrices and vectors gives:

$$
\begin{align}
  \mat{H}_{11} \delta\vec{x}_{1} + \mat{H}_{12} \delta\vec{x}_{2} &= \vec{b}_{1} \\
  \mat{H}_{21} \delta\vec{x}_{1} + \mat{H}_{22} \delta\vec{x}_{2} &= \vec{b}_{2}.
\end{align}
$$

Solving the second equation for $\delta\vec{x}_{2}$,

$$
\delta\vec{x}_{2} =
\mat{H}_{22}^{-1} (\vec{b}_{2} - \mat{H}_{21} \delta\vec{x}_{1}),
$$

substituting $\delta\vec{x}_{2}$ back into $\mat{H}_{11} \delta\vec{x}_{1} +
\mat{H}_{12} \delta\vec{x}_{2} = \vec{b}_{1}$, and rearranging the terms so
it is w.r.t. $\delta\vec{x}_{1}$ gives:

$$
(\mat{H}_{11} - \mat{H}_{12} \mat{H}_{22}^{-1} \mat{H}_{21})
\delta\vec{x}_{1}
=
\vec{b}_{1} - \mat{H}_{12} \mat{H}_{22}^{-1} \vec{b}_{2}.
$$

If you want to marginalize out $\delta\vec{x}_{1}$ instead, follow the same
process, but solve the first equation for $\delta\vec{x}_{1}$ and substitute
it into the second.
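A numerical sketch of the marginalization, with a sanity check that the
reduced system yields the same solution for the remaining states:

```python
import numpy as np

def marginalize(H, b, m):
    # Marginalize out the first m states of the Gauss-Newton system H dx = b
    H11, H12 = H[:m, :m], H[:m, m:]
    H21, H22 = H[m:, :m], H[m:, m:]
    b1, b2 = b[:m], b[m:]
    H11_inv = np.linalg.inv(H11)          # small block; invert directly
    H_marg = H22 - H21 @ H11_inv @ H12    # Schur complement of H11
    b_marg = b2 - H21 @ H11_inv @ b1
    return H_marg, b_marg

H = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
b = np.array([1.0, 2.0, 3.0])
H_m, b_m = marginalize(H, b, 1)
print(np.linalg.solve(H, b)[1:], np.linalg.solve(H_m, b_m))  # equal
```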
First Estimate Jacobians (FEJ)
In the context of real time state-estimation, a fixed-lag smoother provides a
way to bound the optimization problem in order to operate in real time. The
method to remove old states from the state vector is called
*marginalization*. To perform marginalization, the Schur complement is used
to marginalize out old states.
Simply performing marginalization, however, introduces estimator
inconsistencies.
Let us consider the following scenario. A state vector, $\state$, during the
time interval $[0, \, k]$ will contain $m$ old states to be marginalized
out and $r$ remaining states which we wish to keep, i.e. $\state =
[\state_{m}^{T} \quad \state_{r}^{T}]^{T}$. Then the cost function,
$c(\cdot)$, can be written as a function of $\state$ at time $k$ as,
The intuition behind the above is since the state at time $k$ can be
partitioned into $m$ and $r$, the cost can also be decomposed. Utilizing this
property, the multivariate optimization can also be decomposed as follows,
The equation above shows the minimization problem can be solved by first
optimizing for the states $\state_{m}$, and then forming a prior towards
the problem of solving for $\state_{r}$. The reformulation of the
minimization problem entails no approximation.
Covariance Recovery
The Hessian matrix $\mat{H}$ is known to be related to the marginal
covariance matrix $\covar$ by $\covar = \mat{H}^{-1}$. However,
inverting $\mat{H}$ can be expensive and impossible if it is not
well-conditioned. The objective in the following is to recover the marginal
covariance matrix without explicitly inverting $\mat{H}$.
This can be done by decomposing the Hessian $\mat{H}$ into a lower and
upper triangular matrix, $\mat{R}^{T} \mat{R}$, using either
Cholesky or QR factorization. Then let us write

$$
(\mat{R}^{T} \mat{R}) \covar = \eye
\quad \Rightarrow \quad
\mat{R} \covar = (\mat{R}^{T})^{-1},
$$

and using back-substitution on the last equation to solve for $\covar$
results in the following two general equations for any diagonal and
off-diagonal values of $\covar$:
$$
\boxed{
\sigma_{ii} =
\dfrac{1}{u_{ii}}
\left(
l_{ii}
-\sum_{j=i+1}^{n} u_{i,j} \sigma_{j,i}
\right)
}
$$
$$
\boxed{
\sigma_{il} =
\dfrac{1}{u_{ii}}
\left(
-\sum_{j=i+1}^{l} u_{i,j} \sigma_{j,l}
-\sum_{j=l+1}^{n} u_{i,j} \sigma_{j,l}
\right)
}
$$
Note that the summations only apply to non-zero entries of single
columns or rows of the sparse matrix $\mat{R}$.
Derivation of Covariance Recovery using Square Root Matrix
The trick to solving for $\covar$ is the fact that the term
$(\mat{R}^{T})^{-1}$ is never actually evaluated. Instead, we take advantage
of the structure of the lower triangular matrix to solve a system of
equations via back-substitution. For one, we know for a fact that the inverse
of the diagonals in $(\mat{R}^{T})^{-1}$ is the reciprocal of itself, i.e.
$l_{ii} = 1 / u_{ii}$ (a known property of inverting diagonal matrices).
Secondly, by performing back-substitution the lower triangle values of
$(\mat{R}^{T})^{-1}$ are not required.

Let's see an example. Suppose $\mat{R}$, $\covar$, and
$(\mat{R}^{T})^{-1}$ are $4 \times 4$ matrices;
to work out $\covar$ we only need to find the values of the diagonals
and the upper triangle of $\covar$ (because a covariance matrix is
symmetric). If we write out the matrix multiplication, and rearrange w.r.t.
values of $\covar$ for each column in $\covar$, we get:
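A sketch of the recovery using two triangular solves, which avoids inverting
$\mat{H}$ directly (here the full dense $\covar$ is recovered for checking;
the boxed recursions above exploit sparsity instead):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

H = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])          # Gauss-Newton Hessian (SPD)
R = cholesky(H, lower=False)             # H = R^T R, R upper triangular

# R^T Y = I (forward substitution), then R Sigma = Y (back substitution)
Y = solve_triangular(R, np.eye(3), trans='T', lower=False)
Sigma = solve_triangular(R, Y, lower=False)
print(np.allclose(Sigma, np.linalg.inv(H)))   # True
```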
Gauge Freedom

Accurate structure from motion or vision-based state estimation is hard. One
hurdle is addressing the accuracy quantitatively. There are two main problems
that arise:

Inherent Physical Indeterminacy: caused by loss of information while
projecting 3D objects onto a 2D image plane.
Overparameterized Problem: e.g. a shape model parameterized by a vector whose
components represent the absolute position and orientation of the object,
which can themselves be indeterminate.
It is well known that a vision-only bundle adjustment has 7 unobservable
degrees-of-freedom (DoF), while for a visual-inertial system, the global
position and global yaw are not observable, a total of four unobservable DoFs.
These unobservable DoFs (a.k.a. gauge freedoms) have to be handled properly.
There are three main approaches to address the unobservability in
state estimation:
Gauge fixation
Gauge prior
Free gauge
Gauge fixation
The gauge fixation method works by decreasing the number of optimization
parameters so that there are no unobservable states left for the optimization
problem to optimize. This ensures the Hessian is well-conditioned and
invertible. This approach enforces hard constraints on the solution.
The standard method to update orientation variables such as a rotation,
$\rot$, during the iterations of a non-linear least squares solver is to
use local coordinates, where at the $k$-th iteration, the update is
Setting the $z$ component of $\boldsymbol{\phi}^{k}$ to 0 allows
fixating the yaw with respect to $\rot^{k}$. However, concatenating
several such updates over $K$-iterations,
does not fixate the yaw with respect to the initial rotation
$\rot^{0}$, and therefore, this parameterization cannot be used to fix
the yaw-value of $\rot^{K}$ to that of the initial value
$\rot^{0}$.
Although pose fixation or prior can be applied to any camera pose, it is
common practice to fixate the first camera.
where $\pos^{0}_{0}$ is the initial position of the first camera. This
is equivalent to setting the corresponding columns of the Jacobian of the
residual vector to zero, namely $\mat{J}_{\pos_0} = 0$, $\mat{J}_{\Delta
\phi_{0 z}} = 0$. Thus, for rotations of the other camera poses, the standard
iterative update above is used, and, for the first
camera rotation, $\rot_{0}$, a more convenient parameterization is used.
Instead of directly using $\rot_{0}$, a left-multiplicative increment is
used.
Free gauge

The free gauge approach is the most general; it lets the optimization
parameters evolve freely. To deal with the singularity of the Hessian, the
pseudo inverse or some preconditioning method is used in order to make the
Hessian well-conditioned and invertible.
MAV Position Tracking

Position tracking is achieved by means of a cascaded connection of a Model
Predictive Controller (MPC) for the MAV position and a PID controller for its
attitude. This approach is motivated by the fact that the majority of
commercially available flight controllers come with a pre-implemented
attitude controller which requires little or even no tuning, enabling easy
adaptation to a wide range of platforms.
Linear Model
The linear model of the MAV in navigation frame $\frame_{N}$ (or body
frame) is,
where $b_{(\cdot)}$ are constants of the first order system. The
values of these constants are obtained by performing system identification of
the MAV. Yaw is omitted above because the reference yaw command
$\psi^{r}$ will be passed directly to the inner-loop attitude control,
therefore it is not considered here in the outer-loop linear MPC
controller.
The state vector $\state$ is,
$$
\state = \begin{bmatrix}
  x & \dot{x} & \theta & \dot{\theta} &
  y & \dot{y} & \phi & \dot{\phi} &
  z & \dot{z}
\end{bmatrix}^{T} \in \real^{10}
$$
The input vector $\vec{u}$ to the linear model contains the reference roll
$\phi^{r}$, pitch $\theta^{r}$ and thrust $T^{r}$.
Since the controller is implemented in discrete time, the equations are
discretized using a zero-order hold for the input $\vec{u}$. The discrete
equivalents of the matrices $\mat{A}$, $\mat{B}$ and $\mat{C}$ can be
obtained accordingly.
The discretization step $dt$ coincides with the control update rate and was set
to 50ms.
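For example, with SciPy's zero-order-hold discretization (using a double
integrator as a stand-in for one axis of the model):

```python
import numpy as np
from scipy.signal import cont2discrete

# 1D double-integrator toy example standing in for one axis of the MAV model
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))

dt = 0.05   # 50 ms control update rate
Ad, Bd, Cd, Dd, _ = cont2discrete((A, B, C, D), dt, method='zoh')
print(Ad, Bd, sep="\n")
```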
MPC Formulated as a QP Problem
Our goal is to obtain an optimal input sequence $\bar{\vec{u}}^{\ast} =
\vec{u}_{0}^{\ast} \dots \vec{u}_{N-1}^{\ast}$ which is the solution of the following
optimization problem:
The problem with the current formulation is that the equality constraints
$\vec{x}_{k + 1}$, $\vec{y}_{k}$ and $\state_{0}$ may not be
valid in practice due to an imperfect model and/or sensor measurement noise.
If the equality constraints are invalid, the optimized solution will not be
feasible. Instead, the equality constraints can be eliminated by rewriting
$\bar{\vec{y}}$ to depend only on the initial state $\state_{0}$
instead of $\state_{k - 1}$. In other words, from this,
and substituting it into the cost function $J$, we collect the
$\bar{\vec{u}}$ terms and rearrange so that it is in the form
$\mat{A}\vec{\state} - \vec{b}$,