MTE 544: Autonomous Mobile Robots

Soo Jeon

Estimated study time: 2 hr 5 min

Sources and References

Primary references — R. Siegwart, I. Nourbakhsh & D. Scaramuzza, Introduction to Autonomous Mobile Robots, 2nd ed., MIT Press, 2011; K. Lynch & F. Park, Modern Robotics, Cambridge University Press, 2017 (preprint open access).

Online resources — S. Thrun, W. Burgard & D. Fox, Probabilistic Robotics, MIT Press (open access); Cyrill Stachniss robotics lecture notes (University of Bonn); MIT 6.141 Robotics Science and Systems; S. LaValle, Planning Algorithms, Cambridge University Press (open access).


Chapter 1: Coordinate Systems and Rigid-Body Transformations

1.1 Why Geometry Matters in Mobile Robotics

Every autonomous robot must answer a deceptively simple question: “Where am I, and where do I want to go?” Both the question and the answer are inherently geometric. A robot navigating a warehouse must track its position and orientation relative to shelves, loading bays, and other robots. A planetary rover must relate onboard sensor measurements to a global map. A self-driving car must fuse GPS, lidar, and camera data expressed in different coordinate frames before it can issue safe steering commands.

The mathematical language for all of this is the theory of rigid-body transformations. A rigid body is an idealised solid object whose internal distances never change; a transformation describes how one coordinate frame relates to another. In two dimensions this theory is relatively straightforward and forms the backbone of most practical navigation algorithms. In three dimensions the theory deepens considerably, because rotations in 3D do not commute — the order in which you apply them matters.

This chapter develops the 2D theory in detail (which is sufficient for most ground-robot problems), then sketches the 3D generalisation needed for aerial and underwater vehicles.

1.2 Coordinate Frames

A coordinate frame (or reference frame) \( \{A\} \) consists of an origin point \( O_A \) and a set of mutually orthogonal unit basis vectors. In 2D we write the basis vectors as \( \hat{x}_A \) and \( \hat{y}_A \); in 3D we add \( \hat{z}_A \). Throughout these notes, frames are right-handed: the cross product \( \hat{x} \times \hat{y} = \hat{z} \).

Two frames are fundamental in mobile robotics:

  1. The world frame (or inertial frame) \( \{W\} \) — fixed to the environment. GPS coordinates, map coordinates, and navigation waypoints are expressed in this frame.
  2. The body frame (or robot frame) \( \{B\} \) — attached to the robot. Sensor measurements such as lidar scans and camera images are expressed in this frame.

Additional frames arise for each sensor or limb: the camera frame \( \{C\} \), the IMU frame \( \{I\} \), and so on.

1.3 Planar Rigid-Body Poses

In the plane, the configuration (or pose) of a rigid body is fully described by three numbers: the \( x \)-coordinate of the origin, the \( y \)-coordinate of the origin, and the orientation angle \( \theta \) measured from the world \( x \)-axis to the body \( x \)-axis, counterclockwise positive. The pose vector is

\[ \xi = \begin{pmatrix} x \\ y \\ \theta \end{pmatrix} \in \mathbb{R}^2 \times \mathbb{S}^1 \]

where \( \mathbb{S}^1 \) denotes the circle (angles are defined modulo \( 2\pi \)).

1.3.1 The 2D Rotation Matrix

The rotation matrix \( R(\theta) \) maps body-frame coordinates to world-frame coordinates; its transpose performs the reverse mapping. If a vector \( \mathbf{p} \) is expressed in the world frame as \( \mathbf{p}^W \), then its representation in the body frame is

\[ \mathbf{p}^B = R(\theta)^\top \mathbf{p}^W \]

where

\[ R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]

The columns of \( R(\theta) \) are the body-frame basis vectors \( \hat{x}_B \) and \( \hat{y}_B \) expressed in the world frame. Key properties:

  • \( R(\theta) \in SO(2) \): the special orthogonal group of \( 2 \times 2 \) orthogonal matrices with determinant \( +1 \).
  • \( R(\theta)^\top = R(-\theta) = R(\theta)^{-1} \): the inverse is the transpose.
  • \( \det R(\theta) = 1 \): rotations preserve orientation; orthogonality guarantees that lengths and angles are preserved.
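These properties are easy to check numerically. A minimal sketch in plain Python (no external libraries), verifying orthogonality, the determinant, and that 2D rotations compose by adding angles:

```python
import math

def rot2(theta):
    """2x2 rotation matrix as a nested list: columns are x_B, y_B in the world frame."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

theta = math.radians(30)
R = rot2(theta)

# R^T R = I: the inverse is the transpose.
I = mat_mul(transpose(R), R)
assert abs(I[0][0] - 1) < 1e-12 and abs(I[0][1]) < 1e-12

# det R = +1: orientation is preserved.
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
assert abs(det - 1) < 1e-12

# In SO(2), composition commutes: angles simply add.
R2 = mat_mul(rot2(math.radians(40)), R)
assert abs(R2[0][0] - math.cos(math.radians(70))) < 1e-12
```

As Section 1.4 notes, the last property (commutativity) is special to 2D and fails in \( SO(3) \).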

1.3.2 Homogeneous Coordinates and SE(2)

A pure rotation is not enough: the robot body frame is also translated by \( (x, y) \) relative to the world frame. A clean way to represent the full rigid transformation (rotation + translation) is to use homogeneous coordinates and the \( 3 \times 3 \) transformation matrix

\[ T = \begin{pmatrix} R(\theta) & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & x \\ \sin\theta & \cos\theta & y \\ 0 & 0 & 1 \end{pmatrix} \]

where \( \mathbf{t} = (x, y)^\top \). In homogeneous coordinates a point \( \mathbf{p} = (p_x, p_y)^\top \) is written \( \tilde{\mathbf{p}} = (p_x, p_y, 1)^\top \), and transformation is simply matrix-vector multiplication:

\[ \tilde{\mathbf{p}}^W = T \, \tilde{\mathbf{p}}^B \]

The set of all such matrices \( T \) forms the Special Euclidean group \( SE(2) \), which is a Lie group — a smooth manifold that is also a group under matrix multiplication.

1.3.3 Composing Transformations

Suppose we know the transform from frame \( B \) to frame \( W \) (call it \( T_{WB} \)) and from frame \( C \) to frame \( B \) (call it \( T_{BC} \)). Then the transform from \( C \) to \( W \) is simply

\[ T_{WC} = T_{WB} \, T_{BC} \]

This chain rule is enormously useful: a robot may carry a camera that is fixed to an arm, which is fixed to the chassis. The camera-to-world transform is the product of three individual transforms.

Example 1.1 — Composing Two 2D Transforms.

Suppose the robot body frame is at \( (3, 2)^\top \) in the world, oriented at \( \theta = 30^\circ \). A landmark is at \( (1, 0)^\top \) in the body frame. Find the landmark’s world-frame coordinates.

We have

\[ T_{WB} = \begin{pmatrix} \cos 30^\circ & -\sin 30^\circ & 3 \\ \sin 30^\circ & \cos 30^\circ & 2 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0.866 & -0.5 & 3 \\ 0.5 & 0.866 & 2 \\ 0 & 0 & 1 \end{pmatrix} \]

The landmark in homogeneous coordinates is \( \tilde{\mathbf{p}}^B = (1, 0, 1)^\top \). Then

\[ \tilde{\mathbf{p}}^W = T_{WB} \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0.866 + 3 \\ 0.5 + 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3.866 \\ 2.5 \\ 1 \end{pmatrix} \]

So the landmark is at \( (3.866, 2.5)^\top \) in the world frame.
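The computation in Example 1.1 can be reproduced with a small helper; a sketch in plain Python:

```python
import math

def se2(x, y, theta):
    """Homogeneous 3x3 transform T in SE(2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]

def apply(T, p):
    """Transform a 2D point expressed in the source frame of T."""
    px, py = p
    return (T[0][0] * px + T[0][1] * py + T[0][2],
            T[1][0] * px + T[1][1] * py + T[1][2])

T_WB = se2(3.0, 2.0, math.radians(30))
p_W = apply(T_WB, (1.0, 0.0))   # landmark at (1, 0) in the body frame
# p_W matches the worked example: approximately (3.866, 2.5)
assert abs(p_W[0] - 3.866) < 1e-3 and abs(p_W[1] - 2.5) < 1e-9
```

Composing chained transforms, as in Section 1.3.3, is then just matrix multiplication of the \( 3 \times 3 \) matrices.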

1.4 3D Rotations and the SO(3) Group

For aerial robots, underwater vehicles, and arms, we need 3D rigid-body kinematics. The rotation group in 3D is \( SO(3) \), the set of all \( 3 \times 3 \) orthogonal matrices with determinant \( +1 \). Unlike \( SO(2) \), the group \( SO(3) \) is non-Abelian: matrix multiplication is not commutative. This has profound consequences — you cannot simply add angles to compose 3D rotations.

1.4.1 Euler Angles

One common parameterisation uses three successive rotations about body axes. The ZYX Euler angles (also called yaw-pitch-roll) define:

\[ R = R_z(\psi) R_y(\phi) R_x(\theta) \]

where \( \psi \) is yaw (rotation about \( z \)), \( \phi \) is pitch (rotation about \( y \)), and \( \theta \) is roll (rotation about \( x \)).

Euler angles are intuitive but suffer from a coordinate singularity called gimbal lock: when the pitch angle is \( \pm 90^\circ \), yaw and roll become indistinguishable and the parameterisation loses a degree of freedom.

1.4.2 Quaternions

Quaternions provide a singularity-free, compact representation of 3D rotations. A unit quaternion \( \mathbf{q} = (q_w, q_x, q_y, q_z) \) with \( \|\mathbf{q}\| = 1 \) encodes a rotation of angle \( \alpha \) about unit axis \( \hat{n} \) as

\[ \mathbf{q} = \left( \cos\frac{\alpha}{2},\; \hat{n}\sin\frac{\alpha}{2} \right) \]

Quaternion multiplication (which implements rotation composition) is given by the Hamilton product and requires only 16 multiplications and 12 additions — cheaper than full matrix multiplication and numerically more stable for iterated updates.
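A minimal sketch of the axis-angle construction and the Hamilton product, using the \( (q_w, q_x, q_y, q_z) \) convention from the text. The check exploits that two rotations about the same axis compose by adding angles:

```python
import math

def quat_from_axis_angle(axis, alpha):
    """Unit quaternion (w, x, y, z) for a rotation of alpha about a unit axis."""
    s = math.sin(alpha / 2)
    return (math.cos(alpha / 2), axis[0] * s, axis[1] * s, axis[2] * s)

def quat_mul(q, r):
    """Hamilton product: composes rotations (here, r applied first, then q)."""
    qw, qx, qy, qz = q
    rw, rx, ry, rz = r
    return (qw*rw - qx*rx - qy*ry - qz*rz,
            qw*rx + qx*rw + qy*rz - qz*ry,
            qw*ry - qx*rz + qy*rw + qz*rx,
            qw*rz + qx*ry - qy*rx + qz*rw)

# Rotations about a common axis add their angles: 30 deg then 40 deg = 70 deg.
q1 = quat_from_axis_angle((0, 0, 1), math.radians(30))
q2 = quat_from_axis_angle((0, 0, 1), math.radians(40))
q12 = quat_mul(q2, q1)
q70 = quat_from_axis_angle((0, 0, 1), math.radians(70))
assert all(abs(a - b) < 1e-12 for a, b in zip(q12, q70))
```

For iterated updates (e.g., gyro integration), renormalising \( \mathbf{q} \) after each product keeps it on the unit sphere.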

1.4.3 Angular Velocity

The time derivative of a rotation matrix satisfies

\[ \dot{R} = R \left[\boldsymbol{\omega}\right]_\times \]

where \( \boldsymbol{\omega} \) is the angular velocity vector expressed in the body frame and \( \left[\boldsymbol{\omega}\right]_\times \) is the skew-symmetric matrix

\[ \left[\boldsymbol{\omega}\right]_\times = \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix} \]

Gyroscopes measure \( \boldsymbol{\omega} \) directly, which is why they are crucial for attitude tracking.
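The defining property of the skew-symmetric matrix is that \( \left[\boldsymbol{\omega}\right]_\times \mathbf{v} = \boldsymbol{\omega} \times \mathbf{v} \); a quick numerical check:

```python
def skew(w):
    """Skew-symmetric matrix [w]_x such that [w]_x v = w x v."""
    wx, wy, wz = w
    return [[0.0, -wz,  wy],
            [ wz, 0.0, -wx],
            [-wy,  wx, 0.0]]

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

w = (0.1, -0.2, 0.3)
v = (1.0, 2.0, 3.0)
Sv = tuple(sum(skew(w)[i][j] * v[j] for j in range(3)) for i in range(3))
assert all(abs(a - b) < 1e-12 for a, b in zip(Sv, cross(w, v)))
```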


Chapter 2: Kinematic Models of Wheeled Mobile Robots

2.1 Wheels, Constraints, and Configuration Space

A wheeled mobile robot moves through a 2D environment whose configuration is described by \( (x, y, \theta) \). However, not every curve in this 3D space is achievable by the robot: the wheels impose velocity constraints. Understanding these constraints is the starting point for kinematic modelling.

Definition — Holonomic and Non-holonomic Constraints.

A constraint on the generalised velocities \( \dot{\mathbf{q}} \) is holonomic if it can be integrated to give a constraint on the configuration \( \mathbf{q} \) itself, i.e., it is of the form \( f(\mathbf{q}) = 0 \). A constraint is non-holonomic if it involves velocities in a way that cannot be so integrated. In mobile robotics, the no-slip rolling constraint for a wheel is non-holonomic: it constrains the instantaneous velocity direction but does not restrict the reachable configurations — a car can reach any \( (x, y, \theta) \) despite having a non-holonomic constraint, simply by manoeuvring.

The practical consequence is that a non-holonomic robot has fewer velocity degrees of freedom than configuration degrees of freedom. A car has 3 DOF in configuration space but only 2 velocity DOF (forward speed and turning rate). This is why parallel parking requires a sequence of back-and-forth motions rather than a single straight slide.

2.2 Differential Drive Robots

The most common platform in mobile robotics research is the differential drive (or two-wheeled drive). Two independently driven wheels are mounted on a common axle. A caster or ball bearing provides a third point of contact for stability.

Let:

  • \( r \) — wheel radius
  • \( l \) — half the axle length (distance from centre to each wheel)
  • \( \omega_R, \omega_L \) — right and left wheel angular velocities
  • \( v_R = r\omega_R, v_L = r\omega_L \) — right and left wheel contact-point speeds

Derivation of the Differential Drive Kinematic Model.

The robot’s forward (linear) speed is the average of the two contact-point speeds:

\[ v = \frac{v_R + v_L}{2} \]

This is because, for pure rolling on flat ground, the centre of the axle moves at the mean of the two wheel speeds.

The angular (turning) rate is determined by the speed difference. If the robot turns with angular velocity \( \dot\theta \), the right wheel travels faster and the left wheel slower. In time \( dt \), the right contact point sweeps arc \( v_R \, dt \) and the left sweeps \( v_L \, dt \). The angular change satisfies

\[ \dot\theta \cdot 2l = v_R - v_L \]

because the two contact points are separated by \( 2l \). Solving:

\[ \dot\theta = \frac{v_R - v_L}{2l} \]

In the world frame, the body-frame forward velocity \( v \) resolves into

\[ \dot{x} = v \cos\theta, \quad \dot{y} = v \sin\theta \]

Collecting all three equations gives the full kinematic model:

\[ \begin{pmatrix} \dot{x} \\ \dot{y} \\ \dot{\theta} \end{pmatrix} = \begin{pmatrix} \cos\theta & 0 \\ \sin\theta & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} v \\ \dot\theta \end{pmatrix} \]

Substituting the wheel-speed expressions:

\[ \begin{pmatrix} \dot{x} \\ \dot{y} \\ \dot{\theta} \end{pmatrix} = \frac{r}{2} \begin{pmatrix} \cos\theta & \cos\theta \\ \sin\theta & \sin\theta \\ \tfrac{1}{l} & -\tfrac{1}{l} \end{pmatrix} \begin{pmatrix} \omega_R \\ \omega_L \end{pmatrix} \]

This is the standard differential drive kinematic model.
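The model can be sketched as a small function mapping wheel speeds to world-frame velocities; the checks use the same \( r \) and \( l \) as Example 2.1:

```python
import math

def diff_drive_velocities(omega_R, omega_L, r, l, theta):
    """World-frame velocities (x_dot, y_dot, theta_dot) from wheel speeds."""
    v = r * (omega_R + omega_L) / 2                # forward speed
    theta_dot = r * (omega_R - omega_L) / (2 * l)  # turning rate
    return v * math.cos(theta), v * math.sin(theta), theta_dot

# Equal wheel speeds: pure translation along the heading.
x_dot, y_dot, th_dot = diff_drive_velocities(2.0, 2.0, r=0.1, l=0.2, theta=0.0)
assert abs(x_dot - 0.2) < 1e-12 and abs(th_dot) < 1e-12

# Opposite wheel speeds: pure rotation in place.
x_dot, y_dot, th_dot = diff_drive_velocities(1.0, -1.0, r=0.1, l=0.2, theta=0.0)
assert abs(x_dot) < 1e-12 and abs(th_dot - 0.5) < 1e-12
```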

Example 2.1 — Differential Drive Pose Update.

A robot starts at \( (0, 0, 0) \). Over a 1-second interval, the right wheel rotates at \( \omega_R = 2 \) rad/s and the left at \( \omega_L = 1 \) rad/s. The wheel radius is \( r = 0.1 \) m and \( l = 0.2 \) m. Find the new pose.

Compute:

\[ v_R = 0.2 \text{ m/s}, \quad v_L = 0.1 \text{ m/s} \]\[ v = \frac{0.2 + 0.1}{2} = 0.15 \text{ m/s}, \quad \dot\theta = \frac{0.2 - 0.1}{2 \times 0.2} = 0.25 \text{ rad/s} \]

After \( \Delta t = 1 \) s (assuming small angle, treating as straight for simplicity):

\[ \Delta\theta = 0.25 \text{ rad}, \quad \Delta x \approx 0.15 \cos(0) = 0.15 \text{ m}, \quad \Delta y \approx 0.15 \sin(0) = 0 \text{ m} \]

New pose: \( (0.15, 0, 0.25) \). For a more accurate answer the arc should be integrated (see Section 2.7 on odometry).

2.3 Bicycle Model (Ackermann Steering)

Automobiles and car-like robots are steered by turning the front wheels. The kinematic model is often derived under the so-called bicycle approximation: replace each axle’s pair of wheels with a single wheel at the axle centre, giving a two-wheel bicycle.

Let \( L \) be the wheelbase, \( \delta \) the front steering angle, \( v \) the speed at the rear axle, and \( (x, y, \theta) \) the pose of the rear axle centre. The instantaneous centre of curvature is at distance

\[ \rho = \frac{L}{\tan\delta} \]

from the rear axle. The kinematic equations are

\[ \dot{x} = v \cos\theta, \quad \dot{y} = v \sin\theta, \quad \dot\theta = \frac{v}{L} \tan\delta \]

The curvature is \( \kappa = 1/\rho = \tan\delta / L \), bounded by the maximum steering angle. This model is non-holonomic for the same reasons as the differential drive: the no-slip condition forces the velocity vector to be aligned with the wheel heading.

The bicycle model assumes no wheel slip — a reasonable approximation at low speeds on flat terrain. At high speeds, tyre flexibility, lateral forces, and weight transfer introduce significant dynamic effects. For truly high-speed autonomous driving, a dynamic model that includes tyre slip angles and lateral forces is required. The kinematic model treated here is nonetheless the foundation for most path-planning and low-speed control algorithms.
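A minimal forward-Euler step of the kinematic equations above, sketched in Python (the wheelbase of 2.5 m is an illustrative value, not from the notes):

```python
import math

def bicycle_step(x, y, theta, v, delta, L, dt):
    """One forward-Euler step of the kinematic bicycle model (rear-axle frame)."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += (v / L) * math.tan(delta) * dt
    return x, y, theta

# Straight driving (delta = 0) leaves the heading unchanged.
x, y, th = bicycle_step(0.0, 0.0, 0.0, v=1.0, delta=0.0, L=2.5, dt=0.1)
assert abs(th) < 1e-12 and abs(x - 0.1) < 1e-12

# With constant steering, the turning rate is v * tan(delta) / L.
_, _, th = bicycle_step(0.0, 0.0, 0.0, v=1.0, delta=math.radians(10), L=2.5, dt=1.0)
assert abs(th - math.tan(math.radians(10)) / 2.5) < 1e-12
```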

2.4 Tracked Vehicles

Tracked vehicles (such as military tanks, excavators, and some agricultural machines) use two continuous tracks instead of discrete wheels. The kinematic model resembles the differential drive: independent speed control of the left and right tracks. However, turning a tracked vehicle requires the tracks to skid laterally, and slip is worse on soft ground (sand, snow), so the no-slip assumption is more likely to be violated. Terrain-adaptive odometry models must account for track slip.

2.5 Omnidirectional Drives

Some robots break free of non-holonomic constraints by using special wheel designs. A robot with omnidirectional (omni) wheels can translate in any direction while simultaneously rotating.

2.5.1 Omniwheel

An omniwheel has small passive rollers arranged around its circumference at 90° to the main wheel axis. The rollers allow the wheel to slide freely in the axial direction while providing traction in the rolling direction. A three-wheel omni-robot with wheels at \( 120^\circ \) intervals satisfies

\[ \begin{pmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \end{pmatrix} = \frac{1}{r} \begin{pmatrix} -\sin\alpha_1 & \cos\alpha_1 & d \\ -\sin\alpha_2 & \cos\alpha_2 & d \\ -\sin\alpha_3 & \cos\alpha_3 & d \end{pmatrix} \begin{pmatrix} \dot{x} \\ \dot{y} \\ \dot\theta \end{pmatrix} \]

where \( \alpha_i \) are the wheel orientations and \( d \) is the distance from the robot centre to each wheel. Since this robot is holonomic, planning and control are much simpler. The trade-off is mechanical complexity and reduced load capacity.

2.5.2 Mecanum Wheel

Mecanum wheels place rollers at \( 45^\circ \) to the wheel axis. A four-wheel mecanum robot can produce any combination of \( \dot{x}, \dot{y}, \dot\theta \), making it holonomic. Mecanum platforms are popular in warehouse robotics where precise lateral movement is required.

2.6 Forward and Inverse Kinematics

For any wheeled platform, we distinguish two directions of the kinematic map:

  • Forward kinematics: given wheel speeds \( \boldsymbol{\omega} \), compute body velocities \( (\dot{x}, \dot{y}, \dot\theta) \). This is used to integrate the pose from encoder readings.
  • Inverse kinematics: given desired body velocities, compute required wheel speeds. This is used to execute planned motions.

For the differential drive, forward kinematics is the derivation in Section 2.2. Inverse kinematics inverts that relation:

\[ \omega_R = \frac{v + l\dot\theta}{r}, \quad \omega_L = \frac{v - l\dot\theta}{r} \]
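A round-trip check makes the inverse relationship concrete: applying inverse kinematics and then forward kinematics must recover the commanded body velocities. The numbers below deliberately match Example 2.1:

```python
def diff_drive_ik(v, theta_dot, r, l):
    """Wheel speeds (omega_R, omega_L) for a desired forward speed and turning rate."""
    return (v + l * theta_dot) / r, (v - l * theta_dot) / r

def diff_drive_fk(omega_R, omega_L, r, l):
    """Body velocities (v, theta_dot) from wheel speeds."""
    return r * (omega_R + omega_L) / 2, r * (omega_R - omega_L) / (2 * l)

# Round trip: IK followed by FK recovers the commanded velocities.
v_cmd, w_cmd = 0.15, 0.25
wR, wL = diff_drive_ik(v_cmd, w_cmd, r=0.1, l=0.2)
v, w = diff_drive_fk(wR, wL, r=0.1, l=0.2)
assert abs(v - v_cmd) < 1e-12 and abs(w - w_cmd) < 1e-12

# These wheel speeds are exactly those of Example 2.1: wR = 2 rad/s, wL = 1 rad/s.
assert abs(wR - 2.0) < 1e-12 and abs(wL - 1.0) < 1e-12
```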

2.7 Odometry: Dead Reckoning from Wheel Encoders

Definition — Odometry.

Odometry is the process of estimating a robot’s pose by integrating incremental motion measurements from wheel encoders (or other proprioceptive sensors). It is also called dead reckoning by analogy with maritime navigation. Starting from a known pose, each encoder tick produces a small pose increment; the cumulative sum estimates the current pose.

For the differential drive, after a small time step \( \Delta t \), the encoder readings give wheel travel distances \( \Delta s_R \) and \( \Delta s_L \). The incremental motion is

\[ \Delta s = \frac{\Delta s_R + \Delta s_L}{2}, \quad \Delta\theta = \frac{\Delta s_R - \Delta s_L}{2l} \]

For large \( \Delta\theta \), the pose update should follow the arc of a circle:

\[ x' = x + \Delta s \, \frac{\sin(\Delta\theta/2)}{\Delta\theta/2} \cos\!\left(\theta + \frac{\Delta\theta}{2}\right), \quad y' = y + \Delta s \, \frac{\sin(\Delta\theta/2)}{\Delta\theta/2} \sin\!\left(\theta + \frac{\Delta\theta}{2}\right), \quad \theta' = \theta + \Delta\theta \]

In the limit \( \Delta\theta \to 0 \) (nearly straight motion), this reduces to:

\[ x' = x + \Delta s \cos\theta, \quad y' = y + \Delta s \sin\theta, \quad \theta' = \theta + \Delta\theta \]
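A minimal odometry update sketch implementing the exact circular-arc step, with the straight-line formula as the small-angle fallback. The check reuses the wheel travels implied by Example 2.1 (\( \Delta s_R = 0.2 \) m, \( \Delta s_L = 0.1 \) m over 1 s):

```python
import math

def odometry_update(x, y, theta, ds_R, ds_L, l):
    """Pose update from wheel travel distances, following the exact circular arc."""
    ds = (ds_R + ds_L) / 2
    dth = (ds_R - ds_L) / (2 * l)
    if abs(dth) < 1e-9:                        # nearly straight: avoid 0/0
        x += ds * math.cos(theta)
        y += ds * math.sin(theta)
    else:
        sinc = math.sin(dth / 2) / (dth / 2)   # arc correction factor
        x += ds * sinc * math.cos(theta + dth / 2)
        y += ds * sinc * math.sin(theta + dth / 2)
    return x, y, theta + dth

x, y, th = odometry_update(0.0, 0.0, 0.0, 0.2, 0.1, l=0.2)
assert abs(th - 0.25) < 1e-12
# The arc model gives x slightly below the straight-line 0.15 m, and a nonzero y.
assert x < 0.15 and y > 0.0
```

Compare with Example 2.1: the straight-line approximation gave \( (0.15, 0) \), whereas the arc yields roughly \( (0.148, 0.019) \).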

Example 2.2 — Odometry Accumulation and Error.

Consider a robot making a square: four straight segments of 2 m each with \( 90^\circ \) left turns. Suppose each segment has an encoder-measurement error of \( 1\% \) in distance and \( 0.5^\circ \) in heading per segment.

After four segments, the heading errors accumulate to \( 4 \times 0.5^\circ = 2^\circ \). The final position error is bounded by the accumulation of all segment errors projected along their respective headings. A heading error \( \delta\theta \) after segment \( k \) causes subsequent segments to be misaligned, leading to a lateral displacement that grows quadratically with the number of segments in the worst case. After completing the square, the robot would return to approximately its starting point, but with a positional error of several centimetres and an orientation error of \( 2^\circ \).

This illustrates the fundamental limitation of odometry: error accumulates without bound over time and distance. External sensing (GPS, landmarks, lidar) is required to periodically correct the estimate.

Odometry error has two primary sources. Systematic errors arise from imperfect calibration: unequal wheel radii, misalignment of the axle, encoder quantisation. These can be partially corrected by careful calibration. Non-systematic errors arise from wheel slip (on slippery or uneven terrain), wheel bounce, and shock loads. These are fundamentally stochastic and require probabilistic treatment.


Chapter 3: Sensors and Perception

3.1 Sensor Taxonomy

Sensors are the robot’s interface with the physical world. They are categorised along several axes:

  • Proprioceptive vs. Exteroceptive: Proprioceptive sensors measure internal states (wheel encoder, IMU, battery voltage). Exteroceptive sensors measure the external environment (lidar, camera, sonar).
  • Active vs. Passive: Active sensors emit energy and measure the return (sonar, lidar, radar). Passive sensors measure ambient energy (camera, thermometer).
  • Contact vs. Non-contact: Tactile bumpers, force-torque sensors are contact sensors; lidar and cameras are non-contact.

Good sensor selection requires matching sensor characteristics (range, field of view, resolution, update rate, noise model) to task and environment requirements.

3.2 Inertial Measurement Units (IMU)

An IMU integrates at minimum a three-axis accelerometer and a three-axis gyroscope. Some units also include magnetometers (giving a 9-DOF IMU).

3.2.1 Accelerometers

An accelerometer measures specific force: the sum of applied acceleration and gravitational acceleration. A stationary accelerometer with its \( z \)-axis pointing up measures \( +g = 9.81 \) m/s² due to the normal force counteracting gravity. During motion, the reading is

\[ \mathbf{a}_{\text{meas}} = \mathbf{a}_{\text{body}} - \mathbf{g} \]

where \( \mathbf{g} \) is the gravity vector in the body frame. To obtain linear acceleration, we must subtract the gravitational component using a known (or estimated) orientation.

Key error sources: bias (DC offset), scale factor error, noise, and temperature drift. Accelerometer noise integrates to velocity error that grows as \( \sqrt{t} \) and to position error growing as \( t^{3/2} \).

3.2.2 Gyroscopes

Gyroscopes measure angular velocity \( \boldsymbol{\omega} \). MEMS (Micro-Electro-Mechanical Systems) gyroscopes use the Coriolis effect: a vibrating mass experiences a force proportional to angular velocity. Fibre-optic gyroscopes (FOG) and ring-laser gyroscopes (RLG) use the Sagnac effect for much higher accuracy but at greater cost and size.

The main error source is the bias drift: a slowly varying DC offset in the angular velocity reading. Integrating a constant bias \( b \) over time \( t \) produces an orientation error that grows linearly as \( bt \). For a MEMS gyroscope, typical bias instability is \( 1–10 \) °/hour, while a tactical-grade FOG achieves \( 0.01 \) °/hour.
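The linear growth of heading error under a constant bias can be seen by dead-reckoning the biased rate reading; the bias magnitude and sample rate below are illustrative assumptions, not from the notes:

```python
# A constant gyro bias b integrates to a heading error that grows as b*t.
# Assumed numbers: MEMS-grade bias of 10 deg/hour, sampled at 100 Hz for 60 s.
bias = 10.0 / 3600.0              # deg/s
dt, t_end = 0.01, 60.0
n_steps = int(round(t_end / dt))

heading_error = 0.0
for _ in range(n_steps):
    heading_error += bias * dt    # dead-reckoning the (biased) rate reading

# Drift after one minute: b*t = (10/3600)*60, about 0.167 degrees.
assert abs(heading_error - bias * t_end) < 1e-9
```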

3.2.3 The IMU Error Model

The standard stochastic model for IMU measurements is

\[ \tilde{\boldsymbol{\omega}} = \boldsymbol{\omega}_{\text{true}} + \mathbf{b}_g + \mathbf{n}_g, \quad \dot{\mathbf{b}}_g = \mathbf{n}_{bg} \]\[ \tilde{\mathbf{a}} = \mathbf{a}_{\text{true}} + \mathbf{b}_a + \mathbf{n}_a, \quad \dot{\mathbf{b}}_a = \mathbf{n}_{ba} \]

where \( \mathbf{b} \) are slowly-drifting biases modelled as random walks, and \( \mathbf{n} \) are white Gaussian noise terms. This model is the starting point for IMU preintegration and EKF-based inertial navigation.

3.3 Position Sensors

3.3.1 GPS / GNSS

Global Navigation Satellite Systems (GPS, GLONASS, Galileo) compute position by multilateration from time-of-flight (pseudorange) measurements to multiple satellites. Standard consumer GPS gives \( 3–5 \) m horizontal accuracy; differential GPS (DGPS) corrects for atmospheric delay using ground reference stations to achieve \( 0.1–1 \) m; Real-Time Kinematic (RTK) GPS resolves carrier-phase ambiguities for centimetre-level accuracy.

GPS is invaluable outdoors but fails indoors and in urban canyons. Update rates are typically \( 1–10 \) Hz, too slow for high-speed control without sensor fusion.

3.3.2 Wheel Encoders

Incremental optical encoders produce quadrature pulse trains as wheels rotate. An encoder with \( N \) pulses per revolution, decoded in quadrature (counting all four edges), achieves angular resolution of \( 2\pi/(4N) \) rad. The encoder count difference between two time steps, converted to a wheel rotation angle and multiplied by the wheel radius, gives the arc-length displacement used in odometry.

3.3.3 Lidar

Lidar (Light Detection and Ranging) measures range by time-of-flight or phase-shift of laser pulses. A 2D lidar (e.g., Hokuyo URG) scans a horizontal plane, producing a range profile at rates of \( 10–25 \) Hz with \( 0.25^\circ –1^\circ \) angular resolution and \( 1–30 \) m range. A 3D lidar (e.g., Velodyne HDL-64E) stacks multiple 2D scans, producing a full 3D point cloud at \( 10–20 \) Hz.

Lidar is the primary sensor for autonomous vehicle mapping and obstacle detection. It is unaffected by lighting conditions but is expensive and struggles with transparent surfaces (glass) and retro-reflective materials.

3.4 Vision Sensors and Camera Optics

3.4.1 The Pinhole Camera Model

The pinhole camera model describes how a 3D world point is projected onto the 2D image plane. A point \( \mathbf{P} = (X, Y, Z)^\top \) in the camera frame is projected to image coordinates \( (u, v) \) by

\[ u = f_x \frac{X}{Z} + c_x, \quad v = f_y \frac{Y}{Z} + c_y \]

where \( f_x, f_y \) are the focal lengths in pixels and \( (c_x, c_y) \) is the principal point. In matrix form:

\[ \lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}, \quad K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \]

The matrix \( K \) is the camera intrinsic matrix, estimated during camera calibration (e.g., using a checkerboard pattern with Zhang’s method).
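The projection equations can be sketched directly; the intrinsic values below are illustrative assumptions, not calibration results:

```python
def project_pinhole(P, fx, fy, cx, cy):
    """Project a camera-frame point (X, Y, Z), Z > 0, to pixel coordinates (u, v)."""
    X, Y, Z = P
    if Z <= 0:
        raise ValueError("point is behind the camera")
    return fx * X / Z + cx, fy * Y / Z + cy

# Assumed intrinsics: fx = fy = 500 px, principal point at (320, 240).
u, v = project_pinhole((1.0, 0.5, 2.0), fx=500, fy=500, cx=320, cy=240)
assert abs(u - 570.0) < 1e-9 and abs(v - 365.0) < 1e-9  # 500*0.5+320, 500*0.25+240
```

Note the characteristic \( 1/Z \) scaling: doubling the depth halves the offset from the principal point.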

3.4.2 Lens Distortion

Real lenses introduce radial distortion (barrel or pincushion) and tangential distortion. The distorted image coordinates \( (u_d, v_d) \) relate to ideal pinhole coordinates by

\[ r^2 = x^2 + y^2, \quad x_d = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2p_1 xy + p_2(r^2 + 2x^2) \]\[ y_d = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1(r^2 + 2y^2) + 2p_2 xy \]

where \( x = (u - c_x)/f_x \), and \( k_1, k_2, k_3, p_1, p_2 \) are distortion coefficients. Undistorting an image is a prerequisite for accurate 3D reconstruction.

3.4.3 Stereo Vision

A stereo camera pair (two cameras with a known baseline \( b \)) recovers depth by triangulation. For a point at disparity \( d = u_L - u_R \) (difference in left and right image \( u \)-coordinates), the depth is

\[ Z = \frac{f \cdot b}{d} \]

Disparity computation is the central challenge: it requires matching corresponding points across the two images, which fails on textureless surfaces.
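The triangulation formula itself is a one-liner; the focal length, baseline, and pixel coordinates below are assumed illustrative values:

```python
def stereo_depth(u_left, u_right, f, b):
    """Depth from disparity: Z = f*b/d, with f in pixels and baseline b in metres."""
    d = u_left - u_right
    if d <= 0:
        raise ValueError("non-positive disparity: no valid depth")
    return f * b / d

# Assumed numbers: f = 500 px, baseline 0.12 m, disparity 10 px.
Z = stereo_depth(330.0, 320.0, f=500.0, b=0.12)
assert abs(Z - 6.0) < 1e-9   # 500 * 0.12 / 10 = 6 m
```

The formula also shows why depth resolution degrades with range: a one-pixel disparity error causes a depth error that grows quadratically with \( Z \).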

3.5 Visual Odometry and Structure from Motion

Visual odometry (VO) estimates the robot’s motion from sequential camera images, analogous to wheel odometry but using pixel correspondences instead of encoder ticks.

The pipeline is:

  1. Detect and describe feature points in each image (SIFT, ORB, FAST+BRIEF).
  2. Match features between consecutive frames.
  3. Estimate the essential matrix \( E \) (monocular) or directly compute the 3D motion (stereo) from matched point pairs.
  4. Decompose the essential matrix to extract rotation \( R \) and translation \( \mathbf{t} \) (up to scale for monocular).
  5. Optionally, refine via bundle adjustment (non-linear least squares over multiple frames).

Structure from Motion (SfM) extends VO to offline batch processing over many images, simultaneously recovering the 3D scene structure and camera trajectory. SfM is the basis of photogrammetry and is used to build large-scale maps from aerial imagery.

Example 3.1 — Feature Matching and Motion Estimation.

A monocular camera captures frames \( I_k \) and \( I_{k+1} \). ORB detects 200 keypoints in each frame; descriptor matching yields 80 putative correspondences. RANSAC with the 5-point essential matrix algorithm filters to 60 inliers. The recovered essential matrix is decomposed into four candidate \( (R, \mathbf{t}) \) pairs; the correct one is selected by the positive-depth cheirality check (all triangulated points must be in front of both cameras). The rotation and (unit-norm) translation from this decomposition estimate the camera’s angular change and direction of motion between the two frames.

3.6 Image Processing Fundamentals

3.6.1 Edge Detection

Edges correspond to abrupt changes in image intensity, often arising at object boundaries. The gradient magnitude \( \|\nabla I\| \) is large at edges. Canny edge detection applies Gaussian smoothing (to reduce noise), computes the gradient, applies non-maximum suppression (to thin edges), and applies hysteresis thresholding to link edge segments.

3.6.2 Optical Flow

Optical flow describes the apparent motion of pixels between consecutive frames. The Lucas-Kanade method assumes that the intensity of a small patch is constant under motion:

\[ I(x, y, t) = I(x + u \Delta t, y + v \Delta t, t + \Delta t) \]

Taylor expansion and the brightness constancy assumption give the optical flow constraint equation:

\[ I_x u + I_y v + I_t = 0 \]

where \( I_x, I_y \) are spatial image gradients and \( I_t \) is the temporal derivative. This is one equation in two unknowns (\( u, v \)), resolved by assuming constant flow over a local patch and solving the overdetermined system in a least-squares sense.
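The patch-wise least-squares solve reduces to \( 2 \times 2 \) normal equations. A sketch on synthetic gradients constructed to be exactly consistent with a known flow (the gradient values are made-up test data):

```python
def lucas_kanade_patch(grads):
    """Solve the over-determined constraints Ix*u + Iy*v = -It over a patch,
    in the least-squares sense, via the 2x2 normal equations."""
    Sxx = Sxy = Syy = Sxt = Syt = 0.0
    for Ix, Iy, It in grads:
        Sxx += Ix * Ix
        Sxy += Ix * Iy
        Syy += Iy * Iy
        Sxt += Ix * It
        Syt += Iy * It
    det = Sxx * Syy - Sxy * Sxy
    if abs(det) < 1e-12:
        raise ValueError("degenerate patch (aperture problem)")
    u = (-Syy * Sxt + Sxy * Syt) / det
    v = ( Sxy * Sxt - Sxx * Syt) / det
    return u, v

# Synthetic patch whose gradients are consistent with flow (u, v) = (1.0, -0.5).
true_u, true_v = 1.0, -0.5
grads = [(Ix, Iy, -(Ix * true_u + Iy * true_v))
         for Ix, Iy in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.3, -0.7)]]
u, v = lucas_kanade_patch(grads)
assert abs(u - true_u) < 1e-9 and abs(v - true_v) < 1e-9
```

The degenerate-determinant branch is exactly the aperture problem: if all gradients in the patch point the same way, motion along the edge is unobservable.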


Chapter 4: Probabilistic Foundations

4.1 Why Probability?

Real robots operate in uncertain environments. Sensor measurements are noisy. Actuator commands are executed imperfectly. The world itself may change. A purely geometric approach that treats all quantities as exact will produce overconfident, brittle algorithms. Probabilistic robotics (Thrun, Burgard, Fox) frames every relevant quantity — the robot’s pose, the map, sensor readings — as a random variable with an associated probability distribution. Algorithms then propagate and update these distributions, yielding estimates that are explicitly aware of their own uncertainty.

4.2 Probability Review

4.2.1 Random Variables and PDFs

A continuous random variable \( X \) is characterised by its probability density function (PDF) \( p(x) \geq 0 \) with

\[ \int_{-\infty}^\infty p(x) \, dx = 1 \]

The mean (expected value) and variance are

\[ \mu = E\left[X\right] = \int x \, p(x) \, dx, \quad \sigma^2 = \text{Var}(X) = E\left[(X - \mu)^2\right] \]

A multivariate Gaussian with mean \( \boldsymbol{\mu} \) and covariance \( \Sigma \) has PDF

\[ p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) \]

The Gaussian is the most important distribution in estimation because: (1) it is closed under linear transformations and Bayesian updates; (2) the central limit theorem makes it a good approximation for sums of independent random variables; (3) it is fully characterised by just two parameters \( (\boldsymbol{\mu}, \Sigma) \).

4.2.2 Joint, Marginal, and Conditional Distributions

For random variables \( X \) and \( Y \), their joint PDF \( p(x, y) \) encodes all probabilistic information. The marginal of \( X \) is

\[ p(x) = \int p(x, y) \, dy \]

The conditional PDF of \( Y \) given \( X = x \) is

\[ p(y \mid x) = \frac{p(x, y)}{p(x)} \]

Two random variables are independent if \( p(x, y) = p(x) p(y) \), i.e., \( p(y \mid x) = p(y) \).

4.2.3 Bayes’ Rule

Bayes’ rule inverts conditional probabilities:

\[ p(x \mid y) = \frac{p(y \mid x) \, p(x)}{p(y)} = \frac{p(y \mid x) \, p(x)}{\int p(y \mid x') \, p(x') \, dx'} \]

In estimation, \( x \) is the unknown state (robot pose), \( y \) is the observation, \( p(x) \) is the prior (what we believed before seeing the data), \( p(y \mid x) \) is the likelihood (how probable is observation \( y \) if the state is \( x \)), and \( p(x \mid y) \) is the posterior (updated belief after the observation).
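For a discrete state space the rule is a three-line computation. A sketch with assumed toy numbers (a robot deciding whether a door is open, in the spirit of Probabilistic Robotics):

```python
def bayes_update(prior, likelihoods):
    """Discrete Bayes' rule: posterior over states after one observation.
    prior: {state: p(state)}; likelihoods: {state: p(z | state)} for the observed z."""
    unnorm = {s: likelihoods[s] * p for s, p in prior.items()}
    eta = sum(unnorm.values())                 # normaliser p(z)
    return {s: p / eta for s, p in unnorm.items()}

# Assumed toy numbers: uniform prior; sensor reports "open".
prior = {"open": 0.5, "closed": 0.5}
lik = {"open": 0.6, "closed": 0.3}            # p(sensor says "open" | state)
post = bayes_update(prior, lik)
assert abs(post["open"] - 2 / 3) < 1e-12      # 0.3 / (0.3 + 0.15)
```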

4.2.4 Total Probability and Marginalisation

The law of total probability is essential for prediction:

\[ p(x') = \int p(x' \mid x) \, p(x) \, dx \]

This marginalises out the current state \( x \) to predict the next state \( x' \), weighted by how likely each transition \( x \to x' \) is under the motion model \( p(x' \mid x) \).

4.3 Statistical Estimation

4.3.1 Maximum Likelihood Estimation

The maximum likelihood estimate (MLE) finds the state value that makes the observed data most probable:

\[ \hat{x}_{\text{ML}} = \arg\max_x p(y \mid x) \]

For Gaussian noise with zero mean and covariance \( R \), the MLE reduces to least-squares: minimise \( \|y - h(x)\|^2_{R^{-1}} \).

4.3.2 Maximum A Posteriori Estimation

MAP estimation incorporates a prior:

\[ \hat{x}_{\text{MAP}} = \arg\max_x p(x \mid y) = \arg\max_x \left[ \log p(y \mid x) + \log p(x) \right] \]

For Gaussian prior and likelihood, MAP reduces to a weighted least-squares problem.

4.3.3 MMSE Estimation

Definition — MMSE Estimator.

The Minimum Mean Square Error (MMSE) estimator minimises the expected squared error between the estimate \( \hat{x} \) and the true state \( x \):

\[ \hat{x}_{\text{MMSE}} = \arg\min_{\hat{x}} E\left[\left\|\hat{x} - x\right\|^2 \mid y\right] \]

The solution is the conditional mean of the posterior distribution:

\[ \hat{x}_{\text{MMSE}} = E\left[x \mid y\right] = \int x \, p(x \mid y) \, dx \]

For a Gaussian posterior, the mean, mode (MAP), and MMSE estimate all coincide.
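As a concrete illustration, the scalar case with a Gaussian prior and a direct noisy observation \( y = x + v \) can be worked in a few lines. The numbers below (prior \( \mathcal{N}(0, 1) \), noise variance 0.1, observation 0.6) are purely illustrative:

```python
# Scalar Gaussian estimation: observation model y = x + v, v ~ N(0, noise_var).
# Illustrative numbers: prior N(0, 1), noise variance 0.1, y = 0.6.
prior_mean, prior_var = 0.0, 1.0
noise_var = 0.1
y = 0.6

# MLE: the likelihood N(y; x, noise_var) is maximised at x = y.
x_mle = y

# Gaussian prior times Gaussian likelihood gives a Gaussian posterior:
post_var = prior_var * noise_var / (prior_var + noise_var)
post_mean = (noise_var * prior_mean + prior_var * y) / (prior_var + noise_var)

x_map = post_mean   # the mode of a Gaussian is its mean
x_mmse = post_mean  # conditional mean: MAP and MMSE coincide here
```

The posterior mean \( 0.6/1.1 \approx 0.545 \) lies between the prior mean and the observation, weighted by the inverse variances — the same precision-weighted averaging the Kalman filter performs recursively.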


Chapter 5: Bayes Filter and Kalman Filter

5.1 The State Estimation Problem

Mobile robot state estimation asks: given a history of control inputs \( u_{1:t} = \{u_1, u_2, \ldots, u_t\} \) and sensor observations \( z_{1:t} = \{z_1, z_2, \ldots, z_t\} \), what is the probability distribution over the robot’s current state \( x_t \)?

We model the robot as a discrete-time dynamical system with Markov state:

\[ p(x_t \mid x_{0:t-1}, u_{1:t}, z_{1:t-1}) = p(x_t \mid x_{t-1}, u_t) \quad \text{(motion model)} \]\[ p(z_t \mid x_{0:t}, u_{1:t}, z_{1:t-1}) = p(z_t \mid x_t) \quad \text{(measurement model)} \]

These Markov assumptions state that the current state depends only on the previous state and control input, and the current observation depends only on the current state — not the full history. This is the hidden Markov model (HMM) structure.

5.2 Bayes Filter Derivation

Definition — Bayes Filter.

The Bayes filter is a recursive algorithm that maintains the belief distribution \( \text{bel}(x_t) = p(x_t \mid z_{1:t}, u_{1:t}) \) and updates it alternately using the motion model (prediction step) and the measurement model (update step). It is the theoretical optimal solution to the recursive state estimation problem under the Markov assumption.

Theorem — Bayes Filter Recursion.

Given the prior belief \( \text{bel}(x_{t-1}) \), control input \( u_t \), and observation \( z_t \), the updated belief satisfies

\[ \overline{\text{bel}}(x_t) = \int p(x_t \mid x_{t-1}, u_t) \, \text{bel}(x_{t-1}) \, dx_{t-1} \]\[ \text{bel}(x_t) = \eta \, p(z_t \mid x_t) \, \overline{\text{bel}}(x_t) \]

where \( \eta \) is a normalisation constant and \( \overline{\text{bel}}(x_t) \) is the predicted belief before incorporating \( z_t \).

Proof of the Bayes Filter Recursion from Total Probability.

We want to derive \( \text{bel}(x_t) = p(x_t \mid z_{1:t}, u_{1:t}) \).

Apply Bayes’ rule, splitting off the latest observation \( z_t \):

\[ p(x_t \mid z_{1:t}, u_{1:t}) = \frac{p(z_t \mid x_t, z_{1:t-1}, u_{1:t}) \, p(x_t \mid z_{1:t-1}, u_{1:t})}{p(z_t \mid z_{1:t-1}, u_{1:t})} \]

By the measurement Markov assumption, \( p(z_t \mid x_t, z_{1:t-1}, u_{1:t}) = p(z_t \mid x_t) \). The denominator is a constant with respect to \( x_t \), so call it \( \eta^{-1} \). The remaining numerator factor is

\[ p(x_t \mid z_{1:t-1}, u_{1:t}) = \int p(x_t \mid x_{t-1}, z_{1:t-1}, u_{1:t}) \, p(x_{t-1} \mid z_{1:t-1}, u_{1:t}) \, dx_{t-1} \]

By the motion Markov assumption, \( p(x_t \mid x_{t-1}, z_{1:t-1}, u_{1:t}) = p(x_t \mid x_{t-1}, u_t) \). Also, \( u_t \) contains no information about \( x_{t-1} \) beyond \( u_{1:t-1} \), so \( p(x_{t-1} \mid z_{1:t-1}, u_{1:t}) = p(x_{t-1} \mid z_{1:t-1}, u_{1:t-1}) = \text{bel}(x_{t-1}) \). Therefore

\[ \overline{\text{bel}}(x_t) = \int p(x_t \mid x_{t-1}, u_t) \, \text{bel}(x_{t-1}) \, dx_{t-1} \]

Substituting back:

\[ \text{bel}(x_t) = \eta \, p(z_t \mid x_t) \, \overline{\text{bel}}(x_t) \]

This completes the derivation. The first equation is the prediction step (total probability), and the second is the update step (Bayes’ rule). \( \square \)

The Bayes filter is the optimal recursive estimator under the Markov assumption — but it is generally intractable because the integrals have no closed form. Special cases with tractable solutions are: the Kalman filter (linear-Gaussian system), the particle filter (nonparametric approximation), and the histogram filter (discretised state space).
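The histogram filter special case is simple enough to sketch directly. The following minimal illustration runs one predict-update cycle on a five-cell circular corridor; the motion success probability, door locations, and detector accuracies are invented for the example:

```python
import numpy as np

# Histogram (discrete Bayes) filter on a 5-cell circular corridor.
# Assumed models: a commanded step right succeeds with prob. 0.8 (else the
# robot stays put); a door detector fires with prob. 0.9 at a door and 0.1
# elsewhere. Doors at cells 0 and 3. All numbers are illustrative.
n = 5
doors = {0, 3}
bel = np.full(n, 1.0 / n)  # uniform prior: global localisation

def predict(bel):
    # bel_bar(x') = sum_x p(x' | x, u) bel(x)  (law of total probability)
    return 0.8 * np.roll(bel, 1) + 0.2 * bel

def update(bel_bar, z):
    like = np.where([i in doors for i in range(n)], 0.9, 0.1)
    if not z:
        like = 1.0 - like
    bel = like * bel_bar
    return bel / bel.sum()  # eta: normalise to a valid distribution

bel = update(predict(bel), z=True)  # move once, then see a door
```

After one step the belief concentrates on the two door cells (3/7 each) — a bimodal posterior that a single Gaussian could not represent.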

5.3 The Kalman Filter: Linear-Gaussian Case

The Kalman filter is the exact implementation of the Bayes filter for linear systems with Gaussian noise. It is the most important result in estimation theory and underlies GPS receivers, inertial navigation systems, and a vast range of engineering applications.

5.3.1 System Model

The linear discrete-time system model is

\[ x_t = A_t x_{t-1} + B_t u_t + w_t, \quad w_t \sim \mathcal{N}(0, Q_t) \]\[ z_t = C_t x_t + v_t, \quad v_t \sim \mathcal{N}(0, R_t) \]

where \( A_t \) is the state transition matrix, \( B_t \) is the control matrix, \( C_t \) is the observation matrix, \( Q_t \) is the process noise covariance, and \( R_t \) is the measurement noise covariance.

The belief at each step is Gaussian: \( \text{bel}(x_t) = \mathcal{N}(\mu_t, \Sigma_t) \).

5.3.2 Kalman Filter Equations

Prediction step (propagate Gaussian through linear motion model):

\[ \bar{\mu}_t = A_t \mu_{t-1} + B_t u_t \]\[ \bar{\Sigma}_t = A_t \Sigma_{t-1} A_t^\top + Q_t \]

Update step (Bayesian update with linear observation):

\[ K_t = \bar{\Sigma}_t C_t^\top \left(C_t \bar{\Sigma}_t C_t^\top + R_t\right)^{-1} \]\[ \mu_t = \bar{\mu}_t + K_t (z_t - C_t \bar{\mu}_t) \]\[ \Sigma_t = (I - K_t C_t) \bar{\Sigma}_t \]

The quantity \( \nu_t = z_t - C_t \bar{\mu}_t \) is the innovation (the discrepancy between the predicted and actual observation). The matrix \( K_t \) is the Kalman gain, which weights how much to trust the new observation versus the prediction.

Definition — Kalman Gain.

The Kalman gain \( K_t \in \mathbb{R}^{n \times m} \) is the optimal weight matrix that minimises the trace of the posterior covariance \( \Sigma_t \) given the prior covariance \( \bar{\Sigma}_t \), measurement matrix \( C_t \), and measurement noise covariance \( R_t \):

\[ K_t = \bar{\Sigma}_t C_t^\top (C_t \bar{\Sigma}_t C_t^\top + R_t)^{-1} \]

When \( R_t \to 0 \) (perfect sensor) and \( C_t \) is invertible, \( K_t \to C_t^{-1} \) and the estimate fully trusts the measurement. When \( \bar{\Sigma}_t \to 0 \) (perfect prediction), \( K_t \to 0 \) and the measurement is ignored.

Derivation of the Kalman Gain by Minimising Posterior Covariance Trace.

We seek the linear update \( \mu_t = \bar{\mu}_t + K(z_t - C\bar{\mu}_t) \) that minimises \( \text{tr}(\Sigma_t) \). The posterior covariance (for any gain \( K \)) is

\[ \Sigma_t = (I - KC)\bar{\Sigma}(I - KC)^\top + KRK^\top \]

This is the Joseph form, valid for any \( K \). Expand:

\[ \Sigma_t = \bar{\Sigma} - KC\bar{\Sigma} - \bar{\Sigma}C^\top K^\top + KC\bar{\Sigma}C^\top K^\top + KRK^\top \]

Differentiate \( \text{tr}(\Sigma_t) \) with respect to \( K \) using the trace identities \( \partial \text{tr}(AB) / \partial A = B^\top \) and, for symmetric \( B \), \( \partial \text{tr}(ABA^\top) / \partial A = 2AB \) (both \( \bar{\Sigma} \) and \( C\bar{\Sigma}C^\top + R \) are symmetric):

\[ \frac{\partial \text{tr}(\Sigma_t)}{\partial K} = -2\bar{\Sigma}C^\top + 2K(C\bar{\Sigma}C^\top + R) \]

Setting to zero and solving:

\[ K = \bar{\Sigma}C^\top (C\bar{\Sigma}C^\top + R)^{-1} \]

This is indeed the Kalman gain. Substituting back into the Joseph form recovers the simpler expression \( \Sigma_t = (I - KC)\bar{\Sigma}_t \). \( \square \)

5.3.3 Kalman Filter Optimality

Theorem — Kalman Filter Optimality (BLUE).

For a linear-Gaussian system, the Kalman filter produces the Best Linear Unbiased Estimator (BLUE). Specifically, no other estimator that is linear in the observations \( z_{1:t} \) and unbiased achieves a smaller covariance matrix (in the positive-semidefinite sense). Moreover, among all estimators (not just linear ones), the Kalman filter is the MMSE estimator when the prior and noise distributions are Gaussian.

The proof follows from the Gauss-Markov theorem and the fact that Gaussian distributions are closed under linear transformations and conditioning.

Example 5.1 — Kalman Filter: 1D Position Tracking.

A robot moves in 1D. State \( x_t \) is position. Motion model: \( x_t = x_{t-1} + u_t + w_t \), where \( u_t = 0.5 \) m (commanded step), \( Q = 0.01 \) m². Observation: \( z_t = x_t + v_t \), \( R = 0.1 \) m². Initial belief: \( \mu_0 = 0 \), \( \Sigma_0 = 1 \) m².

Step 1 (Prediction): \( \bar\mu_1 = 0 + 0.5 = 0.5 \), \( \bar\Sigma_1 = 1 + 0.01 = 1.01 \).

Step 1 (Update), suppose \( z_1 = 0.6 \) m:

\[ K_1 = \frac{1.01}{1.01 + 0.1} = \frac{1.01}{1.11} \approx 0.91 \]\[ \mu_1 = 0.5 + 0.91 \times (0.6 - 0.5) = 0.5 + 0.091 = 0.591 \text{ m} \]\[ \Sigma_1 = (1 - 0.91) \times 1.01 \approx 0.091 \text{ m}^2 \]

The high Kalman gain (0.91) reflects that the prediction covariance (1.01) is much larger than the measurement noise (0.1), so we trust the sensor strongly.
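The two steps of Example 5.1 can be checked with a few lines of code (here \( A = B = C = 1 \), so every quantity is scalar):

```python
# One predict-update cycle of the 1D Kalman filter of Example 5.1
# (A = B = C = 1, so all quantities are scalars).
def kf_step(mu, sigma, u, z, Q=0.01, R=0.1):
    mu_bar = mu + u                          # predicted mean
    sigma_bar = sigma + Q                    # predicted variance
    K = sigma_bar / (sigma_bar + R)          # Kalman gain
    mu_new = mu_bar + K * (z - mu_bar)       # correct with the innovation
    sigma_new = (1 - K) * sigma_bar          # posterior variance
    return mu_new, sigma_new, K

mu1, sigma1, K1 = kf_step(mu=0.0, sigma=1.0, u=0.5, z=0.6)
# K1 ≈ 0.91, mu1 ≈ 0.591, sigma1 ≈ 0.091, matching Example 5.1.
```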

5.4 Steady-State Kalman Filter

For a time-invariant linear system, the Riccati equation for \( \Sigma_t \) converges to a steady-state covariance \( \Sigma^\infty \) satisfying the discrete algebraic Riccati equation (DARE):

\[ \Sigma^\infty = A(\Sigma^\infty - \Sigma^\infty C^\top (C\Sigma^\infty C^\top + R)^{-1} C\Sigma^\infty)A^\top + Q \]

The steady-state Kalman gain \( K^\infty \) is then fixed, making the filter extremely cheap to compute online. This is the basis of most real-time navigation filters.
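One way to find \( \Sigma^\infty \) numerically is simply to iterate the covariance recursion until it stops changing. A scalar sketch, using illustrative values \( A = C = 1 \), \( Q = 0.01 \), \( R = 0.1 \) for which the DARE also has a closed-form root:

```python
import math

# Iterate the scalar covariance recursion to its DARE fixed point.
# Illustrative time-invariant system: A = C = 1, Q = 0.01, R = 0.1.
A, C, Q, R = 1.0, 1.0, 0.01, 0.1

sigma = 1.0  # any positive initialisation converges to the same limit
for _ in range(1000):
    sigma_bar = A * sigma * A + Q                 # Riccati predict
    K = sigma_bar * C / (C * sigma_bar * C + R)   # gain
    sigma = (1 - K * C) * sigma_bar               # Riccati update

# For scalars the DARE collapses to sigma^2 + Q*sigma - Q*R = 0:
sigma_closed = (-Q + math.sqrt(Q * Q + 4 * Q * R)) / 2
```

The fixed gain computed from \( \Sigma^\infty \) can then be hard-coded, so the online filter needs no covariance updates at all.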


Chapter 6: Extended Kalman Filter and Particle Filter

6.1 Limitations of the Linear Kalman Filter

Real robots operate under nonlinear dynamics and nonlinear observation models. The differential drive kinematic model is nonlinear in \( \theta \). A lidar sensor measures range and bearing — a nonlinear function of the 2D pose. The linear Kalman filter cannot handle these cases directly.

The Extended Kalman Filter (EKF) addresses this by linearising the nonlinear functions around the current estimate via first-order Taylor expansion.

6.2 Nonlinear System Model

The general nonlinear state-space model is

\[ x_t = f(x_{t-1}, u_t) + w_t, \quad w_t \sim \mathcal{N}(0, Q_t) \]\[ z_t = h(x_t) + v_t, \quad v_t \sim \mathcal{N}(0, R_t) \]

where \( f \) and \( h \) are differentiable but generally nonlinear functions.

Definition — EKF Linearisation.

The Extended Kalman Filter approximates the nonlinear functions \( f \) and \( h \) by their first-order Taylor expansions around the current mean estimate:

\[ f(x_{t-1}, u_t) \approx f(\mu_{t-1}, u_t) + F_t (x_{t-1} - \mu_{t-1}) \]\[ h(x_t) \approx h(\bar{\mu}_t) + H_t (x_t - \bar{\mu}_t) \]

where \( F_t = \left.\frac{\partial f}{\partial x}\right|_{\mu_{t-1}, u_t} \) and \( H_t = \left.\frac{\partial h}{\partial x}\right|_{\bar{\mu}_t} \) are the Jacobian matrices of \( f \) and \( h \) respectively.

6.3 EKF Algorithm

Prediction step:

\[ \bar{\mu}_t = f(\mu_{t-1}, u_t) \]\[ \bar{\Sigma}_t = F_t \Sigma_{t-1} F_t^\top + Q_t \]

Update step:

\[ K_t = \bar{\Sigma}_t H_t^\top (H_t \bar{\Sigma}_t H_t^\top + R_t)^{-1} \]\[ \mu_t = \bar{\mu}_t + K_t (z_t - h(\bar{\mu}_t)) \]\[ \Sigma_t = (I - K_t H_t) \bar{\Sigma}_t \]

The EKF is identical in structure to the Kalman filter, with \( A_t \) replaced by \( F_t \) and \( C_t \) replaced by \( H_t \), except that the predicted mean uses the actual nonlinear function rather than a linear approximation.

Theorem — EKF Consistency.

An EKF is consistent if the true state \( x_t \) falls within the confidence ellipsoid defined by the estimated covariance \( \Sigma_t \) with the expected frequency: the normalised innovation squared \( \nu_t^\top S_t^{-1} \nu_t \) (where \( S_t = H_t \bar{\Sigma}_t H_t^\top + R_t \) is the innovation covariance) should be chi-squared distributed with degrees of freedom equal to the observation dimension. In practice, EKF linearisation errors cause the filter to underestimate uncertainty, making it overconfident (inconsistent). More sophisticated methods such as the Unscented Kalman Filter (UKF) or the Invariant EKF (IEKF) improve consistency.

6.4 EKF for Robot Localization

For a differential drive robot with pose \( x_t = (x, y, \theta)^\top \), control input \( u_t = (v_t, \omega_t)^\top \) (linear and angular velocity), and small timestep \( \Delta t \), the motion model is

\[ f(x_{t-1}, u_t) = \begin{pmatrix} x + v_t \Delta t \cos\theta \\ y + v_t \Delta t \sin\theta \\ \theta + \omega_t \Delta t \end{pmatrix} \]

The Jacobian with respect to state is

\[ F_t = \begin{pmatrix} 1 & 0 & -v_t \Delta t \sin\theta \\ 0 & 1 & v_t \Delta t \cos\theta \\ 0 & 0 & 1 \end{pmatrix} \]

For a range-bearing measurement to a landmark at known position \( (m_x, m_y) \):

\[ h(x_t) = \begin{pmatrix} \sqrt{(m_x - x)^2 + (m_y - y)^2} \\ \text{atan2}(m_y - y, m_x - x) - \theta \end{pmatrix} \]

The Jacobian \( H_t \) is computed by differentiating \( h \) with respect to \( (x, y, \theta) \):

\[ H_t = \begin{pmatrix} -(m_x - x)/\sqrt{q} & -(m_y - y)/\sqrt{q} & 0 \\ (m_y - y)/q & -(m_x - x)/q & -1 \end{pmatrix} \]

where \( q = (m_x - x)^2 + (m_y - y)^2 \).

Example 6.1 — EKF Localization: Predict and Update Step.

Robot initial estimate: \( \mu_0 = (0, 0, 0)^\top \), \( \Sigma_0 = \text{diag}(0.1, 0.1, 0.01) \). Motion command: \( v = 1 \) m/s, \( \omega = 0 \) rad/s, \( \Delta t = 0.1 \) s. Process noise: \( Q = \text{diag}(0.01, 0.01, 0.001) \).

Prediction:

\[ \bar\mu_1 = (0 + 0.1 \times 1 \times \cos 0,\; 0 + 0.1 \times 1 \times \sin 0,\; 0) = (0.1, 0, 0)^\top \]\[ F_1 = \begin{pmatrix} 1 & 0 & -0.1 \times \sin 0 \\ 0 & 1 & 0.1 \times \cos 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0.1 \\ 0 & 0 & 1 \end{pmatrix} \]\[ \bar\Sigma_1 = F_1 \Sigma_0 F_1^\top + Q = \begin{pmatrix} 0.11 & 0 & 0 \\ 0 & 0.1101 & 0.001 \\ 0 & 0.001 & 0.011 \end{pmatrix} \]

Now a landmark at \( (2, 0)^\top \) is observed with range \( r = 1.9 \) m, bearing \( \beta = 0.05 \) rad. The measurement noise is \( R = \text{diag}(0.01, 0.001) \). The predicted measurement is \( h(\bar\mu_1) = (1.9, 0)^\top \). The innovation is \( \nu = (1.9 - 1.9,\; 0.05 - 0)^\top = (0, 0.05)^\top \). The update step would then move the bearing estimate closer to 0.05 rad, slightly rotating the estimated pose to account for the observed off-axis landmark.
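The prediction step of Example 6.1 can be reproduced directly; a minimal sketch:

```python
import numpy as np

# EKF prediction step for the differential-drive model of Example 6.1.
def ekf_predict(mu, Sigma, v, w, dt, Q):
    x, y, th = mu
    mu_bar = np.array([x + v * dt * np.cos(th),
                       y + v * dt * np.sin(th),
                       th + w * dt])
    F = np.array([[1.0, 0.0, -v * dt * np.sin(th)],   # Jacobian of f
                  [0.0, 1.0,  v * dt * np.cos(th)],
                  [0.0, 0.0,  1.0]])
    return mu_bar, F @ Sigma @ F.T + Q

mu0 = np.zeros(3)
Sigma0 = np.diag([0.1, 0.1, 0.01])
Q = np.diag([0.01, 0.01, 0.001])
mu_bar, Sigma_bar = ekf_predict(mu0, Sigma0, v=1.0, w=0.0, dt=0.1, Q=Q)
# mu_bar = (0.1, 0, 0); the y-theta block of Sigma_bar picks up the
# cross-covariance 0.001 induced by the heading uncertainty.
```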

6.5 Particle Filter

The particle filter (PF), also called Sequential Monte Carlo (SMC), is a non-parametric implementation of the Bayes filter. Instead of maintaining a Gaussian, it represents the belief as a set of \( N \) weighted samples (particles) \( \{x_t^{[i]}, w_t^{[i]}\}_{i=1}^N \), where \( \sum_i w_t^{[i]} = 1 \).

Definition — Particle Filter.

The particle filter is a sequential Monte Carlo algorithm that approximates the posterior belief \( \text{bel}(x_t) \) by a finite set of weighted samples (particles). Each particle \( x_t^{[i]} \) represents a hypothesis about the state, with weight \( w_t^{[i]} \) proportional to the likelihood of the observations under that hypothesis. As \( N \to \infty \), the particle approximation converges to the true posterior.

6.5.1 Sequential Importance Sampling (SIS)

At each step, particles from the previous belief are propagated through the motion model and re-weighted by the measurement likelihood:

  1. Sample: Draw \( x_t^{[i]} \sim p(x_t \mid x_{t-1}^{[i]}, u_t) \) for each \( i \).
  2. Weight: Compute \( w_t^{[i]} = w_{t-1}^{[i]} \cdot p(z_t \mid x_t^{[i]}) \).
  3. Normalise: \( w_t^{[i]} \leftarrow w_t^{[i]} / \sum_j w_t^{[j]} \).

Over many steps, SIS suffers from weight degeneracy: all weight concentrates on a single particle. The remedy is resampling.

6.5.2 Resampling

Resampling replaces the weighted particle set with a new unweighted set drawn with replacement proportional to the weights. After resampling, all weights are reset to \( 1/N \). This eliminates low-weight particles and duplicates high-weight particles, focusing computational resources on high-probability regions.

Common resampling algorithms: multinomial resampling (O(N log N)), systematic resampling (O(N), recommended), and residual resampling.

Adaptive resampling: resample only when the effective sample size \( N_{\text{eff}} = 1/\sum_i (w^{[i]})^2 \) falls below a threshold (typically \( N/2 \)). This avoids unnecessary variance introduced by resampling when weight degeneracy is not yet severe.

Particle filters can represent arbitrary, multi-modal distributions — a crucial advantage over the EKF, which is confined to a single Gaussian. Consider a robot that wakes up in a symmetric hallway with identical left and right walls. Its belief is genuinely bimodal: it could be at either end. A particle filter naturally maintains two clusters of particles (one at each hypothesis), while the EKF would be forced to represent this as a single Gaussian centred between the two hypotheses — a point that may not correspond to any physically plausible location.

Example 6.2 — Particle Filter Resampling.

Suppose \( N = 5 \) particles have weights \( (0.4, 0.3, 0.2, 0.05, 0.05) \) after measurement update. Using systematic resampling: draw a single uniform sample \( u \sim U(0, 1/5) \), say \( u = 0.09 \). The cumulative weights are \( (0.4, 0.7, 0.9, 0.95, 1.0) \). The resampling indices are obtained by checking which cumulative weight bracket \( u + k/N \) falls into for \( k = 0, 1, 2, 3, 4 \): values \( 0.09, 0.29, 0.49, 0.69, 0.89 \) fall in bins 1, 1, 2, 2, 3 respectively. So particles 1 (×2), 2 (×2), 3 (×1) survive, and particles 4, 5 are eliminated. All surviving particles have weight \( 1/5 \). The new set reflects that particles 1 and 2 (highest weights) represent the most probable hypotheses.
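Systematic resampling amounts to a cumulative sum plus one binary search; the sketch below reproduces Example 6.2 (indices are 0-based, so particles 1–3 of the example appear as indices 0–2):

```python
import numpy as np

# Systematic resampling: one uniform draw u in [0, 1/N), then N evenly
# spaced pointers u + k/N matched against the cumulative weights.
def systematic_resample(weights, u):
    n = len(weights)
    cumulative = np.cumsum(weights)
    pointers = u + np.arange(n) / n
    return np.searchsorted(cumulative, pointers)  # surviving particle indices

idx = systematic_resample(np.array([0.4, 0.3, 0.2, 0.05, 0.05]), u=0.09)
# idx = [0, 0, 1, 1, 2]: particles 1 and 2 duplicated, 4 and 5 eliminated.
```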

6.5.3 Monte Carlo Localization (MCL)

MCL applies the particle filter to robot localization given a known map. The robot begins with a uniform particle distribution (global localization) or a concentrated Gaussian around a known start. As the robot moves and observes landmarks (or lidar scans), the particle cloud converges to the true pose. MCL is robust to kidnapping (sudden displacement) because new particles can be injected when the overall likelihood drops.


Chapter 7: Localization and Mapping

7.1 The Localization and Mapping Problems

The localization problem assumes the map is known and asks for the robot’s pose. The mapping problem assumes the pose is known and asks for the map. In practice, both are unknown, giving rise to the Simultaneous Localization and Mapping (SLAM) problem.

7.1.1 Map Representations

  • Metric maps: explicit geometric representation. Examples: occupancy grids (2D arrays where each cell stores the probability that it is occupied), point clouds (3D lidar maps), and signed distance fields.
  • Topological maps: graph-based, with nodes representing places and edges representing traversable paths. Compact and efficient for planning but less precise.
  • Semantic maps: label regions with meaning (room, corridor, exit). Used for high-level task planning.

7.1.2 Occupancy Grid Mapping

An occupancy grid divides the environment into a regular grid of cells. Each cell \( m_i \) stores the log-odds of occupancy:

\[ l_{t,i} = \log \frac{p(m_i = 1 \mid z_{1:t})}{p(m_i = 0 \mid z_{1:t})} \]

The log-odds update from a new sensor reading is additive:

\[ l_{t,i} = l_{t-1,i} + \log \frac{p(m_i = 1 \mid z_t)}{p(m_i = 0 \mid z_t)} - l_0 \]

where \( l_0 \) is the prior log-odds. This is computationally cheap and naturally handles sensor uncertainty.
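A minimal sketch of the additive update for a single cell, assuming an illustrative inverse sensor model in which a "hit" reports occupancy with probability 0.7:

```python
import math

# Additive log-odds update for one occupancy cell. The inverse sensor
# model value 0.7 is an assumption for illustration.
def logit(p):
    return math.log(p / (1.0 - p))

l0 = logit(0.5)  # uninformative prior -> log-odds 0
l = l0
for _ in range(3):          # three consecutive "occupied" readings
    l += logit(0.7) - l0

p_occ = 1.0 - 1.0 / (1.0 + math.exp(l))  # convert log-odds back to probability
# Three agreeing readings push the cell to about 0.93 occupied.
```

Because the update is a single addition per cell per reading, a lidar scan touching thousands of cells can be integrated in microseconds.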

7.2 SLAM: Simultaneous Localization and Mapping

SLAM is one of the fundamental problems in mobile robotics. The joint posterior over the robot trajectory and the map is

\[ p(x_{0:t}, m \mid z_{1:t}, u_{1:t}) \]

For a map with \( N \) landmarks, this posterior grows in dimension with time, making it intractable for large maps without structural assumptions.

7.2.1 EKF-SLAM

EKF-SLAM represents the joint state \( \mathbf{s}_t = (x_t, m_1, m_2, \ldots, m_N)^\top \) as a single Gaussian. The state dimension grows as \( 3 + 2N \). The full covariance matrix is \( (3 + 2N) \times (3 + 2N) \), so each measurement update takes \( O(N^2) \) time, limiting EKF-SLAM to a few hundred landmarks.

The key insight of EKF-SLAM: robot-landmark cross-correlations in the covariance matrix encode the shared uncertainty between the robot pose and landmark positions. Revisiting a previously seen landmark reduces uncertainty in both the robot pose and the map.

7.2.2 Particle-Filter SLAM (FastSLAM)

FastSLAM (Montemerlo et al., 2002) exploits a factorisation of the SLAM posterior:

\[ p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) = p(x_{1:t} \mid z_{1:t}, u_{1:t}) \prod_{j=1}^N p(m_j \mid x_{1:t}, z_{1:t}) \]

The trajectory posterior is handled by a particle filter; conditioned on each particle’s trajectory, each landmark’s posterior is independent and can be tracked by a separate 2D EKF. This reduces complexity to \( O(MN) \) where \( M \) is the number of particles.

7.2.3 Graph-Based SLAM

Modern SLAM systems represent the problem as a factor graph: nodes are robot poses and landmark positions, edges are constraints from odometry or landmark observations. SLAM reduces to maximum a posteriori estimation, which becomes a nonlinear least-squares problem solved by Gauss-Newton or Levenberg-Marquardt. Efficient sparse matrix solvers (iSAM, g2o, GTSAM) exploit the sparse structure of the factor graph, achieving near-linear scaling.


Chapter 8: Graph-Based Motion Planning

8.1 The Motion Planning Problem

Motion planning seeks a path (or trajectory) from a start configuration to a goal configuration that avoids obstacles and satisfies robot constraints. The robot’s configuration space \( \mathcal{C} \) is the set of all possible configurations \( q \). Obstacles in the workspace map to obstacle regions \( \mathcal{C}_{\text{obs}} \) in configuration space. The free space is \( \mathcal{C}_{\text{free}} = \mathcal{C} \setminus \mathcal{C}_{\text{obs}} \).

For a point robot in a 2D environment, \( \mathcal{C} = \mathbb{R}^2 \) and the problem is straightforward. For a rigid body that can rotate, \( \mathcal{C} = \mathbb{R}^2 \times \mathbb{S}^1 \) (3D configuration space). For a robot arm with \( n \) joints, \( \mathcal{C} = \mathbb{R}^n \).

8.2 Graph Search: Dijkstra’s Algorithm

Dijkstra’s algorithm finds the shortest path in a weighted graph from a source node to all other nodes. It maintains a priority queue of nodes ordered by their tentative shortest distance from the source.

8.2.1 Algorithm

Given a graph \( G = (V, E) \) with non-negative edge weights \( w(u, v) \) and source \( s \):

  1. Initialise \( d(s) = 0 \), \( d(v) = \infty \) for all \( v \neq s \). Priority queue \( Q \leftarrow V \).
  2. While \( Q \) is non-empty: a. Extract \( u = \arg\min_{v \in Q} d(v) \). b. For each neighbour \( v \) of \( u \): if \( d(u) + w(u,v) < d(v) \), update \( d(v) \leftarrow d(u) + w(u,v) \) and set parent \( \pi(v) = u \).
  3. The shortest path to any node is recovered by backtracking through \( \pi \).

Time complexity: \( O((V + E) \log V) \) with a binary heap.

Example 8.1 — Dijkstra on a Grid.

Consider a \( 4 \times 4 \) grid where each cell is a node, adjacent cells (4-connected) are joined by edges, and the weight of an edge is the cost of entering its destination cell: 3 for cells in the right column, 1 for all others. Start: top-left (0,0). Goal: bottom-right (3,3).

Dijkstra explores outward in order of increasing cost-to-come. The cheapest route delays entering the expensive right column until the final move, e.g. (0,0) → (1,0) → (2,0) → (3,0) → (3,1) → (3,2) → (3,3), with total cost \( 1+1+1+1+1+3 = 8 \); a route across the top row and down the right column costs \( 1+1+3+3+3+3 = 14 \). Because Dijkstra expands nodes in order of cost-to-come, it returns the globally optimal path, unlike greedy methods.
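A compact implementation of Dijkstra for this grid, assuming the convention that an edge's weight is the cost of entering its destination cell:

```python
import heapq

# Dijkstra on a 4x4 grid: entering a right-column cell costs 3,
# entering any other cell costs 1. Cells are (row, col).
def dijkstra(start, goal, size=4):
    def cost(cell):
        return 3 if cell[1] == size - 1 else 1  # right column is expensive
    dist = {start: 0}
    pq = [(0, start)]  # priority queue ordered by tentative distance
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            return d
        if d > dist[node]:
            continue  # stale queue entry
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nb[0] < size and 0 <= nb[1] < size:
                nd = d + cost(nb)
                if nd < dist.get(nb, float("inf")):
                    dist[nb] = nd
                    heapq.heappush(pq, (nd, nb))
    return None
```

Under this convention, `dijkstra((0, 0), (3, 3))` returns 8: five unit moves plus one final cost-3 move into the right column.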

8.3 A* Search

A* augments Dijkstra’s algorithm with a heuristic \( h(n) \) that estimates the cost from node \( n \) to the goal. The key function is

\[ f(n) = g(n) + h(n) \]

where \( g(n) \) is the known cost from start to \( n \) and \( h(n) \) is the heuristic estimate of the remaining cost.

Definition — Admissible and Consistent Heuristic.

A heuristic \( h(n) \) is admissible if it never overestimates the true cost to the goal: \( h(n) \leq h^*(n) \) for all \( n \), where \( h^*(n) \) is the true optimal cost. A heuristic is consistent (or monotone) if for every node \( n \) and successor \( n' \): \( h(n) \leq c(n, n') + h(n') \), where \( c(n, n') \) is the edge cost. Consistency implies admissibility.

Theorem — A* Completeness and Optimality.

If the graph has finitely many nodes and all edge costs are positive, A* with an admissible heuristic is complete (finds a path if one exists) and optimal (finds the minimum-cost path). With a consistent heuristic, A* never re-expands a node, achieving the same optimality guarantee with lower computational overhead than inadmissible heuristics.

Proof of A* Optimality with a Consistent Heuristic.

Suppose A* terminates with path \( p^* \) and cost \( f^* = g(p^*) \). We show \( g(p^*) \leq g(p) \) for any other path \( p \) from start to goal.

At the time A* extracts the goal node \( G \), all nodes \( n \) on the optimal path \( p_{\text{opt}} \) that have not yet been expanded must still be in the open set (by completeness of the search and finiteness of the graph). Let \( n' \) be the first such unexpanded node on \( p_{\text{opt}} \). By the consistency condition applied along the path from \( n' \) to \( G \):

\[ h(n') \leq c(n', \ldots, G) = g_{\text{opt}}(G) - g(n') \]

Therefore \( f(n') = g(n') + h(n') \leq g_{\text{opt}}(G) \). Since A* selects the node with minimum \( f \) and it chose \( G \) over \( n' \):

\[ g(G) = f(G) \leq f(n') \leq g_{\text{opt}}(G) \]

But \( g(G) \geq g_{\text{opt}}(G) \) by definition of optimal. Hence \( g(G) = g_{\text{opt}}(G) \). \( \square \)

Example 8.2 — A* on a Grid.

A \( 5 \times 5 \) grid with a vertical wall of obstacles at column 2, rows 1–3. Start: (0, 2). Goal: (4, 2). Heuristic: Euclidean distance \( h(n) = \|(n_x - 4, n_y - 2)\|_2 \), which is admissible since it never exceeds true path length.

A* first expands nodes directly towards the goal. On encountering the wall, it backtracks and finds the path around it (through row 0 or row 4). It expands fewer nodes than Dijkstra because the heuristic steers the search away from unpromising directions. The optimal path has length 8: the unobstructed straight path would take 4 moves, but the wall forces a 2-cell detour up (or down) and a 2-cell return, adding 4 vertical moves.
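The same search is easy to verify in code. The sketch below uses the Manhattan heuristic, which is also admissible on a 4-connected unit-cost grid (and dominates Euclidean distance there); wall cells are taken as (2,1)–(2,3) in (column, row) coordinates:

```python
import heapq

# A* on a 5x5 grid with a wall at column 2, rows 1-3. Cells are (col, row);
# moves are 4-connected with unit cost; heuristic is Manhattan distance.
OBSTACLES = {(2, 1), (2, 2), (2, 3)}

def astar(start, goal, size=5):
    def h(n):
        return abs(n[0] - goal[0]) + abs(n[1] - goal[1])
    g = {start: 0}
    open_pq = [(h(start), 0, start)]  # entries ordered by f = g + h
    while open_pq:
        _, d, node = heapq.heappop(open_pq)
        if node == goal:
            return d
        if d > g[node]:
            continue  # stale entry
        x, y = node
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nb[0] < size and 0 <= nb[1] < size and nb not in OBSTACLES:
                if d + 1 < g.get(nb, float("inf")):
                    g[nb] = d + 1
                    heapq.heappush(open_pq, (d + 1 + h(nb), d + 1, nb))
    return None
```

`astar((0, 2), (4, 2))` returns 8: the direct distance of 4 plus the 4 extra vertical moves forced by the wall.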

8.3.1 Choice of Heuristic

  • Euclidean distance: admissible whenever each move’s cost is at least the straight-line distance it covers — in particular on unit-cost grids.
  • Manhattan distance: admissible for 4-connected grids.
  • Diagonal distance: admissible for 8-connected grids.
  • Zero heuristic: \( h(n) = 0 \) reduces A* to Dijkstra — complete and optimal but slow.

8.4 D* and Dynamic Replanning

In dynamic environments, obstacles may appear or move after the initial plan was computed. Re-running A* from scratch is wasteful. D* (Dynamic A*, Stentz 1994) and its successor D*-Lite (Koenig & Likhachev 2002) maintain data structures that allow efficient replanning when the world changes.

D*-Lite works backwards from the goal. When the robot moves and discovers new obstacles, only the portions of the search tree affected by the change need to be updated. This makes D* efficient in environments that change incrementally — such as exploration scenarios where a robot discovers new walls as it advances.

The key invariant: maintain an estimate \( rhs(v) = \min_{u \in \text{pred}(v)} (c(u,v) + g(u)) \) as a one-step lookahead. Nodes where \( g(v) \neq rhs(v) \) are called “locally inconsistent” and are placed in the priority queue for update. This is the same idea as dynamic programming in reverse.


Chapter 9: Sampling-Based Motion Planning

9.1 Why Sampling-Based Methods?

Explicit computation of the obstacle-free configuration space \( \mathcal{C}_{\text{free}} \) is tractable only for low-dimensional configuration spaces and simple obstacle shapes. For high-DOF robots (manipulators, humanoids) or complex environments, configuration space has too many dimensions and obstacles are too irregular for exact computation. Sampling-based methods bypass explicit \( \mathcal{C}_{\text{free}} \) construction by randomly sampling configurations and using a collision checker to test whether a configuration is free. This approach scales to tens or hundreds of DOF.

9.2 Probabilistic Roadmap Method (PRM)

Definition — Probabilistic Roadmap Method (PRM).

PRM is a two-phase sampling-based planning algorithm. In the learning phase, random collision-free configurations are sampled and connected by local planner paths (typically straight-line segments in \( \mathcal{C} \)) to build a roadmap graph. In the query phase, start and goal configurations are connected to the roadmap and a graph search (e.g., A*) finds a path.

9.2.1 PRM Algorithm

Learning phase:

  1. Repeat until roadmap has \( N \) nodes: a. Sample \( q_{\text{rand}} \sim U(\mathcal{C}) \). b. If \( q_{\text{rand}} \in \mathcal{C}_{\text{free}} \) (collision-free), add to roadmap \( \mathcal{V} \). c. For each \( q \in \mathcal{V} \) within distance \( r \) of \( q_{\text{rand}} \), try to connect with a local planner. If the connecting path is collision-free, add edge \( (q, q_{\text{rand}}) \) to \( \mathcal{E} \).

Query phase:

  1. Connect \( q_{\text{start}} \) and \( q_{\text{goal}} \) to the nearest roadmap nodes.
  2. Search the roadmap graph for a path from \( q_{\text{start}} \) to \( q_{\text{goal}} \).

PRM is best suited for multi-query scenarios (many queries in the same static environment) because the roadmap construction cost is amortised over multiple queries.

PRM can fail to connect narrow passages — regions of \( \mathcal{C}_{\text{free}} \) that are thin relative to the sampling density. A random sample landing in a narrow passage is unlikely. Bridge sampling and Gaussian sampling near obstacle boundaries are strategies to improve narrow-passage coverage.

9.3 Rapidly-Exploring Random Trees (RRT)

Definition — Rapidly-Exploring Random Tree (RRT).

RRT is a single-query, tree-based planning algorithm that builds a tree rooted at the start configuration by iteratively extending towards random samples in \( \mathcal{C} \). The tree rapidly explores the free configuration space and is biased towards unexplored regions by the random sampling strategy.

9.3.1 RRT Algorithm

  1. Initialise tree \( \mathcal{T} \) with root \( q_{\text{start}} \).
  2. Repeat until goal is reached or iteration limit: a. Sample \( q_{\text{rand}} \sim U(\mathcal{C}) \) (with probability \( p_{\text{goal}} \), sample \( q_{\text{goal}} \) instead). b. Find nearest node \( q_{\text{near}} = \arg\min_{q \in \mathcal{T}} \|q - q_{\text{rand}}\| \). c. Extend: \( q_{\text{new}} = q_{\text{near}} + \epsilon \, \frac{q_{\text{rand}} - q_{\text{near}}}{\|q_{\text{rand}} - q_{\text{near}}\|} \) (step of size \( \epsilon \) towards \( q_{\text{rand}} \)). d. If the segment \( (q_{\text{near}}, q_{\text{new}}) \) is collision-free, add \( q_{\text{new}} \) to \( \mathcal{T} \).
  3. If \( q_{\text{new}} \) is within \( \epsilon \) of \( q_{\text{goal}} \), extract path by backtracking.

Theorem — RRT Probabilistic Completeness.

RRT is probabilistically complete: as the number of iterations \( n \to \infty \), the probability that RRT fails to find a path (when one exists) goes to zero. This follows from the Voronoi bias of the algorithm: a uniform sample selects an existing node as nearest neighbour with probability proportional to the volume of that node’s Voronoi region, so frontier nodes bordering large unexplored regions are extended most often and the tree eventually covers the entire free space.

Example 9.1 — RRT Tree Expansion.

A 2D environment is \( [0, 10]^2 \). Start: \( (0.5, 0.5) \), Goal: \( (9.5, 9.5) \). A rectangular obstacle occupies \( [3, 7] \times [3, 7] \). Step size \( \epsilon = 0.5 \).

Iteration 1: \( q_{\text{rand}} = (6.2, 1.4) \). Nearest node is the root \( (0.5, 0.5) \). New node: \( (0.5, 0.5) + 0.5 \times (5.7, 0.9)/5.77 = (0.99, 0.58) \). Collision-free — added.

After many iterations, the tree grows outward, fills the free space around the obstacle, and eventually a node falls near the goal. The resulting path is collision-free but likely non-smooth and suboptimal. Post-processing with path shortcutting can reduce path length.
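The extend step of the first iteration can be checked directly; a minimal sketch:

```python
import math

# The RRT "extend" step: move epsilon from the nearest node towards the
# random sample (or all the way if the sample is closer than epsilon).
def extend(q_near, q_rand, eps=0.5):
    dx, dy = q_rand[0] - q_near[0], q_rand[1] - q_near[1]
    dist = math.hypot(dx, dy)
    if dist <= eps:
        return q_rand
    return (q_near[0] + eps * dx / dist, q_near[1] + eps * dy / dist)

q_new = extend((0.5, 0.5), (6.2, 1.4))  # iteration 1 of Example 9.1
# q_new ≈ (0.99, 0.58), matching the example.
```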

9.3.2 Bidirectional RRT (RRT-Connect)

RRT-Connect (Kuffner & LaValle 2000) grows two trees simultaneously — one from the start, one from the goal — and attempts to connect them. This dramatically reduces planning time in practice, especially in high-dimensional spaces.

9.4 RRT*: Asymptotically Optimal RRT

Definition — RRT* (Asymptotically Optimal RRT).

RRT* (Karaman & Frazzoli, 2011) extends RRT with two modifications: (1) instead of connecting \( q_{\text{new}} \) to \( q_{\text{near}} \), it finds the minimum-cost parent among all nodes within a ball of radius \( r_n = \gamma (\log n / n)^{1/d} \); (2) after adding \( q_{\text{new}} \), it rewires the tree: for each node \( q \) in the ball, if routing through \( q_{\text{new}} \) reduces \( q \)’s cost-to-come, update \( q \)’s parent to \( q_{\text{new}} \).

Theorem — RRT* Asymptotic Optimality.

RRT* is asymptotically optimal: as \( n \to \infty \), the cost of the path found by RRT* converges almost surely to the optimal path cost. In contrast, RRT (without rewiring) converges almost surely to a suboptimal path because it cannot improve a node’s connection after insertion. The convergence rate of RRT* is \( O((\log n / n)^{1/d}) \) in \( d \)-dimensional configuration space.

The computational overhead of RRT* relative to RRT is modest: the rewiring step has expected \( O(\log n) \) neighbours to check per iteration, so the overall complexity remains \( O(n \log n) \) versus \( O(n) \) for RRT. For most practical planning problems, the improvement in path quality justifies the extra computation. Variants such as Informed RRT* (sampling only within the ellipsoidal region that could contain the optimal path) converge even faster.


Chapter 10: Local Planning and Reactive Navigation

10.1 Global vs. Local Planning

The planning hierarchy in autonomous robots typically has two levels:

  • Global planner: uses the full map (or a prior map) to compute a reference path from start to goal. Runs offline or infrequently. Algorithms: A*, PRM, RRT*.
  • Local planner: executes the global plan while reacting to dynamic obstacles and local terrain variations. Runs at high frequency (10–50 Hz). Algorithms: Trajectory Rollout, Dynamic Window Approach, Potential Fields.

The local planner must solve the real-time problem: given the current state (pose, velocity) and a short-horizon local map (from sensor data), generate feasible, safe, goal-directed velocity commands within the robot’s kinematic and dynamic constraints.

10.2 Potential Field Methods

Definition — Artificial Potential Field.

An artificial potential field is a scalar function \( U : \mathcal{C}_{\text{free}} \to \mathbb{R} \) defined over the robot’s configuration space. It has an attractive component \( U_{\text{att}} \) that pulls the robot towards the goal and a repulsive component \( U_{\text{rep}} \) that pushes it away from obstacles. The robot follows the negative gradient (steepest descent):

\[ \mathbf{F} = -\nabla U = -\nabla U_{\text{att}} - \nabla U_{\text{rep}} \]

10.2.1 Attractive Potential

The attractive potential grows with distance from the goal \( q_g \):

\[ U_{\text{att}}(q) = \frac{1}{2} k_{\text{att}} \|q - q_g\|^2 \]

The corresponding attractive force is \( \mathbf{F}_{\text{att}} = -k_{\text{att}} (q - q_g) \), a spring force directed towards the goal. For a simple goal without obstacles, gradient descent on this potential drives the robot directly to \( q_g \).

A capped (conical) attractive potential avoids large forces far from the goal:

\[ U_{\text{att}}(q) = \begin{cases} \frac{1}{2} k_{\text{att}} d(q, q_g)^2 & d \leq d^* \\ k_{\text{att}} d^* \left(d - \frac{1}{2} d^*\right) & d > d^* \end{cases} \]

10.2.2 Repulsive Potential

The repulsive potential grows as the robot approaches an obstacle. Let \( D(q) = \min_c \|q - c\| \) be the distance from configuration \( q \) to the nearest obstacle boundary. Define:

\[ U_{\text{rep}}(q) = \begin{cases} \frac{1}{2} k_{\text{rep}} \left(\frac{1}{D(q)} - \frac{1}{D_0}\right)^2 & D(q) \leq D_0 \\ 0 & D(q) > D_0 \end{cases} \]

where \( D_0 \) is the influence distance. The repulsive force is

\[ \mathbf{F}_{\text{rep}} = \begin{cases} k_{\text{rep}} \left(\frac{1}{D(q)} - \frac{1}{D_0}\right) \frac{1}{D(q)^2} \nabla D(q) & D(q) \leq D_0 \\ 0 & \text{otherwise} \end{cases} \]

Example 10.1 — Potential Field Gradient Descent.

Robot at \( q = (1, 1) \), goal at \( q_g = (5, 5) \), circular obstacle centred at \( (3, 3) \) with radius 1. Parameters: \( k_{\text{att}} = 1 \), \( k_{\text{rep}} = 2 \), \( D_0 = 2 \).

At \( q = (1, 1) \): \( d(q, q_g) = 4\sqrt{2} \approx 5.66 \). Attractive force: \( \mathbf{F}_{\text{att}} = -k_{\text{att}}(q - q_g) = -\big((1,1) - (5,5)\big) = (4, 4) \) (towards the goal).

Distance to obstacle boundary: \( \|(1,1) - (3,3)\| - 1 = 2\sqrt{2} - 1 \approx 1.83 < D_0 = 2 \). So the repulsive potential is active. \( \nabla D(q) = \frac{(1,1) - (3,3)}{\|(1,1) - (3,3)\|} = (-1/\sqrt{2}, -1/\sqrt{2}) \) (pointing away from the obstacle centre). The repulsive force pushes the robot away from the obstacle (towards the lower-left), partially counteracting the attractive force.

The robot follows the combined gradient, tracing a curved path that avoids the obstacle while advancing towards the goal.
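The force computation in Example 10.1 can be reproduced directly from the formulas in Sections 10.2.1 and 10.2.2; all numbers below are the example’s stated parameters.

```python
import math

k_att, k_rep, D0 = 1.0, 2.0, 2.0
q, q_g = (1.0, 1.0), (5.0, 5.0)
c, r = (3.0, 3.0), 1.0   # circular obstacle centre and radius

# Attractive spring force F_att = -k_att (q - q_g).
F_att = (-k_att * (q[0] - q_g[0]), -k_att * (q[1] - q_g[1]))   # -> (4.0, 4.0)

# Distance to the obstacle boundary and its gradient (points away from the centre).
dc = math.dist(q, c)
D = dc - r                                        # ~1.83 < D0, so repulsion is active
gradD = ((q[0] - c[0]) / dc, (q[1] - c[1]) / dc)  # (-1/sqrt(2), -1/sqrt(2))

# Repulsive force from Sec. 10.2.2 (zero outside the influence distance D0).
if D <= D0:
    mag = k_rep * (1.0 / D - 1.0 / D0) / D**2
    F_rep = (mag * gradD[0], mag * gradD[1])
else:
    F_rep = (0.0, 0.0)

F = (F_att[0] + F_rep[0], F_att[1] + F_rep[1])    # combined descent direction
```

The combined force is slightly smaller than the pure attractive force, reflecting the mild repulsion at this distance from the obstacle.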

10.2.3 Local Minima Problem

Potential field methods suffer from local minima: configurations where \( \nabla U = 0 \) but which are not the goal. This occurs when the repulsive force exactly cancels the attractive force, trapping the robot. Common scenarios: U-shaped obstacles facing the goal, narrow corridors.

Remedies include: random perturbations (escape by random walk), wavefront propagation (compute potential as shortest path on a grid — no local minima, but expensive), harmonic potential functions (guaranteed no local minima by construction, but computationally intensive), and switching to a global planner when stuck. In practice, potential fields are best used for smooth obstacle avoidance in known-obstacle-free regions rather than for global navigation.

10.3 Trajectory Rollout

Trajectory Rollout (and the related method Dynamic Window Approach) sample possible velocity commands, simulate the robot’s motion forward in time, and select the command that best balances goal-directedness and obstacle avoidance.

In Trajectory Rollout, for a differential drive robot, candidate velocities \( (v, \omega) \) are sampled from the admissible velocity space. For each candidate, the robot’s trajectory is simulated for a horizon \( T \) seconds. Each trajectory is scored by a cost function:

\[ J(v, \omega) = \alpha \cdot \text{heading}(\tau) + \beta \cdot \text{clearance}(\tau) + \gamma \cdot \text{velocity}(v) \]

where heading penalises misalignment with the goal, clearance penalises proximity to the nearest obstacle along the trajectory, and velocity penalises low speed (equivalently, the negated terms can be treated as rewards and the objective maximised). The command \( (v^*, \omega^*) \) with the lowest cost is executed, and the process repeats.
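The rollout loop can be sketched as follows. Here the three terms are written as rewards and summed (equivalent to minimising the negated cost above); the robot is assumed to start at the origin, and the goal, obstacle, weights, and velocity samples are invented for illustration.

```python
import math

def simulate_arc(v, w, T=1.0, dt=0.1):
    """Forward-simulate a differential drive from the origin for T seconds."""
    x = y = th = 0.0
    traj = []
    for _ in range(int(T / dt)):
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
        traj.append((x, y))
    return traj

def rollout(goal, obstacles, v_samples, w_samples, alpha=1.0, beta=1.0, gamma=0.1):
    """Score each (v, w) candidate and return the best command."""
    best, best_score = None, -math.inf
    for v in v_samples:
        for w in w_samples:
            traj = simulate_arc(v, w)
            heading = -math.dist(traj[-1], goal)   # reward: end up close to the goal
            clearance = min(math.dist(p, o) for p in traj for o in obstacles)
            score = alpha * heading + beta * clearance + gamma * v
            if score > best_score:
                best, best_score = (v, w), score
    return best

# Hypothetical scene: goal straight ahead, obstacle slightly right of the path.
best = rollout(goal=(2.0, 0.0), obstacles=[(1.0, -0.2)],
               v_samples=[0.2, 0.4, 0.6], w_samples=[-0.5, 0.0, 0.5])
# The winning command keeps speed up and turns left (w > 0), away from the obstacle.
```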

10.4 Dynamic Window Approach (DWA)

Definition — Dynamic Window Approach (DWA).

DWA (Fox, Burgard & Thrun, 1997) is a local planning method that restricts the search space to velocity commands reachable within one time step, given the robot’s current velocity and acceleration limits. The dynamic window \( DW \) is the set of velocities \( (v, \omega) \) satisfying kinematic feasibility (within acceleration bounds from current velocity) and safety (all obstacle-free within the admissible stopping distance). Within this window, DWA searches for the command that maximises an objective function balancing goal heading, clearance, and velocity.

10.4.1 DWA Algorithm

  1. Compute the dynamic window \( DW \) from current \( (v_c, \omega_c) \) and acceleration bounds \( (\dot{v}_{\max}, \dot{\omega}_{\max}) \):
\[ DW = \left\{ (v, \omega) : |v - v_c| \leq \dot{v}_{\max} \Delta t,\; |\omega - \omega_c| \leq \dot{\omega}_{\max} \Delta t \right\} \]
  2. Further restrict to safe velocities: \( (v, \omega) \) is safe if the robot can stop before hitting any obstacle assuming maximum deceleration.

  3. For each \( (v, \omega) \in DW \cap \mathcal{V}_{\text{safe}} \), simulate the circular arc trajectory and evaluate:

\[ G(v, \omega) = \sigma\left(\alpha \cdot \text{heading}(v, \omega) + \beta \cdot \text{dist}(v, \omega) + \gamma \cdot v\right) \]

where \( \sigma \) is a normalising function and all three terms are normalised to \( [0, 1] \).

  4. Execute \( (v^*, \omega^*) = \arg\max G \).

Example 10.2 — DWA Velocity Selection.

Current robot velocity: \( v_c = 0.4 \) m/s, \( \omega_c = 0 \) rad/s. Acceleration limits: \( \dot{v}_{\max} = 0.5 \) m/s², \( \dot\omega_{\max} = 1 \) rad/s². Time step: \( \Delta t = 0.1 \) s.

Dynamic window: \( v \in [0.35, 0.45] \) m/s, \( \omega \in [-0.1, 0.1] \) rad/s.

Sample 25 candidate \( (v, \omega) \) pairs on a \( 5 \times 5 \) grid within \( DW \). For each, simulate 1 s of arc trajectory. Suppose obstacle is to the right; the candidate with \( \omega = 0.1 \) rad/s (turning left) achieves the highest clearance score while maintaining reasonable heading towards goal. DWA selects \( (v^*, \omega^*) = (0.42, 0.1) \), slightly increasing speed while turning left to avoid the obstacle.
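The window computation in step 1 can be checked numerically against Example 10.2; the \( 5 \times 5 \) sampling grid mirrors the example.

```python
def dynamic_window(v_c, w_c, a_v, a_w, dt):
    """Velocity bounds reachable within one time step (Sec. 10.4.1, step 1)."""
    return (v_c - a_v * dt, v_c + a_v * dt), (w_c - a_w * dt, w_c + a_w * dt)

def sample_window(v_bounds, w_bounds, n=5):
    """An n x n grid of candidate (v, w) commands inside the window."""
    (v_lo, v_hi), (w_lo, w_hi) = v_bounds, w_bounds
    vs = [v_lo + i * (v_hi - v_lo) / (n - 1) for i in range(n)]
    ws = [w_lo + i * (w_hi - w_lo) / (n - 1) for i in range(n)]
    return [(v, w) for v in vs for w in ws]

# Example 10.2: v_c = 0.4 m/s, w_c = 0, a_v = 0.5 m/s^2, a_w = 1 rad/s^2, dt = 0.1 s.
vb, wb = dynamic_window(0.4, 0.0, 0.5, 1.0, 0.1)
cands = sample_window(vb, wb)   # 25 candidates in [0.35, 0.45] x [-0.1, 0.1]
```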

10.4.2 DWA vs. Potential Fields

DWA is inherently safe: it only considers velocities from which the robot can stop in time. Potential fields, in contrast, output a force that must be converted to velocity via a separate control law, and can produce unsafe commands near obstacles with sharp repulsive gradients. DWA also handles the robot’s kinodynamic constraints (acceleration limits) explicitly, making it suitable for fast robots. Potential fields are computationally cheaper but require careful tuning to avoid local minima and instabilities.

10.5 Integrating Local and Global Planning

A complete navigation system integrates all four pillars:

  1. Sensing: lidar and camera perceive the local environment; IMU and wheel encoders measure motion.
  2. Estimation: EKF or particle filter maintains a pose estimate relative to the map.
  3. Global planning: A* or RRT* finds a waypoint sequence from current position to goal on the known map.
  4. Local planning: DWA or Trajectory Rollout executes the path in real time, reacting to dynamic obstacles while tracking the global waypoints.

The ROS (Robot Operating System) Navigation Stack implements exactly this architecture: move_base integrates a global planner (e.g., Dijkstra on an occupancy grid) with a local planner (DWA) and interfaces with the AMCL particle-filter localiser.

Example 10.3 — End-to-End Navigation.

A differential drive robot must navigate from a loading bay to a storage shelf in a warehouse.

Map: occupancy grid at 5 cm resolution, built offline with lidar SLAM. Localisation: AMCL particle filter with 500 particles, laser scan matcher. Global plan: A* on inflated occupancy grid (obstacles inflated by robot radius). Local plan: DWA at 10 Hz with \( \alpha = 0.8, \beta = 0.2, \gamma = 0.4 \). A forklift crosses the corridor unexpectedly; DWA detects it via lidar and turns to avoid, then resumes tracking the global plan waypoint. The robot reaches the shelf within 3 cm of the goal position.


Chapter 11: Advanced Topics and Integration

11.1 Unscented Kalman Filter

The Unscented Kalman Filter (UKF) addresses the EKF’s Jacobian computation burden and linearisation error by using the Unscented Transform (UT). Instead of linearising \( f \) and \( h \), the UT propagates a carefully chosen set of deterministic sigma points through the nonlinear functions:

Choose \( 2n+1 \) sigma points:

\[ \mathcal{X}^{(0)} = \mu, \qquad \mathcal{X}^{(i)} = \mu + \big(\sqrt{(n + \lambda)\Sigma}\big)_i, \qquad \mathcal{X}^{(i+n)} = \mu - \big(\sqrt{(n + \lambda)\Sigma}\big)_i, \quad i = 1, \ldots, n \]

where \( \big(\sqrt{(n+\lambda)\Sigma}\big)_i \) denotes the \( i \)-th column of the matrix square root and \( \lambda = \alpha^2(n + \kappa) - n \) is a scaling parameter. Each sigma point is propagated: \( \mathcal{Y}^{(i)} = f(\mathcal{X}^{(i)}) \). The mean and covariance of the output are recovered as weighted sums:

\[ \bar{y} = \sum_{i=0}^{2n} W_m^{(i)} \mathcal{Y}^{(i)}, \quad P_{yy} = \sum_{i=0}^{2n} W_c^{(i)} (\mathcal{Y}^{(i)} - \bar{y})(\mathcal{Y}^{(i)} - \bar{y})^\top + Q \]

The UKF captures the mean and covariance of the transformed distribution to at least second order (versus first order for EKF), without requiring Jacobian computation. It is typically more accurate than the EKF for highly nonlinear systems and is preferred in aerospace and robotics applications.
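The second-order accuracy claim can be illustrated with a scalar (\( n = 1 \)) unscented transform pushed through \( f(x) = x^2 \): for a Gaussian \( N(\mu, \sigma^2) \), the true mean of \( f \) is \( \mu^2 + \sigma^2 \), which the UT recovers exactly, whereas an EKF-style first-order linearisation gives only \( f(\mu) = \mu^2 \). The \( \alpha \) and \( \kappa \) values below are conventional choices, not from the text.

```python
import math

def unscented_mean_1d(mu, var, f, alpha=0.1, kappa=0.0):
    """Unscented transform of a scalar Gaussian N(mu, var) through f (Sec. 11.1)."""
    n = 1
    lam = alpha**2 * (n + kappa) - n
    s = math.sqrt((n + lam) * var)
    points = [mu, mu + s, mu - s]            # 2n+1 = 3 sigma points
    w0 = lam / (n + lam)                     # mean weight of the central point
    wi = 1.0 / (2.0 * (n + lam))             # weights of the outer points
    ys = [f(p) for p in points]
    return w0 * ys[0] + wi * ys[1] + wi * ys[2]

mu, var = 2.0, 0.25
ut = unscented_mean_1d(mu, var, lambda x: x * x)
# E[x^2] = mu^2 + var = 4.25; the UT matches exactly, linearisation would give 4.0.
```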

11.2 SLAM Frontiers and Modern Methods

11.2.1 Visual-Inertial Odometry

Modern autonomous systems fuse camera imagery with IMU measurements for accurate, drift-reduced state estimation. Visual-Inertial Odometry (VIO) methods (e.g., MSCKF, OKVIS, VINS-Mono) jointly optimise visual feature tracks and IMU preintegration factors in a sliding-window factor graph. The IMU provides high-rate attitude and short-term position estimates; camera features provide absolute-scale information (in stereo) or up-to-scale constraints (monocular) and long-term drift correction.

11.2.2 LiDAR SLAM

Modern lidar SLAM systems (LOAM, LeGO-LOAM, LIO-SAM) extract geometric features (edge points and planar points) from each lidar scan and register them with a local map using point-to-line and point-to-plane ICP variants. These methods achieve centimetre-level accuracy at real-time rates and are the dominant approach for autonomous vehicle localisation.

11.2.3 Deep Learning in Perception

Convolutional neural networks (CNNs) have transformed robot perception. Semantic segmentation assigns a class label to every pixel, enabling the robot to identify driveable surfaces, pedestrians, and obstacles. Object detection (YOLO, Faster R-CNN) localises and classifies obstacles in real time. Depth estimation from monocular images (using self-supervised learning on video sequences) opens the door to metric mapping without stereo or lidar.

11.3 Putting It All Together: Autonomous Navigation Architecture

A production autonomous mobile robot architecture follows this data flow:

  1. Raw sensors: IMU at 200 Hz, lidar at 10 Hz, cameras at 30 Hz, wheel encoders at 500 Hz.
  2. Low-level fusion: IMU + encoder fusion via EKF at 200 Hz for low-latency velocity and attitude.
  3. Mid-level localisation: lidar scan-matching against the map, updating a particle filter or EKF at 10 Hz.
  4. Mapping: occupancy grid updates as new lidar scans arrive; SLAM graph optimisation when loop closures are detected.
  5. Global planning: A* or RRT* recomputed when goal changes or major map updates occur; waypoint sequence output at \( \sim 1 \) Hz.
  6. Local planning: DWA or model predictive control (MPC) at 10–50 Hz, consuming current velocity, local obstacle map, and next global waypoint.
  7. Control: PID or feedback linearisation translates velocity commands to wheel motor currents at 500–1000 Hz.

Each layer runs asynchronously; careful design of message-passing interfaces (e.g., ROS topics) is required to handle timing mismatches and ensure safety.

Example 11.1 — Sensor Fusion Architecture for an AMR.

An autonomous mobile robot (AMR) in a factory floor carries a 2D lidar, a stereo camera, an IMU, and wheel encoders. The EKF state is \( (x, y, \theta, v, \omega)^\top \), where \( v \) and \( \omega \) are linear and angular velocity.

Prediction (200 Hz, from IMU): propagate pose and velocity using IMU angular rate and wheel speed as control inputs. The Jacobian \( F_t \) is the 5×5 kinematic Jacobian.

Update 1 (500 Hz, from encoders): observation model \( h(x) = (v, \omega)^\top \), \( H = \left[\begin{smallmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{smallmatrix}\right] \). Measurement noise \( R_{\text{enc}} = \text{diag}(0.001, 0.001) \).

Update 2 (10 Hz, from lidar scan-matching): observation model gives a refined \( (x, y, \theta) \) estimate from ICP. Measurement noise \( R_{\text{lidar}} = \text{diag}(0.01, 0.01, 0.001) \).

The multi-rate EKF maintains a smooth, accurate pose estimate at high rate (IMU/encoder) while incorporating accurate but slow lidar corrections.

11.4 Safety and Uncertainty in Autonomous Navigation

Robust autonomous navigation requires explicit reasoning about uncertainty at every level:

  • Perception uncertainty: false positives and false negatives in obstacle detection. Probabilistic occupancy grids accumulate evidence over multiple observations, reducing both types of error.
  • Localisation uncertainty: the covariance ellipsoid from the EKF/particle filter should be inflated before passing to the planner. If the robot is uncertain about its own position, it should maintain larger safety margins.
  • Planning under uncertainty: chance-constrained planning formulates the path-finding problem as ensuring that the probability of collision remains below a threshold \( \delta \), given the state distribution.
  • Fail-safe behaviours: the local planner must always maintain a safe stopping trajectory. DWA’s safety criterion (velocity admissibility) ensures this. E-stop hardware interrupts prevent collisions even if software fails.

The four pillars of MTE 544 — modelling, sensing, estimation, and planning — are not merely sequential stages but deeply interdependent. The kinematic model determines what controls the estimation filter uses as inputs. The sensor model determines the measurement update in the Bayes filter. The map representation used in planning must be compatible with the output of the estimation/mapping module. Errors in any one pillar cascade through the system. Designing a robust autonomous robot requires understanding all four pillars and the interfaces between them.


Appendix A: Mathematical Reference

A.1 Matrix Calculus Identities

For matrices and vectors, the following identities are used throughout:

\[ \frac{\partial}{\partial \mathbf{x}} (\mathbf{a}^\top \mathbf{x}) = \mathbf{a}, \quad \frac{\partial}{\partial \mathbf{x}} (\mathbf{x}^\top A \mathbf{x}) = 2A\mathbf{x} \; (A \text{ symmetric}) \]\[ \frac{\partial}{\partial A} \text{tr}(AB) = B^\top, \quad \frac{\partial}{\partial A} \text{tr}(ABA^\top) = 2AB \; (B \text{ symmetric}) \]
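The gradient identity \( \partial(\mathbf{x}^\top A \mathbf{x})/\partial \mathbf{x} = 2A\mathbf{x} \) can be sanity-checked by central finite differences; the matrix and evaluation point below are arbitrary.

```python
def quad(x, A):
    """x^T A x for a 2-vector x and a 2x2 matrix A."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, as the identity requires
x = [0.5, -1.0]
h = 1e-6

# Central finite differences approximate the gradient of x^T A x.
grad_fd = []
for k in range(2):
    xp, xm = x[:], x[:]
    xp[k] += h
    xm[k] -= h
    grad_fd.append((quad(xp, A) - quad(xm, A)) / (2 * h))

# Analytic gradient 2 A x (valid because A is symmetric).
grad = [2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
```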

A.2 Woodbury Matrix Identity

The Woodbury identity allows efficient inversion of rank-\( k \) updates:

\[ (A + UCV)^{-1} = A^{-1} - A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1} \]

Applied to the Kalman gain innovation covariance inversion: the innovation covariance \( S = H\bar\Sigma H^\top + R \) is \( m \times m \) (where \( m \) is the observation dimension), much smaller than the full state covariance when \( m \ll n \). Inverting \( S \) rather than \( \bar\Sigma \) gives the computationally preferred form of the Kalman update.
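The identity can be verified numerically for a rank-1 update (scalar \( C \)), using a hand-rolled \( 2\times 2 \) inverse to stay self-contained; the matrices below are arbitrary illustrative values.

```python
def inv2(M):
    """Inverse of a 2x2 matrix."""
    a, b = M[0]
    c_, d = M[1]
    det = a * d - b * c_
    return [[d / det, -b / det], [-c_ / det, a / det]]

# Arbitrary A (2x2, invertible), U = u (2x1), C = c (1x1 scalar), V = v (1x2).
A = [[4.0, 1.0], [1.0, 3.0]]
u, c, v = [1.0, 2.0], 0.5, [2.0, -1.0]

Ainv = inv2(A)
Ainv_u = [sum(Ainv[i][j] * u[j] for j in range(2)) for i in range(2)]   # A^{-1} U
v_Ainv = [sum(v[i] * Ainv[i][j] for i in range(2)) for j in range(2)]   # V A^{-1}
s = 1.0 / c + sum(v[i] * Ainv_u[i] for i in range(2))   # scalar C^{-1} + V A^{-1} U

# Woodbury: (A + UCV)^{-1} = A^{-1} - A^{-1} U s^{-1} V A^{-1}
woodbury = [[Ainv[i][j] - Ainv_u[i] * v_Ainv[j] / s
             for j in range(2)] for i in range(2)]

# Direct inversion of the updated matrix for comparison.
direct = inv2([[A[i][j] + u[i] * c * v[j] for j in range(2)] for i in range(2)])
```

In the Kalman update, the analogous step inverts only the small \( m \times m \) innovation covariance \( S \) rather than the \( n \times n \) state covariance.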

A.3 Jacobian Reference: Differential Drive

For the differential drive model \( f(\mathbf{x}, u) = (x + v\Delta t \cos\theta,\; y + v\Delta t \sin\theta,\; \theta + \omega\Delta t)^\top \):

\[ F = \frac{\partial f}{\partial \mathbf{x}} = \begin{pmatrix} 1 & 0 & -v\Delta t \sin\theta \\ 0 & 1 & v\Delta t \cos\theta \\ 0 & 0 & 1 \end{pmatrix} \]\[ L = \frac{\partial f}{\partial u} = \begin{pmatrix} \Delta t \cos\theta & 0 \\ \Delta t \sin\theta & 0 \\ 0 & \Delta t \end{pmatrix} \]

The noise propagation into state space uses \( L \): if the control noise covariance is \( M = \text{diag}(\sigma_v^2, \sigma_\omega^2) \), then the process noise in state space is \( Q = LML^\top \).
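The model, its Jacobians, and the noise mapping \( Q = LML^\top \) translate directly to code; the state, control, and noise values below are illustrative.

```python
import math

def f(x, u, dt):
    """Differential drive motion model (Appendix A.3)."""
    px, py, th = x
    v, w = u
    return [px + v * dt * math.cos(th), py + v * dt * math.sin(th), th + w * dt]

def jacobians(x, u, dt):
    """State Jacobian F and control Jacobian L at (x, u)."""
    _, _, th = x
    v, _ = u
    F = [[1.0, 0.0, -v * dt * math.sin(th)],
         [0.0, 1.0,  v * dt * math.cos(th)],
         [0.0, 0.0,  1.0]]
    L = [[dt * math.cos(th), 0.0],
         [dt * math.sin(th), 0.0],
         [0.0, dt]]
    return F, L

# Map control noise M = diag(sv^2, sw^2) into state space: Q = L M L^T.
x, u, dt = [0.0, 0.0, math.pi / 4], [1.0, 0.2], 0.1
F, L = jacobians(x, u, dt)
sv2, sw2 = 0.01, 0.005
M = [[sv2, 0.0], [0.0, sw2]]
Q = [[sum(L[i][k] * M[k][k] * L[j][k] for k in range(2))
      for j in range(3)] for i in range(3)]
```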

A.4 Summary of Filter Comparison

| Property            | KF                | EKF                | UKF                  | Particle Filter        |
|---------------------|-------------------|--------------------|----------------------|------------------------|
| System type         | Linear            | Nonlinear (mild)   | Nonlinear (moderate) | Any                    |
| Distribution        | Gaussian          | Gaussian (approx)  | Gaussian (approx)    | Arbitrary              |
| Jacobian required   | No                | Yes                | No                   | No                     |
| Complexity per step | \( O(n^3) \)      | \( O(n^3) \)       | \( O(n^3) \)         | \( O(Nn) \)            |
| Multi-modal         | No                | No                 | No                   | Yes                    |
| Optimality          | BLUE              | Approximate        | Better approx        | Asymptotically optimal |
| Typical use         | Linear navigation | Robot localisation | Spacecraft attitude  | Global localisation    |

Appendix B: Worked Problem Sets

B.1 Comprehensive EKF Localization Problem

Example B.1 — Full EKF Localization Cycle.

Setup: differential drive robot in 2D plane. Known landmark at \( m = (4, 3)^\top \). Robot initial state \( \mu_0 = (0, 0, 0)^\top \), \( \Sigma_0 = \text{diag}(0.5, 0.5, 0.1) \). Command: \( v = 1 \) m/s, \( \omega = 0.1 \) rad/s, \( \Delta t = 0.5 \) s. Process noise: \( Q = \text{diag}(0.1, 0.1, 0.05) \). Measurement noise: \( R = \text{diag}(0.05, 0.01) \) (range std 0.22 m, bearing std 5.7°).

Prediction:

\[ \bar\mu_1 = \begin{pmatrix} 0 + 1 \times 0.5 \times \cos(0) \\ 0 + 1 \times 0.5 \times \sin(0) \\ 0 + 0.1 \times 0.5 \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0 \\ 0.05 \end{pmatrix} \]

Jacobian at \( \mu_0 \):

\[ F_1 = \begin{pmatrix} 1 & 0 & -0.5\sin(0) \\ 0 & 1 & 0.5\cos(0) \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0.5 \\ 0 & 0 & 1 \end{pmatrix} \]\[ \bar\Sigma_1 = F_1 \Sigma_0 F_1^\top + Q = \begin{pmatrix} 0.6 & 0 & 0 \\ 0 & 0.625 & 0.05 \\ 0 & 0.05 & 0.15 \end{pmatrix} \]

(The heading uncertainty couples into the \( y \) entries through \( F_1 \): the \( (2,2) \) entry is \( 0.5 + 0.5^2 \times 0.1 + 0.1 = 0.625 \) and the \( (2,3) \) entry is \( 0.5 \times 0.1 = 0.05 \).)
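As a quick numeric check, the covariance prediction \( \bar\Sigma_1 = F_1 \Sigma_0 F_1^\top + Q \) can be computed exactly:

```python
def matmul(X, Y):
    """Naive matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

F1 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.5],
      [0.0, 0.0, 1.0]]
Sigma0 = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.1]]
Q = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.05]]

F1T = [[F1[j][i] for j in range(3)] for i in range(3)]
P = matmul(matmul(F1, Sigma0), F1T)
Sigma1 = [[P[i][j] + Q[i][j] for j in range(3)] for i in range(3)]
# Exact result: [[0.6, 0, 0], [0, 0.625, 0.05], [0, 0.05, 0.15]]
```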

Expected measurement from \( \bar\mu_1 = (0.5, 0, 0.05) \):

\[ \delta_x = 4 - 0.5 = 3.5, \quad \delta_y = 3 - 0 = 3 \]\[ q = 3.5^2 + 3^2 = 21.25, \quad \sqrt{q} = 4.61 \]\[ h(\bar\mu_1) = \begin{pmatrix} 4.61 \\ \text{atan2}(3, 3.5) - 0.05 \end{pmatrix} = \begin{pmatrix} 4.61 \\ 0.659 \end{pmatrix} \text{ (approx)} \]

Suppose the actual measurement is \( z_1 = (4.55, 0.72)^\top \). Innovation: \( \nu = (4.55 - 4.61,\; 0.72 - 0.659)^\top = (-0.06,\; 0.061)^\top \).

The Jacobian \( H_1 \), Kalman gain \( K_1 \), and update follow the formulas in Section 6.3. The update shifts the estimated position slightly towards the measured landmark direction, reducing uncertainty.

B.2 Graph Search Comparison

Example B.2 — Dijkstra vs. A* on the Same Graph.

A \( 3 \times 3 \) grid with uniform edge weight 1 (4-connected). Start: (0,0). Goal: (2,2). No obstacles.

Dijkstra expands nodes in order of \( g \): first (0,0) with \( g=0 \), then (1,0) and (0,1) with \( g=1 \), then all nodes at \( g=2 \), then (2,2) at \( g=4 \). Total nodes expanded: all 9 nodes (in worst case).

A* with \( h = \) Chebyshev distance to the goal: \( h(0,0) = \max(2,2) = 2 \), \( h(1,0) = \max(1,2) = 2 \), \( h(0,1) = 2 \), \( h(1,1) = \max(1,1) = 1 \), \( h(2,2) = 0 \). Corresponding \( f = g + h \) values: (0,0): \( 0+2=2 \); (1,0): \( 1+2=3 \); (0,1): \( 3 \); (1,1): \( 2+1=3 \). Guided by the heuristic (with tie-breaking that favours larger \( g \)), A* expands (0,0), then the \( f=3 \) nodes (1,0), (0,1), (1,1), and reaches (2,2) with \( f = 4+0 = 4 \) after expanding roughly 5–6 nodes instead of all 9, demonstrating the efficiency gain from the heuristic.
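Both searches can be run on this grid with the same routine; note that the exact expansion count depends on how ties in \( f \) are broken (here: prefer larger \( g \)), which is an implementation choice.

```python
import heapq

def grid_search(start, goal, h):
    """Best-first search on a 3x3 4-connected grid with unit edge costs.
    h(n) = 0 gives Dijkstra; an admissible h gives A*.
    Ties on f are broken in favour of larger g (deeper nodes)."""
    open_heap = [(h(start), 0, start)]
    g = {start: 0}
    expanded = 0
    while open_heap:
        f, neg_g, node = heapq.heappop(open_heap)
        gc = -neg_g
        if gc > g.get(node, float("inf")):
            continue                      # stale heap entry
        expanded += 1
        if node == goal:
            return gc, expanded
        x, y = node
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nb[0] < 3 and 0 <= nb[1] < 3 and gc + 1 < g.get(nb, float("inf")):
                g[nb] = gc + 1
                heapq.heappush(open_heap, (gc + 1 + h(nb), -(gc + 1), nb))
    return None

cheby = lambda n: max(abs(2 - n[0]), abs(2 - n[1]))
cost_d, exp_d = grid_search((0, 0), (2, 2), lambda n: 0)   # Dijkstra
cost_a, exp_a = grid_search((0, 0), (2, 2), cheby)         # A*
# Both find the cost-4 path; A* expands fewer nodes (6 vs 9 with this tie-breaking).
```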

B.3 RRT* Path Improvement

Example B.3 — RRT vs RRT* Path Cost Comparison.

In a 2D environment \( [0,10]^2 \) with no obstacles, the optimal path from \( (0,0) \) to \( (10,10) \) has length \( 10\sqrt{2} \approx 14.14 \). After 500 iterations:

RRT finds a path of length approximately 18–22 (30–55% above optimal) due to the zigzag nature of tree edges and lack of rewiring.

RRT* with \( \gamma = 5 \) finds a path of length approximately 14.5–15.5 (2–10% above optimal) as rewiring progressively straightens the path. After 5000 iterations, RRT* typically achieves within 1% of optimal while RRT’s path remains suboptimal.

This illustrates the asymptotic optimality theorem in practice: the improvement is clear but requires sufficient iterations, especially in high-dimensional spaces.


Glossary

Admissible heuristic: A heuristic that never overestimates the true cost to the goal; required for A* optimality.

Autonomous mobile robot (AMR): A robot capable of navigating its environment without human guidance, using sensors, estimation, and planning.

Bayes filter: The optimal recursive estimator for state estimation under the Markov assumption, maintaining a belief distribution over the state.

Configuration space (\( \mathcal{C} \)): The space of all possible robot configurations (joint angles, pose); planning occurs in this space.

Dead reckoning: Estimating current position by integrating known motion increments from a known starting position, without external correction.

Differential drive: A mobile robot platform with two independently driven wheels on a common axle, capable of turning in place.

Dynamic Window Approach (DWA): A local planning algorithm that restricts velocity commands to kinodynamically feasible, safe options and scores them by a multi-objective function.

Extended Kalman Filter (EKF): A nonlinear extension of the Kalman filter that linearises the system model via Jacobians at each time step.

Factor graph: A bipartite graphical model representing variables and factors (constraints); used in modern SLAM for efficient sparse optimisation.

Holonomic: A system whose constraints can be expressed as constraints on configuration (position) rather than velocity; a holonomic robot can move instantaneously in any direction.

Innovation: The difference between the actual measurement and the predicted measurement in a Kalman filter; \( \nu = z - h(\bar\mu) \).

Kalman gain: The optimal weighting matrix in the Kalman filter update that minimises posterior covariance; balances trust between prediction and measurement.

Kinematic model: A model relating robot joint velocities (or wheel speeds) to body-frame velocities, without considering forces or inertia.

Landmark: A known, distinguishable feature in the environment used for localisation; may be natural (corner) or artificial (QR code).

Lidar: Light Detection and Ranging sensor; measures distances by timing laser pulses, producing 2D or 3D point clouds.

MMSE estimator: The Minimum Mean Square Error estimator; minimises expected squared error; equals the posterior mean for any distribution.

Non-holonomic: A constraint on velocities that cannot be integrated to a configuration constraint; a car is non-holonomic (cannot slide sideways instantly).

Occupancy grid: A grid-based map where each cell stores the probability of being occupied by an obstacle.

Odometry: Estimation of robot pose by integrating incremental motion from wheel encoder readings; accumulates error over time.

Particle filter: A non-parametric implementation of the Bayes filter using weighted random samples (particles) to represent the belief distribution.

Potential field: A scalar function over configuration space whose gradient drives the robot towards the goal while repelling it from obstacles.

PRM: Probabilistic Roadmap Method; a multi-query sampling-based planner that builds a roadmap in free configuration space offline.

Process noise: Stochastic uncertainty in the motion model, representing unmodelled dynamics, wheel slip, and actuation errors.

RRT: Rapidly-exploring Random Tree; a single-query sampling-based planner that grows a tree towards random samples in configuration space.

RRT*: An asymptotically optimal extension of RRT that rewires the tree to minimise path costs, converging to the optimal path as iterations increase.

SE(2): The Special Euclidean group in 2D; the group of all rigid-body transformations (rotation + translation) in the plane.

SLAM: Simultaneous Localisation and Mapping; the problem of concurrently building a map of an unknown environment and estimating the robot’s pose within it.

SO(3): The Special Orthogonal group in 3D; the group of all \( 3\times 3 \) orthogonal matrices with determinant +1, representing 3D rotations.

Visual odometry: Estimation of robot motion from sequential camera images by tracking visual features between frames.

Back to top