MTE 546: Multi-Sensor Data Fusion
Estimated study time: 39 minutes
Introduction and the Fusion Framework
Multi-sensor data fusion is the discipline concerned with combining information from several sensing sources so that the resulting estimate, decision, or description is better than any one sensor could provide alone. “Better” is made precise in several ways: lower mean-squared error, tighter covariance, higher classification accuracy, wider dynamic range, greater robustness to faults, or finer spatial/temporal coverage. A fusion system is not merely a collection of sensors; it is a pipeline in which raw measurements are cleaned, modelled, aligned in space and time, and combined through a principled estimator or evidential rule.
The Joint Directors of Laboratories (JDL) model, originally formulated for defence applications and later generalised, partitions this pipeline into levels. Level 0 handles signal-level pre-processing (filtering, calibration compensation, unit conversion). Level 1 estimates the state of individual objects: position, velocity, identity. Level 2 performs situation assessment, inferring relations among objects. Level 3 is threat or impact assessment, projecting the present state into the future. Level 4 is process refinement, a feedback loop that retunes sensors, allocates resources, or selects fusion algorithms on the fly. Most of MTE 546 lives in Levels 0 and 1, with excursions into Level 2 (data association) and Level 4 (adaptive filtering).
A complementary taxonomy classifies fusion by the level of abstraction at which sources meet. Data-level or raw fusion combines commensurate measurements before any feature extraction; it demands precise registration and typically yields the highest information content but is fragile to sensor heterogeneity. Feature-level fusion first extracts descriptors (edges, corners, spectral peaks) from each source and concatenates or weights them. Decision-level fusion lets each sensor produce a classification or estimate and combines these at the output, using voting, Bayesian combination, or Dempster–Shafer rules. Choosing the right level is an engineering trade-off between bandwidth, calibration effort, and robustness.
Sensor Characterization and Uncertainty
Every fusion algorithm is only as good as the sensor model it assumes. A sensor returns a value \(z\) related to a physical quantity \(x\) through some transformation corrupted by imperfections. The canonical additive model is
\[ z = h(x) + b + v \]where \(h(\cdot)\) is the observation function, \(b\) is a bias (slowly varying or constant), and \(v\) is a zero-mean random error with covariance \(R\). Real devices add several further pathologies that must be characterised separately.
Bias is a systematic offset. It survives averaging and therefore cannot be removed by taking more samples; calibration against a trusted reference is the only remedy. Noise is the random component, usually assumed Gaussian and white, though coloured noise (correlated across time) is common in MEMS devices. Drift is slow time variation of bias, often modelled as a random walk driven by a small process noise. Scale-factor error multiplies the true input; nonlinearity and hysteresis cause the response curve to depart from a straight line. Quantisation introduces a uniformly distributed error of variance \(q^{2}/12\) for step size \(q\). Cross-axis coupling appears in multi-axis sensors when one axis responds to inputs on another.
Datasheet quantities must be interpreted carefully. Accuracy is closeness to truth (limited by bias plus drift); precision is repeatability (limited by noise). Resolution is the smallest change the sensor can report. Bandwidth limits how fast \(x\) can change before the sensor lags. Allan variance plots, popular for gyroscopes, reveal the different noise regimes—quantisation, angle random walk, bias instability, rate random walk—across integration times.
| Sensor type | Dominant error source | Typical model |
|---|---|---|
| MEMS gyro | Bias drift, angle random walk | Rate + bias RW + white noise |
| Consumer GPS | Multipath, ionospheric delay | Position + bias + correlated noise |
| LiDAR | Range noise, beam divergence | Range + Gaussian noise + outliers |
| Monocular camera | Lens distortion, pixel noise | Projective + radial distortion + Gaussian |
| Wheel encoder | Slip, wheel radius error | Odometry + scale error + non-Gaussian slip events |
Probability and Bayes Refresher
Fusion rests on probability theory. For a continuous random vector \(x \in \mathbb{R}^{n}\) with density \(p(x)\), the mean and covariance are
\[ \mu = E\left[x\right] = \int x\, p(x)\,dx, \quad \Sigma = E\left[\left(x-\mu\right)\left(x-\mu\right)^{\top}\right]. \]The Gaussian density
\[ p(x) = \frac{1}{\sqrt{\left(2\pi\right)^{n}\det\Sigma}} \exp\!\left(-\tfrac{1}{2}\left(x-\mu\right)^{\top}\Sigma^{-1}\left(x-\mu\right)\right) \]is central because it is closed under linear transforms, marginalisation, and conditioning, and because the sum of many independent errors tends towards it by the central limit theorem.
Bayes’ rule is the engine of sequential fusion:
\[ p\!\left(x \mid z\right) = \frac{p\!\left(z \mid x\right)\,p(x)}{p(z)}. \]The prior \(p(x)\) encodes what is known before the measurement; the likelihood \(p(z \mid x)\) is dictated by the sensor model; the posterior \(p(x \mid z)\) is the fused belief. The denominator \(p(z)\) is a normalising constant that does not depend on \(x\). In the Gaussian case, both prior and likelihood are quadratic in the exponent and the posterior remains Gaussian—the algebra of this fact is precisely the Kalman filter.
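The Gaussian case can be worked through in a few lines for a scalar state: the precisions of prior and likelihood add, and the posterior mean is their precision-weighted average. A minimal sketch (the function name `gaussian_update` and the numbers are illustrative, not course notation):

```python
# Scalar Bayesian fusion of a Gaussian prior with a Gaussian measurement.
# Posterior precision = prior precision + measurement precision;
# posterior mean = precision-weighted average of prior mean and measurement.

def gaussian_update(mu0, var0, z, var_r):
    """Fuse prior N(mu0, var0) with measurement z ~ N(x, var_r)."""
    var_post = 1.0 / (1.0 / var0 + 1.0 / var_r)
    mu_post = var_post * (mu0 / var0 + z / var_r)
    return mu_post, var_post

# A vague prior (variance 4) meets a sharp measurement (variance 1):
mu, var = gaussian_update(mu0=0.0, var0=4.0, z=1.0, var_r=1.0)
# The posterior sits much closer to the more precise measurement.
```

This is exactly the algebra that reappears inside the Kalman filter update.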
A second pillar is sufficient statistics. For Gaussian observations the mean and covariance are sufficient: one may throw away the raw data once these two quantities are updated, which is what makes recursive estimators tractable.
Random Variables, Gaussian Processes, and Noise Models
A Gaussian process is a distribution over functions such that any finite set of evaluations is jointly Gaussian. In sensor modelling, a Gaussian process can describe correlated noise in space (e.g., a magnetic disturbance field) or time (e.g., low-frequency vibration on an accelerometer). The covariance kernel \(k(t, t')\) determines the smoothness and correlation length.
White noise has \(k(t,t') = \sigma^{2}\delta(t-t')\): uncorrelated at distinct times. Brown noise (Wiener process) has increments that are Gaussian with variance proportional to elapsed time; integrating white noise yields brown noise, which is why integrating a gyro bias produces an unbounded angular error. Exponentially correlated noise has \(k(t,t') = \sigma^{2}e^{-\lvert t-t'\rvert / \tau}\) for correlation time \(\tau\), a first-order Gauss–Markov process often used to model bias drift. Explicitly augmenting the state vector with a Gauss–Markov bias state is one of the most useful tricks for improving an IMU filter.
Sensor and Observation Models
Let the true state be \(x_{k}\) at discrete time \(k\). The process model describes how the state evolves:
\[ x_{k} = f\!\left(x_{k-1}, u_{k-1}\right) + w_{k-1}, \quad w_{k-1} \sim \mathcal{N}\!\left(0, Q\right), \]with control input \(u_{k-1}\) and process noise \(w_{k-1}\). The observation model relates state to measurement:
\[ z_{k} = h\!\left(x_{k}\right) + v_{k}, \quad v_{k} \sim \mathcal{N}\!\left(0, R\right). \]Linear versions replace \(f\) and \(h\) by matrices \(F\) and \(H\). For a constant-velocity target in one dimension with state \(\left[p, \dot p\right]^{\top}\) and time step \(T\),
\[ F = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 0 \end{bmatrix}. \]Process noise covariance for such a model often takes the discrete white-noise acceleration form
\[ Q = \sigma_{a}^{2} \begin{bmatrix} T^{4}/4 & T^{3}/2 \\ T^{3}/2 & T^{2} \end{bmatrix}. \]For nonlinear sensors, \(h\) may be a projection (camera), a nonlinear range-plus-bearing law (radar), or a lookup (magnetometer with soft-iron distortion). Choosing the right \(h\) is often the single most consequential design decision in a fusion system.
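The constant-velocity matrices above can be assembled directly as a sanity check; `cv_model` is an illustrative helper, not course code, and the matrices match the \(F\), \(H\), and white-noise-acceleration \(Q\) given above.

```python
# Discrete constant-velocity model for state [p, pdot], time step T, and
# white-noise acceleration of strength sigma_a.

def cv_model(T, sigma_a):
    F = [[1.0, T],
         [0.0, 1.0]]                 # position integrates velocity
    H = [[1.0, 0.0]]                 # position-only measurement
    q = sigma_a ** 2
    Q = [[q * T**4 / 4, q * T**3 / 2],
         [q * T**3 / 2, q * T**2]]   # discrete white-noise acceleration
    return F, H, Q

F, H, Q = cv_model(T=0.1, sigma_a=0.5)
# Q must come out symmetric; asymmetry here is a classic transcription bug.
```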
Recursive Estimation: Weighted Least Squares and BLUE
Before introducing the Kalman filter it pays to derive its static counterpart. Suppose \(n\) measurements \(z_{i} = H_{i} x + v_{i}\) with independent errors \(v_{i}\) of covariance \(R_{i}\) are available. The Best Linear Unbiased Estimator (BLUE) minimises
\[ J(x) = \sum_{i=1}^{n} \left(z_{i} - H_{i}x\right)^{\top} R_{i}^{-1}\left(z_{i} - H_{i}x\right). \]Stacking measurements into \(z\), \(H\), and block-diagonal \(R\), the closed-form solution is
\[ \hat x = \left(H^{\top}R^{-1}H\right)^{-1} H^{\top} R^{-1} z, \]with error covariance \(P = \left(H^{\top} R^{-1} H\right)^{-1}\). This is the weighted least squares (WLS) estimator: measurements are weighted by the inverse of their noise covariance, so a noisy sensor is trusted less than a clean one.
Two special cases are illuminating. When all sensors see the same scalar \(x\) with variances \(\sigma_{i}^{2}\), the estimate collapses to
\[ \hat x = \frac{\sum_{i} z_{i}/\sigma_{i}^{2}}{\sum_{i} 1/\sigma_{i}^{2}}, \quad \frac{1}{\sigma^{2}} = \sum_{i} \frac{1}{\sigma_{i}^{2}}. \]Precisions add. Second, if one sensor has infinite variance (no information), it drops out cleanly, which is the correct fusion behaviour.
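The scalar special case is two lines of code. A minimal sketch with illustrative numbers:

```python
# Inverse-variance weighted fusion of scalar measurements: each measurement
# is weighted by its precision 1/sigma_i^2, and precisions add.

def fuse_scalar(measurements, variances):
    prec = sum(1.0 / v for v in variances)
    est = sum(z / v for z, v in zip(measurements, variances)) / prec
    return est, 1.0 / prec

# The sigma^2 = 1 sensor gets four times the weight of the sigma^2 = 4 one:
est, var = fuse_scalar([10.2, 9.8], [1.0, 4.0])
# est = 10.12, var = 0.8 -- tighter than either sensor alone.
```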
WLS can be rewritten recursively: each new measurement updates \(\hat x\) and \(P\) without reprocessing the old data. The recursive form is the bridge to the Kalman filter, which extends it to a dynamic state.
The Kalman Filter
The Kalman filter is the recursive minimum mean-squared-error estimator for a linear Gaussian system. It alternates a prediction step that propagates the state through the dynamics and an update step that folds in a new measurement. With state estimate \(\hat x_{k \mid k}\) and covariance \(P_{k \mid k}\) at time \(k\), the predict equations are
\[ \begin{aligned} \hat x_{k+1 \mid k} &= F\hat x_{k \mid k} + B u_{k}, \\ P_{k+1 \mid k} &= F P_{k \mid k} F^{\top} + Q. \end{aligned} \]On receipt of \(z_{k+1}\) the update step computes the innovation \(y = z_{k+1} - H\hat x_{k+1 \mid k}\), its covariance \(S = H P_{k+1 \mid k} H^{\top} + R\), and the Kalman gain
\[ K = P_{k+1 \mid k} H^{\top} S^{-1}. \]The updated moments are
\[ \begin{aligned} \hat x_{k+1 \mid k+1} &= \hat x_{k+1 \mid k} + K y, \\ P_{k+1 \mid k+1} &= \left(I - K H\right) P_{k+1 \mid k}. \end{aligned} \]For numerical stability, the Joseph form
\[ P_{k+1 \mid k+1} = \left(I - KH\right) P_{k+1 \mid k} \left(I - KH\right)^{\top} + K R K^{\top} \]preserves symmetry and positive-definiteness even under rounding. The Kalman gain has an intuitive reading: when \(R\) is large relative to \(HPH^{\top}\), \(K\) is small and the filter trusts its prediction; when \(R\) is small, \(K\) is large and the measurement dominates.
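One full predict/update cycle can be written out longhand for the constant-velocity model with a scalar position measurement, so that every equation above is visible in the code. This sketch uses the simple \((I - KH)P\) covariance update rather than the Joseph form, and the numbers in the usage run are illustrative.

```python
def kf_step(x, P, z, T, q, r):
    """One Kalman cycle for state [p, pdot] with H = [1, 0]."""
    # Predict: x = F x, P = F P F^T + Q, expanded for 2x2 matrices.
    xp = [x[0] + T * x[1], x[1]]
    Pp00 = P[0][0] + T * (P[0][1] + P[1][0]) + T * T * P[1][1] + q * T**4 / 4
    Pp01 = P[0][1] + T * P[1][1] + q * T**3 / 2
    Pp10 = P[1][0] + T * P[1][1] + q * T**3 / 2
    Pp11 = P[1][1] + q * T * T
    # Update: innovation y, innovation covariance S, gain K.
    y = z - xp[0]
    S = Pp00 + r
    K = [Pp00 / S, Pp10 / S]
    xn = [xp[0] + K[0] * y, xp[1] + K[1] * y]
    # P = (I - K H) Pp with H = [1, 0].
    Pn = [[(1 - K[0]) * Pp00, (1 - K[0]) * Pp01],
          [Pp10 - K[1] * Pp00, Pp11 - K[1] * Pp01]]
    return xn, Pn, K

# A diffuse prior meets a position fix at z = 1:
xn, Pn, K = kf_step([0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]],
                    z=1.0, T=1.0, q=0.0, r=1.0)
# The gain is large (prediction is uncertain) and the covariance collapses.
```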
Tuning Q and R
Setting \(R\) is usually easier: log the sensor at rest and compute the sample covariance. Setting \(Q\) is harder because it must absorb everything the dynamic model ignores: unmodelled accelerations, slowly varying parameters, approximation errors. Too-small \(Q\) makes the filter overconfident and it diverges from reality; too-large \(Q\) makes it discount its history and effectively track the current measurement alone. Chi-square tests on the innovation sequence, cross-validation on held-out runs, and grid search over candidate \((Q, R)\) pairs are all practical techniques.
The Extended Kalman Filter
When \(f\) or \(h\) is nonlinear, the Kalman algebra fails because a nonlinear map of a Gaussian is not Gaussian. The Extended Kalman Filter (EKF) sidesteps this by linearising about the current estimate via first-order Taylor expansion. Jacobians
\[ F_{k} = \left.\frac{\partial f}{\partial x}\right|_{\hat x_{k \mid k}, u_{k}}, \quad H_{k} = \left.\frac{\partial h}{\partial x}\right|_{\hat x_{k+1 \mid k}} \]replace \(F\) and \(H\) in the covariance propagation. The state is still propagated through the true nonlinear \(f\) and \(h\), but the covariance follows the linearised model.
The EKF is the workhorse of applied fusion: GPS/IMU integration, SLAM, spacecraft attitude determination, and target tracking all use it in some form. Its weaknesses are well documented. Linearisation around a biased estimate perpetuates the bias; strongly nonlinear dynamics or bimodal posteriors break the Gaussian assumption; Jacobians must be derived correctly and kept in sync with the models. A common failure is a consistent-looking but wrong filter: covariance shrinks while the true error grows, because linearisation errors are systematically excluded from \(P\).
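Since hand-derived Jacobians are a common EKF failure mode, one cheap guard is a finite-difference check. A sketch for a 2D range-bearing sensor (the measurement geometry is illustrative):

```python
import math

def h(px, py):
    """Range-bearing measurement of a 2D position."""
    return [math.hypot(px, py), math.atan2(py, px)]

def H_jacobian(px, py):
    """Analytic Jacobian of h with respect to [px, py]."""
    r = math.hypot(px, py)
    return [[px / r, py / r],            # d(range)/d(px, py)
            [-py / (r * r), px / (r * r)]]  # d(bearing)/d(px, py)

# Compare the analytic Jacobian against central differences at (3, 4):
px, py, eps = 3.0, 4.0, 1e-6
Ha = H_jacobian(px, py)
Hn = [[(h(px + eps, py)[k] - h(px - eps, py)[k]) / (2 * eps),
       (h(px, py + eps)[k] - h(px, py - eps)[k]) / (2 * eps)]
      for k in range(2)]
```

Keeping such a check in the test suite catches the sign and transpose errors that otherwise produce the "consistent-looking but wrong" filter described above.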
The Unscented Kalman Filter
The Unscented Kalman Filter (UKF) abandons linearisation in favour of the unscented transform, which captures the mean and covariance of a nonlinear transform using a small deterministic set of sigma points. For an \(n\)-dimensional state, \(2n+1\) sigma points are placed along the principal axes of \(P\):
\[ \chi_{0} = \hat x, \quad \chi_{i} = \hat x \pm \left(\sqrt{\left(n+\lambda\right) P}\right)_{i}, \]where \(\lambda\) controls the spread. Each sigma point is propagated through the true \(f\) or \(h\) and the output mean and covariance are recovered by weighted averages. No Jacobians are required, and the approximation is accurate to third order in Taylor series (versus first for the EKF). The cost is roughly comparable to two EKF steps.
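Sigma-point generation is short enough to sketch directly. Assuming a diagonal \(P\) so the matrix square root is elementwise (a full implementation would use a Cholesky factor), with the standard mean weights:

```python
import math

def sigma_points(x, P_diag, lam):
    """2n+1 sigma points and weights for state x with diagonal covariance."""
    n = len(x)
    pts = [list(x)]                      # chi_0 at the mean
    for i in range(n):
        d = math.sqrt((n + lam) * P_diag[i])
        plus = list(x); plus[i] += d     # chi_i along +axis i
        minus = list(x); minus[i] -= d   # chi_i along -axis i
        pts += [plus, minus]
    w0 = lam / (n + lam)
    wi = 1.0 / (2 * (n + lam))
    return pts, [w0] + [wi] * (2 * n)

pts, w = sigma_points([1.0, 2.0], [0.5, 2.0], lam=1.0)
# The weighted sample moments recover the input mean and covariance exactly.
mean = [sum(wj * p[k] for wj, p in zip(w, pts)) for k in range(2)]
cov = [sum(wj * (p[k] - mean[k]) ** 2 for wj, p in zip(w, pts)) for k in range(2)]
```

Propagating these points through the true \(f\) or \(h\) and recomputing the weighted moments is the whole unscented transform.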
The UKF is often preferred when Jacobians are cumbersome (e.g., quaternion attitude) or when mild nonlinearity causes EKF divergence. It is still a Gaussian filter, however; strongly multimodal posteriors require particle methods.
Particle Filters
When the state space is non-Gaussian, multimodal, or low-dimensional with severe nonlinearity, the particle filter (sequential Monte Carlo) represents the belief by a weighted sample \(\{\left(x^{(i)}, w^{(i)}\right)\}_{i=1}^{N}\). The prediction step propagates each particle through \(f\) with a fresh draw of process noise. The update step reweights particles by the likelihood:
\[ w^{(i)} \propto w^{(i)} \, p\!\left(z_{k} \mid x^{(i)}\right), \]followed by normalisation. Over time, most weights collapse to near-zero—degeneracy—so a resampling step replaces the current set with one drawn with replacement according to the weights, discarding low-weight particles and multiplying high-weight ones.
Key design choices are the proposal distribution (the bootstrap filter uses the process model itself, which is simple but inefficient), the resampling schedule (every step vs. triggered by effective sample size \(N_{\mathrm{eff}} = 1 / \sum_{i} \left(w^{(i)}\right)^{2}\) dropping below a threshold), and the particle count, which must grow exponentially with state dimension to maintain coverage—the curse of dimensionality that limits particle filters to low-dimensional problems or to Rao-Blackwellised structures where only a nonlinear subset is sampled.
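A bootstrap-filter cycle for a scalar random-walk state observed in Gaussian noise can be sketched in a few lines; the model parameters and thresholds here are illustrative.

```python
import math
import random

def pf_step(particles, weights, z, q=0.1, r=0.5):
    """One bootstrap particle filter cycle: predict, reweight, resample."""
    # Predict: propagate each particle through the process model.
    particles = [x + random.gauss(0.0, q) for x in particles]
    # Update: reweight by the Gaussian likelihood of z, then normalise.
    weights = [w * math.exp(-0.5 * ((z - x) / r) ** 2)
               for x, w in zip(particles, weights)]
    s = sum(weights)
    weights = [w / s for w in weights]
    # Resample when the effective sample size drops below N/2.
    n_eff = 1.0 / sum(w * w for w in weights)
    if n_eff < len(particles) / 2:
        particles = random.choices(particles, weights=weights, k=len(particles))
        weights = [1.0 / len(particles)] * len(particles)
    return particles, weights

random.seed(0)
N = 200
particles = [random.gauss(0.0, 1.0) for _ in range(N)]
weights = [1.0 / N] * N
for z in [0.5, 0.6, 0.55]:
    particles, weights = pf_step(particles, weights, z)
est = sum(x * w for x, w in zip(particles, weights))
```

The bootstrap proposal (sampling from the process model) keeps the weight update to a single likelihood evaluation, at the cost of wasting particles where the measurement says the state is not.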
Information Filter and Decentralised Fusion
The Kalman filter has a dual formulation in the information form, where the state is described by the information vector \(\mathbf{y} = P^{-1}\hat x\) and the information matrix \(Y = P^{-1}\). The update step becomes additive:
\[ Y_{k \mid k} = Y_{k \mid k-1} + H^{\top} R^{-1} H, \quad \mathbf{y}_{k \mid k} = \mathbf{y}_{k \mid k-1} + H^{\top} R^{-1} z_{k}. \]Each sensor contributes an information matrix \(H_{i}^{\top} R_{i}^{-1} H_{i}\) and information vector \(H_{i}^{\top} R_{i}^{-1} z_{i}\), and these simply sum. This is tremendously convenient for decentralised or multi-sensor fusion: nodes compute local information contributions and a fusion centre adds them, sidestepping the matrix inversions the covariance form requires at update time. The tradeoff is that prediction is harder in information form, and uninitialised states (infinite covariance) are awkward in covariance form but natural in information form (zero information matrix).
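The additivity is easiest to see for a scalar state observed directly by several sensors (\(H_i = 1\)). A minimal sketch with illustrative numbers:

```python
# Additive information-form update: each sensor contributes 1/R_i to the
# information matrix and z_i/R_i to the information vector; the fusion
# centre just sums the contributions.

def info_fuse(Y_prior, y_prior, sensors):
    Y, y = Y_prior, y_prior
    for z, R in sensors:
        Y += 1.0 / R       # H^T R^-1 H with H = 1
        y += z / R         # H^T R^-1 z
    return Y, y

# Prior with variance 4 (information 0.25), two sensors of variance 1 and 4:
Y, y = info_fuse(0.25, 0.0, [(1.0, 1.0), (0.9, 4.0)])
x_hat, P = y / Y, 1.0 / Y  # convert back to state and covariance
```

The result reproduces the inverse-variance weighting of the WLS section, as it must.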
Decentralised architectures go further: each node runs its own filter on local sensors, and nodes exchange information-form summaries. Done naively this double-counts shared information—the rumour propagation problem. Channel filters and covariance intersection address this by tracking what each node has already shared, or by conservatively bounding the fused covariance without requiring knowledge of cross-correlations.
Dempster–Shafer Evidential Reasoning
Not all uncertainty is probabilistic. A classifier may report that an object is “either a pedestrian or a cyclist” without committing to a split—a statement about ignorance rather than odds. Dempster–Shafer theory generalises probability by assigning masses not to singleton outcomes but to subsets of a frame of discernment \(\Theta\). A mass function \(m: 2^{\Theta} \to \left[0, 1\right]\) satisfies \(m(\emptyset) = 0\) and \(\sum_{A \subseteq \Theta} m(A) = 1\).
Two derived quantities summarise a mass: the belief \(\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B)\), a lower bound on the support for \(A\), and the plausibility \(\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B)\), an upper bound. A probability measure is the degenerate case where mass lies only on singletons.
Dempster’s rule of combination fuses two mass functions \(m_{1}, m_{2}\):
\[ m_{12}(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_{1}(B)\, m_{2}(C), \quad K = \sum_{B \cap C = \emptyset} m_{1}(B)\, m_{2}(C). \]\(K\) is the conflict mass; when two bodies of evidence flatly contradict, \(K\) approaches one and the rule becomes numerically unstable and philosophically suspect. Zadeh’s counterexample is the classic critique: two doctors give opposite diagnoses, and Dempster’s rule concludes with high confidence for a disease that both considered unlikely. Variants such as Yager’s rule or the transferable belief model keep conflict on the empty set rather than renormalising it away.
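Dempster's rule is a double loop over focal elements. A sketch with subsets as frozensets; the pedestrian/cyclist frame and the mass values are invented for illustration:

```python
# Dempster's rule of combination: intersect every pair of focal elements,
# accumulate mass on the intersection, collect conflict K on the empty set,
# and renormalise by 1 - K.

def dempster(m1, m2):
    out, K = {}, 0.0
    for B, wB in m1.items():
        for C, wC in m2.items():
            A = B & C
            if A:
                out[A] = out.get(A, 0.0) + wB * wC
            else:
                K += wB * wC
    return {A: w / (1.0 - K) for A, w in out.items()}, K

P, C = frozenset({"ped"}), frozenset({"cyc"})
theta = P | C                      # full frame: total ignorance
m1 = {P: 0.6, theta: 0.4}          # sensor 1: mostly pedestrian
m2 = {C: 0.3, theta: 0.7}          # sensor 2: leans cyclist, much ignorance
m12, K = dempster(m1, m2)
```

Pushing `m1` and `m2` toward total disagreement drives `K` toward 1 and the renormalisation blows up, which is Zadeh's critique in numerical form.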
Dempster–Shafer suits decision-level fusion of classifiers whose outputs are sets of hypotheses, particularly when ignorance must be represented explicitly.
Fuzzy Fusion
Fuzzy set theory offers a third framework for uncertainty: membership functions \(\mu_{A}(x) \in \left[0, 1\right]\) indicate the degree to which \(x\) belongs to set \(A\). Fusion of fuzzy variables uses t-norms (generalised AND, e.g., \(\min\) or product) and t-conorms (generalised OR, e.g., \(\max\) or probabilistic sum). Fuzzy inference systems combine rules of the form “IF temperature is high AND pressure is low THEN risk is high” using these operators, producing a fused linguistic variable that can be defuzzified to a crisp decision via centroid or mean-of-maxima. Fuzzy methods are popular in sensor networks where measurements are imprecise rather than random, and in rule-based fusion of classifier outputs.
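The temperature/pressure rule above can be sketched in zero-order Sugeno style: `min` as the AND t-norm and a firing-strength-weighted average of crisp consequents as the defuzzification. The membership functions and rule constants are invented for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def risk(temp, pressure):
    """Two-rule fuzzy fusion; assumes at least one rule fires."""
    high_t = tri(temp, 60, 100, 140)
    low_p = tri(pressure, 0, 20, 40)
    ok_t = tri(temp, 0, 40, 80)
    ok_p = tri(pressure, 20, 50, 80)
    # Rule 1: IF temp is high AND pressure is low THEN risk = 0.9
    w1 = min(high_t, low_p)
    # Rule 2: IF temp is ok AND pressure is ok THEN risk = 0.1
    w2 = min(ok_t, ok_p)
    return (w1 * 0.9 + w2 * 0.1) / (w1 + w2)

r = risk(temp=70, pressure=30)  # both rules fire partially
```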
Classification-Level Fusion
Many fusion tasks are discrete: is this point cloud a car, pedestrian, or bicycle? Several sensors each output a classification, and fusion must yield a single decision. Simple schemes include majority vote (each sensor gets one ballot), weighted vote (ballots weighted by historical accuracy), and soft voting (average posterior probabilities).
Bayesian classification fusion assumes each classifier outputs \(p_{i}\!\left(c \mid z_{i}\right)\) and combines them assuming conditional independence:
\[ p\!\left(c \mid z_{1}, \ldots, z_{n}\right) \propto p(c) \prod_{i=1}^{n} \frac{p_{i}\!\left(c \mid z_{i}\right)}{p(c)}. \]When independence fails—for example, a radar and camera both fooled by the same heavy rain—this becomes overconfident. Stacking (training a meta-classifier on the outputs of base classifiers) or neural-network fusion can learn cross-sensor correlations without an explicit independence assumption.
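The product formula is a one-liner per class. A sketch with invented camera/radar posteriors over a two-class frame:

```python
# Conditional-independence fusion of per-sensor class posteriors: divide
# each posterior by the prior (recovering a likelihood ratio), take the
# product, multiply by the prior once, and renormalise.

def fuse_posteriors(prior, posteriors):
    fused = {}
    for c in prior:
        val = prior[c]
        for p in posteriors:
            val *= p[c] / prior[c]
        fused[c] = val
    s = sum(fused.values())
    return {c: v / s for c, v in fused.items()}

prior = {"car": 0.5, "ped": 0.5}
cam = {"car": 0.8, "ped": 0.2}
radar = {"car": 0.6, "ped": 0.4}
fused = fuse_posteriors(prior, [cam, radar])
# Agreement sharpens the fused posterior beyond either sensor alone.
```

The sharpening is exactly where the overconfidence danger lies: if camera and radar errors are correlated, the product treats the same evidence as arriving twice.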
Artificial neural networks are increasingly used for both feature-level and decision-level fusion. A typical architecture has one branch per sensor that extracts features, a fusion layer (concatenation, attention, or cross-attention between modalities), and a classification head. Training requires large labelled multi-modal datasets. From a fusion perspective the network is a learned \(h^{-1}\): instead of modelling each sensor analytically, one delegates that work to data. The price is opacity: the neural fuser exposes neither a covariance nor a principled belief, so safety-critical systems usually embed learned perception inside a probabilistic state estimator rather than letting the network drive end-to-end.
Multi-Target Tracking and Data Association
Until now we assumed a known correspondence between measurements and states. In a multi-target scene the filter receives \(m\) measurements per scan, some originating from targets, some from clutter, with no labels. Data association decides which measurement belongs to which target.
Nearest-neighbour (NN) association assigns to each target the closest measurement by Mahalanobis distance, where
\[ d^{2}\!\left(z, \hat z\right) = \left(z - \hat z\right)^{\top} S^{-1} \left(z - \hat z\right). \]A validation gate \(d^{2} < \gamma\) rejects measurements too far to be plausible; \(\gamma\) is chosen from a chi-squared distribution to achieve the desired gate probability. NN is fast but brittle: two crossing targets confuse it irretrievably.
Global nearest neighbour (GNN) formulates assignment as a combinatorial optimisation: minimise total squared Mahalanobis distance subject to the constraint that each measurement pairs with at most one target. The Hungarian algorithm or the auction algorithm solves this in polynomial time.
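For a handful of targets the assignment can be found by brute force over permutations, which makes the objective explicit; a Hungarian or auction solver replaces this factorial search at scale. This sketch uses plain squared Euclidean distance (a real tracker would use the Mahalanobis form with each target's \(S\)), and the positions are illustrative.

```python
import itertools

def gnn_assign(predictions, measurements):
    """Minimise total squared distance over one-to-one assignments."""
    best, best_cost = None, float("inf")
    for perm in itertools.permutations(range(len(measurements))):
        cost = sum((measurements[j][0] - predictions[i][0]) ** 2 +
                   (measurements[j][1] - predictions[i][1]) ** 2
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost

preds = [(0.0, 0.0), (5.0, 5.0)]       # predicted target positions
meas = [(4.8, 5.1), (0.2, -0.1)]       # measurements arrive unlabelled
assignment, cost = gnn_assign(preds, meas)
# Target 0 is paired with measurement 1, target 1 with measurement 0.
```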
Joint probabilistic data association (JPDA) acknowledges that the correct assignment is uncertain and computes the posterior probability of each feasible joint hypothesis, updating each target’s state with a probability-weighted mixture of candidate measurements. It is the filter of choice when targets are close and confusion is routine, at the cost of more computation and a slight bias toward blending nearby targets.
Multiple hypothesis tracking (MHT) defers decisions: every feasible assignment is propagated as a separate hypothesis, pruned when it becomes unlikely relative to competitors. Track scores accumulate over many scans, so MHT can eventually resolve ambiguities that myopic filters cannot. The hypothesis tree grows explosively if unpruned; practical implementations cap tree depth and merge similar hypotheses.
Modern alternatives include random finite set filters (PHD, CPHD) that track the whole target population as a set-valued random variable, avoiding explicit labels.
Outlier Rejection
Even with correct association, individual measurements can be outliers—multipath-tainted GPS fixes, specular LiDAR returns off shiny surfaces, dropped camera frames. The Mahalanobis gate introduced above is the basic line of defence: under Gaussian assumptions, \(d^{2}\) is chi-squared with degrees of freedom equal to the dimension of \(z\), so choosing \(\gamma\) as the 99th percentile rejects roughly 1% of good measurements while eliminating most outliers.
More sophisticated schemes include chi-square CUSUM tests on the innovation sequence to detect slowly growing biases, RANSAC for feature-matching pipelines (sample a minimal set, fit a model, count inliers, repeat), and Huber-loss reformulations of the filter update that cap the influence of any one measurement. In EKF/UKF practice, a short moving window of innovations is monitored; when its mean diverges from zero or its covariance from the predicted \(S\), the filter flags a fault and may switch to a fallback sensor set.
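The Huber idea reduces to a single weight function applied to the normalised innovation: full weight inside a threshold, influence capped outside it. A sketch (the threshold \(k = 1.345\) is the conventional choice for 95% efficiency under Gaussian noise; treat it as a tunable):

```python
def huber_weight(residual, k=1.345):
    """Weight applied to a normalised residual under the Huber loss."""
    a = abs(residual)
    return 1.0 if a <= k else k / a   # influence grows linearly, then saturates

# An inlier keeps full weight; a 10-sigma outlier keeps ~13% of its influence:
w_in = huber_weight(0.5)
w_out = huber_weight(10.0)
```

In a filter, this weight scales the effective \(R^{-1}\) of the offending measurement, so a wild point nudges the estimate instead of capturing it.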
| Technique | Detects | Typical use |
|---|---|---|
| Mahalanobis gate | Individual wild points | Every scan, all filters |
| Chi-square CUSUM | Slow bias drift | Long-running missions, INS |
| RANSAC | Mismatches in feature sets | Image alignment, point-cloud registration |
| Huber M-estimator | Heavy-tailed noise | Robust regression inside a filter |
| Interacting multiple model | Regime changes (maneuvering target) | Radar tracking |
Spatial Registration
Sensors seldom share a frame. A LiDAR is bolted to the roof; the GPS antenna is on the trunk; each camera is in a rig of its own. Bringing measurements into a common coordinate frame is spatial registration.
A rigid-body transformation combines a rotation \(R \in SO(3)\) and a translation \(t \in \mathbb{R}^{3}\):
\[ p' = R p + t. \]Representations of \(R\) include \(3 \times 3\) rotation matrices (9 parameters, 6 constraints), Euler angles (3 parameters, but subject to gimbal lock), axis–angle (3 parameters), and quaternions (4 parameters, 1 constraint, no singularities). Quaternions are the industry standard for filters precisely because they avoid the singularities of Euler angles while staying compact.
Hand-eye or sensor-to-body calibration estimates \(\left(R, t\right)\) between two sensor frames. With correspondences \(\{\left(p_{i}, p'_{i}\right)\}\), the closed-form least-squares solution uses the singular value decomposition: centre both point sets, compute \(H = \sum_{i} \left(p_{i} - \bar p\right)\left(p'_{i} - \bar p'\right)^{\top}\), factor \(H = U \Sigma V^{\top}\), and set \(R = V U^{\top}\) (with a reflection correction if \(\det\left(VU^{\top}\right) < 0\)), followed by \(t = \bar p' - R \bar p\). This SVD solution is the Kabsch (or Arun) algorithm; Horn's quaternion-based method is an equivalent closed form.
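In the planar case the SVD collapses to a single angle, so the same centred-correlation construction can be sketched without a linear-algebra library. This recovers a known 2D rotation and translation from noiseless correspondences; the point sets are illustrative.

```python
import math

def align_2d(src, dst):
    """Closed-form 2D rigid alignment: the planar case of the SVD solution."""
    cx = sum(p[0] for p in src) / len(src)
    cy = sum(p[1] for p in src) / len(src)
    dx = sum(q[0] for q in dst) / len(dst)
    dy = sum(q[1] for q in dst) / len(dst)
    # Centred cross-correlation terms; in 2D the optimal angle is their atan2.
    s_sin = sum((p[0] - cx) * (q[1] - dy) - (p[1] - cy) * (q[0] - dx)
                for p, q in zip(src, dst))
    s_cos = sum((p[0] - cx) * (q[0] - dx) + (p[1] - cy) * (q[1] - dy)
                for p, q in zip(src, dst))
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    t = (dx - (c * cx - s * cy), dy - (s * cx + c * cy))
    return theta, t

# Build dst by rotating src by 0.5 rad and translating by (1, -2), then recover:
theta0, t0 = 0.5, (1.0, -2.0)
c0, s0 = math.cos(theta0), math.sin(theta0)
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
dst = [(c0 * x - s0 * y + t0[0], s0 * x + c0 * y + t0[1]) for x, y in src]
theta, t = align_2d(src, dst)
```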
When correspondences are unknown, the Iterative Closest Point (ICP) algorithm alternates between associating each point in one cloud to its nearest neighbour in the other and solving the closed-form rigid alignment, repeating until convergence. Variants—point-to-plane ICP, generalised ICP, colour ICP—use richer error metrics to improve convergence on flat or textured surfaces. ICP is a local method and requires a reasonable initial guess, so it is usually paired with a coarse feature-based alignment (e.g., FPFH descriptors) to seed the iteration.
Temporal Synchronisation
Two measurements at the same instant mean something fundamentally different from two measurements 50 ms apart. Temporal registration ensures that the timestamps attached to each measurement correspond to the same notion of time.
Hardware solutions are the gold standard: a common trigger line or a pulse-per-second (PPS) signal from GPS lets every sensor stamp acquisition with microsecond-level accuracy. In software, the Precision Time Protocol (PTP, IEEE 1588) synchronises networked clocks to sub-millisecond accuracy; NTP does so to tens of milliseconds.
When synchronisation is imperfect, fusion algorithms must compensate. Time propagation: predict each measurement forward or backward to a common reference using the dynamic model and inflate \(R\) to account for the extrapolation. Out-of-sequence measurement (OOSM) algorithms correctly fold in a measurement whose timestamp is earlier than the current filter state, without simply rerunning the whole filter. Bar-Shalom’s backward-prediction formulas give closed-form OOSM updates for linear Gaussian systems.
Synchronisation faults produce strange symptoms: residual biases that vary with speed (because a velocity error convolved with a time offset looks like a position error), or filter divergence in the presence of high-frequency manoeuvres. A disciplined time budget—what clock each sensor uses, what latency its driver adds, how the fusion node aligns them—belongs in every fusion design document.
Case Study: GPS/IMU Integration
The combination of an inertial measurement unit (accelerometers and gyros) with a GPS receiver is the textbook example of fusion. The IMU has high bandwidth (hundreds of Hz) but errors that grow unboundedly through double integration of accelerometer bias and gyro drift. The GPS has bounded absolute error (metres) but low bandwidth (1–10 Hz), vulnerability to dropouts, and poor attitude information.
Fusion schemes fall into three families. Loose coupling runs a GPS filter and an INS in parallel, fusing their position/velocity outputs in a second filter. It is modular and survives the loss of either module, but loses information because correlated errors in the GPS solution are discarded. Tight coupling feeds raw GPS pseudoranges and carrier phases directly into the fusion filter alongside IMU mechanisation; it uses information even when fewer than four satellites are visible, at the cost of tighter coupling to the GPS algorithm. Ultra-tight (deep) coupling closes the loop inside the GPS correlator itself using INS-derived Doppler predictions to aid signal tracking—the last resort in highly dynamic or jamming-prone environments.
The fusion filter is almost always an EKF or UKF on an error-state formulation. The error state tracks deviations of position, velocity, attitude, accelerometer bias, and gyro bias from a nominal trajectory propagated by the INS mechanisation. The error dynamics are well approximated as linear, making the EKF very accurate despite the nonlinear rotation kinematics. GPS measurements enter as linear functions of the position error; magnetometer measurements, if available, correct heading error.
Case Study: LiDAR–Camera Fusion
Autonomous vehicles typically rely on multiple LiDARs, cameras, and a radar suite. LiDAR delivers metric 3D geometry but lacks colour and struggles with dark or absorbing surfaces. Cameras deliver dense colour and texture but no absolute depth. Their fusion is complementary in the JDL sense.
Pipelines range from early to late. In early fusion the LiDAR point cloud is projected into the camera frame and each pixel inherits a depth, producing an RGB-D image fed to a single detector. Registration quality dominates: a 10 cm extrinsic calibration error at 30 m puts points on the wrong object. In mid-level (feature) fusion, a network extracts per-modality features and a cross-attention module mixes them before a shared detection head—this is the approach of many state-of-the-art 3D detectors. In late (decision) fusion, separate detectors produce 3D boxes and a final module associates and combines them using IOU matching and confidence reweighting; late fusion is the safest when one modality fails (a blinded camera, a rained-out LiDAR) because the other pipeline is unaffected.
Calibration combines intrinsic camera calibration (focal length, distortion) from a checkerboard, extrinsic LiDAR–camera calibration from simultaneous observations of a target visible to both sensors, and temporal calibration by correlating motion signals across modalities. Production systems continuously monitor calibration drift from thermal expansion and vibration.
Case Study: SLAM
Simultaneous Localisation and Mapping is the problem of building a map of an unknown environment while simultaneously locating oneself within it. It is a fusion problem par excellence: odometry, wheel encoders, IMU, and exteroceptive sensors (LiDAR, cameras) all contribute to a joint estimate of the robot trajectory and the map landmarks.
The early EKF-SLAM formulation places the robot pose and all landmark positions in a single state vector. The covariance matrix is dense—every landmark is correlated with every other through the shared robot pose—and the filter cost scales as \(O\!\left(n^{2}\right)\) in landmark count. This motivated alternatives.
FastSLAM uses a Rao-Blackwellised particle filter: each particle carries a sampled robot trajectory and a set of small independent EKFs, one per landmark. Graph-based SLAM treats poses and landmarks as nodes in a graph and odometry/observation constraints as edges, then solves a sparse nonlinear least squares (bundle adjustment) at back-end. Modern ORB-SLAM, LIO-SAM, and Cartographer follow the graph paradigm because it decouples front-end data association from back-end optimisation and exploits the sparsity of the constraint graph.
Loop closure is the hardest sub-problem: recognising when the robot has returned to a previously mapped area, producing a constraint that pins down long-term drift. It combines place recognition (bag-of-visual-words, NetVLAD) with geometric verification (RANSAC on matched features, ICP on point clouds).
Performance Metrics and System Design
A fusion estimator must be evaluated quantitatively. Standard metrics include root-mean-square error (RMSE) against a ground-truth source, normalised estimation error squared (NEES) to check filter consistency (its time-average should match the state dimension), and normalised innovation squared (NIS) to check measurement consistency. For tracking, optimal sub-pattern assignment (OSPA) and generalised OSPA (GOSPA) metrics combine localisation error with cardinality error (missed and spurious tracks) into a single figure of merit.
At system design time, key questions are: What sensors, and why each? What sampling rates, and are they synchronised? What coordinate frames, and who owns the calibration? What fusion architecture—centralised, decentralised, federated? What outlier and fault handling? What degradation modes, and what tests confirm graceful failure? Answering these explicitly in a requirements document prevents the most common fusion bug: a working system in the lab that diverges on the first drive because some quiet assumption—stationarity, independence, synchronisation, calibration—was silently violated.
Sources and References
- D. L. Hall and J. Llinas, Handbook of Multisensor Data Fusion (CRC Press).
- H. B. Mitchell, Multi-Sensor Data Fusion: An Introduction (Springer).
- Y. Bar-Shalom, X.-R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation (Wiley).
- S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics (MIT Press).
- D. Simon, Optimal State Estimation: Kalman, H-infinity, and Nonlinear Approaches (Wiley).
- H. Durrant-Whyte, Multi-Sensor Data Fusion (lecture notes, University of Sydney).
- MIT OpenCourseWare 16.322, Stochastic Estimation and Control.
- Stanford CS 223A, Introduction to Robotics, and AA 273, State Estimation and Filtering for Aerospace Systems.
- S. Julier and J. Uhlmann, “A New Extension of the Kalman Filter to Nonlinear Systems,” SPIE AeroSense, 1997.
- G. Shafer, A Mathematical Theory of Evidence (Princeton).