ECE 313: Digital Signal Processing

Zhou Wang

Estimated study time: 1 hr 26 min


These notes provide a rigorous, self-contained treatment of digital signal processing at the level of a third- or fourth-year electrical and computer engineering course. The exposition follows the logical arc of the course: from the algebraic foundations of discrete-time signals and the z-transform, through spectral analysis via the DTFT and DFT, to practical filter design and multirate processing.


Sources and References

The following publicly available references underpin the technical content of these notes.

  1. Oppenheim, A. V. & Schafer, R. W. Discrete-Time Signal Processing, 3rd ed. Prentice Hall, 2010. The standard graduate-level reference for the field; Chapters 2–8 cover z-transform, DTFT, DFT, FFT, and filter design in full rigor.

  2. Proakis, J. G. & Manolakis, D. G. Digital Signal Processing: Principles, Algorithms, and Applications, 4th ed. Pearson, 2006. Comprehensive treatment with detailed examples; especially strong on IIR design and adaptive filters.

  3. Smith, J. O. Mathematics of the Discrete Fourier Transform (DFT). CCRMA, Stanford University. Freely available at ccrma.stanford.edu/~jos/mdft/. Detailed mathematical development of the DFT including spectral leakage, windows, and FFT.

  4. Smith, J. O. Introduction to Digital Filters with Audio Applications. CCRMA, Stanford University. Available at ccrma.stanford.edu/~jos/filters/. Covers z-transform, LTI systems, and filter design with worked examples.

  5. McClellan, J. H., Parks, T. W. & Rabiner, L. R. “A computer program for designing optimum FIR linear phase digital filters.” IEEE Trans. Audio Electroacoust., vol. 21, no. 6, pp. 506–526, 1973. The original Parks–McClellan paper.

  6. MIT OpenCourseWare 6.341: Discrete-Time Signal Processing (open access). Lecture notes and problem sets covering the core ECE 313 syllabus, freely available at ocw.mit.edu.

  7. Vaidyanathan, P. P. Multirate Systems and Filter Banks. Prentice Hall, 1993. The authoritative treatment of polyphase decompositions, noble identities, and perfect-reconstruction filter banks.

  8. Haykin, S. Adaptive Filter Theory, 4th ed. Prentice Hall, 2002. Full development of the LMS algorithm and its variants with convergence proofs.


Chapter 1: Discrete-Time Signals and Systems

1.1 The Discrete-Time Signal Model

A discrete-time signal is a function \(x : \mathbb{Z} \to \mathbb{C}\), written \(x[n]\) for \(n \in \mathbb{Z}\). This abstraction arises naturally whenever a continuous-time signal \(x_a(t)\) is sampled uniformly at period \(T_s\), producing the sequence \(x[n] = x_a(nT_s)\). The integer \(n\) is the sample index; the physical time instant is \(t = nT_s\).

Definition 1.1 (Elementary sequences).
  • Unit impulse: \(\delta[n] = 1\) if \(n = 0\), and \(\delta[n] = 0\) otherwise.
  • Unit step: \(u[n] = 1\) if \(n \ge 0\), and \(u[n] = 0\) otherwise.
  • Real sinusoid: \(x[n] = A\cos(\omega_0 n + \phi)\), where \(\omega_0 \in [0, \pi]\) is the discrete-time frequency in radians per sample.
  • Complex exponential: \(x[n] = e^{j\omega_0 n}\). Because \(e^{j(\omega_0 + 2\pi)n} = e^{j\omega_0 n}\) for all \(n \in \mathbb{Z}\), the complex exponential is periodic in \(\omega_0\) with period \(2\pi\). This periodicity is the fundamental difference from the continuous-time case.

Every signal can be decomposed into a weighted sum of shifted impulses:

\[ x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n - k]. \]

This sifting representation is not merely a formula — it is the discrete-time analog of the Dirac-delta sifting property and underpins the entire theory of LTI systems.

1.1.1 Signal Energy and Power

The energy of a signal is

\[ E_x = \sum_{n=-\infty}^{\infty} |x[n]|^2. \]

If \(E_x < \infty\) the signal is called an energy signal. For signals with infinite energy the average power

\[ P_x = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} |x[n]|^2 \]

may be finite; such signals are power signals. Periodic signals are archetypal power signals; finite-length signals are archetypal energy signals.

1.2 Linear Time-Invariant Systems

Definition 1.2 (LTI system). A system \(H\) mapping sequences to sequences is linear if it obeys superposition: \[ H\{a\,x_1[n] + b\,x_2[n]\} = a\,H\{x_1[n]\} + b\,H\{x_2[n]\} \]

for all sequences \(x_1, x_2\) and scalars \(a, b \in \mathbb{C}\). It is time-invariant if a shift of the input causes an identical shift of the output:

\[ y[n] = H\{x[n]\} \implies H\{x[n - n_0]\} = y[n - n_0] \]

for every integer \(n_0\).

The decisive structural consequence of linearity and time-invariance is the convolution representation.

Theorem 1.1 (Convolution representation). Let \(h[n] = H\{\delta[n]\}\) denote the impulse response of an LTI system. Then for any input \(x[n]\), \[ y[n] = (x * h)[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k]. \]
Proof. Apply the sifting representation \(x[n] = \sum_k x[k]\,\delta[n-k]\) to the input. By linearity, \(H\{x[n]\} = \sum_k x[k]\,H\{\delta[n-k]\}\). By time-invariance, \(H\{\delta[n-k]\} = h[n-k]\). The result follows.

Convolution is commutative, associative, and distributive over addition. The commutativity \((x * h)[n] = (h * x)[n]\) is not obvious from the definition but follows directly by the substitution \(m = n - k\).
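As a numerical sanity check of the convolution sum and its commutativity, the following sketch (assuming NumPy is available; the sequences are illustrative) evaluates the definition directly and compares it with the library routine:

```python
import numpy as np

# Direct evaluation of y[n] = sum_k x[k] h[n-k] for finite-length
# sequences whose support starts at n = 0.
def convolve_direct(x, h):
    L, M = len(x), len(h)
    y = np.zeros(L + M - 1)
    for n in range(L + M - 1):
        for k in range(L):
            if 0 <= n - k < M:
                y[n] += x[k] * h[n - k]
    return y

x = np.array([1.0, 2.0, 3.0])
h = np.array([1.0, -1.0])                    # first-difference filter
y = convolve_direct(x, h)
assert np.allclose(y, np.convolve(x, h))     # matches library routine
assert np.allclose(y, convolve_direct(h, x)) # commutativity (x*h = h*x)
```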

1.2.1 Properties of LTI Systems

Definition 1.3.
  • Causal: \(h[n] = 0\) for all \(n < 0\). Equivalently, the output at time \(n\) depends only on present and past inputs.
  • BIBO stable: Every bounded input produces a bounded output. The necessary and sufficient condition is absolute summability of the impulse response: \[ \sum_{n=-\infty}^{\infty} |h[n]| < \infty. \]
  • FIR (Finite Impulse Response): \(h[n] = 0\) outside a finite interval \([0, M]\). Equivalently, the system is described by a finite-length convolution sum.
  • IIR (Infinite Impulse Response): \(h[n]\) is nonzero for infinitely many \(n\). Typically realized by a linear constant-coefficient difference equation with feedback.

1.2.2 Linear Constant-Coefficient Difference Equations

IIR systems are described by equations of the form

\[ \sum_{k=0}^{N} a_k\,y[n-k] = \sum_{k=0}^{M} b_k\,x[n-k], \]

with \(a_0 = 1\) by convention. The system function (transfer function) is obtained by taking the z-transform of both sides, leading directly to Chapter 2.
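A difference equation of this form can be simulated sample by sample. The sketch below (assuming SciPy is available; the coefficients are illustrative) drives a first-order recursion with a unit impulse and recovers the expected impulse response:

```python
import numpy as np
from scipy.signal import lfilter

# First-order IIR example: y[n] - 0.5*y[n-1] = x[n], i.e. a = [1, -0.5],
# b = [1] in the convention a_0 = 1.  Driving it with a unit impulse
# recovers the impulse response h[n] = 0.5**n u[n].
b, a = [1.0], [1.0, -0.5]
x = np.zeros(8)
x[0] = 1.0                      # unit impulse delta[n]
h = lfilter(b, a, x)
assert np.allclose(h, 0.5 ** np.arange(8))
```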


Chapter 2: The Z-Transform

2.1 Definition and Region of Convergence

Definition 2.1 (Bilateral z-transform). For a sequence \(x[n]\), the bilateral z-transform is \[ X(z) = \mathcal{Z}\{x[n]\} = \sum_{n=-\infty}^{\infty} x[n]\,z^{-n}, \]

where \(z \in \mathbb{C}\) is a complex variable. The region of convergence (ROC) is the set of \(z\) for which the sum converges absolutely.

The ROC has the form of an annulus \(R_- < |z| < R_+\) in the complex plane. The inner and outer radii are determined by the growth rate of \(x[n]\) for \(n \to -\infty\) and \(n \to +\infty\) respectively. Key geometric facts:

  • For a right-sided sequence (zero for \(n < N_1\)), the ROC is the exterior of a circle: \(|z| > R_-\).
  • For a left-sided sequence (zero for \(n > N_2\)), the ROC is the interior of a circle: \(|z| < R_+\).
  • For a two-sided sequence, the ROC is an annulus (possibly empty).
  • For a finite-length sequence, the ROC is the entire complex plane except possibly \(z = 0\) or \(z = \infty\).
The ROC is not merely a technicality. Different sequences can share the same algebraic expression for \(X(z)\) but differ in their ROC — and these sequences are fundamentally different. The inverse transform is only well-defined once the ROC is specified. The ROC also encodes stability: an LTI system is BIBO stable if and only if the ROC of \(H(z)\) includes the unit circle \(|z| = 1\).

2.1.1 Common Z-Transform Pairs

The following pairs are derived by direct summation of geometric series.

\[ \delta[n] \;\longleftrightarrow\; 1, \quad \text{ROC: all } z. \]

\[ u[n] \;\longleftrightarrow\; \frac{1}{1 - z^{-1}}, \quad |z| > 1. \]

\[ a^n u[n] \;\longleftrightarrow\; \frac{1}{1 - az^{-1}}, \quad |z| > |a|. \]

\[ -a^n u[-n-1] \;\longleftrightarrow\; \frac{1}{1 - az^{-1}}, \quad |z| < |a|. \]

The last two entries demonstrate that the same rational function corresponds to two different sequences depending on the ROC.

2.2 Properties of the Z-Transform

Theorem 2.1 (Core properties). Let \(x[n] \leftrightarrow X(z)\) and \(y[n] \leftrightarrow Y(z)\).
  • Linearity: \(ax[n] + by[n] \leftrightarrow aX(z) + bY(z)\).
  • Time shift: \(x[n - n_0] \leftrightarrow z^{-n_0} X(z)\).
  • Multiplication by \(a^n\): \(a^n x[n] \leftrightarrow X(z/a)\), ROC scaled by \(|a|\).
  • Time reversal: \(x[-n] \leftrightarrow X(1/z)\), ROC inverted.
  • Convolution: \((x * h)[n] \leftrightarrow X(z)\,H(z)\).
  • Differentiation in z: \(n\,x[n] \leftrightarrow -z\,\frac{dX}{dz}(z)\).

The convolution property is the algebraic engine of LTI analysis: convolution in time becomes multiplication in the z-domain.

2.3 System Function and Poles/Zeros

For an LTI system described by the difference equation \(\sum_k a_k y[n-k] = \sum_k b_k x[n-k]\), the system function is the rational function

\[ H(z) = \frac{\sum_{k=0}^{M} b_k\,z^{-k}}{\sum_{k=0}^{N} a_k\,z^{-k}} = \frac{B(z^{-1})}{A(z^{-1})}. \]

Multiplying numerator and denominator by \(z^{\max(M,N)}\) yields a ratio of polynomials in \(z\). The roots of \(B(z)\) are the zeros of \(H\); the roots of \(A(z)\) are the poles. The pole-zero plot in the \(z\)-plane is the primary tool for understanding the frequency-selective behavior of a filter.

A pole near the unit circle (radius \(r_p\) close to 1) makes the frequency response large near the pole's angle; a zero near the unit circle makes the response small near its angle. This geometric intuition — evaluate \(|H(e^{j\omega})|\) as the ratio of the product of distances from \(e^{j\omega}\) to the zeros over the product of distances from \(e^{j\omega}\) to the poles — is a powerful design heuristic.
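The distance-product rule is easy to verify numerically. A minimal sketch (assuming NumPy; the pole location is illustrative):

```python
import numpy as np

# Magnitude response from the pole-zero plot: ratio of products of
# distances from e^{jw} to the zeros over distances to the poles.
def mag_from_pz(zeros, poles, w, gain=1.0):
    z = np.exp(1j * w)
    num = np.prod([abs(z - q) for q in zeros]) if zeros else 1.0
    den = np.prod([abs(z - p) for p in poles]) if poles else 1.0
    return gain * num / den

# H(z) = 1/(1 - 0.9 z^{-1}) = z/(z - 0.9): pole at 0.9, zero at the
# origin (unit distance from every point of the circle, so it does not
# affect the magnitude).
peak = mag_from_pz([], [0.9], 0.0)       # at the pole's angle
trough = mag_from_pz([], [0.9], np.pi)   # far from the pole
assert peak > trough
assert abs(peak - abs(1 / (1 - 0.9 * np.exp(-1j * 0.0)))) < 1e-9
```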

2.4 Inverse Z-Transform

Three methods are standard:

Partial-fraction expansion. Express \(X(z)\) as a sum of simple terms, each corresponding to a known transform pair (with ROC consistent with the stated ROC). For a simple pole at \(z = a\):

\[ \frac{A}{1 - az^{-1}} \longleftrightarrow \begin{cases} A\,a^n\,u[n] & \text{if } |z| > |a|, \\ -A\,a^n\,u[-n-1] & \text{if } |z| < |a|. \end{cases} \]

Long division. Dividing numerator by denominator as power series in \(z^{-1}\) yields the causal sequence \(x[0], x[1], x[2], \ldots\) directly. This is particularly useful for computing the first few terms.

Contour integral. By the theory of complex analysis, \(x[n] = \frac{1}{2\pi j}\oint_C X(z)\,z^{n-1}\,dz\) where \(C\) is any counterclockwise contour inside the ROC. Application of the residue theorem recovers the partial-fraction result.
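The partial-fraction route can be checked numerically. The sketch below (assuming SciPy; the transfer function is chosen for illustration) expands \(X(z) = 1/((1-0.5z^{-1})(1-0.25z^{-1}))\) and reconstructs the causal sequence:

```python
import numpy as np
from scipy.signal import residuez

# Partial-fraction expansion of
#   X(z) = 1 / ((1 - 0.5 z^-1)(1 - 0.25 z^-1))
#        = 1 / (1 - 0.75 z^-1 + 0.125 z^-2).
b = [1.0]
a = [1.0, -0.75, 0.125]
r, p, k = residuez(b, a)        # residues r at poles p (k: direct terms)

# With ROC |z| > 0.5, each term A/(1 - a z^-1) inverts to A a^n u[n].
n = np.arange(6)
x = sum(ri * pi ** n for ri, pi in zip(r, p)).real
assert np.allclose(x, 2 * 0.5 ** n - 0.25 ** n)   # closed form
```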


Chapter 3: The Discrete-Time Fourier Transform

3.1 Definition and Convergence

Definition 3.1 (DTFT). The discrete-time Fourier transform of a sequence \(x[n]\) is \[ X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n}, \quad \omega \in [-\pi, \pi]. \]

The inverse DTFT recovers the sequence:

\[ x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\,e^{j\omega n}\,d\omega. \]

The DTFT converges if \(x[n]\) is absolutely summable (\(\ell^1\)). In \(\ell^2\), convergence holds in the mean-square sense with the Parseval relation

\[ \sum_{n=-\infty}^{\infty} |x[n]|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(e^{j\omega})|^2\,d\omega. \]

The DTFT is the z-transform evaluated on the unit circle: \(X(e^{j\omega}) = X(z)\big|_{z = e^{j\omega}}\). This is valid only when the unit circle lies inside the ROC — exactly the stability condition.

3.1.1 Periodicity and Symmetry

\(X(e^{j\omega})\) is periodic in \(\omega\) with period \(2\pi\). This is the discrete-time counterpart of the discrete (line) spectrum of periodic continuous-time signals, but here the roles are exchanged: a discrete (sampled) time domain gives a continuous, periodic frequency domain.

For real sequences, \(X(e^{-j\omega}) = X^*(e^{j\omega})\) (conjugate symmetry), which implies \(|X(e^{j\omega})|\) is even and \(\angle X(e^{j\omega})\) is odd.

3.2 Properties of the DTFT

The DTFT inherits linearity, time-shift (\(x[n-n_0] \leftrightarrow e^{-j\omega n_0} X(e^{j\omega})\)), frequency-shift (\(e^{j\omega_0 n} x[n] \leftrightarrow X(e^{j(\omega - \omega_0)})\)), and convolution (\((x * h)[n] \leftrightarrow X(e^{j\omega})\,H(e^{j\omega})\)) properties analogous to those of the z-transform.

Two properties deserve emphasis.

Theorem 3.1 (Parseval's theorem). \[ \sum_{n=-\infty}^{\infty} x[n]\,y^*[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\,Y^*(e^{j\omega})\,d\omega. \]

With \(y = x\) this gives the energy identity \(E_x = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\,d\omega\). The quantity \(S_{xx}(\omega) = |X(e^{j\omega})|^2\) is the energy spectral density.

Theorem 3.2 (Duality / modulation). Multiplication in time by \(e^{j\omega_0 n}\) corresponds to a cyclic shift in frequency. Multiplication by a real cosine \(\cos(\omega_0 n)\) splits the spectrum: \[ \cos(\omega_0 n)\,x[n] \;\longleftrightarrow\; \frac{1}{2}\left[X(e^{j(\omega - \omega_0)}) + X(e^{j(\omega + \omega_0)})\right]. \]

3.3 Frequency Response of LTI Systems

For a stable LTI system with impulse response \(h[n]\), the DTFT \(H(e^{j\omega})\) is the frequency response. The output to the input \(x[n]\) satisfies

\[ Y(e^{j\omega}) = H(e^{j\omega})\,X(e^{j\omega}). \]

The frequency response admits the polar decomposition \(H(e^{j\omega}) = |H(e^{j\omega})|\,e^{j\angle H(e^{j\omega})}\), where \(|H(e^{j\omega})|\) is the magnitude response (gain) and \(\angle H(e^{j\omega})\) is the phase response.

Definition 3.2 (Group delay). The group delay is \[ \tau(\omega) = -\frac{d}{d\omega}\angle H(e^{j\omega}). \]

It measures the delay (in samples) experienced by a narrowband signal centered at \(\omega\). A linear-phase filter has constant group delay, so all frequency components are delayed equally — a desirable property for applications requiring undistorted pulse shapes.
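The constant group delay of a symmetric FIR filter can be confirmed directly. A sketch assuming SciPy (the coefficients are an arbitrary symmetric example):

```python
import numpy as np
from scipy.signal import group_delay

# A symmetric FIR impulse response h[n] = h[M - n] (here M = 4) has
# linear phase, so the group delay is the constant tau = M/2 = 2 samples.
h = [1.0, 2.0, 3.0, 2.0, 1.0]
w, gd = group_delay((h, [1.0]), w=8)   # 8 frequencies in [0, pi)
assert np.allclose(gd, 2.0)
```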


Chapter 4: Sampling Theory and Digital Processing of Analog Signals

4.1 The Sampling Theorem

The bridge between continuous-time and discrete-time processing is uniform sampling. Let \(x_a(t)\) be a continuous-time signal with Fourier transform \(X_a(\Omega)\), and suppose \(x_a\) is bandlimited: \(X_a(\Omega) = 0\) for \(|\Omega| > \Omega_N\).

Theorem 4.1 (Nyquist–Shannon Sampling Theorem). If the sampling frequency satisfies \(\Omega_s = 2\pi/T_s > 2\Omega_N\), then \(x_a(t)\) can be exactly recovered from its samples \(x[n] = x_a(nT_s)\) via \[ x_a(t) = \sum_{n=-\infty}^{\infty} x[n]\,\text{sinc}\!\left(\frac{t - nT_s}{T_s}\right), \]

where \(\text{sinc}(t) = \sin(\pi t)/(\pi t)\).

The condition \(\Omega_s > 2\Omega_N\) is the Nyquist criterion. The frequency \(\Omega_N\) is the Nyquist frequency (not to be confused with the Nyquist rate \(2\Omega_N\)).

The relationship between discrete-time frequency \(\omega\) and analog frequency \(\Omega\) is

\[ \omega = \Omega\,T_s = \frac{2\pi\Omega}{\Omega_s}. \]

The normalized Nyquist frequency \(\omega = \pi\) corresponds to half the sampling rate \(\Omega_s/2\).

4.1.1 Aliasing

When the Nyquist criterion is violated (\(\Omega_s \le 2\Omega_N\)), the copies of \(X_a(\Omega)\) centered at multiples of \(\Omega_s\) overlap, causing aliasing. The aliased spectrum of the sampled sequence is

\[ X(e^{j\omega}) = \frac{1}{T_s}\sum_{k=-\infty}^{\infty} X_a\!\left(\frac{\omega - 2\pi k}{T_s}\right). \]

Aliasing is irreversible: once samples are taken at too low a rate, the information lost cannot be recovered. The remedy is to apply an analog anti-aliasing filter (lowpass with cutoff \(\Omega_s/2\)) before sampling.
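A concrete numerical illustration of aliasing (assuming NumPy; the frequencies are illustrative): a 7 kHz tone sampled at 10 kHz is indistinguishable from a 3 kHz tone.

```python
import numpy as np

# A 7 kHz cosine sampled at fs = 10 kHz (below the Nyquist rate of
# 14 kHz) yields exactly the same samples as a 3 kHz cosine: the alias
# appears at |7000 - 10000| = 3000 Hz.
fs = 10_000.0
n = np.arange(32)
x_true = np.cos(2 * np.pi * 7_000 * n / fs)
x_alias = np.cos(2 * np.pi * 3_000 * n / fs)
assert np.allclose(x_true, x_alias)
```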

4.2 Quantization

Practical analog-to-digital conversion also involves quantization: rounding the sampled amplitude to the nearest representable value. For a uniform \(b\)-bit quantizer with full-scale range \([-X_{\max}, X_{\max}]\), the step size is

\[ \Delta = \frac{2X_{\max}}{2^b}. \]

Under the standard model — quantization error is uniformly distributed on \((-\Delta/2, \Delta/2)\) and uncorrelated with the signal — the quantization noise power is

\[ \sigma_q^2 = \frac{\Delta^2}{12} = \frac{X_{\max}^2}{3 \cdot 2^{2b}}. \]

For a sinusoid with amplitude \(X_{\max}\), the signal power is \(X_{\max}^2/2\), and the signal-to-quantization-noise ratio is

\[ \text{SQNR} = \frac{X_{\max}^2/2}{X_{\max}^2/(3 \cdot 2^{2b})} = \frac{3}{2}\cdot 2^{2b}. \]

In decibels: \(\text{SQNR} \approx 6.02\,b + 1.76\) dB. Each additional bit of resolution adds approximately 6 dB of SQNR — one of the most useful rules of thumb in DSP engineering.
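The 6 dB/bit rule can be checked empirically. A Monte-Carlo sketch (assuming NumPy; the bit depth and test frequency are illustrative):

```python
import numpy as np

# Empirical check of SQNR ~ 6.02 b + 1.76 dB: quantize a full-scale
# sinusoid with a uniform b-bit rounding quantizer and measure the
# signal-to-noise ratio (frequency chosen incommensurate with the
# sample grid so the error decorrelates).
b_bits = 12
Xmax = 1.0
delta = 2 * Xmax / 2**b_bits
n = np.arange(100_000)
x = Xmax * np.sin(2 * np.pi * 0.1234567 * n)
xq = delta * np.round(x / delta)                 # uniform quantizer
sqnr_db = 10 * np.log10(np.mean(x**2) / np.mean((x - xq) ** 2))
rule_db = 6.02 * b_bits + 1.76
assert abs(sqnr_db - rule_db) < 1.0
```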


Chapter 5: The Discrete Fourier Transform

5.1 Motivation: From DTFT to DFT

The DTFT \(X(e^{j\omega})\) is a continuous function of frequency and cannot be directly computed. For computational purposes we need a finite, discrete representation. The Discrete Fourier Transform (DFT) is obtained by sampling the DTFT at \(N\) uniformly spaced frequencies on the unit circle.

Definition 5.1 (N-point DFT and IDFT). For a length-\(N\) sequence \(x[n]\), \(n = 0, 1, \ldots, N-1\), the DFT is \[ X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{kn}, \quad k = 0, 1, \ldots, N-1, \]

where \(W_N = e^{-j2\pi/N}\) is the primitive \(N\)th root of unity. The inverse DFT (IDFT) is

\[ x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,W_N^{-kn}, \quad n = 0, 1, \ldots, N-1. \]

The DFT sample at index \(k\) corresponds to the analog frequency \(\Omega_k = k\,\Omega_s/N\) or the discrete-time frequency \(\omega_k = 2\pi k/N\). The frequency resolution is \(\Delta\omega = 2\pi/N\), which in analog terms is \(\Delta\Omega = \Omega_s/N = 2\pi/(NT_s)\), i.e. \(\Delta f = 1/(NT_s)\) in hertz. To improve frequency resolution, one either increases \(N\) or decreases \(\Omega_s\) (slower sampling).

5.1.1 Relationship to the DTFT

For a length-\(N\) sequence padded with zeros if necessary, the DFT is a sampled version of the DTFT:

\[ X[k] = X(e^{j\omega})\big|_{\omega = 2\pi k/N}. \]

This is the key interpretive equation: the DFT gives exact DTFT values at the analysis frequencies, but interpolation is required to estimate the DTFT between those frequencies.
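Definition 5.1 can be evaluated directly and checked against a library FFT. A minimal sketch (assuming NumPy; the test signal is random):

```python
import numpy as np

# O(N^2) evaluation of the DFT definition X[k] = sum_n x[n] W_N^{kn},
# checked against numpy's FFT.
def dft(x):
    N = len(x)
    k = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(k, k) / N)   # matrix of W_N^{kn}
    return W @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
X = dft(x)
assert np.allclose(X, np.fft.fft(x))
```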

5.2 Circular Convolution

The DFT converts convolution into pointwise multiplication — but only circular convolution, not the linear convolution relevant to filtering.

Definition 5.2 (Circular convolution). The \(N\)-point circular convolution of two length-\(N\) sequences \(x[n]\) and \(h[n]\) is \[ y[n] = \sum_{k=0}^{N-1} x[k]\,h\!\left[\langle n - k \rangle_N\right], \quad n = 0, \ldots, N-1, \]

where \(\langle m \rangle_N = m \bmod N\).

Theorem 5.1 (Circular convolution theorem). If \(y[n]\) is the \(N\)-point circular convolution of \(x[n]\) and \(h[n]\), then \[ Y[k] = X[k]\,H[k], \quad k = 0, 1, \ldots, N-1. \]

Circular convolution wraps the output: values that would extend beyond index \(N-1\) are folded back modulo \(N\). This differs from linear convolution unless specific precautions are taken.

5.2.1 Linear Convolution via the DFT: Zero-Padding

To compute the linear convolution of a length-\(L\) sequence \(x[n]\) with a length-\(M\) sequence \(h[n]\) using the DFT, one zero-pads both sequences to length \(N \ge L + M - 1\) before computing DFTs. The inverse DFT of the product \(X[k]\,H[k]\) then yields the linear convolution.

Zero-padding to \(N \ge L + M - 1\) ensures that the circular convolution coincides with linear convolution: the linear convolution has length exactly \(L + M - 1\), so it fits within one \(N\)-point period and the wrap-around ("time aliasing") of the circular convolution vanishes. This is the cornerstone of efficient convolution algorithms such as overlap-add and overlap-save, which are essential for real-time FIR filtering of long streams.

Zero-padding also increases the apparent frequency resolution of the DFT display (by inserting more sample points between existing DTFT samples), but it does not improve the fundamental spectral resolution, which is determined by the data length.
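The zero-padding recipe for linear convolution takes only a few lines. A sketch assuming NumPy (the sequences are illustrative):

```python
import numpy as np

# Linear convolution via the DFT: zero-pad to N >= L + M - 1, multiply
# spectra, invert.  np.fft.fft(x, N) zero-pads x to length N.
x = np.array([1.0, 2.0, 3.0, 4.0])    # L = 4
h = np.array([1.0, 1.0, 1.0])         # M = 3
N = len(x) + len(h) - 1               # N = 6
y = np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)).real
assert np.allclose(y, np.convolve(x, h))
```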

5.3 Spectral Leakage and Windowing

5.3.1 The Windowing Phenomenon

In practice, the DFT is applied to a finite-length observation of a signal. If the signal is a pure sinusoid \(x[n] = e^{j\omega_0 n}\) and \(\omega_0\) is not exactly one of the DFT analysis frequencies \(2\pi k/N\), the DFT exhibits spectral leakage: energy from the true frequency “leaks” into neighboring bins.

The mechanism is explicit. Applying an \(N\)-point rectangular window is equivalent to multiplying by \(w[n] = 1\) for \(n = 0, \ldots, N-1\) and zero elsewhere. In the frequency domain, multiplication becomes convolution with the DTFT of the window:

\[ W_R(e^{j\omega}) = \sum_{n=0}^{N-1} e^{-j\omega n} = e^{-j\omega(N-1)/2}\,\frac{\sin(\omega N/2)}{\sin(\omega/2)}. \]

This is a Dirichlet kernel (discrete sinc function) with a main lobe of width approximately \(4\pi/N\) radians and sidelobes that decay slowly (only as \(1/\omega\)). When the signal frequency does not fall on a DFT bin, the Dirichlet kernel is evaluated off its peak, and its sidelobes spread energy into all DFT bins.

5.3.2 Window Functions

The remedy is to replace the rectangular window with a smoothly tapered window that has smaller sidelobes, at the cost of a wider main lobe.

Definition 5.3 (Common window functions). For \(n = 0, 1, \ldots, N-1\):
  • Rectangular: \(w_R[n] = 1\). Main-lobe width \(\approx 4\pi/N\); peak sidelobe level \(-13\) dB.
  • Hann (Hanning): \(w_H[n] = 0.5 - 0.5\cos(2\pi n/(N-1))\). Peak sidelobe \(-31\) dB; main-lobe width \(\approx 8\pi/N\).
  • Hamming: \(w_{Ha}[n] = 0.54 - 0.46\cos(2\pi n/(N-1))\). Peak sidelobe \(-41\) dB; main-lobe width \(\approx 8\pi/N\). The coefficients \(0.54, 0.46\) are chosen to null the first sidelobe.
  • Blackman: \(w_B[n] = 0.42 - 0.5\cos(2\pi n/(N-1)) + 0.08\cos(4\pi n/(N-1))\). Peak sidelobe \(-57\) dB; main-lobe width \(\approx 12\pi/N\).
  • Kaiser: \(w_K[n] = I_0(\beta\sqrt{1 - (2n/(N-1) - 1)^2})/I_0(\beta)\), where \(I_0\) is the zeroth-order modified Bessel function. The parameter \(\beta\) controls the sidelobe–main-lobe tradeoff continuously.

The fundamental tradeoff is universal: reducing sidelobe level widens the main lobe and hence reduces the ability to resolve two closely spaced frequencies. There is no window that simultaneously achieves arbitrarily small sidelobes and arbitrarily narrow main lobe — this is a consequence of the uncertainty principle.

Example 5.1. Consider two sinusoids at frequencies \(\omega_1 = 0.4\pi\) and \(\omega_2 = 0.5\pi\), observed over \(N = 64\) samples; their separation is \(0.1\pi \approx 0.31\) rad/sample. The rectangular window resolves them easily but shows strong leakage from each into the other. The Hann window reduces leakage dramatically but makes the peaks broader: its main-lobe half-width is \(4\pi/N = \pi/16 \approx 0.20\) rad/sample, so the two peaks are still resolved, but any pair of frequencies separated by less than about \(0.20\) rad/sample would merge into a single peak.
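The leakage behavior of Example 5.1 can be probed numerically. A sketch assuming NumPy (the probe bin, far from the signal peak, is an arbitrary choice for measuring far-off leakage):

```python
import numpy as np

# Leakage comparison: a sinusoid between DFT bins (0.4*pi maps to bin
# 0.4*pi*64/(2*pi) = 12.8) through rectangular and Hann windows.
N = 64
n = np.arange(N)
x = np.cos(0.4 * np.pi * n + 1.0)
X_rect = np.abs(np.fft.rfft(x))                  # rectangular window
X_hann = np.abs(np.fft.rfft(x * np.hanning(N)))  # Hann window
leak_rect = X_rect[30] / X_rect.max()            # leakage far from peak
leak_hann = X_hann[30] / X_hann.max()
assert leak_hann < leak_rect   # Hann's low sidelobes suppress leakage
```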

Chapter 6: The Fast Fourier Transform

6.1 Computational Complexity of the DFT

The direct implementation of the \(N\)-point DFT requires \(N^2\) complex multiplications and \(N(N-1)\) complex additions — \(O(N^2)\) operations. For \(N = 2^{10} = 1024\) this is over a million complex multiplications. The Fast Fourier Transform (FFT) reduces this to \(O(N\log N)\): approximately \(\frac{N}{2}\log_2 N\) complex multiplications for a radix-2 algorithm. The speedup at \(N = 1024\) is by a factor of about 200.

6.2 Radix-2 Decimation-in-Time FFT

The Cooley–Tukey radix-2 decimation-in-time (DIT) FFT exploits the periodicity and symmetry of \(W_N\). For \(N = 2^m\), split the DFT sum into even-indexed and odd-indexed samples:

\[ X[k] = \sum_{n=0}^{N/2-1} x[2n]\,W_N^{k(2n)} + \sum_{n=0}^{N/2-1} x[2n+1]\,W_N^{k(2n+1)}. \]

Since \(W_N^{2kn} = W_{N/2}^{kn}\), each sum is an \(N/2\)-point DFT:

\[ X[k] = G[k] + W_N^k\,H[k], \quad k = 0, \ldots, N-1, \]

where \(G[k] = \text{DFT}_{N/2}\{x[2n]\}\) and \(H[k] = \text{DFT}_{N/2}\{x[2n+1]\}\), both extended periodically to length \(N\) (so \(G[k + N/2] = G[k]\), \(H[k + N/2] = H[k]\)).

Theorem 6.1 (Butterfly structure). Using the symmetry \(W_N^{k + N/2} = -W_N^k\), each of the \(N\) outputs can be computed from the two sub-DFTs as: \[ X[k] = G[k] + W_N^k\,H[k], \qquad X[k + N/2] = G[k] - W_N^k\,H[k], \quad k = 0, \ldots, \frac{N}{2}-1. \]

This two-element computation — one complex multiplication by \(W_N^k\) and two complex additions — is called a butterfly.

Applying the decomposition recursively for \(\log_2 N\) stages yields the complete radix-2 DIT FFT. The total arithmetic count is:

  • Complex multiplications: \(\frac{N}{2}\log_2 N\) (exactly, when the trivial twiddle factors \(W_N^0 = 1\) are counted).
  • Complex additions: \(N\log_2 N\).
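The recursive decomposition above can be written directly. A minimal, non-optimized sketch (assuming NumPy; a production FFT would be iterative and in-place):

```python
import numpy as np

# Minimal recursive radix-2 decimation-in-time FFT (N a power of two).
# Each stage applies the butterfly X[k] = G[k] + W^k H[k],
# X[k + N/2] = G[k] - W^k H[k].
def fft_dit(x):
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    G = fft_dit(x[0::2])    # N/2-point DFT of even-indexed samples
    H = fft_dit(x[1::2])    # N/2-point DFT of odd-indexed samples
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # twiddle factors
    return np.concatenate([G + W * H, G - W * H])

x = np.random.default_rng(1).standard_normal(32)
assert np.allclose(fft_dit(x), np.fft.fft(x))
```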

6.2.1 Bit-Reversal Permutation

The recursion rearranges the input in a specific order: the input \(x[n]\) is placed at position given by bit-reversing the binary representation of \(n\). For \(N = 8\):

Natural order n    Binary    Bit-reversed    Scrambled index
0                  000       000             0
1                  001       100             4
2                  010       010             2
3                  011       110             6
4                  100       001             1
5                  101       101             5
6                  110       011             3
7                  111       111             7

The output is in natural order.
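The permutation is a three-line function. A sketch reproducing the table's scrambled order for \(N = 8\):

```python
# Bit-reversal permutation for N = 2^m: input sample x[n] is placed at
# the position obtained by reversing the m-bit binary expansion of n.
def bit_reverse(n, m):
    r = 0
    for _ in range(m):
        r = (r << 1) | (n & 1)   # shift accumulated bits, append LSB of n
        n >>= 1
    return r

scrambled = [bit_reverse(n, 3) for n in range(8)]
assert scrambled == [0, 4, 2, 6, 1, 5, 3, 7]
```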

6.3 Decimation-in-Frequency FFT

An alternative factorization splits the output indices into even and odd, giving the decimation-in-frequency (DIF) algorithm. For \(N = 2^m\):

\[ X[2k] = \sum_{n=0}^{N/2-1} \left(x[n] + x\!\left[n + \frac{N}{2}\right]\right) W_{N/2}^{kn}, \]\[ X[2k+1] = \sum_{n=0}^{N/2-1} \left(x[n] - x\!\left[n + \frac{N}{2}\right]\right) W_N^n\,W_{N/2}^{kn}. \]

The structure is complementary: the input is in natural order and the output requires bit-reversal permutation. The arithmetic count is identical to DIT.

6.4 Extensions: Prime Factor and Split-Radix FFT

When \(N\) is not a power of two, prime-factor algorithms (Good–Thomas) decompose the DFT using the Chinese Remainder Theorem when \(N = N_1 N_2\) with \(\gcd(N_1, N_2) = 1\). The split-radix FFT (Yavne, 1968) uses radix-2 for the even-indexed output and radix-4 for the odd-indexed output, achieving the lowest known exact multiply count for power-of-two transforms.


Chapter 7: FIR Filter Design

7.1 Design Specifications

A digital filter is specified by its desired frequency response \(H_d(e^{j\omega})\). For a lowpass filter with passband edge \(\omega_p\), stopband edge \(\omega_s\), passband ripple \(\delta_p\), and stopband ripple \(\delta_s\), the specifications are

\[ 1 - \delta_p \le |H(e^{j\omega})| \le 1 + \delta_p \quad \text{for } |\omega| \le \omega_p, \]

\[ |H(e^{j\omega})| \le \delta_s \quad \text{for } \omega_s \le |\omega| \le \pi. \]

The transition band \([\omega_p, \omega_s]\) is a “don’t-care” region. The four parameters \((\omega_p, \omega_s, \delta_p, \delta_s)\) fully specify the design problem.

7.2 Linear-Phase FIR Filters

A length-\((M+1)\) FIR filter has impulse response \(h[n]\) for \(n = 0, \ldots, M\). The system function is a polynomial in \(z^{-1}\), so its only poles lie at the origin \(z = 0\); every FIR filter is therefore BIBO stable.

Theorem 7.1 (Linear-phase condition). An FIR filter has (generalized) linear phase \(\angle H(e^{j\omega}) = -\omega M/2 + \beta\), with \(\beta = 0\) in the symmetric case and \(\beta = \pi/2\) in the antisymmetric case, if and only if its impulse response satisfies the symmetry condition \[ h[n] = \pm\,h[M - n] \quad \text{for all } n. \]

Symmetric (\(+\)) filters are called Type I (even \(M\)) or Type II (odd \(M\)); antisymmetric (\(-\)) filters are Type III (even \(M\)) or Type IV (odd \(M\)).

Proof (symmetric case). Write the frequency response: \[ H(e^{j\omega}) = \sum_{n=0}^{M} h[n]\,e^{-j\omega n} = e^{-j\omega M/2} \sum_{n=0}^{M} h[n]\,e^{-j\omega(n - M/2)}. \]

Under the symmetry \(h[n] = h[M-n]\), the inner sum satisfies \(\tilde{H}(\omega) = \sum_{n=0}^{M} h[n]\cos(\omega(n - M/2))\), which is a real cosine sum. Therefore \(H(e^{j\omega}) = e^{-j\omega M/2}\tilde{H}(\omega)\) with \(\tilde{H}(\omega)\) real, giving exactly linear phase.

Linear-phase filters introduce constant group delay \(\tau = M/2\) samples at all frequencies — a critical property for audio and communications applications.

7.3 Window Method

The window method is the simplest approach to FIR design.

Step 1. Specify the ideal (infinite-duration) filter. For a lowpass filter with cutoff \(\omega_c\):

\[ h_d[n] = \frac{\omega_c}{\pi}\,\text{sinc}(\omega_c n / \pi) = \frac{\sin(\omega_c n)}{\pi n}, \quad n \ne 0, \quad h_d[0] = \frac{\omega_c}{\pi}. \]

Step 2. Multiply by a window of length \(M+1\):

\[ h[n] = h_d\!\left[n - \frac{M}{2}\right] w[n], \quad n = 0, \ldots, M. \]

The shift by \(M/2\) centers the ideal response to achieve causality while preserving linear phase.

Step 3. The frequency response is the convolution of the ideal response with the window’s spectrum. The window determines the transition-band width and stopband attenuation.

The window method is easy to apply but gives only approximate control over the passband and stopband independently. The Kaiser window with adjustable \(\beta\) offers better control: approximate formulas relate \(\beta\) and the minimum filter length \(M\) to the desired attenuation \(A_s = -20\log_{10}(\delta_s)\) (in dB): \[ \beta = \begin{cases} 0 & A_s \le 21, \\ 0.5842(A_s - 21)^{0.4} + 0.07886(A_s - 21) & 21 < A_s \le 50, \\ 0.1102(A_s - 8.7) & A_s > 50, \end{cases} \]\[ M \approx \frac{A_s - 7.95}{2.285(\omega_s - \omega_p)}. \]
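SciPy's `kaiserord` and `firwin` implement these Kaiser formulas; the sketch below (edge frequencies and attenuation target are illustrative) designs a lowpass filter and verifies the stopband spec:

```python
import numpy as np
from scipy.signal import firwin, freqz, kaiserord

# Kaiser-window lowpass: passband edge 0.3*pi, stopband edge 0.4*pi,
# 60 dB stopband attenuation.  scipy normalizes frequencies so that
# 1.0 corresponds to omega = pi.
A_s = 60.0
width = 0.4 - 0.3                              # transition width / pi
numtaps, beta = kaiserord(A_s, width)
h = firwin(numtaps, 0.35, window=("kaiser", beta))   # cutoff mid-transition
assert np.allclose(h, h[::-1])                 # linear phase by symmetry

w, H = freqz(h, worN=2048)
stop_db = 20 * np.log10(np.abs(H[w >= 0.4 * np.pi]).max())
assert stop_db < -55                           # close to the 60 dB target
```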

7.4 Frequency Sampling Method

In the frequency-sampling method, the desired frequency response is specified at the \(N\) DFT frequencies \(H_d[k] = H_d(e^{j2\pi k/N})\), \(k = 0, \ldots, N-1\). The filter impulse response is the IDFT:

\[ h[n] = \frac{1}{N}\sum_{k=0}^{N-1} H_d[k]\,e^{j2\pi kn/N}. \]

The resulting filter matches the specification exactly at the sampled frequencies. The behavior between samples depends on interpolation by the DFT basis functions. By optimizing the values of \(H_d[k]\) in the transition band (allowing them to vary freely), one can sometimes achieve better stopband attenuation than the window method.

7.5 Optimal Equiripple Design: Parks–McClellan Algorithm

The Parks–McClellan algorithm finds the FIR filter of given order \(M\) that minimizes the maximum weighted approximation error in the passband and stopband — a minimax optimization problem.

7.5.1 Chebyshev Approximation Theory

For a Type I linear-phase FIR filter of odd length \(M+1\) (even order \(M\)), the frequency response is

\[ H(e^{j\omega}) = e^{-j\omega M/2} P(\omega), \]

where

\[ P(\omega) = \sum_{k=0}^{M/2} a_k\cos(k\omega) \]

is a real-valued cosine polynomial. The design problem becomes: find the coefficients \(a_k\) that minimize

\[ \max_{\omega \in F} W(\omega)\left|P(\omega) - D(\omega)\right|, \]

where \(F = [0, \omega_p] \cup [\omega_s, \pi]\) is the set of constrained frequencies, \(D(\omega)\) is the desired response (1 in the passband, 0 in the stopband), and \(W(\omega)\) is a positive weight function.

Theorem 7.2 (Alternation / Equiripple theorem — Chebyshev). The optimal solution \(P^*(\omega)\) is characterized by having at least \(M/2 + 2\) alternations: there exist at least \(M/2 + 2\) frequency points \(\omega_1 < \omega_2 < \cdots\) in \(F\) where the weighted error alternates in sign and achieves its maximum magnitude: \[ W(\omega_i)\left[P^*(\omega_i) - D(\omega_i)\right] = (-1)^i\,\delta, \quad i = 1, 2, \ldots, M/2 + 2. \]

This theorem guarantees that the optimal filter has equal ripple height in the passband and stopband (after weighting), which is why the algorithm produces equiripple filters.

7.5.2 The Remez Exchange Algorithm

The Parks–McClellan algorithm implements the Remez exchange iteration to find the optimal alternation points:

  1. Initialize a trial set of \(M/2 + 2\) alternation frequencies.
  2. Compute the optimal ripple \(\delta\) and the polynomial \(P(\omega)\) that interpolates the alternation condition at the trial frequencies (using the barycentric form of Lagrange interpolation, which is numerically stable).
  3. Find the actual extrema of \(W(\omega)[P(\omega) - D(\omega)]\) over \(F\).
  4. Replace the trial set with the new extremal frequencies.
  5. Repeat until convergence (typically 5–10 iterations).

The algorithm converges to the unique optimal solution. Compared to the window method, Parks–McClellan filters achieve lower order for given specifications, at the cost of more computational design effort (a one-time cost).
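The procedure above is implemented in `scipy.signal.remez`. A minimal design sketch (assuming SciPy is available; the order \(M = 30\) and band edges \(0.3\pi\), \(0.4\pi\) are illustrative choices, not values from the text):

```python
import numpy as np
from scipy import signal

M = 30                                   # filter order (length M + 1 = 31)
# Band edges with fs = 2, so 1.0 corresponds to omega = pi:
# passband [0, 0.3*pi] with desired value 1, stopband [0.4*pi, pi] with 0.
h = signal.remez(M + 1, [0, 0.3, 0.4, 1.0], [1, 0], fs=2)

# The optimal filter is Type I linear phase (symmetric impulse response).
# With unit weights the passband and stopband ripples come out equal,
# as the alternation theorem predicts.
w, H = signal.freqz(h, worN=8192)
mag = np.abs(H)
delta_pass = np.max(np.abs(mag[w <= 0.3 * np.pi] - 1.0))
delta_stop = np.max(mag[w >= 0.4 * np.pi])
```

Plotting `mag` shows the characteristic equiripple behavior in both bands; `delta_pass` and `delta_stop` agree to within the grid resolution.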


Chapter 8: IIR Filter Design

8.1 Analog Prototype Filters

IIR digital filters are typically designed by transforming a classical analog prototype. The analog prototype is a continuous-time lowpass filter with system function \(H_a(s)\), where \(s = \sigma + j\Omega\) is the Laplace variable.

8.1.1 Butterworth Filters

Definition 8.1 (Butterworth filter). The Nth-order Butterworth filter has magnitude-squared response \[ |H_a(j\Omega)|^2 = \frac{1}{1 + (\Omega/\Omega_c)^{2N}}, \]

where \(\Omega_c\) is the 3 dB cutoff frequency.

Key properties:

  • The magnitude response is maximally flat at \(\Omega = 0\): all derivatives of \(|H_a(j\Omega)|^2\) up to order \(2N-1\) vanish at \(\Omega = 0\).
  • The response is monotone in both passband and stopband.
  • The poles lie on a circle of radius \(\Omega_c\) in the left half of the \(s\)-plane, at angles \(\pi/2 + (2k-1)\pi/(2N)\) for \(k = 1, \ldots, 2N\); the causal stable system takes the \(N\) left-half-plane poles.

The filter order required to meet specifications \(\Omega_p, \Omega_s, \delta_p, \delta_s\) is

\[ N \ge \frac{\log\!\left[\sqrt{\delta_s^{-2} - 1}\,\big/\,\sqrt{(1-\delta_p)^{-2} - 1}\right]}{\log(\Omega_s/\Omega_p)}. \]

(Round up to the next integer.)
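The order formula is a one-liner in code. A sketch (NumPy assumed; the example specifications are illustrative):

```python
import numpy as np

def butterworth_order(omega_p, omega_s, delta_p, delta_s):
    """Minimum Butterworth order meeting analog lowpass specs:
    |H| >= 1 - delta_p for Omega <= omega_p,
    |H| <= delta_s     for Omega >= omega_s."""
    ratio = np.sqrt(delta_s**-2 - 1) / np.sqrt((1 - delta_p)**-2 - 1)
    N = np.log(ratio) / np.log(omega_s / omega_p)
    return int(np.ceil(N))          # round up to the next integer

# Example (assumed specs): one octave of transition, 10% passband
# droop allowed, 40 dB (delta_s = 0.01) of stopband attenuation.
N = butterworth_order(omega_p=1.0, omega_s=2.0, delta_p=0.1, delta_s=0.01)
```

Tightening the stopband from \(\delta_s = 0.1\) to \(\delta_s = 0.01\) raises the required order from 5 to 8 in this example, illustrating the logarithmic dependence on the attenuation specs.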

8.1.2 Chebyshev Type I Filters

The Type I Chebyshev filter has equiripple behavior in the passband and monotone rolloff in the stopband:

\[ |H_a(j\Omega)|^2 = \frac{1}{1 + \varepsilon^2\,T_N^2(\Omega/\Omega_p)}, \]

where \(T_N(\cdot)\) is the \(N\)th Chebyshev polynomial and \(\varepsilon^2 = (1-\delta_p)^{-2} - 1\). Chebyshev polynomials satisfy the recurrence

\[ T_0(x) = 1, \quad T_1(x) = x, \quad T_N(x) = 2x\,T_{N-1}(x) - T_{N-2}(x), \]

and oscillate between \(-1\) and \(1\) for \(|x| \le 1\), growing as \(\cosh(N\cosh^{-1}(x))\) for \(x > 1\). For the same specifications, a Type I Chebyshev filter requires lower order than a Butterworth.
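The recurrence and the two closed forms are easy to cross-check numerically (NumPy assumed; a small sketch):

```python
import numpy as np

def cheb_T(N, x):
    """Evaluate the Chebyshev polynomial T_N(x) via the recurrence."""
    Tm2, Tm1 = np.ones_like(x), x        # T_0 and T_1
    if N == 0:
        return Tm2
    for _ in range(2, N + 1):
        Tm2, Tm1 = Tm1, 2 * x * Tm1 - Tm2
    return Tm1

x = np.linspace(-1, 1, 1001)
T5 = cheb_T(5, x)
# On [-1, 1]: T_N oscillates between -1 and 1 and equals cos(N*arccos(x)).
# For x > 1:  T_N grows as cosh(N*arccosh(x)) -- the source of the
#             Chebyshev filter's fast stopband rolloff.
```

The equiripple oscillation of \(T_N\) on \([-1,1]\) is exactly what produces the equiripple passband of the Type I filter.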

8.1.3 Chebyshev Type II and Elliptic Filters

The Type II Chebyshev filter is equiripple in the stopband and monotone in the passband (complementary to Type I). The elliptic (Cauer) filter achieves equiripple in both passband and stopband, giving the lowest possible order for given specifications — but at the cost of a more complex pole-zero structure involving Jacobi elliptic functions. For a given filter order, the hierarchy of sharpness is:

\[ \text{Butterworth} < \text{Chebyshev I or II} < \text{Elliptic}. \]

8.2 Analog-to-Digital Transformations

8.2.1 Impulse Invariance

The impulse-invariance method constructs a digital filter whose impulse response is a sampled version of the analog prototype’s impulse response:

\[ h[n] = T_s\,h_a(nT_s). \]

For an analog prototype with partial-fraction expansion

\[ H_a(s) = \sum_{k=1}^{N} \frac{A_k}{s - s_k}, \]

the impulse-invariance mapping gives

\[ H(z) = \sum_{k=1}^{N} \frac{A_k}{1 - e^{s_k T_s}\,z^{-1}}. \]

Each pole at \(s_k\) in the \(s\)-plane maps to a pole at \(e^{s_k T_s}\) in the \(z\)-plane.

Impulse invariance preserves the time-domain shape of the impulse response but suffers from aliasing in the frequency domain: the digital filter's frequency response is the aliased sum of the analog prototype's frequency response. For a lowpass prototype that decays rapidly above the passband (like Butterworth or Chebyshev), the aliasing is small. However, the method is inappropriate for highpass or bandstop designs, where aliasing creates severe distortion. The frequency relationship is also linear: \(\omega = \Omega T_s\), so there is no warping of the frequency axis.
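The pole mapping can be exercised end to end (SciPy assumed; the second-order Butterworth prototype with cutoff 1 rad/s and \(T_s = 0.5\) are illustrative values):

```python
import numpy as np
from scipy import signal

Ts = 0.5
b_a, a_a = signal.butter(2, 1.0, analog=True)   # analog prototype H_a(s)

# Partial fractions: H_a(s) = A_0/(s - s_0) + A_1/(s - s_1)
A, s_k, _ = signal.residue(b_a, a_a)
p = np.exp(s_k * Ts)                 # each pole s_k maps to exp(s_k*Ts)

# Recombine Ts*A_0/(1 - p_0 z^{-1}) + Ts*A_1/(1 - p_1 z^{-1}) over a
# common denominator; np.poly builds (1 - p_0 z^{-1})(1 - p_1 z^{-1}).
a_d = np.real(np.poly(p))
b_d = Ts * np.real(A[0] * np.poly([p[1]]) + A[1] * np.poly([p[0]]))

# The digital impulse response must equal Ts * h_a(n*Ts),
# with h_a(t) = A_0 e^{s_0 t} + A_1 e^{s_1 t} for t >= 0.
n = np.arange(100)
delta = np.zeros(100); delta[0] = 1.0
h_dig = signal.lfilter(b_d, a_d, delta)
h_ana = np.real(A[0] * np.exp(s_k[0] * n * Ts) + A[1] * np.exp(s_k[1] * n * Ts))
```

Since the analog poles are in the left half-plane, the mapped poles \(e^{s_k T_s}\) land strictly inside the unit circle, so stability is preserved.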

8.2.2 Bilinear Transform

The bilinear transform is the standard method for IIR design, avoiding aliasing entirely. It replaces the Laplace variable \(s\) by

\[ s = \frac{2}{T_s}\,\frac{1 - z^{-1}}{1 + z^{-1}}. \]
Theorem 8.1 (Bilinear mapping properties).
  1. The left half of the \(s\)-plane (\(\text{Re}(s) < 0\)) maps to the interior of the unit disk in the \(z\)-plane (\(|z| < 1\)).
  2. The imaginary axis \(s = j\Omega\) maps to the unit circle \(z = e^{j\omega}\).
  3. The mapping between analog and digital frequencies is the nonlinear (warping) relation: \[ \Omega = \frac{2}{T_s}\tan\!\left(\frac{\omega}{2}\right). \]

Property 2 guarantees no aliasing: the entire analog frequency axis \((-\infty, \infty)\) maps bijectively onto one period \((-\pi, \pi]\) of the digital frequency axis. Property 3 is the price: the frequency axis is nonlinearly compressed (“warped”). High analog frequencies are mapped to digital frequencies near \(\pm\pi\).

8.2.3 Pre-Warping

To compensate for warping, the analog prototype’s critical frequencies (passband and stopband edges) are pre-warped before the prototype is designed. If the digital filter must have a passband edge at \(\omega_p\), the analog prototype is designed with edge

\[ \Omega_p = \frac{2}{T_s}\tan\!\left(\frac{\omega_p}{2}\right). \]

After designing \(H_a(s)\) with this analog specification, applying the bilinear substitution \(s \to \frac{2}{T_s}\frac{1-z^{-1}}{1+z^{-1}}\) gives a digital filter \(H(z)\) with the digital passband edge exactly at \(\omega_p\).

Example 8.1 (First-order lowpass). Design a first-order Butterworth lowpass filter with \(-3\) dB at \(\omega_p = 0.3\pi\) using the bilinear transform with \(T_s = 1\).

Pre-warp: \(\Omega_p = 2\tan(0.15\pi) \approx 2 \times 0.5095 = 1.019\) rad/s.

The first-order analog prototype with cutoff \(\Omega_p\) is \(H_a(s) = \Omega_p / (s + \Omega_p)\).

Apply bilinear substitution \(s = 2(1 - z^{-1})/(1 + z^{-1})\):

\[ H(z) = \frac{\Omega_p}{\frac{2(1-z^{-1})}{1+z^{-1}} + \Omega_p} = \frac{\Omega_p(1 + z^{-1})}{(2 + \Omega_p) + (\Omega_p - 2)z^{-1}}. \]

With \(\Omega_p \approx 1.019\): numerator \(\approx 1.019(1 + z^{-1})\), denominator \(\approx 3.019 + (-0.981)z^{-1}\).

Normalizing so the leading denominator coefficient is 1: \(H(z) = \frac{0.3375(1 + z^{-1})}{1 - 0.3249\,z^{-1}}\). The \(-3\) dB point is exactly at \(\omega = 0.3\pi\).
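The example can be checked numerically with `scipy.signal.bilinear`, which applies exactly this substitution (SciPy assumed; `fs = 1/Ts`):

```python
import numpy as np
from scipy import signal

Ts = 1.0
wp = 0.3 * np.pi
Omega_p = (2 / Ts) * np.tan(wp / 2)          # pre-warped analog edge

# First-order analog prototype H_a(s) = Omega_p / (s + Omega_p):
b_d, a_d = signal.bilinear([Omega_p], [1.0, Omega_p], fs=1 / Ts)

# The digital magnitude at wp should be exactly 1/sqrt(2), i.e. -3 dB,
# because the bilinear map sends wp to the analog -3 dB frequency.
_, H = signal.freqz(b_d, a_d, worN=np.array([wp]))
gain = np.abs(H[0])
```

The returned coefficients match the hand-derived ones, and the \(-3\) dB condition holds to machine precision rather than approximately, which is the whole point of pre-warping.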


Chapter 9: Multirate Signal Processing

9.1 Downsampling (Decimation)

Definition 9.1 (Downsampling by factor \(M\)). The downsampler (by integer factor \(M \ge 2\)) takes an input sequence \(x[n]\) and produces the output \[ y[n] = x[Mn]. \]

That is, one in every \(M\) samples is retained and the rest are discarded.

In the frequency domain, downsampling causes the spectrum to be aliased. If \(X(e^{j\omega})\) is the DTFT of \(x[n]\), the DTFT of \(y[n] = x[Mn]\) is

\[ Y(e^{j\omega}) = \frac{1}{M}\sum_{k=0}^{M-1} X\!\left(e^{j(\omega - 2\pi k)/M}\right). \]

This is the sum of \(M\) shifted and scaled copies of the original spectrum, now compressed to fit in \([0, 2\pi]\). Aliasing occurs if \(x[n]\) has spectral content above \(\pi/M\). The remedy is to apply a lowpass anti-aliasing filter with cutoff \(\pi/M\) before downsampling; the combination is called a decimator.
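Aliasing under downsampling is easy to see with a single tone (NumPy assumed; the frequency \(0.7\pi\) and \(M = 2\) are illustrative). A tone above \(\pi/M\) compresses to \(1.4\pi\), which wraps back to \(0.6\pi\):

```python
import numpy as np

M, N = 2, 2048
n = np.arange(N)
x = np.cos(0.7 * np.pi * n)                  # above pi/M = pi/2: will alias
y = x[::M]                                   # y[n] = x[M*n] (Definition 9.1)

# Locate the dominant frequency of y; the Hann taper suppresses leakage.
Y = np.abs(np.fft.rfft(y * np.hanning(len(y))))
w_peak = 2 * np.pi * np.argmax(Y) / len(y)   # rad/sample at the low rate
```

The peak lands near \(0.6\pi\), not \(0.7\pi\): the tone has been irreversibly moved by aliasing, which is why the anti-aliasing filter must come before the downsampler.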

9.2 Upsampling (Expansion)

Definition 9.2 (Upsampling by factor \(L\)). The upsampler (by integer factor \(L \ge 2\)) inserts \(L-1\) zeros between each pair of input samples: \[ x_e[n] = \begin{cases} x[n/L] & n = 0, \pm L, \pm 2L, \ldots, \\ 0 & \text{otherwise.} \end{cases} \]

The DTFT of \(x_e[n]\) is

\[ X_e(e^{j\omega}) = X(e^{jL\omega}). \]

The spectrum is stretched by \(L\) in frequency, creating \(L-1\) images of the baseband spectrum at multiples of \(2\pi/L\). Removing these images requires a lowpass filter with cutoff \(\pi/L\), applied after upsampling; the combination is an interpolator.
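The imaging effect is equally direct to demonstrate (NumPy assumed; \(L = 2\) and the bin choice are illustrative). An on-bin tone at bin \(k_0\) of a length-\(N\) signal reappears at bins \(k_0\) and \(N - k_0\) of the length-\(2N\) FFT after zero insertion:

```python
import numpy as np

L, N, k0 = 2, 512, 52
n = np.arange(N)
x = np.cos(2 * np.pi * k0 * n / N)       # on-bin tone at about 0.203*pi

xe = np.zeros(L * N)
xe[::L] = x                              # insert L-1 zeros per sample

Xe = np.abs(np.fft.rfft(xe))
peaks = set(np.argsort(Xe)[-2:])         # the two largest spectral lines
```

The baseband line at bin \(k_0 = 52\) and its image at bin \(N - k_0 = 460\) have equal magnitude; the interpolation lowpass filter (cutoff \(\pi/L\)) removes the image.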

9.3 Noble Identities

The noble identities allow moving filters through downsamplers and upsamplers, enabling more efficient filter implementations.

Theorem 9.1 (Noble identities).
  1. Downsampling noble identity: Filtering by \(H(z^M)\) followed by downsampling by \(M\) equals downsampling by \(M\) followed by filtering by \(H(z)\): \[ H(z^M) \;\to\; \downarrow M \quad \equiv \quad \downarrow M \;\to\; H(z). \]
  2. Upsampling noble identity: Upsampling by \(L\) followed by filtering by \(H(z^L)\) equals filtering by \(H(z)\) followed by upsampling by \(L\): \[ \uparrow L \;\to\; H(z^L) \quad \equiv \quad H(z) \;\to\; \uparrow L. \]

The noble identities are the algebraic foundation for the polyphase representation, which allows computationally efficient implementation of decimators and interpolators.

9.4 Polyphase Decomposition

Any filter \(H(z)\) can be decomposed into \(M\) polyphase components:

\[ H(z) = \sum_{k=0}^{M-1} z^{-k}\,E_k(z^M), \]

where the \(k\)th polyphase component is

\[ E_k(z) = \sum_{n=0}^{\infty} h[nM + k]\,z^{-n}. \]

Each \(E_k(z)\) is a decimated version of the prototype impulse response, shifted by \(k\) samples.

9.4.1 Efficient Decimator

For a decimator (anti-aliasing LPF followed by downsampling by \(M\)), direct implementation requires \(N\) multiplications per input sample (where \(N\) is the filter length). Using the polyphase decomposition and the noble identity, the operations can be reordered: downsample first, then apply the polyphase filters \(E_k(z)\) operating at the output rate \(f_s/M\). The computational cost is now \(N\) multiplications per output sample — a factor of \(M\) reduction.
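The rearrangement can be verified sample-for-sample (SciPy assumed; the filter length, \(M = 4\), and random input are illustrative). The polyphase decimator must produce exactly the output of "filter at the full rate, then downsample," while every branch filter runs at the low rate:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
M = 4
h = signal.firwin(48, 1.0 / M)           # anti-aliasing LPF, cutoff pi/M
x = rng.standard_normal(1024)

# Reference decimator: convolve at the input rate, keep every M-th sample.
y_ref = signal.lfilter(h, 1.0, x)[::M]

# Polyphase components E_k[r] = h[r*M + k].
E = [h[k::M] for k in range(M)]

# Branch k sees the decimated, k-delayed stream x_k[m] = x[m*M - k];
# each branch is filtered entirely at the low rate f_s / M.
y_poly = np.zeros(len(y_ref))
for k in range(M):
    xk = np.concatenate([np.zeros(k), x])[::M][:len(y_ref)]
    y_poly += signal.lfilter(E[k], 1.0, xk)
```

The two outputs agree to machine precision; the polyphase version simply never computes the full-rate output samples that the downsampler would discard.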

9.4.2 Efficient Interpolator

Symmetrically, for an interpolator (upsampling by \(L\) followed by LPF), the polyphase decomposition combined with the noble identity yields implementation at the input rate, again saving a factor of \(L\) in computation.

These savings are not trivial engineering details — they make the difference between feasible and infeasible real-time implementation. A decimator by \(M = 256\) using direct convolution at the input rate would require 256 times more computation than the polyphase implementation. In practice, all serious multirate DSP software and hardware uses polyphase structures.

9.5 Filter Banks

A filter bank is a collection of bandpass filters \(H_k(e^{j\omega})\), \(k = 0, \ldots, K-1\), that partition the frequency band \([0, \pi]\) into \(K\) subbands. In an analysis filter bank, the signal is filtered and downsampled in each subband, producing subband signals at reduced rates. A synthesis filter bank upsamples and filters the subband signals to reconstruct the original.

The key design objective for many applications (audio coding, communications) is perfect reconstruction: the reconstructed signal equals the original (possibly with a delay). The theory of perfect-reconstruction filter banks connects to the theory of wavelets.

9.5.1 Two-Channel QMF Filter Bank

The simplest filter bank uses two channels with a lowpass filter \(H_0(z)\) and a highpass filter \(H_1(z) = H_0(-z)\) (quadrature mirror filter), each followed by downsampling by 2. Let \(F_0(z)\) and \(F_1(z)\) denote the synthesis filters. Under the condition that the aliasing cancels in the synthesis bank and the distortion function \(T(z) = \tfrac{1}{2}\left[H_0(z)F_0(z) + H_1(z)F_1(z)\right]\) satisfies \(T(z) = c\,z^{-n_0}\), perfect reconstruction is achieved (up to a delay of \(n_0\) samples and a gain of \(c\)).


Chapter 10: Short-Time Fourier Transform and Spectrogram

10.1 Limitation of the DTFT for Nonstationary Signals

The DTFT reveals the global frequency content of a signal but gives no information about when particular frequencies occur. Many real-world signals — speech, music, radar returns, vibration data — are nonstationary: their spectral content changes over time. The Short-Time Fourier Transform (STFT) addresses this by analyzing the signal in successive short segments.

10.2 The STFT

Definition 10.1 (STFT). The Short-Time Fourier Transform of a discrete-time signal \(x[n]\) with analysis window \(w[n]\) is \[ X[m, \omega] = \sum_{n=-\infty}^{\infty} x[n]\,w[n - mR]\,e^{-j\omega n}, \]

where \(m\) is the frame index and \(R\) is the hop size (number of samples between successive frames).

At each frame \(m\), the STFT multiplies the signal by a shifted copy of the window centered at \(n = mR\) and computes the DTFT of the windowed segment. In practice, the DTFT is evaluated at the DFT frequencies by using an \(N\)-point DFT, giving the discrete-time, discrete-frequency STFT

\[ X[m, k] = \sum_{n=0}^{N-1} x[n + mR]\,w[n]\,e^{-j2\pi kn/N}. \]
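A bare-bones implementation of this formula, applied to a linear chirp, makes the time-frequency behavior concrete (NumPy assumed; window length \(N = 256\), hop \(R = 128\), and the chirp parameters are illustrative — a library routine such as `scipy.signal.stft` does the same job):

```python
import numpy as np

N, R = 256, 128
L = 20000
t = np.arange(L)
# Linear chirp: instantaneous frequency sweeps 0.05 -> 0.25 cycles/sample.
x = np.cos(2 * np.pi * (0.05 * t + 0.1 * t**2 / L))

w = np.hanning(N)
n_frames = (L - N) // R + 1
frames = np.stack([x[m * R : m * R + N] * w for m in range(n_frames)])
X = np.fft.rfft(frames, axis=1)          # X[m, k]
S = np.abs(X) ** 2                       # spectrogram (Definition 10.2)

k_first = np.argmax(S[0])                # peak bin of the first frame
k_last = np.argmax(S[-1])                # peak bin of the last frame
```

Displaying `S` as an image shows a single ridge rising linearly in frequency; the peak bin climbs from roughly \(0.05 \times 256 \approx 13\) in the first frame toward \(0.25 \times 256 \approx 64\) in the last.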

10.2.1 Time-Frequency Resolution Tradeoff

The STFT is subject to a fundamental time-frequency uncertainty: a short window gives good time resolution but poor frequency resolution; a long window gives good frequency resolution but poor time resolution. Formally, if \(\Delta t\) denotes the time width of the window and \(\Delta\omega\) denotes the bandwidth of its Fourier transform, then

\[ \Delta t \cdot \Delta\omega \ge \frac{1}{2}, \]

with equality achieved by Gaussian windows. This is the Heisenberg–Gabor uncertainty principle (the Gabor limit). The practical consequence is that there is no single “ideal” window length — the choice depends on the signal characteristics.

10.3 The Spectrogram

Definition 10.2 (Spectrogram). The spectrogram is the squared magnitude of the STFT: \[ S[m, k] = |X[m, k]|^2. \]

It represents the signal’s energy in the time-frequency plane as a function of frame \(m\) and frequency bin \(k\).

The spectrogram is the standard tool for visualizing speech (showing formants, voiced/unvoiced regions, fricatives), music (showing harmonics and their time evolution), and machinery vibration signals. A variant, the log spectrogram \(10\log_{10}(S[m,k] + \epsilon)\) (in dB), compresses the dynamic range for visual clarity.

10.3.1 Overlap and the OLA Reconstruction

When the hop size \(R < N\), consecutive frames overlap. The signal can be reconstructed from its STFT by the overlap-add (OLA) method, provided the window satisfies the Constant Overlap-Add (COLA) constraint:

\[ \sum_{m=-\infty}^{\infty} w[n - mR] = 1 \quad \text{for all } n. \]

The Hann window with hop size \(R = N/2\) and the rectangular window with \(R = N\) satisfy COLA.
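The COLA property for the Hann window at 50% overlap can be checked directly by summing shifted copies (NumPy assumed; the periodic Hann \(w[n] = \tfrac{1}{2}(1 - \cos(2\pi n/N))\), \(n = 0, \ldots, N-1\), is the variant that sums exactly to 1):

```python
import numpy as np

N = 512
R = N // 2
n = np.arange(N)
w = 0.5 * (1 - np.cos(2 * np.pi * n / N))   # periodic Hann window

# Overlap-add shifted copies over a long support.
cover = np.zeros(10 * N)
for m in range(0, len(cover) - N + 1, R):
    cover[m:m + N] += w

# Away from the edges (where frames are missing), the sum is exactly 1,
# because w[n] + w[n + N/2] = 1 for the periodic Hann.
interior = cover[N:-N]
```

Note that `np.hanning(N)` returns the symmetric Hann, which does not satisfy COLA exactly at \(R = N/2\); the periodic form above does.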


Chapter 11: Adaptive Filters and the LMS Algorithm

11.1 The Adaptive Filtering Problem

In many applications the desired filter characteristics are not known in advance or change over time: channel equalization in communications (the channel changes with the environment), noise cancellation in audio (the noise path varies), echo cancellation in telephony. Adaptive filters adjust their coefficients automatically based on an error signal, without requiring prior knowledge of signal statistics.

Definition 11.1 (Adaptive FIR filter). Let \(\mathbf{w}[n] = [w_0[n], w_1[n], \ldots, w_{M-1}[n]]^T\) be the time-varying coefficient vector of an \(M\)-tap FIR filter, and let \(\mathbf{x}[n] = [x[n], x[n-1], \ldots, x[n-M+1]]^T\) be the input vector. The filter output is \[ y[n] = \mathbf{w}^T[n]\,\mathbf{x}[n]. \]

Given a desired (reference) signal \(d[n]\), the error is \(e[n] = d[n] - y[n]\). The adaptive filter seeks to minimize some measure of \(|e[n]|^2\) by updating \(\mathbf{w}[n]\).

11.2 The Wiener Filter: Optimal Solution

The theoretically optimal (in the minimum mean-square-error sense) filter coefficients for a stationary environment are given by the Wiener–Hopf equation:

\[ \mathbf{R}_{xx}\,\mathbf{w}^* = \mathbf{r}_{dx}, \]

where \(\mathbf{R}_{xx} = E[\mathbf{x}[n]\,\mathbf{x}^T[n]]\) is the input autocorrelation matrix and \(\mathbf{r}_{dx} = E[d[n]\,\mathbf{x}[n]]\) is the cross-correlation vector. The Wiener solution is

\[ \mathbf{w}^* = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{dx}. \]

In a nonstationary environment, \(\mathbf{R}_{xx}\) and \(\mathbf{r}_{dx}\) change over time; moreover, computing \(\mathbf{R}_{xx}^{-1}\) requires \(O(M^3)\) operations. The LMS algorithm circumvents both problems.
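With recorded data, the Wiener solution is obtained from sample estimates of \(\mathbf{R}_{xx}\) and \(\mathbf{r}_{dx}\). A sketch for a system-identification setup (NumPy assumed; the "unknown system" `h_true`, the noise level, and the signal length are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4
h_true = np.array([0.8, -0.4, 0.2, 0.1])           # unknown system to identify
x = rng.standard_normal(20000)                      # white input
d = np.convolve(x, h_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))

# Rows of X are the input vectors [x[n], x[n-1], ..., x[n-M+1]]
# (the first M rows, contaminated by np.roll wraparound, are dropped).
X = np.column_stack([np.roll(x, k) for k in range(M)])[M:]
dd = d[M:]

R = X.T @ X / len(X)          # sample estimate of R_xx
r = X.T @ dd / len(X)         # sample estimate of r_dx

w_opt = np.linalg.solve(R, r)  # Wiener-Hopf: R w* = r
```

For a white input \(\mathbf{R}_{xx} \approx P_x \mathbf{I}\), so `w_opt` recovers `h_true` up to finite-sample and noise error.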

11.3 The Least Mean Squares (LMS) Algorithm

The LMS algorithm is a stochastic gradient descent algorithm for minimizing \(E[|e[n]|^2]\). Instead of using the true gradient of the mean-square error, it uses an instantaneous estimate: \(\widehat{\nabla}_{\mathbf{w}} |e[n]|^2 = -2e[n]\,\mathbf{x}[n]\).

Definition 11.2 (LMS update rule). Starting from an initial estimate \(\mathbf{w}[0]\) (typically zero), update at each step: \[ y[n] = \mathbf{w}^T[n]\,\mathbf{x}[n], \]\[ e[n] = d[n] - y[n], \]\[ \mathbf{w}[n+1] = \mathbf{w}[n] + \mu\,e[n]\,\mathbf{x}[n], \]

where \(\mu > 0\) is the step size (also called the adaptation constant or learning rate).

The LMS update requires only \(O(M)\) operations per sample — a dramatic reduction from the \(O(M^3)\) Wiener solution.

11.3.1 Convergence and Stability

Theorem 11.1 (LMS convergence conditions). For a stationary environment with input autocorrelation matrix \(\mathbf{R}_{xx}\) having eigenvalues \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M > 0\), the LMS algorithm converges in the mean to the Wiener solution if and only if \[ 0 < \mu < \frac{2}{\lambda_{\max}} = \frac{2}{\lambda_1}. \]

The mean-square error converges (in the mean-square sense) under the more restrictive condition

\[ 0 < \mu < \frac{1}{\text{tr}(\mathbf{R}_{xx})} = \frac{1}{M\,P_x}, \]

where \(P_x = E[|x[n]|^2]\) is the input power. The convergence time constant for the mode corresponding to \(\lambda_k\) is approximately \(\tau_k \approx 1/(2\mu\lambda_k)\) iterations.

The step size \(\mu\) governs the fundamental tradeoff between convergence speed and steady-state excess mean-square error (misadjustment). A large \(\mu\) adapts quickly but settles to a larger steady-state error (the filter "chases" noise in the gradient). A small \(\mu\) reduces steady-state error but converges slowly. The normalized LMS (NLMS) algorithm \(\mu[n] = \tilde{\mu} / (\|\mathbf{x}[n]\|^2 + \epsilon)\) normalizes by the instantaneous input power, making the effective step size independent of the input level and improving robustness.
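The full LMS loop for the same system-identification scenario fits in a few lines (NumPy assumed; the unknown channel `h_true`, the step size \(\mu = 0.05\), and the signal length are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4
h_true = np.array([0.5, -0.3, 0.2, 0.1])   # unknown system to identify
n_samples = 5000
x = rng.standard_normal(n_samples)
d = np.convolve(x, h_true)[:n_samples]     # noise-free desired signal

w = np.zeros(M)                # w[0] = 0 initialization
xbuf = np.zeros(M)             # input vector [x[n], x[n-1], ..., x[n-M+1]]
mu = 0.05                      # white unit-power input: lambda_max ~ 1, so mu < 2

for i in range(n_samples):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = x[i]
    y = w @ xbuf               # filter output
    e = d[i] - y               # error
    w = w + mu * e * xbuf      # LMS update (Definition 11.2)
```

Each iteration costs \(O(M)\) multiplications, and with \(\mu\lambda_{\max} \approx 0.05\) the time constant is on the order of \(1/(2\mu) = 10\) iterations, so `w` has long since converged to `h_true` by the end of the run.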

11.3.2 Applications of the LMS Algorithm

Echo cancellation. In a telephone system, the near-end speaker’s voice leaks into the microphone of the far-end speaker through a room acoustic path \(h[n]\). The adaptive filter models this path: \(\mathbf{x}[n]\) is the far-end speech (which is available), \(d[n]\) is the microphone signal (true speech plus echo), and \(e[n]\) is the echo-cancelled output. The LMS algorithm continuously tracks changes in the room acoustics.

Noise cancellation. A primary microphone captures the desired signal plus additive noise. A reference microphone captures noise only (the reference). The adaptive filter models the transfer function from reference to primary noise, and \(e[n]\) is the noise-cancelled output.

Channel equalization. In digital communications, a transmitted sequence \(x[n]\) passes through a dispersive channel with impulse response \(h[n]\), producing \(r[n] = (x * h)[n] + v[n]\) where \(v[n]\) is additive noise. The equalizer filter \(\mathbf{w}[n]\) attempts to invert \(h\): during a training phase, \(d[n]\) is a known sequence; after training, the decision device provides \(d[n]\) as a “tentative decision.”


Chapter 12: The Discrete Hilbert Transform and Analytic Signals

12.1 Analytic Signals

In continuous time, the analytic signal associated with a real signal \(x_a(t)\) is the complex-valued signal \(x_+(t) = x_a(t) + j\,\hat{x}_a(t)\), where \(\hat{x}_a(t)\) is the Hilbert transform of \(x_a(t)\) (convolution with \(1/(\pi t)\)). The analytic signal has a one-sided spectrum: \(X_+(j\Omega) = 0\) for \(\Omega < 0\).

The discrete-time counterpart constructs a sequence \(x_+[n]\) with a one-sided DTFT: \(X_+(e^{j\omega}) = 0\) for \(\omega \in (-\pi, 0)\).

12.2 The Discrete Hilbert Transform

Definition 12.1 (Discrete Hilbert transform). The discrete Hilbert transform of a real sequence \(x[n]\) is the sequence \(\hat{x}[n]\) defined by convolution with the ideal Hilbert transformer impulse response \[ h_{HT}[n] = \begin{cases} \frac{2}{\pi n}\sin^2\!\left(\frac{\pi n}{2}\right) & n \ne 0, \\ 0 & n = 0, \end{cases} \]

or equivalently in the frequency domain

\[ H_{HT}(e^{j\omega}) = \begin{cases} -j & 0 < \omega < \pi, \\ j & -\pi < \omega < 0, \\ 0 & \omega = 0, \pm\pi. \end{cases} \]

The discrete Hilbert transformer is an ideal allpass filter with \(\pm 90^\circ\) phase shift. It is noncausal and must be approximated by a linear-phase FIR filter with appropriate delay. The approximation quality depends on the transition-band width near \(\omega = 0\) and \(\omega = \pi\), where the ideal response has jump discontinuities.
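A windowed-FIR approximation of \(h_{HT}[n]\) behaves as advertised on a mid-band tone, where the Hilbert transform of \(\cos\) is \(\sin\) (NumPy assumed; the length \(2K+1 = 101\) and the Hann taper are illustrative design choices):

```python
import numpy as np

K = 50
m = np.arange(-K, K + 1)
h = np.zeros(2 * K + 1)
nz = m != 0
h[nz] = (2.0 / (np.pi * m[nz])) * np.sin(np.pi * m[nz] / 2) ** 2
h *= np.hanning(2 * K + 1)               # taper to tame the Gibbs ripple
# h is antisymmetric with h[0] = 0, as the ideal response requires.

# Mid-band sanity check, away from the discontinuities at 0 and pi:
w0 = 0.3 * np.pi
n = np.arange(2000)
x = np.cos(w0 * n)
xh = np.convolve(x, h)[K:K + len(n)]     # compensate the K-sample delay
```

Away from the edge transients, `xh` tracks \(\sin(\omega_0 n)\): the filter applies the expected \(-90^\circ\) phase shift with near-unit gain.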

12.2.1 Single-Sideband Modulation

The analytic signal is used in single-sideband (SSB) amplitude modulation. The lower sideband of a real bandpass signal centered at \(\omega_c\) is

\[ x_{LSB}[n] = x[n]\cos(\omega_c n) + \hat{x}[n]\sin(\omega_c n), \]

and the upper sideband is obtained by replacing \(+\) with \(-\). SSB reduces the transmission bandwidth by a factor of two compared to double-sideband AM, at the cost of requiring a Hilbert transformer.


Chapter 13: Introduction to Filter Banks and Wavelets

13.1 From Filter Banks to Wavelets

The wavelet transform can be understood as a dyadic filter bank: a cascade of two-channel QMF filter banks operating at successively halved sampling rates. At each stage, the lowpass branch is further split; the highpass branches are the wavelet subband signals.

For \(J\) stages of decomposition, the output consists of \(J\) highpass subband signals (each representing detail at a different scale) and one final lowpass subband (the coarse approximation). The scaling function \(\phi(t)\) and wavelet function \(\psi(t)\) of the continuous-time wavelet transform correspond, respectively, to the synthesis lowpass and highpass filters via the two-scale equations:

\[ \phi(t) = \sqrt{2}\sum_k h_0[k]\,\phi(2t - k), \qquad \psi(t) = \sqrt{2}\sum_k h_1[k]\,\phi(2t - k). \]

The discrete wavelet transform (DWT) thus inherits all the polyphase efficiency of the filter bank formulation.

13.2 Perfect Reconstruction Conditions

For a two-channel filter bank with analysis filters \(H_0(z), H_1(z)\) and synthesis filters \(F_0(z), F_1(z)\), perfect reconstruction (up to a delay) requires:

Aliasing cancellation:

\[ H_0(-z)\,F_0(z) + H_1(-z)\,F_1(z) = 0. \]

Distortion-free condition:

\[ H_0(z)\,F_0(z) + H_1(z)\,F_1(z) = 2z^{-n_0}. \]

A standard alias-cancelling choice is \(F_0(z) = H_1(-z)\) and \(F_1(z) = -H_0(-z)\); with \(H_1(z) = H_0(-z)\) this is the classical QMF structure (Johnston's designs approximate the distortion-free condition with linear-phase FIR filters). For FIR filters, the requirement of exact linear phase and perfect reconstruction leads to biorthogonal wavelet bases (e.g., the CDF 9/7 wavelet used in JPEG 2000), where analysis and synthesis filters are different but dual to each other.
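Both conditions can be verified mechanically for the Haar filter pair, the simplest perfect-reconstruction bank (NumPy assumed; coefficient arrays are in powers of \(z^{-1}\), with index \(k\) holding the \(z^{-k}\) coefficient):

```python
import numpy as np

s = 1 / np.sqrt(2)
h0 = np.array([s, s])          # H0(z) = (1 + z^{-1}) / sqrt(2)
h1 = np.array([s, -s])         # H1(z) = H0(-z)
f0 = np.array([s, s])          # F0(z) = H1(-z)
f1 = np.array([-s, s])         # F1(z) = -H0(-z)

def neg_z(p):
    """Coefficients of P(-z): flip the sign of odd powers of z^{-1}."""
    q = p.copy()
    q[1::2] *= -1
    return q

# Polynomial products are just convolutions of coefficient arrays.
alias = np.polymul(neg_z(h0), f0) + np.polymul(neg_z(h1), f1)
dist  = np.polymul(h0, f0) + np.polymul(h1, f1)
```

The aliasing term vanishes identically, and the distortion term reduces to \(2z^{-1}\), i.e. perfect reconstruction with delay \(n_0 = 1\).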
