ECE 318: Communication Systems

Oussama Damen

Estimated study time: 1 hr 40 min

Table of contents

These notes synthesize the full topic schedule of ECE 318 (Winter 2026) into a standalone mathematical treatment of analog and digital communication systems. The primary textbook is J.G. Proakis and M. Salehi, Fundamentals of Communication Systems (2nd ed., Prentice Hall, 2014). Additional references are cited in the Sources section.

Sources and References

The following publicly available references are recommended for further study:

J.G. Proakis and M. Salehi, Fundamentals of Communication Systems, 2nd ed., Prentice Hall, 2014. (Primary textbook for ECE 318.)
S. Haykin and M. Moher, Communication Systems, 5th ed., Wiley, 2009. (Comprehensive treatment of analog and digital communications; excellent on FM noise analysis and PLL theory.)
MIT OpenCourseWare, 6.450 Principles of Digital Communications (open access). URL: ocw.mit.edu. (Covers digital communications from a rigorous information-theoretic perspective; lecture notes by Robert Gallager are freely available.)
D. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge University Press, 2005. (Open PDF available at web.stanford.edu/~dntse/wireless_book.html.) Rigorous treatment of fading channels, OFDM, and MIMO; chapters on AWGN capacity and multiuser theory are relevant.
T.M. Cover and J.A. Thomas, Elements of Information Theory, 2nd ed., Wiley, 2006. (Standard reference for entropy, mutual information, source coding, and channel capacity.)
R.G. Gallager, Information Theory and Reliable Communication, Wiley, 1968. (Classic, rigorous treatment; convolutional codes and Viterbi algorithm are covered in depth.)

Chapter 1: Signals, Spectra, and Noise

1.1 Signal Classification and Energy/Power

A communication system begins with a source signal \( m(t) \) that must be transmitted reliably across a physical channel. Before we can design or analyze any modulation or coding scheme, we need a precise vocabulary for classifying signals and quantifying their spectral content.

Energy and Power Signals. A signal \( x(t) \) is an energy signal if its total energy is finite: \[ E_x = \int_{-\infty}^{\infty} |x(t)|^2 \, dt < \infty. \]

It is a power signal if its average power is finite and nonzero:

\[ P_x = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} |x(t)|^2 \, dt, \quad 0 < P_x < \infty. \]

No signal can simultaneously be both an energy signal and a power signal (other than the trivial zero signal). Periodic signals and random noise are power signals; finite-duration pulses are energy signals.

The distinction matters practically: energy signals admit a Fourier transform \( X(f) \) through the standard Lebesgue integral, while power signals require a limiting or distributional treatment, and their spectral content is captured through the power spectral density (PSD) rather than the Fourier transform directly.

1.2 The Fourier Transform

Fourier Transform Pair. For an energy signal \( x(t) \), define \[ X(f) = \int_{-\infty}^{\infty} x(t) \, e^{-j 2\pi f t} \, dt, \]

and the inverse transform

\[ x(t) = \int_{-\infty}^{\infty} X(f) \, e^{j 2\pi f t} \, df. \]

We write \( x(t) \leftrightarrow X(f) \) to denote this transform pair.

1.2.1 Key Fourier Transform Properties

The Fourier transform is a linear, continuous bijection on the space \( L^2(\mathbb{R}) \) of square-integrable functions. The following properties are used repeatedly throughout this course.

Linearity. If \( x(t) \leftrightarrow X(f) \) and \( y(t) \leftrightarrow Y(f) \), then

\[ \alpha x(t) + \beta y(t) \leftrightarrow \alpha X(f) + \beta Y(f). \]

Time Shifting. A delay by \( t_0 \) seconds corresponds to a linear phase shift in frequency:

\[ x(t - t_0) \leftrightarrow X(f) \, e^{-j 2\pi f t_0}. \]

Frequency Shifting (Modulation). Multiplication by a complex exponential shifts the spectrum:

\[ x(t) \, e^{j 2\pi f_c t} \leftrightarrow X(f - f_c). \]

This is the mathematical basis of all modulation: the message spectrum \( X(f) \) is translated to be centered around the carrier frequency \( f_c \).

Time Scaling. For real \( a \neq 0 \):

\[ x(at) \leftrightarrow \frac{1}{|a|} X\!\left(\frac{f}{a}\right). \]

Compressing time by \( a > 1 \) expands the bandwidth by the same factor — a fundamental trade-off that recurs in pulse shaping.

Duality. If \( x(t) \leftrightarrow X(f) \), then \( X(t) \leftrightarrow x(-f) \).

Convolution Theorem. Convolution in time corresponds to multiplication in frequency, and vice versa:

\[ (x * y)(t) = \int_{-\infty}^{\infty} x(\tau) y(t-\tau) \, d\tau \leftrightarrow X(f) Y(f), \]\[ x(t) y(t) \leftrightarrow (X * Y)(f). \]

The convolution theorem is the cornerstone of linear systems theory: the output of a linear time-invariant (LTI) system with impulse response \( h(t) \) is \( y(t) = (x * h)(t) \), so \( Y(f) = X(f) H(f) \).

Parseval’s Theorem. The inner product and energy are preserved:

\[ \int_{-\infty}^{\infty} x(t) y^*(t) \, dt = \int_{-\infty}^{\infty} X(f) Y^*(f) \, df. \]

Setting \( y = x \):

\[ E_x = \int_{-\infty}^{\infty} |x(t)|^2 \, dt = \int_{-\infty}^{\infty} |X(f)|^2 \, df. \]

The integrand \( |X(f)|^2 \) is called the energy spectral density of \( x(t) \).

1.2.2 Important Fourier Transform Pairs

Rectangular pulse. The pulse \( \Pi(t/T) \) of unit height and duration \( T \) centered at the origin transforms to \[ T \, \mathrm{sinc}(fT), \quad \text{where } \mathrm{sinc}(u) = \frac{\sin(\pi u)}{\pi u}. \]

The bandwidth of this pulse (distance to first zero) is \( 1/T \): shorter pulses require wider bandwidth. This sets the stage for the Nyquist sampling theorem.

Gaussian pulse. \( e^{-\pi t^2} \leftrightarrow e^{-\pi f^2} \). The Gaussian is its own Fourier transform and has no sidelobes — making it attractive for pulse shaping, though it has infinite time support.

Dirac delta. \( \delta(t) \leftrightarrow 1 \) and \( 1 \leftrightarrow \delta(f) \). By the modulation property, \( e^{j2\pi f_c t} \leftrightarrow \delta(f - f_c) \), and therefore \[ \cos(2\pi f_c t) \leftrightarrow \frac{1}{2}[\delta(f - f_c) + \delta(f + f_c)]. \]

A pure sinusoid concentrates all its power at a single frequency — its PSD is a pair of Dirac impulses.

1.3 Linear Time-Invariant Systems

A system \( \mathcal{H} \) is linear if it satisfies superposition and time-invariant if a time shift in the input produces the same time shift in the output. Every LTI system is completely characterized in the time domain by its impulse response \( h(t) = \mathcal{H}\{\delta(t)\} \) and in the frequency domain by its transfer function \( H(f) = \mathcal{F}\{h(t)\} \).

Ideal Bandpass Filter. An ideal bandpass filter (BPF) centered at \( f_c \) with bandwidth \( B \) has transfer function \[ H(f) = \begin{cases} 1 & |f - f_c| \leq B/2 \text{ or } |f + f_c| \leq B/2 \\ 0 & \text{otherwise.} \end{cases} \]

It passes all spectral content within \( B \) Hz of \( \pm f_c \) and rejects everything else.

The bandwidth of a signal is an important design parameter. Several definitions coexist: the 3-dB bandwidth (where \( |H(f)|^2 \) falls to half its peak), the null-to-null bandwidth, and the 99% power bandwidth. In this course we use whichever is most natural for the context.

1.4 Power Spectral Density

For power signals — including periodic signals and wide-sense stationary (WSS) random processes — the Fourier transform does not converge in the classical sense. We instead characterize spectral content through the power spectral density.

Power Spectral Density (PSD) — Deterministic. For a power signal \( x(t) \), define a truncated version \( x_T(t) = x(t) \cdot \Pi(t/T) \). The PSD is \[ S_x(f) = \lim_{T \to \infty} \frac{|X_T(f)|^2}{T}, \]

where \( X_T(f) \) is the Fourier transform of \( x_T(t) \). The average power is

\[ P_x = \int_{-\infty}^{\infty} S_x(f) \, df. \]

Autocorrelation Function. For a power signal, the time-averaged autocorrelation is \[ R_x(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t+\tau) x^*(t) \, dt. \]

The Wiener–Khinchin theorem states that the PSD and the autocorrelation function are a Fourier transform pair:

\[ S_x(f) = \int_{-\infty}^{\infty} R_x(\tau) \, e^{-j 2\pi f \tau} \, d\tau. \]

At lag zero, \( R_x(0) = P_x \), the total average power. For a periodic signal with period \( T_0 \), the PSD consists of discrete spectral lines at multiples of \( f_0 = 1/T_0 \).

1.4.1 PSD Through an LTI System

If a power signal \( x(t) \) with PSD \( S_x(f) \) is passed through an LTI system with transfer function \( H(f) \), the output PSD is

\[ S_y(f) = |H(f)|^2 S_x(f). \]

This result follows from the convolution theorem and the definition of the PSD. It is used extensively to analyze the effect of filters on noise and on modulated signals.

1.5 Thermal Noise and the AWGN Model

Physical resistors and electronic devices at temperature \( T \) (in Kelvin) generate thermal noise due to random motion of charge carriers. The available noise power in a bandwidth \( B \) is

\[ P_{\text{noise}} = k_B T B, \]

where \( k_B = 1.381 \times 10^{-23} \) J/K is Boltzmann’s constant. The key insight is that this available power is independent of the resistance value — it depends only on temperature and bandwidth.

White Noise Model. Thermal noise is modeled as a zero-mean, wide-sense stationary Gaussian random process \( n(t) \) with flat (white) PSD: \[ S_n(f) = \frac{N_0}{2}, \quad -\infty < f < \infty. \]

The two-sided PSD is \( N_0/2 \), where \( N_0 = k_B T_e \) and \( T_e \) is the effective noise temperature of the receiver. At room temperature (\( T \approx 290 \) K), \( N_0 \approx 4 \times 10^{-21} \) W/Hz.

The autocorrelation of white noise is \( R_n(\tau) = (N_0/2)\delta(\tau) \), reflecting that samples taken at different times are uncorrelated. White noise has infinite total power, so the model is an idealization valid over any finite bandwidth.

When white noise is passed through a bandpass filter of bandwidth \( B \), the resulting bandlimited white noise or bandpass noise has PSD

\[ S_{n_\text{BP}}(f) = \begin{cases} N_0/2 & |f \pm f_c| \leq B/2 \\ 0 & \text{otherwise,} \end{cases} \]

and total power \( N_0 B \).

Chapter 2: Amplitude Modulation

2.1 Motivation and System Model

Amplitude modulation (AM) is the oldest and most intuitive form of analog modulation. The message signal \( m(t) \) — assumed bandlimited to \( W \) Hz — cannot be transmitted directly over a radio channel: antennas of practical size resonate at frequencies far above audio. We therefore translate \( m(t) \) to a carrier frequency \( f_c \gg W \) by modulating the amplitude of a sinusoidal carrier.

The general AM signal takes the form

\[ s(t) = A_c [1 + k_a m(t)] \cos(2\pi f_c t), \]

where \( A_c \) is the carrier amplitude, \( k_a \) is the amplitude sensitivity (in V\(^{-1}\) if \( m(t) \) is in volts), and \( f_c \) is the carrier frequency. Different choices of \( k_a \) and whether we suppress the carrier term lead to the family of AM schemes.

2.2 Double-Sideband Suppressed-Carrier (DSB-SC)

In DSB-SC, the carrier is not transmitted. The modulated signal is simply

\[ s(t) = A_c m(t) \cos(2\pi f_c t). \]

Taking the Fourier transform and using the modulation property:

\[ S(f) = \frac{A_c}{2}\bigl[M(f - f_c) + M(f + f_c)\bigr]. \]

The message spectrum \( M(f) \), occupying \( [-W, W] \), is translated to form two sidebands: the upper sideband (USB) above \( f_c \) and the lower sideband (LSB) below \( f_c \). The transmission bandwidth is \( B_T = 2W \).

2.2.1 Coherent Detection of DSB-SC

Coherent (Synchronous) Detector. A coherent detector multiplies the received signal by a locally generated carrier \( \cos(2\pi f_c t) \) and low-pass filters the result. With received signal \( r(t) = s(t) + n(t) \):

Multiplying the noiseless component by the local carrier:

\[ s(t) \cos(2\pi f_c t) = A_c m(t) \cos^2(2\pi f_c t) = \frac{A_c}{2} m(t) + \frac{A_c}{2} m(t) \cos(4\pi f_c t). \]

The low-pass filter removes the double-frequency term, yielding \( (A_c/2) m(t) \). This requires that the local oscillator be phase-synchronized to the transmitted carrier — a challenging requirement in practice, motivating phase-locked loops (PLLs).

2.3 Conventional AM (DSB-LC)

In conventional AM (double-sideband large-carrier), the carrier is transmitted along with the modulated sidebands:

\[ s(t) = A_c [1 + k_a m(t)] \cos(2\pi f_c t). \]

Expanding:

\[ s(t) = A_c \cos(2\pi f_c t) + A_c k_a m(t) \cos(2\pi f_c t). \]

The first term is the pure carrier; the second term is the DSB-SC signal. In the frequency domain:

\[ S(f) = \frac{A_c}{2}[\delta(f-f_c) + \delta(f+f_c)] + \frac{A_c k_a}{2}[M(f-f_c) + M(f+f_c)]. \]

Modulation Index. For a sinusoidal message \( m(t) = A_m \cos(2\pi f_m t) \), the modulation index is \( \mu = k_a A_m \). For distortion-free envelope detection, we require \( \mu \leq 1 \), i.e., \( 1 + k_a m(t) \geq 0 \) at all times. If \( \mu > 1 \), the carrier is over-modulated and the envelope detector fails.

2.3.1 Envelope Detection

The great advantage of conventional AM is that demodulation can be accomplished without a synchronized local carrier:

Envelope Detector. An envelope detector consists of a diode, a capacitor \( C \), and a resistor \( R \). The RC time constant must satisfy: \[ \frac{1}{f_c} \ll RC \ll \frac{1}{W}. \]

With \( \mu \leq 1 \), the envelope \( A(t) = A_c[1 + k_a m(t)] \) is always nonnegative, and the detector tracks it faithfully. A DC block then removes the carrier-amplitude offset \( A_c \), leaving \( A_c k_a m(t) \).

The simplicity of the envelope detector explains the global adoption of conventional AM in broadcast radio.

2.3.2 Power Efficiency of Conventional AM

For a sinusoidal message, the total transmitted power is

\[ P_s = \frac{A_c^2}{2}\left(1 + \frac{\mu^2}{2}\right). \]

The useful signal power (in the sidebands) is \( A_c^2 \mu^2 / 4 \), while the carrier wastes \( A_c^2 / 2 \). The power efficiency is

\[ \eta = \frac{\mu^2/2}{1 + \mu^2/2}. \]

At \( \mu = 1 \), \( \eta = 1/3 \): one-third of the transmitted power carries information. DSB-SC achieves \( \eta = 1 \) by eliminating the carrier, at the cost of requiring coherent detection.

2.4 Single-Sideband Modulation (SSB)

DSB wastes bandwidth because both sidebands carry the same information (the upper sideband is the mirror image of the lower sideband for a real message). SSB transmits only one sideband, halving the bandwidth to \( B_T = W \).

2.4.1 SSB Signal Representation

The analytic signal (pre-envelope) of \( m(t) \) is

\[ m_+(t) = m(t) + j\hat{m}(t), \]

where \( \hat{m}(t) \) is the Hilbert transform of \( m(t) \):

\[ \hat{m}(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{m(\tau)}{t - \tau} \, d\tau = m(t) * \frac{1}{\pi t}. \]

In the frequency domain, \( \hat{M}(f) = -j \, \mathrm{sgn}(f) \cdot M(f) \).

The USB-SSB signal is

\[ s_{\text{USB}}(t) = \frac{A_c}{2}\bigl[m(t)\cos(2\pi f_c t) - \hat{m}(t)\sin(2\pi f_c t)\bigr], \]

and the LSB-SSB signal is

\[ s_{\text{LSB}}(t) = \frac{A_c}{2}\bigl[m(t)\cos(2\pi f_c t) + \hat{m}(t)\sin(2\pi f_c t)\bigr]. \]

2.4.2 Generation and Detection of SSB

SSB can be generated by the phase-shift method (implementing the Hilbert transform using an all-pass network with 90° phase difference over the message band) or by the filter method (DSB followed by a sharp sideband filter). Coherent detection recovers the message:

\[ r(t) \cdot \cos(2\pi f_c t) \xrightarrow{\text{LPF}} \frac{A_c}{4} m(t). \]

SSB is widely used in voice communications (e.g., shortwave radio, telephone frequency-division multiplex systems) where the bandwidth savings justify the added complexity.

2.5 Vestigial Sideband Modulation (VSB)

VSB is a compromise between DSB and SSB. A vestige (partial copy) of one sideband is retained, using a VSB filter \( H_{\text{VSB}}(f) \) that has a gradual roll-off near \( f_c \). The design requirement is

\[ H_{\text{VSB}}(f - f_c) + H_{\text{VSB}}(f + f_c) = \text{constant}, \quad |f| \leq W, \]

which ensures that coherent detection yields an undistorted message. VSB is used in analog television (NTSC/PAL) and in ATSC digital TV.

2.6 Quadrature Amplitude Multiplexing (QAM)

Two independent DSB-SC signals can be multiplexed on the same carrier using orthogonal carriers:

\[ s(t) = A_c m_1(t) \cos(2\pi f_c t) - A_c m_2(t) \sin(2\pi f_c t). \]

Coherent demodulation with \( \cos(2\pi f_c t) \) and \( \sin(2\pi f_c t) \) recovers \( m_1(t) \) and \( m_2(t) \) independently (after low-pass filtering), provided perfect phase synchronization is maintained. This doubles spectral efficiency and foreshadows the IQ (in-phase/quadrature) architecture of modern digital transceivers.

2.7 Superheterodyne Receiver

Almost all commercial radio receivers use the superheterodyne architecture to achieve high selectivity and sensitivity across a wide frequency range.

Superheterodyne Architecture. The received RF signal is first filtered by a broad bandpass filter (RF filter), then multiplied by a local oscillator (LO) at frequency \( f_{\text{LO}} \), converting the signal to a fixed intermediate frequency (IF): \[ f_{\text{IF}} = |f_{\text{RF}} - f_{\text{LO}}|. \]

The IF stage contains a sharp, high-gain bandpass filter centered at \( f_{\text{IF}} \), followed by an AM or FM detector.

The key advantage is that the selective IF filter has a fixed center frequency and can be optimized for both high Q (selectivity) and low noise figure. The LO is tunable, so a single IF filter serves the entire reception band.

The image frequency \( f_{\text{image}} = f_{\text{LO}} + f_{\text{IF}} \) (if \( f_{\text{LO}} = f_{\text{RF}} - f_{\text{IF}} \)) also mixes down to \( f_{\text{IF}} \) and must be rejected by the RF pre-selector filter before the mixer.

Chapter 3: Angle Modulation

3.1 Phase and Frequency Modulation

In angle modulation, the instantaneous phase \( \theta_i(t) \) of the carrier is varied in proportion to (a function of) the message, while the amplitude remains constant.

Instantaneous Frequency and Phase. For a carrier \( s(t) = A_c \cos\theta_i(t) \), the instantaneous frequency is \[ f_i(t) = \frac{1}{2\pi} \frac{d\theta_i}{dt}. \]

In phase modulation (PM), the instantaneous phase is proportional to the message:

\[ \theta_i(t) = 2\pi f_c t + k_p m(t), \]

where \( k_p \) is the phase sensitivity (rad/V). In frequency modulation (FM), the instantaneous frequency is proportional to the message:

\[ f_i(t) = f_c + k_f m(t), \]

so the phase is the integral of the instantaneous frequency:

\[ \theta_i(t) = 2\pi f_c t + 2\pi k_f \int_{-\infty}^{t} m(\tau) \, d\tau. \]

The FM signal is

\[ s(t) = A_c \cos\!\left(2\pi f_c t + 2\pi k_f \int_{-\infty}^{t} m(\tau) \, d\tau\right). \]

Unlike AM, the instantaneous amplitude is always \( A_c \), so FM is immune to amplitude distortions and interference. The price is a generally wider bandwidth.

3.2 Sinusoidal FM: Bessel Function Analysis

For the message \( m(t) = A_m \cos(2\pi f_m t) \), the FM signal becomes

\[ s(t) = A_c \cos\bigl(2\pi f_c t + \beta \sin(2\pi f_m t)\bigr), \]

where the modulation index is

\[ \beta = \frac{k_f A_m}{f_m} = \frac{\Delta f}{f_m}. \]

Here \( \Delta f = k_f A_m \) is the peak frequency deviation.

Fourier Series of Sinusoidal FM. Using the Jacobi–Anger expansion, the sinusoidal FM signal has the Fourier series \[ s(t) = A_c \sum_{n=-\infty}^{\infty} J_n(\beta) \cos\bigl(2\pi (f_c + n f_m) t\bigr), \]

where \( J_n(\beta) \) is the \( n \)-th order Bessel function of the first kind, satisfying the recurrence

\[ J_{n-1}(x) + J_{n+1}(x) = \frac{2n}{x} J_n(x). \]

The FM spectrum consists of discrete lines at \( f_c \pm n f_m \) for all integers \( n \), with amplitudes given by \( J_n(\beta) \).

The total power in the FM signal is \( A_c^2 / 2 \), regardless of \( \beta \), since \( \sum_{n=-\infty}^{\infty} J_n^2(\beta) = 1 \).

3.3 Carson’s Rule and FM Bandwidth

The FM spectrum theoretically has infinite extent, but in practice the sidebands beyond a certain order are negligible.

Carson's Rule. The 98% power bandwidth of an FM signal with peak deviation \( \Delta f \) and message bandwidth \( W \) is approximately \[ B_T \approx 2(\Delta f + W) = 2W(1 + \beta), \]

where \( \beta = \Delta f / W \) is the modulation index evaluated at the message bandwidth.

Narrowband FM (\( \beta \ll 1 \)) has \( B_T \approx 2W \), similar to AM. Wideband FM (\( \beta \gg 1 \)) has \( B_T \approx 2\Delta f \), much wider than AM but with superior noise performance as we show in Chapter 5. Commercial FM radio uses \( \Delta f = 75 \) kHz and \( W = 15 \) kHz, giving \( \beta = 5 \) and \( B_T = 180 \) kHz. The channel spacing of 200 kHz accommodates this bandwidth with guard bands.

3.4 FM Generation and Detection

Indirect FM (Armstrong Method). A narrowband FM signal is first generated by a phase modulator operating at low deviation, then the deviation is multiplied up using frequency multipliers. This method achieves excellent frequency stability when the initial carrier is derived from a crystal oscillator.

Direct FM. The carrier frequency is varied directly using a voltage-controlled oscillator (VCO), whose output frequency is \( f_i(t) = f_c + k_f m(t) \).

FM Demodulation. The two main demodulation methods are:

Discriminator (slope detector). An FM-to-AM converter (slope detector or balanced discriminator) followed by an envelope detector. The discriminator has a transfer function \( H(f) \propto f \) near \( f_c \).
Phase-locked loop (PLL). The PLL tracks the instantaneous phase of the received FM signal; the VCO control voltage is proportional to the instantaneous frequency deviation, hence to the message.

Chapter 4: Random Processes and Noise Analysis

4.1 Wide-Sense Stationary Random Processes

A random process \( X(t) \) is a family of random variables indexed by time. For communication systems, we primarily work with the second-order statistical description.

Wide-Sense Stationarity (WSS). A random process \( X(t) \) is wide-sense stationary if:

Its mean is constant: \( \mathbb{E}[X(t)] = \mu_X \) for all \( t \).
Its autocorrelation function depends only on the lag \( \tau = t_2 - t_1 \): \[ R_X(t_1, t_2) = \mathbb{E}[X(t_1)X(t_2)] = R_X(\tau). \]

The PSD of a WSS process \( X(t) \) is the Fourier transform of its autocorrelation:

\[ S_X(f) = \int_{-\infty}^{\infty} R_X(\tau) e^{-j2\pi f\tau} \, d\tau. \]

The PSD satisfies \( S_X(f) \geq 0 \) for all \( f \), and \( R_X(0) = \mathbb{E}[X^2(t)] = \int_{-\infty}^\infty S_X(f) \, df = P_X \).

4.1.1 Response of LTI Systems to WSS Inputs

If \( X(t) \) is WSS with PSD \( S_X(f) \), and \( h(t) \) is the impulse response of an LTI system, then the output \( Y(t) = (X * h)(t) \) is also WSS with:

\[ R_Y(\tau) = R_X(\tau) * h(\tau) * h(-\tau), \qquad S_Y(f) = S_X(f) |H(f)|^2. \]

The cross-correlation between input and output is \( R_{XY}(\tau) = R_X(\tau) * h(-\tau) \).

4.2 Gaussian Random Processes

A random process \( X(t) \) is Gaussian if any finite collection of samples \( (X(t_1), \ldots, X(t_n)) \) has a multivariate Gaussian joint PDF. The Gaussian process is completely specified by its mean function and autocorrelation function. Thermal noise is modeled as Gaussian because it arises from the superposition of a vast number of independent microscopic contributions (central limit theorem).

Gaussian Process Through LTI System. If \( X(t) \) is a Gaussian process and \( h(t) \) is a stable LTI filter, then \( Y(t) = (X*h)(t) \) is also a Gaussian process.

This is the key property that makes Gaussian noise tractable: the output of any linear receiver is Gaussian, and error probability calculations reduce to \( Q \)-function evaluations.

4.3 Bandpass Noise Representation

Bandpass noise \( n(t) \) with PSD symmetric around \( \pm f_c \) can be written in the form

\[ n(t) = n_I(t)\cos(2\pi f_c t) - n_Q(t)\sin(2\pi f_c t), \]

where \( n_I(t) \) and \( n_Q(t) \) are the in-phase and quadrature components, respectively.

Properties of Bandpass Noise Components. If \( n(t) \) is zero-mean, WSS, with PSD \( S_n(f) \) bandlimited to \( [f_c - B/2, f_c + B/2] \), then:

\( n_I(t) \) and \( n_Q(t) \) are both zero-mean, WSS, lowpass processes with bandwidth \( B/2 \).
They have the same PSD: \( S_{n_I}(f) = S_{n_Q}(f) = [S_n(f-f_c) + S_n(f+f_c)] \cdot \Pi(f/B) \).
For white bandpass noise with \( S_n(f) = N_0/2 \), the lowpass PSD is \( S_{n_I}(f) = S_{n_Q}(f) = N_0 \), \( |f| \leq B/2 \).
If \( n(t) \) is Gaussian, \( n_I(t) \) and \( n_Q(t) \) are jointly Gaussian. At any given time \( t \), \( n_I(t) \) and \( n_Q(t) \) are uncorrelated, hence independent.

The total noise power is \( N_0 B \), equally split between in-phase and quadrature: \( \mathbb{E}[n_I^2] = \mathbb{E}[n_Q^2] = N_0 B / 2... \) more precisely, if the noise bandwidth is \( B \), each component has power \( N_0 B/2 \).

Chapter 5: Noise in Analog Communication Systems

5.1 Figure of Merit: Signal-to-Noise Ratio

The quality of an analog communication system is measured by the output signal-to-noise ratio (SNR). We define:

\[ \text{SNR}_o = \frac{\text{message signal power at output}}{\text{noise power at output}}. \]

To make fair comparisons between modulation schemes, we fix the received signal power \( S_T = A_c^2/2 \) and the channel noise PSD \( N_0/2 \). The baseline reference is the baseband SNR:

\[ \text{SNR}_{\text{base}} = \frac{S_T}{N_0 W}, \]

where \( W \) is the message bandwidth. This is the SNR achievable if the message were transmitted directly in baseband.

5.2 SNR Analysis for DSB-SC

The received signal is

\[ r(t) = A_c m(t) \cos(2\pi f_c t) + n(t), \]

where \( n(t) \) is bandpass white noise. The coherent detector multiplies by \( \cos(2\pi f_c t) \) and low-pass filters.

The output message power (from the useful signal term) is:

\[ S_o^{\text{DSB}} = \frac{A_c^2}{4} P_m, \]

where \( P_m = \mathbb{E}[m^2(t)] \) is the message power. The noise at the multiplier output that passes the LPF has power (using the bandpass noise decomposition):

\[ N_o^{\text{DSB}} = \frac{N_0 W}{2}. \]

SNR of DSB-SC. \[ \text{SNR}_o^{\text{DSB-SC}} = \frac{A_c^2 P_m / 4}{N_0 W / 2} = \frac{A_c^2 P_m}{2 N_0 W}. \]

Since the transmitted power is \( S_T = A_c^2 P_m / 2 \):

\[ \text{SNR}_o^{\text{DSB-SC}} = \frac{S_T}{N_0 W} = \text{SNR}_{\text{base}}. \]

DSB-SC achieves the same SNR as baseband transmission, despite using twice the bandwidth.

5.3 SNR Analysis for Conventional AM

With the envelope detector (assuming no overmodulation, \( \mu \leq 1 \)), the output SNR is

\[ \text{SNR}_o^{\text{AM}} = \frac{A_c^2 k_a^2 P_m / 2}{N_0 W}. \]

In terms of total transmitted power \( S_T = (A_c^2/2)(1 + k_a^2 P_m) \):

\[ \text{SNR}_o^{\text{AM}} = \frac{k_a^2 P_m}{1 + k_a^2 P_m} \cdot \frac{S_T}{N_0 W} = \eta \cdot \text{SNR}_{\text{base}}, \]

where \( \eta < 1 \) is the power efficiency. Conventional AM always performs worse than DSB-SC because power is wasted in the carrier.

5.4 SNR Analysis for FM

For FM with message bandwidth \( W \) and modulation index \( \beta = \Delta f / W \), the noise analysis requires care because the FM discriminator is not linear in the noise. The key result from the small-noise (above-threshold) analysis is:

FM Output SNR. For a discriminator-based FM receiver, the output SNR (for a sinusoidal message and above threshold) is \[ \text{SNR}_o^{\text{FM}} = \frac{3}{2} \beta^2 (\beta + 1) \cdot \text{SNR}_{\text{base}} = \frac{3\beta^2(\beta+1) S_T}{N_0 W}. \]

More precisely, with a message power normalized to \( P_m = A_m^2/2 \) and message bandwidth \( W \):

\[ \text{SNR}_o^{\text{FM}} = \frac{3 k_f^2 P_m}{W^3} \cdot \frac{S_T}{N_0}. \]

FM noise improvement. The factor \( 3\beta^2(\beta+1)/2 \) grows rapidly with \( \beta \). Commercial FM with \( \beta = 5 \) achieves an SNR improvement of approximately \( 3 \times 25 \times 6 / 2 = 225 \) (23.5 dB) over the baseband reference, using a channel bandwidth \( 2(\beta+1) = 12 \) times wider. This is the fundamental bandwidth–power trade-off of FM: you can exchange bandwidth for noise immunity, a pre-Shannon engineering insight that is quantified by Shannon's capacity formula.

5.4.1 The FM Threshold Effect

The above SNR formula holds only when the received SNR is above a threshold. Below threshold, the noise clicks in the discriminator output cause a catastrophic collapse of the output SNR — it falls much faster than the received SNR decreases.

FM Threshold. The FM threshold is approximately at the received SNR value where the probability of clicks (noise exceeding the signal envelope) becomes significant. For a discriminator, threshold occurs near \( S_T / (N_0 B_T) \approx 10 \). A PLL demodulator has a lower threshold (2–3 dB better) because it maintains phase lock to the signal before noise clips.

5.4.2 Pre-emphasis and De-emphasis

In practice, voice and music signals have a PSD that falls with frequency. The FM noise PSD after the discriminator rises quadratically with frequency (the noise power spectral density at the discriminator output is \( \propto f^2 \)). A pre-emphasis filter with response \( H_{\text{pe}}(f) \propto j f \) boosts high frequencies of the message before transmission; a matched de-emphasis filter \( H_{\text{de}}(f) \propto 1/(jf) \) is applied at the receiver output. The net effect is to equalize the noise spectrum and further improve SNR by approximately 13 dB for voice signals.

Chapter 5B: Sampling, Quantization, and Pulse Code Modulation

5B.1 The Sampling Theorem

Transmitting analog signals digitally requires first converting them to a sequence of numbers. The bridge is the Nyquist–Shannon sampling theorem.

Nyquist–Shannon Sampling Theorem. A signal \( m(t) \) bandlimited to \( W \) Hz (i.e., \( M(f) = 0 \) for \( |f| > W \)) is completely determined by its samples \( m(nT_s) \) taken at the sampling rate \( f_s = 1/T_s \geq 2W \). The minimum rate \( f_N = 2W \) is the Nyquist rate, and \( T_s = 1/(2W) \) is the Nyquist interval.

The reconstruction formula is

\[ m(t) = \sum_{n=-\infty}^{\infty} m(nT_s) \, \mathrm{sinc}\!\left(\frac{t - nT_s}{T_s}\right). \]

To see why this works: the sampled signal in the time domain is \( m_s(t) = m(t) \sum_n \delta(t - nT_s) \). In the frequency domain, \[ M_s(f) = f_s \sum_{n=-\infty}^{\infty} M(f - nf_s). \]

The spectrum of the sampled signal is periodic with period \( f_s \), consisting of shifted copies of \( M(f) \). If \( f_s \geq 2W \), adjacent copies do not overlap, and passing \( M_s(f) \) through an ideal lowpass filter of bandwidth \( W \) and gain \( T_s \) recovers \( M(f) \) exactly. The LPF has impulse response \( h(t) = 2W T_s \, \mathrm{sinc}(2Wt) = \mathrm{sinc}(t/T_s) \), giving the reconstruction formula by convolution.

If \( f_s < 2W \), the shifted copies overlap in frequency, causing aliasing: high-frequency content folds back into the baseband and cannot be separated from the true low-frequency content. Anti-aliasing filters are applied before sampling in all practical systems.

5B.1.1 Practical Sampling Methods

In practice, samples are not infinitesimally narrow impulses. Two practical sampling waveforms are:

Natural sampling. \( m_s(t) = m(t) \cdot p(t) \), where \( p(t) \) is a periodic pulse train of pulses with duration \( \tau \ll T_s \). The spectrum is a scaled, weighted sum of shifted replicas of \( M(f) \), recoverable by LPF.
Flat-top (sample-and-hold) sampling. Each sample value is held constant for duration \( \tau \). This introduces a sinc-function amplitude distortion \( H(f) = \tau \, \mathrm{sinc}(f\tau) \), correctable by an equalizer.

5B.2 Quantization

The continuous-valued samples must be quantized to one of \( L = 2^n \) discrete levels for digital representation.

Uniform Quantizer. For a signal with peak amplitude \( A \), a uniform quantizer with \( L \) levels uses a step size \[ \Delta = \frac{2A}{L}. \]

The quantizer rounds each sample to the nearest quantization level. The quantization error \( e = m - \hat{m} \) is bounded \( |e| \leq \Delta/2 \).

For a full-load signal and assuming the quantization error is uniformly distributed on \( [-\Delta/2, \Delta/2] \):

\[ \sigma_e^2 = \frac{\Delta^2}{12} = \frac{A^2}{3L^2}. \]

The quantization SNR for a sinusoidal signal with amplitude \( A \) is

\[ \text{SNR}_q = \frac{A^2/2}{\sigma_e^2} = \frac{3L^2}{2} = \frac{3 \cdot 2^{2n}}{2}. \]

In dB:

\[ \text{SNR}_q \approx 1.76 + 6.02n \text{ dB}. \]

Each additional bit of resolution adds approximately 6 dB of SNR. This rule of thumb governs ADC selection throughout engineering.

5B.2.1 Nonuniform Quantization and Companding

For speech signals, the amplitude distribution is not uniform: small amplitudes are much more common than large ones. Companding (compression + expanding) applies a nonlinear mapping before uniform quantization to allocate more levels to the frequently occurring small amplitudes.

The two international standards are the \(\mu\)-law (North America and Japan) and A-law (Europe):

\[ y = \frac{A_c \ln(1 + \mu |x|/A_c)}{\ln(1 + \mu)} \mathrm{sgn}(x), \quad \mu = 255 \text{ (North America)}. \]

Companding provides a nearly constant SNR across a wide dynamic range of input levels (roughly 40 dB).

5B.3 Pulse Code Modulation (PCM)

PCM is the standard technique for converting analog signals to digital:

Sampling at rate \( f_s \geq 2W \).
Quantization to \( L = 2^n \) levels.
Encoding each sample as an \( n \)-bit binary word.
Transmission as a baseband digital signal.

The bit rate of a PCM signal is

\[ R_b = n f_s \text{ bits/second.} \]

For telephone-quality speech: \( W = 4 \) kHz, \( f_s = 8 \) kHz, \( n = 8 \) bits, giving \( R_b = 64 \) kbps — the DS0 channel rate of the telephone network.

5B.3.1 Noise in PCM Systems

PCM systems suffer from two sources of noise: channel noise (bit errors) and quantization noise. For high-SNR channels (few bit errors), the dominant degradation is quantization noise:

\[ \text{SNR}_{\text{PCM}} \approx \text{SNR}_q = 1.76 + 6.02n \text{ dB}. \]

At low channel SNR, each bit error causes an error in one of the \( n \) bits of a quantized sample, and the most significant bit (MSB) error causes the worst distortion. A PCM system is robust down to a channel bit error rate (BER) of roughly \( 10^{-4} \) before the channel errors dominate quantization noise.

5B.4 Time-Division Multiplexing

Multiple PCM channels can be interleaved in time on a single transmission link. The T1 carrier standard (North America) multiplexes 24 DS0 channels:

24 channels × 8 bits/sample + 1 framing bit per frame = 193 bits/frame
Frame rate = 8 kHz → \( R_b = 193 \times 8000 = 1.544 \) Mbps.

Chapter 6: Digital Baseband Transmission

6.1 Baseband Signal Model

In digital communications, information is encoded as a sequence of symbols \( \{a_k\} \), each drawn from an alphabet of size \( M \). A binary system uses \( M = 2 \). The baseband transmitted signal is

\[ s(t) = \sum_{k=-\infty}^{\infty} a_k g(t - kT), \]

where \( g(t) \) is the pulse waveform and \( T \) is the symbol duration. The symbol rate (baud rate) is \( 1/T \) symbols/second, and the bit rate is \( R_b = \log_2(M) / T \) bits/second.

6.2 Intersymbol Interference and Nyquist Criterion

When \( g(t) \) does not satisfy certain conditions, samples of the received signal at the decision times contain contributions from adjacent symbols — this is intersymbol interference (ISI).

Nyquist ISI-Free Criterion. The pulse \( g(t) \) is ISI-free if and only if \[ g(kT) = \begin{cases} 1 & k = 0 \\ 0 & k \neq 0 \end{cases} \]

(with appropriate normalization). In the frequency domain, this is equivalent to

\[ \sum_{k=-\infty}^{\infty} G\!\left(f - \frac{k}{T}\right) = T, \quad \text{for all } f. \]

Let the received signal sample at time \( t = nT \) be \[ y(nT) = \sum_k a_k g((n-k)T) = a_n g(0) + \sum_{k \neq n} a_k g((n-k)T). \]

The second term is ISI. It vanishes for all sequences \( \{a_k\} \) if and only if \( g((n-k)T) = 0 \) for \( k \neq n \), which is the time-domain Nyquist condition. The frequency-domain equivalent follows by Poisson summation.

The minimum bandwidth for ISI-free transmission is \( 1/(2T) \) Hz, achieved by the ideal sinc pulse. Since the sinc pulse has infinite time support and is sensitive to timing errors, practical systems use the raised-cosine (RC) spectrum:

\[ G_{\text{RC}}(f) = \begin{cases} T & |f| \leq \frac{1-\alpha}{2T} \\ \frac{T}{2}\left[1 + \cos\!\left(\frac{\pi T}{\alpha}\left(|f| - \frac{1-\alpha}{2T}\right)\right)\right] & \frac{1-\alpha}{2T} \leq |f| \leq \frac{1+\alpha}{2T} \\ 0 & |f| > \frac{1+\alpha}{2T} \end{cases} \]

where \( \alpha \in [0,1] \) is the roll-off factor. The bandwidth is \( B = (1+\alpha)/(2T) \), and the pulse in the time domain is

\[ g_{\text{RC}}(t) = \mathrm{sinc}(t/T) \cdot \frac{\cos(\pi\alpha t/T)}{1 - (2\alpha t/T)^2}, \]

which decays as \( t^{-3} \) for large \( t \) — much faster than the sinc’s \( t^{-1} \) decay, making it robust to timing errors.

6.2.1 The Eye Diagram

The eye diagram is obtained by overlaying successive segments of the received waveform \( y(t) \), each of duration one or two symbol periods. In the absence of noise and ISI, all traces overlap perfectly, producing a wide open eye. ISI causes the eye to close. The eye opening measures the noise immunity:

Eye height: the margin for correct detection despite noise.
Eye width: the margin for correct detection despite timing errors.
Eye closure: a direct visual indicator of ISI severity.

6.3 The Matched Filter

Given a known signal \( s(t) \) corrupted by AWGN of PSD \( N_0/2 \), the filter that maximizes the output SNR at sampling time \( t = T_0 \) is the matched filter.

Matched Filter. The filter that maximizes the output signal-to-noise ratio at time \( T_0 \) for a signal \( s(t) \) in white noise of PSD \( N_0/2 \) has impulse response \[ h(t) = s(T_0 - t), \]

i.e., the matched filter is the time-reversed, time-shifted copy of the signal. The maximum SNR achieved is

\[ \text{SNR}_{\max} = \frac{2E_s}{N_0}, \]

where \( E_s = \int_{-\infty}^{\infty} s^2(t) \, dt \) is the signal energy.

The output of the filter at time \( T_0 \) is \[ y(T_0) = \int_{-\infty}^{\infty} H(f) S(f) e^{j2\pi f T_0} \, df. \]

The noise power at the output is \( \sigma_n^2 = (N_0/2) \int |H(f)|^2 \, df \). The output SNR is

\[ \text{SNR} = \frac{|y(T_0)|^2}{\sigma_n^2} = \frac{\left|\int H(f) S(f) e^{j2\pi fT_0} df\right|^2}{(N_0/2)\int |H(f)|^2 df}. \]

By the Cauchy–Schwarz inequality:

\[ \left|\int H(f) S(f) e^{j2\pi fT_0} df\right|^2 \leq \int |H(f)|^2 df \cdot \int |S(f)|^2 df, \]

with equality when \( H(f) = c \cdot S^*(f) e^{-j2\pi fT_0} \), i.e., \( h(t) = c \cdot s(T_0 - t) \). Substituting:

\[ \text{SNR}_{\max} = \frac{\int |S(f)|^2 df}{N_0/2} = \frac{2E_s}{N_0}. \]

An equivalent implementation is the correlator receiver: the received signal is correlated (inner-producted) with a template \( s(t) \) over the symbol interval \( [0, T] \):

\[ y = \int_0^T r(t) s(t) \, dt. \]

The correlator and matched filter are equivalent when the input is causal and time-limited to \( [0,T] \).

6.4 Optimal Receiver for Binary Signaling

Consider binary signaling with symbols \( s_0(t) \) and \( s_1(t) \) of energy \( E_0 \) and \( E_1 \) respectively, transmitted over an AWGN channel. The received signal is

\[ r(t) = s_i(t) + n(t), \quad i \in \{0, 1\}. \]

The optimal receiver minimizes the probability of error. Under equal priors, this reduces to the maximum likelihood (ML) receiver, which computes

\[ \Lambda = \int_0^T r(t)[s_1(t) - s_0(t)] \, dt \underset{H_0}{\overset{H_1}{\gtrless}} \gamma, \]

where the threshold is \( \gamma = (E_1 - E_0)/2 \).

Error Probability for Binary Signaling. The probability of error for the optimal binary receiver in AWGN is \[ P_e = Q\!\left(\sqrt{\frac{d^2}{2N_0}}\right), \]

where \( d^2 = \int_0^T [s_1(t) - s_0(t)]^2 \, dt = E_1 + E_0 - 2\rho\sqrt{E_0 E_1} \) is the squared Euclidean distance between the two signals, \( \rho = \frac{1}{\sqrt{E_0 E_1}}\int s_0 s_1 \, dt \) is the correlation coefficient, and the Q-function is

\[ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2} \, du. \]

The Q-function is monotonically decreasing: larger distance \( d \) means lower error probability. The key design principle is to maximize the distance between signal waveforms subject to an energy constraint.

Chapter 7: Digital Bandpass Modulation

7.1 Bandpass Signal Representation

A bandpass digital signal can always be written as

\[ s(t) = \mathrm{Re}\bigl[\tilde{s}(t) e^{j2\pi f_c t}\bigr], \]

where \( \tilde{s}(t) = s_I(t) + js_Q(t) \) is the complex baseband equivalent (lowpass equivalent) signal. The in-phase component \( s_I(t) \) and quadrature component \( s_Q(t) \) carry the transmitted information.

7.2 Binary Phase-Shift Keying (BPSK)

In BPSK, the carrier phase takes one of two values: \( 0 \) or \( \pi \).

\[ s_i(t) = \sqrt{\frac{2E_b}{T_b}} \cos(2\pi f_c t + (1-i)\pi), \quad i \in \{0, 1\}, \quad 0 \leq t \leq T_b. \]

Equivalently, \( s_0(t) = +\sqrt{2E_b/T_b}\cos(2\pi f_c t) \) and \( s_1(t) = -\sqrt{2E_b/T_b}\cos(2\pi f_c t) \). The correlation coefficient is \( \rho = -1 \) (antipodal signaling), giving maximum distance \( d^2 = 4E_b \).

BPSK Bit Error Rate. \[ P_b^{\text{BPSK}} = Q\!\left(\sqrt{\frac{2E_b}{N_0}}\right). \]

This is the best achievable BER for binary signaling with the same energy per bit — antipodal signals achieve the maximum distance.

The BER of BPSK at \( E_b/N_0 = 9.6 \) dB is approximately \( 10^{-5} \), a typical requirement for reliable digital communications.

7.3 Quadrature Phase-Shift Keying (QPSK)

QPSK transmits 2 bits per symbol using four equally spaced phases \( \{0, \pi/2, \pi, 3\pi/2\} \) (or \( \{\pi/4, 3\pi/4, 5\pi/4, 7\pi/4\} \)):

\[ s_i(t) = \sqrt{\frac{2E_s}{T_s}}\cos\!\left(2\pi f_c t + \frac{(2i-1)\pi}{4}\right), \quad i = 1,2,3,4. \]

QPSK can be viewed as two independent BPSK systems on orthogonal carriers (I and Q channels), each carrying one bit at half the symbol rate. The BER of QPSK is

\[ P_b^{\text{QPSK}} = Q\!\left(\sqrt{\frac{2E_b}{N_0}}\right), \]

identical to BPSK, but QPSK transmits 2 bits per symbol and thus requires only half the bandwidth of BPSK at the same bit rate. This is the key advantage: bandwidth efficiency of 2 bits/s/Hz.

7.4 M-ary QAM

M-ary QAM generalizes QPSK by using a rectangular constellation of \( M = 2^k \) points in the \( (s_I, s_Q) \) plane. For square \( M \)-QAM with \( \sqrt{M} \) levels in each dimension:

\[ s(t) = \sqrt{\frac{2}{T_s}}\bigl[a_I \cos(2\pi f_c t) - a_Q \sin(2\pi f_c t)\bigr], \]

where \( a_I, a_Q \in \{-(M^{1/2}-1), \ldots, -1, 1, \ldots, (M^{1/2}-1)\} \cdot d_{\min}/2 \).

BER of M-QAM (Gray coding). For Gray-coded square M-QAM in AWGN: \[ P_b \approx \frac{4}{\log_2 M}\left(1 - \frac{1}{\sqrt{M}}\right) Q\!\left(\sqrt{\frac{6\log_2 M}{M-1}\cdot\frac{E_b}{N_0}}\right). \]

As \( M \) increases, more bits are transmitted per symbol (higher spectral efficiency, up to \( \log_2 M \) bits/s/Hz), but the required \( E_b/N_0 \) increases — the classic trade-off between spectral efficiency and power efficiency.

7.5 M-ary FSK

In M-ary FSK, each symbol is encoded as one of \( M \) frequency tones \( f_k = f_c + k/T \):

\[ s_k(t) = \sqrt{\frac{2E_s}{T_s}}\cos(2\pi f_k t), \quad k = 0,1,\ldots,M-1. \]

The tones are orthogonal if their frequency separation is \( 1/T_s \). For noncoherent BFSK, the minimum frequency separation for orthogonality doubles to \( 1/T_s \). M-FSK is well-suited for power-limited (but not bandwidth-limited) systems, since increasing \( M \) reduces the required \( E_b/N_0 \) for a given BER at the cost of wider bandwidth.

7.6 Differential Phase-Shift Keying (DPSK)

DPSK avoids the need for absolute carrier phase reference by encoding information in the phase difference between successive symbols. For DBPSK:

\[ \theta_k = \theta_{k-1} + \Delta\theta_k, \quad \Delta\theta_k = \begin{cases} 0 & \text{if bit is 0} \\ \pi & \text{if bit is 1.} \end{cases} \]

The receiver computes the phase difference between adjacent received symbols, using the previous symbol as a phase reference. The BER of DBPSK is

\[ P_b^{\text{DBPSK}} = \frac{1}{2}e^{-E_b/N_0}, \]

which is about 3 dB worse than coherent BPSK at low BER, but requires no carrier phase estimation.

7.7 Error Probability: The Q-Function and Union Bound

The Q-function is the tail probability of a standard normal random variable:

\[ Q(x) = P(Z > x), \quad Z \sim \mathcal{N}(0,1). \]

For \( x > 0 \), tight bounds are:

\[ \frac{x}{1+x^2}\frac{1}{\sqrt{2\pi}}e^{-x^2/2} \leq Q(x) \leq \frac{1}{x\sqrt{2\pi}}e^{-x^2/2}. \]

For \( M \)-ary signaling, the exact symbol error probability is often difficult to compute. The union bound provides a tractable upper bound:

Union Bound. For \( M \)-ary signaling, the symbol error probability satisfies \[ P_s \leq \sum_{i=1}^{M} \sum_{j \neq i} P(\text{decide } s_j | s_i \text{ sent}) \cdot P(s_i) \leq (M-1) Q\!\left(\frac{d_{\min}}{\sqrt{2N_0}}\right), \]

where \( d_{\min} \) is the minimum Euclidean distance between any two constellation points.

The union bound is tight at high SNR where only the nearest neighbor errors dominate. At low SNR, it can be loose by a factor of \( M \).

Chapter 8: ISI Channels and OFDM

8.1 Bandlimited Channels and ISI

Real channels are not ideal lowpass filters. Multipath propagation, bandwidth limitations, and dispersive media cause the channel impulse response \( h_c(t) \) to spread pulses across multiple symbol periods, causing ISI:

\[ y(t) = \sum_k a_k (g * h_c)(t - kT) + n(t). \]

The combined pulse \( p(t) = (g * h_c)(t) \) generally does not satisfy the Nyquist ISI-free criterion, leading to ISI at the sampling instants.

8.2 Equalization

An equalizer is a filter \( c(t) \) at the receiver designed to invert the channel distortion:

\[ (p * c)(kT) = \delta_{k,0}, \]

i.e., the combined pulse-channel-equalizer response is ISI-free. The design involves a trade-off between ISI elimination and noise enhancement.

Zero-Forcing (ZF) Equalizer. The ZF equalizer sets \( C(f) = 1/P(f) \) (complete inversion). This forces ISI to zero but can catastrophically amplify noise at frequencies where \( |P(f)| \) is small.

Minimum Mean-Square Error (MMSE) Equalizer. The MMSE equalizer minimizes \( \mathbb{E}[|a_k - \hat{a}_k|^2] \):

\[ C_{\text{MMSE}}(f) = \frac{P^*(f)}{|P(f)|^2 + N_0/E_s}. \]

The term \( N_0/E_s \) prevents noise amplification at spectral nulls, providing a better noise-ISI trade-off than ZF.

8.3 Orthogonal Frequency-Division Multiplexing (OFDM)

OFDM is the dominant air-interface technology in Wi-Fi (802.11a/g/n/ac/ax), LTE, NR (5G), and ADSL. Instead of transmitting a single wideband pulse, OFDM divides the channel bandwidth into \( N \) narrow subcarriers, each transmitting at a low symbol rate.

OFDM Signal. The complex baseband OFDM signal over one OFDM symbol of duration \( T_u \) is \[ s(t) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k \, e^{j2\pi k \Delta f \, t}, \quad 0 \leq t < T_u, \]

where \( X_k \) is the complex data symbol on subcarrier \( k \), and the subcarrier spacing is \( \Delta f = 1/T_u \).

The subcarriers are orthogonal over \( [0, T_u] \):

\[ \int_0^{T_u} e^{j2\pi k \Delta f t} (e^{j2\pi m \Delta f t})^* \, dt = T_u \delta_{km}. \]

This is precisely the inverse discrete Fourier transform (IDFT): the time-domain OFDM symbol is the IDFT of the vector \( (X_0, X_1, \ldots, X_{N-1}) \). At the receiver, the DFT recovers the data symbols. Using the FFT algorithm, both operations have complexity \( O(N\log N) \).

8.3.1 Cyclic Prefix

To convert the linear convolution with the channel into a circular convolution (so the DFT diagonalizes it), a cyclic prefix (CP) is prepended to each OFDM symbol: the last \( N_{CP} \) samples of the IDFT output are copied to the beginning. The CP must be longer than the channel delay spread \( L_{ch} \) samples.

After CP removal and DFT at the receiver, the input–output relationship on subcarrier \( k \) is simply:

\[ Y_k = H_k X_k + N_k, \]

where \( H_k = H(e^{j2\pi k/N}) \) is the channel frequency response at subcarrier \( k \), and \( N_k \) is AWGN. The multipath channel becomes a set of \( N \) parallel, ISI-free, single-tap channels — dramatically simplifying equalization to a single complex division per subcarrier.

8.3.2 OFDM Trade-offs

Advantages. OFDM is robust to multipath (ISI), enables adaptive bit loading (different constellations on different subcarriers based on \( |H_k|^2 \)), and has a spectrally efficient rectangular spectrum (sinc sidelobes are managed by windowing in practice).

Disadvantages. The high peak-to-average power ratio (PAPR) of OFDM requires linear power amplifiers with large backoff, reducing power efficiency. The sinc sidelobes can cause inter-carrier interference when there are frequency offsets. Careful synchronization of carrier frequency and timing is essential.

Chapter 9: Information Theory

9.1 Entropy

Information theory, founded by Claude Shannon in 1948, provides fundamental limits on how efficiently information can be compressed and how reliably it can be transmitted through a noisy channel.

Shannon Entropy. For a discrete random variable \( X \) taking values in alphabet \( \mathcal{X} = \{x_1, \ldots, x_M\} \) with probabilities \( P(X = x_i) = p_i \), the entropy is \[ H(X) = -\sum_{i=1}^{M} p_i \log_2 p_i \quad \text{bits}, \]

with the convention \( 0 \log 0 = 0 \). Entropy measures the average uncertainty (information content) of \( X \).

Bounds on Entropy. \[ 0 \leq H(X) \leq \log_2 M. \]

The lower bound is achieved when \( X \) is deterministic (one \( p_i = 1 \), all others 0). The upper bound is achieved when \( X \) is uniformly distributed (\( p_i = 1/M \) for all \( i \)).

Binary entropy. For a Bernoulli\((p)\) random variable: \[ H_b(p) = -p\log_2 p - (1-p)\log_2(1-p). \]

\( H_b(0) = H_b(1) = 0 \) (deterministic), \( H_b(1/2) = 1 \) bit (maximum uncertainty for a binary variable).

9.1.1 Joint Entropy, Conditional Entropy, and Mutual Information

Mutual Information. The mutual information between random variables \( X \) and \( Y \) is \[ I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)}, \]

where \( H(X|Y) = -\sum_{x,y} p(x,y) \log_2 p(x|y) \) is the conditional entropy. Mutual information measures how much knowing \( Y \) reduces uncertainty about \( X \).

Mutual information satisfies \( I(X;Y) \geq 0 \), with equality if and only if \( X \) and \( Y \) are independent. The data processing inequality states that processing cannot increase mutual information: if \( X \to Y \to Z \) forms a Markov chain, then \( I(X;Z) \leq I(X;Y) \).

9.2 Channel Capacity

Discrete Memoryless Channel (DMC). A DMC is characterized by an input alphabet \( \mathcal{X} \), output alphabet \( \mathcal{Y} \), and a conditional probability distribution \( p(y|x) \), with the memoryless property that successive channel uses are independent.

Shannon Channel Capacity. The capacity of a DMC is \[ C = \max_{p(x)} I(X;Y) \quad \text{bits per channel use.} \]

Shannon’s channel coding theorem states that reliable communication at rate \( R < C \) bits/use is achievable (there exist codes with vanishingly small error probability as the block length \( n \to \infty \)), and no code can achieve reliable communication at rate \( R > C \).

The theorem is an existence result: it guarantees that good codes exist, but does not provide an explicit construction. Finding codes that approach capacity with low decoding complexity is the central problem of modern coding theory.

9.3 Capacity of the AWGN Channel

For the AWGN channel with received signal \( Y = X + N \), where \( N \sim \mathcal{N}(0, N_0/2) \) and input power constraint \( \mathbb{E}[X^2] \leq P \):

Shannon–Hartley Theorem. The capacity of a bandlimited AWGN channel with bandwidth \( B \) Hz, signal power \( P \), and noise PSD \( N_0/2 \) is \[ C = B \log_2\!\left(1 + \frac{P}{N_0 B}\right) = B \log_2(1 + \text{SNR}) \quad \text{bits per second,} \]

where \( \text{SNR} = P/(N_0 B) \) is the received signal-to-noise ratio.

The optimal input distribution that achieves the mutual information \( I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(N) \) is Gaussian: \( X \sim \mathcal{N}(0, P) \). Then \( Y \sim \mathcal{N}(0, P + N_0 B/2) \). For Gaussian variables, differential entropy is \( h(Y) = \frac{1}{2}\log_2(2\pi e(P + N_0 B)) \) and \( h(N) = \frac{1}{2}\log_2(2\pi e N_0 B) \). Thus \[ I(X;Y) = h(Y) - h(N) = \frac{1}{2}\log_2\!\left(1 + \frac{P}{N_0 B/2}\right) \]

per real-valued channel use. With both I and Q dimensions and Nyquist signaling rate \( 2B \), the capacity in bits/second is \( B\log_2(1 + P/(N_0 B)) \).

The Shannon–Hartley theorem reveals two key insights:

Bandwidth–power trade-off. Doubling the bandwidth (with the same total power, so SNR halves) always increases capacity, but with diminishing returns: \( C \to (P/N_0)\log_2 e \) as \( B \to \infty \). This ultimate limit is the wideband capacity \( C_\infty = (P/N_0)\log_2 e \approx 1.44 P/N_0 \).
SNR in dB vs. bits. Every 3 dB increase in SNR adds approximately 1 bit per channel use. OFDM systems adaptively choose the constellation size on each subcarrier to approach this limit.

Chapter 10: Source Coding

10.1 Source Coding Theorem

Source coding removes redundancy from the source output. Shannon’s source coding theorem establishes that the theoretical limit on lossless compression is the source entropy.

Shannon's Source Coding Theorem. Let \( X \) be a discrete memoryless source with entropy \( H(X) \) bits per symbol. For any \( \epsilon > 0 \) and sufficiently long blocks, there exists a code with average codeword length \( \bar{L} \) satisfying \[ H(X) \leq \bar{L} < H(X) + \epsilon. \]

No code can achieve \( \bar{L} < H(X) \).

For a source with \( M \) symbols and probabilities \( p_1 \geq p_2 \geq \ldots \geq p_M \), variable-length prefix-free codes assign shorter codewords to more probable symbols.

10.2 Huffman Coding

Huffman Algorithm. Given symbol probabilities \( \{p_1, \ldots, p_M\} \):

Sort symbols by probability (decreasing).
Combine the two least probable symbols into a single node with combined probability.
Repeat until one node remains.
Assign binary labels (0/1) to each branch; the codeword for each symbol is the path from the root to the leaf.

The Huffman code achieves the minimum average codeword length among all prefix-free codes, and satisfies

\[ H(X) \leq \bar{L}_{\text{Huffman}} < H(X) + 1. \]

Huffman code construction. For a source with four symbols and probabilities \( p_1 = 0.5, p_2 = 0.25, p_3 = 0.125, p_4 = 0.125 \):

Combine \( p_3 \) and \( p_4 \): new node with probability 0.25.
Combine new node with \( p_2 \): new node with probability 0.5.
Combine with \( p_1 \): root with probability 1.0.

Codewords: \( x_1 = 0 \), \( x_2 = 10 \), \( x_3 = 110 \), \( x_4 = 111 \). Average length: \( 0.5(1) + 0.25(2) + 0.125(3) + 0.125(3) = 1.75 \) bits/symbol. The entropy is \( H = 0.5 \cdot 1 + 0.25 \cdot 2 + 0.125 \cdot 3 + 0.125 \cdot 3 = 1.75 \) bits/symbol — this is a perfect code.

10.3 Shannon–Fano Coding

Shannon–Fano coding is a suboptimal but instructive algorithm: sort symbols by probability, then recursively divide into two groups of approximately equal total probability, assigning 0 to the top group and 1 to the bottom. It does not always produce the minimum average codeword length, but approaches the entropy limit.

Chapter 11: Channel Coding

11.1 The Error-Correction Coding Problem

A channel code adds structured redundancy to the transmitted data to enable the receiver to detect and correct errors caused by channel noise. If \( k \) information bits are encoded into \( n > k \) channel bits, the code rate is \( R_c = k/n \). By Shannon’s theorem, reliable transmission is possible as long as \( R_c < C/\log_2 M \) bits per channel use.

11.2 Linear Block Codes

Linear Block Code. An \( [n, k] \) binary linear block code is a \( k \)-dimensional subspace of \( \{0,1\}^n \) over \( \mathrm{GF}(2) \). The code is specified by a \( k \times n \) generator matrix \( G \): codewords are \( \mathbf{c} = \mathbf{u} G \), where \( \mathbf{u} \in \{0,1\}^k \) is the information word. The \( (n-k) \times n \) parity-check matrix \( H \) satisfies \( G H^T = 0 \) and \( H \mathbf{c}^T = \mathbf{0} \) for all codewords.

The syndrome of a received vector \( \mathbf{r} = \mathbf{c} + \mathbf{e} \) (where \( \mathbf{e} \) is the error vector) is

\[ \mathbf{s} = H \mathbf{r}^T = H \mathbf{e}^T. \]

If \( \mathbf{s} = \mathbf{0} \), no error is detected (though an undetectable error may have occurred). Syndrome decoding: for each possible error pattern, precompute the syndrome; upon receiving \( \mathbf{r} \), compute \( \mathbf{s} \) and look up the most likely error pattern.

11.2.1 Hamming Codes

Hamming Code. The \( [2^m - 1, 2^m - 1 - m] \) Hamming code has a parity-check matrix \( H \) whose columns are all nonzero binary vectors of length \( m \). For \( m = 3 \): the \( [7, 4] \) Hamming code has rate \( 4/7 \approx 0.571 \) and minimum distance \( d_{\min} = 3 \), capable of correcting any single-bit error.

The syndrome of the \( [7,4] \) code directly gives the binary address of the error bit, making single-error correction trivially implementable.

11.3 Convolutional Codes

Convolutional codes process the information stream continuously rather than in blocks. They have memory: each encoder output depends on the current input and the previous \( K-1 \) inputs (constraint length \( K \)).

Rate-1/2 Convolutional Encoder. A rate-\( 1/2 \) encoder with constraint length \( K \) has a shift register of \( K-1 \) bits. At each time step, the input bit shifts in, and two output bits are formed by modulo-2 additions of selected register taps. The code is specified by the generator polynomials \( g_1(D) \) and \( g_2(D) \) in the delay operator \( D \).

A widely used code is the rate-\( 1/2 \), \( K = 7 \) code with generators \( g_1 = (1011011)_2 = 133_8 \) and \( g_2 = (1101101)_2 = 171_8 \), used in NASA deep-space communications and in the 802.11b Wi-Fi standard.

11.3.1 The Viterbi Algorithm

The maximum-likelihood decoder for convolutional codes is the Viterbi algorithm, which exploits the trellis structure of the code.

Viterbi Algorithm. The code trellis has \( 2^{K-1} \) states at each time step. The Viterbi algorithm performs dynamic programming on the trellis:

Branch metric: the distance between the received symbols and the expected symbols for each trellis branch.
Path metric: the cumulative sum of branch metrics along a path.
Add-compare-select (ACS): at each node, retain only the path with the smallest metric (ML survivor). Discard the other.
Traceback from the final state recovers the ML codeword.

For hard-decision decoding, the branch metric is the Hamming distance. For soft-decision decoding (using channel LLRs), it is the squared Euclidean distance — soft-decision decoding provides approximately 2–3 dB coding gain over hard-decision.

The complexity of the Viterbi algorithm is \( O(2^{K-1} \cdot n) \), linear in the code length but exponential in \( K \). For the \( K = 7 \) code, the 64-state trellis is manageable with hardware.

Summary: Key Relationships and Design Trade-offs

The following table collects the central performance formulae for the modulation and coding schemes covered in this course. Throughout, \( E_b/N_0 \) is the energy-per-bit to noise-density ratio, \( W \) is the message bandwidth, \( B_T \) is the transmission bandwidth, and all BER values assume AWGN.

Analog modulation BW and SNR:

Scheme	\( B_T \)	\( \text{SNR}_o \)
DSB-SC	\( 2W \)	\( S_T / (N_0 W) \)
Conventional AM	\( 2W \)	\( \eta \cdot S_T / (N_0 W) \)
SSB	\( W \)	\( S_T / (N_0 W) \)
FM	\( 2(\Delta f + W) \)	\( 3\beta^2(\beta+1) S_T / (N_0 W) \)

Digital modulation BER:

Scheme	Bits/symbol	BER
BPSK	1	\( Q(\sqrt{2E_b/N_0}) \)
QPSK	2	\( Q(\sqrt{2E_b/N_0}) \)
DBPSK	1	\( (1/2) e^{-E_b/N_0} \)
BFSK (coherent)	1	\( Q(\sqrt{E_b/N_0}) \)

Shannon limit: \( C = B\log_2(1 + \text{SNR}) \) bits/s. No modulation or coding scheme can sustain reliable communication at a rate above \( C \).