ECE 445: Integrated Digital Electronics

Lan Wei

Estimated study time: 58 minutes

Sources and References

Primary textbooks — N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed., Pearson/Addison-Wesley, 2010; A. S. Sedra and K. C. Smith, Microelectronic Circuits, 8th ed., Oxford University Press, 2020. Supplementary texts — J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., Prentice Hall, 2003. Online resources — MIT OpenCourseWare 6.004 Computation Structures; Stanford EE271 VLSI Digital Circuits lecture notes (M. Horowitz).

Chapter 1: MOS Transistor Physics and Models

1.1 MOSFET Structure and Operation

The Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET) is the fundamental building block of modern digital integrated circuits. An NMOS transistor consists of a p-type silicon substrate with two heavily doped n+ regions (source and drain) separated by a channel region. A thin gate oxide (SiO2 or high-k dielectric) and a polysilicon (or metal) gate electrode sit above the channel.

Threshold Voltage (\(V_T\)): The gate-to-source voltage at which the channel inverts and a conducting path forms between source and drain. For an NMOS device, \(V_T > 0\); for a PMOS device, \(V_T < 0\).

The threshold voltage is expressed as:

\[ V_T = V_{T0} + \gamma \left( \sqrt{|2\phi_F + V_{SB}|} - \sqrt{|2\phi_F|} \right) \]

where \(V_{T0}\) is the threshold at zero body bias, \(\gamma = \sqrt{2 q \varepsilon_{si} N_A} / C_{ox}\) is the body-effect coefficient, \(\phi_F = (kT/q)\ln(N_A / n_i)\) is the Fermi potential, and \(V_{SB}\) is the source-to-body voltage.
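As a quick numerical illustration of the body effect, the sketch below evaluates the threshold expression above for a few source-to-body voltages. The function name `threshold_voltage` and the parameter values (\(V_{T0} = 0.4\) V, \(\gamma = 0.4\ \text{V}^{1/2}\), \(\phi_F = 0.3\) V) are illustrative assumptions, not values from any particular process.

```python
import math

def threshold_voltage(vt0, gamma, phi_f, vsb):
    """VT = VT0 + gamma * (sqrt(|2*phi_F + VSB|) - sqrt(|2*phi_F|))."""
    return vt0 + gamma * (math.sqrt(abs(2 * phi_f + vsb)) - math.sqrt(abs(2 * phi_f)))

# Assumed, illustrative device parameters (not from a real process)
VT0, GAMMA, PHI_F = 0.4, 0.4, 0.3

for vsb in (0.0, 0.5, 1.0):
    vt = threshold_voltage(VT0, GAMMA, PHI_F, vsb)
    print(f"VSB = {vsb:.1f} V -> VT = {vt:.3f} V")
```

With zero body bias the function returns \(V_{T0}\); a positive \(V_{SB}\) raises the threshold, which is why stacked NMOS devices whose sources sit above ground are somewhat weaker.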

1.2 Long-Channel I-V Characteristics

1.2.1 Triode (Linear) Region

When \(V_{GS} > V_T\) and \(V_{DS} < V_{GS} - V_T\), the MOSFET operates in the triode region:

\[ I_D = \mu_n C_{ox} \frac{W}{L} \left[ (V_{GS} - V_T) V_{DS} - \frac{V_{DS}^2}{2} \right] \]

For small \(V_{DS}\) (deep triode), this simplifies to the linear approximation:

\[ I_D \approx \mu_n C_{ox} \frac{W}{L} (V_{GS} - V_T) V_{DS} \]

which defines an effective channel resistance:

\[ R_{on} = \frac{1}{\mu_n C_{ox} (W/L)(V_{GS} - V_T)} \]

1.2.2 Saturation Region

When \(V_{DS} \geq V_{GS} - V_T\), the channel pinches off at the drain and \(I_D\) saturates:

\[ I_D = \frac{\mu_n C_{ox}}{2} \frac{W}{L} (V_{GS} - V_T)^2 (1 + \lambda V_{DS}) \]

The factor \(\lambda\) (channel-length modulation parameter) accounts for the effective shortening of the channel as \(V_{DS}\) increases. In ideal long-channel analysis, \(\lambda = 0\) and \(I_D\) is independent of \(V_{DS}\) in saturation.

The process transconductance parameter is \(k_n = \mu_n C_{ox}\), so the saturation current is often written \(I_D = (k_n / 2)(W/L)(V_{GS} - V_T)^2\). For PMOS, substitute \(\mu_p\) and note all voltages are negative.
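The region conditions and current equations of Sections 1.2.1 and 1.2.2 can be collected into one piecewise function. This is a minimal sketch of the long-channel square-law model only; the function name `nmos_current` and the example values of \(k_n\) and \(W/L\) are assumptions chosen for illustration.

```python
def nmos_current(vgs, vds, vt, kn, w_over_l, lam=0.0):
    """Long-channel NMOS drain current in amperes (square-law model)."""
    vov = vgs - vt                       # overdrive voltage VGS - VT
    if vov <= 0:
        return 0.0                       # cutoff (subthreshold current ignored)
    if vds < vov:
        # triode region
        return kn * w_over_l * (vov * vds - vds ** 2 / 2)
    # saturation region, with optional channel-length modulation
    return 0.5 * kn * w_over_l * vov ** 2 * (1 + lam * vds)

# Assumed example: kn = 200 uA/V^2, W/L = 2, VT = 0.4 V
for vds in (0.2, 0.6, 1.0):
    i_d = nmos_current(1.0, vds, 0.4, 200e-6, 2.0)
    print(f"VDS = {vds:.1f} V -> ID = {i_d * 1e6:.1f} uA")
```

With \(\lambda = 0\) the triode and saturation expressions agree at \(V_{DS} = V_{GS} - V_T\), so the modeled curve is continuous at the region boundary.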

1.2.3 PMOS Transistor

A PMOS transistor has the dual structure: an n-type substrate, p+ source and drain, and a negative threshold voltage \(V_{TP} < 0\). Conduction occurs when \(V_{GS} < V_{TP}\). By convention the drain current flows from source to drain (in the direction of hole flow), and the I-V equations take the same form with \(\mu_p\), \(|V_{GS}|\), and \(|V_T|\).

1.3 Short-Channel Effects

As channel lengths decrease below roughly 100 nm, several phenomena invalidate the long-channel square-law model.

1.3.1 Velocity Saturation

Carrier drift velocity does not increase linearly with electric field indefinitely. At high lateral electric field \(\mathcal{E}\), the velocity saturates at \(v_{sat} \approx 10^7\) cm/s for electrons in silicon. A simple two-piece linear model gives:

\[ v = \begin{cases} \mu_n \mathcal{E} & \mathcal{E} \leq \mathcal{E}_{sat} \\ v_{sat} & \mathcal{E} > \mathcal{E}_{sat} \end{cases} \]

where \(\mathcal{E}_{sat} = v_{sat} / \mu_n\). Using a smooth model:

\[ v = \frac{\mu_n \mathcal{E}}{1 + \mathcal{E}/\mathcal{E}_{sat}} \]

The drain current in saturation under velocity saturation becomes:

\[ I_{D,sat} = v_{sat} C_{ox} W \left(V_{GS} - V_T - \frac{V_{DSAT}}{2}\right) \]

where \(V_{DSAT} = \mathcal{E}_{sat} L\). For short devices where \(V_{DSAT} \ll V_{GS} - V_T\), \(I_{D,sat} \approx v_{sat} C_{ox} W (V_{GS} - V_T)\), which is linear in overdrive rather than quadratic.

1.3.2 Drain-Induced Barrier Lowering (DIBL)

At short channel lengths, the drain electric field penetrates toward the source and lowers the potential barrier, reducing \(V_T\) as \(V_{DS}\) increases:

\[ \Delta V_T = -\eta \cdot V_{DS} \]

where \(\eta\) (DIBL coefficient) is typically 20–100 mV/V. DIBL manifests as an increase in \(I_{off}\) and a degradation of the on/off ratio.

1.3.3 Subthreshold Conduction

Below \(V_T\), the transistor is not fully off. The subthreshold current follows an exponential dependence:

\[ I_D = I_0 \exp\!\left(\frac{V_{GS} - V_T}{n V_T^{th}}\right) \left(1 - e^{-V_{DS}/V_T^{th}}\right) \]

where \(V_T^{th} = kT/q \approx 26\) mV at room temperature and \(n \approx 1.3\text{–}1.5\) is the ideality factor. The subthreshold slope \(S\) is:

\[ S = n \cdot \frac{kT}{q} \cdot \ln 10 \approx 60n \text{ mV/dec at 300 K} \]

The theoretical minimum is 60 mV/decade for \(n = 1\). Subthreshold leakage is a critical concern in low-power design.
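The subthreshold slope formula translates directly into a rule of thumb for leakage: every \(S\) millivolts of threshold reduction multiplies the off-current by ten. The sketch below evaluates both; the helper names are hypothetical and the physical constants are the standard SI values.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # elementary charge, C

def subthreshold_slope_mv_per_dec(n, temp_k=300.0):
    """S = n * (kT/q) * ln(10), returned in mV per decade of current."""
    return n * (K_B * temp_k / Q_E) * math.log(10) * 1e3

def leakage_ratio(delta_vt_mv, n, temp_k=300.0):
    """Factor by which subthreshold current grows if VT is lowered by delta_vt_mv."""
    return 10 ** (delta_vt_mv / subthreshold_slope_mv_per_dec(n, temp_k))

print(f"S (n = 1.0) = {subthreshold_slope_mv_per_dec(1.0):.1f} mV/dec")
print(f"S (n = 1.5) = {subthreshold_slope_mv_per_dec(1.5):.1f} mV/dec")
print(f"Leakage increase for 100 mV lower VT, n = 1.5: {leakage_ratio(100.0, 1.5):.1f}x")
```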

1.3.4 Hot Carriers and Oxide Reliability

High lateral electric fields near the drain can cause impact ionization, generating hot carriers that inject into the gate oxide, causing threshold voltage drift and long-term device degradation. Modern transistors use lightly doped drain (LDD) structures to reduce peak electric field.

1.4 SPICE Models

SPICE level 1 models use the simple long-channel equations. BSIM4 (Berkeley Short-Channel IGFET Model) is the industry standard for nanometer-scale bulk transistors and captures the major short-channel effects, including velocity saturation, DIBL, subthreshold slope, gate leakage, and noise. The SPICE model file provides parameters such as \(V_{TH0}\), \(k_1\) (body effect), \(\mu_{eff}\), \(C_{ox}\), \(t_{ox}\), and \(v_{sat}\).

1.5 Scaling Trends

Dennard scaling (1974) proposed that shrinking transistor dimensions by a factor \(\kappa\) while reducing voltages by \(\kappa\) keeps power density constant. The ideal scaling rules, together with the related Moore’s Law trend, are:

Gate length scales as \(L \rightarrow L/\kappa\)
Oxide thickness: \(t_{ox} \rightarrow t_{ox}/\kappa\)
Supply voltage: \(V_{DD} \rightarrow V_{DD}/\kappa\)
Transistor count per chip roughly doubles every 18–24 months (Moore’s Law)

Classical Dennard scaling has ended because voltage cannot keep pace with dimension shrinkage (limited by subthreshold slope and reliability). Power density therefore rises with each generation, which limits how much of the chip can be active at once (“dark silicon”) and motivates specialised architectures.

Chapter 2: CMOS Inverter

2.1 Inverter Circuit

The CMOS inverter consists of a complementary pair: one PMOS (pull-up network) with its source tied to \(V_{DD}\) and one NMOS (pull-down network) with its source tied to GND. The gates are tied together as the input \(A\), and the drains are tied together as the output \(Y\).

Complementary CMOS: For every input combination, exactly one of the two networks (pull-up or pull-down) is conducting. This guarantees a low-resistance path from output to either \(V_{DD}\) or GND in steady state, resulting in rail-to-rail output swing with zero static power (ignoring leakage).

2.2 DC Transfer Characteristic

The voltage transfer characteristic (VTC) is obtained by sweeping \(V_{in}\) from 0 to \(V_{DD}\) and plotting \(V_{out}\). Five regions exist:

Region	Condition	NMOS state	PMOS state
A	\(V_{in} < V_{Tn}\)	Off	Linear
B	\(V_{Tn} < V_{in} < V_{DD}/2\)	Saturation	Linear
C	\(V_{in} \approx V_{DD}/2\)	Saturation	Saturation
D	\(V_{DD}/2 < V_{in} < V_{DD}+V_{Tp}\)	Linear	Saturation
E	\(V_{in} > V_{DD}+V_{Tp}\)	Linear	Off

The switching threshold \(V_M\) occurs when \(V_{in} = V_{out}\) and both transistors are in saturation. Setting \(I_{Dn} = |I_{Dp}|\):

\[ V_M = \frac{V_{Tn} + r\left(V_{DD} + V_{Tp}\right)}{1 + r}, \quad r = \sqrt{\frac{k_p (W/L)_p}{k_n (W/L)_n}} \]

For a symmetric inverter with \(V_M = V_{DD}/2\) (assuming matched thresholds, \(V_{Tn} = |V_{Tp}|\)), we need \(r = 1\), which requires:

\[ \frac{(W/L)_p}{(W/L)_n} = \frac{\mu_n}{\mu_p} \approx 2\text{–}3 \]

since electron mobility \(\mu_n \approx 2.5\,\mu_p\) in bulk silicon.
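To see how the PMOS-to-NMOS sizing ratio moves the switching point, the following sketch evaluates the \(V_M\) expression above. The function name `switching_threshold` and the device values (1.0 V supply, 0.3 V thresholds, \(k_n = 2.5\,k_p\)) are assumptions for illustration.

```python
import math

def switching_threshold(vdd, vtn, vtp, kn, kp, wl_n, wl_p):
    """VM = (VTn + r*(VDD + VTp)) / (1 + r), r = sqrt(kp*(W/L)p / (kn*(W/L)n)).

    vtp is negative for a PMOS device, as in the text.
    """
    r = math.sqrt((kp * wl_p) / (kn * wl_n))
    return (vtn + r * (vdd + vtp)) / (1 + r)

# Assumed process: kn = 250 uA/V^2, kp = 100 uA/V^2
for wl_p in (1.0, 2.5, 4.0):
    vm = switching_threshold(1.0, 0.3, -0.3, 250e-6, 100e-6, 1.0, wl_p)
    print(f"(W/L)p / (W/L)n = {wl_p:.1f} -> VM = {vm:.3f} V")
```

Because \(V_M\) depends on the square root of the sizing ratio, it shifts only weakly with PMOS width; equal-sized devices give a switching point somewhat below \(V_{DD}/2\).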

2.3 Noise Margins

Noise margins quantify the inverter’s ability to tolerate noise on its input.

Noise Margin High (NM\(_H\)): \(NM_H = V_{OH} - V_{IH}\)
Noise Margin Low (NM\(_L\)): \(NM_L = V_{IL} - V_{OL}\)

where \(V_{OH}\) and \(V_{OL}\) are the output high and low levels, and \(V_{IH}\), \(V_{IL}\) are the input high and low thresholds defined as the points on the VTC where the gain is \(-1\) (slope = \(-1\)).

For a CMOS inverter driving rail-to-rail, \(V_{OH} = V_{DD}\) and \(V_{OL} = 0\). The noise margins are maximized for a symmetric VTC.

2.4 Propagation Delay

Propagation delay measures how quickly the output responds to an input transition. It is defined as the time from the 50% point of the input transition to the 50% point of the output transition.

\[ t_{pd} = \frac{t_{pHL} + t_{pLH}}{2} \]

where:

\(t_{pHL}\): output falling (high-to-low), NMOS pulls down the load capacitance \(C_L\)
\(t_{pLH}\): output rising (low-to-high), PMOS charges \(C_L\)

2.4.1 RC Delay Model

The transistor in its on-state is approximated as a resistor \(R\) charging or discharging the load capacitance \(C_L\). For an RC circuit with a step input:

\[ t_{50\%} = \ln(2) \cdot RC \approx 0.69 RC \]

The effective resistance of a minimum-size NMOS is \(R_n\) and of PMOS is \(R_p \approx 2R_n\). Propagation delay becomes:

\[ t_{pHL} = 0.69 R_n C_L, \qquad t_{pLH} = 0.69 R_p C_L \]

2.4.2 Elmore Delay Model

The Elmore delay model extends the RC analysis to complex RC trees (series and parallel combinations of transistors). For a path from the output to supply/ground through a series chain of resistances \(R_1, R_2, \ldots, R_k\) with capacitances \(C_i\) at each node:

\[ t_{pd} = \ln(2) \sum_i R_{i,\text{shared}} C_i \]

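where \(R_{i,\text{shared}}\) is the resistance shared by the path from the source to node \(i\) and the path from the source to the output (for a simple chain, the total resistance from the source to node \(i\)). For a series stack of \(n\) transistors with resistance \(R\) each driving load \(C\), and neglecting the capacitance of the internal stack nodes: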

\[ t_{pd} = 0.69 \cdot n R \cdot C \]

This approximation is accurate to within 10–15% for typical digital gate topologies.
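For a simple RC ladder (a single chain with the output at the far end), the Elmore sum can be accumulated in one pass. The sketch below is a minimal version under that assumption; it does not handle branching trees, and the function name and component values are hypothetical.

```python
import math

def elmore_delay(resistances, capacitances):
    """50% Elmore delay of an RC ladder, measured at the last node.

    resistances[i] connects node i-1 (or the source, for i = 0) to node i;
    capacitances[i] is the capacitance to ground at node i.
    """
    total = 0.0
    r_from_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_from_source += r            # resistance from source to this node
        total += r_from_source * c    # each node contributes R_shared * C
    return math.log(2) * total

# Two series transistors (1 kOhm each, assumed), 2 fF internal node, 10 fF load
t_pd = elmore_delay([1e3, 1e3], [2e-15, 10e-15])
print(f"t_pd = {t_pd * 1e12:.2f} ps")
```

Setting the internal node capacitance to zero recovers the \(0.69 \cdot nR \cdot C\) result for the series stack.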

2.5 Rise and Fall Times

The 10%–90% rise time \(t_r\) and fall time \(t_f\) of a CMOS output are:

\[ t_r = 2.2 R_p C_L, \qquad t_f = 2.2 R_n C_L \]

derived from \(V(t) = V_{DD}(1 - e^{-t/RC})\), solving for 90% and 10% points: \(t_{10\%\rightarrow 90\%} = RC(\ln 9) \approx 2.2 RC\).
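Both the \(0.69\,RC\) propagation delay and the \(2.2\,RC\) rise time come from inverting the same exponential. A short check, assuming an ideal step input and a hypothetical 1 kΩ, 10 fF stage:

```python
import math

def time_to_fraction(rc, fraction):
    """Time for V(t) = VDD * (1 - exp(-t/RC)) to reach `fraction` of VDD."""
    return -rc * math.log(1.0 - fraction)

rc = 1e3 * 10e-15                       # assumed R = 1 kOhm, C = 10 fF
t50 = time_to_fraction(rc, 0.5)         # 50% point: ln(2) * RC
t_rise = time_to_fraction(rc, 0.9) - time_to_fraction(rc, 0.1)  # ln(9) * RC

print(f"t50 / RC    = {t50 / rc:.3f}")
print(f"t_rise / RC = {t_rise / rc:.3f}")
```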

2.6 Inverter Sizing

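The delay of an inverter chain driving a large capacitive load \(C_L\) is minimized by tapering the sizes geometrically. Let \(H = C_L / C_{in}\) be the path effort (for a chain of inverters with no branching, this is simply the electrical effort of the whole path) and let \(N\) be the number of stages. Ignoring parasitic delay, the optimal stage effort is \(\hat{f} = e \approx 2.718\); once parasitic delay is included the optimum shifts to roughly 3.6–4, so a stage effort of about 4 is used in practice. The optimal number of stages for a given effort \(H\) is: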

\[ N_{opt} = \log_{\hat{f}} H \]

and the minimum delay is:

\[ D_{min} = N_{opt} \cdot \hat{f} \cdot \tau \]

where \(\tau = 3RC\) is the basic delay unit of the process and the parasitic delay of each stage is neglected. Each successive stage is scaled by a ratio of \(\hat{f}\) relative to the previous.
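The sizing procedure can be sketched as follows: pick the integer stage count closest to \(\log_{\hat{f}} H\), then recompute the actual stage effort as \(H^{1/N}\). The function name `size_inverter_chain`, the target effort of 4, and the normalized capacitances are assumptions for illustration; parasitic delay is ignored, as in the formula above.

```python
import math

def size_inverter_chain(c_in, c_load, target_stage_effort=4.0):
    """Return (N, stage effort, per-stage input capacitances, delay in units of tau)."""
    h_total = c_load / c_in
    n = max(1, round(math.log(h_total) / math.log(target_stage_effort)))
    f = h_total ** (1.0 / n)                     # actual effort per stage
    sizes = [c_in * f ** i for i in range(n)]    # input capacitance of stage i
    delay_tau = n * f                            # parasitic delay neglected
    return n, f, sizes, delay_tau

n, f, sizes, delay = size_inverter_chain(c_in=1.0, c_load=64.0)
print(f"N = {n}, stage effort = {f:.2f}, delay = {delay:.1f} tau")
print("stage input capacitances:", [round(s, 2) for s in sizes])
```

Rounding the stage count to an integer is what makes the realized stage effort differ slightly from the target in general; delay is fairly flat around the optimum, so this costs little.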

Chapter 3: CMOS Combinational Logic

3.1 Static CMOS Gate Structure

A static CMOS gate is composed of a pull-up network (PUN) of PMOS transistors and a pull-down network (PDN) of NMOS transistors. The PUN and PDN are dual networks:

Series NMOS in PDN ↔ Parallel PMOS in PUN
Parallel NMOS in PDN ↔ Series PMOS in PUN

NAND2 Gate: The pull-down network has two NMOS transistors in series (A and B). The pull-up network has two PMOS transistors in parallel. Output \(Y = \overline{A \cdot B}\).

NOR2 Gate: The pull-down network has two NMOS in parallel. The pull-up network has two PMOS in series. Output \(Y = \overline{A + B}\).

3.2 Logical Effort

Logical effort is a method for comparing the intrinsic speed of logic gates by characterizing how much more input capacitance a gate presents compared to a minimum inverter.

Logical Effort \(g\): The ratio of the input capacitance of a gate to the input capacitance of an inverter that delivers the same output current. \[ g = \frac{C_{in,\text{gate}}}{C_{in,\text{inverter}}} \]

For an inverter: \(g = 1\). For a 2-input NAND: \(g = 4/3\). For a 2-input NOR: \(g = 5/3\).

Electrical Effort (fanout) \(h\): The ratio of the output load capacitance to the input capacitance: \(h = C_{out}/C_{in}\).

Stage Effort \(f\): \(f = g \cdot h\).

Path Effort \(F\): \(F = G \cdot B \cdot H\), where \(G = \prod g_i\), \(B\) is branching effort, and \(H\) is the ratio of path output capacitance to input capacitance.

The delay of a single gate normalized to the inverter delay \(\tau\) is:

\[ d = g \cdot h + p \]

where \(p\) is the intrinsic (parasitic) delay of the gate (due to internal capacitances). For an inverter \(p = 1\); for NAND2, \(p \approx 2\); for NOR2, \(p \approx 2\).

3.3 Delay Characterization and Optimization

3.3.1 Series Stack Resistance

In a NAND gate with \(n\) inputs, the NMOS transistors are stacked in series. Each transistor must be sized up by a factor of \(n\) relative to a minimum inverter to maintain equivalent drive strength:

\[ (W/L)_{PDN} = n \cdot (W/L)_{n,min} \]

This increases input capacitance and must be traded off against propagation delay. The PMOS transistors in the PUN remain sized at \(2\times\) the minimum NMOS width.

3.3.2 Transistor Ordering (Input Ordering)

In a gate with inputs of different arrival times, place the latest-arriving signal closest to the output node (closest to the drain in the series stack). This minimizes the effective capacitance that the late signal must charge/discharge.

3.3.3 Path Delay Optimization

For a multi-stage combinational logic path, the total delay is minimized when all stage efforts are equal. Given path effort \(F\) over \(N\) stages:

\[ f_1 = f_2 = \cdots = f_N = F^{1/N} \]

The transistor widths are found by working backward from the output load through the chain.
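The backward sizing pass can be written out explicitly. In this sketch each stage \(i\) is described by its logical effort `g[i]`, the branching effort `b[i]` at its output, and its parasitic delay `p[i]`; these list names, the function name, and the example path (inverter, NAND2, NOR2 driving 45 units of load) are assumptions for illustration.

```python
import math

def optimize_path(g, b, p, c_in, c_load):
    """Equal-stage-effort sizing of an N-stage path.

    Returns (path effort F, stage effort f_hat, input capacitance of each
    stage, total delay in units of tau).
    """
    n = len(g)
    path_effort = math.prod(g) * math.prod(b) * (c_load / c_in)   # F = G*B*H
    f_hat = path_effort ** (1.0 / n)
    caps = [0.0] * n
    c_out = c_load
    for i in reversed(range(n)):
        # stage effort f = g * (b * C_out) / C_in  =>  C_in = g * b * C_out / f
        caps[i] = g[i] * b[i] * c_out / f_hat
        c_out = caps[i]
    delay = n * f_hat + sum(p)
    return path_effort, f_hat, caps, delay

F, f_hat, caps, delay = optimize_path(
    g=[1.0, 4 / 3, 5 / 3], b=[1.0, 1.0, 1.0], p=[1.0, 2.0, 2.0],
    c_in=1.0, c_load=45.0)
print(f"F = {F:.1f}, f_hat = {f_hat:.2f}, delay = {delay:.2f} tau")
print("input capacitances:", [round(c, 2) for c in caps])
```

A useful self-check is that the computed input capacitance of the first stage comes back equal to the specified \(C_{in}\).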

3.4 Power Consumption in CMOS

Total power consumption in a CMOS circuit has three components:

3.4.1 Dynamic (Switching) Power

Every time a node switches from low to high, energy \(C V_{DD}^2\) is drawn from the supply (half stored in the capacitor, half dissipated in the PMOS). When the node switches high to low, the energy stored in the capacitor is dissipated in the NMOS. The average dynamic power is:

\[ P_{dynamic} = \alpha C_L V_{DD}^2 f \]

where:

\(\alpha \in \left[0, 1\right]\) is the activity factor (probability of a power-consuming transition per clock cycle)
\(C_L\) is the switched load capacitance
\(V_{DD}\) is the supply voltage
\(f\) is the clock frequency

The quadratic dependence on \(V_{DD}\) makes voltage scaling the most powerful knob for power reduction.
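A short numerical example makes the quadratic supply dependence concrete. The values below (activity factor 0.1, 1 nF of total switched capacitance, 1 GHz) are assumed for illustration only.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """P_dynamic = alpha * C_L * VDD^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * freq

p_nominal = dynamic_power(alpha=0.1, c_load=1e-9, vdd=1.0, freq=1e9)
p_scaled = dynamic_power(alpha=0.1, c_load=1e-9, vdd=0.8, freq=1e9)

print(f"P at 1.0 V: {p_nominal * 1e3:.1f} mW")
print(f"P at 0.8 V: {p_scaled * 1e3:.1f} mW ({p_scaled / p_nominal:.2f}x)")
```

A 20% supply reduction at fixed frequency cuts dynamic power by 36%; in practice the achievable frequency also drops at lower \(V_{DD}\), which this sketch does not model.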

3.4.2 Short-Circuit Power

During a transition, both PMOS and NMOS may be simultaneously conducting for a brief interval (when \(V_{Tn} < V_{in} < V_{DD} - |V_{Tp}|\), both devices are on). This creates a short-circuit current path from \(V_{DD}\) to GND:

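\[ P_{sc} = \frac{\beta}{12}(V_{DD} - 2V_T)^3 t_{sc} f \]

where \(\beta\) is the transistor gain factor, matched thresholds \(V_T = V_{Tn} = |V_{Tp}|\) are assumed, and \(t_{sc}\) is the input transition time. The term \(V_{DD} - 2V_T\) is the width of the input window in which both devices conduct. Short-circuit power is typically 5–10% of total power when edge rates are well matched. It is minimized by keeping input transitions fast, ideally with the input and output rise/fall times matched.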

3.4.3 Leakage Power

Static (leakage) power arises even when the circuit is not switching. The dominant leakage mechanisms are:

Subthreshold leakage: exponential in \(V_T\); decreases with higher \(V_T\), and grows when \(V_T\) is lowered to preserve speed at a reduced \(V_{DD}\)
Gate leakage: quantum mechanical tunneling through ultra-thin gate dielectrics (\(t_{ox} < 2\) nm)
Junction leakage: reverse-biased p-n junction diode leakage (relatively small)

\[ P_{leakage} = I_{leakage} \cdot V_{DD} \]

Multi-threshold CMOS (MTCMOS) uses high-\(V_T\) transistors on non-critical paths and sleep transistors to cut leakage during standby.

3.5 Pass-Transistor Logic

Pass-transistor logic (PTL) passes logic values through transistor switches (rather than connecting outputs directly to \(V_{DD}\) or GND). An NMOS pass transistor connecting input \(A\) to output has \(R_{on} = 1/\left(\mu_n C_{ox}(W/L)(V_{GS}-V_T)\right)\), but suffers a threshold drop: the output only reaches \(V_{DD} - V_{Tn}\) when transmitting a high, causing reduced noise margin and increased delay for downstream gates.

Transmission gates (TG) use complementary NMOS and PMOS in parallel, controlled by \(C\) and \(\bar{C}\), to pass both logic levels without threshold drop. A transmission gate has effective resistance \(R_{TG} \approx R_n R_p / (R_n + R_p)\) when both devices are on.

3.6 Dynamic CMOS Logic

Dynamic CMOS logic evaluates during one clock phase (the evaluate phase, \(\phi = 1\) for an NMOS pull-down network) and precharges the output during the other (the precharge phase, \(\phi = 0\)).

3.6.1 Dynamic Gate Operation

A dynamic gate consists of:

A precharge PMOS transistor (gate = \(\phi\)) connecting output to \(V_{DD}\)
A PDN of NMOS transistors implementing the logic function
A footer NMOS transistor (gate = \(\phi\)) connecting PDN to GND

During precharge (\(\phi = 0\)): output precharges to \(V_{DD}\), PDN is disconnected (footer off). During evaluate (\(\phi = 1\)): precharge PMOS is off, PDN evaluates. If the inputs make the PDN conducting, output discharges to 0; otherwise it remains high.

Dynamic gates can only implement inverting logic (output starts high and may go low). The inputs must only transition from 0 to 1 during evaluation (no 1-to-0 transitions on inputs during evaluate), which restricts their use in cascaded form.

3.6.2 Domino Logic

Domino logic resolves the cascading problem by following each dynamic gate with a static inverter. During precharge the inverter output is low; during evaluate it can only rise from low to high (when the dynamic gate output discharges), and this rising edge can trigger the next dynamic stage:

Dynamic gate → static inverter → dynamic gate → static inverter → …

Domino logic achieves high speed (each stage has only one device type in the critical pull-down path) but requires additional inverter area.

Domino OR2: The dynamic stage implements the NOR function (PDN: parallel NMOS transistors for inputs A and B). The static inverter then inverts this to produce OR, so the overall domino gate is non-inverting. Each domino stage adds one inverter delay but allows high-speed cascading.

Chapter 4: CMOS Sequential Logic

4.1 Latches and Flip-Flops

4.1.1 SR Latch

The SR latch (Set-Reset) stores a single bit using cross-coupled NOR (or NAND) gates. It has two stable states. The forbidden input combination (S=R=1 for the NOR latch; \(\bar{S}=\bar{R}=0\) for the NAND latch, whose inputs are active-low) must be avoided.

4.1.2 D Latch (Level-Sensitive)

A D latch is transparent when the clock \(\phi = 1\): output Q follows input D. When \(\phi = 0\), Q holds its last value. Implemented in CMOS with a transmission gate (TG) and an inverter feedback pair:

When \(\phi = 1\): TG passes D to Q
When \(\phi = 0\): TG opens, feedback inverter holds state

The D latch is a level-sensitive storage element and is sensitive to glitches during the transparent phase.

4.1.3 D Flip-Flop (Edge-Triggered)

A master-slave D flip-flop captures input D on the rising edge of the clock. It consists of two cascaded D latches:

Master latch: transparent when \(\phi = 0\) (samples D)
Slave latch: transparent when \(\phi = 1\) (propagates to Q)

The output Q changes only at the clock edge, suppressing glitches. A static CMOS D flip-flop is implemented using two TG-based latches with cross-coupled feedback inverters.
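The master-slave behaviour can be mimicked with a small behavioural model: two level-sensitive latches transparent on opposite clock levels. This is a logic-level sketch only (no delays, no transistor detail), and the class names are hypothetical.

```python
class DLatch:
    """Level-sensitive latch: Q follows D while clk equals transparent_level."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level
        self.q = 0

    def update(self, d, clk):
        if clk == self.transparent_level:
            self.q = d          # transparent: pass D through
        return self.q           # opaque: hold the stored value


class MasterSlaveDFF:
    """Rising-edge flip-flop: master transparent at clk=0, slave at clk=1."""

    def __init__(self):
        self.master = DLatch(transparent_level=0)
        self.slave = DLatch(transparent_level=1)

    def update(self, d, clk):
        m = self.master.update(d, clk)
        return self.slave.update(m, clk)


ff = MasterSlaveDFF()
for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]:
    print(f"D={d} clk={clk} -> Q={ff.update(d, clk)}")
```

In the trace, Q takes the value D held just before the clock rose, and a change on D while the clock is high has no effect until the next rising edge.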

4.2 Timing Constraints

Setup Time \(t_{su}\): The minimum time that the input D must be stable before the clock edge for reliable capture.

Hold Time \(t_h\): The minimum time that D must remain stable after the clock edge.

Clock-to-Q Delay \(t_{cq}\): Time from the active clock edge until the output Q reaches its valid logic level.

These three parameters determine the timing budget of a synchronous pipeline:

\[ t_{clock} \geq t_{cq} + t_{logic} + t_{su} \]

where \(t_{logic}\) is the worst-case combinational delay between flip-flops and \(t_{clock}\) is the clock period. The maximum operating frequency is:

\[ f_{max} = \frac{1}{t_{cq} + t_{logic,max} + t_{su}} \]

4.3 Timing Issues: Clock Skew

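Clock skew \(\delta\) is the spatial variation in clock arrival time across different flip-flops on the chip. Define \(\delta\) as the clock arrival time at the receiving flip-flop minus that at the launching flip-flop. With positive skew (\(\delta > 0\), the clock reaches the receiver later than the launcher), data has extra time to arrive and the setup constraint relaxes:

\[ t_{clock} \geq t_{cq} + t_{logic} + t_{su} - \delta \]

The same late-arriving receiver clock tightens the hold constraint, because newly launched data can reach the receiver while it is still capturing the previous value:

\[ t_{cq} + t_{logic,min} \geq t_h + \delta \]

Negative skew (\(\delta < 0\), the clock reaches the receiver first) does the opposite: it tightens the setup constraint and relaxes the hold constraint.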

Hold time violations are caused by fast paths with too-short combinational logic between flip-flops. Unlike setup violations, hold violations cannot be fixed by slowing the clock — they require adding delay buffers on short paths or carefully controlling clock skew.

4.3.1 Clock Distribution

A balanced H-tree or buffered clock tree distributes the clock with minimal skew. Each branch of the H-tree has equal wire length to each leaf flip-flop, ensuring equal propagation delay. In practice, process variation causes residual skew even with balanced trees.

4.4 Sequencing Methods

4.4.1 Two-Phase Non-Overlapping Clocks

Two non-overlapping phases \(\phi_1\) and \(\phi_2\) prevent races. Latches transparent to \(\phi_1\) capture data, then latches transparent to \(\phi_2\) propagate it. The non-overlap time provides hold time margin.

4.4.2 Pulsed Latches

A pulsed latch uses a very narrow clock pulse (generated from CLK by a pulse generator circuit) to capture data. The latch is transparent only for the duration of the pulse (sub-cycle). This allows time borrowing across stage boundaries: if one stage is fast and the next is slow, time borrowing can relax the constraint.

4.4.3 Schmitt Trigger

A Schmitt trigger is a comparator with hysteresis: it has two different switching thresholds \(V_{T+}\) (rising input) and \(V_{T-}\) (falling input), with \(V_{T+} > V_{T-}\). This provides noise immunity on slow or noisy input signals.

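In CMOS, a Schmitt trigger is implemented by adding weak feedback transistors to an inverter. When the output is high (input low), a weak PMOS assists the pull-up, so the input must rise further before the output switches; this raises the threshold \(V_{T+}\) seen by a rising input. Symmetrically, when the output is low, a weak NMOS assists the pull-down and lowers the threshold \(V_{T-}\) seen by a falling input.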

\[ \text{Hysteresis} = V_{T+} - V_{T-} \]

Chapter 5: CMOS Arithmetic Circuits

5.1 Binary Addition: Full Adder

A full adder computes sum \(S\) and carry-out \(C_{out}\) from inputs \(A\), \(B\), and carry-in \(C_{in}\):

\[ S = A \oplus B \oplus C_{in} \]\[ C_{out} = AB + BC_{in} + AC_{in} = AB + C_{in}(A \oplus B) \]

The generate signal \(G = AB\) and propagate signal \(P = A \oplus B\) (or \(P = A + B\) in some formulations) are the basis for fast carry generation.

5.2 Carry-Ripple Adder

In a carry-ripple adder (CRA), the carry-out of each stage feeds the carry-in of the next. For an \(n\)-bit CRA:

\[ t_{CRA} = t_{FA,setup} + (n-1) t_{carry} + t_{sum} \]

For a typical implementation where \(t_{carry}\) is the carry propagation delay through one full adder:

\[ t_{CRA} \approx n \cdot t_{FA} \]

The delay grows linearly with word width, making CRA impractical for wide datapaths at high frequency (e.g., 64-bit at 3 GHz). However, CRA is compact (area-efficient) and suitable for short word widths.
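A bit-level model shows the serial dependency that causes the linear delay: each iteration needs the carry produced by the previous one. The function names are hypothetical; the full-adder equations are those of Section 5.1.

```python
def full_adder(a, b, cin):
    """One-bit full adder: S = A xor B xor Cin, Cout = AB + Cin(A xor B)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, n_bits, cin=0):
    """Add two n-bit numbers bit by bit; returns (n-bit sum, carry-out)."""
    carry = cin
    result = 0
    for i in range(n_bits):          # carry ripples from bit 0 upward
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_carry_add(13, 7, 4))     # 13 + 7 = 20 -> 4-bit sum 4, carry-out 1
```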

5.3 Carry-Lookahead Adder

The carry-lookahead adder (CLA) precomputes carry signals using generate and propagate logic:

\[ G_i = A_i B_i, \qquad P_i = A_i \oplus B_i \]

Carry at position \(i\):

\[ C_{i+1} = G_i + P_i C_i \]

Expanding recursively for a 4-bit group (bits 0–3):

\[ C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0 \]

Group generate and propagate for the 4-bit block:

\[ G_{0:3} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 \]\[ P_{0:3} = P_3 P_2 P_1 P_0 \]
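The expanded carry equations can be checked directly against ordinary addition. The sketch below computes every carry of a 4-bit block from the \(G_i\), \(P_i\) and \(C_0\) signals alone, with no carry rippling between bits; the function name `cla4` is an assumption.

```python
def cla4(a, b, c0=0):
    """4-bit carry-lookahead block.

    Returns (4-bit sum, C4, group generate G_0:3, group propagate P_0:3).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]       # G_i = A_i B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]     # P_i = A_i xor B_i

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)

    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    group_p = p[3] & p[2] & p[1] & p[0]
    c4 = group_g | (group_p & c0)

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))   # S_i = P_i xor C_i
    return total, c4, group_g, group_p

print(cla4(0b1011, 0b0110))   # 11 + 6 = 17 -> sum 1, carry-out 1
```

Note that the sum bits require the XOR form of propagate, \(P_i = A_i \oplus B_i\), which is the form used here.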

A hierarchical CLA achieves \(O(\log n)\) delay by applying the same lookahead equations to the group signals. With two levels of lookahead:

\[ t_{CLA} = t_{pg} + 2 t_{lookahead} + t_{sum} \]

where \(t_{pg}\) is the propagate/generate computation time and \(t_{lookahead}\) is the lookahead logic level delay. For a 64-bit CLA:

\[ t_{CLA,64} \approx 5\text{–}7 \text{ gate delays (vs. } \sim 64 \text{ for CRA)} \]

5.4 Prefix Adder (Parallel Prefix)

Prefix adders (Han-Carlson, Kogge-Stone, Brent-Kung) compute all group generate and propagate signals in parallel using a tree structure. The prefix operation is associative:

\[ (G_{i:j}, P_{i:j}) \circ (G_{j-1:k}, P_{j-1:k}) = (G_{i:j} + P_{i:j} G_{j-1:k},\; P_{i:j} P_{j-1:k}) \]

For a Kogge-Stone adder with \(n\) bits: delay is \(O(\log_2 n)\) with \(O(n \log n)\) area. For \(n = 64\):

\[ t_{Kogge\text{-}Stone} = \log_2 64 = 6 \text{ prefix levels} + \text{sum stage} \]
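A bit-level sketch of the Kogge-Stone prefix network is given below. At each level, position \(i\) combines its current (G, P) pair with the pair `dist` positions to its right using the prefix operator, and `dist` doubles every level. The function name and the returned level count are illustrative assumptions; the model captures the logic structure only, not wiring or fan-out.

```python
def kogge_stone_add(a, b, n_bits, cin=0):
    """Returns (n-bit sum, carry-out, number of prefix levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n_bits)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n_bits)]

    G, P = g[:], p[:]
    dist, levels = 1, 0
    while dist < n_bits:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n_bits):
            # (G, P)_i  o  (G, P)_{i-dist}
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
        levels += 1

    # G[i], P[i] now span bits i..0; carry into bit i+1 is G[i] + P[i]*cin
    carries = [cin] + [G[i] | (P[i] & cin) for i in range(n_bits)]
    total = sum((p[i] ^ carries[i]) << i for i in range(n_bits))
    return total, carries[n_bits], levels

total, cout, levels = kogge_stone_add(0xFFFF_FFFF_FFFF_FFFF, 1, 64)
print(total, cout, levels)
```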

5.5 Memory Circuits

5.5.1 SRAM Cell

The standard 6T SRAM cell consists of two cross-coupled inverters (the bistable element) and two access NMOS transistors controlled by the wordline WL.

During a read: Both bitlines (BL, \(\overline{\text{BL}}\)) are first precharged to \(V_{DD}\), then WL is raised. The side of the cell storing 0 discharges its bitline (say BL) through the access NMOS and pull-down NMOS. The sense amplifier detects the resulting small differential voltage \(\Delta V \approx 100\text{–}200\) mV.

During a write: The write driver forces one bitline to 0 (and the other to \(V_{DD}\)). When WL is asserted, the bitline-to-0 overrides the cross-coupled feedback and flips the cell.

Cell Ratio (CR): The ratio of pull-down NMOS width to access NMOS width: \(CR = (W/L)_{pd} / (W/L)_{access} \geq 1.5\) ensures the cell node does not rise above \(V_{Tn}\) during a read (read stability).

Pull-Up Ratio (PR): \(PR = (W/L)_{pu} / (W/L)_{access} < 1\) ensures the access NMOS can overpower the pull-up PMOS during a write (write-ability).

5.5.2 DRAM Cell

A 1T-1C DRAM cell stores charge on a capacitor \(C_s\). The single access transistor connects the capacitor to the bitline. Data is destructively read (the bitline voltage changes due to charge sharing) and must be refreshed (rewritten) periodically (typ. every 64 ms). With the bitline precharged to \(V_{DD}/2\), the sense voltage is:

\[ \Delta V_{BL} = \frac{C_s}{C_s + C_{BL}} (V_{cell} - V_{DD}/2) \]

DRAM provides high density (1T1C vs. 6T SRAM) but requires refresh logic and an external DRAM controller.
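The charge-sharing expression for \(\Delta V_{BL}\) shows why DRAM needs sensitive sense amplifiers: the bitline capacitance is much larger than the cell capacitance. The numbers below (30 fF cell, 300 fF bitline, 1.2 V supply) are assumed for illustration and are not taken from a specific technology.

```python
def dram_sense_voltage(c_s, c_bl, v_cell, vdd):
    """Bitline swing after charge sharing, with the bitline precharged to VDD/2."""
    return c_s / (c_s + c_bl) * (v_cell - vdd / 2)

C_S, C_BL, VDD = 30e-15, 300e-15, 1.2      # assumed example values

dv_one = dram_sense_voltage(C_S, C_BL, VDD, VDD)    # cell stores a full '1'
dv_zero = dram_sense_voltage(C_S, C_BL, 0.0, VDD)   # cell stores a '0'
print(f"dV for stored 1: {dv_one * 1e3:+.1f} mV")
print(f"dV for stored 0: {dv_zero * 1e3:+.1f} mV")
```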

5.6 Power Consumption of Arithmetic Circuits

In a 64-bit adder running at frequency \(f\) with activity factor \(\alpha\), the dynamic power is:

\[ P = \alpha \sum_i C_i V_{DD}^2 f \]

The switched capacitance \(\sum C_i\) is dominated by the bitlines in memory arrays and by wiring capacitance in arithmetic units. In a pipelined processor, the adder and memory subsystem can account for 30–50% of total chip power.

Chapter 6: Interconnect Parasitics

6.1 Wire Resistance

A metal wire of length \(l\), width \(w\), and sheet resistance \(R_\square\) has resistance:

\[ R_{wire} = R_\square \cdot \frac{l}{w} \]

In advanced CMOS processes (e.g., 7 nm), copper M1 has \(R_\square \approx 0.03\text{–}0.05\; \Omega/\square\). As feature sizes shrink, wire cross-sections decrease, increasing \(R_\square\) and wire resistance.

6.2 Wire Capacitance

Wire capacitance has two components:

Parallel-plate (area) capacitance to the substrate or neighboring metal layers: \(C_{area} = \varepsilon_{ox} A / t_{ox}\)
Fringe and lateral capacitance to adjacent wires at the same metal level

The total capacitance per unit length of a wire depends strongly on geometry and dielectric constant. In modern designs, lateral coupling capacitance between adjacent wires at minimum pitch can exceed area capacitance, causing crosstalk.

Wire Capacitance per Unit Length: \[ c_{wire} \approx \varepsilon_0 \varepsilon_r \left( \frac{w \cdot t_{metal}}{h^2} + 2.22 \left(\frac{s}{s+h}\right)^{0.222} + \ldots \right) \]

where \(s\) is the interlayer dielectric thickness, \(h\) is the height of the wire, and the bracketed terms approximate fringe effects. In practice, values are extracted using 2D/3D field solvers (e.g., FastCap).

6.3 Lumped RC Model and Elmore Delay

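For short wires (\(l \ll \lambda_{signal}\)) a lumped RC model is adequate, giving \(t_{50\%} = 0.69\,R_{wire} C_{wire}\). For longer wires the resistance and capacitance are distributed along the length, and the propagation delay of the distributed RC line is smaller: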

\[ t_{50\%} = 0.38 \cdot R_{wire} C_{wire} = 0.38 \cdot (r \cdot l)(c \cdot l) = 0.38 r c l^2 \]

where \(r\) and \(c\) are resistance and capacitance per unit length. The quadratic dependence on length \(l\) is critical: doubling the wire length quadruples the delay.

For a driver with output resistance \(R_s\) driving a wire of total resistance \(R_w\) and capacitance \(C_w\) and load \(C_L\):

\[ t_{pd} = 0.69 \left[ R_s (C_w + C_L) + \frac{R_w C_w}{2} + R_w C_L \right] \]

This is the Elmore delay for the lumped-distributed RC ladder network.
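The two delay expressions in this section are easy to compare numerically. The sketch below uses assumed per-unit-length values (0.1 Ω/µm and 0.2 fF/µm, expressed in SI units) and a hypothetical 1 kΩ driver; the function names are illustrative.

```python
def distributed_wire_delay(r, c, length):
    """t_50% = 0.38 * r * c * l^2 for a distributed RC line."""
    return 0.38 * r * c * length ** 2

def driver_wire_load_delay(r_s, r_w, c_w, c_l):
    """t_pd = 0.69 * [R_s (C_w + C_L) + R_w C_w / 2 + R_w C_L]."""
    return 0.69 * (r_s * (c_w + c_l) + r_w * c_w / 2 + r_w * c_l)

R_PER_M = 1e5       # assumed 0.1 ohm/um
C_PER_M = 2e-10     # assumed 0.2 fF/um

for length in (1e-3, 2e-3):                          # 1 mm and 2 mm
    t_wire = distributed_wire_delay(R_PER_M, C_PER_M, length)
    t_total = driver_wire_load_delay(1e3, R_PER_M * length, C_PER_M * length, 10e-15)
    print(f"l = {length * 1e3:.0f} mm: wire only {t_wire * 1e12:.1f} ps, "
          f"with driver and load {t_total * 1e12:.1f} ps")
```

For millimetre-scale wires with these assumed values, the driver term \(R_s(C_w + C_L)\) still dominates; the quadratic wire term takes over as length grows.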

6.4 Repeater Insertion

For long global wires, inserting repeater buffers reduces the quadratic wire delay to linear in length. With \(k\) equally-spaced repeaters of optimal size \(h_{opt}\):

\[ h_{opt} = \sqrt{\frac{R_0 \, c}{r \, C_0}} \]\[ t_{wire,min} \approx 2.5 \sqrt{r c} \cdot l \cdot \sqrt{R_0 C_0} \]

where \(R_0\) and \(C_0\) are the resistance and capacitance of a minimum-size inverter. Repeater insertion makes wire delay linear in length, enabling feasible cross-chip routing.

6.5 Transmission Line Effects

For very long wires (global busses) or high-frequency signals where \(t_{flight} = l/v \geq t_{rise}/2\), the wire behaves as a transmission line and must be terminated to prevent reflections.

The characteristic impedance of an on-chip microstrip or stripline is:

\[ Z_0 = \sqrt{\frac{L_{line}}{C_{line}}} \]

On-chip, \(Z_0\) is typically 20–100 \(\Omega\). At signal transition times in the picosecond range (multi-GHz operation), transmission line effects become important for wires longer than a few millimeters.

6.6 Crosstalk

Capacitive coupling between adjacent wires can cause signal integrity problems. The crosstalk noise voltage induced on a floating victim wire by a switching aggressor is:

\[ V_{xtalk} = \frac{C_{coupling}}{C_{coupling} + C_{victim}} \Delta V_{aggressor} \]

Crosstalk can be reduced by: increasing wire pitch (spacing), inserting shield wires (ground lines), using orthogonal routing in adjacent metal layers, or reducing signal swing.
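The capacitive-divider expression gives a quick estimate of how much spacing helps. In this sketch the coupling and ground capacitances are assumed example values, and the function name is hypothetical.

```python
def crosstalk_noise(c_coupling, c_victim, delta_v_aggressor):
    """Noise on a floating victim: capacitive divider between Cc and C_victim."""
    return c_coupling / (c_coupling + c_victim) * delta_v_aggressor

# Assumed: 1.0 V aggressor swing, victim has 40 fF to ground
for c_c in (40e-15, 20e-15, 10e-15):     # coupling drops as spacing increases
    v_n = crosstalk_noise(c_c, 40e-15, 1.0)
    print(f"Cc = {c_c * 1e15:.0f} fF -> V_xtalk = {v_n * 1e3:.0f} mV")
```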

Chapter 7: Timing Design and Clocking

7.1 Synchronous Design Methodology

Modern VLSI chips are overwhelmingly synchronous: all state elements (flip-flops and latches) are governed by a global clock signal. The clock edge samples all flip-flop inputs simultaneously, ensuring deterministic operation.

A synchronous pipeline has \(N\) pipeline stages separated by flip-flops. The minimum clock period is set by the longest combinational path in any stage:

\[ T_{clock} = \max_i \left( t_{cq,i} + t_{logic,i} + t_{su,i} \right) + t_{skew} \]

7.2 Setup and Hold Time Analysis

7.2.1 Setup Time Violation

A setup violation occurs when combinational logic delay is too large and data does not arrive at the flip-flop input before the clock edge. Consequences: metastability, incorrect data capture.

Fix: reduce logic delay (optimize critical path), increase clock period (reduce frequency), add pipeline registers.

7.2.2 Hold Time Violation

A hold violation occurs when data propagates too quickly through a short combinational path and changes the receiving flip-flop's input before its hold time has elapsed. This is a functional error that cannot be corrected by slowing the clock. The violation condition is:

\[ t_{cq} + t_{logic,min} < t_h + t_{skew} \]

Fix: insert delay buffers on fast paths, reduce clock skew.

7.3 Clock Skew and Jitter

Clock Skew: Deterministic variation in clock arrival time between different flip-flops, caused by asymmetric clock distribution network (wire length, buffer count). Skew is repeatable from cycle to cycle.

Clock Jitter: Cycle-to-cycle variation in clock period, caused by PLL noise, supply noise, and substrate coupling. Jitter is random and reduces the available timing margin.

The effective timing budget accounting for skew and jitter:

\[ t_{logic,max} \leq T_{clock} - t_{cq} - t_{su} - t_{skew} - t_{jitter} \]\[ t_{logic,min} \geq t_h + t_{skew} \]
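The two budget inequalities can be turned into slack calculations, where a negative slack means a violation. The function name `check_timing` and the example numbers (all in picoseconds) are assumptions for illustration.

```python
def check_timing(t_clock, t_cq, t_su, t_h, t_logic_max, t_logic_min,
                 t_skew=0.0, t_jitter=0.0):
    """Return (setup slack, hold slack); negative slack indicates a violation."""
    setup_slack = t_clock - t_cq - t_su - t_skew - t_jitter - t_logic_max
    hold_slack = t_cq + t_logic_min - t_h - t_skew
    return setup_slack, hold_slack

params = dict(t_cq=50, t_su=40, t_h=30, t_logic_max=800, t_logic_min=20,
              t_skew=50, t_jitter=20)

for t_clock in (1000, 2000):
    setup, hold = check_timing(t_clock=t_clock, **params)
    print(f"T = {t_clock} ps: setup slack = {setup} ps, hold slack = {hold} ps")
```

In this example the hold slack is negative at both clock periods, illustrating that a hold violation is independent of clock frequency and has to be fixed by adding path delay or reducing skew.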

7.4 Clock Generation: Phase-Locked Loop

A Phase-Locked Loop (PLL) multiplies a low-frequency reference clock to the chip’s operating frequency with controlled phase. A PLL consists of:

Phase-Frequency Detector (PFD): compares reference and feedback phases
Charge Pump and Loop Filter: generates control voltage
Voltage-Controlled Oscillator (VCO): output frequency proportional to control voltage
Frequency Divider: divides VCO output by \(N\) to create feedback

At lock, \(f_{out} = N \cdot f_{ref}\). PLLs enable on-chip frequency synthesis and, when the feedback path is taken from the end of the clock distribution network, align the phase of the distributed clock with the reference.
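As a small illustration of the lock relation \(f_{out} = N \cdot f_{ref}\), the sketch below picks the integer divider closest to a target frequency. It assumes a simple integer-N PLL; the function name and the frequencies are hypothetical.

```python
def pll_divider(f_ref_mhz, f_target_mhz):
    """Return (N, f_out) for the integer divider closest to the target frequency."""
    n = max(1, round(f_target_mhz / f_ref_mhz))
    return n, n * f_ref_mhz


n, f_out = pll_divider(f_ref_mhz=25.0, f_target_mhz=2400.0)
print(n, f_out)
```

With an integer divider the output can only land on multiples of the reference, so a target that is not a multiple of \(f_{ref}\) is approximated by the nearest one.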

7.5 Pipelining and Retiming

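Adding pipeline registers to a combinational datapath allows higher clock frequency at the cost of increased latency. For a path with original delay \(T_{comb}\), dividing it into \(N\) balanced pipeline stages reduces the clock period to \(T_{comb}/N\) in the ideal case (in practice \(T_{comb}/N + t_{cq} + t_{su}\), since every stage pays the register overhead), and a result emerges \(N\) cycles after its inputs enter.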
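The ideal \(T_{comb}/N\) estimate ignores register overhead. The sketch below adds a per-stage overhead of \(t_{cq} + t_{su}\), assuming perfectly balanced stages; that assumption, the function name, and the numbers are illustrative.

```python
def pipeline_timing(t_comb, n_stages, t_cq, t_su):
    """Clock period and total latency for n_stages perfectly balanced stages."""
    t_clk = t_comb / n_stages + t_cq + t_su
    latency = n_stages * t_clk
    return t_clk, latency


# Hypothetical 2000 ps datapath with 50 ps clock-to-Q and 30 ps setup.
t1, lat1 = pipeline_timing(2000, 1, t_cq=50, t_su=30)
t4, lat4 = pipeline_timing(2000, 4, t_cq=50, t_su=30)
speedup = t1 / t4
print(t4, lat4, round(speedup, 2))
```

Four stages give less than a 4x frequency gain, and total latency grows, because the register overhead is paid once per stage.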

Retiming is a formal technique for moving flip-flops across combinational logic gates (without changing the function) to balance stage delays and maximize clock frequency. Register retiming can be formulated as a linear program.

Chapter 8: Physical Design Considerations

8.1 Layout and Design Rules

CMOS layout follows design rules that specify minimum feature sizes, spacings, and overlaps. Key rules include:

Minimum gate length \(L_{min}\) (historically defines the technology node)
Poly-to-diffusion overlap, contact-to-edge spacing
Metal wire minimum width and spacing per layer
Via enclosure rules

Violations of design rules can cause shorts or opens in fabrication. Design Rule Check (DRC) is performed by EDA tools before tapeout.

8.2 Cell-Based Design Flow

Standard cell design uses a library of pre-characterized logic gates (inverter, NAND, NOR, flip-flop, MUX, etc.) placed in rows. The design flow:

RTL coding (Verilog/VHDL): behavioral description
Logic synthesis: RTL → gate-level netlist
Floorplanning: placement of major blocks, I/O pads
Placement: standard cells placed in rows
Clock tree synthesis (CTS): balanced clock distribution
Routing: metal wires connecting cells
Sign-off: timing, DRC, LVS (layout-vs-schematic), IR drop analysis

8.3 Static Timing Analysis

Static Timing Analysis (STA) verifies timing closure without simulation. STA tools (e.g., Synopsys PrimeTime) compute:

Arrival time (AT): latest time at which data can arrive at each node (setup analysis; hold analysis uses the earliest arrival)
Required time (RT): latest time data must arrive to meet setup constraints
Slack = RT − AT: positive slack means timing met; negative slack is a violation

STA performs worst-case (setup) and best-case (hold) analysis using process corners (slow/fast device models), voltage corners, and temperature corners (PVT variation).
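A minimal sketch of the setup-side computation: propagate the latest arrival time through a timing graph in topological order, then subtract it from the required time. This is not how any particular STA tool is implemented; the graph, delays, and function name are hypothetical, and AT is taken as the latest arrival at each node.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def arrival_times(edges, start_times=None):
    """Latest arrival time at every node of a timing DAG.

    edges maps (src, dst) -> delay; start_times maps source nodes -> launch time.
    """
    start_times = start_times or {}
    preds = {}
    for (src, dst), delay in edges.items():
        preds.setdefault(dst, []).append((src, delay))
        preds.setdefault(src, [])
    graph = {node: [s for s, _ in ps] for node, ps in preds.items()}
    at = {}
    for node in TopologicalSorter(graph).static_order():  # predecessors first
        if preds[node]:
            at[node] = max(at[s] + d for s, d in preds[node])
        else:
            at[node] = start_times.get(node, 0)
    return at


# Hypothetical netlist, delays in picoseconds.
edges = {("a", "n1"): 100, ("b", "n1"): 150, ("n1", "out"): 200, ("b", "out"): 120}
at = arrival_times(edges)
slack = 400 - at["out"]  # required time of 400 ps at the endpoint
print(at["out"], slack)
```

A hold-style analysis would replace `max` with `min` to obtain the earliest arrival at each node.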

8.4 Power Grid Design

The power grid supplies \(V_{DD}\) and GND to all cells through a mesh of metal wires. IR drop (resistive voltage drop in the power grid) reduces the effective supply voltage reaching cells, slowing them down and potentially causing timing failures.

\[ V_{eff} = V_{DD} - I_{peak} \cdot R_{grid} \]

Decoupling capacitors (decaps) placed near switching circuits reduce instantaneous IR drop by providing local charge storage.
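As a quick numerical illustration of the IR-drop formula above, the sketch below evaluates the effective supply for assumed values of peak current and grid resistance. All numbers are hypothetical.

```python
def effective_supply(v_dd, i_peak, r_grid):
    """Supply voltage seen by a cell after resistive drop in the power grid."""
    return v_dd - i_peak * r_grid


v_dd = 0.9                                                   # volts
v_eff = effective_supply(v_dd, i_peak=2.0, r_grid=0.02)      # 2 A through 20 milliohms
drop_pct = 100 * (v_dd - v_eff) / v_dd
print(round(v_eff, 3), round(drop_pct, 2))
```

Even a few percent of drop matters because gate delay rises as the effective supply falls, which is why IR drop is checked at sign-off.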

Chapter 9: IC Testing

9.1 Testing Challenges

As circuit complexity grows (billions of transistors), testing becomes a dominant cost. The goal of testing is to detect manufacturing defects (stuck-at faults, bridging faults, open-circuit faults) that would cause a chip to malfunction.

9.2 Fault Models

Stuck-at-0 (SA0): A node is permanently shorted to GND, regardless of the circuit's intended output.

Stuck-at-1 (SA1): A node is permanently connected to \(V_{DD}\).

The stuck-at model is a logic-level abstraction that captures the effect of many physical defects, including opens and bridges, and is used in fault simulation and ATPG (Automatic Test Pattern Generation).
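Fault simulation can be sketched for a tiny circuit: a test vector detects a fault when the faulty and fault-free outputs differ. The circuit \(y = (a \cdot b) + c\), the internal node name `n`, and the fault labels below are hypothetical, chosen only to illustrate the idea.

```python
from itertools import product


def circuit(a, b, c, fault=None):
    """y = (a AND b) OR c, with an optional stuck-at fault on internal node n."""
    n = a & b
    if fault == "n/SA0":
        n = 0
    elif fault == "n/SA1":
        n = 1
    return n | c


def detecting_vectors(fault):
    """Exhaustively list the input vectors (a, b, c) that expose the fault."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, fault=fault)]


print(detecting_vectors("n/SA0"))  # → [(1, 1, 0)]
print(detecting_vectors("n/SA1"))  # → [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
```

Detecting `n/SA0` requires driving `n` to 1 and holding `c` at 0 so the difference reaches the output. ATPG automates this search for circuits too large to enumerate.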

9.3 Controllability and Observability

A node is controllable if a test pattern can set it to a specific logic value. It is observable if its value can be propagated to a primary output. Hard-to-control or hard-to-observe internal nodes require design-for-testability techniques such as scan design or added test points.

9.4 Scan Design (Design for Testability)

Scan chains replace ordinary flip-flops with scan flip-flops, which add a scan-in input (SI) and a scan-enable input (SE). All scan flip-flops are connected in a shift-register chain. During test mode:

Scan-in: test vectors are shifted serially into all flip-flops
Apply: one capture clock cycle applies the test to the combinational logic
Scan-out: responses are shifted out and compared to expected output

Scan design achieves near-100% stuck-at fault coverage with relatively few test pins, at the cost of additional flip-flop area (≈10–15%) and test time.
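The scan-in, apply, and scan-out steps above can be sketched with a behavioral model of a three-flip-flop chain. The class, the next-state logic, and the test vector are hypothetical; real scan flip-flops select between the functional and SI inputs with a multiplexer controlled by SE.

```python
class ScanChain:
    """Behavioral model of scan flip-flops connected as a shift register."""

    def __init__(self, length):
        self.ffs = [0] * length

    def shift(self, si):
        """SE = 1: shift one bit in from SI and return the bit leaving the chain."""
        so = self.ffs[-1]
        self.ffs = [si] + self.ffs[:-1]
        return so

    def capture(self, logic):
        """SE = 0: one functional clock loads the combinational outputs."""
        self.ffs = logic(self.ffs)


def run_scan_test(chain, vector, logic):
    for bit in reversed(vector):              # scan-in: last flip-flop's bit goes first
        chain.shift(bit)
    chain.capture(logic)                      # apply: single capture cycle
    out = [chain.shift(0) for _ in vector]    # scan-out: last flip-flop emerges first
    return out[::-1]                          # reorder to flip-flop index order


# Hypothetical combinational logic between the flip-flops.
def next_state(q):
    return [q[1] ^ q[2], q[0] & q[1], q[0] | q[2]]


response = run_scan_test(ScanChain(3), [0, 1, 1], next_state)
print(response)  # → [0, 0, 1]
```

The cost in test time is visible here: each vector needs as many shift cycles as there are flip-flops in the chain, both for loading and for unloading.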

9.5 Built-In Self-Test (BIST)

BIST places test logic on-chip: a Linear Feedback Shift Register (LFSR) generates pseudo-random test patterns, and a Multiple-Input Signature Register (MISR) compresses the outputs into a compact signature. The signature is compared to a golden reference.

BIST is used for memory (MBIST) and logic (LBIST), enabling at-speed testing and field diagnostics.
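A minimal sketch of the LFSR and MISR pairing, assuming a 4-bit register with feedback polynomial \(x^4 + x^3 + 1\). The circuit under test (a Gray-code converter) and the injected fault are hypothetical stand-ins.

```python
def lfsr_step(state):
    """One step of a 4-bit Fibonacci LFSR for x^4 + x^3 + 1 (feedback = bit3 XOR bit2)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def lfsr_patterns(seed=0b0001, count=15):
    """Pseudo-random test patterns; the seed must be non-zero."""
    patterns, state = [], seed
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns


def misr_signature(responses, seed=0):
    """Compress a stream of 4-bit responses into one 4-bit signature."""
    sig = seed
    for word in responses:
        sig = lfsr_step(sig) ^ (word & 0xF)
    return sig


def gray(x):
    return x ^ (x >> 1)           # hypothetical circuit under test


def gray_faulty(x):
    return gray(x) & 0b1110       # same circuit with output bit 0 stuck-at-0


patterns = lfsr_patterns()
golden = misr_signature(gray(p) for p in patterns)
observed = misr_signature(gray_faulty(p) for p in patterns)
print("PASS" if observed == golden else "FAIL")
```

Because the signature is only 4 bits wide, some faulty response streams can alias to the golden signature. Practical MISRs use many more bits to make this unlikely.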

Chapter 10: Summary — Design Principles and Tradeoffs

10.1 Performance–Power–Area Triangle

Every digital design decision involves navigating the performance–power–area (PPA) triangle:

Performance (speed): driven by transistor sizing, circuit topology, pipeline depth, and voltage
Power: dominated by \(P = \alpha C V_{DD}^2 f\); reduced by voltage scaling, clock gating, power gating
Area: set by transistor widths, wire routing, and design style (static vs. dynamic)

Optimizing one dimension typically degrades another. Multi-\(V_{DD}\) designs use high \(V_{DD}\) only on critical paths and low \(V_{DD}\) elsewhere, improving the power-performance tradeoff at some cost in area and design complexity (level shifters, extra supply rails).
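The quadratic dependence of dynamic power on supply voltage is what makes voltage scaling attractive. A short sketch with hypothetical values for activity factor, switched capacitance, and frequency:

```python
def dynamic_power(alpha, c_farads, v_dd, f_hz):
    """Dynamic switching power P = alpha * C * V_DD^2 * f, in watts."""
    return alpha * c_farads * v_dd ** 2 * f_hz


# Hypothetical block: activity 0.1, 1 nF switched capacitance, 1 GHz clock.
p_high = dynamic_power(0.1, 1e-9, 1.0, 1e9)
p_low = dynamic_power(0.1, 1e-9, 0.8, 1e9)
saving_pct = 100 * (1 - p_low / p_high)
print(round(p_high, 4), round(p_low, 4), round(saving_pct, 1))
```

A 20% reduction in \(V_{DD}\) cuts dynamic power by about 36% at the same frequency. The sketch does not model the accompanying increase in gate delay, which is why the lower supply is kept off critical paths.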

10.2 Critical Path Analysis

The maximum operating frequency of any synchronous circuit is determined by its critical path — the longest combinational delay between any two sequentially adjacent storage elements. Critical path optimization techniques include:

Transistor sizing (logical effort framework)
Logic restructuring (reduce gate levels)
Technology mapping to fast library cells
Retiming and pipelining
Voltage overdrive on critical paths (adaptive voltage scaling)

10.3 Scaling and the End of Dennard Scaling

With the end of classical voltage scaling, power density is no longer constant across technology nodes. Modern approaches to continue performance scaling include:

3D integration: stacking dies with through-silicon vias (TSVs)
FinFET and GAA transistors: improved electrostatic control reducing leakage
Near-threshold computing (NTC): operating at \(V_{DD} \approx V_T\) for high energy efficiency at reduced speed
Dark silicon management: only a fraction of the chip can be active at full speed simultaneously
Domain-specific architectures: GPUs, NPUs, custom accelerators replacing general-purpose cores

ECE 445 bridges device physics (transistor I-V characteristics, subthreshold behavior) and digital systems (flip-flops, adders, pipelines), connecting SPICE-level circuit simulation to architectural performance analysis. Mastering both levels — and the translation between them through RC delay models, logical effort, and timing analysis — is the core skill of a VLSI digital designer.