ECE 445: Integrated Digital Electronics
Lan Wei
Estimated study time: 58 minutes
Sources and References
Primary textbooks — N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed., Pearson/Addison-Wesley, 2010; A. S. Sedra and K. C. Smith, Microelectronic Circuits, 8th ed., Oxford University Press, 2020. Supplementary texts — J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., Prentice Hall, 2003. Online resources — MIT OpenCourseWare 6.004 Computation Structures; Stanford EE271 VLSI Digital Circuits lecture notes (M. Horowitz).
Chapter 1: MOS Transistor Physics and Models
1.1 MOSFET Structure and Operation
The Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET) is the fundamental building block of modern digital integrated circuits. An NMOS transistor consists of a p-type silicon substrate with two heavily doped n+ regions (source and drain) separated by a channel region. A thin gate oxide (SiO2 or high-k dielectric) and a polysilicon (or metal) gate electrode sit above the channel.
The threshold voltage is expressed as:
\[ V_T = V_{T0} + \gamma \left( \sqrt{|2\phi_F + V_{SB}|} - \sqrt{|2\phi_F|} \right) \]where \(V_{T0}\) is the threshold at zero body bias, \(\gamma = \sqrt{2 q \varepsilon_{si} N_A} / C_{ox}\) is the body-effect coefficient, \(\phi_F = (kT/q)\ln(N_A / n_i)\) is the Fermi potential, and \(V_{SB}\) is the source-to-body voltage.
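As a quick numerical illustration of the body effect, the sketch below evaluates the threshold expression above for a few values of \(V_{SB}\). The parameter values (\(V_{T0} = 0.4\) V, \(\gamma = 0.4\ \text{V}^{1/2}\), \(\phi_F = 0.3\) V) are assumptions chosen only for illustration, not values from any particular process.

```python
import math

def threshold_voltage(v_t0, gamma, phi_f, v_sb):
    """V_T = V_T0 + gamma * (sqrt(|2*phi_F + V_SB|) - sqrt(|2*phi_F|))."""
    return v_t0 + gamma * (
        math.sqrt(abs(2 * phi_f + v_sb)) - math.sqrt(abs(2 * phi_f))
    )

# Assumed, illustrative parameters (not from a real process).
V_T0, GAMMA, PHI_F = 0.4, 0.4, 0.3

for v_sb in (0.0, 0.5, 1.0):
    v_t = threshold_voltage(V_T0, GAMMA, PHI_F, v_sb)
    print(f"V_SB = {v_sb:.1f} V -> V_T = {v_t:.3f} V")
```

With zero body bias the function returns \(V_{T0}\); a positive \(V_{SB}\) raises the threshold.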
1.2 Long-Channel I-V Characteristics
1.2.1 Triode (Linear) Region
When \(V_{GS} > V_T\) and \(V_{DS} < V_{GS} - V_T\), the MOSFET operates in the triode region:
\[ I_D = \mu_n C_{ox} \frac{W}{L} \left[ (V_{GS} - V_T) V_{DS} - \frac{V_{DS}^2}{2} \right] \]For small \(V_{DS}\) (deep triode), this simplifies to the linear approximation:
\[ I_D \approx \mu_n C_{ox} \frac{W}{L} (V_{GS} - V_T) V_{DS} \]which defines an effective channel resistance:
\[ R_{on} = \frac{1}{\mu_n C_{ox} (W/L)(V_{GS} - V_T)} \]
1.2.2 Saturation Region
When \(V_{DS} \geq V_{GS} - V_T\), the channel pinches off at the drain and \(I_D\) saturates:
\[ I_D = \frac{\mu_n C_{ox}}{2} \frac{W}{L} (V_{GS} - V_T)^2 (1 + \lambda V_{DS}) \]The factor \(\lambda\) (channel-length modulation parameter) accounts for the effective shortening of the channel as \(V_{DS}\) increases. In ideal long-channel analysis, \(\lambda = 0\) and \(I_D\) is independent of \(V_{DS}\) in saturation.
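The triode and saturation equations can be combined into one piecewise function. The sketch below is a minimal long-channel model only: the function name and the parameter `k_n` (standing for \(\mu_n C_{ox}\)) are illustrative, and subthreshold current is ignored in cutoff.

```python
def nmos_drain_current(v_gs, v_ds, v_t, k_n, w_over_l, lam=0.0):
    """Long-channel NMOS drain current in amperes.

    k_n is mu_n * C_ox in A/V^2; lam is the channel-length modulation
    parameter in 1/V (0 gives the ideal square-law model).
    """
    v_ov = v_gs - v_t  # overdrive voltage
    if v_ov <= 0:
        return 0.0  # cutoff (subthreshold conduction ignored here)
    if v_ds < v_ov:
        # Triode region
        return k_n * w_over_l * (v_ov * v_ds - v_ds ** 2 / 2)
    # Saturation region
    return 0.5 * k_n * w_over_l * v_ov ** 2 * (1 + lam * v_ds)

# Assumed example: k_n = 200 uA/V^2, W/L = 2, V_T = 0.4 V, V_GS = 1.0 V
for v_ds in (0.1, 0.3, 0.6, 1.0):
    i_d = nmos_drain_current(1.0, v_ds, 0.4, 200e-6, 2.0)
    print(f"V_DS = {v_ds:.1f} V -> I_D = {i_d * 1e6:.1f} uA")
```

With `lam = 0` the two branches meet continuously at \(V_{DS} = V_{GS} - V_T\). With a nonzero `lam` the equations as written have a small step at that boundary.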
1.2.3 PMOS Transistor
A PMOS transistor has the dual structure: an n-type substrate, p+ source and drain, and a negative threshold voltage \(V_{TP} < 0\). Conduction occurs when \(V_{GS} < V_{TP}\). By convention the drain current flows from source to drain (in the direction of hole flow), and the I-V equations take the same form with \(\mu_p\), \(|V_{GS}|,\) and \(|V_T|\).
1.3 Short-Channel Effects
As channel lengths decrease below roughly 100 nm, several phenomena invalidate the long-channel square-law model.
1.3.1 Velocity Saturation
Carrier drift velocity does not increase linearly with electric field indefinitely. At high lateral electric field \(\mathcal{E}\), the velocity saturates at \(v_{sat} \approx 10^7\) cm/s for electrons in silicon. A simple two-piece linear model gives:
\[ v = \begin{cases} \mu_n \mathcal{E} & \mathcal{E} \leq \mathcal{E}_{sat} \\ v_{sat} & \mathcal{E} > \mathcal{E}_{sat} \end{cases} \]where \(\mathcal{E}_{sat} = v_{sat} / \mu_n\). Using a smooth model:
\[ v = \frac{\mu_n \mathcal{E}}{1 + \mathcal{E}/\mathcal{E}_{sat}} \]The drain current in saturation under velocity saturation becomes:
\[ I_{D,sat} = v_{sat} C_{ox} W (V_{GS} - V_T - V_{DSAT}) \]where \(V_{DSAT} = \mathcal{E}_{sat} L\). For short devices where \(V_{DSAT} \ll V_{GS} - V_T\), \(I_{D,sat} \approx v_{sat} C_{ox} W (V_{GS} - V_T)\), which is linear in overdrive rather than quadratic.
1.3.2 Drain-Induced Barrier Lowering (DIBL)
At short channel lengths, the drain electric field penetrates toward the source and lowers the potential barrier, reducing \(V_T\) as \(V_{DS}\) increases:
\[ \Delta V_T = -\eta \cdot V_{DS} \]where \(\eta\) (DIBL coefficient) is typically 20–100 mV/V. DIBL manifests as an increase in \(I_{off}\) and a degradation of the on/off ratio.
1.3.3 Subthreshold Conduction
Below \(V_T\), the transistor is not fully off. The subthreshold current follows an exponential dependence:
\[ I_D = I_0 \exp\!\left(\frac{V_{GS} - V_T}{n V_T^{th}}\right) \left(1 - e^{-V_{DS}/V_T^{th}}\right) \]where \(V_T^{th} = kT/q \approx 26\) mV at room temperature and \(n \approx 1.3\text{–}1.5\) is the ideality factor. The subthreshold slope \(S\) is:
\[ S = n \cdot \frac{kT}{q} \cdot \ln 10 \approx 60n \text{ mV/dec at 300 K} \]The theoretical minimum is 60 mV/decade for \(n = 1\). Subthreshold leakage is a critical concern in low-power design.
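To make the slope formula concrete, the sketch below computes \(S\) from physical constants and uses it to estimate how much the subthreshold current falls when \(V_T\) is raised. The helper names are illustrative, and the estimate assumes a constant ideality factor.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
Q = 1.602176634e-19  # elementary charge, C

def subthreshold_slope_mv_per_dec(n, temp_k=300.0):
    """S = n * (kT/q) * ln(10), in mV per decade of current."""
    return n * (K_B * temp_k / Q) * math.log(10) * 1e3

def leakage_reduction(delta_vt_mv, n, temp_k=300.0):
    """Factor by which subthreshold current drops if V_T rises by delta_vt_mv."""
    return 10 ** (delta_vt_mv / subthreshold_slope_mv_per_dec(n, temp_k))

for n in (1.0, 1.3, 1.5):
    print(f"n = {n}: S = {subthreshold_slope_mv_per_dec(n):.1f} mV/dec")
print(f"Raising V_T by 100 mV with n = 1.4 cuts leakage "
      f"by about {leakage_reduction(100, 1.4):.0f}x")
```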
1.3.4 Hot Carriers and Oxide Reliability
High lateral electric fields near the drain can cause impact ionization, generating hot carriers that inject into the gate oxide, causing threshold voltage drift and long-term device degradation. Modern transistors use lightly doped drain (LDD) structures to reduce peak electric field.
1.4 SPICE Models
SPICE level 1 models use the simple long-channel equations. BSIM4 (Berkeley Short-Channel IGFET Model) is the industry standard for nanometer-scale planar transistors and captures the major short-channel effects, including velocity saturation, DIBL, subthreshold slope, gate leakage, and noise. The SPICE model file provides parameters such as \(V_{TH0}\), \(k_1\) (body effect), \(\mu_{eff}\), \(C_{ox}\), \(t_{ox}\), and \(v_{sat}\).
1.5 Scaling Trends
Dennard scaling (1974) proposed that shrinking transistor dimensions by a factor \(\kappa\) while reducing voltages by \(\kappa\) keeps power density constant. The associated scaling rules and trend are:
- Gate length scales as \(L \rightarrow L/\kappa\)
- Oxide thickness: \(t_{ox} \rightarrow t_{ox}/\kappa\)
- Supply voltage: \(V_{DD} \rightarrow V_{DD}/\kappa\)
- Transistor count per chip roughly doubles every 18–24 months (Moore’s Law)
Classical Dennard scaling has ended because voltage cannot keep pace with dimension shrinkage (limited by subthreshold slope and reliability). This leads to increasing power density (“dark silicon”) and motivates specialised architectures.
Chapter 2: CMOS Inverter
2.1 Inverter Circuit
The CMOS inverter consists of a complementary pair: one PMOS (pull-up network) with its source tied to \(V_{DD}\) and one NMOS (pull-down network) with its source tied to GND. The gates are tied together as the input \(A\), and the drains are tied together as the output \(Y\).
2.2 DC Transfer Characteristic
The voltage transfer characteristic (VTC) is obtained by sweeping \(V_{in}\) from 0 to \(V_{DD}\) and plotting \(V_{out}\). Five regions exist:
| Region | Condition | NMOS state | PMOS state |
|---|---|---|---|
| A | \(V_{in} < V_{Tn}\) | Off | Linear |
| B | \(V_{Tn} < V_{in} < V_{DD}/2\) | Saturation | Linear |
| C | \(V_{in} \approx V_{DD}/2\) | Saturation | Saturation |
| D | \(V_{DD}/2 < V_{in} < V_{DD}+V_{Tp}\) | Linear | Saturation |
| E | \(V_{in} > V_{DD}+V_{Tp}\) | Linear | Off |
The switching threshold \(V_M\) occurs when \(V_{in} = V_{out}\) and both transistors are in saturation. Setting \(I_{Dn} = |I_{Dp}|\):
\[ V_M = \frac{V_{Tn} + r\left(V_{DD} + V_{Tp}\right)}{1 + r}, \quad r = \sqrt{\frac{k_p (W/L)_p}{k_n (W/L)_n}} \]For a symmetric inverter with \(V_M = V_{DD}/2\) and matched thresholds (\(V_{Tn} = |V_{Tp}|\)), we need \(r = 1\), which requires:
\[ \frac{(W/L)_p}{(W/L)_n} = \frac{\mu_n}{\mu_p} \approx 2\text{–}3 \]since electron mobility \(\mu_n \approx 2.5\,\mu_p\) in bulk silicon.
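The effect of the PMOS/NMOS sizing ratio on \(V_M\) can be checked numerically. The sketch below evaluates the long-channel switching-threshold formula above. The supply, thresholds, and the 2.5:1 mobility ratio are assumed example values, and `k_n`, `k_p` stand for \(\mu C_{ox}\) of each device.

```python
import math

def switching_threshold(v_dd, v_tn, v_tp, k_n, k_p, wl_n, wl_p):
    """V_M = (V_Tn + r*(V_DD + V_Tp)) / (1 + r); v_tp is negative for PMOS."""
    r = math.sqrt((k_p * wl_p) / (k_n * wl_n))
    return (v_tn + r * (v_dd + v_tp)) / (1 + r)

# Assumed values: V_DD = 1.0 V, V_Tn = 0.3 V, V_Tp = -0.3 V, k_n = 2.5 * k_p
for wl_p in (1.0, 2.5, 4.0):
    v_m = switching_threshold(1.0, 0.3, -0.3, k_n=2.5, k_p=1.0,
                              wl_n=1.0, wl_p=wl_p)
    print(f"(W/L)_p / (W/L)_n = {wl_p:.1f} -> V_M = {v_m:.3f} V")
```

Widening the PMOS raises \(V_M\). With these assumed values, the ratio 2.5 gives \(r = 1\) and places \(V_M\) at mid-supply.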
2.3 Noise Margins
Noise margins quantify the inverter’s ability to tolerate noise on its input.
Noise Margin High (NM\(_H\)): \(NM_H = V_{OH} - V_{IH}\)
Noise Margin Low (NM\(_L\)): \(NM_L = V_{IL} - V_{OL}\)
where \(V_{OH}\) and \(V_{OL}\) are the output high and low levels, and \(V_{IH}\), \(V_{IL}\) are the input high and low thresholds defined as the points on the VTC where the gain is \(-1\) (slope = \(-1\)).
For a CMOS inverter driving rail-to-rail, \(V_{OH} = V_{DD}\) and \(V_{OL} = 0\). The noise margins are maximized for a symmetric VTC.
2.4 Propagation Delay
Propagation delay measures how quickly the output responds to an input transition. It is defined as the time from the 50% point of the input transition to the 50% point of the output transition.
\[ t_{pd} = \frac{t_{pHL} + t_{pLH}}{2} \]where:
- \(t_{pHL}\): output falling (high-to-low), NMOS pulls down the load capacitance \(C_L\)
- \(t_{pLH}\): output rising (low-to-high), PMOS charges \(C_L\)
2.4.1 RC Delay Model
The transistor in its on-state is approximated as a resistor \(R\) charging or discharging the load capacitance \(C_L\). For an RC circuit with a step input:
\[ t_{50\%} = \ln(2) \cdot RC \approx 0.69 RC \]The effective resistance of a minimum-size NMOS is \(R_n\) and of PMOS is \(R_p \approx 2R_n\). Propagation delay becomes:
\[ t_{pHL} = 0.69 R_n C_L, \qquad t_{pLH} = 0.69 R_p C_L \]
2.4.2 Elmore Delay Model
The Elmore delay model extends the RC analysis to complex RC trees (series and parallel combinations of transistors). For a path from the output to supply/ground through a series chain of resistances \(R_1, R_2, \ldots, R_k\) with capacitances \(C_i\) at each node:
\[ t_{pd} = \ln(2) \sum_i R_{i,\text{shared}} C_i \]where \(R_{i,\text{shared}}\) is the total resistance on the path from the source to node \(i\). For a series stack of \(n\) transistors with resistance \(R\) each driving load \(C\):
\[ t_{pd} = 0.69 \cdot n R \cdot C \]This approximation is accurate to within 10–15% for typical digital gate topologies.
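The Elmore sum is easy to evaluate for an RC ladder by accumulating the shared resistance node by node. The sketch below is a minimal version for a simple chain (no branches). The function name and the component values are assumptions for illustration.

```python
import math

def elmore_delay(resistances, capacitances):
    """Elmore-based 50% delay of an RC ladder.

    resistances[i] is the resistor between node i-1 and node i (node -1 is
    the driving rail); capacitances[i] is the capacitance at node i.
    """
    delay = 0.0
    r_shared = 0.0
    for r, c in zip(resistances, capacitances):
        r_shared += r            # total resistance from the rail to this node
        delay += r_shared * c
    return math.log(2) * delay

# Assumed example: 3-transistor stack, R = 1 kOhm each, 10 fF load at the output.
ideal = elmore_delay([1e3] * 3, [0.0, 0.0, 10e-15])
# Same stack with 2 fF of parasitic capacitance on each internal node.
with_parasitics = elmore_delay([1e3] * 3, [2e-15, 2e-15, 10e-15])
print(f"no internal caps:   {ideal * 1e12:.1f} ps")
print(f"with internal caps: {with_parasitics * 1e12:.1f} ps")
```

When the internal node capacitances are zero, the result reduces to the \(0.69 \cdot nRC\) expression above.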
2.5 Rise and Fall Times
The 10%–90% rise time \(t_r\) and fall time \(t_f\) of a CMOS output are:
\[ t_r = 2.2 R_p C_L, \qquad t_f = 2.2 R_n C_L \]derived from \(V(t) = V_{DD}(1 - e^{-t/RC})\), solving for 90% and 10% points: \(t_{10\%\rightarrow 90\%} = RC(\ln 9) \approx 2.2 RC\).
2.6 Inverter Sizing
The delay of an inverter chain driving a large capacitive load \(C_L\) is minimized by tapering the sizes. For a chain of \(N\) inverters with no branching, the path effort equals the electrical effort \(H = C_L / C_{in}\). Neglecting parasitic delay, the optimal stage effort is \(\hat{f} = e \approx 2.718\); once parasitic delay is included the optimum shifts upward, and a stage effort of about 4 is used in practice. The optimal number of stages for a given effort \(H\) is:
\[ N_{opt} = \log_{\hat{f}} H \]and the minimum delay is:
\[ D_{min} = N_{opt} \cdot \hat{f} \cdot \tau \]where \(\tau = 3RC\) is the intrinsic delay of the process. Each successive stage is scaled by a ratio of \(\hat{f}\) relative to the previous.
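The sizing procedure can be sketched in a few lines: pick the stage count from the target stage effort, recompute the actual stage effort, and scale each stage. The function name and the target effort of 4 are assumptions. Parasitic delay is neglected, as in the \(D_{min}\) expression above.

```python
import math

def size_inverter_chain(c_in, c_load, target_effort=4.0):
    """Return (n_stages, stage_effort, input capacitance of each stage)."""
    h = c_load / c_in                                  # electrical effort H
    n = max(1, round(math.log(h) / math.log(target_effort)))
    f = h ** (1.0 / n)                                 # actual effort per stage
    sizes = [c_in * f ** i for i in range(n)]          # each stage f times larger
    return n, f, sizes

# Assumed example: drive a load 64x the input capacitance of the first inverter.
n, f, sizes = size_inverter_chain(c_in=1.0, c_load=64.0)
print(f"stages = {n}, stage effort = {f:.2f}")
print("relative stage sizes:", [round(s, 2) for s in sizes])
print(f"delay = {n * f:.1f} tau (parasitics neglected)")
```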
Chapter 3: CMOS Combinational Logic
3.1 Static CMOS Gate Structure
A static CMOS gate is composed of a pull-up network (PUN) of PMOS transistors and a pull-down network (PDN) of NMOS transistors. The PUN and PDN are dual networks:
- Series NMOS in PDN ↔ Parallel PMOS in PUN
- Parallel NMOS in PDN ↔ Series PMOS in PUN
NOR2 Gate: The pull-down network has two NMOS in parallel. The pull-up network has two PMOS in series. Output \(Y = \overline{A + B}\).
3.2 Logical Effort
Logical effort is a method for comparing the intrinsic speed of logic gates. The logical effort \(g\) of a gate is the ratio of its input capacitance to that of an inverter sized to deliver the same output current.
For an inverter: \(g = 1\). For a 2-input NAND: \(g = 4/3\). For a 2-input NOR: \(g = 5/3\).
Stage Effort \(f\): \(f = g \cdot h\), where \(h = C_{out}/C_{in}\) is the electrical effort (fanout) of the stage.
Path Effort \(F\): \(F = G \cdot B \cdot H\), where \(G = \prod g_i\), \(B\) is branching effort, and \(H\) is the ratio of path output capacitance to input capacitance.
The delay of a single gate normalized to the inverter delay \(\tau\) is:
\[ d = g \cdot h + p \]where \(p\) is the intrinsic (parasitic) delay of the gate (due to internal capacitances). For an inverter \(p = 1\); for NAND2, \(p \approx 2\); for NOR2, \(p \approx 2\).
3.3 Delay Characterization and Optimization
3.3.1 Series Stack Resistance
In a NAND gate with \(n\) inputs, the NMOS transistors are stacked in series. Each transistor must be sized up by a factor of \(n\) relative to a minimum inverter to maintain equivalent drive strength:
\[ (W/L)_{PDN} = n \cdot (W/L)_{n,min} \]This increases input capacitance and must be traded off against propagation delay. The PMOS transistors in the PUN remain sized at \(2\times\) the minimum NMOS width.
3.3.2 Transistor Ordering (Input Ordering)
In a gate with inputs of different arrival times, place the latest-arriving signal closest to the output node (closest to the drain in the series stack). This minimizes the effective capacitance that the late signal must charge/discharge.
3.3.3 Path Delay Optimization
For a multi-stage combinational logic path, the total delay is minimized when all stage efforts are equal. Given path effort \(F\) over \(N\) stages:
\[ f_1 = f_2 = \cdots = f_N = F^{1/N} \]The transistor widths are found by working backward from the output load through the chain.
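The backward sizing pass can be sketched as follows. Each stage's input capacitance follows from \(f = g \cdot h\), with the stage's total load taken as its branching effort times the on-path capacitance. The function name and the example path (NAND2, NOR2, inverter with no branching) are assumptions for illustration.

```python
def size_path(g, b, c_in, c_load):
    """Equal-stage-effort sizing of a logic path.

    g[i], b[i]: logical and branching effort of stage i.
    Returns (stage_effort, input capacitance of each stage).
    """
    n = len(g)
    path_effort = c_load / c_in            # H
    for g_i, b_i in zip(g, b):
        path_effort *= g_i * b_i           # F = G * B * H
    f = path_effort ** (1.0 / n)           # equal effort per stage
    caps = [0.0] * n
    c_out = c_load
    for i in reversed(range(n)):           # work backward from the load
        caps[i] = g[i] * b[i] * c_out / f
        c_out = caps[i]
    return f, caps

# Assumed path: NAND2 -> NOR2 -> inverter, load = 20x the path input capacitance.
f, caps = size_path(g=[4 / 3, 5 / 3, 1.0], b=[1.0, 1.0, 1.0],
                    c_in=1.0, c_load=20.0)
print(f"stage effort = {f:.2f}")
print("input capacitances:", [round(c, 2) for c in caps])
```

A useful sanity check is that the computed input capacitance of the first stage comes back equal to the specified `c_in`.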
3.4 Power Consumption in CMOS
Total power consumption in a CMOS circuit has three components:
3.4.1 Dynamic (Switching) Power
Every time a node switches from low to high, energy \(C V_{DD}^2\) is drawn from the supply (half stored in the capacitor, half dissipated in the PMOS). When the node switches high to low, the energy stored in the capacitor is dissipated in the NMOS. The average dynamic power is:
\[ P_{dynamic} = \alpha C_L V_{DD}^2 f \]where:
- \(\alpha \in \left[0, 1\right]\) is the activity factor (probability of a power-consuming transition per clock cycle)
- \(C_L\) is the switched load capacitance
- \(V_{DD}\) is the supply voltage
- \(f\) is the clock frequency
The quadratic dependence on \(V_{DD}\) makes voltage scaling the most powerful knob for power reduction.
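A short calculation illustrates the quadratic dependence. The numbers below (activity factor, total switched capacitance, supply, frequency) are assumed round values, not data from a specific chip.

```python
def dynamic_power(alpha, c_load, v_dd, freq):
    """P_dynamic = alpha * C_L * V_DD^2 * f, in watts."""
    return alpha * c_load * v_dd ** 2 * freq

# Assumed example: alpha = 0.1, 1 nF total switched capacitance, 1 GHz clock.
p_nominal = dynamic_power(0.1, 1e-9, 1.0, 1e9)
p_scaled = dynamic_power(0.1, 1e-9, 0.8, 1e9)
print(f"V_DD = 1.0 V: {p_nominal * 1e3:.0f} mW")
print(f"V_DD = 0.8 V: {p_scaled * 1e3:.0f} mW "
      f"({p_scaled / p_nominal:.2f}x of nominal)")
```

At a fixed frequency, a 20% supply reduction cuts dynamic power to 64% of its nominal value.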
3.4.2 Short-Circuit Power
During a transition, both PMOS and NMOS may be simultaneously conducting for a brief interval (when \(V_{Tn} < V_{in} < V_{DD} - |V_{Tp}|\), both devices are on). This creates a short-circuit current path from \(V_{DD}\) to GND:
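\[ P_{sc} = \frac{\beta}{12}(V_{DD} - 2V_T)^3 t_{sc} f \]where \(\beta\) is the transistor gain factor, \(V_T = V_{Tn} = |V_{Tp}|\) is assumed, and \(t_{sc}\) is the input transition time. Short-circuit power is typically 5–10% of total power when input and output transition times are comparable. It is minimized by keeping input edges sharp relative to the output transition.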
3.4.3 Leakage Power
Static (leakage) power arises even when the circuit is not switching. The dominant leakage mechanisms are:
- Subthreshold leakage: exponential in \(V_T\); it decreases with higher \(V_T\) and grows when \(V_T\) is lowered along with \(V_{DD}\) to preserve speed
- Gate leakage: quantum mechanical tunneling through ultra-thin gate dielectrics (\(t_{ox} < 2\) nm)
- Junction leakage: reverse-biased p-n junction diode leakage (relatively small)
Multi-threshold CMOS (MTCMOS) uses high-\(V_T\) transistors on non-critical paths and sleep transistors to cut leakage during standby.
3.5 Pass-Transistor Logic
Pass-transistor logic (PTL) passes logic values through transistor switches (rather than connecting outputs directly to \(V_{DD}\) or GND). An NMOS pass transistor connecting input \(A\) to output has \(R_{on} = 1/\left(\mu_n C_{ox}(W/L)(V_{GS}-V_T)\right)\), but suffers a threshold drop: the output only reaches \(V_{DD} - V_{Tn}\) when transmitting a high, causing reduced noise margin and increased delay for downstream gates.
3.6 Dynamic CMOS Logic
Dynamic CMOS logic precharges the output during the precharge phase (\(\phi = 0\)) and evaluates the logic function during the evaluate phase (\(\phi = 1\), for a gate with an NMOS pull-down network).
3.6.1 Dynamic Gate Operation
A dynamic gate consists of:
- A precharge PMOS transistor (gate = \(\phi\)) connecting output to \(V_{DD}\)
- A PDN of NMOS transistors implementing the logic function
- A footer NMOS transistor (gate = \(\phi\)) connecting PDN to GND
During precharge (\(\phi = 0\)): output precharges to \(V_{DD}\), PDN is disconnected (footer off). During evaluate (\(\phi = 1\)): precharge PMOS is off, PDN evaluates. If the inputs make the PDN conducting, output discharges to 0; otherwise it remains high.
3.6.2 Domino Logic
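Domino logic resolves the cascading problem: if dynamic gates are cascaded directly, their precharged-high outputs can falsely discharge the next stage at the start of evaluation. Following each dynamic gate with a static inverter fixes this. The inverter output is low after precharge and can only make a low-to-high transition during evaluation (when the dynamic gate output goes low), and this rising edge can trigger the next dynamic stage: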
Dynamic gate → static inverter → dynamic gate → static inverter → …
Domino logic achieves high speed (each stage has only one device type in the critical pull-down path) but requires additional inverter area.
Chapter 4: CMOS Sequential Logic
4.1 Latches and Flip-Flops
4.1.1 SR Latch
The SR latch (Set-Reset) stores a single bit using cross-coupled NAND (or NOR) gates. It has two stable states. The forbidden input combination (both active-low inputs at 0 for the NAND latch, \(S = R = 1\) for the NOR latch) must be avoided.
4.1.2 D Latch (Level-Sensitive)
A D latch is transparent when the clock \(\phi = 1\): output Q follows input D. When \(\phi = 0\), Q holds its last value. Implemented in CMOS with a transmission gate (TG) and an inverter feedback pair:
- When \(\phi = 1\): TG passes D to Q
- When \(\phi = 0\): TG opens, feedback inverter holds state
The D latch is a level-sensitive storage element and is sensitive to glitches during the transparent phase.
4.1.3 D Flip-Flop (Edge-Triggered)
A master-slave D flip-flop captures input D on the rising edge of the clock. It consists of two cascaded D latches:
- Master latch: transparent when \(\phi = 0\) (samples D)
- Slave latch: transparent when \(\phi = 1\) (propagates to Q)
The output Q changes only at the clock edge, suppressing glitches. A static CMOS D flip-flop is implemented using two TG-based latches with cross-coupled feedback inverters.
4.2 Timing Constraints
Setup Time \(t_{su}\): The minimum time that D must be stable before the clock edge.
Hold Time \(t_h\): The minimum time that D must remain stable after the clock edge.
Clock-to-Q Delay \(t_{cq}\): Time from the active clock edge until the output Q reaches its valid logic level.
These three parameters determine the timing budget of a synchronous pipeline:
\[ t_{clock} \geq t_{cq} + t_{logic} + t_{su} \]where \(t_{logic}\) is the worst-case combinational delay between flip-flops and \(t_{clock}\) is the clock period. The maximum operating frequency is:
\[ f_{max} = \frac{1}{t_{cq} + t_{logic,max} + t_{su}} \]
4.3 Timing Issues: Clock Skew
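Clock skew \(\delta\) is the spatial variation in clock arrival time across different flip-flops on the chip. If the clock reaches the receiving flip-flop later than the launching flip-flop by \(\delta_{pos}\) (positive skew), the data has extra time to arrive and the setup constraint relaxes:
\[ t_{clock} \geq t_{cq} + t_{logic} + t_{su} - \delta_{pos} \]The same late-arriving receiver clock tightens the hold constraint, because new data launched on the earlier clock can race through a short path before the receiver has captured the old value:
\[ t_{cq} + t_{logic,min} \geq t_h + \delta_{pos} \]Negative skew (the clock reaches the receiver earlier than the launcher) has the opposite effect: it tightens the setup constraint and relaxes the hold constraint.
4.3.1 Clock Distribution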
A balanced H-tree or buffered clock tree distributes the clock with minimal skew. Each branch of the H-tree has equal wire length to each leaf flip-flop, ensuring equal propagation delay. In practice, process variation causes residual skew even with balanced trees.
4.4 Sequencing Methods
4.4.1 Two-Phase Non-Overlapping Clocks
Two non-overlapping phases \(\phi_1\) and \(\phi_2\) prevent races. Latches transparent to \(\phi_1\) capture data, then latches transparent to \(\phi_2\) propagate it. The non-overlap time provides hold time margin.
4.4.2 Pulsed Latches
A pulsed latch uses a very narrow clock pulse (generated from CLK by a pulse generator circuit) to capture data. The latch is transparent only for the duration of the pulse (a small fraction of the cycle). This allows a limited amount of time borrowing across stage boundaries: data from a slow stage may arrive during the pulse and still be captured, with the borrowed time taken from a faster neighboring stage.
4.4.3 Schmitt Trigger
A Schmitt trigger is a comparator with hysteresis: it has two different switching thresholds \(V_{T+}\) (rising input) and \(V_{T-}\) (falling input), with \(V_{T+} > V_{T-}\). This provides noise immunity on slow or noisy input signals.
In CMOS, a Schmitt trigger is implemented by adding weak feedback transistors to an inverter. When the output is high, a weak PMOS assists the pull-up, raising the switching threshold for a rising input (\(V_{T+}\)); when the output is low, a weak NMOS assists the pull-down, lowering the switching threshold for a falling input (\(V_{T-}\)).
\[ \text{Hysteresis} = V_{T+} - V_{T-} \]
Chapter 5: CMOS Arithmetic Circuits
5.1 Binary Addition: Full Adder
A full adder computes sum \(S\) and carry-out \(C_{out}\) from inputs \(A\), \(B\), and carry-in \(C_{in}\):
\[ S = A \oplus B \oplus C_{in} \]\[ C_{out} = AB + BC_{in} + AC_{in} = AB + C_{in}(A \oplus B) \]The generate signal \(G = AB\) and propagate signal \(P = A \oplus B\) (or \(P = A + B\) in some formulations) are the basis for fast carry generation.
5.2 Carry-Ripple Adder
In a carry-ripple adder (CRA), the carry-out of each stage feeds the carry-in of the next. For an \(n\)-bit CRA:
\[ t_{CRA} = t_{FA,setup} + (n-1) t_{carry} + t_{sum} \]For a typical implementation where \(t_{carry}\) is the carry propagation delay through one full adder:
\[ t_{CRA} \approx n \cdot t_{FA} \]The delay grows linearly with word width, making CRA impractical for wide datapaths at high frequency (e.g., 64-bit at 3 GHz). However, CRA is compact (area-efficient) and suitable for short word widths.
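A bit-level behavioral model makes the carry chain explicit. The sketch below builds an \(n\)-bit ripple adder from the full-adder equations above, using the generate/propagate form of the carry. It models function only, not delay, and the function names are illustrative.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    g = a & b                      # generate
    p = a ^ b                      # propagate
    return p ^ c_in, g | (p & c_in)

def ripple_carry_add(x, y, n_bits, c_in=0):
    """Add two n-bit unsigned integers bit by bit; returns (sum, carry_out)."""
    carry = c_in
    total = 0
    for i in range(n_bits):        # the carry ripples from bit 0 upward
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

print(ripple_carry_add(0b1011, 0b0110, 4))  # 11 + 6 in 4 bits
```

The loop-carried dependence on `carry` is the software analogue of the linear delay: bit \(i\) cannot be resolved until bit \(i-1\) has produced its carry.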
5.3 Carry-Lookahead Adder
The carry-lookahead adder (CLA) precomputes carry signals using generate and propagate logic:
\[ G_i = A_i B_i, \qquad P_i = A_i \oplus B_i \]Carry at position \(i\):
\[ C_{i+1} = G_i + P_i C_i \]Expanding recursively for a 4-bit group (bits 0–3):
\[ C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0 \]Group generate and propagate for the 4-bit block:
\[ G_{0:3} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 \]\[ P_{0:3} = P_3 P_2 P_1 P_0 \]A multi-level CLA built from such blocks achieves \(O(\log n)\) delay. For a 2-level CLA:
\[ t_{CLA} = t_{pg} + 2 t_{lookahead} + t_{sum} \]where \(t_{pg}\) is the propagate/generate computation time and \(t_{lookahead}\) is the lookahead logic level delay. For a 64-bit CLA:
\[ t_{CLA,64} \approx 5\text{–}7 \text{ gate delays (vs. } \sim 64 \text{ for CRA)} \]
5.4 Prefix Adder (Parallel Prefix)
Prefix adders (Han-Carlson, Kogge-Stone, Brent-Kung) compute all group generate and propagate signals in parallel using a tree structure. The prefix operation is associative:
\[ (G_{i:j}, P_{i:j}) \circ (G_{j-1:k}, P_{j-1:k}) = (G_{i:j} + P_{i:j} G_{j-1:k},\; P_{i:j} P_{j-1:k}) \]For a Kogge-Stone adder with \(n\) bits: delay is \(O(\log_2 n)\) with \(O(n \log n)\) area. For \(n = 64\):
\[ t_{Kogge\text{-}Stone} = \log_2 64 = 6 \text{ prefix levels} + \text{sum stage} \]
5.5 Memory Circuits
5.5.1 SRAM Cell
The standard 6T SRAM cell consists of two cross-coupled inverters (the bistable element) and two access NMOS transistors controlled by the wordline WL.
During a read: both bitlines (BL, \(\overline{\text{BL}}\)) are precharged to \(V_{DD}\), then WL is raised. The cell discharges one bitline (say BL) through the access NMOS and pull-down NMOS, and the sense amplifier detects the resulting small differential voltage \(\Delta V \approx 100\text{–}200\) mV.
During a write: The write driver forces one bitline to 0 (and the other to \(V_{DD}\)). When WL is asserted, the bitline-to-0 overrides the cross-coupled feedback and flips the cell.
Pull-Up Ratio (PR): \(PR = (W/L)_{pu} / (W/L)_{access} < 1\) ensures the access NMOS can overpower the pull-up PMOS during a write (write-ability).
5.5.2 DRAM Cell
A 1T-1C DRAM cell stores charge on a capacitor \(C_s\). The single access transistor connects the capacitor to the bitline. Data is destructively read (the bitline voltage changes due to charge sharing) and must be refreshed (rewritten) periodically (typ. every 64 ms). The sense voltage is:
\[ \Delta V_{BL} = \frac{C_s}{C_s + C_{BL}} (V_{cell} - V_{DD}/2) \]where the bitline is precharged to \(V_{DD}/2\) before the access. DRAM provides high density (1T1C vs. 6T SRAM) but requires refresh logic and an external DRAM controller.
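The charge-sharing expression shows why the DRAM read signal is small. The sketch below evaluates it for assumed values (\(C_s = 30\) fF, \(C_{BL} = 300\) fF, \(V_{DD} = 1.2\) V), which are illustrative rather than taken from a specific part.

```python
def dram_sense_voltage(c_s, c_bl, v_cell, v_dd):
    """Bitline swing after charge sharing, bitline precharged to V_DD/2."""
    return c_s / (c_s + c_bl) * (v_cell - v_dd / 2)

# Assumed values: 30 fF storage cap, 300 fF bitline, 1.2 V supply.
C_S, C_BL, V_DD = 30e-15, 300e-15, 1.2

for stored, v_cell in (("1", V_DD), ("0", 0.0)):
    dv = dram_sense_voltage(C_S, C_BL, v_cell, V_DD)
    print(f"stored {stored}: delta V_BL = {dv * 1e3:+.1f} mV")
```

The swing is symmetric about the precharge level and is only a small fraction of \(V_{DD}\), which is why a sense amplifier is needed.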
5.6 Power Consumption of Arithmetic Circuits
In a 64-bit adder running at frequency \(f\) with activity factor \(\alpha\), the dynamic power is:
\[ P = \alpha \sum_i C_i V_{DD}^2 f \]The switched capacitance \(\sum C_i\) is dominated by the bitlines in memory arrays and by wiring capacitance in arithmetic units. In a pipelined processor, the adder and memory subsystem can account for 30–50% of total chip power.
Chapter 6: Interconnect Parasitics
6.1 Wire Resistance
A metal wire of length \(l\), width \(w\), and sheet resistance \(R_\square\) has resistance:
\[ R_{wire} = R_\square \cdot \frac{l}{w} \]In advanced CMOS processes (e.g., 7 nm), copper M1 has \(R_\square \approx 0.03\text{–}0.05\; \Omega/\square\). As feature sizes shrink, wire cross-sections decrease, increasing \(R_\square\) and wire resistance.
6.2 Wire Capacitance
Wire capacitance has two components:
- Parallel-plate (area) capacitance to the substrate or neighboring metal layers: \(C_{area} = \varepsilon_{ox} A / t_{ox}\)
- Fringe and lateral capacitance to adjacent wires at the same metal level
The total capacitance per unit length of a wire depends strongly on geometry and dielectric constant. In modern designs, lateral coupling capacitance between adjacent wires at minimum pitch can exceed area capacitance, causing crosstalk.
Closed-form approximations add fringe-field correction terms that depend on the interlayer dielectric thickness and the wire height. In practice, values are extracted using 2D/3D field solvers (e.g., FastCap).
6.3 Lumped RC Model and Elmore Delay
For short wires (\(l \ll \lambda_{signal}\)) a lumped RC model is adequate. The propagation delay for a wire modeled as a distributed RC transmission line is:
\[ t_{50\%} = 0.38 \cdot R_{wire} C_{wire} = 0.38 \cdot (r \cdot l)(c \cdot l) = 0.38 r c l^2 \]where \(r\) and \(c\) are resistance and capacitance per unit length. The quadratic dependence on length \(l\) is critical: doubling the wire length quadruples the delay.
For a driver with output resistance \(R_s\) driving a wire of total resistance \(R_w\) and capacitance \(C_w\) and load \(C_L\):
\[ t_{pd} = 0.69 \left[ R_s (C_w + C_L) + \frac{R_w C_w}{2} + R_w C_L \right] \]This is the Elmore delay for the lumped-distributed RC ladder network.
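Both delay expressions in this section are simple to evaluate. The sketch below implements them directly. The per-unit-length values (\(r = 0.1\ \Omega/\mu\text{m}\), \(c = 0.2\ \text{fF}/\mu\text{m}\)) and the driver and load values are assumptions for illustration.

```python
def distributed_wire_delay(r_per_len, c_per_len, length):
    """50% delay of a distributed RC wire: 0.38 * r * c * l^2."""
    return 0.38 * r_per_len * c_per_len * length ** 2

def driver_wire_load_delay(r_s, r_w, c_w, c_load):
    """0.69 * [R_s*(C_w + C_L) + R_w*C_w/2 + R_w*C_L]."""
    return 0.69 * (r_s * (c_w + c_load) + r_w * c_w / 2 + r_w * c_load)

# Assumed wire: r = 0.1 Ohm/um, c = 0.2 fF/um, expressed in SI units per metre.
R_PER_M, C_PER_M = 1e5, 2e-10

for length_mm in (1, 2, 4):
    length = length_mm * 1e-3
    t_wire = distributed_wire_delay(R_PER_M, C_PER_M, length)
    t_total = driver_wire_load_delay(1e3, R_PER_M * length,
                                     C_PER_M * length, 10e-15)
    print(f"{length_mm} mm: wire only {t_wire * 1e12:.1f} ps, "
          f"with 1 kOhm driver and 10 fF load {t_total * 1e12:.1f} ps")
```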
6.4 Repeater Insertion
For long global wires, inserting repeater buffers reduces the quadratic wire delay to linear in length. With \(k\) equally-spaced repeaters of optimal size \(h_{opt}\):
\[ h_{opt} = \sqrt{\frac{R_0 \, c}{r \, C_0}} \]\[ t_{wire,min} \approx 2.5 \, l \sqrt{r c R_0 C_0} \]where \(R_0\) and \(C_0\) are the resistance and capacitance of a minimum-size inverter, and \(r\) and \(c\) are the wire resistance and capacitance per unit length. Because the minimum delay grows linearly with \(l\), cross-chip routing becomes feasible.
6.5 Transmission Line Effects
For very long wires (global busses) or high-frequency signals where \(t_{flight} = l/v \geq t_{rise}/2\), the wire behaves as a transmission line and must be terminated to prevent reflections.
The characteristic impedance of an on-chip microstrip or stripline is:
\[ Z_0 = \sqrt{\frac{L_{line}}{C_{line}}} \]On-chip, \(Z_0\) is typically 20–100 \(\Omega\). At signal transition times in the picosecond range (multi-GHz operation), transmission line effects become important for wires longer than a few millimeters.
6.6 Crosstalk
Capacitive coupling between adjacent wires can cause signal integrity problems. For a floating (undriven) victim wire with capacitance \(C_{victim}\) to ground, the noise voltage coupled from a switching aggressor is:
\[ V_{xtalk} = \frac{C_{coupling}}{C_{coupling} + C_{victim}} \Delta V_{aggressor} \]Crosstalk can be reduced by: increasing wire pitch (spacing), inserting shield wires (ground lines), using orthogonal routing in adjacent metal layers, or reducing signal swing.
Chapter 7: Timing Design and Clocking
7.1 Synchronous Design Methodology
Modern VLSI chips are overwhelmingly synchronous: all state elements (flip-flops and latches) are governed by a global clock signal. The clock edge samples all flip-flop inputs simultaneously, ensuring deterministic operation.
A synchronous pipeline has \(N\) pipeline stages separated by flip-flops. The minimum clock period is set by the longest combinational path in any stage:
\[ T_{clock} = \max_i \left( t_{cq,i} + t_{logic,i} + t_{su,i} \right) + t_{skew} \]
7.2 Setup and Hold Time Analysis
7.2.1 Setup Time Violation
A setup violation occurs when combinational logic delay is too large and data does not arrive at the flip-flop input before the clock edge. Consequences: metastability, incorrect data capture.
Fix: reduce logic delay (optimize critical path), increase clock period (reduce frequency), add pipeline registers.
7.2.2 Hold Time Violation
A hold violation occurs when data propagates too quickly through a short combinational path and corrupts the just-captured value before it is stable. This is a functional error not correctable by slowing the clock.
\[ t_{cq} + t_{logic,min} < t_h + t_{skew} \]Fix: insert delay buffers on fast paths, reduce clock skew.
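The setup and hold checks reduce to two slack computations. The sketch below treats `t_skew` as a worst-case magnitude that hurts both checks, matching the hold inequality above. The function names and the timing numbers (in picoseconds) are assumptions for illustration.

```python
def setup_slack(t_clock, t_cq, t_logic_max, t_su, t_skew=0):
    """Positive when t_clock >= t_cq + t_logic_max + t_su + t_skew."""
    return t_clock - (t_cq + t_logic_max + t_su + t_skew)

def hold_slack(t_cq, t_logic_min, t_h, t_skew=0):
    """Positive when t_cq + t_logic_min >= t_h + t_skew."""
    return (t_cq + t_logic_min) - (t_h + t_skew)

def max_frequency(t_cq, t_logic_max, t_su, t_skew=0):
    """Highest clock frequency that still meets setup (times in seconds)."""
    return 1.0 / (t_cq + t_logic_max + t_su + t_skew)

# Assumed timing in ps: 1 GHz clock, 800 ps worst-case logic, 10 ps fastest path.
print("setup slack:", setup_slack(1000, 50, 800, 40, t_skew=30), "ps")
print("hold slack: ", hold_slack(50, 10, 30, t_skew=40), "ps")
print(f"f_max: {max_frequency(50e-12, 800e-12, 40e-12, 30e-12) / 1e9:.2f} GHz")
```

In this assumed example the setup check passes while the hold check fails. Lengthening `t_clock` changes only the setup slack, which is why a hold violation cannot be fixed by slowing the clock.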
7.3 Clock Skew and Jitter
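Clock Skew: Spatial variation in clock arrival time between flip-flops, caused by unequal path delays and process variation in the clock distribution network.
Clock Jitter: Cycle-to-cycle variation in clock period, caused by PLL noise, supply noise, and substrate coupling. Jitter is random and reduces the available timing margin.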
The effective timing budget accounting for skew and jitter:
\[ t_{logic,max} \leq T_{clock} - t_{cq} - t_{su} - t_{skew} - t_{jitter} \]\[ t_{logic,min} \geq t_h + t_{skew} \]
7.4 Clock Generation: Phase-Locked Loop
A Phase-Locked Loop (PLL) multiplies a low-frequency reference clock to the chip’s operating frequency with controlled phase. A PLL consists of:
- Phase-Frequency Detector (PFD): compares reference and feedback phases
- Charge Pump and Loop Filter: generates control voltage
- Voltage-Controlled Oscillator (VCO): output frequency proportional to control voltage
- Frequency Divider: divides VCO output by \(N\) to create feedback
At lock, \(f_{out} = N \cdot f_{ref}\). PLLs reduce jitter from the clock distribution network and enable on-chip frequency synthesis.
7.5 Pipelining and Retiming
Adding pipeline registers to a combinational datapath allows higher clock frequency at the cost of increased latency. For a path with original delay \(T_{comb}\), dividing it into \(N\) balanced pipeline stages reduces the clock period to \(T_{comb}/N\) in the ideal case (ignoring the register overhead \(t_{cq} + t_{su}\) added to each stage), while a result then takes \(N\) clock cycles to emerge.
Retiming is a formal technique for moving flip-flops across combinational logic gates (without changing the function) to balance stage delays and maximize clock frequency. Register retiming can be formulated as a linear program.
Chapter 8: Physical Design Considerations
8.1 Layout and Design Rules
CMOS layout follows design rules that specify minimum feature sizes, spacings, and overlaps. Key rules include:
- Minimum gate length \(L_{min}\) (historically defined the technology node)
- Poly-to-diffusion overlap, contact-to-edge spacing
- Metal wire minimum width and spacing per layer
- Via enclosure rules
Violations of design rules cause shorts or opens in fabrication. Design Rule Check (DRC) is performed by EDA tools before tapeout.
8.2 Cell-Based Design Flow
Standard cell design uses a library of pre-characterized logic gates (inverter, NAND, NOR, flip-flop, MUX, etc.) placed in rows. The design flow:
- RTL coding (Verilog/VHDL): behavioral description
- Logic synthesis: RTL → gate-level netlist
- Floorplanning: placement of major blocks, I/O pads
- Placement: standard cells placed in rows
- Clock tree synthesis (CTS): balanced clock distribution
- Routing: metal wires connecting cells
- Sign-off: timing, DRC, LVS (layout-vs-schematic), IR drop analysis
8.3 Static Timing Analysis
Static Timing Analysis (STA) verifies timing closure without simulation. STA tools (e.g., Synopsys PrimeTime) compute:
- Arrival time (AT): latest time at which data becomes valid at each node (for setup analysis)
- Required time (RT): latest time data must arrive to meet setup constraints
- Slack = RT − AT: positive slack means timing met; negative slack is a violation
STA performs worst-case (setup) and best-case (hold) analysis using process corners (slow/fast device models), voltage corners, and temperature corners (PVT variation).
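The core of setup-mode STA is a longest-path traversal of the timing graph. The sketch below propagates latest arrival times through a small combinational DAG and computes slack at an endpoint. The netlist, node names, and delays are hypothetical, and real tools additionally handle slew, wire delay, and multiple corners.

```python
def arrival_times(fanin, delay, launch_time=0.0):
    """Latest arrival time at the output of each node in a combinational DAG.

    fanin[node] lists the nodes driving `node`; nodes without fan-in are
    launch points. delay[node] is the node's own propagation delay.
    """
    at = {}

    def visit(node):
        if node not in at:
            drivers = fanin.get(node, [])
            start = max((visit(d) for d in drivers), default=launch_time)
            at[node] = start + delay[node]
        return at[node]

    for node in delay:
        visit(node)
    return at

def slack(at, required):
    """Slack = RT - AT for every node that has a required time."""
    return {node: rt - at[node] for node, rt in required.items()}

# Hypothetical netlist, delays in ps: ff_q drives g1 and g2, which both drive g3.
fanin = {"g1": ["ff_q"], "g2": ["ff_q"], "g3": ["g1", "g2"]}
delay = {"ff_q": 50, "g1": 100, "g2": 200, "g3": 150}

at = arrival_times(fanin, delay)
print("arrival times:", at)
print("slack:", slack(at, {"g3": 460}))
```

Hold analysis would use `min` in place of `max` with fast-corner delays.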
8.4 Power Grid Design
The power grid supplies \(V_{DD}\) and GND to all cells through a mesh of metal wires. IR drop (resistive voltage drop in the power grid) reduces the effective supply voltage reaching cells, slowing them down and potentially causing timing failures.
\[ V_{eff} = V_{DD} - I_{peak} \cdot R_{grid} \]Decoupling capacitors (decaps) placed near switching circuits reduce instantaneous IR drop by providing local charge storage.
Chapter 9: IC Testing
9.1 Testing Challenges
As circuit complexity grows (billions of transistors), testing becomes a dominant cost. The goal of testing is to detect manufacturing defects (stuck-at faults, bridging faults, open-circuit faults) that would cause a chip to malfunction.
9.2 Fault Models
Stuck-at-0 (SA0): A node is permanently connected to GND.
Stuck-at-1 (SA1): A node is permanently connected to \(V_{DD}\).
These stuck-at faults model open and bridging defects and are used in fault simulation and ATPG (Automatic Test Pattern Generation).
9.3 Controllability and Observability
A node is controllable if a test pattern can set it to a specific logic value. It is observable if its value can be propagated to a primary output. Hard-to-observe internal nodes require special testing techniques.
9.4 Scan Design (Design for Testability)
Scan chains replace ordinary flip-flops with scan flip-flops (which have an additional SI input and SE enable). All scan flip-flops are connected in a shift-register chain. During test mode:
- Scan-in: test vectors are shifted serially into all flip-flops
- Apply: one capture clock cycle applies the test to the combinational logic
- Scan-out: responses are shifted out and compared to expected output
Scan design achieves near-100% stuck-at fault coverage with relatively few test pins, at the cost of additional flip-flop area (≈10–15%) and test time.
9.5 Built-In Self-Test (BIST)
BIST places test logic on-chip: a Linear Feedback Shift Register (LFSR) generates pseudo-random test patterns, and a Multiple-Input Signature Register (MISR) compresses the outputs into a compact signature. The signature is compared to a golden reference.
BIST is used for memory (MBIST) and logic (LBIST), enabling at-speed testing and field diagnostics.
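The pattern-generation side of BIST can be illustrated with a small LFSR. The sketch below models a 4-bit Fibonacci LFSR whose feedback is the XOR of the two most significant bits (characteristic polynomial \(x^4 + x^3 + 1\)). The width, taps, and seed are choices made for this example, and the MISR is not modeled.

```python
def lfsr4_step(state):
    """Advance a 4-bit Fibonacci LFSR by one clock (feedback = bit3 XOR bit2)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF

def lfsr4_patterns(seed=0b0001):
    """Yield successive states until the sequence returns to the seed."""
    state = seed
    while True:
        yield state
        state = lfsr4_step(state)
        if state == seed:
            return

patterns = list(lfsr4_patterns())
print(len(patterns), "patterns before repeating")
print([format(p, "04b") for p in patterns])
```

With these taps the register cycles through all 15 nonzero states before repeating. The all-zero state is a lock-up state and must not be used as a seed.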
Chapter 10: Summary — Design Principles and Tradeoffs
10.1 Performance–Power–Area Triangle
Every digital design decision involves navigating the performance–power–area (PPA) triangle:
- Performance (speed): driven by transistor sizing, circuit topology, pipeline depth, and voltage
- Power: dominated by \(P = \alpha C V_{DD}^2 f\); reduced by voltage scaling, clock gating, power gating
- Area: set by transistor widths, wire routing, and design style (static vs. dynamic)
Optimizing one dimension typically degrades another. Multi-\(V_{DD}\) designs use high \(V_{DD}\) only on critical paths and low \(V_{DD}\) elsewhere, reducing power without slowing the critical path.
10.2 Critical Path Analysis
The maximum operating frequency of any synchronous circuit is determined by its critical path — the longest combinational delay between any two sequentially adjacent storage elements. Critical path optimization techniques include:
- Transistor sizing (logical effort framework)
- Logic restructuring (reduce gate levels)
- Technology mapping to fast library cells
- Retiming and pipelining
- Voltage overdrive on critical paths (adaptive voltage scaling)
10.3 Scaling and the End of Dennard Scaling
With the end of classical voltage scaling, power density is no longer constant across technology nodes. Modern approaches to continue performance scaling include:
- 3D integration: stacking dies with through-silicon vias (TSVs)
- FinFET and GAA transistors: improved electrostatic control reducing leakage
- Near-threshold computing (NTC): operating at \(V_{DD} \approx V_T\) for maximum energy efficiency
- Dark silicon management: only a fraction of the chip is active at full speed at any one time
- Domain-specific architectures: GPUs, NPUs, custom accelerators replacing general-purpose cores