BME 393: Digital Systems
Bill Bishop
Estimated study time: 22 minutes
Sources and References
These notes synthesize standard material from digital systems pedagogy: M. Morris Mano and Michael Ciletti’s Digital Design, John Wakerly’s Digital Design: Principles and Practices, Sarah and David Harris’s Digital Design and Computer Architecture, and Katz and Borriello’s Contemporary Logic Design. Additional framing draws on open courseware from MIT 6.004 (Computation Structures), MIT 6.111 (Introductory Digital Systems Laboratory), Stanford EE108 (Digital Systems I), and the University of Cambridge Part IB Digital Electronics notes. The biomedical framing of each topic follows the course’s own orientation toward physiological monitoring and clinical instrumentation.
Chapter 1: Digital Abstraction, Number Systems, and Hardware Primitives
Digital systems succeed because they impose a discrete abstraction on top of an inherently continuous physical substrate. A wire carries a voltage that varies smoothly in time, but a digital receiver interprets that voltage as one of two symbols by comparing it to threshold voltages \( V_{IL} \) and \( V_{IH} \). The low-side noise margin \( NM_L = V_{IL} - V_{OL} \), the gap between the highest guaranteed logic-0 output and the highest accepted logic-0 input (its high-side dual is \( NM_H = V_{OH} - V_{IH} \)), is what lets a signal travel through many gates and still be regenerated into a clean binary value. This regeneration property is why digital electronics can be stacked into systems of billions of transistors without drowning in accumulated noise, a robustness that matters enormously in a clinical setting where a pacemaker or infusion pump must keep computing correctly in the presence of electromagnetic interference.
Because the hardware speaks in two symbols, every numerical quantity must be encoded as a string of bits. Unsigned binary represents \( N \) in positional form with weights that are powers of two, so an \( n \)-bit field covers \( 0 \) through \( 2^n - 1 \). Signed values are almost always stored in two’s complement because the same adder hardware works for both signed and unsigned operands and there is a single representation of zero. In an \( n \)-bit two’s complement field the most significant bit carries weight \( -2^{n-1} \), and the negation of a value is obtained by inverting all bits and adding one. Hexadecimal and octal are compact human-readable shorthands that group four or three bits at a time, and binary-coded decimal stores each decimal digit in its own four-bit field so that instrument displays, which think in decimal, can be driven without costly binary-to-decimal conversion.
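The invert-and-add-one rule and the \( -2^{n-1} \) weight of the sign bit can be checked directly in a few lines of Python (a sketch; the function names are illustrative, not part of the course material):

```python
def to_twos_complement(value, n):
    """Encode a signed integer into an n-bit two's complement field."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError(f"{value} does not fit in {n} bits")
    return value & ((1 << n) - 1)

def from_twos_complement(bits, n):
    """Decode an n-bit field: the MSB carries weight -2**(n-1)."""
    msb_weight = -(1 << (n - 1))
    low_bits = bits & ((1 << (n - 1)) - 1)
    return (msb_weight if (bits >> (n - 1)) & 1 else 0) + low_bits

def negate(bits, n):
    """Invert all bits and add one, modulo 2**n."""
    return (~bits + 1) & ((1 << n) - 1)
```

For example, `to_twos_complement(-5, 8)` yields `0b11111011`, and negating the encoding of 5 gives the same bit pattern, matching the invert-and-add-one rule.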
Fixed-point fractions push the implicit binary point partway through the word, which is the representation of choice in low-cost embedded DSP where a floating-point unit would be an unjustified expense. When wider dynamic range is required, IEEE 754 floating-point stores a biased exponent and a normalized fraction; the small but nonzero representation error of floating arithmetic is what forces safety-critical biomedical code to think carefully about rounding.
Physically, the logic-level abstraction is implemented with complementary metal-oxide-semiconductor transistors. In CMOS each gate has a pull-up network of PMOS transistors and a dual pull-down network of NMOS transistors, wired so that exactly one network conducts for any input combination. The output is thus actively driven to the rail that matches the correct logic value, the static current draw is nearly zero, and power is consumed mainly as dynamic switching energy \( P_{dyn} = \alpha C V^2 f \). This quadratic dependence on supply voltage is why modern biomedical wearables push \( V_{DD} \) as low as the noise margin allows.
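The quadratic dependence of dynamic power on supply voltage is easy to see numerically; the component values below are illustrative, not taken from any particular device:

```python
def dynamic_power(alpha, c_farads, vdd_volts, f_hz):
    """Switched-capacitance model: P_dyn = alpha * C * V^2 * f."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz

# Illustrative figures: 10% activity factor, 1 nF total switched
# capacitance, 10 MHz clock. Halving VDD quarters the dynamic power.
p_full = dynamic_power(0.1, 1e-9, 1.8, 10e6)
p_half = dynamic_power(0.1, 1e-9, 0.9, 10e6)
```

Here `p_full / p_half` is exactly 4, which is why a wearable that can tolerate a lower \( V_{DD} \) gains so much battery life.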
Chapter 2: Boolean Algebra and Canonical Forms
A combinational circuit computes a Boolean function of its inputs, and the algebraic laws due to Boole and systematized by Shannon are the grammar that lets us rewrite such functions into equivalent but cheaper implementations. The fundamental identities are commutativity, associativity, and distributivity of AND over OR and of OR over AND, together with DeMorgan’s laws \( \overline{A \cdot B} = \overline{A} + \overline{B} \) and \( \overline{A + B} = \overline{A} \cdot \overline{B} \). DeMorgan’s laws are more than algebraic curiosities; they are the reason that a NAND gate and a NOR gate are each functionally complete, so a whole digital system can in principle be built from just one kind of gate.
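Both DeMorgan's laws and the functional completeness of NAND can be verified exhaustively, since a two-input Boolean function has only four input combinations (a sketch with illustrative helper names):

```python
from itertools import product

# DeMorgan's laws, checked over all four input combinations.
for a, b in product([0, 1], repeat=2):
    assert (1 - (a & b)) == ((1 - a) | (1 - b))   # NOT(a AND b) == NOT a OR NOT b
    assert (1 - (a | b)) == ((1 - a) & (1 - b))   # NOT(a OR b) == NOT a AND NOT b

# NAND is functionally complete: NOT, AND, and OR built from NAND alone.
def nand(a, b): return 1 - (a & b)
def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))   # DeMorgan in action
```

The `or_` construction is DeMorgan's first law read backwards: \( A + B = \overline{\overline{A} \cdot \overline{B}} \).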
Any Boolean function of \( n \) variables can be written in two canonical normal forms. The sum-of-products form is a disjunction of minterms, where a minterm is a product in which every input variable appears exactly once either complemented or uncomplemented. The product-of-sums form is dually a conjunction of maxterms. Writing a function from its truth table is therefore mechanical: read off the rows that evaluate to one to get the minterms, or the rows that evaluate to zero to get the maxterms. The canonical form is unique but is rarely minimal, so the next problem is to simplify it without changing the function it realizes.
Chapter 3: Gray Code, Karnaugh Maps, and Don’t-Care Minimization
The Karnaugh map, introduced by Maurice Karnaugh in 1953, is a visual reorganization of the truth table that exploits the adjacency theorem \( AB + A\overline{B} = A \). The rows and columns of the map are labelled in Gray code so that any two neighbouring cells differ in exactly one input variable, meaning that a pair of adjacent ones in the map always collapses into a single smaller product term. Wrapping the map top-to-bottom and edge-to-edge makes the adjacency toroidal, so a four-variable map has implicants of size one, two, four, eight, and sixteen cells. Grouping the ones into the largest possible rectangles of size \( 2^k \) and then choosing a minimal cover of prime implicants yields the minimum sum-of-products expression.
Gray code matters independently of minimization. An optical rotary encoder mounted on a surgical robot reports shaft angle by lighting detectors through a patterned disk; if that pattern were ordinary binary, the transition from \( 011 \) to \( 100 \) would flip three bits simultaneously, and the slightest timing mismatch between detectors could momentarily read an arbitrary value. Gray code guarantees that only one bit changes per step, so the worst case read is either the old angle or the new, never a spurious intermediate.
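The standard reflected-binary conversion is a one-liner in each direction, and the single-bit-change property of the encoder disk can be checked directly (a sketch; the function names are illustrative):

```python
def binary_to_gray(n):
    """Reflected binary Gray code: XOR each bit with its neighbour above."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Invert the encoding by folding the XOR back down the word."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

For any consecutive pair of shaft positions, `binary_to_gray(i) ^ binary_to_gray(i + 1)` has exactly one bit set, which is the property the encoder relies on.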
Most real truth tables contain don’t-care rows, which correspond to input combinations that the surrounding logic can never produce or whose outputs the downstream system does not examine. Marking these rows with a dash and allowing the K-map grouping algorithm to treat them as ones or zeros, whichever is convenient, is the cheapest possible form of optimization, and it routinely cuts gate count by a significant fraction. When several outputs share the same input vector, as in a seven-segment decoder, joint minimization can share product terms across the outputs and further reduce area.
Chapter 4: Multiplexers, Decoders, and Timing Analysis
Medium-scale combinational building blocks let a designer think above the level of individual gates. A multiplexer selects one of \( 2^k \) data inputs onto a single output under control of a \( k \)-bit select. Because any Boolean function of \( n \) variables can be written as a truth table, and a truth table is essentially a table of constants selected by the input vector, a multiplexer whose select lines carry the \( n \) input variables and whose \( 2^n \) data lines are tied to constants (or, via Shannon expansion, a half-size multiplexer with data lines tied to one remaining input variable or its complement) can implement any \( n \)-input Boolean function, a result that underlies the lookup-table structure of modern field-programmable gate arrays. A decoder is the dual: it asserts exactly one of \( 2^k \) output lines based on the binary value on its \( k \)-bit input, and a decoder followed by an OR tree is yet another universal realization of a sum-of-products.
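The mux-as-lookup-table argument can be made concrete: the select word is formed from the input variables and the data lines are the truth-table constants. A sketch in Python, with illustrative names:

```python
def mux_lut(truth_table, *inputs):
    """Model a 2**n-to-1 multiplexer whose data lines are tied to
    the truth-table constants and whose select lines carry the
    n input variables (MSB first)."""
    select = 0
    for bit in inputs:
        select = (select << 1) | bit
    return truth_table[select]

# Three-input majority vote as an 8-entry LUT; index is (a, b, c)
# read as a 3-bit binary number.
MAJORITY = [0, 0, 0, 1, 0, 1, 1, 1]
```

Calling `mux_lut(MAJORITY, 1, 0, 1)` selects entry 5 of the table, exactly as an FPGA lookup table would.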
Real gates, however, take time to respond. The propagation delay \( t_{pd} \) is the maximum elapsed time from an input transition until the corresponding output has settled, and the contamination delay \( t_{cd} \) is the minimum time before the output begins to move at all. These two numbers bracket the window during which the output is in an indeterminate state. Along a combinational path the total propagation delay is the sum of the gate delays on the path, and the critical path of a block is whichever path has the largest such sum. Hazards, meaning transient glitches that appear on an output even when the steady-state logic says it should not change, arise when two paths of different delay converge on the same gate; a static hazard can be suppressed by adding a redundant consensus product term that covers the adjacency, and dynamic hazards are eliminated by pipelining or by registering the output.
Chapter 5: Latches, Flip-Flops, Registers, Finite-State Machines, and Counters
Memory in a digital system is created by cross-coupling two inverters so that each holds the other’s state, producing a bistable element whose output holds its last written value until a new one is stored. The simplest such element is the SR latch; adding a level-sensitive enable produces the D latch, and adding an edge-triggered master-slave structure produces the D flip-flop. A flip-flop captures its input exactly at the active edge of the clock, and this instant-of-sampling semantics is what makes synchronous sequential design tractable. A register is simply a parallel bank of flip-flops that share a common clock and capture an entire word at once.
A finite-state machine is a register whose next value is a combinational function of the current register value and of the external inputs. Two stylistic conventions exist: in a Moore machine the outputs depend only on the state, so they are glitch-free and one clock cycle late, while in a Mealy machine the outputs depend on both state and current input, so they are one cycle earlier but can glitch when inputs arrive asynchronously. Formally an FSM is a tuple \( (S, \Sigma, \delta, s_0, O) \) where \( \delta : S \times \Sigma \to S \) is the next-state function and the output map is either \( O : S \to \Gamma \) or \( O : S \times \Sigma \to \Gamma \). A biomedical pulse-oximeter controller cycling through red-LED-on, infrared-LED-on, both-off, and sample-ADC phases is a textbook Moore machine with four states and a handful of outputs.
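The four-phase pulse-oximeter sequencer can be sketched as a Moore machine in a few lines of Python; the state names and output signal names here are illustrative, and a real controller would add timing inputs:

```python
# Moore machine: outputs are a function of the state alone.
STATES = {
    "RED_ON":   {"red": 1, "ir": 0, "sample": 0},
    "IR_ON":    {"red": 0, "ir": 1, "sample": 0},
    "BOTH_OFF": {"red": 0, "ir": 0, "sample": 0},
    "SAMPLE":   {"red": 0, "ir": 0, "sample": 1},
}

# Next-state function: an unconditional cycle through the four phases.
NEXT = {"RED_ON": "IR_ON", "IR_ON": "BOTH_OFF",
        "BOTH_OFF": "SAMPLE", "SAMPLE": "RED_ON"}

def step(state):
    """One clock edge: emit the Moore output of the current state,
    then advance to the next state."""
    return STATES[state], NEXT[state]
```

Because the outputs are read from the state table rather than from the inputs, they change only at clock edges and are glitch-free, exactly the Moore property described above.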
Counters are the most heavily reused FSMs. A ripple counter chains toggle flip-flops and is almost free in hardware, but because each stage clocks the next it accumulates skew proportional to its length and is therefore unsuitable whenever outputs must be sampled coherently. A synchronous counter clocks every stage from the same edge and uses combinational logic to decide which bits should toggle, so its outputs settle in a single propagation delay. Counters are the heartbeat of timers, PWM generators, and sensor sampling schedules.
Chapter 6: Sequential Timing, Clock Skew, Metastability, and Pipelining
A synchronous design works only if every flip-flop observes a stable input at its active clock edge. Two timing constraints enforce this. The setup constraint says that the combinational logic between two registers must finish before the setup time \( t_{su} \) preceding the next edge, so \( t_{cq} + t_{pd,\text{comb}} + t_{su} \le T_{clk} \), where \( t_{cq} \) is the clock-to-output delay of the launching register. The hold constraint says that the receiving register must still see its old input \( t_h \) past the edge, so \( t_{cq,\min} + t_{cd,\text{comb}} \ge t_h \). Hold violations cannot be fixed by slowing the clock and must be repaired by inserting delay on short paths.
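Both constraints are simple inequalities, so a quick calculator makes the distinction concrete (the nanosecond figures below are illustrative, not from any particular library):

```python
def max_clock_frequency(t_cq, t_pd_comb, t_su):
    """Setup constraint: t_cq + t_pd_comb + t_su <= T_clk,
    so the fastest legal clock is the reciprocal of that sum."""
    return 1.0 / (t_cq + t_pd_comb + t_su)

def hold_ok(t_cq_min, t_cd_comb, t_h):
    """Hold constraint: t_cq_min + t_cd_comb >= t_h.
    Note the clock period does not appear at all."""
    return t_cq_min + t_cd_comb >= t_h

# Illustrative path: 0.5 ns clock-to-Q, 3.0 ns of logic, 0.5 ns setup
# gives a 4 ns minimum period, i.e. 250 MHz.
f_max = max_clock_frequency(0.5e-9, 3.0e-9, 0.5e-9)
```

The absence of \( T_{clk} \) from `hold_ok` is the code-level restatement of the fact that hold violations cannot be fixed by slowing the clock.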
Clock skew, the difference in arrival time of the clock edge at two different flip-flops, tightens the setup constraint at one end of a path and loosens it at the other. A well-designed clock tree drives every leaf with balanced wire length so that skew is bounded to tens of picoseconds. Metastability is a subtler problem: when an asynchronous input changes exactly during the setup-hold window of a flip-flop, the flip-flop can linger at an intermediate voltage for an unbounded time before resolving to zero or one. Because the mean time between metastable failures falls exponentially with the time allowed for resolution, any signal crossing from one clock domain into another must first traverse a synchronizer consisting of two or more cascaded flip-flops that each grant an additional clock period of resolution time.
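The exponential payoff of each extra resolution period can be seen in the standard metastability model \( \text{MTBF} = e^{t_r/\tau} / (T_w \cdot f_{clk} \cdot f_{data}) \); the device parameters below are illustrative placeholders, since \( \tau \) and \( T_w \) vary by process:

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_w, f_clk, f_data):
    """Mean time between synchronization failures for one flip-flop:
    MTBF = exp(t_resolve / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_resolve / tau) / (t_w * f_clk * f_data)

# Illustrative parameters: tau = 100 ps, metastability window 0.1 ns,
# 100 MHz clock, 1 MHz asynchronous event rate.
one_stage = synchronizer_mtbf(5e-9, 100e-12, 1e-10, 100e6, 1e6)
two_stage = synchronizer_mtbf(10e-9, 100e-12, 1e-10, 100e6, 1e6)
```

Granting a second clock period of resolution time multiplies the MTBF by \( e^{T/\tau} \), here a factor of \( e^{50} \), which is why two cascaded flip-flops are the standard synchronizer.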
Pipelining attacks the setup constraint rather than accepting it. By slicing a long combinational path with additional registers, each pipe stage now contains a shorter span of logic, the clock period can shrink accordingly, and throughput rises even though the latency of a single computation increases. This is the same trick that allows an x86 microarchitecture to run at several gigahertz despite containing logic that would take nanoseconds if executed in a single cycle.
Chapter 7: Hardware Description Languages, Simulation, Synthesis, and Verilog
Drawing gates by hand does not scale to modern designs, so engineers describe hardware in a textual hardware description language. Verilog, originally developed at Gateway Design Automation and later standardized as IEEE 1364, with its superset SystemVerilog standardized as IEEE 1800, is the dominant choice in North American industry. The central idea of Verilog is that a module is a black box with ports, and inside the module behaviour is specified either structurally, by instantiating smaller modules, or behaviourally, by writing always blocks that fire in response to events. A continuous assign statement creates combinational logic, an always block with an edge-sensitivity list on the clock creates registers, and the nonblocking assignment operator matches the simultaneous update semantics of real flip-flops.
A simulator steps through a discrete-event queue and lets the designer watch signal waveforms before any silicon is committed. A synthesizer, by contrast, maps the HDL into a netlist of gates and flip-flops targeted at a specific technology library, applying Boolean optimization, retiming, and technology mapping along the way. The crucial discipline is to write synthesizable HDL, meaning code whose simulation behaviour coincides with the hardware the synthesizer will infer, avoiding constructs such as delays and initial blocks that exist purely for testbenches. For a biomedical design cycle, an FPGA flow typically runs from Verilog through simulation, then synthesis, then place-and-route, then bitstream download, with each step producing a model of the design that can be checked against the level above it.
Chapter 8: Arithmetic Circuits, Memory Arrays, and Logic Arrays
A ripple-carry adder chains \( n \) full adders so that the carry propagates bit by bit, which costs \( O(n) \) delay. A carry-lookahead adder computes the carries in parallel by precomputing generate \( g_i = a_i b_i \) and propagate \( p_i = a_i \oplus b_i \) signals and combining them in a tree of \( O(\log n) \) depth. Multipliers are built as arrays of partial products summed through Wallace or Dadda trees, and dividers are either restoring or non-restoring iterative shifters. In biomedical signal processing these blocks implement the finite-impulse-response filters that clean ECG traces and the fast Fourier transforms that extract heart-rate variability.
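The bit-by-bit carry chain of a ripple-carry adder is straightforward to model in software; the helper names below are illustrative:

```python
def full_adder(a, b, cin):
    """One full adder: sum and carry-out for three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))   # generate OR (propagate AND carry-in)
    return s, cout

def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length bit lists, LSB first.
    The carry ripples through one full adder per bit position."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))
```

The `cout` expression is exactly \( g_i + p_i c_i \) with \( g_i = a_i b_i \) and \( p_i = a_i \oplus b_i \); a carry-lookahead adder computes the same quantities but combines them in a tree instead of a chain.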
A memory array is a two-dimensional grid of storage cells addressed by a row decoder and a column multiplexer. Static RAM uses a six-transistor cell that holds its value as long as power is applied, while dynamic RAM stores charge on a tiny capacitor that must be refreshed every few tens of milliseconds. Read-only memory comes in mask-programmed, fuse-programmed, and electrically erasable flavours; flash memory, a descendant of EEPROM, dominates embedded code storage in medical devices because it is nonvolatile, dense, and reprogrammable in the field.
Programmable logic arrays generalize the sum-of-products form into silicon: an AND-plane feeds an OR-plane, and the interconnection pattern is either mask-programmed, fuse-programmed, or, in the case of an FPGA, loaded from an external configuration memory at power-on. The FPGA is the dominant prototyping platform for biomedical instrumentation because a single chip can be reconfigured into any digital architecture the designer chooses.
Chapter 9: Assembly Language, Machine Language, and Addressing Modes
Assembly language is a near-one-to-one textual rendering of the binary instructions that a processor actually fetches. Each instruction specifies an opcode, zero or more register operands, and optionally an immediate constant or a memory address. A reduced instruction set computer such as the ARM Cortex-M found in clinical infusion pumps has fixed-length instructions and a load-store memory model: arithmetic operates only on registers, and dedicated load and store instructions move values between memory and the register file.
Addressing modes describe how the operand address is computed. Immediate mode bakes a constant into the instruction. Register mode names a register. Direct mode gives an absolute address. Register-indirect mode reads the address from a register, and base-plus-offset mode adds a small constant to the register so that a single instruction can walk through a struct field. Indexed and scaled-indexed modes add a second register to support array traversal. Picking the right addressing mode is a compiler decision that directly affects how compactly and how quickly a signal-processing inner loop executes.
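The address computations themselves are small enough to write out directly. The sketch below models each mode against a toy word-addressed memory and register file; the values and names are illustrative, not tied to any real instruction set:

```python
# Toy word-addressed memory and register file for illustration.
mem = {100: 7, 104: 8, 108: 9}
regs = {"r1": 100, "r2": 1}

def immediate(k):            return k                    # constant in the instruction
def register(r):             return regs[r]              # operand is a register
def direct(addr):            return mem[addr]            # absolute address
def register_indirect(r):    return mem[regs[r]]         # address held in a register
def base_offset(r, k):       return mem[regs[r] + k]     # struct-field access
def indexed(rb, ri, scale):  return mem[regs[rb] + regs[ri] * scale]  # array access
```

With `r1` holding a base address of 100, `base_offset("r1", 4)` reads the word at 104, the same access pattern a compiler emits for the second field of a struct.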
Chapter 10: Microcontrollers, Processors, Pipelined Datapaths, Hazards, and x86 Microarchitecture
A single-cycle processor fetches, decodes, and executes one instruction per clock cycle, which keeps the control logic simple but forces the clock period to accommodate the longest instruction. A pipelined processor overlaps stages so that while instruction \( n \) is in execute, instruction \( n+1 \) is in decode and instruction \( n+2 \) is in fetch, multiplying throughput by the number of stages at the cost of additional forwarding logic and stall logic.
Three classes of hazards threaten a pipeline. A structural hazard appears when two instructions need the same hardware unit in the same cycle. A data hazard appears when an instruction needs the result of an earlier instruction that has not yet written back; forwarding paths route the result directly from the execute or memory stage of the producer to the execute stage of the consumer, and when forwarding cannot cover the gap a one-cycle bubble is inserted. A control hazard appears at a branch, because the fetch stage cannot know which instruction to fetch until the branch resolves; branch prediction and delayed branching mitigate this.
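The decision the hazard unit makes can be boiled down to two predicates, sketched here under the assumption of a classic five-stage pipeline (the function names are illustrative):

```python
def needs_forwarding(producer_dst, consumer_srcs):
    """True when the consumer reads a register the producer has not
    yet written back, so the result must be forwarded from the
    execute or memory stage."""
    return producer_dst in consumer_srcs

def needs_stall(producer_is_load, producer_dst, consumer_srcs):
    """A load's result is available only after the memory stage, so
    an immediately dependent instruction needs a one-cycle bubble
    even with forwarding."""
    return producer_is_load and producer_dst in consumer_srcs
```

For the pair `load r1` followed by `add r2, r1, r3`, forwarding alone cannot cover the gap and `needs_stall` fires; for an ALU producer it does not, and the forwarding path suffices.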
A microcontroller folds a CPU, a modest amount of on-chip SRAM and flash, and a set of peripherals into a single inexpensive package. For a blood-glucose meter this typically means an ARM Cortex-M0 or Cortex-M4 core, a few kilobytes to a few hundred kilobytes of flash, timers driving the LCD, an ADC reading the sensor, and a UART or Bluetooth Low Energy radio for reporting. Contemporary x86 parts in laptops and clinical servers ornament the same fetch-decode-execute skeleton with out-of-order execution, multiple-issue superscalar pipelines, speculative branch execution, deep caches, and SIMD vector extensions, but the underlying principles are those of the five-stage pipeline.
Chapter 11: Memory Hierarchy, Caches, Virtual Memory, I/O Systems, and Biomedical Sensors
Main memory is enormously larger than a register file but also much slower, so practical systems insert caches between the processor and memory to exploit temporal and spatial locality. A cache is organized into fixed-size blocks, and an access either hits, meaning the block is already resident, or misses, meaning the block must be fetched from the next level. Direct-mapped caches pin each memory block to exactly one cache line, fully associative caches let any block live anywhere, and set-associative caches compromise by allowing a block to live in any way of a small set. The average access time is \( t_{avg} = t_{hit} + m \cdot t_{miss} \) where \( m \) is the miss rate and \( t_{miss} \) is the miss penalty, which shows why even a small reduction in miss rate buys a large speedup.
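Plugging in a few numbers makes the leverage of the miss rate obvious; the timing figures below are illustrative:

```python
def average_access_time(t_hit, miss_rate, t_miss_penalty):
    """One-level cache model: t_avg = t_hit + m * t_miss."""
    return t_hit + miss_rate * t_miss_penalty

# Illustrative figures: 1 ns hit time, 100 ns miss penalty.
# Cutting the miss rate from 5% to 2% halves the average access time.
t_5pct = average_access_time(1e-9, 0.05, 100e-9)   # 6 ns
t_2pct = average_access_time(1e-9, 0.02, 100e-9)   # 3 ns
```

Because the miss penalty is two orders of magnitude larger than the hit time, a three-percentage-point improvement in miss rate dominates any plausible improvement in hit time.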
Virtual memory adds a translation step: each process sees a private linear address space that the memory management unit translates into physical frames through a page table. This translation is cached in a translation lookaside buffer, and any miss both in the TLB and in the cache hierarchy is an expensive event that a careful embedded designer tries to avoid in the real-time signal path.
Input and output devices connect to the processor either through memory-mapped registers, where peripheral control registers occupy addresses in the processor’s address space, or through explicit I/O instructions. Interrupts let a peripheral notify the processor asynchronously of a new event, and direct memory access engines let a peripheral stream data into memory without processor involvement at all. A biomedical sensor chain typically consists of an analog front-end amplifier, an anti-alias filter, an analog-to-digital converter clocked by a timer, a DMA channel that deposits samples into a circular buffer, and an interrupt handler that wakes a processing task whenever a frame is complete. The same skeleton fits an ECG, an EEG, and a continuous glucose monitor.
Chapter 12: Biomedical Application Synthesis and Course Review
The final integration of the course folds every earlier chapter into a single picture. A safety-critical biomedical device such as an implantable cardioverter-defibrillator starts with a sensor whose analog output is digitized into a bit stream; the bit stream is filtered by a fixed-point FIR or IIR that is itself a pipeline of adders and multipliers; the filtered samples feed a finite-state machine that classifies the cardiac rhythm; the classifier drives an actuator through a timed output that must respect tight setup and hold margins because a missed edge is a clinical failure; and the whole device is programmed in assembly or C that compiles to a RISC instruction set executed by a pipelined microcontroller sitting next to its cache and its I/O peripherals. Reliability comes from the digital abstraction itself, because every bit can be regenerated through noise margins; from redundancy, because critical paths can be duplicated with majority voting; and from the careful discipline of synchronous design that this course teaches from the bottom up.
Reviewing the arc of the term is then straightforward. Chapters 1 through 3 give the algebra and the representation. Chapter 4 builds the combinational toolkit. Chapters 5 and 6 add memory and timing. Chapter 7 lifts the notation up to hardware description languages so that real designs become tractable. Chapter 8 populates the datapath with arithmetic and storage arrays. Chapters 9 and 10 raise the level of abstraction from gates to instructions and microarchitectures, Chapter 11 connects the processor to memory and to peripherals, and Chapter 12 closes the loop by showing how a biomedical instrument is literally assembled out of the pieces that earlier chapters described. The enduring lesson of the subject is that digital design is reductionist in the most productive sense: if each layer respects its contract with the layer below, a device as complex as a modern patient monitor really can be understood, built, and verified.