ECE 454: Distributed Computing
Tahsin Reza
Estimated study time: 4 minutes
Sources and References
Equivalent UW courses — CS 454/654 (Distributed Systems, single cross-listed course), CS 451 (Data-intensive Distributed Computing)
Primary textbook — van Steen, Maarten, and Andrew S. Tanenbaum. Distributed Systems. 4th ed., 2023. Available at distributed-systems.net.
Supplementary references — Apache Spark, Apache Thrift, Apache ZooKeeper, Apache Kafka, and MPI documentation.
Equivalent UW Courses
ECE 454 overlaps heavily with CS 454/654, which is a single cross-listed undergrad/grad course in Distributed Systems sharing the same lectures, instructor, and exams; the calendar antirequisite explicitly forbids taking both. Both use the van Steen and Tanenbaum textbook and cover architectures, RPC, consistency, replication, and consensus. CS 451 is the nearest CS analogue on the data-processing side: it targets data-intensive frameworks such as Spark and MapReduce for analytics workloads and overlaps with ECE 454 on Spark but not on coordination, consensus, or RPC internals. An ECE 454 student has effectively done a hybrid of CS 454/654 and a slice of CS 451.
What This Course Adds Beyond the Equivalents
Compared to CS 454/654, ECE 454 leans more hands-on: it builds skills in three concrete Apache frameworks (Spark for data parallelism, Thrift for cross-language RPC, ZooKeeper for coordination) and exposes students to MPI and HPC-cluster programming, which CS-stream distributed courses rarely touch. It also spends a dedicated segment on parallel and distributed deep learning, reflecting the ECE department’s compute-systems focus.
What it omits relative to CS 454/654 is a portion of the formal-model treatment — failure detectors, causal delivery proofs, and the theoretical framing of impossibility results get lighter coverage than in the CS grad treatment. Relative to CS 451, it omits much of the MapReduce lineage, Pig/Hive-style query layers, and data-warehouse-flavored analytics.
Topic Summary
Distributed software architectures
Client-server, multi-tier, peer-to-peer, and service-oriented architectures; middleware as the glue layer. Van Steen and Tanenbaum chapters 1-2 provide the reference framing.
Processes, networking, and RPC
Review of processes, threads, and virtualization; refresher on transport-layer protocols as they appear in distributed middleware; remote procedure call as a core abstraction. Apache Thrift is the hands-on RPC framework used for client-server communication with cross-language serialization.
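The stub-and-dispatch pattern that Thrift generates from an IDL can be sketched in plain Python. This is a minimal illustration, not the Thrift API: the class and function names are hypothetical, JSON stands in for Thrift's binary serialization, and a direct function call stands in for the socket transport.

```python
import json

# Server side: a handler plus a dispatch routine that unmarshals a call.
class CalculatorHandler:
    def add(self, a, b):
        return a + b

def serve(handler, request_bytes):
    # Unmarshal method name and arguments, invoke, marshal the result.
    req = json.loads(request_bytes)
    method = getattr(handler, req["method"])
    return json.dumps({"result": method(*req["args"])}).encode()

# Client side: a stub that marshals the call. A real framework generates
# this from the IDL and sends the bytes over a socket instead.
def call(method, *args, transport=None):
    request = json.dumps({"method": method, "args": args}).encode()
    response = transport(request)      # stands in for the network round trip
    return json.loads(response)["result"]

handler = CalculatorHandler()
result = call("add", 2, 3, transport=lambda req: serve(handler, req))
print(result)  # 5
```

The point of the abstraction is that the client code reads like a local call; everything between the stub and the dispatch loop is the middleware's job.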
MPI and HPC
Message Passing Interface as the dominant programming model for scientific / HPC cluster computing. Performance evaluation metrics (speedup, efficiency, scalability) are introduced here and used for the rest of the course.
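The performance metrics named above have standard definitions that can be written down directly; the numbers below are purely illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Efficiency E(p) = S(p) / p; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / p

def amdahl(serial_fraction, p):
    """Amdahl's law: upper bound on speedup when a fraction f is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# A job taking 100 s on one node and 30 s on 4 nodes:
print(speedup(100, 30))        # ~3.33
print(efficiency(100, 30, 4))  # ~0.83
print(amdahl(0.1, 4))          # ~3.08: even 10% serial work caps speedup
```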
Apache Spark
Cluster computing with data parallelism and fault tolerance through resilient distributed datasets. Programming model, transformations and actions, DAG scheduling, and lineage-based recovery. This is the course’s main data-intensive framework and the most direct overlap with CS 451.
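Spark's lazy transformations and lineage-based recovery can be illustrated with a toy stand-in, assuming a deliberately simplified single-partition model (the `RDD` class below is hypothetical, not the PySpark API):

```python
class RDD:
    """Toy RDD: records its lineage (parent + function), evaluates lazily."""
    def __init__(self, data=None, parent=None, fn=None):
        self._data, self._parent, self._fn = data, parent, fn

    # Transformations are lazy: they only append a lineage node.
    def map(self, f):
        return RDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return RDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    # Actions force evaluation by replaying the lineage chain.
    def collect(self):
        if self._parent is None:
            return list(self._data)
        # If a partition were lost, replaying this same chain recomputes
        # it from the source: lineage-based recovery in miniature.
        return self._fn(self._parent.collect())

squares = RDD(data=range(6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(squares.collect())  # [0, 4, 16]
```

Nothing runs until `collect()`, which is what lets Spark build a DAG of the whole computation and schedule it as a unit.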
Coordination, leader election, and mutual exclusion
Classical distributed coordination problems and their solutions. Apache ZooKeeper is presented as a reliable distributed coordination service used for configuration, synchronization, and group membership. Leader election and distributed locks are worked examples.
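The standard ZooKeeper leader-election recipe, sketched with an in-memory stand-in (the `FakeZk` class is hypothetical, not the ZooKeeper client API): each candidate creates a sequential ephemeral znode, the lowest sequence number is the leader, and each node watches only its immediate predecessor to avoid a thundering herd when the leader fails.

```python
import itertools

class FakeZk:
    """In-memory stand-in for a ZooKeeper election path."""
    def __init__(self):
        self._seq = itertools.count()
        self.znodes = {}                    # znode name -> client id

    def create_sequential(self, client):
        name = f"candidate-{next(self._seq):010d}"
        self.znodes[name] = client
        return name

    def leader(self):
        return self.znodes[min(self.znodes)]  # lowest sequence number wins

    def watch_target(self, name):
        """The znode a client should watch: its immediate predecessor."""
        lower = sorted(n for n in self.znodes if n < name)
        return lower[-1] if lower else None

zk = FakeZk()
a, b, c = (zk.create_sequential(cid) for cid in ("A", "B", "C"))
print(zk.leader())          # A
print(zk.watch_target(c))   # C watches B's znode, not the leader's
del zk.znodes[a]            # leader's ephemeral znode vanishes on failure
print(zk.leader())          # B takes over
```

The same pattern, with znodes as lock holders, gives distributed locks.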
Consistency, replication, and the CAP theorem
Consistency models from strong / linearizable to eventual; primary-backup and quorum-based replication; the CAP trade-off between consistency, availability, and partition tolerance. Van Steen and Tanenbaum chapters 7-8 are the reference.
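The quorum-intersection condition behind strong quorum replication is compact enough to state as code; this is a sketch of the standard rule, not any particular system's configuration check.

```python
def quorum_is_strong(n, w, r):
    """Quorum replication over N replicas: a read sees the latest write
    iff R + W > N (read and write sets must intersect), and W > N/2
    prevents two conflicting writes from both reaching a quorum."""
    return (r + w > n) and (w > n / 2)

# N = 5 replicas:
print(quorum_is_strong(5, 3, 3))  # True:  majority reads and writes
print(quorum_is_strong(5, 1, 1))  # False: fast, but only eventually consistent
```

Dialing R and W below the threshold is exactly the CAP-flavored trade: latency and availability at the price of consistency.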
Consistent hashing and distributed hash tables
Chord / Dynamo-style partitioning, key lookup, and how DHTs allow peer-to-peer systems to scale. Connects back to replication and membership.
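A minimal consistent-hash ring with virtual nodes, as a sketch of the partitioning scheme (node names and vnode count are illustrative, and MD5 stands in for whatever hash a real system uses):

```python
import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent hash ring with virtual nodes (Chord/Dynamo-style)."""
    def __init__(self, nodes, vnodes=64):
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._points = [h for h, _ in self._ring]

    def lookup(self, key):
        # The first point clockwise from the key's hash owns the key.
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))
# Removing a node only moves the keys that node owned; everything else
# stays put, which is what lets DHT peers join and leave cheaply.
smaller = HashRing(["node-a", "node-b"])
```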
Commit protocols, fault tolerance, and Kafka
Two-phase and three-phase commit; checkpointing and fault recovery. Apache Kafka is used as the streaming / log-based messaging system that underlies many modern fault-tolerant pipelines.
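The two-phase commit control flow, sketched with in-process objects standing in for real resource managers (class and function names are illustrative): phase 1 collects votes, phase 2 broadcasts the decision, and a single "no" vote aborts everyone.

```python
class Participant:
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit, self.state = name, will_commit, "init"

    def prepare(self):                 # phase 1: vote yes/no
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def finish(self, commit):          # phase 2: apply the decision
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1
    decision = all(votes)                         # unanimous yes required
    for p in participants:                        # phase 2
        p.finish(decision)
    return decision

ok = two_phase_commit([Participant("db"), Participant("queue")])
bad = two_phase_commit([Participant("db"),
                        Participant("queue", will_commit=False)])
print(ok, bad)  # True False
```

The sketch also shows 2PC's weakness: between the phases, a prepared participant is blocked waiting on the coordinator, which is what three-phase commit and consensus-based commit try to fix.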
Consensus: Paxos and Raft
Distributed agreement and state-machine replication. Both Paxos and Raft are presented, with Raft receiving the more operational treatment. This is where the course meets the theory of distributed systems head-on.
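Raft's core election rule can be sketched in a few lines, heavily simplified (no log-up-to-date check, no timeouts, hypothetical names throughout): a node grants at most one vote per term, and a candidate wins with votes from a majority of the cluster.

```python
class Node:
    def __init__(self):
        self.term, self.voted_for = 0, None

    def request_vote(self, term, candidate):
        if term > self.term:                  # newer term: forget old vote
            self.term, self.voted_for = term, None
        if term == self.term and self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False

def run_election(candidate_id, candidate, peers, term):
    candidate.term, candidate.voted_for = term, candidate_id
    votes = 1 + sum(p.request_vote(term, candidate_id) for p in peers)
    return votes > (len(peers) + 1) // 2      # majority of the cluster

nodes = {i: Node() for i in range(5)}
won = run_election(0, nodes[0], [nodes[i] for i in (1, 2, 3, 4)], term=1)
print(won)    # True: all four peers grant their term-1 vote
# A rival in the same term cannot win: the votes are already spent.
rival = run_election(1, nodes[1], [nodes[i] for i in (0, 2, 3, 4)], term=1)
print(rival)  # False
```

The one-vote-per-term rule is the whole safety argument in miniature: two leaders in the same term would each need a majority, and majorities intersect.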
Logical time and event ordering
Lamport’s clocks, vector clocks, and happens-before. Provides the formal vocabulary for reasoning about causality across non-synchronized nodes.
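Lamport's clock rule is short enough to state directly: increment on every local event and send, and on receive take the maximum of the local clock and the message timestamp, plus one. A minimal sketch:

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                    # local event or send
        self.time += 1
        return self.time

    def receive(self, msg_time):       # merge the sender's timestamp
        self.time = max(self.time, msg_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
t_send = p.tick()           # P sends a message stamped 1
q.tick(); q.tick()          # Q has two local events: 1, 2
t_recv = q.receive(t_send)  # Q receives: max(2, 1) + 1 = 3
print(t_send, t_recv)       # 1 3
```

This guarantees that if event a happens-before event b, then clock(a) < clock(b); the converse does not hold, which is the gap vector clocks close.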
Parallel and distributed deep learning
Closing module on data and model parallelism for neural network training, parameter servers, and all-reduce-style synchronization. Not covered in CS 454/654 and a signature ECE addition.
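The all-reduce step at the heart of data-parallel training can be sketched with plain lists standing in for gradient tensors (the function name and numbers are illustrative; real systems use MPI_Allreduce or NCCL over the network):

```python
def all_reduce_mean(vectors):
    """Element-wise sum across workers, divided by the worker count —
    the reduction an all-reduce performs each training step."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Three workers each compute a gradient over their own data shard:
grads = [[0.9, -0.3], [1.1, -0.1], [1.0, -0.2]]
avg = all_reduce_mean(grads)          # ~[1.0, -0.2]

# Every replica applies the same averaged gradient, keeping weights in sync.
lr, weights = 0.1, [0.5, 0.5]
weights = [w - lr * g for w, g in zip(weights, avg)]
print(weights)                        # ~[0.4, 0.52]
```

Model parallelism and parameter servers change *where* this reduction happens, but the synchronization problem is the same.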