CS 459: Privacy, Cryptography, Network and Data Security

Daniel Vogel

Estimated study time: 2 hr 48 min

Sources and References

Primary references — Dan Boneh and Victor Shoup, A Graduate Course in Applied Cryptography (freely available at toc.cryptobook.us); Cynthia Dwork and Aaron Roth, The Algorithmic Foundations of Differential Privacy, Foundations and Trends in Theoretical Computer Science, 2014.

Supplementary texts — Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography (freely available at cacr.math.uwaterloo.ca/hac); Ross Anderson, Security Engineering, 3rd ed., Wiley, 2020; Arvind Narayanan et al., A Primer on Data Science and Privacy (Princeton).

Online resources — Cloudflare blog on cryptography engineering; Google Project Zero blog; Tor Project design documents; NIST Cybersecurity Framework; Latanya Sweeney’s k-anonymity and re-identification work; Cynthia Dwork’s differential privacy original paper (ICALP 2006).


Chapter 1: Cryptographic Foundations

Cryptography is the mathematical backbone of modern security. Before examining specific protocols, it helps to state precisely what security goals cryptography pursues and what assumptions underlie its guarantees.

Security Goals and Threat Models

The classical security goals are confidentiality (only intended parties can read plaintext), integrity (data has not been altered in transit or at rest), authenticity (the claimed sender is the true sender), and non-repudiation (a sender cannot later deny having sent a message). Real systems often need all four simultaneously.

Kerckhoffs’ principle states that a cryptosystem should be secure even if everything about the system, except the key, is public knowledge. Equivalently, security must never depend on secrecy of the algorithm — only secrecy of the key. This principle underpins modern cryptographic design: algorithms such as AES and RSA are fully public, yet remain secure because breaking them requires knowing the key.

A threat model formalises what an adversary can do: observe ciphertexts (ciphertext-only attack), choose plaintexts and observe their encryptions (chosen-plaintext attack, CPA), or adaptively choose ciphertexts and observe decryptions (chosen-ciphertext attack, CCA). IND-CCA2 — indistinguishability under adaptive chosen-ciphertext attack — is the gold standard for encryption schemes.

Symmetric Encryption

Definition (Symmetric encryption scheme): A symmetric encryption scheme is a triple of algorithms \(\mathsf{(Gen, Enc, Dec)}\) where: Gen outputs a key \(k\) uniformly from a key space \(\mathcal{K}\); Enc takes \((k, m)\) and outputs a ciphertext \(c \leftarrow \mathsf{Enc}_k(m)\); Dec takes \((k, c)\) and outputs \(m' = \mathsf{Dec}_k(c)\). Correctness requires \(\mathsf{Dec}_k(\mathsf{Enc}_k(m)) = m\) for all \(k, m\). Security is defined by the adversary's inability to distinguish encryptions of chosen plaintexts.

Symmetric encryption uses the same key for encryption and decryption. The one-time pad achieves information-theoretic security: \(c = m \oplus k\) with a uniformly random key as long as the message. It is perfectly secure but impractical at scale because the key is as long as the message and must never be reused.
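The XOR mechanics of the one-time pad can be sketched in a few lines (illustrative only; `secrets.token_bytes` supplies the uniformly random key):

```python
import secrets

def otp_encrypt(key: bytes, msg: bytes) -> bytes:
    # One-time pad: XOR each message byte with the corresponding key byte.
    assert len(key) == len(msg), "key must be exactly as long as the message"
    return bytes(k ^ m for k, m in zip(key, msg))

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))   # uniformly random, message-length key
ct = otp_encrypt(key, msg)
assert otp_encrypt(key, ct) == msg    # XOR is its own inverse
```

Decryption is the same XOR, which is why a reused key leaks the XOR of two plaintexts.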

Stream ciphers approximate the one-time pad using a pseudorandom keystream generated from a short key and nonce. AES-CTR mode turns a block cipher into a stream cipher by encrypting successive counter values and XORing with plaintext.
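The CTR construction can be sketched with a generic PRF; here HMAC-SHA256 stands in for AES purely for illustration (a real implementation would use AES from a crypto library):

```python
import hmac, hashlib

def ctr_keystream(key: bytes, nonce: bytes, n_bytes: int) -> bytes:
    # CTR mode: apply a PRF to nonce||counter and concatenate the outputs.
    # HMAC-SHA256 is a stand-in PRF for AES here.
    out, counter = b"", 0
    while len(out) < n_bytes:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:n_bytes]

def ctr_encrypt(key: bytes, nonce: bytes, msg: bytes) -> bytes:
    ks = ctr_keystream(key, nonce, len(msg))
    return bytes(a ^ b for a, b in zip(msg, ks))

key, nonce = b"k" * 16, b"n" * 12
ct = ctr_encrypt(key, nonce, b"Hello, World!")
assert ctr_encrypt(key, nonce, ct) == b"Hello, World!"  # decrypt = re-encrypt
```

Note that encryption and decryption are the same operation, and the ciphertext is exactly as long as the plaintext — no padding.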

Definition (Block cipher): A block cipher with key length \(\lambda\) and block length \(n\) is a keyed permutation family \(E: \{0,1\}^\lambda \times \{0,1\}^n \to \{0,1\}^n\) such that for every key \(k\), the map \(E_k(\cdot)\) is a bijection on \(\{0,1\}^n\) with an efficiently computable inverse \(E_k^{-1}(\cdot)\). Security requires that \(E_k\) is computationally indistinguishable from a uniformly random permutation when \(k\) is secret — this is the pseudorandom permutation (PRP) assumption.

Block ciphers operate on fixed-size blocks. AES (Advanced Encryption Standard) operates on 128-bit blocks with 128-, 192-, or 256-bit keys through a substitution-permutation network. The mode of operation determines how a block cipher encrypts messages longer than one block:

Mode | Confidentiality            | Parallelisable | Notes
ECB  | Broken                     | Yes            | Identical blocks → identical ciphertexts; never use
CBC  | Secure (with random IV)    | Decrypt only   | Padding oracle attacks if unauthenticated
CTR  | Secure (with unique nonce) | Yes            | Stream cipher from block cipher
GCM  | Secure + authenticated     | Yes            | AEAD; nonce reuse is catastrophic

ECB mode is famously insecure: encrypting the same plaintext block always produces the same ciphertext block, leaking patterns (the “ECB penguin” illustration). GCM is the preferred mode for new systems.

Example (AES-128-GCM encrypting "Hello, World!"): Consider encrypting the 13-byte string "Hello, World!" under AES-128-GCM. The key \(K\) is 128 bits: for concreteness, \(K = \texttt{2b7e151628aed2a6abf7158809cf4f3c}\) (the standard NIST AES-128 test key, 16 bytes). A fresh random nonce \(N\) of 12 bytes is generated per encryption — for example, \(N = \texttt{000000000000000000000000}\) in a test vector. The plaintext "Hello, World!" is 13 bytes; no block-level padding is needed in GCM (it is a stream mode).

AES-GCM proceeds in two conceptual steps. First, confidentiality via CTR mode: the nonce is combined with a counter to form successive 128-bit blocks, each encrypted with AES to produce a keystream; the keystream is XOR’d with the plaintext to produce ciphertext \(C\) of the same length (13 bytes). Second, integrity via GHASH: the ciphertext (and any additional authenticated data, such as a packet header) is processed by GHASH, a polynomial hash over GF(\(2^{128}\)), to produce a 128-bit authentication tag \(T\). The recipient verifies \(T\) before decrypting; if \(T\) is wrong, the ciphertext is rejected without any plaintext being exposed.

The output of the encryption is the pair \((C, T)\): the ciphertext (same length as plaintext — 13 bytes) plus the tag (always 16 bytes). Contrast with AES-CBC: CBC adds PKCS#7 padding, producing 16 bytes of ciphertext for a 13-byte message (padding to the block boundary), and produces no tag — a separate HMAC must be appended manually, and it must be verified before decryption to avoid padding oracle attacks. MAC-then-encrypt with unauthenticated CBC padding was the root cause of POODLE (CBC padding oracle over SSLv3, 2014) and Lucky13 (CBC timing oracle on TLS, 2013); BEAST (2011) separately exploited TLS 1.0's predictable CBC IVs. AES-GCM eliminates this class of vulnerabilities by construction.
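The PKCS#7 padding that CBC requires (and GCM avoids) is simple to state in code; a minimal sketch:

```python
def pkcs7_pad(msg: bytes, block: int = 16) -> bytes:
    # Append n bytes, each of value n, where n brings the length up to a
    # multiple of the block size (a full extra block if already aligned).
    n = block - (len(msg) % block)
    return msg + bytes([n]) * n

def pkcs7_unpad(padded: bytes) -> bytes:
    n = padded[-1]
    if n == 0 or n > len(padded) or padded[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")  # the error a padding oracle leaks
    return padded[:-n]

padded = pkcs7_pad(b"Hello, World!")       # 13 bytes -> 16 bytes
assert padded.endswith(b"\x03\x03\x03")    # three padding bytes of value 3
assert pkcs7_unpad(padded) == b"Hello, World!"
```

The `ValueError` branch is exactly the observable behaviour that padding oracle attacks exploit when it is distinguishable from a MAC failure.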

Remark (Authenticated encryption and AEAD): The modern cryptographic consensus is that encryption without authentication is nearly always wrong. Using a cipher that provides only confidentiality (e.g., AES-CBC without a MAC) exposes implementations to chosen-ciphertext attacks that can recover plaintext without ever breaking the underlying cipher — all that is needed is a decryption oracle that tells the attacker whether decryption "succeeded." The classic demonstration is Bleichenbacher's 1998 attack on PKCS#1 v1.5 RSA encryption in TLS: by sending approximately one million specially crafted ciphertexts to a TLS server and observing whether each one produced a particular error response, an attacker could recover the plaintext of any intercepted RSA-encrypted message. The server was acting as an unwitting decryption oracle. No quantum computer or brute force was needed — just adaptive querying.

Two decades later, the ROBOT vulnerability (Return Of Bleichenbacher’s Oracle Threat, 2018) found that 27% of the HTTPS servers for the top 100 websites were still vulnerable to variations of the same attack, despite decades of warnings. The correct response is to use AEAD: AES-GCM and ChaCha20-Poly1305 are the two mandated cipher suites in TLS 1.3 — all others were removed. Neither can be attacked in this way because any modification to the ciphertext causes tag verification to fail before any decryption occurs.

Example (AES-128 block cipher in action): AES-128 takes a 128-bit key \(K\) and a 128-bit plaintext block \(P\), and produces a 128-bit ciphertext block \(C\) through 10 rounds of four operations: SubBytes (a nonlinear S-box substitution), ShiftRows (cyclic row shifts), MixColumns (mixing within columns over GF\((2^8)\)), and AddRoundKey (XOR with the round key derived from \(K\)).

Concretely: encrypt the string “Hello World!\x04\x04\x04\x04” (the 12-byte message padded to exactly 16 bytes with PKCS#7) under the all-zero key \(K = 0\text{x}00000000\ldots00\). The output is a single 16-byte ciphertext block with no visible structure — indistinguishable from random bytes — even though the plaintext is entirely ASCII. This pseudo-random appearance holds regardless of how structured the plaintext is: AES’s design ensures that changing even one bit of \(P\) or \(K\) avalanches through all output bits. The strict avalanche criterion means that, on average, half of the output bits flip when one input bit changes.

Remark (IND-CPA vs. IND-CCA2): IND-CPA (indistinguishability under chosen-plaintext attack, also called semantic security) is the minimum acceptable security goal for confidentiality. An encryption scheme is IND-CPA secure if no efficient adversary who can query the encryption oracle can distinguish the encryption of \(m_0\) from the encryption of \(m_1\). However, many real-world attacks exploit the ability to decrypt chosen ciphertexts — padding oracle attacks on CBC, Bleichenbacher's attack on PKCS#1 v1.5, and BEAST all fit this pattern. IND-CCA2 (adaptive chosen-ciphertext security) closes these gaps by requiring that the adversary cannot gain advantage even with access to a decryption oracle for any ciphertext other than the challenge.

A concrete illustration of why IND-CPA alone is insufficient: suppose encryption is deterministic (the same plaintext always produces the same ciphertext). Eve observes a challenge ciphertext \(c^*\). She guesses that the plaintext is \(m'\), queries the encryption oracle for \(\mathsf{Enc}(m')\), and compares to \(c^*\). If they match, she has identified the plaintext with certainty. IND-CPA security prevents this only if encryption is randomized — two encryptions of the same message look different. Schemes like AES-CBC with a random IV and AES-GCM with a random nonce achieve this; textbook RSA and AES-ECB do not.
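A minimal sketch of this attack, using a toy deterministic "cipher" (SHA-256 over key-plus-message — not decryptable, but deterministic exactly as the argument requires):

```python
import hashlib

def det_enc(key: bytes, msg: bytes) -> bytes:
    # Toy *deterministic* encryption stand-in: same (key, msg) -> same output.
    # (Not decryptable; only the distinguishing attack is being illustrated.)
    return hashlib.sha256(key + msg).digest()

key = b"super secret key"           # unknown to Eve
oracle = lambda m: det_enc(key, m)  # Eve's CPA encryption-oracle access

m0, m1 = b"attack at dawn", b"attack at dusk"
challenge = oracle(m0)              # Eve intercepts this ciphertext

# Eve queries the oracle on both candidates and compares:
guess = m0 if oracle(m0) == challenge else m1
assert guess == m0                  # plaintext identified with certainty
```

With randomized encryption the two oracle queries would return fresh ciphertexts that never match the challenge, and the comparison tells Eve nothing.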

Example (Stream cipher key-reuse disaster: RC4 and WEP): RC4 is a stream cipher once widely used in WEP (Wi-Fi's original encryption protocol, 1999–2004). A stream cipher produces a pseudorandom keystream \(z\) from a key \(k\), and encrypts by \(c = m \oplus z\). If two messages \(m_1\) and \(m_2\) are encrypted with the same key (i.e., the same keystream \(z\)), an attacker who intercepts both ciphertexts can XOR them together: \[ c_1 \oplus c_2 = (m_1 \oplus z) \oplus (m_2 \oplus z) = m_1 \oplus m_2. \] The keystream cancels entirely, leaving only the XOR of the two plaintexts. Since natural-language text and network protocols have known structure, \(m_1 \oplus m_2\) is sufficient to recover both messages using crib-dragging techniques. WEP used 24-bit IVs prepended to the secret key to form the RC4 key. With only \(2^{24} \approx 16\) million possible IVs and active Wi-Fi traffic, IV collisions (and hence keystream reuse) occur after approximately \(\sqrt{2^{24}} \approx 4{,}000\) packets by the birthday paradox. In practice, active attackers could force enough traffic that WEP networks were cracked in under a minute. This is why "the nonce must be unique" is a hard cryptographic requirement — not a recommendation.
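The keystream cancellation is easy to verify; here SHAKE-128 serves as a stand-in keystream generator (RC4 itself is obsolete):

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Variable-length pseudorandom keystream (SHAKE-128 as the generator).
    return hashlib.shake_128(key).digest(n)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m1 = b"transfer $100 to alice"
m2 = b"transfer $999 to mallo"
z = keystream(b"reused key", len(m1))   # the SAME keystream for both messages
c1, c2 = xor(m1, z), xor(m2, z)

# The eavesdropper XORs the two ciphertexts: the keystream cancels entirely.
assert xor(c1, c2) == xor(m1, m2)
# Knowing (or crib-guessing) m1 now reveals m2 exactly:
assert xor(xor(c1, c2), m1) == m2
```

This is the precise sense in which two-time-pad use reduces to breaking \(m_1 \oplus m_2\), which crib-dragging does routinely for structured plaintexts.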

Authenticated Encryption and the Cryptographic Doom Principle

Encryption alone provides confidentiality but not integrity. An attacker can flip ciphertext bits, causing predictable changes in decrypted plaintext. Authenticated Encryption with Associated Data (AEAD) combines confidentiality and integrity in a single primitive.

The Cryptographic Doom Principle, articulated by Moxie Marlinspike, states: if you have to perform any cryptographic operation before verifying the MAC on a message you’ve received, it will somehow be used to destroy you. Concretely: always verify the MAC before decrypting. Systems that decrypt first and authenticate second are vulnerable to padding oracle and CBC timing attacks (e.g., the POODLE and Lucky13 attacks on TLS). The correct composition is Encrypt-then-MAC.

AES-GCM authenticates both the ciphertext and associated data (e.g., packet headers) using GHASH over GF(\(2^{128}\)). ChaCha20-Poly1305 is a software-friendly alternative resistant to timing attacks, used in TLS 1.3 and WireGuard.

Pseudorandom Functions and Generators

Definition (Pseudorandom function, PRF): A function family \(F: \mathcal{K} \times \mathcal{X} \to \mathcal{Y}\) is a pseudorandom function (PRF) if no efficient adversary can distinguish oracle access to \(F_k(\cdot)\) (for a uniformly random key \(k \in \mathcal{K}\)) from oracle access to a truly random function \(f: \mathcal{X} \to \mathcal{Y}\). Formally, for all efficient distinguishers \(\mathcal{A}\): \[ \left|\Pr_{k \leftarrow \mathcal{K}}\!\left[\mathcal{A}^{F_k(\cdot)} = 1\right] - \Pr_{f \leftarrow \mathsf{RF}}\!\left[\mathcal{A}^{f(\cdot)} = 1\right]\right| \leq \mathsf{negl}(\lambda). \] AES with a fixed key acts as a PRF on 128-bit inputs. PRFs are the central building block for MACs, stream ciphers (via CTR mode), and key derivation functions.
Definition (Pseudorandom generator, PRG): A deterministic function \(G: \{0,1\}^\lambda \to \{0,1\}^{\ell(\lambda)}\) with \(\ell(\lambda) > \lambda\) is a pseudorandom generator (PRG) if its output on a uniformly random seed is computationally indistinguishable from a uniform string of length \(\ell(\lambda)\). PRGs are the basis of stream ciphers: generate a long pseudorandom keystream \(G(k)\) from a short key \(k\), then XOR with the plaintext.

The existence of PRGs (and PRFs) cannot be proven unconditionally — it would imply \(P \neq NP\) — so they rest on computational hardness assumptions (e.g., the hardness of factoring or discrete log). The connections among PRGs, PRFs, and block ciphers form the core of provable symmetric-key security.

Hash Functions

A cryptographic hash function \(H: \{0,1\}^* \to \{0,1\}^n\) must satisfy:

  • Collision resistance: computationally infeasible to find \(x \neq x'\) with \(H(x) = H(x')\)
  • Preimage resistance: given \(y\), computationally infeasible to find \(x\) with \(H(x) = y\)
  • Second-preimage resistance: given \(x\), computationally infeasible to find \(x' \neq x\) with \(H(x') = H(x)\)

SHA-1 (160-bit output) was broken in 2017 by the SHAttered attack (Stevens et al.), which produced two PDF files with the same SHA-1 hash using approximately \(2^{63.1}\) operations — far below the generic \(2^{80}\) birthday bound. The practical implication: TLS certificates signed with SHA-1 could be forged, which is why all major CAs had already stopped issuing SHA-1 certificates by 2016, ahead of the public break. SHA-2 (SHA-256, SHA-512) and SHA-3 (Keccak, sponge construction) are currently secure.

Example (Hash functions in practice): The SHA-256 hash of the ASCII string "Hello" is the 256-bit value 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969 — 64 hexadecimal characters, each representing 4 bits. This compact digest has three security properties: (1) preimage resistance — given only the digest 185f…, there is no efficient algorithm to find any string that hashes to it; (2) second-preimage resistance — given "Hello" itself, one cannot efficiently find a different string "Hello'" such that SHA-256("Hello'") equals SHA-256("Hello"); (3) collision resistance — one cannot efficiently find any pair of distinct strings \(x \neq x'\) such that SHA-256(\(x\)) = SHA-256(\(x'\)).
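Both the digest and the avalanche behaviour can be checked directly with Python's hashlib:

```python
import hashlib

digest = hashlib.sha256(b"Hello").hexdigest()
print(digest)                      # 64 hex characters = 256 bits

# A one-character change ("Hello" -> "hello") avalanches through the digest:
digest2 = hashlib.sha256(b"hello").hexdigest()
diff_bits = bin(int(digest, 16) ^ int(digest2, 16)).count("1")
print(diff_bits)                   # roughly half of the 256 bits differ
```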

The current hash landscape: MD5 (128-bit) has had practical collision attacks since 2004 (Wang and Yu); collisions can now be computed on a laptop in seconds using HashClash. SHA-1 (160-bit) fell to collisions in 2017 (SHAttered). Both are completely broken for integrity purposes, though MD5 is still seen in checksums where collision resistance is not required — e.g., a publisher checksumming its own file, since a collision attack requires the adversary to choose both inputs and second preimages remain infeasible. Only SHA-2 (SHA-256, SHA-512) and SHA-3 (Keccak, sponge construction) are currently considered secure for all three properties. NIST standardised SHA-3 in 2015 as a structural backup in case weaknesses were discovered in SHA-2’s Merkle-Damgård construction.

Theorem (Birthday paradox collision bound): For a hash function with \(2^n\) possible output values, the expected number of evaluations before a collision is found is \(\Theta(2^{n/2})\). More precisely, after \(k\) uniformly random hash evaluations, the probability that some two outputs collide is approximately \(1 - e^{-k^2 / (2 \cdot 2^n)}\). Setting this probability to \(1/2\) gives \(k \approx \sqrt{2^n \ln 2} \approx 1.18 \cdot 2^{n/2}\).
Proof sketch. The complementary probability — that all \(k\) outputs are distinct — is \[ \Pr[\text{no collision}] = 1 \cdot \left(1 - \frac{1}{2^n}\right) \cdot \left(1 - \frac{2}{2^n}\right) \cdots \left(1 - \frac{k-1}{2^n}\right). \] Taking logarithms and using \(\ln(1-x) \approx -x\) for small \(x\): \[ \ln \Pr[\text{no collision}] \approx -\sum_{i=0}^{k-1} \frac{i}{2^n} = -\frac{k(k-1)}{2 \cdot 2^n} \approx -\frac{k^2}{2 \cdot 2^n}. \]

So \(\Pr[\text{collision}] \approx 1 - e^{-k^2 / (2^{n+1})}\). Setting this to \(1/2\) and solving gives \(k \approx \sqrt{2^n \ln 2} = 2^{n/2} \sqrt{\ln 2}\). For SHA-256 (\(n = 256\)): \(k \approx 2^{128}\) evaluations — computationally infeasible for any foreseeable adversary. For MD5 (\(n = 128\)): \(k \approx 2^{64}\) evaluations — feasible with GPU clusters, and in practice the algebraic structure of MD5 makes collisions far cheaper than this generic bound.
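The \(2^{n/2}\) bound is easy to observe empirically by truncating a hash; with 16-bit outputs (\(2^{16}\) values) a collision typically appears after a few hundred inputs:

```python
import hashlib

def h16(x: bytes) -> bytes:
    # Truncate SHA-256 to 16 bits: 2^16 possible outputs, so a collision is
    # expected after roughly 2^8 = 256 random inputs by the birthday bound.
    return hashlib.sha256(x).digest()[:2]

seen, collision = {}, None
for i in range(100_000):          # pigeonhole guarantees success well before this
    msg = str(i).encode()
    d = h16(msg)
    if d in seen:
        collision = (seen[d], msg)
        break
    seen[d] = msg

m_a, m_b = collision
assert m_a != m_b and h16(m_a) == h16(m_b)
print(len(seen))                  # attempts needed; typically a few hundred
```

Scaling the same experiment to the full 256-bit output would require \(\approx 2^{128}\) attempts — the bound in the theorem.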

The Merkle-Damgård construction underlies MD5, SHA-1, and SHA-2: a compression function is iterated over message blocks. A weakness is length extension attacks — given \(H(m)\) and \(|m|\), an attacker can compute \(H(m \| \text{padding} \| m')\) without knowing \(m\). SHA-3’s sponge construction avoids this.
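A toy Merkle-Damgård construction (8-byte state, simplified padding — invented for illustration, not a real hash) makes the length extension attack concrete:

```python
import hashlib

BLOCK = 16                      # toy block size in bytes
IV = b"\x00" * 8

def compress(state: bytes, block: bytes) -> bytes:
    # Toy compression function with an 8-byte chaining state.
    return hashlib.sha256(state + block).digest()[:8]

def md_pad(msg_len: int) -> bytes:
    # Simplified MD padding: 0x80, zeros, then 8-byte big-endian length.
    zeros = (BLOCK - (msg_len + 9) % BLOCK) % BLOCK
    return b"\x80" + b"\x00" * zeros + msg_len.to_bytes(8, "big")

def md_hash(msg: bytes, state: bytes = IV, prefix_len: int = 0) -> bytes:
    data = msg + md_pad(prefix_len + len(msg))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

secret = b"secret-key-material"
digest = md_hash(secret)        # attacker knows only digest and len(secret)

# Length extension: resume hashing from `digest` as the chaining state.
suffix = b"&admin=true"
glue = md_pad(len(secret))      # the padding the honest hash would insert
forged = md_hash(suffix, state=digest, prefix_len=len(secret) + len(glue))

# It matches the honest hash of secret || glue || suffix:
assert forged == md_hash(secret + glue + suffix)
```

The attacker never learns `secret`; the exposed chaining state is enough. This is why HMAC wraps the hash in a second keyed invocation, and why sponge-based SHA-3 (which hides part of its state) is immune.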

Message Authentication Codes

A MAC provides integrity and authenticity in a shared-key setting. HMAC is constructed as \(\mathrm{HMAC}(k, m) = H\big((k' \oplus \text{opad}) \,\|\, H((k' \oplus \text{ipad}) \,\|\, m)\big)\), where \(k'\) is the key padded to the hash's block length; the nested structure provably avoids length extension vulnerabilities. The birthday bound implies that a MAC with \(n\)-bit output can be forged with approximately \(2^{n/2}\) queries, so short MACs (e.g., 32-bit) are dangerous.
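The HMAC construction can be transcribed directly and checked against Python's hmac module (0x36 and 0x5C are the standard ipad/opad bytes):

```python
import hashlib, hmac

def my_hmac_sha256(key: bytes, msg: bytes) -> bytes:
    block = 64                               # SHA-256 block size in bytes
    if len(key) > block:                     # overlong keys are hashed first
        key = hashlib.sha256(key).digest()
    key = key.ljust(block, b"\x00")          # pad key to the block size
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()

tag = my_hmac_sha256(b"key", b"message")
assert tag == hmac.new(b"key", b"message", hashlib.sha256).digest()
```

The outer hash is what defeats length extension: an attacker who extends the inner hash still cannot compute the outer, keyed invocation.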

MACs require a shared key and provide no non-repudiation. Digital signatures replace the shared key with asymmetric key pairs and additionally provide non-repudiation — see Chapter 2.

Remark (Digital signatures vs. MACs): The distinction is architectural as much as cryptographic. A MAC (e.g., HMAC-SHA256) is a symmetric primitive: both the signer and the verifier must share the same secret key \(k\). Anyone who can verify a MAC can also produce one — so MACs cannot be used for third-party verification, and they provide no non-repudiation. If Alice sends a message with a MAC to Bob, and Bob shows a judge the (message, MAC) pair, Alice can truthfully claim that Bob could have forged the MAC himself.

A digital signature (RSA-PSS, ECDSA, Ed25519) is asymmetric: the signing key is private and the verification key is public. Anyone with Alice’s public key can verify that Alice signed the message, but only Alice (the sole holder of the private signing key) can produce a valid signature. This gives non-repudiation: Alice cannot later plausibly deny having signed a document whose signature verifies under her public key. This is why digital signatures underlie certificate authorities, code signing, and legal electronic signatures — while MACs are used for integrity within a session where both parties already share a secret (e.g., HTTPS record-layer integrity after TLS handshake).

Public Key Infrastructure

PKI binds public keys to identities. An X.509 certificate contains the subject’s public key, identity, validity period, and a digital signature from a Certificate Authority (CA). Browsers ship with a trust store of ~100–150 root CAs; any of them can sign certificates for any domain, which creates a systemic risk.

Certificate Transparency (CT) logs (RFC 6962) require CAs to append every issued certificate to append-only, publicly auditable logs. Browsers require a Signed Certificate Timestamp (SCT) proving the certificate appears in a CT log. This makes fraudulent certificates detectable after the fact, though not preventable.

Example (DigiNotar compromise and CA trust failure): In July 2011, DigiNotar — a Dutch CA trusted by all major browsers — was compromised by an Iranian attacker who issued over 500 fraudulent certificates, including a wildcard certificate for *.google.com. This single certificate, trusted by all browsers because DigiNotar was a root CA, allowed an attacker-controlled server to impersonate any Google service — Gmail, Drive, Search — to any user whose browser trusted DigiNotar. Because TLS certificate validation only checks that the signature is from a trusted CA, not that the CA is the one who would legitimately issue that certificate, there was no warning to users.

The attack was used operationally against Iranian citizens attempting to access Google services — the Iranian government used the fraudulent certificate to perform SSL interception (MITM) at the national ISP level, decrypting Iranian users’ traffic. The compromise was discovered when an Iranian user noticed an unusual certificate error in Chrome. Within days, Mozilla, Google, and Microsoft revoked DigiNotar’s root certificate, instantly making DigiNotar-signed certificates untrusted in all browsers — effectively destroying DigiNotar as a company. DigiNotar went bankrupt months later.

This incident directly motivated Certificate Transparency: if CT logs had existed, the fraudulent *.google.com certificate would have appeared in the public log, Google’s automated monitoring would have detected the misissued certificate within minutes, and the CA could have been required to revoke it before it was used operationally.

Remark (DNS-based Authentication of Named Entities — DANE): DANE (RFC 6698) is an alternative PKI mechanism that pins a server's TLS certificate or public key directly in DNSSEC-signed DNS records (TLSA records). A client looking up _443._tcp.example.com can retrieve the fingerprint of the expected certificate directly from DNS, bypassing the CA system entirely. This eliminates the "any CA can issue for any domain" vulnerability: only the domain owner (who controls DNS) can publish TLSA records. The limitation: DANE requires DNSSEC deployment throughout the resolution chain, and DNSSEC adoption by registrars and resolvers has been slow. Approximately 3% of global DNS zones are DNSSEC-signed; DANE over HTTPS sees negligible adoption in the browser ecosystem (though it is used in email — SMTP DANE is deployed by major providers including Google and Cloudflare for mail server certificate validation).

Chapter 2: Public Key Cryptography

Asymmetric cryptography allows two parties who have never met to exchange keys or verify signatures without a pre-shared secret, solving the fundamental key distribution problem.

RSA

RSA key generation: choose large primes \(p, q\); set \(n = pq\); choose \(e\) coprime to \(\phi(n) = (p-1)(q-1)\); compute \(d = e^{-1} \bmod \phi(n)\). The public key is \((n, e)\), private key is \(d\). Encryption: \(c = m^e \bmod n\). Decryption: \(m = c^d \bmod n\).

Textbook RSA (without padding) is insecure. A catalogue of attacks documented in Boneh’s 20-year survey includes:

Attack | Condition | Consequence
Small \(e = 3\) | \(m^3 < n\) | Take cube root over the integers
Coppersmith / small message | \(m < n^{1/e}\) | Recover \(m\) in polynomial time
Wiener’s attack | \(d < n^{0.25}\) | Recover \(d\) via continued fractions
Common modulus | Same \(n\), two \(e\)’s | Recover plaintext via extended GCD
CCA on textbook RSA | Chosen ciphertexts | Trivially malleable

OAEP (Optimal Asymmetric Encryption Padding) — RSA-OAEP — achieves IND-CCA2 security in the random oracle model by adding randomised padding before encryption.

Example (RSA key generation and toy encryption): To build intuition, consider a small RSA example (real keys are 2048+ bits; this uses tiny numbers for exposition). Choose primes \(p = 61\) and \(q = 53\). Compute the modulus \(n = pq = 3233\) and Euler's totient \(\phi(n) = (p-1)(q-1) = 60 \times 52 = 3120\). Choose public exponent \(e = 17\) (coprime to 3120 since \(\gcd(17, 3120) = 1\)). Compute the private exponent \(d\) such that \(de \equiv 1 \pmod{3120}\); using the extended Euclidean algorithm, \(d = 2753\) (verify: \(17 \times 2753 = 46801 = 15 \times 3120 + 1\)).

Public key: \((n, e) = (3233, 17)\). Private key: \(d = 2753\).

Encrypt the message \(m = 65\): \(c = m^e \bmod n = 65^{17} \bmod 3233 = 2790\). Decrypt: \(m = c^d \bmod n = 2790^{2753} \bmod 3233 = 65\). The correctness follows from Euler’s theorem: \(m^{\phi(n)} \equiv 1 \pmod{n}\), so \(m^{de} = m^{1 + k\phi(n)} = m \cdot (m^{\phi(n)})^k \equiv m \pmod{n}\).

Why is textbook RSA insecure for encryption? The scheme is deterministic: the same plaintext always produces the same ciphertext. An attacker who knows \(m\) is one of two values \(\{m_0, m_1\}\) can simply encrypt both with the public key and compare to \(c\) — breaking IND-CPA immediately. OAEP adds randomness before encryption, making this attack impossible. For signatures, the analogous weakness is that RSA is multiplicatively homomorphic: given signatures \(\sigma_1 = m_1^d\) and \(\sigma_2 = m_2^d\), one can compute \(\sigma_1 \sigma_2 = (m_1 m_2)^d \bmod n\), a valid signature on \(m_1 m_2\), without knowing \(d\).
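The whole toy example, plus the multiplicative forgery, runs in a few lines of Python (`pow` handles both modular inverse and modular exponentiation):

```python
# Toy RSA with the tiny parameters from the example above (never use such sizes).
p, q = 61, 53
n = p * q                     # 3233
phi = (p - 1) * (q - 1)       # 3120
e = 17
d = pow(e, -1, phi)           # modular inverse (Python 3.8+)
assert d == 2753

m = 65
c = pow(m, e, n)              # encrypt: m^e mod n
assert c == 2790
assert pow(c, d, n) == m      # decrypt: c^d mod n

# Multiplicative homomorphism: textbook RSA signatures can be combined.
m1, m2 = 12, 34
s1, s2 = pow(m1, d, n), pow(m2, d, n)        # two legitimate signatures
forged = (s1 * s2) % n                       # computed WITHOUT knowing d
assert forged == pow((m1 * m2) % n, d, n)    # a valid signature on m1*m2
```

Padding schemes (OAEP for encryption, PSS for signatures) destroy this algebraic structure, which is exactly their purpose.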

Example (Diffie-Hellman key exchange — step by step): Alice and Bob want to establish a shared secret over an insecure channel. They agree publicly on a large prime \(p\) and a generator \(g\) of the group \(\mathbb{Z}_p^*\). In practice, \(p\) is a 2048-bit or 3072-bit safe prime (chosen so \((p-1)/2\) is also prime), but for illustration use \(p = 23\) and \(g = 5\).

Alice chooses a private key \(a = 6\) uniformly at random and computes her public value \(A = g^a \bmod p = 5^6 \bmod 23 = 8\). Bob chooses a private key \(b = 15\) and computes \(B = g^b \bmod p = 5^{15} \bmod 23 = 19\). They exchange \(A = 8\) and \(B = 19\) publicly.

Alice computes the shared secret \(s = B^a \bmod p = 19^6 \bmod 23 = 2\). Bob computes \(s = A^b \bmod p = 8^{15} \bmod 23 = 2\). Both arrive at the same secret \(s = 2 = g^{ab} \bmod p\). An eavesdropper sees \(p = 23\), \(g = 5\), \(A = 8\), \(B = 19\) but must solve the discrete logarithm problem to find \(a\) or \(b\) — computationally infeasible for large \(p\).

Critically, vanilla DH provides no authentication: a man-in-the-middle can impersonate both Alice and Bob, establishing separate shared secrets with each. This is why DH is always combined with authentication (signatures on the key shares) in protocols like TLS — ephemeral ECDHE key shares are signed by the server’s certificate key.
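The exchange above is three `pow` calls:

```python
# Diffie-Hellman with the toy parameters above (p = 23, g = 5).
p, g = 23, 5
a, b = 6, 15                      # Alice's and Bob's private keys
A = pow(g, a, p)                  # Alice's public value
B = pow(g, b, p)                  # Bob's public value
assert (A, B) == (8, 19)

s_alice = pow(B, a, p)            # Alice computes B^a mod p
s_bob = pow(A, b, p)              # Bob computes A^b mod p
assert s_alice == s_bob == 2      # shared secret g^(ab) mod p
assert s_alice == pow(g, a * b, p)
```

With a real 2048-bit prime the same three `pow` calls still run in milliseconds, while recovering `a` from `A` is the discrete logarithm problem.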

Discrete Logarithm and Diffie-Hellman

The discrete logarithm problem (DLP): given a prime-order group \(\mathbb{G}\) with generator \(g\) and element \(h = g^x\), find \(x\). This is believed hard for appropriately chosen groups.

Diffie-Hellman key exchange: Alice sends \(g^a\), Bob sends \(g^b\); both compute \(g^{ab}\). The CDH (Computational DH) assumption states computing \(g^{ab}\) from \(g^a, g^b\) is hard; the DDH (Decisional DH) assumption states \(g^{ab}\) is indistinguishable from a random group element.

ElGamal encryption: to encrypt \(m\) under public key \(h = g^x\), choose random \(r\), send \((g^r, m \cdot h^r)\). Decryption: \(m = c_2 / c_1^x\). ElGamal is IND-CPA secure under DDH.

For finite fields \(\mathbb{F}_p^*\), index-calculus methods — culminating in the number field sieve — achieve subexponential \(\exp\!\big(O((\log p)^{1/3} (\log \log p)^{2/3})\big)\) complexity, which is why 2048-bit primes are needed for 112-bit security. No analogous subexponential algorithm is known for well-chosen elliptic curves.

Elliptic Curve Cryptography

An elliptic curve over \(\mathbb{F}_p\) is the set of points satisfying \(y^2 = x^3 + ax + b\). The group law (point addition) is defined geometrically. ECDH performs Diffie-Hellman in this group; ECDSA and EdDSA provide signatures.

Advantages over finite-field DH: 256-bit ECC keys provide the same security as 3072-bit RSA/DH keys. Commonly used curves include NIST P-256 (widely deployed, but concerns about NIST’s curve constants), and Curve25519 / Ed25519 (designed by Bernstein with fully transparent parameters, faster in software, constant-time implementations available).

Digital Signatures

A signature scheme consists of key generation, signing, and verification. Properties: existential unforgeability under chosen-message attack (EUF-CMA). Key schemes:

  • RSA-PSS: randomised padding gives provable security; preferred over PKCS#1 v1.5 for signatures
  • ECDSA: \(r = (k \cdot G)_x \bmod n\), \(s = k^{-1}(H(m) + r \cdot d) \bmod n\); nonce reuse catastrophe — if the same \(k\) is used for two signatures, the private key \(d\) is trivially recoverable. Sony’s PS3 used a constant “random” \(k\), allowing Fail0verflow to extract the signing key in 2010.
  • EdDSA (Ed25519): deterministic nonce derived from message and private key, eliminating nonce-reuse risk; constant-time implementation; widely adopted (SSH, TLS 1.3)
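The nonce-reuse recovery is pure modular arithmetic; this sketch simulates it modulo a small prime standing in for the curve's group order (all numbers illustrative, not a real curve):

```python
# Algebra of ECDSA nonce reuse, simulated modulo a toy prime group order.
# (r would come from the curve point k*G; the recovery needs only arithmetic.)
n = 7919                     # toy prime "group order"
d = 1234                     # victim's private key
k = 999                      # the REUSED nonce
r = 4321                     # stands in for (k*G)_x mod n

h1, h2 = 1111, 2222          # hashes of two different messages
s1 = (pow(k, -1, n) * (h1 + r * d)) % n
s2 = (pow(k, -1, n) * (h2 + r * d)) % n

# Attacker sees (r, s1), (r, s2), h1, h2 -- the repeated r betrays reuse.
# s1 - s2 = k^-1 (h1 - h2), so:
k_rec = ((h1 - h2) * pow(s1 - s2, -1, n)) % n
# s1 * k = h1 + r*d, so:
d_rec = ((s1 * k_rec - h1) * pow(r, -1, n)) % n
assert (k_rec, d_rec) == (k, d)
```

This is exactly the computation Fail0verflow performed against the PS3's constant \(k\); EdDSA's deterministic nonce derivation makes the precondition unreachable.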

Signature malleability: in some schemes (ECDSA, early Bitcoin), a valid signature \((r, s)\) has a second valid form \((r, -s \bmod n)\). Bitcoin’s transaction malleability bug exploited this.

Key Management

HKDF (HMAC-based KDF) derives multiple keying material outputs from a single secret using HMAC. It consists of an Extract step (compress entropy into a pseudorandom key) and an Expand step (derive output key material).
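A minimal HKDF sketch following the Extract/Expand structure (RFC 5869, instantiated with SHA-256):

```python
import hashlib, hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # Extract: compress input keying material into a pseudorandom key (PRK).
    return hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # Expand: stretch the PRK into `length` bytes of output keying material,
    # chaining HMAC blocks T(i) = HMAC(prk, T(i-1) || info || i).
    okm, t, counter = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([counter]), hashlib.sha256).digest()
        okm += t
        counter += 1
    return okm[:length]

prk = hkdf_extract(b"salt", b"initial shared secret")
k_enc = hkdf_expand(prk, b"encryption", 32)
k_mac = hkdf_expand(prk, b"mac", 32)
assert len(k_enc) == len(k_mac) == 32 and k_enc != k_mac
```

The `info` label is what lets one secret safely yield many independent subkeys — different labels give computationally unrelated outputs.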

Forward secrecy (perfect forward secrecy, PFS): compromise of the long-term private key does not compromise past session keys. It is achieved by using ephemeral DH/ECDH key pairs per session and discarding them after use. TLS 1.3 mandates ephemeral key exchange; in TLS 1.2 it is optional (ECDHE cipher suites). Without PFS, a passive adversary can record ciphertext traffic and decrypt it years later upon obtaining the private key.


Chapter 3: Network Security

Networks expose cryptographic systems to adversaries who can observe, intercept, and modify traffic. Securing communications requires understanding the entire stack.

The Network Security Landscape

The TCP/IP stack presents distinct attack surfaces at each layer:

Layer | Attacks
Link (Ethernet) | ARP spoofing, VLAN hopping
Network (IP) | IP spoofing, ICMP redirect, BGP hijacking
Transport (TCP/UDP) | TCP session hijacking, SYN flooding, UDP amplification
Application (TLS, HTTP) | Downgrade attacks, injection, CSRF, XSS

ARP spoofing poisons a local network’s ARP cache, redirecting traffic through the attacker (man-in-the-middle). IP spoofing forges the source IP; useful in amplification DDoS but not for two-way communication because replies go to the spoofed address.

TLS 1.3

TLS 1.3 (RFC 8446, 2018) is a substantial redesign eliminating the weaknesses accumulated in TLS 1.2. Key changes:

Removed from TLS 1.3: RSA static key exchange (no forward secrecy), static DH, RC4, 3DES, MD5 and SHA-1 in handshake, CBC mode cipher suites, renegotiation, compression.

The TLS 1.3 handshake:

  1. Client sends ClientHello with supported cipher suites and key shares (ECDHE public keys).
  2. Server selects cipher suite, responds with its ECDHE key share, certificate, and CertificateVerify (signature over transcript).
  3. Both derive the session keys using HKDF over the shared ECDHE secret and handshake transcript.
  4. All subsequent records are AEAD-protected (AES-GCM or ChaCha20-Poly1305).

0-RTT resumption allows a client to send application data in the first flight using a pre-shared key from a previous session, reducing latency to zero round trips. The security cost is replay attack vulnerability: a network attacker can replay a 0-RTT request. Servers must treat 0-RTT data as idempotent (safe to replay), or reject it for sensitive operations.

Example (TLS 1.3 handshake key derivation): The TLS 1.3 key schedule uses HKDF to derive all session keys from two secrets: (1) the ECDHE shared secret and (2) optional pre-shared key material. The derivation proceeds in four phases, each building on the previous using HKDF-Extract (combines a salt and input key material into a pseudorandom key) and HKDF-Expand (stretches a PRK into keying material):

Phase 1 — Early Secret: \(\text{ES} = \text{HKDF-Extract}(0, \text{PSK or } 0)\). This absorbs any pre-shared key for 0-RTT; if no PSK, input is the zero string.

Phase 2 — Handshake Secret: \(\text{HS} = \text{HKDF-Extract}(\text{Derive-Secret}(\text{ES}, \ldots), g^{xy})\), where \(g^{xy}\) is the ECDHE shared secret. The client and server handshake traffic keys (for encrypting the handshake after ServerHello) are derived from HS: \(\text{client\_handshake\_traffic\_secret} = \text{Derive-Secret}(\text{HS}, \text{"c hs traffic"}, \text{transcript})\).

Phase 3 — Master Secret: \(\text{MS} = \text{HKDF-Extract}(\text{Derive-Secret}(\text{HS}, \ldots), 0)\). The application data keys are derived from MS: \(\text{client\_application\_traffic\_secret} = \text{Derive-Secret}(\text{MS}, \text{"c ap traffic"}, \text{transcript})\).

Phase 4 — Resumption Secret: Derived from MS for future 0-RTT sessions.

Each party independently computes this key schedule, so session keys are never transmitted. The entire transcript (all handshake messages) is input to each Derive-Secret call — any tampering with any handshake message changes the transcript hash, producing different session keys, and subsequent MAC verification fails. This binding of the key schedule to the transcript prevents downgrade attacks: an attacker who modifies the ClientHello cipher suite list causes the derived keys to differ, breaking the handshake.
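The four-phase schedule above can be sketched with nothing but the standard library. This is a simplified model, not a full RFC 8446 implementation: the PSK, ECDHE shared secret, and transcript values below are placeholder bytes, and SHA-256 is assumed as the cipher suite hash. The `HkdfLabel` construction follows the shape given in RFC 8446 §7.1.

```python
import hmac, hashlib

HASH = hashlib.sha256           # assumes a SHA-256-based cipher suite
HLEN = HASH().digest_size

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # HKDF-Extract: combine salt and input key material into a PRK
    return hmac.new(salt, ikm, HASH).digest()

def hkdf_expand_label(secret: bytes, label: bytes, context: bytes, length: int) -> bytes:
    # HKDF-Expand with the TLS 1.3 HkdfLabel framing ("tls13 " prefix)
    full_label = b"tls13 " + label
    hkdf_label = (length.to_bytes(2, "big")
                  + bytes([len(full_label)]) + full_label
                  + bytes([len(context)]) + context)
    out, t, i = b"", b"", 1
    while len(out) < length:
        t = hmac.new(secret, t + hkdf_label + bytes([i]), HASH).digest()
        out += t
        i += 1
    return out[:length]

def derive_secret(secret: bytes, label: bytes, transcript: bytes) -> bytes:
    # Derive-Secret hashes the transcript before expanding
    return hkdf_expand_label(secret, label, HASH(transcript).digest(), HLEN)

# Placeholder inputs (illustrative only)
psk = b"\x00" * HLEN                       # no PSK: zero string
ecdhe = b"\x42" * 32                       # stand-in for the g^xy shared secret
transcript = b"ClientHello..ServerHello"   # stand-in for the handshake transcript

early_secret = hkdf_extract(b"\x00" * HLEN, psk)                       # Phase 1
handshake_secret = hkdf_extract(derive_secret(early_secret, b"derived", b""), ecdhe)  # Phase 2
c_hs = derive_secret(handshake_secret, b"c hs traffic", transcript)
master_secret = hkdf_extract(derive_secret(handshake_secret, b"derived", b""), b"\x00" * HLEN)  # Phase 3
c_ap = derive_secret(master_secret, b"c ap traffic", transcript)
```

Changing even one byte of `transcript` changes every derived traffic secret, which is exactly the transcript-binding property that defeats downgrade attacks.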

Remark (Downgrade attacks and the Logjam/FREAK/POODLE retrospective): TLS has a long history of downgrade attacks that exploit the protocol's backward-compatibility mechanisms to force negotiation of deliberately weakened cipher suites. FREAK (2015) exploited residual support for "export-grade" RSA (512-bit moduli, mandated by US export controls in the 1990s): an active attacker could inject a ServerHello advertising export RSA, and then factor the 512-bit key in hours using distributed computing to decrypt the session. Logjam (2015) exploited analogous export-grade Diffie-Hellman (512-bit), and additionally noted that the NSA's documented capabilities were consistent with having precomputed discrete logs for the few 1024-bit groups commonly used in TLS. POODLE (2014) forced downgrade to SSLv3 and then exploited CBC padding oracle attacks. BEAST (2011) exploited a known weakness in TLS 1.0's IV handling for CBC mode.

The recurring lesson: any flexibility to negotiate weak parameters is eventually exploited. TLS 1.3’s response was radical: remove all weak algorithms entirely, leaving no negotiation for cipher suites that are known to be broken, and mandate forward secrecy (ephemeral key exchange) for all connections. The design philosophy shifted from “support old stuff for compatibility” to “old clients that can’t do TLS 1.3 simply don’t connect.”

Authentication

Password-based authentication must store passwords securely. Plain storage, MD5 hashing, or SHA-1 hashing are all inadequate: GPU-accelerated cracking can test billions of candidate hashes per second, and precomputed rainbow tables invert unsalted hashes outright. The correct approach uses memory-hard key derivation functions:

Function | Memory-hardness | Notes
bcrypt | Moderate | CPU-hard; time cost only; FPGA-attackable
scrypt | Strong | Memory- and CPU-hard; used as Litecoin's PoW
Argon2id | Best practice | Password Hashing Competition winner (2015); configurable memory, time, parallelism
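As a concrete sketch, Python's standard library exposes scrypt directly. The cost parameters below (N=2^14, r=8, p=1) are illustrative, not a recommendation — they should be tuned so hashing takes tens of milliseconds on the deployment hardware:

```python
import hashlib, hmac, os

def hash_password(password: str) -> tuple:
    # scrypt: memory-hard KDF; a fresh random salt defeats rainbow tables
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("wrong guess", salt, digest)
```

Note the `hmac.compare_digest` call: even the final comparison must be constant-time to avoid leaking how many prefix bytes matched.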

Multi-factor authentication (MFA) adds a second factor beyond the password. TOTP (RFC 6238) generates a time-based 6-digit code from a shared HMAC secret: \(\text{TOTP}(k, t) = \text{Truncate}(\text{HMAC-SHA1}(k, \lfloor t/30 \rfloor)) \bmod 10^6\), where Truncate is the dynamic truncation step defined in RFC 4226. FIDO2/WebAuthn uses public-key cryptography tied to the device; it is phishing-resistant because the key is bound to the origin URL.

Credential stuffing automates testing of username-password pairs leaked from one breach against other services, exploiting password reuse. MFA and breach-detection services (HaveIBeenPwned) mitigate this.

Example (TOTP — Time-based One-Time Password step by step): TOTP (RFC 6238) generates a 6-digit code from a shared HMAC secret \(K\) and the current Unix time \(T\). The algorithm:

(1) Compute the time step counter: \(t = \lfloor T_{\text{unix}} / 30 \rfloor\), encoding the current 30-second window as an 8-byte big-endian integer. (2) Compute HMAC-SHA1: \(h = \text{HMAC-SHA1}(K, t)\), producing a 20-byte output. (3) Dynamic truncation: take the last 4 bits of \(h\) as an offset \(o = h[19] \text{ \& } 0x0f\). Extract 4 bytes starting at offset \(o\): \(P = (h[o] \text{ \& } 0x7f) \| h[o+1] \| h[o+2] \| h[o+3]\) (mask the MSB to avoid signed integer issues). (4) Compute \(\text{TOTP} = P \bmod 10^6\).

For example, if \(K = \texttt{12345678901234567890}\) (ASCII) and the Unix time is 1111111109 (giving \(t = 37037036\)), the TOTP value is 081804. Both the client app and the server independently compute this value from the shared secret; agreement proves knowledge of \(K\) within the current or adjacent time window.
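The four steps above can be implemented in a few lines of standard-library Python; the check below uses the RFC 6238 Appendix B test vector (key `12345678901234567890`, time 1111111109, SHA-1):

```python
import hashlib, hmac, struct

def totp(key: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    counter = struct.pack(">Q", unix_time // step)      # step 1: 8-byte big-endian counter
    h = hmac.new(key, counter, hashlib.sha1).digest()   # step 2: 20-byte HMAC-SHA1
    o = h[19] & 0x0F                                    # step 3: dynamic truncation offset
    p = ((h[o] & 0x7F) << 24) | (h[o+1] << 16) | (h[o+2] << 8) | h[o+3]
    return str(p % 10 ** digits).zfill(digits)          # step 4: reduce mod 10^digits

print(totp(b"12345678901234567890", 1111111109))  # RFC 6238 test vector -> 081804
```

In practice a server also accepts the adjacent time windows (t-1 and t+1) to tolerate clock skew.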

TOTP is phishable: an attacker who tricks a user into entering their TOTP code on a fake site can relay it to the real site within the 30-second window. FIDO2/WebAuthn is not phishable because the cryptographic challenge is bound to the site’s origin URL — a key registered for bank.com will not respond to a challenge from evil-bank.com, even if the user is deceived about which site they are visiting.

Remark (Side-channel attacks on cryptographic implementations): Correct algorithms implemented incorrectly can leak secrets through physical side channels — timing, power consumption, electromagnetic emissions, or acoustic signals. Timing attacks exploit variable-time operations: if an RSA implementation uses modular exponentiation with early termination or table lookups that are key-dependent, measuring thousands of decryption times reveals the private key. Kocher's 1996 timing attack on RSA was practical; even a naive network attacker with millisecond-precision timing could extract keys.

Power analysis: on embedded devices (smart cards, HSMs), the current drawn during computation correlates with the data being processed. Simple Power Analysis (SPA) reads off the key directly from a single power trace; Differential Power Analysis (DPA, Kocher 1999) uses statistical averaging across many traces to isolate key-dependent signals even in the presence of noise. DPA broke the AES implementations in many commercial smart cards.

The countermeasure for all these attacks is constant-time implementation: all branches, memory accesses, and arithmetic operations must take exactly the same time regardless of the secret key values. The Nacl/libsodium library and all modern TLS implementations enforce this; OpenSSL’s historical RSA code had timing vulnerabilities (Lucky13, 2013). The lesson: a cryptographic algorithm is secure only if its implementation is also secure — correctness of the mathematics does not guarantee security of the code.
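The difference between a variable-time and a constant-time comparison is easy to see in code. This sketch contrasts a naive early-exit comparison (whose running time leaks the length of the matching prefix) with the standard-library constant-time primitive:

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    # VULNERABLE: returns at the first mismatch, so timing reveals
    # how many leading bytes of an attacker's guess were correct
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def ct_equal(a: bytes, b: bytes) -> bool:
    # constant-time: examines every byte regardless of where a mismatch occurs
    return hmac.compare_digest(a, b)

assert naive_equal(b"secret", b"secret") and ct_equal(b"secret", b"secret")
assert not naive_equal(b"secret", b"secreX") and not ct_equal(b"secret", b"secreX")
```

An attacker comparing MACs with `naive_equal` can forge a tag one byte at a time by measuring response latency; `ct_equal` removes that signal.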

Authentication Protocols

Challenge-response authentication: the prover demonstrates knowledge of a secret by computing a response to a server-generated nonce, without transmitting the secret. This prevents replay attacks.

Kerberos is a symmetric-key authentication protocol used in enterprise Active Directory environments. The Key Distribution Center (KDC) has two components: the Authentication Server (AS), which issues a Ticket-Granting Ticket (TGT) upon initial login, and the Ticket-Granting Server (TGS), which issues service tickets. The TGT is encrypted under the KDC's own secret key, while the AS reply to the client is encrypted with the user's password-derived key. Kerberoasting attacks exploit the fact that service tickets are encrypted with the service account's password-derived key: an attacker requests service tickets and brute-forces them offline.

OAuth 2.0 is an authorisation framework (not authentication). OIDC (OpenID Connect) layers identity on top of OAuth 2.0 with a signed ID token (JWT). SAML is an XML-based alternative used in enterprise SSO. Common vulnerabilities: confused deputy (SSRF via redirect URI), token theft via open redirects, JWT algorithm confusion (accepting alg: none).

Security Through the Layers

IPsec secures IP-layer communications. Transport mode protects the payload only (headers visible); tunnel mode encapsulates the entire original packet (used for VPNs). AH (Authentication Header) provides integrity only; ESP (Encapsulating Security Payload) provides both encryption and integrity.

DNSSEC signs DNS records with public-key signatures (commonly RSA or ECDSA), creating an authenticated chain from the root zone down. It protects against cache poisoning (the Kaminsky attack) but not against a malicious resolver or DDoS against the authoritative server.

BGP security: BGP has no built-in authentication; autonomous systems announce prefixes and peers accept them. BGP hijacking (e.g., Pakistan Telecom’s 2008 YouTube blackhole, the 2010 China Telecom incident intercepting US traffic) exploits this. RPKI (Resource Public Key Infrastructure) cryptographically validates that an AS is authorised to announce a given prefix.

Email security stack: SPF (Sender Policy Framework) lists authorised sending IPs for a domain in DNS TXT records; DKIM (DomainKeys Identified Mail) adds a signature over the message header and body; DMARC (Domain-based Message Authentication, Reporting, and Conformance) specifies policy when SPF/DKIM fail. Together they mitigate email spoofing, though DMARC adoption is incomplete.


Chapter 4: Secure Messaging and Anonymity

Transport-layer encryption (TLS) protects data between a client and a server, but the server sees plaintext. End-to-end encryption (E2EE) means only the communicating endpoints can read messages — the service provider sees only ciphertext.

Off-the-Record Messaging

OTR (Off-the-Record Messaging), designed by Borisov, Goldberg, and Brewer (2004), pioneered the properties now expected of secure messengers:

  • Perfect forward secrecy: a new ephemeral Diffie-Hellman key pair is negotiated for each conversation session. Compromise of long-term keys does not expose past messages.
  • Deniability: once a MAC key is no longer needed, OTR publishes it. Since anyone who saw the MAC key could forge a valid-looking message, the receiver cannot prove to a third party that the sender actually sent it. This contrasts with digital signatures, which are cryptographically non-repudiable.
  • Key continuity: OTR keeps a “key fingerprint” to detect man-in-the-middle attacks across sessions.

Signal Protocol

The Signal Protocol (Marlinspike and Perrin) is the basis for Signal, WhatsApp, and Facebook Messenger’s secret conversations. It extends OTR with stronger security properties.

X3DH (Extended Triple Diffie-Hellman) is the initial key agreement: the initiator combines four DH computations involving identity keys, signed pre-keys, and one-time pre-keys, achieving mutual authentication and forward secrecy from the first message.

The Double Ratchet Algorithm manages ongoing message keys:

  • A Diffie-Hellman ratchet advances whenever a new DH public key is received, providing break-in recovery (future messages are secure even if a session key is compromised).
  • A symmetric ratchet (chain key advancing via HMAC) derives individual message keys, providing forward secrecy within a DH epoch.

Sealed sender hides the sender’s identity from the Signal server. Safety numbers (key fingerprints) allow out-of-band verification to detect MITM.

Remark (Why the Double Ratchet achieves both forward secrecy and break-in recovery): These two properties are normally in tension, and the Double Ratchet's key insight is to achieve both simultaneously by combining two different ratchet mechanisms operating at different timescales.

Forward secrecy means that compromise of a current key does not expose past messages. The symmetric ratchet (KDF chain) achieves this within a DH epoch: each message key is derived from the chain key via a one-way KDF, and the chain key is immediately advanced and the old key deleted. An attacker who captures the chain key at time \(t\) can derive message keys at times \(t, t+1, t+2, \ldots\) but cannot reverse the KDF to recover message keys at times \(t-1, t-2, \ldots\) — those are forward secret.

Break-in recovery (post-compromise security or future secrecy) means that a recovered session state becomes secure again once new DH keys are exchanged. The DH ratchet achieves this: each time Alice receives a message with a new DH public key from Bob, she advances the DH ratchet — computing a new ECDH shared secret incorporating fresh randomness that the attacker who compromised the old state does not know. This injects new unpredictable entropy into the chain key, “healing” the compromise for all future messages.

The combination is elegant: the symmetric ratchet provides fine-grained, efficient per-message forward secrecy (one KDF call per message, no new DH computation), while the DH ratchet provides occasional break-in recovery (one new DH key pair per received message). The protocol’s security analysis proves that an attacker who compromises the state at step \(t\) and then observes all future transmissions cannot distinguish any message sent after the next DH ratchet step from a uniformly random string.
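The symmetric ratchet's one-way property can be sketched in a few lines. This is an illustrative model, not Signal's actual KDF (which uses HKDF with specific info strings); the `\x01`/`\x02` domain-separation constants here are placeholders:

```python
import hashlib, hmac

def kdf_chain_step(chain_key: bytes) -> tuple:
    # One-way advance: derive a per-message key and the next chain key,
    # then the caller deletes the old chain key (forward secrecy).
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key

ck = b"\x00" * 32          # placeholder chain key seeded by the DH ratchet
keys = []
for _ in range(3):
    ck, mk = kdf_chain_step(ck)
    keys.append(mk)
# An attacker holding ck (the state after step 3) can derive future message
# keys, but inverting HMAC to recover keys[0..2] is computationally infeasible.
```

Break-in recovery comes from the other ratchet: when a fresh ECDH secret arrives, it is mixed into the chain key, re-randomising the state the attacker captured.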

Tor Network

Onion routing provides network-layer anonymity by layering encryption across a series of relays. The Tor design (Dingledine, Mathewson, Syverson 2004) uses a three-hop circuit:

  1. Guard relay (entry node): knows the client’s IP but not the destination.
  2. Middle relay: knows neither the client’s IP nor the destination.
  3. Exit relay: knows the destination but not the client’s IP.

Each relay decrypts one layer of encryption (like peeling an onion). The client extends the circuit incrementally using Diffie-Hellman with each relay. Traffic between relays uses TLS; cells are fixed-size (512 bytes) to limit traffic analysis.
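The layering idea can be illustrated with a deliberately toy cipher. This is NOT Tor's actual relay cryptography (Tor uses AES in counter mode with per-hop keys negotiated via DH); the SHA-256-based keystream below exists only to show how the client wraps three layers and each relay peels exactly one:

```python
import hashlib

def toy_stream(key: bytes, n: int) -> bytes:
    # NOT a real cipher: SHA-256 in counter mode, purely for illustration
    out, i = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + i.to_bytes(4, "big")).digest()
        i += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

relay_keys = [b"guard-key", b"middle-key", b"exit-key"]  # one key per hop
msg = b"GET example.com"

# Client wraps: innermost layer belongs to the exit relay
cell = msg
for k in reversed(relay_keys):
    cell = xor(cell, toy_stream(k, len(cell)))

# Each relay peels its own layer in path order; only the exit sees plaintext
for k in relay_keys:
    cell = xor(cell, toy_stream(k, len(cell)))
assert cell == msg
```

Because each relay holds only its own key, the guard sees an opaque cell bound for the middle relay, and no single hop can link the client to the destination.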

.onion hidden services route traffic through a rendezvous point, keeping both client and server IP addresses hidden. The service’s address is derived from its public key.

Tor’s limitations: a global passive adversary who can observe both ends of a circuit can perform timing correlation to deanonymise users. Website fingerprinting (see below) also undermines Tor’s privacy guarantees for web browsing.

Encrypted Traffic Analysis and Website Fingerprinting

Even when traffic is encrypted, an adversary observing packet sizes, inter-arrival times, and directions can fingerprint which website a user is visiting — including over Tor. The intuition is that each web page has a characteristic “loading profile” of objects, sizes, and timing.

Modern attacks use machine-learning classifiers trained on traffic traces; closed-world accuracy above 95% is achievable in research settings. Deep-learning attacks such as Deep Fingerprinting (DF) show that Tor’s fixed cell size alone is insufficient.

Defences include:

  • Traffic shaping: padding packets to fixed-length patterns
  • Constant-rate cover traffic: send dummy packets to create uniform traffic
  • WTF-PAD (Website Traffic Fingerprinting Padding): adaptive padding framework whose design informed the circuit padding defences deployed in Tor

Network Steganography

Steganography hides the existence of a communication (unlike encryption, which hides its content). Network steganography encodes covert messages in protocol fields (unused IP header bits, TCP ISN, timing intervals) or traffic patterns. Applications include covert channels in corporate networks and censorship circumvention.

Censorship Circumvention

The Great Firewall of China (GFW) employs deep packet inspection (DPI) to identify and block Tor, VPNs, and other circumvention traffic by their protocol fingerprints. Techniques include IP/domain blocklists, TLS fingerprinting, active probing of suspected proxies, and DNS poisoning.

Pluggable transports disguise Tor traffic as innocent-looking traffic:

  • obfs4: adds random-looking padding and handshake to resist DPI
  • meek: encapsulates traffic in HTTPS requests to CDN-hosted domain-fronted servers (e.g., Google, Cloudflare), making blocking require blocking the CDN itself
  • Shadowsocks: lightweight SOCKS5 proxy with symmetric encryption; widely used in China

The censorship circumvention landscape is a continuous cat-and-mouse game: as each transport is fingerprinted and blocked, new ones are developed.


Chapter 5: Malware and Blockchain

Malware Taxonomy

Malicious software encompasses a range of threat categories. A virus attaches to legitimate programs and replicates when executed. A worm self-replicates across networks without user interaction (e.g., WannaCry, 2017). A Trojan masquerades as legitimate software. Ransomware encrypts victim files and demands payment (often in cryptocurrency). A rootkit hides malicious processes by subverting the OS kernel. Spyware exfiltrates data silently. Botnets are networks of compromised machines (bots) controlled remotely.

Botnet Architecture

A botnet requires a command-and-control (C&C) channel:

  • Centralised (IRC/HTTP): simple, but single point of failure; takedown of the C&C server incapacitates the botnet
  • Decentralised (P2P): resilient; bots find peers via DHT; harder to sinkhole
  • Fast-flux DNS: rapidly rotating DNS records point to proxy bots, obscuring the true C&C location; double-flux rotates both A records and NS records

Sinkholing redirects botnet C&C domains to a controlled server, allowing researchers to count infected machines and cut off botmaster control. McColo Corp (2008): a hosting provider used by major spam botnets; when two upstream ISPs severed connectivity, global spam volume dropped approximately 75% within hours — demonstrating how much spam infrastructure was centralised there.

Ransomware Economics

Modern ransomware (REvil, DarkSide, LockBit) operates as Ransomware-as-a-Service (RaaS): the ransomware authors lease the malware and infrastructure to affiliates who conduct intrusions and split the ransom. Ransoms are denominated in Bitcoin or Monero for pseudonymity.

Double extortion emerged around 2019: attackers not only encrypt files but exfiltrate data and threaten to publish it, giving victims an additional reason to pay even if they have backups. The correct defensive posture includes offline backups (air-gapped or immutable), network segmentation, and IR planning.

Example (WannaCry ransomware — cryptographic analysis): The WannaCry worm (May 2017) infected over 200,000 systems in 150 countries in 72 hours, exploiting the EternalBlue SMB vulnerability (developed by the NSA, leaked by the Shadow Brokers group). Its encryption scheme illustrates both competent and incompetent cryptographic design.

Encryption: WannaCry generates a per-file AES-128 key, encrypts the file in CBC mode, then RSA-encrypts the AES key with the attacker’s public key (RSA-2048). Only the attacker’s private key can recover the AES keys, and hence the files. The AES-128 with fresh per-file keys and CBC mode was reasonably implemented — there was no practical weakness in this layer.

The catastrophic flaw: WannaCry generated its per-victim RSA key pair using the Windows CryptoAPI, and on Windows XP the API did not zero-clear the prime factors of the RSA private key from process memory after use. Security researchers (Guinet et al.) discovered that by scanning the ransomware process’s memory before it was overwritten, the primes could be recovered and the private key reconstructed. The resulting tool (WannaKey) allowed decryption without paying the ransom — on unpatched XP systems that had not been rebooted since infection.

The WannaCry kill switch — a single hardcoded domain that the worm queried and halted if resolvable — was discovered by researcher Marcus Hutchins, who registered the domain for $10.69 and thereby activated the kill switch globally, halting the spread within hours of its discovery. This single design oversight (the kill switch existed to detect analysis sandboxes, which often redirect all DNS queries to a local host) allowed the worm’s spread to be arrested without defeating its encryption.

Remark (Cryptocurrency and ransomware economics): The rise of ransomware as a dominant threat vector in the 2018–2024 period is inseparable from the maturation of cryptocurrency payment infrastructure. Early ransomware campaigns (e.g., CryptoLocker, 2013) accepted prepaid vouchers such as PaySafeCard and MoneyPak — payment channels that were traceable and could be seized by law enforcement. Bitcoin provided pseudonymous, globally accessible, irreversible payments without requiring the attacker to maintain traditional payment infrastructure.

The economics are striking: the Colonial Pipeline attack (DarkSide, 2021) resulted in a ransom of 75 Bitcoin (~$4.4M at the time), of which approximately 63.7 Bitcoin was later recovered by the FBI using a private key obtained through an investigation. The recovery demonstrated that Bitcoin is pseudonymous, not anonymous — blockchain analysis firms (Chainalysis, Elliptic) can trace transaction flows through the public ledger. Monero, which uses ring signatures, stealth addresses, and confidential transactions to provide strong anonymity, has become increasingly preferred for ransomware payments precisely because it is harder to trace. The OFAC (Office of Foreign Assets Control) has sanctioned specific cryptocurrency addresses associated with ransomware operators, making it illegal for US entities to pay ransoms to those addresses — though enforcement against international affiliates remains difficult.

Blockchain and Bitcoin

The Bitcoin whitepaper (Nakamoto 2008) describes a decentralised, trustless ledger secured by proof-of-work.

Data structure: transactions are grouped into blocks. Each block contains a Merkle tree of transactions (efficient membership proofs), a hash pointer to the previous block, a nonce, and a timestamp. The chain of hashes makes tampering computationally infeasible: altering a past block requires re-mining that block and all subsequent blocks.
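A Merkle root over a block's transactions can be computed with pair-wise hashing. This sketch uses single SHA-256 for brevity (Bitcoin actually uses double SHA-256 and a specific byte order), and duplicates the last node when a level has odd length, as Bitcoin does:

```python
import hashlib

def merkle_root(leaves: list) -> bytes:
    # Hash each transaction, then repeatedly hash adjacent pairs up to one root
    level = [hashlib.sha256(x).digest() for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # duplicate last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

txs = [b"tx-a", b"tx-b", b"tx-c"]
root = merkle_root(txs)
# Changing any transaction changes the root, and hence the block header hash
assert merkle_root([b"tx-a", b"tx-b", b"tx-X"]) != root
```

Membership proofs are logarithmic: proving `tx-a` is in the block requires only the sibling hashes along its path to the root, not the full transaction list.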

Proof-of-Work: to add a block, a miner must find a nonce such that \(H(\text{block header}) < \text{target}\). The target is adjusted every 2016 blocks so that blocks arrive approximately every 10 minutes. Mining is probabilistic; the expected work is inversely proportional to the target, so each additional bit of difficulty doubles the expected number of hash attempts.
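A toy miner makes the search concrete. This is a simplified model — a real Bitcoin header has a fixed 80-byte layout and the real target makes the search astronomically expensive — but the loop is structurally the same; difficulty is set here to 16 bits (~65,000 expected attempts) so it runs in well under a second:

```python
import hashlib

def mine(header: bytes, difficulty_bits: int) -> int:
    # Find a nonce with double-SHA256(header || nonce) below the target
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        h = hashlib.sha256(hashlib.sha256(
                header + nonce.to_bytes(8, "big")).digest()).digest()
        if int.from_bytes(h, "big") < target:
            return nonce
        nonce += 1

nonce = mine(b"toy-block-header", 16)
# Verification is a single hash — asymmetry between finding and checking
```

Note the asymmetry: finding the nonce took tens of thousands of hashes, but any node can verify it with one.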

The UTXO (Unspent Transaction Output) model tracks coin ownership: each transaction consumes UTXOs as inputs and creates new UTXOs as outputs. There are no accounts; identity is a public key.

Double-spend attack (51% attack): an adversary controlling more than half the network’s hash rate can build a longer secret chain and later broadcast it, reversing confirmed transactions. The probability of success decreases exponentially with the number of confirmations.

Ethereum and Proof-of-Stake

Ethereum extends Bitcoin with smart contracts — Turing-complete programs stored and executed on-chain in the EVM (Ethereum Virtual Machine). Contracts are written in Solidity. Each computation step costs gas, preventing infinite loops and incentivising efficiency.

The Ethereum Merge (September 2022) transitioned Ethereum from proof-of-work to Proof-of-Stake. Validators replace miners: to propose and attest to blocks, validators stake 32 ETH as collateral. Dishonest behaviour is punished by slashing (burning staked ETH). Energy consumption fell by approximately 99.95%.

Known Ethereum smart contract vulnerabilities:

  • Reentrancy: a contract calls an external contract before updating its state; the external contract re-enters and drains funds. The DAO hack (2016) exploited this, stealing approximately 60M USD.
  • Integer overflow/underflow: Solidity 0.8 introduced automatic overflow checking; earlier contracts used SafeMath libraries.
  • Front-running (MEV): miners (now validators) can observe pending transactions and insert their own transactions ahead of them.

Mining Pool Centralisation

Solo mining has extremely high variance: a miner with 0.1% of the hash rate expects one block roughly every 7 days on average but may wait months. Mining pools aggregate hash power, sharing rewards proportionally to submitted work (shares). The stratum protocol coordinates miners with the pool server.

The consequence is extreme centralisation: a handful of pools (AntPool, F2Pool, Foundry) collectively control well over 50% of Bitcoin hash rate. Ghash.io briefly exceeded 50% in 2014, demonstrating the theoretical feasibility of a 51% attack. This centralisation undermines Bitcoin’s decentralisation premise.


Chapter 6: Inference Attacks and Syntactic Privacy

The Re-identification Problem

Releasing data with direct identifiers removed (name, SSN, email) is intuitively “anonymised” but frequently inadequate. Quasi-identifiers — combinations of seemingly innocuous attributes — can uniquely identify individuals.

AOL search logs (2006): AOL released 20 million search queries with user IDs replaced by random numbers. New York Times reporters identified user 4417749 as Thelma Arnold by correlating her searches for her name, neighbours, and medical conditions.

Netflix Prize dataset (Narayanan and Shmatikov 2008): Netflix released 500,000 users’ movie ratings with usernames replaced by random IDs. Using the public IMDb database as auxiliary information, a handful of ratings with approximate dates sufficed: eight movie ratings (two of which may be wrong) with dates known to within two weeks uniquely identified 99% of records in the dataset. The attack used a de-anonymisation algorithm based on the principle that an individual’s ratings are a unique fingerprint.

NYC Taxi dataset (2014): the NYC Taxi and Limousine Commission released trip data with medallion numbers hashed via MD5. Because the space of valid medallion numbers was small (a few hundred thousand), GPU-based rainbow tables reversed all hashes in hours, linking trips to specific taxis and drivers.

Dataset Reconstruction Attacks

Given a statistical database that answers aggregate queries (counts, sums, means), can an attacker reconstruct the individual records? The Dinur-Nissim reconstruction theorem (2003) shows that an adversary issuing on the order of \(n \log^2 n\) random subset-count queries over \(n\) binary-valued records can recover almost the entire database via linear programming — even when every answer is perturbed by noise of magnitude \(o(\sqrt{n})\). This implies that publishing accurate statistics without sufficient noise is fundamentally incompatible with privacy — a key motivator for differential privacy.
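A brute-force miniature conveys the idea (the actual theorem uses linear programming to scale to large \(n\)). Here an adversary receives exact answers to random subset-count queries over an 8-bit database and simply enumerates all \(2^8\) candidate databases consistent with the answers; with enough queries, few candidates survive:

```python
import itertools, random

random.seed(0)
n = 8
secret = [random.randint(0, 1) for _ in range(n)]   # the private database

# Adversary issues random subset-count queries and records exact answers
queries = [random.sample(range(n), 4) for _ in range(24)]
answers = [sum(secret[i] for i in q) for q in queries]

# Exact answers sharply constrain the database: enumerate all candidates
consistent = [list(c) for c in itertools.product([0, 1], repeat=n)
              if all(sum(c[i] for i in q) == a for q, a in zip(queries, answers))]
# The true database is always among the survivors; typically very few remain
assert secret in consistent
```

Adding calibrated noise to each answer — the differential privacy approach — is exactly what breaks this style of reconstruction.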

k-Anonymity

Definition (k-Anonymity): A dataset satisfies k-anonymity (Sweeney 2002) if every record is indistinguishable from at least \(k-1\) other records on the set of quasi-identifier attributes. That is, within each equivalence class defined by quasi-identifier values, there are at least \(k\) records. Achieving k-anonymity involves: generalisation (replacing specific values with less specific ones, e.g., exact age → age range) and suppression (removing records or attribute values entirely).

k-Anonymity prevents linking attacks when the adversary knows an individual’s quasi-identifiers. However, it has critical weaknesses.

Example (Full k-anonymisation walkthrough): Consider a medical dataset with three records:
Name | Age | Gender | ZIP Code | Salary
Alice | 30 | F | 10001 | 80K
Bob | 32 | M | 10001 | 90K
Carol | 28 | F | 10002 | 75K

The quasi-identifiers are \(\{\text{age}, \text{gender}, \text{ZIP code}\}\). As-is, each record is unique on these quasi-identifiers — an adversary who knows Alice’s age, gender, and ZIP can immediately identify her row and infer her salary.

To achieve \(k=2\) anonymity, we apply three transformations: (1) generalise age to the decade range [25–35]; (2) suppress the gender column entirely; (3) generalise ZIP code to the prefix 100** (masking the last two digits). The transformed dataset is:

Age Range | ZIP Prefix | Salary
[25–35] | 100** | 80K
[25–35] | 100** | 90K
[25–35] | 100** | 75K

Every record now matches the other two records on the quasi-identifiers — in fact we have \(k = 3\), since all three fall in the same generalised equivalence class. A linking attack using only (age range, ZIP prefix) cannot distinguish Alice from Bob or Carol.

However, \(k\)-anonymity does not protect the sensitive attribute. Alice’s salary of 80K is still unique within the equivalence class. An attacker who knows that a person in this class earns 80K can immediately identify them. This residual disclosure is what \(\ell\)-diversity and \(t\)-closeness attempt to address: they require that the sensitive attribute (salary) is sufficiently diverse or distributed within each equivalence class, not merely that the quasi-identifiers are generalised.
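The walkthrough can be checked mechanically. The sketch below applies the same three transformations (age range, gender suppression, ZIP prefix) and computes \(k\) as the size of the smallest equivalence class over the generalised quasi-identifiers:

```python
from collections import Counter

# The three records from the walkthrough: (age, gender, zip, salary)
records = [(30, "F", "10001", "80K"),
           (32, "M", "10001", "90K"),
           (28, "F", "10002", "75K")]

def generalise(rec):
    age, gender, zipc, salary = rec
    age_range = "[25-35]" if 25 <= age <= 35 else "other"
    # gender is suppressed entirely; ZIP is masked to a 3-digit prefix
    return (age_range, zipc[:3] + "**"), salary

def k_of(released):
    # k = size of the smallest equivalence class over the quasi-identifiers
    counts = Counter(qi for qi, _ in released)
    return min(counts.values())

released = [generalise(r) for r in records]
print(k_of(released))  # all three records share one class -> 3
```

Note that `k_of` inspects only the quasi-identifiers: the salaries in `released` remain unique, which is exactly the residual disclosure the paragraph above describes.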

Homogeneity attack: if all \(k\) records in an equivalence class share the same sensitive attribute value, the adversary learns that value with certainty, even without a linking attack.

Background knowledge attack: if the adversary knows something about an individual that further constrains which records in the equivalence class could be theirs, even diverse sensitive values may be disclosed.

Remark (The AOL de-anonymisation and its lessons): In August 2006, AOL Research released a dataset of 20 million web search queries from 658,000 users over a three-month period, replacing usernames with random numeric IDs. The release was intended to support academic research. Within days, New York Times reporters Barbaro and Zeller identified user 4417749 as 62-year-old Thelma Arnold of Lilburn, Georgia. The identification relied entirely on search queries: "landscapers in Lilburn, Ga", "homes sold in shadow lake subdivision gwinnett county georgia", and searches for her friends by name. No traditional quasi-identifiers (ZIP code, age, gender) were used at all — the search queries themselves served as quasi-identifiers. This incident demonstrated that the set of quasi-identifiers cannot be fixed in advance: in a world of rich auxiliary data, almost any attribute can serve to re-identify. It contributed directly to the research community's shift from k-anonymity toward differential privacy, which provides guarantees that hold regardless of what auxiliary information the adversary possesses.
Example (k-anonymity failure under composition): Suppose a hospital releases two datasets about the same population, both satisfying k-anonymity with \(k=3\). Dataset A contains \(\{\text{gender}, \text{ZIP code}, \text{diagnosis}\}\) and Dataset B contains \(\{\text{age}, \text{ZIP code}, \text{diagnosis}\}\). Each dataset separately satisfies \(k=3\): within each equivalence class defined by (gender, ZIP) in Dataset A, there are at least 3 records; similarly for Dataset B.

Now an adversary links the two datasets on ZIP code. The combined quasi-identifier is \(\{\text{gender}, \text{age}, \text{ZIP code}\}\). Even if each individual dataset had \(k=3\), the combined equivalence classes may now contain only 1 or 2 records — uniquely identifying individuals. The k-anonymity guarantee for each dataset provides no protection against this composition attack. This failure mode is precisely why k-anonymity has no meaningful composition theorem, and why differential privacy — which composes predictably — is preferred.

l-Diversity

Definition (l-Diversity): A dataset satisfies l-diversity (Machanavajjhala et al. 2006) if within each equivalence class defined by quasi-identifier values, the sensitive attribute has at least \(l\) "well-represented" distinct values. Three variants exist: distinct l-diversity (at least \(l\) distinct values); entropy l-diversity (\(-\sum_s p_s \log p_s \geq \log l\) within each class, where \(p_s\) is the fraction of records with sensitive value \(s\)); and recursive (c, l)-diversity (the most frequent value appears less than \(c\) times the sum of the remaining \(l-1\) most frequent values).
Example (l-diversity in practice): Consider a medical dataset with sensitive attribute diagnosis taking values \(\{\text{cancer}, \text{flu}, \text{cold}\}\). Suppose the dataset satisfies \(k=3\) anonymity. An equivalence class might contain three records: all three with diagnosis "cancer". This class satisfies \(k=3\) but has only 1 distinct sensitive value. An adversary who knows a target individual falls in this equivalence class can conclude with certainty that the target has cancer — a complete disclosure of sensitive information.

Distinct 3-diversity requires each equivalence class to contain all three diagnoses (cancer, flu, cold) — at least one record with each. Under this constraint, even knowing the target’s equivalence class, the adversary can only say the diagnosis is one of three possibilities, limiting information gain. However, l-diversity still does not account for the overall distribution: if 95% of the population has cancer, then even a “diverse” equivalence class with 1 cancer and 2 flu/cold records still reveals that the target probably has cancer. This motivates t-closeness.

t-Closeness

Definition (t-Closeness): An equivalence class satisfies t-closeness (Li et al. 2007) if the distribution of sensitive attribute values within the class is within Earth Mover's Distance (or total variation distance) \(t\) of the overall distribution of the sensitive attribute across the full dataset. A table satisfies t-closeness if every equivalence class does. Smaller \(t\) means the within-class distribution is closer to the global distribution, limiting what can be inferred from knowing one's equivalence class.
Remark (Why t-closeness is still insufficient): Even t-closeness is insufficient in general. It requires the within-class distribution to resemble the global distribution, but the global distribution itself may be informative. Moreover, t-closeness provides no composition guarantee: two t-close releases can jointly violate privacy by intersecting the distributions. It also provides no formal adversary model — unlike differential privacy, t-closeness does not specify what the adversary knows before seeing the data. These limitations motivated the field's convergence on differential privacy as the principled foundation for data release.
Example (Membership inference against machine learning models): A membership inference attack asks: given black-box access to a trained machine learning model \(f_\theta\) and a data record \(x\), was \(x\) in the training set? The attack exploits overfitting and memorisation: models tend to assign higher confidence to examples they were trained on than to held-out examples.

Shokri et al. (2017) operationalised this into a full attack pipeline. The attacker first trains a collection of shadow models that mimic the target model’s behaviour — each shadow model is trained on a different random subset of a dataset drawn from the same distribution as the target’s training data. For each record in each shadow model’s training and test sets, the attacker observes the confidence vector \(f_\text{shadow}(x) \in [0,1]^C\) (where \(C\) is the number of classes). Shadow models trained on \(x\) tend to produce higher-confidence, more peaked vectors; shadow models not trained on \(x\) produce flatter, lower-confidence outputs. These (confidence vector, member/non-member) pairs become the training data for a binary attack classifier.

Against the target model, the attacker queries \(f_\theta(x)\), feeds the resulting confidence vector to the attack classifier, and obtains a membership prediction. Shokri et al. achieved over 80% attack accuracy on neural networks trained on Purchase-100 and Texas-100 hospital discharge datasets — well above the 50% random baseline. Defences include: output regularisation (temperature scaling, label smoothing), restricting the API to top-1 class labels only (removing confidence scores), and DP-SGD, which provides formal membership inference resistance proportional to \(e^\varepsilon\). The formal connection is tight: any \((\varepsilon, \delta)\)-DP mechanism bounds the adversary’s membership advantage to at most \(e^\varepsilon - 1 + \delta\).
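The core signal — members receive more peaked confidence vectors than non-members — can be illustrated without the full shadow-model pipeline. The sketch below is a deliberately simplified threshold attack against a synthetic stand-in for an overfit model; the confidence ranges are invented for illustration, not taken from any real system.

```python
import random

random.seed(0)

# Toy stand-in for an overfit classifier: it returns high top-class
# confidence on points it has memorised (its training set), flatter
# confidence elsewhere. The ranges are hypothetical and exaggerated.
train_set = {f"train_{i}" for i in range(500)}

def model_confidence(x):
    if x in train_set:
        return random.uniform(0.90, 1.00)   # peaked output on members
    return random.uniform(0.50, 0.95)       # flatter output on non-members

def attack(x, threshold=0.90):
    """Predict 'member' when the model's top confidence exceeds the threshold."""
    return model_confidence(x) >= threshold

members = [f"train_{i}" for i in range(500)]
non_members = [f"test_{i}" for i in range(500)]
tp = sum(attack(x) for x in members)
tn = sum(not attack(x) for x in non_members)
accuracy = (tp + tn) / 1000
print(f"attack accuracy: {accuracy:.2f}")   # well above the 0.5 baseline
```

In the real attack, the threshold rule is replaced by a classifier trained on shadow-model outputs, but the exploited gap is the same.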

Limitations of Syntactic Approaches

Syntactic privacy models share fundamental shortcomings. They provide no formal adversary model and offer no composability: releasing two k-anonymous datasets can jointly violate privacy. The set of quasi-identifiers must be specified in advance; in a world of rich auxiliary data, virtually any attribute combination can serve as a quasi-identifier. These limitations motivate the rigorous probabilistic framework of differential privacy.


Chapter 7: Differential Privacy and Private Machine Learning

Intuition and Formal Definition

Differential privacy formalises the intuition that no individual’s participation in a dataset should significantly change what an analyst can learn. Think of it as a promise: “your data could have been in or out of the dataset, and the output barely changes either way.”

Definition (\((\varepsilon, \delta)\)-Differential Privacy): A randomised mechanism \(M: \mathcal{D} \to \mathcal{R}\) satisfies \((\varepsilon, \delta)\)-differential privacy (Dwork, McSherry, Nissim, Smith — ICALP 2006) if for all pairs of adjacent datasets \(D, D'\) (differing in exactly one individual's record) and all measurable output sets \(S \subseteq \mathcal{R}\): \[ \Pr[M(D) \in S] \leq e^\varepsilon \cdot \Pr[M(D') \in S] + \delta. \] Pure DP (\(\delta = 0\)) is the strongest guarantee: \(e^\varepsilon\) bounds the likelihood ratio exactly. Approximate DP (\(\delta > 0\)) allows an \(\varepsilon\)-DP guarantee to fail with probability \(\delta\) (typically \(\delta \ll 1/n\)). The parameter \(\varepsilon\) (epsilon) is the privacy budget: smaller \(\varepsilon\) means stronger privacy but more noise.

The two datasets \(D\) and \(D'\) are called neighbouring or adjacent. In the “add/remove” model, \(D'\) is obtained from \(D\) by adding or removing one person’s record. In the “substitution” model, \(D'\) replaces one person’s record with a different one. The two models differ slightly in their noise requirements but are equivalent up to a factor of 2 in \(\varepsilon\).

Example (Randomised response — local DP in one bit): An epidemiologist wants to estimate what fraction of students cheated on a take-home exam. Direct self-report is unreliable because students fear punishment. The randomised response mechanism (Warner 1965) provides privacy: each student privately flips a fair coin. If the coin is heads, the student answers truthfully. If the coin is tails, the student flips a second coin and answers YES (cheated) if heads, NO (did not cheat) if tails.

From the student’s perspective, privacy is guaranteed: regardless of the true answer, the student’s reported response is YES with probability at least \(1/4\) and NO with probability at least \(1/4\). No single response can be taken as definitive evidence. Formally, let the true probability of cheating be \(p\). The expected fraction of YES responses is

\[ \Pr[\text{YES}] = p \cdot \tfrac{1}{2} + (1-p) \cdot \tfrac{1}{4} + p \cdot \tfrac{1}{4} = \tfrac{p}{2} + \tfrac{1}{4}. \]

Let \(\hat{q}\) denote the observed fraction of YES answers. Solving for \(p\): \(p = 2\hat{q} - \tfrac{1}{2}\). This is an unbiased estimator; with \(n\) students the standard error is \(O(1/\sqrt{n})\) — only a constant factor worse than direct surveying.

What is the formal DP guarantee? For either possible response, the probabilities under the two true answers differ by a factor of at most 3: a cheater says YES with probability \(3/4\), a non-cheater with probability \(1/4\) (and symmetrically for NO). Therefore this mechanism is \(\varepsilon\)-DP in the local model with \(\varepsilon = \ln 3 \approx 1.1\) — a strong, interpretable guarantee. Randomised response is arguably the oldest known local DP mechanism, predating the formal definition by four decades.
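The whole protocol, including the unbiased estimator \(p = 2\hat{q} - \tfrac{1}{2}\), fits in a few lines. A simulation sketch (hypothetical true cheating rate \(p = 0.30\)):

```python
import random

random.seed(1)

def randomized_response(truth: bool) -> bool:
    """Warner's mechanism: answer truthfully on heads, else a random answer."""
    if random.random() < 0.5:        # first coin: heads -> tell the truth
        return truth
    return random.random() < 0.5     # second coin: heads -> YES, tails -> NO

# Simulate n students with true cheating rate p = 0.30 (hypothetical).
n, p = 100_000, 0.30
answers = [randomized_response(random.random() < p) for _ in range(n)]
q_hat = sum(answers) / n             # observed fraction of YES

# Unbiased estimator from the text: p_hat = 2 * q_hat - 1/2.
p_hat = 2 * q_hat - 0.5
print(f"estimated p = {p_hat:.3f}")  # close to 0.30
```

The standard error of \(\hat{p}\) shrinks as \(O(1/\sqrt{n})\), so population-level estimates stay accurate even though no individual response is trustworthy.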

Sensitivity and Basic Mechanisms

Definition (Global L1 sensitivity): The global L1 sensitivity of a function \(f: \mathcal{D} \to \mathbb{R}^k\) is \[ \Delta f = \max_{D, D' \text{ adjacent}} \|f(D) - f(D')\|_1. \] It measures the maximum change in the function's output caused by adding or removing one individual's data. Queries with low sensitivity require less noise to privatise.
Definition (Laplace mechanism): To answer a query \(f: \mathcal{D} \to \mathbb{R}^k\) with sensitivity \(\Delta f\), the Laplace mechanism adds independent noise drawn from the Laplace distribution: \[ M(D) = f(D) + \left(Z_1, \ldots, Z_k\right), \quad Z_i \sim \mathrm{Lap}(0,\, \Delta f / \varepsilon). \] The Laplace distribution \(\mathrm{Lap}(0, b)\) has PDF \(\frac{1}{2b} e^{-|x|/b}\), mean 0, and variance \(2b^2\). The Laplace mechanism achieves \(\varepsilon\)-DP (pure differential privacy).
Example (Laplace mechanism for a counting query): Suppose we want to answer the query \(f(D) = |\{u \in D : \text{age}(u) > 30\}|\) — the number of users in the dataset older than 30. Adding or removing one user changes the count by at most 1, so the global sensitivity is \(\Delta f = 1\).

For a privacy parameter \(\varepsilon = 0.5\), the Laplace mechanism adds noise \(Z \sim \mathrm{Lap}(0, 1/0.5) = \mathrm{Lap}(0, 2)\). Suppose the true count is \(f(D) = 150\). The mechanism outputs \(150 + Z\), where \(Z\) might take values such as \(-1.7\) (giving output \(148.3\)) or \(+2.7\) (giving \(152.7\)). The expected absolute error is \(\mathbb{E}[|Z|] = b = 1/\varepsilon = 2\). Increasing \(\varepsilon\) to 1.0 would halve the noise scale to \(\mathrm{Lap}(0,1)\), halving the expected error — at the cost of weaker privacy.

The analyst rounds the output to the nearest integer and reports it as the approximate count. Crucially, with high probability the output is within a few units of the true count, yet the guarantee holds: for any two adjacent datasets \(D, D'\), the ratio of output probabilities is bounded by \(e^\varepsilon = e^{0.5} \approx 1.65\).
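A minimal implementation of the Laplace mechanism, using the inverse-CDF method to sample Laplace noise (parameters match the counting-query example above):

```python
import math
import random

random.seed(2)

def laplace_noise(b: float) -> float:
    """Sample from Lap(0, b) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -b * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def laplace_mechanism(true_answer: float, sensitivity: float, eps: float) -> float:
    """eps-DP answer to a numeric query with the given global L1 sensitivity."""
    return true_answer + laplace_noise(sensitivity / eps)

# Counting query from the text: sensitivity 1, eps = 0.5, true count 150.
noisy = laplace_mechanism(150, sensitivity=1.0, eps=0.5)
print(round(noisy))     # within a few units of 150 with high probability

# Empirical check: mean absolute error should be near b = 1/eps = 2.
errs = [abs(laplace_noise(2.0)) for _ in range(50_000)]
print(sum(errs) / len(errs))   # approximately 2.0
```

The empirical mean absolute error matching \(b = \Delta f / \varepsilon\) confirms the claim that increasing \(\varepsilon\) proportionally shrinks the expected error.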

Definition (Gaussian mechanism): For a query with L2 sensitivity \(\Delta_2 f = \max_{D,D'} \|f(D) - f(D')\|_2\), the Gaussian mechanism adds noise \(Z_i \sim \mathcal{N}(0, \sigma^2)\) where \(\sigma = \Delta_2 f \cdot \sqrt{2 \ln(1.25/\delta)} / \varepsilon\). This achieves \((\varepsilon, \delta)\)-DP. The Gaussian mechanism is preferred when composing many queries because it admits tighter composition bounds via the moments accountant.

Exponential mechanism (McSherry-Talwar): for non-numeric outputs, sample an output \(r\) with probability proportional to \(\exp(\varepsilon \cdot q(D, r) / 2\Delta q)\), where \(q\) is a quality score. Achieves \(\varepsilon\)-DP while maximising expected quality.

Composition Theorems

Differential privacy composes gracefully, a major advantage over syntactic approaches.

Sequential composition: running \(k\) mechanisms \(M_1, \ldots, M_k\) each with budget \(\varepsilon_i\) on the same dataset yields \((\sum \varepsilon_i)\)-DP.

Parallel composition: if each mechanism \(M_i\) operates on a disjoint subset of the data, the combined mechanism is \(\max_i \varepsilon_i\)-DP.

Advanced composition (Dwork-Rothblum-Vadhan): for \(k\) applications of an \((\varepsilon, \delta)\)-DP mechanism, the composition achieves approximately \((\varepsilon \sqrt{2k \log(1/\delta')}, k\delta + \delta')\)-DP — significantly better than the naive \(k\varepsilon\) bound.

The moments accountant (Abadi et al.) and Rényi DP framework (Mironov) give even tighter composition bounds, enabling practical DP-ML training with reasonable noise levels.

Theorem (Group privacy): If \(M\) satisfies \(\varepsilon\)-DP, then for any two datasets \(D\) and \(D'\) that differ in at most \(k\) records, and for all measurable \(S\): \[ \Pr[M(D) \in S] \leq e^{k\varepsilon} \cdot \Pr[M(D') \in S]. \] That is, the privacy guarantee degrades gracefully when a group of up to \(k\) individuals is removed — the bound becomes \(e^{k\varepsilon}\) instead of \(e^\varepsilon\).
Proof sketch. Since \(D\) and \(D'\) differ in \(k\) records, we can construct a chain of adjacent datasets \(D = D_0, D_1, \ldots, D_k = D'\) where each consecutive pair \((D_{i-1}, D_i)\) differs in exactly one record. Applying the \(\varepsilon\)-DP guarantee at each step: \[ \Pr[M(D_0) \in S] \leq e^\varepsilon \Pr[M(D_1) \in S] \leq e^{2\varepsilon} \Pr[M(D_2) \in S] \leq \cdots \leq e^{k\varepsilon} \Pr[M(D_k) \in S]. \] The ratios telescope multiplicatively, giving the stated bound. This is why, in practice, group privacy is expensive: protecting a group of \(k\) people with budget \(\varepsilon\) requires the per-person privacy parameter to be only \(\varepsilon/k\), demanding \(k\) times more noise.
Remark (Amplification by subsampling): If a mechanism \(M\) is \(\varepsilon\)-DP and we apply it not to the full dataset \(D\) of \(n\) records but to a uniformly random subsample of \(m\) records (i.e., sampling fraction \(\gamma = m/n\)), the composed mechanism \(M \circ \text{Subsample}_\gamma\) satisfies approximately \((2\gamma\varepsilon)\)-DP when \(\varepsilon \leq 1\). More precisely, it satisfies \((\log(1 + \gamma(e^\varepsilon - 1)))\)-DP, which is approximately \(\gamma\varepsilon\)-DP for small \(\varepsilon\).

Intuitively: if a record is only included in the subsample with probability \(\gamma\), then with probability \(1 - \gamma\) the mechanism does not even see that record, bounding its influence. This privacy amplification lemma is exploited heavily in two settings: (1) DP-SGD subsamples a mini-batch of size \(B\) from \(n\) training examples at each gradient step, giving amplification factor \(\gamma = B/n\); (2) Apple’s local DP deployment collects data from only a fraction of devices per day, obtaining amplification from the subsampling. The amplification lemma means that a mechanism with moderate per-step \(\varepsilon\) can achieve strong end-to-end guarantees after many steps, which is what makes differentially private machine learning feasible.
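The exact amplification bound is a one-line formula, so its effect is easy to tabulate. The sketch below uses the DP-SGD-style sampling rate from the remark (\(\gamma = 256/10{,}000\)):

```python
import math

def amplified_eps(eps: float, gamma: float) -> float:
    """Privacy after uniform subsampling at rate gamma: log(1 + gamma*(e^eps - 1))."""
    return math.log(1 + gamma * (math.exp(eps) - 1))

# Mini-batch setting from the text: B = 256 sampled from n = 10,000.
gamma = 256 / 10_000
for eps in (0.5, 1.0, 2.0):
    print(f"eps={eps}: amplified to {amplified_eps(eps, gamma):.4f}")
```

For small \(\varepsilon\) the output is close to \(\gamma\varepsilon\), confirming that a per-step budget shrinks by roughly the sampling fraction.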

Example (Composition theorem in practice — a privacy budget audit): A company runs a survey platform and releases 10 separate aggregate statistics about its users, each computed with the Laplace mechanism at \(\varepsilon_i = 0.1\). By the sequential composition theorem, the total privacy cost is \(\sum_{i=1}^{10} \varepsilon_i = 10 \times 0.1 = 1.0\). This means the combined release satisfies \(1.0\)-DP — equivalent to a single mechanism with \(\varepsilon = 1.0\).

If the company wanted to release an 11th statistic, it would have already consumed its full “privacy budget” (assuming the pre-set limit was \(\varepsilon_{\text{total}} = 1.0\)). Advanced composition does not rescue this particular release: for only \(k = 10\) mechanisms with \(\varepsilon = 0.1\) and \(\delta' = 10^{-6}\), the bound \(\varepsilon\sqrt{2k \ln(1/\delta')} = 0.1\sqrt{20 \ln(10^6)} \approx 1.66\) is actually worse than the naive 1.0 — the \(\sqrt{k}\) advantage materialises only for large \(k\). At \(k = 100\) queries of \(\varepsilon = 0.1\) each, advanced composition gives approximately \(0.1\sqrt{200 \ln(10^6)} \approx 5.3\), roughly half the naive bound of 10. So to answer more queries within a fixed budget, the company can increase the noise scale (lowering per-query accuracy) or, when the query count is large, rely on advanced composition. This is the practical meaning of the “privacy budget” metaphor: only finitely many queries can be answered before the cumulative privacy loss becomes unacceptable.
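A small accountant comparing the two bounds makes the crossover visible. This sketch keeps only the dominant \(\varepsilon\sqrt{2k\ln(1/\delta')}\) term of advanced composition, as in the formula quoted earlier; the full theorem has an additional lower-order term.

```python
import math

def naive_bound(k: int, eps: float) -> float:
    """Basic sequential composition: total budget k * eps."""
    return k * eps

def advanced_bound(k: int, eps: float, delta_prime: float) -> float:
    """Advanced composition, dominant term only: eps * sqrt(2k ln(1/delta'))."""
    return eps * math.sqrt(2 * k * math.log(1 / delta_prime))

eps, dp = 0.1, 1e-6
for k in (10, 100, 1000):
    print(k, naive_bound(k, eps), round(advanced_bound(k, eps, dp), 2))
```

Running it shows the naive bound winning at \(k = 10\) and advanced composition winning from roughly \(k \approx 2\ln(1/\delta')\) onwards.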

Theorem (Post-processing immunity): If \(M: \mathcal{D} \to \mathcal{R}\) is \((\varepsilon, \delta)\)-DP and \(f: \mathcal{R} \to \mathcal{S}\) is any (possibly randomised) function, then \(f \circ M: \mathcal{D} \to \mathcal{S}\) is also \((\varepsilon, \delta)\)-DP.
Proof sketch. Let \(S \subseteq \mathcal{S}\) be any measurable set. Observe that \(\{f \circ M(D) \in S\} = \{M(D) \in f^{-1}(S)\}\). Since \(M\) is \((\varepsilon,\delta)\)-DP, applying the DP guarantee to the measurable set \(f^{-1}(S) \subseteq \mathcal{R}\) gives: \[ \Pr[f(M(D)) \in S] = \Pr\!\left[M(D) \in f^{-1}(S)\right] \leq e^\varepsilon \Pr\!\left[M(D') \in f^{-1}(S)\right] + \delta = e^\varepsilon \Pr[f(M(D')) \in S] + \delta. \] The key insight is that \(f^{-1}(S)\) is simply another measurable subset of \(\mathcal{R}\), so the DP guarantee applies to it directly. Since this holds for all \(S\), \(f \circ M\) is \((\varepsilon, \delta)\)-DP.

Post-processing immunity has major practical consequences: any downstream computation on a DP output — whether it is averaging, rounding, running a classification algorithm, or publishing a report — cannot degrade the privacy guarantee. Privacy is “baked in” to the output of \(M\), regardless of how the output is subsequently used.

Local vs. Central Differential Privacy

Remark (Central vs. local DP): In the central model, a trusted curator collects raw data from all individuals, computes \(f(D)\), adds noise calibrated to \(\Delta f / \varepsilon\), and releases the noisy answer. The noise scale is \(O(\Delta f / \varepsilon)\) — independent of the number of users \(n\). This achieves high accuracy.

In the local model, there is no trusted curator. Each individual applies a DP randomiser to their own data before sending it to the collector, who aggregates the noisy reports. Because each individual’s report is already \(\varepsilon\)-DP, the collector learns nothing beyond what is deducible from the noisy reports. However, the price is steep: to estimate a simple proportion (fraction of users with a property), the standard deviation of the local DP estimator scales as \(\Theta(e^\varepsilon / (e^\varepsilon - 1)) \cdot 1/\sqrt{n}\), whereas the central DP estimator has error \(O(1/(\varepsilon n))\). For small \(\varepsilon\) this is roughly \(1/(\varepsilon\sqrt{n})\) versus \(1/(\varepsilon n)\): driving the error below a target \(\alpha\) needs on the order of \(1/(\alpha^2\varepsilon^2)\) users in the local model against roughly \(1/(\alpha\varepsilon)\) in the central model — a quadratic blow-up in sample size. Google’s RAPPOR (2014) and Apple’s iOS keyboard/emoji collection use local DP, accepting higher noise in exchange for not needing to trust a central server.

Remark (Apple's deployment of differential privacy): Apple has deployed local differential privacy in iOS since iOS 10 (2016) to collect aggregate statistics about keyboard usage, emoji usage, Health app data, and browser crash reporting — without sending raw user data to Apple's servers. Each device applies a locally DP mechanism before transmitting data. Apple has reported using \(\varepsilon \approx 14\) per day for some of these pipelines, which is high by academic standards (most papers consider \(\varepsilon \leq 1\) strong, \(\varepsilon \leq 8\) moderate). Apple justifies this by arguing that: (1) the data is used only for aggregate statistics, (2) each day's budget is independent (no persistent accumulation across days in their model), and (3) the specific data types (emoji frequency) are low-sensitivity. The choice of \(\varepsilon\) is ultimately a policy decision made by Apple, not a mathematical determination — the mathematics only tells you the privacy-utility tradeoff for a given \(\varepsilon\), not what \(\varepsilon\) is "good enough." This highlights a genuine challenge in deploying DP: translating the abstract \(\varepsilon\) parameter into a meaningful statement about real-world privacy risk.

Real-World Deployments

| Organisation | Application | \(\varepsilon\) |
| --- | --- | --- |
| US Census Bureau 2020 | Decennial census publication | \(\approx 17.14\) total budget |
| Google (RAPPOR) | Chrome browser statistics | varies per query |
| Apple (iOS) | Keyboard suggestions, emoji | \(\approx 8\) (estimated) |
| LinkedIn | Audience engagement API | not disclosed |

The Census deployment sparked debate: an \(\varepsilon = 17.14\) budget distributed across many publications results in substantial noise on small geographic areas, affecting redistricting and federal funding allocation.

Differentially Private Machine Learning

Training a neural network on sensitive data risks memorisation: models can inadvertently encode and leak individual training examples (membership inference, model inversion).

DP-SGD (Abadi et al. 2016): makes stochastic gradient descent differentially private:

  1. Per-example gradient clipping: clip each individual’s gradient to L2 norm \(C\) (bounding sensitivity).
  2. Noise addition: add Gaussian noise \(\mathcal{N}(0, \sigma^2 C^2 \mathbf{I})\) to the summed gradient.
  3. Moments accountant: track \(\varepsilon\) accumulation across training steps with tight composition.

The privacy-utility trade-off is significant: DP-SGD models often have lower accuracy than non-private counterparts, especially at small \(\varepsilon\). Privacy amplification by subsampling (mini-batches) and amplification by shuffling help recover some utility.

Example (DP-SGD for neural network training): Suppose we want to train a text prediction model on a private corpus of \(n = 10{,}000\) user messages. We set the clipping norm \(C = 1.0\), noise multiplier \(\sigma = 1.1\), mini-batch size \(B = 256\), and run \(T = 2{,}000\) gradient steps.

Each step proceeds as follows: (1) sample a mini-batch \(S\) of \(B = 256\) examples uniformly at random from the \(n = 10{,}000\) training examples (subsampling rate \(\gamma = 256/10{,}000 = 0.0256\)); (2) compute the gradient of the loss for each example individually (per-example gradients, not the standard batch gradient); (3) clip each per-example gradient vector to L2 norm \(\leq C = 1.0\), bounding the contribution of any single training example; (4) sum the clipped gradients and add Gaussian noise \(\mathcal{N}(0,\, \sigma^2 C^2 \mathbf{I}) = \mathcal{N}(0,\, (1.1)^2 \mathbf{I})\); (5) divide by \(B\) to form the noisy gradient estimate and update model parameters.

Privacy cost: using the moments accountant (or the Rényi DP accounting in PyTorch Opacus), \(T = 2{,}000\) steps with these parameters yields approximately \((\varepsilon, \delta) = (3.0, 10^{-5})\)-DP. Apple’s keyboard next-word prediction and Google’s GBoard emoji suggestion models are trained with variants of this procedure, typically targeting \(\varepsilon \approx 8\) (a more relaxed privacy guarantee that allows higher accuracy). At \(\varepsilon = 3\), the model typically loses 2–5 percentage points of accuracy relative to non-private training, depending on the architecture and dataset size. The accuracy gap shrinks as \(n\) grows: larger datasets tolerate more noise.
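One DP-SGD step, as described above, can be sketched in plain Python. This is a toy illustration with hypothetical per-example gradients for a 3-parameter model (a real implementation would use a framework such as Opacus and a vectorised noise draw per coordinate, as here):

```python
import math
import random

random.seed(3)

def clip(grad, C):
    """Scale one per-example gradient to L2 norm at most C (bounds sensitivity)."""
    norm = math.sqrt(sum(g * g for g in grad))
    return [g * min(1.0, C / norm) for g in grad] if norm > 0 else list(grad)

def dp_sgd_step(params, per_example_grads, C=1.0, sigma=1.1, lr=0.1):
    """One DP-SGD update: clip each gradient, sum, add N(0, sigma^2 C^2) noise
    per coordinate, then average and step."""
    B = len(per_example_grads)
    clipped = [clip(g, C) for g in per_example_grads]
    noisy_sum = [sum(g[j] for g in clipped) + random.gauss(0.0, sigma * C)
                 for j in range(len(params))]
    return [p - lr * s / B for p, s in zip(params, noisy_sum)]

# Hypothetical per-example gradients for a batch of B = 256 examples.
params = [0.0, 0.0, 0.0]
batch = [[random.gauss(0, 5) for _ in range(3)] for _ in range(256)]
new_params = dp_sgd_step(params, batch)
print(new_params)
```

The clipping step is what bounds any single example's influence on the update, which is the sensitivity bound the Gaussian noise is calibrated against.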

Definition (Rényi Differential Privacy): A randomised mechanism \(M\) satisfies \((\alpha, \varepsilon_\alpha)\)-Rényi Differential Privacy (RDP) (Mironov 2017) if for all pairs of adjacent datasets \(D, D'\): \[ D_\alpha(M(D) \| M(D')) \leq \varepsilon_\alpha, \] where \(D_\alpha(P\|Q) = \frac{1}{\alpha - 1} \log \mathbb{E}_{x \sim Q}\!\left[\left(\frac{P(x)}{Q(x)}\right)^\alpha\right]\) is the Rényi divergence of order \(\alpha > 1\).

RDP has three key properties that make it the preferred accounting framework for DP-ML: (1) tight sequential composition — if \(M_1\) satisfies \((\alpha, \varepsilon_1)\)-RDP and \(M_2\) satisfies \((\alpha, \varepsilon_2)\)-RDP, then \((M_1, M_2)\) satisfies \((\alpha, \varepsilon_1 + \varepsilon_2)\)-RDP; (2) conversion to \((\varepsilon, \delta)\)-DP — any \((\alpha, \varepsilon_\alpha)\)-RDP mechanism is also \((\varepsilon_\alpha + \frac{\log(1/\delta)}{\alpha - 1}, \delta)\)-DP for any \(\delta > 0\); (3) Gaussian mechanism — the Gaussian mechanism with noise \(\mathcal{N}(0, \sigma^2)\) applied to a function with sensitivity 1 satisfies \((\alpha, \frac{\alpha}{2\sigma^2})\)-RDP. Combining (1) and (3): after \(T\) steps, privacy cost is \((\alpha, T\alpha / (2\sigma^2))\)-RDP, converted to \((\varepsilon, \delta)\)-DP by (2). This three-step accounting — implemented in PyTorch Opacus and TensorFlow Privacy — is how practitioners compute the DP guarantee for large-scale model training.
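The three-step accounting recipe can be written directly from properties (1)–(3). Note this sketch deliberately omits privacy amplification by subsampling, so for the DP-SGD parameters above it returns a far larger \(\varepsilon\) than the subsampled moments-accountant figure of \(\approx 3\) quoted in the example — the point is the mechanics (compose in RDP, minimise over \(\alpha\), convert to \((\varepsilon, \delta)\)-DP), not the tightest number.

```python
import math

def rdp_to_dp(rdp_eps: float, alpha: float, delta: float) -> float:
    """Property (2): (alpha, rdp_eps)-RDP implies (rdp_eps + ln(1/delta)/(alpha-1), delta)-DP."""
    return rdp_eps + math.log(1 / delta) / (alpha - 1)

def gaussian_composition_eps(T: int, sigma: float, delta: float) -> float:
    """eps for T composed Gaussian mechanisms (sensitivity 1), minimised over alpha.
    Ignores subsampling amplification, so it is much looser than DP-SGD accounting."""
    best = float("inf")
    for alpha in [1 + x / 10 for x in range(1, 1000)]:
        rdp_eps = T * alpha / (2 * sigma ** 2)      # properties (3) then (1)
        best = min(best, rdp_to_dp(rdp_eps, alpha, delta))
    return best

print(gaussian_composition_eps(T=2000, sigma=1.1, delta=1e-5))
```

Libraries such as Opacus and TensorFlow Privacy perform the same optimisation over \(\alpha\), but compose the *subsampled* Gaussian mechanism, which is what recovers the small end-to-end \(\varepsilon\).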

Remark (US 2020 Decennial Census and differential privacy): The US Census Bureau deployed differential privacy for the 2020 Decennial Census — the first official government data product released with formal DP guarantees at scale. The bureau used a top-down algorithm (TopDown Algorithm, TDA) that injects calibrated noise at multiple geographic levels: national, state, county, tract, block group, and block. The total privacy-loss parameter was approximately \(\varepsilon \approx 19.61\) at the national level, with the budget allocated across geographic strata.

The deployment triggered an intense methodological debate. Demographers argued that the noise introduced at the block level — the smallest geographic unit — was large enough to distort small-area statistics critical for congressional redistricting and federal funding allocation. Some researchers found that small minority communities’ population counts were significantly distorted, potentially affecting their political representation. The Census Bureau conducted multiple accuracy assessments and adjusted parameters in response. The episode illustrates a tension that is fundamental to applied DP: the mathematics determines the privacy-accuracy tradeoff for a given \(\varepsilon\), but choosing \(\varepsilon\) is a societal and political decision, not a purely technical one. There is no universally “correct” \(\varepsilon\).

Federated learning trains models on data that never leaves user devices. The server aggregates model updates (gradients), not raw data. Combining federated learning with DP-SGD (and secure aggregation) provides stronger end-to-end privacy guarantees.

The Exponential Mechanism and Report Noisy Max

The Laplace and Gaussian mechanisms apply to numeric queries. For non-numeric outputs — selecting a record from a database, generating a private synthetic dataset, choosing a word from a vocabulary — the exponential mechanism is the canonical approach.

Definition (Exponential mechanism): Let \(q: \mathcal{D} \times \mathcal{R} \to \mathbb{R}\) be a quality score measuring how good output \(r \in \mathcal{R}\) is for dataset \(D\), with sensitivity \(\Delta q = \max_{r} \max_{D \sim D'} |q(D,r) - q(D',r)|\). The exponential mechanism selects output \(r\) with probability proportional to \[ \Pr[M(D) = r] \propto \exp\!\left(\frac{\varepsilon \cdot q(D, r)}{2 \Delta q}\right). \] This achieves \(\varepsilon\)-DP. The expected quality loss (compared to the optimal output \(r^* = \arg\max_r q(D,r)\)) is at most \(\frac{2\Delta q}{\varepsilon} \ln |\mathcal{R}|\).
Example (Differentially private selection of a median): Suppose a dataset contains the ages of \(n\) individuals, all integers in \([1, 100]\). We want to release the approximate median age under \(\varepsilon\)-DP. The output range is \(\mathcal{R} = \{1, 2, \ldots, 100\}\). Define the quality score \(q(D, r) = -\left|\operatorname{rank}_D(r) - n/2\right|\), where \(\operatorname{rank}_D(r)\) is the number of records in \(D\) with value at most \(r\) — the negative distance from \(r\)'s rank to the median rank; the true median achieves quality near 0, and outputs far from the median achieve large negative quality.

The sensitivity is \(\Delta q = 1\) (adding one person changes the rank of any value by at most 1). The exponential mechanism samples \(r\) with probability \(\propto \exp(\varepsilon \cdot q(D,r) / 2)\). For large \(\varepsilon\), probability concentrates sharply around the true median; for small \(\varepsilon\), the distribution spreads. The expected error is \(O(\log(100) / \varepsilon) = O(\ln 100 / \varepsilon) \approx 4.6/\varepsilon\) — logarithmic in the range size, not linear. Compare with the Laplace mechanism applied to the rank index: Laplace noise on a rank would need clipping and has no natural sensitivity-preserving formulation for the median. The exponential mechanism is the principled general tool.
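A direct implementation of this median selector, following the quality score and sensitivity \(\Delta q = 1\) as stated in the example (synthetic ages, uniform in 20–60):

```python
import bisect
import math
import random

random.seed(4)

def exp_mechanism_median(ages, eps):
    """Sample r in {1..100} with probability proportional to exp(eps * q(D, r) / 2),
    where q(D, r) = -|rank_D(r) - n/2| as defined in the text."""
    s = sorted(ages)
    n = len(s)
    domain = range(1, 101)
    weights = [math.exp(eps * -abs(bisect.bisect_right(s, r) - n / 2) / 2)
               for r in domain]
    u = random.random() * sum(weights)
    acc = 0.0
    for r, w in zip(domain, weights):
        acc += w
        if u <= acc:
            return r
    return 100   # guard against floating-point rounding

ages = [random.randint(20, 60) for _ in range(1001)]
true_median = sorted(ages)[500]
samples = [exp_mechanism_median(ages, eps=1.0) for _ in range(200)]
avg_err = sum(abs(r - true_median) for r in samples) / len(samples)
print(true_median, round(avg_err, 2))   # small average error, ~O(ln 100 / eps)
```

Because many people share each age value, the quality score drops steeply away from the median, so the sampled output concentrates tightly even at moderate \(\varepsilon\).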

Remark (Report Noisy Max): A simpler variant for selecting among a small number of options is Report Noisy Max: add independent Laplace noise to the quality score of each option and return the option with the highest noisy score. For \(k\) options each with sensitivity \(\Delta\), this achieves \(\varepsilon\)-DP with noise \(\text{Lap}(\Delta/\varepsilon)\) per option. It is computationally simpler than the exponential mechanism (no need to normalise a potentially large probability table) but limited to settings where \(k\) is small enough to evaluate all options explicitly. Both achieve the same asymptotic accuracy; the exponential mechanism is more general (handles infinite or implicitly defined output spaces).

Local DP Mechanisms in Practice

The most deployed local DP mechanisms are designed for frequency estimation — estimating how often each value in a domain appears across a population.

Definition (RAPPOR — Randomised Aggregatable Privacy-Preserving Ordinal Response): Google's RAPPOR (Erlingsson et al. 2014) estimates the distribution of a categorical variable taking values in a domain of size \(k\). Each client: (1) maps their value \(v\) to a Bloom filter \(B \in \{0,1\}^h\) (a binary vector with certain bits set by hash functions); (2) permanently randomises the Bloom filter into a memoised version \(\hat{B}\): with probability \(f\) each bit is replaced by a uniformly random bit (1 with probability \(f/2\), 0 with probability \(f/2\)) and kept unchanged otherwise; \(\hat{B}\) is stored and reused for all future reports, preventing longitudinal linkage by averaging; (3) instantaneously randomises each report: the transmitted bit is 1 with probability \(q\) if the corresponding bit of \(\hat{B}\) is 1, and 1 with probability \(p\) if it is 0. The server aggregates bit-sum vectors across millions of clients and solves a de-noising linear system to estimate the frequency of each value.

The instantaneous randomisation step alone provides \(\varepsilon\)-LDP per report, with \(\varepsilon\) determined by the flip probabilities \(p\) and \(q\) (in the RAPPOR analysis, \(\varepsilon = h \ln\frac{q(1-p)}{p(1-q)}\) when \(h\) Bloom-filter bits are set); the permanent randomisation provides a separate longitudinal guarantee. RAPPOR was deployed to collect statistics on Chrome’s home page URL, default search engine, and software configuration — data types where direct logging would be considered highly privacy-invasive, but where population-level statistics are valuable for product improvement.
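A stripped-down, Bloom-filter-free variant of this idea (one-hot encoding plus per-bit randomised response, sometimes called "basic RAPPOR") shows the encode/perturb/de-noise pipeline end to end. The domain size, distribution, and \(\varepsilon\) below are hypothetical:

```python
import math
import random

random.seed(5)

def perturb_onehot(value: int, k: int, p_keep: float):
    """One-hot encode, then flip each bit independently with prob 1 - p_keep.
    Per-bit randomised response at eps/2 yields eps-LDP overall, because any
    two one-hot encodings differ in exactly two bits."""
    out = []
    for i in range(k):
        bit = 1 if i == value else 0
        out.append(bit if random.random() < p_keep else 1 - bit)
    return out

def estimate_counts(reports, k, p_keep):
    """Unbiased de-noising: invert E[c_j] = n_j * p_keep + (n - n_j)(1 - p_keep)."""
    n = len(reports)
    return [(sum(r[j] for r in reports) - n * (1 - p_keep)) / (2 * p_keep - 1)
            for j in range(k)]

eps = math.log(9)                                   # so p_keep = 3/4
p_keep = math.exp(eps / 2) / (1 + math.exp(eps / 2))
k, n = 4, 100_000
true = [0.50, 0.30, 0.15, 0.05]                     # hypothetical value frequencies
values = random.choices(range(k), weights=true, k=n)
reports = [perturb_onehot(v, k, p_keep) for v in values]
est = [c / n for c in estimate_counts(reports, k, p_keep)]
print([round(e, 3) for e in est])                   # close to the true frequencies
```

Full RAPPOR adds the Bloom filter (for large domains) and the permanent randomisation layer, but the server-side de-noising follows the same invert-the-expectation principle.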

Adversarial Machine Learning

Models trained on sensitive data are themselves privacy and security threats.

Evasion attacks craft adversarial examples: inputs perturbed imperceptibly (to humans) that cause misclassification. FGSM (Goodfellow et al. 2014): perturb input by \(\eta \cdot \text{sign}(\nabla_x \mathcal{L})\). Carlini-Wagner attack: optimisation-based; finds minimal perturbation achieving target misclassification. Adversarial robustness and clean accuracy are often in tension.
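The FGSM update is a single line once the input gradient is known. The toy sketch below uses a fixed logistic classifier with hypothetical weights, where the input gradient of the logistic loss has the closed form \((\sigma(w \cdot x) - y)\,w\):

```python
import math

# Toy linear classifier with hypothetical fixed weights; predict class 1
# when sigma(w . x) > 0.5.
w = [2.0, -3.0, 1.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def fgsm(x, y, eta):
    """x + eta * sign(grad_x loss). For logistic loss the input gradient is
    (sigma(w.x) - y) * w, so only the residual's sign times w's sign matters."""
    residual = predict(x) - y
    return [xi + eta * math.copysign(1.0, residual * wi) for xi, wi in zip(x, w)]

x = [0.5, -0.2, 0.3]                 # hypothetical input with true label y = 1
x_adv = fgsm(x, y=1, eta=0.4)
print(predict(x), predict(x_adv))    # confidence collapses after the perturbation
```

A bounded per-coordinate nudge in the gradient-sign direction is enough to flip this model's decision — the same phenomenon FGSM exhibits on deep networks.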

Poisoning attacks corrupt the training data. Backdoor attacks embed a trigger (e.g., a specific pixel pattern) such that the model predicts a target class whenever the trigger is present, while behaving normally on clean inputs.

Model extraction: query the model API to reconstruct a functionally equivalent model, stealing intellectual property and enabling white-box attacks against what was a black-box system.

Model inversion: given a model that outputs class probabilities, recover approximate training examples. Fredrikson et al. (2015) reconstructed facial images from a face recognition API.

Membership inference (Shokri et al. 2017): determine whether a specific record was in the training set by exploiting the observation that models tend to have higher confidence on training examples. DP-SGD provides formal membership inference resistance.

Remark (Certified adversarial robustness via randomised smoothing): Adversarial robustness can be made mathematically certifiable — not just empirically evaluated — using randomised smoothing (Cohen, Rosenfeld, Kolter 2019). The idea: given a base classifier \(f\), define a smoothed classifier \(\hat{f}(x) = \arg\max_c \Pr_{\delta \sim \mathcal{N}(0, \sigma^2 I)}[f(x + \delta) = c]\). The smoothed classifier is certifiably robust to \(\ell_2\) perturbations of radius \(r < \sigma \Phi^{-1}(p_A)\), where \(p_A\) is the probability that the top class is selected under Gaussian noise and \(\Phi^{-1}\) is the inverse Gaussian CDF.

This connection to differential privacy is not coincidental: the Gaussian mechanism is the core of both randomised smoothing and DP-SGD. The same noise distribution that provides privacy amplification in DP-SGD provides certified robustness in randomised smoothing. The \((\varepsilon, \delta)\)-DP guarantee of the Gaussian mechanism directly implies a certified robustness radius. This deep connection has spurred research into “privacy as robustness” — models trained with DP-SGD are more robust to adversarial examples and data poisoning as a side effect of the clipping and noising procedures, beyond the formal membership inference guarantee.


Chapter 8: Multi-Party Computation, Homomorphic Encryption, and Surveillance

Secret Sharing

Shamir’s secret sharing (1979) solves the problem of distributing a secret \(s\) among \(n\) parties such that any \(k\) can reconstruct it but any \(k-1\) learn nothing. The construction uses polynomial interpolation over \(\mathbb{F}_p\):

Choose a random polynomial \(f(x) = s + a_1 x + \cdots + a_{k-1} x^{k-1}\) with \(f(0) = s\). Distribute \((i, f(i))\) to party \(i\). Any \(k\) points uniquely determine \(f\) (and hence \(s\)) by Lagrange interpolation; fewer than \(k\) points are statistically independent of \(s\).

Applications include distributed key management (requiring multiple administrators to reconstruct a root key), secure voting, and as a building block in MPC.

Example (Shamir (2,3)-threshold secret sharing step by step): We want to split the secret \(s = 42\) into \(n = 3\) shares such that any \(k = 2\) shares reconstruct \(s\), but any single share reveals nothing. Work over a prime field; for concreteness use \(\mathbb{F}_{101}\) (prime 101, large enough to contain our values).

Choose a random degree-\((k-1) = 1\) polynomial with \(f(0) = s = 42\). Pick the random coefficient \(a_1 = 7\). The polynomial is \(f(x) = 42 + 7x\).

Compute the three shares:

\[ s_1 = f(1) = 42 + 7 = 49, \qquad s_2 = f(2) = 42 + 14 = 56, \qquad s_3 = f(3) = 42 + 21 = 63. \]

Distribute share \(s_i\) to party \(i\). Now reconstruct from any two, say parties 1 and 2 using Lagrange interpolation:

\[ f(0) = s_1 \cdot \frac{0 - 2}{1 - 2} + s_2 \cdot \frac{0 - 1}{2 - 1} = 49 \cdot \frac{-2}{-1} + 56 \cdot \frac{-1}{1} = 49 \times 2 - 56 = 98 - 56 = 42. \checkmark \]

Verify from parties 1 and 3:

\[ f(0) = 49 \cdot \frac{0-3}{1-3} + 63 \cdot \frac{0-1}{3-1} = 49 \cdot \frac{-3}{-2} + 63 \cdot \frac{-1}{2} = \frac{147}{2} - \frac{63}{2} = \frac{84}{2} = 42. \checkmark \]

A single share, say \(s_1 = 49\), reveals nothing: for any claimed secret \(s' \in \{0, 1, \ldots, 100\}\) there exists a polynomial \(g(x) = s' + a_1' x\) with \(g(1) = 49\) (namely \(a_1' = 49 - s' \bmod 101\)), so every secret is equally consistent with the observed share. This is the perfect secrecy of Shamir’s scheme.
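The (2,3) example above can be reproduced in a few lines; a sketch over \(\mathbb{F}_{101}\), with a random coefficient in place of the fixed \(a_1 = 7\):

```python
import random

P = 101   # the prime field from the worked example

def share(s, k, n):
    """Split secret s into n shares, any k of which reconstruct it."""
    coeffs = [s] + [random.randrange(P) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over F_P."""
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = share(42, k=2, n=3)
assert reconstruct(shares[:2]) == 42              # parties 1 and 2
assert reconstruct([shares[0], shares[2]]) == 42  # parties 1 and 3
```

Division in the worked example becomes multiplication by a modular inverse here (`pow(den, -1, P)`, available in Python 3.8+).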

Multi-Party Computation

MPC enables \(n\) parties holding private inputs \(x_1, \ldots, x_n\) to jointly compute a function \(f(x_1, \ldots, x_n)\) while revealing only the output — no party learns another’s input beyond what is implied by the output.

Definition (Yao's garbled circuits): Yao's garbled circuit protocol (1986) is a two-party protocol for securely computing any Boolean circuit. The garbler (Alice) assigns two random cryptographic labels \(k_w^0, k_w^1\) to each wire \(w\) in the circuit, representing the bit values 0 and 1 respectively. For each gate, she constructs a garbled table: a set of four ciphertexts, one for each input combination, where each ciphertext is the double encryption of the appropriate output label: \[ \left\{ \mathsf{Enc}_{k_a^b} \!\left( \mathsf{Enc}_{k_b^{b'}}(k_{\text{out}}^{g(b,b')}) \right) : b, b' \in \{0,1\} \right\}, \] shuffled in random order. The evaluator (Bob) receives the garbled tables, obtains Alice's input wire labels from Alice directly, and obtains his own input wire labels via oblivious transfer (see below). He evaluates gate-by-gate, decrypting exactly one entry per gate — the one consistent with the labels he holds. He learns the output label for each gate but cannot determine which bit value (\(0\) or \(1\)) any label represents.
Example (Garbled AND gate): Alice and Bob want to compute \(a \wedge b\) where Alice holds \(a\) and Bob holds \(b\), without revealing their inputs. Alice assigns labels: for wire \(a\), random strings \(k_a^0, k_a^1\); for wire \(b\), random strings \(k_b^0, k_b^1\); for the output wire, random strings \(k_{\text{out}}^0, k_{\text{out}}^1\). The AND gate truth table is:
  \(a\)   \(b\)   \(a \wedge b\)
   0      0       0
   0      1       0
   1      0       0
   1      1       1

Alice creates four ciphertexts: \(\mathsf{Enc}_{k_a^0, k_b^0}(k_{\text{out}}^0)\), \(\mathsf{Enc}_{k_a^0, k_b^1}(k_{\text{out}}^0)\), \(\mathsf{Enc}_{k_a^1, k_b^0}(k_{\text{out}}^0)\), \(\mathsf{Enc}_{k_a^1, k_b^1}(k_{\text{out}}^1)\), and shuffles them randomly. She sends Bob the shuffled garbled table and her label for her input \(a\) (e.g., if \(a=1\), she sends \(k_a^1\)). Bob uses oblivious transfer to obtain \(k_b^{b}\) — the label for his input \(b\) — without Alice learning \(b\). Bob now tries all four ciphertexts; exactly one decrypts successfully (the one matching the labels he holds), revealing \(k_{\text{out}}^{a \wedge b}\). Alice tells Bob which label corresponds to output 1. Bob learns the output \(a \wedge b\) but cannot determine \(a\); Alice learns nothing about \(b\).
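The garbled AND gate above can be simulated in a few lines using a hash-based double encryption; a toy sketch (the 16-byte labels and the zero-tag validity check are our simplifying assumptions, not part of Yao's original description):

```python
import hashlib, os, random

def H(ka, kb):
    """Hash-based pad for one table row: a toy stand-in for double encryption."""
    return hashlib.sha256(ka + kb).digest()          # 32 bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# two random 16-byte labels per wire, for bit values 0 and 1
labels = {w: (os.urandom(16), os.urandom(16)) for w in ("a", "b", "out")}

# garbler: encrypt (output label || 16 zero bytes); the zero tag marks valid decryption
table = []
for ba in (0, 1):
    for bb in (0, 1):
        plaintext = labels["out"][ba & bb] + bytes(16)
        table.append(xor(H(labels["a"][ba], labels["b"][bb]), plaintext))
random.shuffle(table)

# evaluator holds the labels for a=1, b=1 (Bob's label would arrive via oblivious transfer)
ka, kb = labels["a"][1], labels["b"][1]
candidates = [xor(row, H(ka, kb)) for row in table]
out = next(c[:16] for c in candidates if c[16:] == bytes(16))
assert out == labels["out"][1]    # decodes to AND(1,1) = 1
```

The evaluator sees only opaque labels; without the decoding information (which label means 1), the output label reveals nothing about the bit.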

Oblivious transfer (OT): a protocol where the sender holds two messages \(m_0, m_1\) and the receiver chooses \(b \in \{0,1\}\); the receiver learns \(m_b\) without learning \(m_{1-b}\), and the sender does not learn \(b\). OT is a fundamental building block for MPC.

Security models: semi-honest (honest-but-curious) adversaries follow the protocol but try to infer additional information from their view; malicious adversaries can deviate arbitrarily. Secure MPC against malicious adversaries requires additional mechanisms (e.g., zero-knowledge proofs of correct behaviour), incurring higher overhead.

Example (Oblivious RAM — hiding access patterns): A client stores \(N\) encrypted data blocks on an untrusted cloud server. Even with perfect encryption, the access pattern — which block the client reads or writes at each time step — leaks information. For a medical database, knowing that the client accessed block number 1042 repeatedly might reveal they are monitoring a particular patient's record, even without knowing the content.

Oblivious RAM (ORAM) hides the access pattern by ensuring that the sequence of physical memory accesses observed by the server is computationally indistinguishable from a uniformly random sequence, regardless of the logical access pattern. The key insight of Path ORAM (Stefanov et al. 2013) is to arrange data in a binary tree of \(N\) leaves, map each block to a randomly chosen leaf, and always read and re-encrypt the entire path from root to that leaf on each access — even if only one block on the path is relevant. After each access, the block is remapped to a fresh random leaf and re-encrypted.

Cost: Path ORAM reads and writes \(O(\log N)\) blocks per logical access, giving \(O(\log N)\) communication overhead with a small client-side stash; storing the position map recursively on the server raises this to \(O(\log^2 N)\). This is far cheaper than the naive approach (downloading the entire database). ORAM-style techniques are used in practice by Signal for private contact discovery (clients can look up whether their contacts use Signal without Signal learning which contacts they queried), and as a building block in secure enclave computing (Intel SGX programs use ORAM to prevent cache side-channel attacks).

Remark (Zero-knowledge proofs): A zero-knowledge proof (ZKP) allows a prover to convince a verifier that a statement is true — say, "I know the secret key for this public key" or "this ciphertext decrypts to a value in the range [0, 100]" — without revealing anything beyond the truth of the statement. Formally, a ZKP must be complete (an honest prover always convinces an honest verifier), sound (a dishonest prover cannot convince the verifier of a false statement, except with negligible probability), and zero-knowledge (the verifier's view can be simulated without the secret).

The Schnorr identification protocol is a simple ZKP for discrete log: to prove knowledge of \(x\) with \(h = g^x\), the prover sends \(r = g^v\) (random \(v\)), the verifier challenges with \(c\), and the prover responds with \(s = v - cx \bmod q\). The verifier checks \(g^s h^c = r\). Modern systems use non-interactive ZKPs via the Fiat-Shamir transform (replace the verifier’s random challenge with a hash of the commitment).

Industrial-scale ZK systems go far beyond identification: zk-SNARKs (succinct non-interactive arguments of knowledge) prove arbitrary NP computations in \(O(1)\) proof size, used by the Zcash cryptocurrency (private transactions) and Ethereum layer-2 rollups. zk-STARKs are transparent (no trusted setup) and post-quantum secure, used by StarkNet. Applications: (1) privacy-preserving blockchains; (2) proving a machine learning model was trained on certified data without revealing the training data; (3) age verification — proving “I am over 18” without revealing the birthdate, using a ZKP over the certificate-signed birthdate from a government-issued ID.

Remark (Federated learning and gradient privacy): Federated learning (McMahan et al., Google 2016) was introduced to train the GBoard keyboard next-word prediction model on Android devices without uploading raw keystroke data to Google's servers. The protocol: each participating device downloads the current global model, trains on local data for several steps, and uploads only the gradient update (not the raw data) to the server. The server aggregates updates (e.g., by federated averaging) and updates the global model.

Federated learning reduces privacy risk but does not eliminate it: gradient inversion attacks (Zhu et al. 2019) demonstrated that a curious server can, in some settings, reconstruct training images pixel-by-pixel from a single gradient update. The attack formulates reconstruction as an optimisation problem: find input \(x'\) such that \(\nabla_\theta \mathcal{L}(f_\theta(x'), y') \approx \nabla_\theta \mathcal{L}(f_\theta(x), y)\). On small models and batches of 1–8 images, reconstruction is near-perfect. Defences include larger batch sizes (gradients of many examples cancel), DP-SGD (noising gradients), and secure aggregation (the server only sees the sum of all device gradients, never any individual gradient). Combining federated learning with DP-SGD and secure aggregation gives end-to-end formal privacy guarantees — the architecture deployed by Apple (QuickType keyboard) and Google (GBoard).

Homomorphic Encryption

Homomorphic encryption allows computation on encrypted data. Given encryptions \(\text{Enc}(a)\) and \(\text{Enc}(b)\), a server can compute \(\text{Enc}(f(a, b))\) for some class of functions \(f\), without learning \(a\) or \(b\).

Type              Supported operations          Examples
Partial (PHE)     Addition or multiplication    Paillier (additive), ElGamal (multiplicative)
Somewhat (SHE)    Bounded-depth circuits        BFV, BGV
Fully (FHE)       Any circuit                   Gentry 2009 (ideal lattices); CKKS (approximate reals)
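Paillier's additive homomorphism — multiplying ciphertexts adds plaintexts — can be sketched directly; a toy implementation with small, insecure primes (the parameter choices are ours, for illustration only):

```python
import random
from math import gcd

# toy Paillier with small (insecure) primes -- illustration only
p, q = 999983, 1000003
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)      # lcm(p-1, q-1)
g = n + 1

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)               # decryption constant

def enc(m):
    r = random.randrange(1, n)                    # fresh randomness per ciphertext
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

ca, cb = enc(20), enc(22)
assert dec((ca * cb) % n2) == 42                  # Enc(a) * Enc(b) decrypts to a + b
```

Because the randomness \(r\) is fresh per ciphertext, two encryptions of the same plaintext look unrelated, yet the server can still add encrypted values without ever decrypting.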

Gentry’s FHE construction (2009) was a theoretical breakthrough: he showed that somewhat homomorphic encryption could be made fully homomorphic by bootstrapping — homomorphically evaluating the decryption circuit to refresh the noise. Modern FHE schemes (CKKS for approximate arithmetic on real/complex numbers, BFV/BGV for exact integers) are based on the Learning With Errors (LWE) or Ring-LWE hardness assumption.

Remark (The FHE breakthrough and current state): Craig Gentry's 2009 PhD thesis constructing the first fully homomorphic encryption scheme was described by cryptographers as "solving a 30-year-old open problem" — the problem of whether it was possible to perform arbitrary computations on encrypted data had been open since Rivest, Adleman, and Dertouzos posed it in 1978. Gentry's scheme used ideal lattices and was wildly impractical (billions of times slower than plaintext), but it established feasibility.

The practical landscape has improved substantially. Current FHE schemes target specific workloads: CKKS (Cheon-Kim-Kim-Song, 2017) supports approximate arithmetic over real and complex numbers — well-suited for machine learning inference on encrypted data. BFV and BGV support exact integer arithmetic — useful for private statistics. TFHE enables gate-by-gate bootstrapping, fast for small circuits. Google (FHE transpiler), Microsoft (SEAL), and IBM (HElib) all maintain production-grade FHE libraries. Practical deployments include private contact discovery (computing set intersections without revealing either party’s contact list), private genomic analysis, and outsourced ML inference where the server learns neither the model nor the query. Overhead remains large (100× to 10000× versus plaintext), but specialised hardware accelerators and algorithmic improvements continue to close the gap.

Applications include medical data analysis without exposing patient records to cloud providers, private set intersection, and outsourced ML inference.

Private Information Retrieval

PIR allows a client to retrieve the \(i\)-th element from a database of \(n\) elements without the server learning \(i\). Trivial PIR requires downloading the whole database (communication cost \(O(n)\)).

Information-theoretic PIR with \(k \geq 2\) non-colluding servers achieves polylogarithmic communication (Chor et al. 1995). Computational PIR with a single server uses HE to achieve sublinear communication under cryptographic assumptions. The communication-computation trade-off is an active research area; deployments include private browsing history and private contact discovery (Signal uses PIR-adjacent techniques for contact intersection).

Example (Two-server information-theoretic PIR): Alice wants to retrieve the \(i\)-th bit from a database \(x = (x_1, \ldots, x_n) \in \{0,1\}^n\) held by two non-colluding servers S1 and S2, without either server learning \(i\). The Chor-Goldreich-Kushilevitz-Sudan (1995) two-server PIR works as follows:

Alice picks a uniformly random subset \(S \subseteq [n]\) and sends \(S\) to S1 and \(S \Delta \{i\}\) (the symmetric difference) to S2. Each server responds with the XOR of the bits indexed by the received set: S1 sends \(a_1 = \bigoplus_{j \in S} x_j\) and S2 sends \(a_2 = \bigoplus_{j \in S \Delta \{i\}} x_j\). Alice computes \(a_1 \oplus a_2 = x_i\), the desired bit. Correctness: \(a_1 \oplus a_2 = \bigoplus_{j \in S} x_j \oplus \bigoplus_{j \in S \Delta \{i\}} x_j = x_i\) (the XOR telescopes, leaving only \(x_i\)).

Privacy: S1 sees only \(S\), a uniformly random subset — independent of \(i\). S2 sees \(S \Delta \{i\}\), also a uniformly random subset (because \(S\) was random). Neither server can distinguish this query from one for any other index. Communication cost: \(O(n)\) bits per server (to describe the subsets), which is not better than downloading the whole database. Sublinear PIR requires recursion or algebraic constructions, achieving \(O(n^{1/3})\) communication at the cost of additional servers.
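The two-server scheme above is short enough to run; a sketch with both "servers" simulated in one process:

```python
import random

def pir_query(x, i):
    """Fetch bit x[i] from two simulated non-colluding servers (CGKS-style sketch)."""
    n = len(x)
    S1 = {j for j in range(n) if random.random() < 0.5}   # uniform random subset of [n]
    S2 = S1 ^ {i}                                         # symmetric difference S1 Δ {i}
    a1 = 0
    for j in S1:                                          # server 1's answer
        a1 ^= x[j]
    a2 = 0
    for j in S2:                                          # server 2's answer
        a2 ^= x[j]
    return a1 ^ a2                                        # all shared indices cancel, leaving x[i]

db = [random.randint(0, 1) for _ in range(64)]
assert all(pir_query(db, i) == db[i] for i in range(64))
```

Each server individually sees a uniformly random subset, so neither learns anything about \(i\); correctness is just the XOR telescoping described above.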

The BGW Protocol and Arithmetic Circuits

For multi-party computation over arithmetic operations (addition, multiplication of numbers rather than bits), the BGW protocol (Ben-Or, Goldwasser, Wigderson 1988) provides information-theoretic MPC for \(n\) parties against up to \(t < n/3\) malicious corruptions (or \(t < n/2\) semi-honest).

The protocol computes any arithmetic circuit over a finite field using Shamir secret sharing as the underlying primitive. Addition gates are free: given shares \([a]_i, [b]_i\) of secrets \(a, b\), each party computes \([a+b]_i = [a]_i + [b]_i\) locally (no communication needed, since polynomial addition is linear). Multiplication gates are expensive: multiplying two degree-\(t\) Shamir sharings produces a degree-\(2t\) sharing; reducing back to degree \(t\) requires an interactive resharing step (one round of communication).
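The free addition gate can be checked concretely; a sketch over \(\mathbb{F}_{101}\) (field size, inputs, and party count are our illustrative choices):

```python
import random

P = 101

def share(s, t, n):
    """Degree-t Shamir sharing of s among parties 1..n."""
    coeffs = [s] + [random.randrange(P) for _ in range(t)]
    return [(i, sum(c * pow(i, j, P) for j, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0."""
    out = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % P
                den = den * (xi - xj) % P
        out = (out + yi * num * pow(den, -1, P)) % P
    return out

a_shares = share(20, t=1, n=3)
b_shares = share(22, t=1, n=3)
# addition gate: each party adds its two shares locally -- no communication
sum_shares = [(i, (ya + yb) % P) for (i, ya), (_, yb) in zip(a_shares, b_shares)]
assert reconstruct(sum_shares[:2]) == 42      # any t+1 = 2 parties recover a + b
```

The local sum of shares is itself a valid degree-\(t\) sharing of \(a + b\), which is exactly the linearity the BGW protocol exploits.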

Remark (Practical MPC applications): MPC has transitioned from a theoretical tool to a practical one. Notable deployments: (1) Private auction: the Danish sugar beet auction (2008) used MPC to run a double auction for agricultural contracts, with farmers and buyers submitting bids without any party seeing others' bids. The result: the first real-world commercial MPC deployment. (2) Private set intersection (PSI): the Apple/Google COVID-19 exposure notification protocol has phones exchange rotating cryptographic tokens so that exposure matching happens on-device, without revealing the full list of other devices encountered. PSI also underlies Signal's private contact discovery: a new user can check which of their phone contacts use Signal without revealing their contact list to Signal's servers. (3) Secure genome analysis: medical researchers compute statistics on patients' genomic data across multiple hospitals without pooling the raw data, complying with privacy regulations while still enabling research.

Internet Censorship Architecture

National censorship systems combine multiple techniques:

  • DNS hijacking/poisoning: return false IP addresses for blocked domains
  • IP blocking: block packets to/from specific IP ranges
  • Keyword filtering via DPI: inspect packet payloads for banned strings; discard or reset connections
  • BGP route injection: divert traffic at the routing level

The Great Firewall of China (GFW) actively probes suspected circumvention servers: when an obfs4 bridge is discovered, the GFW initiates a connection and characterises the handshake to identify the protocol. Defences must maintain cryptographic undetectability of the circumvention handshake.

Censorship measurement platforms (OONI — Open Observatory of Network Interference) deploy probes in censored networks to document what is blocked and how.

Surveillance Capitalism and Tracking

Shoshana Zuboff’s concept of surveillance capitalism describes how companies collect behavioural data as a raw material, refine it into predictive products, and sell predictions to advertisers and other buyers — creating incentives to maximise data collection.

Third-party tracking: websites load resources from tracking domains (Google Analytics, Facebook Pixel); these third parties observe browsing behaviour across sites via cookies. Browser fingerprinting identifies users without cookies from the combination of browser version, installed fonts, canvas rendering, and dozens of other attributes; because it is stateless, clearing or blocking cookies does not defeat it. Supercookies use browser storage mechanisms (localStorage, IndexedDB, the HSTS cache) that survive cookie clearing.

GDPR (EU, 2018) and CCPA (California, 2020) mandate consent for data collection, data subject access rights, and the right to erasure. Enforcement has been uneven; GDPR fines have reached hundreds of millions of euros (Meta: €1.2B in 2023).


Chapter 9: Privacy Regulations and Re-identification Research

Privacy Regulation Landscape

The legal framework for data privacy has developed unevenly across jurisdictions, creating a patchwork of obligations that multinational organisations must navigate simultaneously.

GDPR (General Data Protection Regulation), effective May 2018 in the European Union, is the most comprehensive and influential privacy law currently in force. It applies to any organisation that processes personal data of EU residents, regardless of where the organisation is based. Core principles: (1) lawfulness, fairness, and transparency — processing requires a legal basis, most often consent or legitimate interest; (2) purpose limitation — data collected for one purpose may not be repurposed without fresh legal basis; (3) data minimisation — only the minimum data necessary for the stated purpose may be collected; (4) accuracy, storage limitation, integrity and confidentiality. Data subjects hold several rights: the right of access (obtain a copy of their data), the right to rectification (correct inaccurate data), the right to erasure (“right to be forgotten”), the right to portability (receive data in a machine-readable format), and the right to object to automated decision-making. Fines reach up to 4% of annual global turnover or €20M, whichever is greater. The GDPR has prompted significant structural changes: cookie consent banners, data protection officers, privacy impact assessments, and cross-border data transfer mechanisms (Standard Contractual Clauses, now under the EU-US Data Privacy Framework after Schrems II invalidated Privacy Shield in 2020).

CCPA (California Consumer Privacy Act), effective January 2020 (strengthened by CPRA in 2023), grants California residents the right to know what personal information is collected, the right to opt out of the sale of personal information to third parties, the right to deletion, and the right to non-discrimination for exercising these rights. Unlike GDPR, CCPA does not require a legal basis for processing — it operates as an opt-out regime rather than opt-in. CCPA covers businesses with over $25M in annual revenue, or those that buy/sell data on 50,000+ consumers, or derive 50%+ of revenue from selling data.

COPPA (Children’s Online Privacy Protection Act), effective since 2000 in the United States, prohibits collecting personal information from children under 13 without verifiable parental consent. Operators of websites and online services directed at children must post privacy notices, provide parents with access to and the ability to delete their child’s data, and cannot condition participation on disclosure of more information than necessary. The FTC actively enforces COPPA; YouTube was fined $170M in 2019 for collecting data on child users without consent.

Remark (GDPR's right to erasure vs. blockchain immutability): The GDPR's right to erasure ("right to be forgotten") requires controllers to delete an individual's personal data upon request, absent a competing legal basis for retention. This principle collides directly with the architectural properties of public blockchains. A public blockchain — whether Bitcoin, Ethereum, or any permissionless ledger — is designed to be immutable: once a transaction is written and confirmed, no participant can alter or delete it without invalidating the cryptographic chain of hashes for all subsequent blocks. This immutability is not a bug but the central security feature.

If personal data is written to a public blockchain (for example, a user’s name or medical record hash embedded in a transaction), it cannot be “erased” in any GDPR-compliant sense. Proposed mitigations include: (1) storing personal data off-chain and putting only a cryptographic hash on-chain (but the hash itself may constitute personal data under GDPR, particularly if it is linkable to an individual); (2) crypto-shredding — encrypting the data with a key stored off-chain and then deleting the key, rendering the on-chain ciphertext meaningless without technically removing it (regulators have not uniformly accepted this as genuine erasure); (3) using permissioned blockchains with a central administrator who can redact records (which undermines decentralisation). This tension is a fundamental architectural incompatibility, not a solvable engineering problem. Organisations building blockchain-based systems that handle personal data of EU residents must design carefully to avoid storing personal data on-chain in the first place.

Re-identification Research

The field of re-identification research studies the gap between the formal notion of “anonymised data” and the practical reality of what auxiliary information an adversary can exploit.

Example (Sweeney 2000 — uniqueness of (ZIP, gender, birthdate)): Latanya Sweeney's landmark study (2000) demonstrated that 87% of the US population is uniquely identified by the combination of 5-digit ZIP code, gender, and date of birth. The argument is combinatorial: the US has approximately 43,000 distinct ZIP codes. With birth years spanning roughly 90 years (1900–1990), 365 days per year, and 2 genders, the number of possible combinations is \(43{,}000 \times 90 \times 365 \times 2 \approx 2.8 \times 10^9\) — larger than the US population of 280 million at the time. Most combinations are occupied by at most one individual.
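The cell count above can be checked directly:

```python
# 43,000 ZIP codes x 90 birth years x 365 days x 2 genders
zips, birth_years, days, genders = 43_000, 90, 365, 2
cells = zips * birth_years * days * genders
assert cells == 2_825_100_000          # about 2.8e9 demographic cells
assert cells > 10 * 280_000_000        # an order of magnitude above the 2000-era US population
```

With roughly ten times more cells than people, most occupied cells contain exactly one individual, which is the combinatorial heart of the 87% figure.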

Sweeney did not merely compute statistics. She demonstrated the attack concretely: she purchased the Massachusetts Group Insurance Commission hospital discharge dataset, which contained patients’ ZIP codes, birthdates, genders, and medical diagnoses (but not names — the state had “anonymised” the data by removing direct identifiers). She separately purchased the Cambridge, MA voter registration rolls, which contained names, ZIP codes, birthdates, and genders. By joining the two datasets on (ZIP, gender, birthdate), she identified the complete medical record — diagnoses, prescriptions, and physician notes — of the Governor of Massachusetts, William Weld, and mailed it to him.

The immediate policy consequence was the HIPAA (Health Insurance Portability and Accountability Act) Safe Harbour provision, which defines 18 specific data elements that must be removed or generalised before health data can be considered de-identified for purposes of research sharing: name, geographic subdivisions smaller than state, dates more specific than year, phone numbers, fax numbers, email addresses, SSN, medical record numbers, health plan beneficiary numbers, account numbers, certificate and license numbers, VINs, device serial numbers, URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying numbers. The Safe Harbour provision operationalises the lesson: the joint distribution of innocuous-seeming demographic attributes uniquely identifies individuals at scale.

Remark (Netflix Prize de-anonymisation and auxiliary information): The Netflix Prize dataset (2006) released approximately 100 million movie ratings from 480,000 users, with usernames replaced by random identifiers. Netflix described this as "anonymised." Narayanan and Shmatikov (2008) showed that knowing as few as 6 movie ratings — even with approximate dates and small errors in the ratings — is sufficient to uniquely identify a user in the dataset with high probability. Their de-anonymisation algorithm did not rely on any quasi-identifier theory; it used the observation that an individual's pattern of movie ratings is a unique fingerprint, and the public IMDb database provided the auxiliary dataset of self-reported ratings.

The attack required: a target’s IMDb public ratings page (publicly available for users who rate movies on IMDb), matching against the Netflix dataset using a Bayesian approach that tolerated noise. In experiments with 50 users with known IMDb profiles, they correctly identified 68% using only 2 movies with approximate dates, and over 99% with 8 movies. Netflix had originally planned a second data release; the research led them to cancel it. The Federal Trade Commission subsequently cited this work in regulatory guidance, and it became a touchstone for the inadequacy of naive anonymisation.

The lesson extends beyond Netflix. It is now understood that the set of attributes that constitute quasi-identifiers cannot be enumerated in advance, because auxiliary datasets are unbounded — any combination of released attributes might be linkable to some auxiliary source. This is precisely what differential privacy addresses: its guarantee holds regardless of what auxiliary information the adversary possesses, because it bounds the privacy loss in terms of adjacent databases, not in terms of an enumerated quasi-identifier set.


Chapter 10: Zero-Knowledge Proofs

A zero-knowledge proof (ZKP) is an interactive or non-interactive protocol in which a prover convinces a verifier that a statement is true without revealing why it is true — that is, without revealing any witness or secret beyond the truth of the statement itself. ZKPs are among the most important cryptographic primitives, with applications ranging from authentication protocols to blockchain privacy (zk-SNARKs in Zcash and Ethereum rollups).

10.1 Definition and Properties

Definition (Zero-knowledge proof system): A proof system (P, V) for a language L is zero-knowledge if it satisfies three properties:
  • Completeness: if the statement \(x \in L\) is true, an honest prover with witness \(w\) convinces the verifier — \(\Pr[\text{V accepts}] = 1\).
  • Soundness: if \(x \notin L\), no (possibly computationally unbounded) cheating prover convinces the verifier except with negligible probability \(\leq \epsilon\).
  • Zero-knowledge: there exists a probabilistic polynomial-time simulator S such that, for any \(x \in L\), the output of \(S(x)\) is computationally indistinguishable from the transcript of an honest execution (P(x,w), V(x)). Since S has no witness, this formalises that the verifier learns nothing beyond \(x \in L\).

The simulator exists because every transcript can be simulated — any verifier strategy could have generated the same-looking transcript on its own, without learning any secret from the actual interaction. The zero-knowledge property is a rigorous formalisation of “nothing extra is revealed.”

10.2 Schnorr Identification Protocol

The Schnorr protocol (1989) is the prototype interactive ZKP. It proves knowledge of a discrete logarithm:

Statement: I know \(x\) such that \(h = g^x\) in a group \(\mathbb{G}\) of prime order \(q\).

Protocol (Schnorr identification):
  1. Commit: Prover picks random \(r \leftarrow \mathbb{Z}_q\), sends \(R = g^r\) (commitment).
  2. Challenge: Verifier picks random \(c \leftarrow \mathbb{Z}_q\) and sends \(c\).
  3. Response: Prover computes \(s = r + cx \bmod q\) and sends \(s\).
  4. Verify: Verifier accepts iff \(g^s = R \cdot h^c\).
Example (Schnorr over \(\mathbb{Z}_{11}^*\)): Let \(q = 5\) (prime) and \(g = 3\), which generates the order-5 subgroup of \(\mathbb{Z}_{11}^*\) (check: \(3^5 = 243 \equiv 1 \pmod{11}\)). Let the secret be \(x = 3\), so \(h = g^x = 3^3 = 27 \equiv 5 \pmod{11}\). Prover picks \(r = 4\), sends \(R = 3^4 = 81 \equiv 4 \pmod{11}\). Verifier sends challenge \(c = 2\). Prover sends \(s = r + cx = 4 + 2\cdot3 = 10 \equiv 0 \pmod{5}\).

Verify: \(g^s = 3^0 = 1\). And \(R \cdot h^c = 4 \cdot 5^2 = 4 \cdot 25 = 100 \equiv 1 \pmod{11}\). ✓

Security: The commitment \(R = g^r\) uses fresh randomness every session; without knowing \(r\), no eavesdropper can compute \(x\) from \((R, c, s)\) — this would require solving DLOG. The zero-knowledge property holds because the simulator can produce valid-looking transcripts by picking \(s\) and \(c\) first, then computing \(R = g^s/h^c\) — bypassing the need to know \(x\).
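Both the protocol and the simulator argument can be exercised in code; a sketch over a toy order-1019 subgroup of \(\mathbb{Z}_{2039}^*\) (the group parameters are our illustrative choice, far too small for real use):

```python
import secrets

# toy order-q subgroup of Z_p^*: p = 2039 = 2*1019 + 1, g = 4 (illustrative parameters)
p, q, g = 2039, 1019, 4
x = secrets.randbelow(q)              # prover's secret
h = pow(g, x, p)                      # public statement: h = g^x

# 1. commit  2. challenge  3. respond  4. verify
r = secrets.randbelow(q)
R = pow(g, r, p)                      # commitment with fresh randomness
c = secrets.randbelow(q)              # verifier's random challenge
s = (r + c * x) % q                   # response
assert pow(g, s, p) == (R * pow(h, c, p)) % p

# the simulator: pick s and c first, then solve for R = g^s * h^{-c}; no x needed
s2, c2 = secrets.randbelow(q), secrets.randbelow(q)
R2 = (pow(g, s2, p) * pow(h, -c2, p)) % p
assert pow(g, s2, p) == (R2 * pow(h, c2, p)) % p   # an equally valid-looking transcript
```

The last two lines are the zero-knowledge argument in miniature: the simulated transcript \((R_2, c_2, s_2)\) passes the same check without any knowledge of \(x\).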

10.3 Fiat–Shamir Heuristic: Non-Interactive ZKPs

The Fiat–Shamir transform converts an interactive ZKP (Schnorr-type) into a non-interactive proof by replacing the random verifier challenge with a hash of the transcript so far:

Transform (Fiat–Shamir): Replace the verifier's random challenge \(c\) with \(c = H(x, R)\), where \(H\) is a cryptographic hash function modelled as a random oracle. The prover computes \(c\) themselves (no interaction) and sends the triple \((R, c, s)\). The verifier recomputes \(c' = H(x, R)\) and checks \(c' = c\) and \(g^s = R\cdot h^c\).

Fiat–Shamir proofs are digital signatures in disguise: Schnorr + Fiat–Shamir = Schnorr signatures. The security of the Fiat–Shamir transform relies on the random oracle model for \(H\): it is provably secure for Schnorr-like three-move protocols when \(H\) is modelled as a random oracle, but Goldwasser and Kalai (2003) exhibited interactive protocols for which the transform is insecure under any concrete hash function.
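Applying the transform to Schnorr yields a fully non-interactive proof; a sketch over a toy group (the parameters and the hash-input encoding are our illustrative choices):

```python
import hashlib, secrets

# toy order-q subgroup of Z_p^*: p = 2039 = 2*1019 + 1 (illustrative parameters)
p, q, g = 2039, 1019, 4
x = secrets.randbelow(q)              # secret key
h = pow(g, x, p)                      # public statement: h = g^x

def challenge(h, R):
    """c = H(statement, commitment): the hash replaces the verifier's random challenge."""
    return int.from_bytes(hashlib.sha256(f"{h},{R}".encode()).digest(), "big") % q

# prover, alone: commit, derive the challenge by hashing, respond
r = secrets.randbelow(q)
R = pow(g, r, p)
c = challenge(h, R)
s = (r + c * x) % q
proof = (R, s)

# verifier: recompute the challenge and check g^s == R * h^c
R, s = proof
c = challenge(h, R)
assert pow(g, s, p) == (R * pow(h, c, p)) % p
```

Hashing a message into the challenge alongside \(h\) and \(R\) turns this exact construction into a Schnorr signature.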

10.4 zk-SNARKs and Succinct Proofs

A zk-SNARK (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge) is a non-interactive ZKP where the proof size and verification time are both very small (constant or logarithmic in the circuit size), enabling proof of arbitrary NP statements:

Definition (zk-SNARK properties): A zk-SNARK for an NP language L satisfies:
  • Succinctness: proof size is \(O(\lambda)\) or \(O(\text{poly}(\lambda) \log N)\), independent (or nearly so) of the statement size \(N\).
  • Non-interactivity: a single proof string suffices (no rounds).
  • Argument of knowledge: a cheating prover cannot produce an accepting proof without knowing the witness (under cryptographic assumptions).
  • Zero-knowledge: the proof reveals nothing beyond statement validity.

The Groth16 zk-SNARK (2016) produces proofs of 3 group elements regardless of the computation size, verifiable in milliseconds. It powers Zcash’s shielded transactions: a user proves knowledge of a spending key and consistency of transaction amounts without revealing either. Ethereum Layer-2 rollups (zkSync, StarkNet) use zk-SNARKs/STARKs to prove correct execution of thousands of transactions in a single on-chain proof, reducing gas costs by 10–100×.

Remark (Trusted setup problem): Many practical zk-SNARKs (Groth16, Pinocchio) require a **trusted setup**: a one-time ceremony in which random parameters called the **common reference string (CRS)** are generated, and the toxic waste (randomness used) must be securely destroyed. If any party in the ceremony retains the toxic waste, they can produce false proofs — forging arbitrary statements as true. Zcash's Sapling ceremony involved hundreds of participants in a multi-party computation: the CRS is secure as long as at least one participant honestly deleted their toxic waste. STARKs (Scalable Transparent ARguments of Knowledge) avoid this problem entirely by replacing the CRS with a hash function (no trusted setup), at the cost of larger proof sizes (tens of kilobytes vs. hundreds of bytes for SNARKs).

Chapter 11: Side-Channel Attacks

Side-channel attacks exploit information leaked through the physical implementation of a cryptographic algorithm — timing, power consumption, electromagnetic radiation, or acoustic emanations — rather than weaknesses in the algorithm itself. Mathematically perfect algorithms can be completely broken by side-channel attacks on their implementations.

11.1 Timing Attacks

Timing attacks (Kocher 1996) exploit the fact that many implementations take variable time depending on secret data. RSA implemented with a simple square-and-multiply loop that skips the “multiply” step when a key bit is 0 leaks the key bit through the timing difference.

Example (Timing attack on RSA square-and-multiply): In a naïve implementation, computing \(g^d \bmod n\) via square-and-multiply performs a squaring for every bit of \(d\) and an extra multiplication for every 1-bit. If squaring takes 100 ns and multiplication takes 110 ns, a 2048-bit key has expected time \(2048 \times 100 + (\text{Hamming weight of } d) \times 110\) ns. An adversary who can query the implementation many times and measure the total time with sufficient precision can recover the Hamming weight of \(d\), and with more refined analysis, recover individual bits.

Kocher’s original attack (1996) recovered 512-bit and 1024-bit RSA private keys from timing measurements of a smart card, using approximately 300,000 queries. The attack is a statistical estimation problem: for each candidate bit value, the predicted timing for a batch of known plaintexts is computed; the correlation between predicted and measured times reveals the true bit.

The countermeasures are the Montgomery ladder and constant-time implementation: perform a multiplication in every iteration regardless of the key bit (via an always-multiply structure or dummy operations), and ensure that the code executes in constant time by removing all data-dependent branches and memory accesses. Exponent blinding (randomising the exponent per operation) further decorrelates timing from the key. Modern cryptographic libraries (BoringSSL, mbedTLS) implement RSA in constant time; failure to do so has resulted in critical vulnerabilities (Lucky13, ROBOT).
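The leak and the always-multiply idea can be contrasted directly. The sketch below illustrates only the control-flow difference: Python big-integer arithmetic is itself not constant time, so this is a demonstration of the structure, not a deployable constant-time implementation.

```python
def modexp_leaky(g, d, n):
    """Naive square-and-multiply: the multiply runs only for 1-bits of d,
    so total time leaks the Hamming weight of the exponent."""
    acc = 1
    for bit in bin(d)[2:]:
        acc = acc * acc % n
        if bit == "1":                 # data-dependent branch: the timing leak
            acc = acc * g % n
    return acc

def modexp_always(g, d, n):
    """Always-multiply variant: both candidate results are computed every
    iteration and one is selected arithmetically (no data-dependent branch).
    Real constant-time code must also use constant-time arithmetic."""
    acc = 1
    for bit in bin(d)[2:]:
        acc = acc * acc % n
        mult = acc * g % n             # computed unconditionally
        b = int(bit)
        acc = b * mult + (1 - b) * acc # branch-free select on the key bit
    return acc

assert modexp_leaky(7, 13, 101) == pow(7, 13, 101)
assert modexp_always(7, 13, 101) == pow(7, 13, 101)
```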

11.2 Power Analysis Attacks

Simple Power Analysis (SPA) reads a single power trace and identifies operations (squarings vs. multiplications) by visual inspection. Differential Power Analysis (DPA) (Kocher, Jaffe, Jun 1999) uses statistical analysis of many power traces to recover AES keys:

Attack (DPA on AES): Target a single intermediate value in AES, e.g., the output of the first AddRoundKey and SubBytes: \(v = \text{SBox}[p_i \oplus k_i]\) for plaintext byte \(p_i\) and key byte \(k_i\). For each candidate key byte \(k'\), compute the predicted \(v' = \text{SBox}[p_i \oplus k']\) for all traces. Compute the correlation (or difference-of-means) between the Hamming weight of \(v'\) (predicted power consumption model) and the measured power at the time step where \(v\) is computed. The correct \(k'\) produces a high correlation spike; all wrong candidates give near-zero correlation. Recovering all 16 key bytes of AES-128 requires only on the order of a thousand traces.
Remark (Countermeasures against power analysis): The standard countermeasure is **masking**: at the start of each encryption, a random mask \(m\) is XOR'd into every intermediate value. The computation proceeds on masked values; the mask is removed at the output. This breaks the correlation between the sensitive variable \(v\) and the power trace because an adversary's model of \(v\)'s Hamming weight is now decorrelated from the actual masked value \(v \oplus m\). First-order masking (one mask) defeats DPA but is broken by second-order DPA (combining two leakage points). Higher-order masking uses multiple shares, defeating \(d\)-th order DPA but incurring a quadratic area overhead in the number of shares. Side-channel security is now a required criterion in the NIST post-quantum standardization process — all submitted algorithms were required to have efficient masked implementations.
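The DPA recipe above is short enough to simulate. The sketch below uses a randomly generated toy S-box and a Hamming-weight-plus-Gaussian-noise leakage model — both stand-ins chosen for the demonstration, not AES's real S-box or a measured power model — and recovers a key byte from 2,000 simulated traces.

```python
import random

random.seed(1)
SBOX = list(range(256)); random.shuffle(SBOX)    # toy S-box stand-in (not AES's)
HW = [bin(v).count("1") for v in range(256)]     # Hamming-weight leakage model

secret_k = 0x3A
plaintexts = [random.randrange(256) for _ in range(2000)]
# Simulated measurement at the S-box output: HW of the true value plus noise.
traces = [HW[SBOX[p ^ secret_k]] + random.gauss(0, 1.0) for p in plaintexts]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def dpa_recover_key(plaintexts, traces):
    """Correlate each key hypothesis' predicted leakage with the measurements;
    the true key byte produces the dominant correlation spike."""
    return max(range(256),
               key=lambda k: abs(pearson([HW[SBOX[p ^ k]] for p in plaintexts],
                                         traces)))

assert dpa_recover_key(plaintexts, traces) == secret_k
```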

11.3 Spectre and Meltdown: Microarchitectural Side Channels

Modern processors execute instructions speculatively and out-of-order for performance. Meltdown (Lipp et al., 2018) and Spectre (Kocher et al., 2018) are microarchitectural side-channel attacks that exploit this speculation:

Meltdown: The CPU transiently (speculatively) executes memory accesses to kernel memory before the memory-protection fault is delivered, leaving traces in the CPU cache. An attacker reads these traces via a cache timing side channel (Flush+Reload): measure which cache lines are warm to determine which kernel memory address was transiently loaded. This allows any user-space process to read all of kernel memory. Patched by kernel page-table isolation (KPTI), which unmaps the kernel’s page tables during user-space execution at a 5–30% performance cost.

Spectre: Tricks the CPU’s branch predictor into speculatively executing code it should not execute, leaking information from across security boundaries (e.g., from another process or from the hypervisor). Unlike Meltdown (patched in hardware and OS), Spectre variants remain fundamentally difficult to patch because they exploit the inherent CPU optimisation of speculative execution.


Chapter 12: Applied Cryptography Engineering

Cryptographic algorithms are only as secure as their implementations. This chapter surveys common implementation pitfalls and the engineering principles that prevent them.

12.1 Padding Oracle Attacks

A padding oracle attack (Vaudenay 2002) exploits a server that reports whether a decrypted ciphertext has valid PKCS#7 padding — a single bit of information — to decrypt arbitrary ciphertexts.

Example (Padding oracle on AES-CBC): Suppose the last block of a ciphertext decrypts to some unknown plaintext \(P_{n}\). An attacker submits a modified last ciphertext block \(C_{n-1}' = C_{n-1} \oplus \Delta\) (flipping specific bytes of the preceding block, which controls the XOR input to the decryption). For each value of \(\Delta\), the server reports "valid padding" or "invalid padding." When \(\Delta\) is chosen so that the last byte of \(P_n \oplus \Delta\) equals \(\texttt{0x01}\) (valid PKCS#7 padding of length 1), the oracle accepts. This reveals \(P_n[\text{last byte}]\). Repeating for the second-to-last byte (target padding \(\texttt{0x02}\,\texttt{0x02}\)) reveals it, and so on. The full \(n\)-byte block is recovered in \(\leq 256 \times n\) oracle queries — feasible in practice.
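The byte-at-a-time recovery can be made concrete. The sketch below substitutes a toy invertible "block cipher" (per-byte keyed addition) for AES so the CBC and PKCS#7 mechanics run self-contained; the oracle interface, block size, and message are all illustrative assumptions.

```python
import os

BLOCK = 8
KEY = os.urandom(BLOCK)

# Toy invertible "block cipher": per-byte modular addition of the key.
# It only stands in for AES so the CBC/PKCS#7 mechanics are runnable.
def E(b): return bytes((x + k) % 256 for x, k in zip(b, KEY))
def D(b): return bytes((x - k) % 256 for x, k in zip(b, KEY))

def pad(m):
    n = BLOCK - len(m) % BLOCK
    return m + bytes([n]) * n

def cbc_encrypt(m):
    m = pad(m)
    iv = prev = os.urandom(BLOCK)
    out = []
    for i in range(0, len(m), BLOCK):
        prev = E(bytes(a ^ b for a, b in zip(m[i:i + BLOCK], prev)))
        out.append(prev)
    return iv, b"".join(out)

def padding_oracle(prev, ct):
    """The server-side bug: reveals only whether PKCS#7 padding is valid."""
    pt = b""
    for i in range(0, len(ct), BLOCK):
        pt += bytes(a ^ b for a, b in zip(D(ct[i:i + BLOCK]), prev))
        prev = ct[i:i + BLOCK]
    n = pt[-1]
    return 1 <= n <= BLOCK and pt.endswith(bytes([n]) * n)

def attack_block(prev_ct, target):
    """Recover one plaintext block using <= 256 * BLOCK oracle queries."""
    inter = bytearray(BLOCK)            # inter = D(target), recovered right-to-left
    for padlen in range(1, BLOCK + 1):
        pos = BLOCK - padlen
        for guess in range(256):
            forged = bytearray(BLOCK)
            for j in range(pos + 1, BLOCK):
                forged[j] = inter[j] ^ padlen    # force trailing bytes to padlen
            forged[pos] = guess
            if not padding_oracle(bytes(forged), target):
                continue
            if padlen == 1:                      # rule out an accidental 0x02 0x02 hit
                forged[pos - 1] ^= 0xFF
                if not padding_oracle(bytes(forged), target):
                    continue
            inter[pos] = guess ^ padlen
            break
    return bytes(i ^ c for i, c in zip(inter, prev_ct))

iv, ct = cbc_encrypt(b"attack at dawn")
recovered = attack_block(ct[:BLOCK], ct[BLOCK:])
assert recovered == b"t dawn\x02\x02"            # second plaintext block, padding included
```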

POODLE (2014) exploited a padding-oracle variant against SSLv3 CBC; Lucky13 (2013) exploited timing differences in the CBC padding and MAC check of TLS records. The correct fix is Encrypt-then-MAC: compute the MAC over the ciphertext, and verify the MAC in constant time before decrypting. If the MAC fails, reject immediately — the recipient never decrypts, so there is no padding oracle. AEAD modes (AES-GCM) are immune by construction.

12.2 Nonce Reuse and Its Consequences

Many AEAD modes require a unique nonce (number used once) per encryption under a given key. Nonce reuse is catastrophic:

  • AES-GCM nonce reuse: if two messages \(m_1, m_2\) are encrypted under the same (key, nonce) pair in GCM, the XOR of ciphertexts equals the XOR of plaintexts: \(c_1 \oplus c_2 = m_1 \oplus m_2\). Knowing either message immediately reveals the other. Worse, the authentication key \(H\) can be recovered from the two authentication tags, allowing the attacker to forge arbitrary authenticated ciphertexts indefinitely. The same failure mode broke the PS3's code-signing key: Sony reused a constant nonce \(k\) in its ECDSA signatures, the signature analogue of nonce reuse, allowing recovery of the private key. Various TLS session resumption implementations have suffered similar nonce-reuse flaws.

The lesson: nonce management is a primary engineering concern. Use XChaCha20-Poly1305 (extended nonce, safe for random generation even at scale) or AES-SIV (deterministic AEAD, produces identical ciphertexts for identical plaintexts but is nonce-misuse resistant) for contexts where nonce uniqueness cannot be guaranteed.
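The \(c_1 \oplus c_2 = m_1 \oplus m_2\) failure is easy to reproduce with any stream-cipher-style mode. The sketch below builds a toy SHA-256 counter-mode keystream (a stand-in for GCM's CTR core, not real AES-GCM) and shows that reusing a (key, nonce) pair lets one known plaintext decrypt the other ciphertext without the key.

```python
import hashlib

def keystream(key, nonce, n):
    """Toy keystream: SHA-256 in counter mode (illustrative stand-in for CTR)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt(key, nonce, m):
    return bytes(a ^ b for a, b in zip(m, keystream(key, nonce, len(m))))

key, nonce = b"k" * 16, b"n" * 12
m1 = b"transfer $100 to alice"
m2 = b"transfer $999 to mallo"
c1 = encrypt(key, nonce, m1)        # same (key, nonce) used twice: the fatal mistake
c2 = encrypt(key, nonce, m2)

xor = bytes(a ^ b for a, b in zip(c1, c2))
assert xor == bytes(a ^ b for a, b in zip(m1, m2))   # c1 XOR c2 == m1 XOR m2
recovered = bytes(a ^ b for a, b in zip(xor, m1))    # knowing m1 reveals m2, keylessly
assert recovered == m2
```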

12.3 Length Extension Attacks on Hash Functions

MD5, SHA-1, and SHA-2 use the Merkle–Damgård construction: the hash is computed by initialising a state with a fixed IV and repeatedly applying a compression function to successive message blocks. Given \(H(m)\) and \(\text{len}(m)\), an attacker can compute \(H(m \| \text{padding} \| m')\) for any extension \(m'\) without knowing \(m\). This breaks the naive MAC construction \(\text{MAC}(k, m) = H(k \| m)\): knowing the tag on \(m\) allows forging the tag on \(m \| \text{padding} \| m'\).

The fix is HMAC: \(\text{HMAC}(k, m) = H((k \oplus \texttt{opad}) \| H((k \oplus \texttt{ipad}) \| m))\), where \(k\) is first padded to the hash's block length. The double hashing prevents length extension. SHA-3 (Keccak) uses a sponge construction rather than Merkle–Damgård and is immune to length extension by design.
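In Python the contrast is one line each, since the standard library ships HMAC. The naive \(H(k \| m)\) construction below is shown only as the anti-pattern; `hmac.compare_digest` additionally avoids timing leaks during tag comparison. The key and message are illustrative.

```python
import hashlib
import hmac

key, msg = b"secret-key", b"amount=100&to=alice"

# Anti-pattern: H(k || m) on a Merkle-Damgard hash is length-extendable.
naive_tag = hashlib.sha256(key + msg).hexdigest()

# Correct: HMAC's double-hash envelope blocks length extension.
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# Verify in constant time to avoid a timing side channel on the comparison.
ok = hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).hexdigest())
assert ok
assert naive_tag != tag        # the two constructions are unrelated values
```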


Chapter 13: Quantum Cryptography

Classical cryptography relies on computational hardness assumptions: problems like integer factorisation are believed to be hard for classical computers, but Shor’s algorithm solves them in polynomial time on a quantum computer. Quantum cryptography offers two distinct responses to this threat: (1) protocols whose security is based on the laws of physics rather than computational hardness, and (2) post-quantum cryptography (classical algorithms believed to be hard for both classical and quantum computers — already covered in the companion CO 485 notes).

13.1 BB84 Quantum Key Distribution

The BB84 protocol (Bennett and Brassard, 1984) is the first and most studied quantum key distribution (QKD) scheme. It allows two parties, Alice and Bob, to establish a shared secret key with security guaranteed by the laws of quantum mechanics, not computational assumptions.

Protocol (BB84 QKD): Alice sends Bob a sequence of photons, each encoded in one of two bases:
  • Rectilinear basis (+): horizontal polarisation = 0, vertical = 1.
  • Diagonal basis (×): 45° polarisation = 0, 135° = 1.
Alice randomly selects a basis and a bit for each photon. Bob randomly selects a basis to measure each photon. After transmission, they communicate the bases used (publicly, not the bits). They keep only the bits where their bases matched — this is the sifted key. They sacrifice a random subset of the sifted key to estimate the quantum bit error rate (QBER). If QBER exceeds a threshold (~11%), they abort (an eavesdropper is present). Otherwise, they apply error correction and privacy amplification to extract a final secret key.
Example (BB84 trace, 8 photons): Alice's bits: 0,1,1,0,1,0,0,1. Alice's bases: +,×,+,×,+,×,+,+. Bob's bases: ×,×,+,+,+,×,×,+. Matching basis positions: 2, 3, 5, 6, 8 (1-indexed). Sifted key bits: position 2 (Alice: 1, Bob: 1 ✓), position 3 (Alice: 1, Bob: 1 ✓), position 5 (Alice: 1, Bob: 1 ✓), position 6 (Alice: 0, Bob: 0 ✓), position 8 (Alice: 1, Bob: 1 ✓). Sifted key: 11101 (assuming an ideal channel with no eavesdropper or errors).
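The sifting step scales directly to a simulation. The sketch below models an ideal, noiseless channel with no eavesdropper (the same assumption as the trace above): on matching bases Bob reads Alice's bit exactly, otherwise his outcome is uniformly random, and roughly half the positions survive sifting.

```python
import random

def bb84_sift(n, seed=0):
    """Simulate BB84 sifting over an ideal (noiseless, eavesdropper-free) channel."""
    rng = random.Random(seed)
    alice_bits  = [rng.randrange(2) for _ in range(n)]
    alice_bases = [rng.choice("+x") for _ in range(n)]
    bob_bases   = [rng.choice("+x") for _ in range(n)]
    # Matching basis -> Bob measures Alice's bit; mismatched -> random outcome.
    bob_bits = [a if ab == bb else rng.randrange(2)
                for a, ab, bb in zip(alice_bits, alice_bases, bob_bases)]
    # Bases are compared publicly; only matching positions are kept.
    return [(a, b) for a, b, ab, bb in
            zip(alice_bits, bob_bits, alice_bases, bob_bases) if ab == bb]

sifted = bb84_sift(1000)
assert all(a == b for a, b in sifted)   # noiseless channel: QBER = 0
assert 350 < len(sifted) < 650          # about half the positions survive sifting
```

Adding an intercept-and-resend Eve to this simulation (measure in a random basis, resend) would raise the disagreement rate in the sifted key to about 25%, which is what the QBER check detects.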

13.2 Security of BB84

The security of BB84 follows from two principles of quantum mechanics:

  1. No-cloning theorem: an unknown quantum state cannot be copied. An eavesdropper Eve cannot clone the photon, measure it, and forward the original — she must interact with the photon and thereby disturb it.

  2. Measurement disturbs state: measuring a photon in the wrong basis produces a random result and collapses the state. Eve's interception introduces errors in Bob's measurements: if Eve intercepts every photon (measuring in a random basis and resending), she introduces a 25% QBER in the sifted key. The 11% abort threshold therefore detects Eve even if she intercepts only about 44% of photons (0.44 × 25% ≈ 11%). The information-theoretic security proofs (Lo and Chau 1999; Shor and Preskill 2000) show that BB84 is unconditionally (composably) secure against arbitrary quantum eavesdropping, with extractable key length depending only on the observed QBER.

Remark (Practical QKD limitations): BB84's information-theoretic security is compelling in theory but faces severe practical challenges:
  • Photon number splitting attacks: practical laser sources emit multi-photon pulses with some probability. Eve can split off one photon from a multi-photon pulse and store it until the basis is announced, then measure in the correct basis. This bypasses the no-cloning argument. The decoy-state method (Hwang 2003; Lo et al. 2005) uses random pulse intensities to detect photon-splitting attacks.
  • Distance limitation: photon loss in optical fibre grows exponentially with distance (~0.2 dB/km for telecom fibre). Commercial QKD systems (Toshiba, ID Quantique) operate up to ~100 km without quantum repeaters; the Chinese Micius satellite demonstrated 1,200 km satellite-to-ground QKD in 2017.
  • Side channels in the hardware: QKD implementations have been broken by side-channel attacks on the detectors (detector blinding attacks, time-shift attacks). The device-independent QKD (DI-QKD) model closes these loopholes but requires near-perfect Bell inequality violation, currently impractical.

Chapter 14: Threat Modelling and Security Engineering

Cryptographic primitives are only components of secure systems. This chapter discusses how to reason about security at the system level: identifying what needs to be protected, from whom, and how.

14.1 STRIDE Threat Modelling

The STRIDE model (Microsoft, Shostack 2014) systematically enumerates threats along six dimensions:

Definition (STRIDE threat categories):
  • Spoofing identity: an attacker impersonates a legitimate user or system. Countermeasure: strong authentication (passwords, 2FA, mutual TLS).
  • Tampering with data: unauthorised modification of data in transit or at rest. Countermeasure: integrity checks (HMAC, digital signatures, filesystem integrity monitoring).
  • Repudiation: a user or system denies performing an action. Countermeasure: audit logs, digital signatures (non-repudiation).
  • Information disclosure: confidential data is exposed. Countermeasure: encryption, access control, data classification.
  • Denial of service: a system is made unavailable. Countermeasure: rate limiting, load balancing, DDoS mitigation.
  • Elevation of privilege: attacker gains higher access than authorised. Countermeasure: least privilege, privilege separation, sandboxing.

14.2 Attack Trees and Kill Chains

An attack tree decomposes a top-level security goal into subgoals using AND/OR decomposition. The root is the attacker’s goal; leaves are primitive attacks. AND nodes require all children; OR nodes require at least one. Attack trees enable systematic enumeration of attack paths and assignment of cost, probability, or detection difficulty to each leaf.

The cyber kill chain (Lockheed Martin 2011) models advanced persistent threats as a sequence of stages: Reconnaissance → Weaponisation → Delivery → Exploitation → Installation → Command and Control → Actions on Objectives. Breaking the kill chain at any stage (e.g., detecting the C&C beaconing before exfiltration) defeats the attack. This stage-based model underlies modern security operations centre (SOC) playbooks and SIEM (Security Information and Event Management) alerting logic.

14.3 Defence in Depth

Defence in depth is the security architecture principle of deploying multiple independent layers of security controls so that the failure of any single control does not result in total compromise. Each layer assumes the outer layers may have already failed:

  1. Network layer: firewalls, network segmentation, IDS/IPS.
  2. Host layer: endpoint detection and response (EDR), host firewalls, OS hardening, full-disk encryption.
  3. Application layer: input validation, output encoding (preventing XSS, SQL injection), CSRF tokens, security headers.
  4. Identity layer: MFA, privileged access management (PAM), just-in-time access.
  5. Data layer: encryption at rest, tokenisation of sensitive fields (e.g., PAN data), data loss prevention (DLP).
  6. Process layer: security awareness training, patch management, incident response planning, penetration testing.
Remark (The economics of security: attacker vs. defender asymmetry): A fundamental asymmetry shapes all security engineering: the defender must protect against all attack paths simultaneously, while the attacker need only find one. This means the cost ratio between defence and offence is inherently unfavourable for defenders. Security economics (Ross Anderson, Varian) studies how this asymmetry leads to systematic underinvestment in security: firms bear the cost of a breach, but so do customers and society, creating externalities that the market does not price correctly. The GDPR's substantial fines (up to 4% of global turnover) attempt to internalise these externalities — forcing firms to treat security investment as a cost of business rather than optional. The Gordon–Loeb model shows that the optimal security investment never exceeds about \(1/e \approx 37\%\) of the expected loss from a breach; most firms spend well below even this ceiling, in part because expected losses are so difficult to measure.

14.4 Common Vulnerability Patterns

SQL Injection (SQLi): User-supplied input is concatenated directly into an SQL query. The attacker provides input that alters the query structure, extracting the entire database or bypassing authentication. Prevention: parameterised queries (prepared statements) in which user input is always treated as data, never as SQL syntax.
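The root cause and its fix are both visible in a few lines of `sqlite3`. The vulnerable login below concatenates input into the query string, so input becomes SQL syntax; the safe version passes the same input as bound parameters. The table layout and credentials are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, pw TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

def login_vulnerable(name, pw):
    # String concatenation: user input is interpreted as SQL syntax.
    q = f"SELECT * FROM users WHERE name = '{name}' AND pw = '{pw}'"
    return db.execute(q).fetchone() is not None

def login_safe(name, pw):
    # Parameterised query: user input is always treated as data.
    q = "SELECT * FROM users WHERE name = ? AND pw = ?"
    return db.execute(q, (name, pw)).fetchone() is not None

# Classic authentication-bypass payload: turns the WHERE clause into
#   (name='alice' AND pw='') OR '1'='1'
assert login_vulnerable("alice", "' OR '1'='1")      # attacker gets in
assert not login_safe("alice", "' OR '1'='1")        # treated as a literal password
```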

Cross-site scripting (XSS): A web application echoes user input in a page without sanitisation; an attacker injects <script> tags that execute in victims’ browsers. Consequences: cookie theft (session hijacking), defacement, phishing. Prevention: context-sensitive output encoding (HTML encode <, >, &, " before rendering in HTML context) and Content Security Policy (CSP) headers.

Insecure deserialisation: Applications deserialise user-controlled data using language-native object serialisation (Java, PHP, Python pickle). Attackers craft malicious payloads that, when deserialised, execute arbitrary code via gadget chains in the application’s class path. Prevention: never deserialise from untrusted sources; use data-only formats (JSON, Protocol Buffers) instead of object serialisation.

SSRF (Server-Side Request Forgery): An application makes HTTP requests to URLs controlled by the user, allowing an attacker to probe internal network services (AWS metadata endpoint at 169.254.169.254, Redis, internal admin APIs) that are inaccessible from the public internet. The Capital One breach (2019) exploited an SSRF in a WAF to extract AWS IAM credentials from the instance metadata endpoint, compromising 100 million customer records.

These four vulnerability classes — SQLi, XSS, insecure deserialisation, SSRF — recur in the OWASP Top 10 across every year of the list. Understanding their root cause (failure to separate code from data) is more valuable than memorising specific payloads.


Chapter 15: Formal Security Proofs and Reduction-Based Cryptography

Modern cryptography is a mathematical discipline: every claimed security result is a proof. A security proof is a reduction demonstrating that breaking the proposed scheme is at least as hard as solving a well-studied problem (RSA, CDH, LWE). If no efficient algorithm for the hard problem is known, the reduction implies no efficient attack on the scheme.

15.1 The Security Game Framework

Security properties are defined via security games between a challenger \(\mathcal{C}\) and an adversary \(\mathcal{A}\):

Definition (IND-CPA security game for symmetric encryption): The IND-CPA game proceeds as follows:
  1. Setup: \(\mathcal{C}\) generates a key \(k \leftarrow \text{Gen}(1^\lambda)\).
  2. Learning phase: \(\mathcal{A}\) may adaptively query \(\mathcal{C}\) for encryptions of chosen plaintexts (unlimited queries).
  3. Challenge: \(\mathcal{A}\) submits two equal-length messages \(m_0, m_1\). \(\mathcal{C}\) picks \(b \leftarrow \{0,1\}\) uniformly and returns \(c^* = \text{Enc}_k(m_b)\).
  4. Guess: \(\mathcal{A}\) outputs \(b'\). \(\mathcal{A}\) wins if \(b' = b\).
The scheme is IND-CPA secure if for all polynomial-time \(\mathcal{A}\), \(\Pr[\mathcal{A} \text{ wins}] \leq 1/2 + \text{negl}(\lambda)\). IND-CPA captures the intuition that ciphertexts reveal no information about plaintexts even to an adversary who can choose many plaintexts to encrypt.
Remark (ECB mode is not IND-CPA secure): ECB (Electronic Codebook) mode encrypts each block independently with the same key. It trivially fails IND-CPA: the adversary submits \(m_0 = \texttt{AA}\) (two identical blocks) and \(m_1 = \texttt{AB}\) (two distinct blocks). The ECB ciphertext of \(m_0\) has two identical ciphertext blocks (\(c = c \| c\)), while the ciphertext of \(m_1\) has two distinct blocks. The adversary wins with probability 1. This is precisely the "ECB penguin" — a bitmap image encrypted block-by-block in ECB reveals the outlines of the image in the ciphertext because identical pixel blocks produce identical ciphertext blocks. No modern, correctly designed scheme uses ECB.
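The winning adversary strategy is trivially mechanisable. The sketch below stands in for AES with a keyed SHA-256 truncation — deterministic and key-dependent, but encrypt-only and not a real cipher — which is enough to exhibit ECB's equal-blocks leak.

```python
import hashlib

KEY = b"k" * 16
BLOCK = 16

def toy_block_encrypt(block):
    """Stand-in for AES (encrypt-only, not invertible): deterministic and
    key-dependent, sufficient to show ECB's defining flaw."""
    return hashlib.sha256(KEY + block).digest()[:BLOCK]

def ecb_encrypt(m):
    # ECB: each block encrypted independently; equal blocks -> equal ciphertexts.
    return b"".join(toy_block_encrypt(m[i:i + BLOCK])
                    for i in range(0, len(m), BLOCK))

m0 = b"A" * 16 + b"A" * 16     # two identical blocks
m1 = b"A" * 16 + b"B" * 16     # two distinct blocks

c0, c1 = ecb_encrypt(m0), ecb_encrypt(m1)
# The IND-CPA adversary wins with probability 1 by checking block equality:
assert c0[:16] == c0[16:]
assert c1[:16] != c1[16:]
```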

15.2 Reduction-Based Security

A security reduction shows that any efficient adversary breaking the scheme can be converted into an efficient algorithm solving the underlying hard problem:

Theorem (IND-CPA security of AES-CTR from pseudorandom permutation): Assume AES is a pseudorandom permutation (PRP) family — i.e., for any polynomial-time distinguisher \(D\), \(\Pr[D^{AES_k} = 1] - \Pr[D^{\pi} = 1] \leq \text{negl}(\lambda)\) where \(\pi\) is a truly random permutation. Then AES-CTR mode (with a uniformly random nonce \(N\)) is IND-CPA secure.
Proof sketch (reduction): Suppose there exists a polynomial-time adversary \(\mathcal{A}\) that wins the IND-CPA game against AES-CTR with non-negligible advantage \(\epsilon\). We construct a distinguisher \(D\) for the PRP assumption:

\(D\) receives oracle access to either AES\(_k\) (for a random key \(k\)) or a truly random permutation \(\pi\). \(D\) simulates the IND-CPA challenger for \(\mathcal{A}\): for each encryption query \(m\), pick a fresh nonce \(N\) and compute the keystream by querying the oracle on \((N, 1), (N, 2), \ldots\); XOR with \(m\). When \(\mathcal{A}\) submits \(m_0, m_1\), encrypt \(m_b\) the same way. Feed the ciphertext to \(\mathcal{A}\) and output \(1\) iff \(\mathcal{A}\) guesses correctly.

If the oracle is AES\(_k\), this is a perfect simulation of the IND-CPA game, so \(D\) outputs 1 with probability \(1/2 + \epsilon\). If the oracle is a random permutation \(\pi\), the keystream is indistinguishable from uniformly random (nonces are fresh, and a random permutation's outputs differ from uniform only by the negligible PRP–PRF switching loss), so the ciphertext is effectively a one-time pad and \(\mathcal{A}\) wins with probability \(1/2\) up to a negligible term. Hence \(D\)'s advantage is essentially \(\epsilon\), contradicting the PRP assumption if \(\epsilon\) is non-negligible.

This template — simulate the scheme’s challenger using the hard-problem oracle, extract the adversary’s advantage as the distinguisher’s advantage — underlies nearly every security proof in modern symmetric and public-key cryptography.

15.3 The Random Oracle Model

Many practical constructions (Fiat–Shamir, OAEP padding for RSA, PSS signatures) are proved secure in the random oracle model (ROM): all hash function calls are modelled as calls to a truly random function (an oracle) that returns a uniformly random output for each new query and the same output for repeated queries. The ROM proof strategy:

  1. Model \(H\) as a random oracle accessible to all parties (including adversary).
  2. Program the random oracle to embed the hard problem into the proof.
  3. Show that any adversary winning the security game must either solve the hard problem or detect that the oracle was “programmed” (impossible if the programming is indistinguishable from truly random).

ROM proofs are not unconditionally sound (Canetti, Goldreich, Halevi 1998 showed that some ROM-secure schemes are insecure when \(H\) is instantiated by any concrete function), but they are widely accepted as evidence of security in practice, and no concrete ROM-based scheme has been broken via this theoretical gap.


Chapter 16: Post-Quantum Transition

The standardisation of post-quantum cryptographic algorithms by NIST in 2024 marks the beginning of a long transition period. This chapter surveys the migration challenges and the current state of post-quantum deployment.

16.1 NIST Post-Quantum Standards

In August 2024, NIST finalized three primary post-quantum standards:

Post-quantum algorithms (NIST PQC Standards 2024):
  • ML-KEM (Kyber, FIPS 203): a key encapsulation mechanism based on Module-LWE. Replaces ECDH for key agreement. Parameters: Kyber-768 provides 128-bit post-quantum security with 1184-byte public keys and 1088-byte ciphertexts — much larger than a 32-byte EC public key but practical for TLS.
  • ML-DSA (Dilithium, FIPS 204): a digital signature scheme based on Module-LWE/SIS. Signatures are 2420–4595 bytes depending on parameter set. Already deployed in Linux kernel signing experiments and Let's Encrypt test deployments.
  • SLH-DSA (SPHINCS+, FIPS 205): a hash-based signature scheme relying only on the collision resistance of a hash function. Extremely conservative (security depends only on hash functions), but large signatures (8–50 KB). Suitable for firmware signing where signature size is not a bottleneck.

16.2 Hybrid Key Exchange

The TLS 1.3 working group has standardised hybrid key exchange as an extension to TLS 1.3 (RFC 8446): the shared secret is derived from the concatenation of a classical ECDH secret and a post-quantum KEM secret, providing security against both classical and quantum adversaries. Even if one of the two algorithms is later found to be broken (either classically or quantumly), the session is still protected by the other. Google and Cloudflare have been deploying hybrid X25519 + Kyber-768 key exchange in Chrome and at TLS endpoints since 2023 in preparation for the post-quantum transition.
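The combination rule can be sketched as "concatenate, then extract-and-expand". The code below hand-rolls an HKDF from the standard `hmac` module and uses random byte strings in place of the real X25519 and ML-KEM outputs; the label and lengths are illustrative, not the actual TLS 1.3 key-schedule inputs.

```python
import hashlib
import hmac
import os

def hkdf_extract(salt, ikm):
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk, info, n):
    out, t, i = b"", b"", 1
    while len(out) < n:
        t = hmac.new(prk, t + info + bytes([i]), hashlib.sha256).digest()
        out += t
        i += 1
    return out[:n]

# Stand-ins for the two components' shared secrets (in TLS these come from
# the actual X25519 and ML-KEM computations):
ecdh_secret = os.urandom(32)
kem_secret  = os.urandom(32)

# Hybrid rule: concatenate both shared secrets, then derive keys from the whole.
# If either component remains unbroken, its secret still randomises the output.
master = hkdf_expand(hkdf_extract(b"\x00" * 32, ecdh_secret + kem_secret),
                     b"hybrid handshake demo", 32)
assert len(master) == 32
```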

Remark (Harvest-now, decrypt-later attacks): A critical threat motivating urgent migration is the harvest now, decrypt later attack: an adversary with sufficient storage capabilities is today recording TLS-encrypted traffic in anticipation of a future cryptographically relevant quantum computer capable of breaking ECDH. Sensitive data encrypted today (classified government communications, medical records, financial transactions) may still be sensitive in 10–20 years, by which time a sufficiently large quantum computer may exist. The only defence against this retroactive decryption threat is migrating to post-quantum key exchange before the quantum computer exists — not after. NIST and NSA have published migration timelines recommending transition by 2030, with all legacy algorithms deprecated by 2035.

16.3 Migration Challenges

Migrating public-key infrastructure to post-quantum algorithms is a decade-long engineering project with several specific challenges:

  1. Key and signature size: RSA-2048 public keys are 256 bytes; Kyber-768 public keys are 1184 bytes. In constrained environments (IoT devices, embedded systems with limited RAM/flash), this may require hardware upgrades.

  2. TLS handshake overhead: Larger key exchange messages increase TLS handshake sizes. For mobile devices on high-latency connections, this increases connection setup latency measurably. Kyber has been designed with this in mind (it is fast and compact for a post-quantum KEM), but Dilithium signatures are significantly larger than ECDSA signatures.

  3. Certificate chains: X.509 certificates in the public PKI chain public keys and signatures. A full certificate chain (root CA → intermediate CA → leaf) with Dilithium keys and signatures would run to roughly 10 KB or more, compared to a few kilobytes for the current RSA/ECDSA chain. This affects every HTTPS connection.

  4. Cryptographic agility: Systems must be designed for cryptographic agility — the ability to swap algorithms without major re-engineering — since post-quantum algorithms are newer and may yet have vulnerabilities discovered. The Rainbow signature scheme, a NIST finalist, was broken by a single researcher in a weekend in 2022; the SIKE (isogeny-based) KEM was broken classically in 2022 in an hour on a laptop. Hybrid approaches and agile architectures protect against such surprises.


Chapter 17: Access Control and Authentication

Cryptographic mechanisms secure the channel and authenticate message integrity, but access control determines which authenticated principals may access which resources. Access control and authentication are complementary layers of the security stack.

17.1 Authentication Factors and Multi-Factor Authentication

Authentication proves identity via one or more factors:

  • Knowledge factor (something you know): passwords, PINs, security questions.
  • Possession factor (something you have): TOTP tokens (Google Authenticator), FIDO2 hardware keys (YubiKey), smart cards.
  • Inherence factor (something you are): biometrics (fingerprint, face recognition, voice).

TOTP (Time-based One-Time Password, RFC 6238): an HMAC-based OTP computed from the current Unix time divided into 30-second windows: \(\text{OTP} = \text{HOTP}(K, \lfloor t/30 \rfloor)\), where \(\text{HOTP}(K, c) = \text{Truncate}(\text{HMAC-SHA1}(K, c))\). The 6-digit OTP is valid for one 30-second window. TOTP requires no network communication, but it resists phishing only insofar as users verify the domain before entering the code.
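The formula above is directly implementable with the standard `hmac` module. The sketch below follows the RFC 4226 dynamic-truncation step and checks itself against the published RFC 6238 SHA-1 test vector (key "12345678901234567890", T = 59 s, 8 digits).

```python
import hashlib
import hmac
import struct
import time

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 of the counter, dynamic truncation to N digits."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # low nibble selects a 4-byte window
    code = (int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)

def totp(key, t=None, step=30):
    """RFC 6238 TOTP: HOTP over the current 30-second time window."""
    t = time.time() if t is None else t
    return hotp(key, int(t // step))

# RFC 6238 Appendix B test vector (SHA-1, 8 digits): T = 59 s -> "94287082"
assert hotp(b"12345678901234567890", 59 // 30, digits=8) == "94287082"
```

A verifier typically also accepts the adjacent time windows (\(\pm 1\) step) to tolerate clock skew.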

FIDO2 / WebAuthn: eliminates shared secrets entirely. During registration, the authenticator generates a keypair; the public key is stored at the server, the private key stays on the device. During authentication, the authenticator signs a challenge (including the origin domain) with the private key. This is phishing-resistant because the signed challenge includes the actual domain — a phishing site with a lookalike domain receives a signature over the wrong origin and cannot use it for the real site.

Remark (Phishing resistance of different MFA methods): Not all MFA is equally secure against phishing. SMS OTP (used by many banks) is not phishing-resistant: a real-time phishing proxy forwards the victim's credentials to the real site, triggers an SMS OTP, prompts the victim for it, and forwards it before expiry — the attacker authenticates as the victim. TOTP is similarly vulnerable to real-time proxying. FIDO2 is the only commonly deployed MFA method that is phishing-resistant by design: the signature covers the origin domain, so even a live phishing proxy cannot extract a valid authenticator response. This is why CISA (US Cybersecurity and Infrastructure Security Agency) recommends migrating high-value accounts to FIDO2 hardware keys.

17.2 Access Control Models

Definition (Access control matrix): The access control matrix \(M\) has rows indexed by subjects (users, processes) and columns by objects (files, resources). Entry \(M[s, o]\) lists the permissions subject \(s\) holds on object \(o\) (e.g., read, write, execute). In practice, the full matrix is stored either as:
  • ACL (Access Control List): per-object column of (subject, permissions) pairs. Unix file permissions and NTFS ACLs use this model. Efficiently answers "who can access this file?"
  • Capability list: per-subject row of (object, permissions) pairs. Capability-based systems (Capsicum, seL4) use this model. Efficiently answers "what can this process access?"

Mandatory Access Control (MAC): the OS enforces security labels — subjects and objects are assigned classification levels (Top Secret > Secret > Confidential > Unclassified). The Bell-LaPadula model enforces: no read up (a subject cannot read objects at a higher classification than their clearance) and no write down (a subject cannot write to objects at a lower classification, preventing leaks of high-classification data to lower levels). MAC is used in military and government systems (SELinux, AppArmor implement MAC for Linux).

Role-Based Access Control (RBAC): permissions are assigned to roles, and users are assigned to roles. RBAC simplifies administration in large organisations: a new employee is assigned the appropriate role and immediately inherits the correct permissions, without the administrator needing to enumerate individual permissions. The principle of least privilege — granting only the minimum permissions needed for a task — is more easily enforced with RBAC than with arbitrary ACL entries.
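The indirection through roles can be sketched in a few lines (hypothetical roles and permissions for illustration):

```python
# RBAC sketch: users inherit the union of their roles' permissions, so
# onboarding a new employee is one role assignment, not many ACL edits.
role_perms = {
    "engineer": {"repo:read", "repo:write"},
    "auditor": {"repo:read", "logs:read"},
}
user_roles = {"dana": {"engineer"}, "raj": {"engineer", "auditor"}}

def permissions(user):
    return set().union(*(role_perms[r] for r in user_roles[user]))

def allowed(user, perm):
    return perm in permissions(user)
```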

17.3 PKI and Certificate Chains

Public Key Infrastructure (PKI) solves the key distribution problem for public-key cryptography at scale: how does Alice know that the public key \(pk_B\) actually belongs to Bob? PKI answers this via a chain of trust:

  1. A Certificate Authority (CA) digitally signs a certificate binding a domain name to a public key: \(\text{Cert} = (\text{domain}, pk, \text{validity}, \ldots, \text{Sig}_{CA}(\text{domain}, pk, \ldots))\).
  2. Browsers trust a fixed set of root CAs (embedded in the OS/browser trust store; approximately 100–150 root CAs are trusted by default in Chrome and Firefox).
  3. Root CAs sign intermediate CA certificates; intermediates sign leaf certificates. The chain from leaf to root establishes trust.
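The leaf-to-root validation walk can be sketched with textbook RSA over tiny hand-picked primes (illustration only; real chains use X.509 encoding, signature padding, expiry, and name constraints, all omitted here):

```python
import hashlib

# Toy certificate chain: root CA signs the intermediate's key, the
# intermediate signs the leaf. Tiny insecure primes, demo only.
E = 65537

def keypair(p, q):
    n = p * q
    return (n, E), (n, pow(E, -1, (p - 1) * (q - 1)))   # (public, private)

def h(msg: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(sk, msg):
    n, d = sk
    return pow(h(msg, n), d, n)

def verify(pk, msg, sig):
    n, e = pk
    return pow(sig, e, n) == h(msg, n)

def issue(issuer_sk, subject, subject_pk):
    """A 'certificate': subject name, subject key, issuer's signature."""
    body = f"{subject}|{subject_pk}".encode()
    return (subject, subject_pk, sign(issuer_sk, body))

def validate_chain(chain, trusted_root_pk):
    # Walk leaf -> root: cert i must verify under the key in cert i+1,
    # and the last cert must verify under a trusted root key.
    issuer_keys = [pk for _, pk, _ in chain[1:]] + [trusted_root_pk]
    for (subject, pk, sig), issuer_pk in zip(chain, issuer_keys):
        if not verify(issuer_pk, f"{subject}|{pk}".encode(), sig):
            return False
    return True

root_pk, root_sk = keypair(999983, 1000003)   # trust-store anchor
int_pk, int_sk = keypair(1000033, 1000037)    # intermediate CA
int_cert = issue(root_sk, "Example Intermediate CA", int_pk)
leaf_cert = issue(int_sk, "www.example.com", "leaf-public-key-placeholder")
```

Tampering with any field of any certificate in the chain breaks the corresponding signature check, so trust in the root propagates down to the leaf.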

Certificate revocation is the Achilles’ heel of PKI: when a private key is compromised, the associated certificate must be revoked. OCSP (Online Certificate Status Protocol) allows real-time revocation queries, but most browsers use OCSP stapling (the server includes a signed OCSP response in the TLS handshake) or CRLsets (pre-downloaded revocation lists). The 2011 DigiNotar breach — a Dutch CA was compromised, and attackers issued fraudulent certificates for google.com, used to MITM Iranian users — led to DigiNotar being removed from all trust stores and its bankruptcy within weeks. Certificate Transparency (CT, RFC 6962) addresses this systemically: all CAs are required to log every certificate they issue to public, append-only Merkle-tree logs; browsers require SCTs (Signed Certificate Timestamps) proving log inclusion before accepting a certificate. This makes fraudulent certificate issuance detectable.
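The append-only log's inclusion proofs are Merkle audit paths. A sketch for a power-of-two number of leaves (RFC 6962 additionally defines how odd-sized trees split; the 0x00/0x01 domain-separation prefixes below follow the RFC):

```python
import hashlib

def h_leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()          # RFC 6962 leaf prefix

def h_node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # interior prefix

def merkle_root(leaves):
    level = [h_leaf(d) for d in leaves]
    while len(level) > 1:
        level = [h_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def audit_path(leaves, index):
    # Sibling hashes from leaf to root: enough to recompute the root.
    level, path = [h_leaf(d) for d in leaves], []
    while len(level) > 1:
        path.append(level[index ^ 1])
        level = [h_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_inclusion(data, index, path, root):
    node = h_leaf(data)
    for sib in path:
        node = h_node(sib, node) if index % 2 else h_node(node, sib)
        index //= 2
    return node == root

certs = [b"cert-a", b"cert-b", b"cert-c", b"cert-d"]
root = merkle_root(certs)
proof = audit_path(certs, 2)   # log-sized proof that cert-c is in the log
```

The proof is \(O(\log n)\) hashes, which is what makes it practical for browsers to demand an SCT with every certificate.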


Summary: Security Primitives at a Glance

Goal                Primitive            Standard             Notes
Confidentiality     AES-256-GCM          NIST FIPS 197        AEAD; fresh nonce required
Integrity + auth    HMAC-SHA-256         RFC 2104             Constant-time comparison required
Key agreement       X25519 + Kyber-768   RFC 7748, FIPS 203   Hybrid for PQ transition
Digital signatures  Ed25519 / ML-DSA     RFC 8032, FIPS 204   Ed25519 for classical, ML-DSA for PQ
Password hashing    Argon2id             RFC 9106             Memory-hard; bcrypt/scrypt acceptable
Key derivation      HKDF-SHA-256         RFC 5869             Extract-then-expand pattern
Secure channel      TLS 1.3              RFC 8446             Mandatory for all network comms
Zero-knowledge      Schnorr / zk-SNARK   FIPS 186 (Schnorr)   SNARKs for blockchain privacy

Chapter 18: Privacy-Preserving Machine Learning

Machine learning systems trained on personal data face a fundamental privacy tension: the model must learn from individual data points, yet individuals should not be identifiable from the trained model. Privacy-preserving ML addresses this at the model training and inference stages.

18.1 Differential Privacy in Machine Learning

The standard mechanism for private ML is DP-SGD (Differentially Private Stochastic Gradient Descent, Abadi et al. 2016): at each training step, gradients are clipped to a maximum L2 norm (limiting the influence of any single example), Gaussian noise is added to the clipped gradients, and the noisy gradients are used to update the model weights.

Algorithm (DP-SGD): At each iteration \(t\):
  1. Sample a mini-batch \(B_t\) by including each example independently with probability \(q = |B|/N\) (Poisson sampling).
  2. For each example \(x_i \in B_t\), compute gradient \(g_i = \nabla_\theta L(\theta; x_i)\).
  3. Clip: \(\bar{g}_i = g_i / \max(1, \|g_i\|_2 / C)\) where \(C\) is the clipping norm.
  4. Add noise: \(\tilde{g} = \frac{1}{|B|}\left(\sum_i \bar{g}_i + \mathcal{N}(0, \sigma^2 C^2 I)\right)\).
  5. Update: \(\theta_{t+1} = \theta_t - \eta \tilde{g}\).
The per-step privacy cost is \((\alpha, \epsilon_0)\)-Rényi DP; privacy over \(T\) steps is tracked via the moments accountant (Abadi et al.) or Rényi DP composition. The total \((\epsilon, \delta)\)-DP guarantee depends on \(T\), \(q\), \(\sigma\), and \(C\).
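The five steps can be sketched on a toy one-parameter model with squared-error loss and hypothetical data (real training uses a library such as Opacus or TensorFlow Privacy; the sketch also subsamples uniformly without replacement, while the privacy analysis assumes Poisson sampling):

```python
import random

# DP-SGD on a toy 1-D linear model: clip each per-example gradient,
# add Gaussian noise to the sum, then take a standard SGD step.
C, sigma, eta = 1.0, 1.1, 0.1         # clip norm, noise multiplier, step size

def clipped_grad(theta, x, y):
    g = 2 * (theta * x - y) * x       # gradient of (theta*x - y)**2
    return g / max(1.0, abs(g) / C)   # step 3: clip to norm C

def dp_sgd_step(theta, batch, rng):
    total = sum(clipped_grad(theta, x, y) for x, y in batch)
    total += rng.gauss(0.0, sigma * C)        # step 4: Gaussian noise
    return theta - eta * total / len(batch)   # step 5: noisy update

rng = random.Random(0)
theta = 0.0
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]   # true slope is 3
for _ in range(500):
    theta = dp_sgd_step(theta, rng.sample(data, 2), rng)
# theta ends near 3, perturbed by the injected noise
```

Clipping bounds any single example's influence on the update; the noise scale \(\sigma C\) is what the accountant converts into an \((\epsilon, \delta)\) guarantee.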
Example (DP-SGD parameters for MNIST classification): Training a convolutional network on MNIST (60,000 training examples) with \(|B| = 256\) (sampling rate \(q = 256/60000 \approx 0.004\)), clipping norm \(C = 1.0\), noise multiplier \(\sigma = 1.1\), and 60 epochs (\(T = 60\times60000/256 \approx 14000\) steps) achieves \((\epsilon = 1.0, \delta = 10^{-5})\)-DP with test accuracy approximately 98.5% — nearly identical to the non-private baseline of 99.3%. The 0.8% accuracy loss is the price of strong privacy guarantees. For more sensitive datasets (healthcare, financial) with \(\epsilon = 0.1\), the accuracy loss is larger and hyper-parameter tuning is critical. The Google team (Kairouz et al.) reports that DP fine-tuning of large pre-trained models (GPT-2, BERT) achieves near non-private accuracy at \(\epsilon = 10\), because the model already captures most task knowledge from public pre-training.

18.2 Federated Learning

Federated learning (McMahan et al. 2017) trains a model across decentralised devices (phones, hospitals) without centralising the raw data. Each round:

  1. A central server sends the current model \(\theta_t\) to a subset of devices.
  2. Each device trains locally on its data for \(E\) epochs, producing a local model update \(\Delta_i = \theta_t' - \theta_t\).
  3. The server aggregates updates: \(\theta_{t+1} = \theta_t + \frac{1}{K}\sum_i \Delta_i\) (FedAvg).
  4. Repeat until convergence.
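The round structure can be sketched with a one-parameter model and hypothetical per-device data; only model deltas ever leave a device:

```python
# FedAvg sketch: each device trains locally, the server averages the deltas.
def local_update(theta, data, epochs=5, eta=0.05):
    for _ in range(epochs):                       # step 2: local training
        for x, y in data:
            theta -= eta * 2 * (theta * x - y) * x
    return theta

def fedavg_round(theta, devices):
    deltas = [local_update(theta, d) - theta for d in devices]
    return theta + sum(deltas) / len(deltas)      # step 3: average the deltas

devices = [[(1.0, 2.0)], [(1.0, 2.2)], [(1.0, 1.8)]]  # three devices' private data
theta = 0.0
for _ in range(50):                               # steps 1 and 4
    theta = fedavg_round(theta, devices)
# theta converges toward the value fitting the pooled data (2.0 here)
```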

Federated learning reduces data centralisation but does not provide privacy by itself: gradient updates can leak significant information about training data through gradient inversion attacks (Zhu et al. 2019 showed that individual training images can be reconstructed from a single gradient update to near-perfect accuracy). The combination of federated learning with DP-SGD (called federated DP) provides the strongest practical guarantee: local DP applied before aggregation, or central DP applied to the aggregated gradients. Apple uses federated DP for keyboard next-word prediction and emoji suggestions on iOS, with published privacy parameters (\(\epsilon \approx 2\) per day per user).

18.3 Membership Inference Attacks

Membership inference attacks (Shokri et al. 2017) determine whether a specific record was in a model’s training set, given black-box access to the model. The attack trains a meta-classifier on the distinction between “member” (training set) and “non-member” (test set) behaviour: models tend to have higher confidence on training examples (overfitting). ML models trained without privacy guarantees can leak substantial membership information — attack success rates of 60–80% have been demonstrated on production models (including a GPT-2 fine-tuned on medical notes).
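The overfitting signal the attack exploits can be shown with a confidence-threshold test, the simpler loss-based relative (in the style of Yeom et al. 2018) of the shadow-model attack; the confidence distributions below are synthetic stand-ins for an overfit model:

```python
import random

# Toy membership inference: an overfit model is more confident on its
# training members, so a simple threshold separates member from non-member.
rng = random.Random(1)
members = [min(1.0, rng.gauss(0.95, 0.05)) for _ in range(1000)]      # train set
non_members = [min(1.0, rng.gauss(0.80, 0.10)) for _ in range(1000)]  # test set

threshold = 0.9                       # flag "member" above this confidence
tp = sum(c >= threshold for c in members)
fp = sum(c >= threshold for c in non_members)
accuracy = (tp + (1000 - fp)) / 2000  # balanced attack accuracy; 0.5 = guessing
```

The wider the confidence gap between members and non-members, the higher the attack accuracy; DP training shrinks exactly that gap.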

Differential privacy directly mitigates membership inference: \(\epsilon\)-DP implies that the membership inference advantage of any adversary is bounded by \(e^\epsilon - 1\). For small \(\epsilon\) (strong privacy), membership inference is essentially no better than random guessing.

18.4 Machine Unlearning

Privacy regulations (GDPR right to erasure, CCPA deletion right) create a legal obligation to delete an individual’s data from trained models — so-called machine unlearning. Exact unlearning (retraining from scratch without the deleted record) is computationally feasible for small models but prohibitive for large neural networks trained on billions of records. Approximate unlearning algorithms (SISA training: sharded-isolated-sliced-aggregated; gradient ascent on the deleted records; Newton’s method in function space) aim to achieve unlearning with significantly less computation. The soundness of approximate unlearning — whether the post-unlearning model is genuinely statistically indistinguishable from a model trained without the record — remains an active research question, and no algorithm has been formally certified under GDPR compliance. This gap between the legal obligation and the technical reality is a significant open problem at the intersection of ML and privacy law.
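The SISA idea can be sketched with a toy "model" (the shard mean stands in for a trained sub-model; real SISA trains a neural sub-model per shard and aggregates their predictions):

```python
# SISA-style unlearning sketch: one sub-model per data shard, so honouring
# a deletion request retrains only the shard containing the record.
def train(shard):
    return sum(shard) / len(shard)     # hypothetical stand-in for training

data = list(range(12))
shards = [data[0:4], data[4:8], data[8:12]]
models = [train(s) for s in shards]

def unlearn(record):
    for i, shard in enumerate(shards):
        if record in shard:
            shard.remove(record)
            models[i] = train(shard)   # retrain only the affected shard
            return

def predict():
    return sum(models) / len(models)   # aggregate the sub-models

unlearn(5)                             # erasure request for record 5
```

Because record 5 only ever influenced shard 1's sub-model, retraining that shard yields exactly the model that would have been trained without it.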


Chapter 19: Oblivious RAM and Private Computation

19.1 Oblivious RAM

When a client outsources storage to an untrusted server, the server can observe which memory locations are accessed, even if the stored data is encrypted. Access patterns can reveal significant information: the sequence of database lookups reveals which records were queried; the sequence of file accesses reveals browsing history or document structure.

Oblivious RAM (ORAM) (Goldreich-Ostrovsky 1987, 1996) is a cryptographic primitive that hides access patterns. An ORAM scheme allows a client to read and write a remote array of \(N\) blocks such that the sequence of physical accesses the server observes is statistically independent of the logical access pattern, whatever the data and logical access sequence.

Theorem (ORAM bandwidth lower bound): Any ORAM scheme that hides access patterns perfectly against a computationally unbounded server requires amortised \(\Omega(\log N)\) bandwidth overhead per access (Goldreich-Ostrovsky). That is, \(N\) logical accesses require \(\Omega(N \log N)\) physical accesses to the server.

The Path ORAM scheme (Stefanov et al. 2013) achieves \(O(\log^2 N)\) bandwidth overhead and has been deployed in practice inside TrustZone and SGX enclaves for private database queries. The construction organises server storage as a binary tree of “buckets”; each logical access reads and rewrites a complete root-to-leaf path, then reassigns the accessed block to a fresh uniformly random leaf. This ensures the server sees only uniformly random paths, regardless of which logical block is accessed.
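The tree, position map, and stash fit in a short sketch. The version below stores plaintext blocks and lets the stash grow unbounded; a real deployment encrypts every bucket and re-encrypts it on each write-back:

```python
import random

# Simplified Path ORAM: binary tree of buckets, a position map assigning
# each block to a random leaf, and a client-side stash of overflow blocks.
L = 3                   # tree height: 2**L leaves, 2**(L+1) - 1 nodes
Z = 4                   # bucket capacity
rng = random.Random(0)

tree = {i: [] for i in range(2 ** (L + 1) - 1)}  # node id -> [(addr, data)]
pos = {}                # position map: addr -> currently assigned leaf
stash = {}

def path(leaf):
    """Node ids on the root-to-leaf path for a given leaf index."""
    node, nodes = leaf + 2 ** L - 1, []
    while True:
        nodes.append(node)
        if node == 0:
            return nodes[::-1]
        node = (node - 1) // 2

def access(addr, new_data=None):
    leaf = pos.get(addr, rng.randrange(2 ** L))
    pos[addr] = rng.randrange(2 ** L)            # remap to a fresh random leaf
    for node in path(leaf):                      # read the whole path...
        for a, d in tree[node]:
            stash[a] = d                         # ...into the stash
        tree[node] = []
    if new_data is not None:
        stash[addr] = new_data
    result = stash.get(addr)
    for node in reversed(path(leaf)):            # write back, deepest first
        fits = [a for a in stash if node in path(pos[a])]
        for a in fits[:Z]:
            tree[node].append((a, stash.pop(a)))
    return result
```

Every access, read or write, touches one full root-to-leaf path chosen uniformly at random, which is precisely why the server learns nothing about the logical pattern.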

19.2 Trusted Execution Environments

Trusted Execution Environments (TEEs) provide hardware-enforced isolation: a secure enclave executes code and holds data in encrypted memory, shielded from the host OS, hypervisor, and other processes. Even the hardware manufacturer cannot read enclave memory after deployment (ideally).

  • Intel SGX (Software Guard Extensions): creates encrypted enclaves in user space. Remote attestation allows a remote party to verify the enclave’s code hash and hardware identity, establishing trust. SGX has been extensively attacked via side channels (cache timing, speculative execution — Foreshadow), but enclave-based confidential computing remains widely deployed in the cloud (e.g., Microsoft Azure; AWS Nitro Enclaves provide comparable isolation with a different architecture).
  • ARM TrustZone: splits the processor into Normal World (untrusted OS) and Secure World (trusted OS for sensitive operations — biometric data, payment keys, DRM). Used in nearly all smartphones.
  • AMD SEV (Secure Encrypted Virtualisation): encrypts the memory of entire virtual machines, protecting tenant data from a compromised hypervisor. SEV-SNP (2021) adds integrity guarantees and attestation.

TEEs enable confidential computing — cloud customers can process sensitive data in the cloud without trusting the cloud provider’s infrastructure, verified by remote attestation. The NIST Cybersecurity Framework 2.0 (2024) explicitly references TEEs as a control for sensitive data processing in untrusted environments.

19.3 Secure Multiparty Computation in Practice

The Yao garbled circuit protocol (Chapter 8) handles two-party computation; for \(n > 2\) parties, the BGW protocol provides information-theoretic MPC. In practice, the main deployment challenges are:

Communication complexity: BGW requires \(O(n^2)\) point-to-point messages per gate for an \(n\)-party computation. For large \(n\) (hundreds of parties), broadcast-based protocols and secret-sharing optimisations reduce this. The SPDZ protocol (Damgård et al.) uses preprocessing (offline) plus fast online evaluation, achieving near-linear communication in \(n\) for the online phase.

Deployment: Notable real-world MPC deployments: (1) the Danish sugar beet auction (2008) — the first commercial MPC deployment; (2) privacy-preserving set intersection proposed for COVID contact-tracing designs such as PEPP-PT and DP3T; (3) the Boston Equal Pay study (2017) — Bestavros et al. used MPC to compute salary statistics across companies without revealing individual company data, publishing evidence of gender wage gaps while maintaining employer confidentiality.
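The aggregation pattern behind the wage-gap study is additive secret sharing: each party splits its private input into random shares that sum to the input, so only the total is ever reconstructed. A minimal sketch with hypothetical salary figures:

```python
import random

# Additive secret sharing over Z_p: any n-1 shares of an input look uniformly
# random; only the sum of all partial sums reveals the aggregate.
P = 2 ** 61 - 1                                # a Mersenne prime
rng = random.Random(42)

def share(x, n):
    """Split x into n shares that sum to x mod P."""
    shares = [rng.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

salaries = [72_000, 91_000, 65_000]            # each party's private input
n = len(salaries)
all_shares = [share(s, n) for s in salaries]   # row i is distributed by party i
# Party j sums the j-th share of every input locally and publishes only that:
partial_sums = [sum(row[j] for row in all_shares) % P for j in range(n)]
total = sum(partial_sums) % P                  # equals the true total
```

No party ever sees another party's salary, yet the published partial sums reconstruct the exact aggregate — the same privacy argument, at scale, that the salary study relied on.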

The gap between MPC research and practice is narrowing. Modern frameworks (MOTION, SCALE-MAMBA, MP-SPDZ) enable non-experts to implement MPC protocols with security proofs. The main remaining barriers are latency (MPC typically requires multiple rounds of interaction, each adding network RTT) and the need for secure channels between all parties, which requires a PKI or out-of-band key exchange before computation begins.
