CO 486/CS 766: Theory of Quantum Information
Debbie Leung
Estimated study time: 4 hr 41 min
Table of contents
Sources and References
Primary textbook — John Watrous, The Theory of Quantum Information (Cambridge University Press, 2018). The Fall 2011 lecture notes (LN2011) that formed the basis of the textbook are cited throughout.
Supplementary texts — T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. (Wiley, 2006); Michael Nielsen and Isaac Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2000); John Preskill, Lecture Notes for Physics 219: Quantum Computation, Chapter 10 (2022).
Online resources — Debbie Leung, QIC 820 / CO 486 / CS 766 course page (UWaterloo, Fall 2023): https://www.math.uwaterloo.ca/~wcleung/qic820-f2023.html; Preskill Phys 219 lecture notes at theory.caltech.edu; arXiv references cited inline for research results.
Chapter 1: Mathematical Foundations
The theory of quantum information rests on a precise algebraic framework. This chapter develops the necessary mathematical infrastructure: complex Euclidean spaces as the arena for quantum states, linear operators as the language for transformations and measurements, the vec operator as a bridge between operators and vectors, and convexity as the geometry underlying optimisation.
1.1 Complex Euclidean Spaces
Classical information is stored in a finite set of symbols. Quantum information generalises this by allowing probability amplitudes — complex numbers — in place of probabilities. The natural home for these amplitudes is a complex Euclidean space indexed by the classical symbol set.
and the Euclidean norm is \(\|u\| = \sqrt{\langle u,u\rangle}\). The standard basis is \(\{e_a : a \in \Sigma\}\) with \(e_a(b) = \mathbf{1}[a=b]\). More generally, the \(p\)-norm is \(\|u\|_p = \bigl(\sum_a |u(a)|^p\bigr)^{1/p}\) and the sup-norm is \(\|u\|_\infty = \max_a|u(a)|\).
The inner product is linear in the second argument and conjugate-linear in the first — the physics convention, opposite to the mathematics convention. We use the Dirac bra-ket notation \(|u\rangle\) for \(u \in \mathcal{X}\) and \(\langle u|\) for its dual (conjugate transpose), so \(\langle u|v\rangle = \langle u,v\rangle\).
The norms satisfy the Cauchy-Schwarz inequality \(|\langle u,v\rangle| \leq \|u\|\|v\|\), with equality iff \(u\) and \(v\) are proportional. The Euclidean norm and the 2-norm coincide on vectors. The norms are related by \(\|u\|_\infty \leq \|u\|_2 \leq \|u\|_1 \leq \sqrt{|\Sigma|}\,\|u\|_2 \leq |\Sigma|\,\|u\|_\infty\).
1.2 Linear Operators and Operator Norms
- Hermitian: \(A = A^*\). The set \(\operatorname{Herm}(\mathcal{X})\) is a real vector space. Eigenvalues of Hermitian operators are real.
- Positive semidefinite (PSD): \(A \in \operatorname{Pos}(\mathcal{X})\) iff \(A = B^*B\) for some \(B\), equivalently all eigenvalues of \(A\) are \(\geq 0\). Write \(A \geq 0\).
- Positive definite: \(A \in \operatorname{Pd}(\mathcal{X})\) iff all eigenvalues of \(A\) are \(> 0\). Write \(A > 0\).
- Unitary: \(UU^* = U^*U = I\). These form the group \(\mathcal{U}(\mathcal{X})\).
- Projector: \(\Pi^2 = \Pi = \Pi^*\). Projectors correspond to subspaces of \(\mathcal{X}\).
The spectral theorem allows us to define operator functions: for \(f:\mathbb{R} \to \mathbb{R}\) and Hermitian \(A = \sum_i \lambda_i \Pi_i\), set \(f(A) = \sum_i f(\lambda_i)\Pi_i\). The most important cases are \(\sqrt{A}\) (for \(A \geq 0\)) and \(\log A\) (for \(A > 0\)).
Every operator also has a singular value decomposition, which is the analogue of the spectral theorem for non-square or non-normal operators.
The three fundamental cases are:
- Trace norm \(\|A\|_1 = \sum_i s_i\) (also called nuclear norm or 1-norm).
- Frobenius norm \(\|A\|_2 = \sqrt{\sum_{a,b}|A_{ab}|^2} = \sqrt{\operatorname{Tr}(A^*A)}\).
- Spectral norm \(\|A\|_\infty = s_1\) (largest singular value; also called operator norm).
These norms satisfy \(\|A\|_\infty \leq \|A\|_2 \leq \|A\|_1\) and the duality \(\|A\|_1 = \max\{\langle A, B\rangle : \|B\|_\infty \leq 1\}\). The trace norm is especially important because it governs distinguishability of quantum states.
The trace norm has a direct operational meaning: for two quantum states \(\rho,\sigma\), the probability of correctly identifying the state (given uniformly random prior) equals \(\tfrac{1}{2}+\tfrac{1}{4}\|\rho-\sigma\|_1\). This is the Holevo-Helstrom theorem (Theorem 2.1). The spectral norm (operator norm) \(\|A\|_\infty = \lambda_{\max}(A^*A)^{1/2}\) is the largest singular value; for Hermitian \(A\) it is \(\max(|\lambda_{\max}(A)|,|\lambda_{\min}(A)|)\). The Frobenius norm \(\|A\|_2 = \sqrt{\operatorname{Tr}(A^*A)}\) is the Hilbert-Schmidt norm and is induced by the inner product \(\langle A,B\rangle = \operatorname{Tr}(A^*B)\).
A key inequality connecting trace norm and fidelity: \(\|\rho-\sigma\|_1^2 \leq 4(1-F(\rho,\sigma)^2)\). This is useful for bounding distinguishability from fidelity. For pure states \(\rho = |\psi\rangle\langle\psi|\) and \(\sigma = |\phi\rangle\langle\phi|\): \(\|\rho-\sigma\|_1 = 2\sqrt{1-|\langle\psi|\phi\rangle|^2}\) (the “Bures distance”) and \(F = |\langle\psi|\phi\rangle|\), giving the tight Fuchs-van de Graaf bound as an equality.
The Schatten norms also satisfy Hölder’s inequality: \(|\langle A, B\rangle| \leq \|A\|_p \|B\|_q\) for \(1/p + 1/q = 1\). For \(p = q = 2\) this is the Cauchy-Schwarz inequality for the Hilbert-Schmidt inner product. The Frobenius norm is submultiplicative: \(\|AB\|_2 \leq \|A\|_2\|B\|_2\). The spectral norm is submultiplicative: \(\|AB\|_\infty \leq \|A\|_\infty\|B\|_\infty\). An important inequality connecting the three: \(\|ABC\|_1 \leq \|A\|_2\|B\|_1\|C\|_2\).
1.3 The Vec Operator and Tensor Identities
The vec operator provides a canonical identification between operators (matrices) and vectors (column vectors in a tensor product space). This identification is essential for purification theory, the Choi isomorphism, and many duality arguments.
Two important special cases: setting \(C = I\) gives \(\operatorname{vec}(A) = (A\otimes I)\operatorname{vec}(I) = (I\otimes A^T)\operatorname{vec}(I)\). Setting \(A = I\) gives \((I\otimes B)\operatorname{vec}(C) = \operatorname{vec}(CB^T)\).
The tensor identity has a key consequence for quantum channels. If \(\Phi(M) = \sum_k A_k M B_k^*\), then the natural representation satisfies \(K(\Phi)\operatorname{vec}(M) = \operatorname{vec}(\Phi(M))\), so \(K(\Phi) = \sum_k A_k \otimes \bar{B}_k\). The Choi matrix is \(J(\Phi) = (\Phi\otimes I)(\operatorname{vec}(I)\operatorname{vec}(I)^*)\), and the tensor identity gives the reconstruction formula \(\Phi(M) = \operatorname{Tr}_\mathcal{X}[(I\otimes M^T)J(\Phi)]\).
The vec identity also makes the Choi-Kraus correspondence transparent: \(J(\Phi) = \sum_k \operatorname{vec}(A_k)\operatorname{vec}(A_k)^*\) for Kraus operators \(\{A_k\}\). This is the Gram matrix of the vectors \(\{\operatorname{vec}(A_k)\}\), and its PSD property gives the CP condition.
- \(\Phi\) is CP iff \(J(\Phi) \geq 0\) (Choi's theorem).
- \(\Phi\) is TP iff \(\operatorname{Tr}_\mathcal{Y}(J(\Phi)) = I_\mathcal{X}\).
- \(\Phi\) is Hermiticity-preserving iff \(J(\Phi) = J(\Phi)^*\).
- \(\Phi\) is trace-non-increasing iff \(\operatorname{Tr}_\mathcal{Y}(J(\Phi)) \leq I_\mathcal{X}\).
- Composition: if \(\Phi_1: L(\mathcal{X})\to L(\mathcal{Y})\) and \(\Phi_2: L(\mathcal{Y})\to L(\mathcal{Z})\), then \(J(\Phi_2\circ\Phi_1) = [\Phi_2\otimes I_\mathcal{X}](J(\Phi_1))\), not a simple product.
1.4 Matrix Inequalities and Operator Monotone Functions
Operator functions inherit an ordering structure from the Löwner order on \(\operatorname{Herm}(\mathcal{X})\), but this ordering interacts with matrix operations in subtle ways.
Key operator monotone functions: \(f(t) = t^\alpha\) for \(\alpha \in (0,1]\) is operator monotone; \(f(t) = \log t\) is operator monotone; \(f(t) = t^2\) is operator convex but not operator monotone. Crucially, \(A \geq B \geq 0\) does not imply \(A^2 \geq B^2\) in general (the square function is not operator monotone).
In particular, if \(A\) and \(B\) commute, equality holds. The inequality is strict when they do not commute.
The Golden-Thompson inequality is used in the proof of the data processing inequality for quantum relative entropy via the Peierls-Bogoliubov inequality.
Lieb’s concavity theorem (1973) is one of the deepest results in matrix analysis. Its many corollaries include the joint convexity of quantum relative entropy (Theorem 5.8) and the strong subadditivity of von Neumann entropy (Theorem 5.10). The original proof used complex interpolation methods; Carlen-Lieb (2012) later gave a simpler proof using the operator geometric mean.
Combined with the Golden-Thompson inequality: these inequalities are the quantum analogues of the classical moment-generating function bounds used in large deviation theory.
1.5 Convexity
Quantum information theory is pervaded by convexity arguments — density operators form a convex set, separable states form a convex set, and the feasible regions of SDPs are convex. We collect the key results here.
Convexity theorems are applied throughout: Carathéodory bounds the decomposition length of separable states; Krein-Milman identifies pure states as extreme points and underpins channel representations; the separating hyperplane theorem gives entanglement witnesses; Sion’s theorem underlies strong duality for SDPs.
Taking \(V = |v\rangle\) (a unit vector) gives the scalar Jensen inequality. Taking \(V\) to be the Stinespring isometry of a quantum channel gives the operator Jensen inequality used in quantum information.
Chapter 2: Quantum States and Channels
Quantum mechanics is a probabilistic theory, but the probabilities arise from a more fundamental description in terms of amplitudes. A quantum state is not simply a probability distribution over classical configurations — it is a positive semidefinite operator that encodes both the classical uncertainty and the quantum coherence of a system.
2.1 Registers and Density Operators
A state \(\rho \in D(\mathcal{X})\) is pure if \(\operatorname{rank}(\rho) = 1\), i.e., \(\rho = |u\rangle\langle u|\) for some unit vector \(|u\rangle\). Otherwise it is mixed.
The set \(D(\mathcal{X})\) is a compact convex set. Its extreme points are precisely the pure states \(\{|u\rangle\langle u| : \|u\| = 1\}\). By Krein-Milman, every \(\rho \in D(\mathcal{X})\) is a convex combination of pure states (this is just the spectral decomposition: \(\rho = \sum_x p(x)|e_x\rangle\langle e_x|\)). The maximally mixed state \(\frac{I}{d}\) (with \(d = \dim\mathcal{X}\)) is the centroid of \(D(\mathcal{X})\).
The non-uniqueness of the pure-state decomposition is fundamental. While the spectral decomposition \(\rho = \sum_x p(x)|e_x\rangle\langle e_x|\) uses eigenstates, the same density matrix can be written as a mixture over any ensemble of pure states consistent with the eigenvalues (via the Schrödinger mixture theorem). This non-uniqueness has a crucial physical consequence: no measurement of \(\rho\) alone can determine which pure-state ensemble was prepared. The state \(\rho\) encapsulates all operationally distinguishable information about the preparation.
The map \(\rho \mapsto \vec{r}\) is affine and identifies \(D(\mathbb{C}^2)\) with the unit ball \(\{\vec{r}:\,|\vec{r}|\leq 1\}\). Channels act as affine maps on the Bloch ball: \(\Phi(\rho) \leftrightarrow M\vec{r}+\vec{c}\) for a contraction matrix \(M\) and translation \(\vec{c}\). Unital channels (\(\Phi(I/2)=I/2\)) have \(\vec{c}=0\).
The constraint \(p(1-p) \geq |z|^2\) ensures \(\rho \geq 0\). In the Bloch sphere parameterisation, \(\rho = \tfrac{1}{2}(I + r_x X + r_y Y + r_z Z)\) with \(\vec{r} = (r_x,r_y,r_z) \in \mathbb{R}^3\), \(\|\vec{r}\| \leq 1\). Pure states correspond to \(\|\vec{r}\| = 1\) (the Bloch sphere surface); the maximally mixed state is the origin \(\vec{r} = 0\).
2.2 Measurements and POVMs
The most general quantum measurement on register \(X\) is described by a positive operator-valued measure.
When register \(X\) is in state \(\rho\), the probability of outcome \(a\) is \(p(a) = \langle M(a), \rho\rangle = \operatorname{Tr}(M(a)\rho)\). This is the Born rule.
A projective measurement \(\{P_a\}\) on a pure state \(|\psi\rangle\) yields outcome \(a\) with probability \(\|P_a|\psi\rangle\|^2 = \langle\psi|P_a|\psi\rangle\). After the measurement the state collapses to \(P_a|\psi\rangle/\|P_a|\psi\rangle\|\). For mixed states, the post-measurement state is \(P_a\rho P_a / \operatorname{Tr}(P_a\rho)\).
The key operational difference between a POVM and a projective measurement: a POVM can have more outcomes than the dimension of \(\mathcal{X}\), at the cost of not necessarily being a projection. Neumark’s (Naimark’s) dilation theorem states that any POVM on \(\mathcal{X}\) can be implemented as a projective measurement on a larger system \(\mathcal{X}\otimes\mathcal{Y}\): there exist projectors \(\{Q_a\}\) on \(\mathcal{X}\otimes\mathcal{Y}\) and a fixed state \(|0\rangle \in \mathcal{Y}\) such that \(\operatorname{Tr}(M(a)\rho) = \operatorname{Tr}(Q_a(\rho\otimes|0\rangle\langle 0|))\) for all \(\rho\).
where \(H(p) = -\sum_a p(a)\log p(a)\) is the Shannon entropy of the outcome. In particular, measuring a \(d\)-dimensional quantum state can produce at most \(\log d\) bits of Shannon entropy from any POVM.
2.3 Binary State Discrimination and the Holevo-Helstrom Theorem
A fundamental task is: given a register holding either state \(\rho_0\) (with prior \(p_0\)) or \(\rho_1\) (with prior \(p_1 = 1-p_0\)), find the measurement maximising the probability of correctly identifying the state. This is the quantum analogue of the classical hypothesis testing problem.
The key insight is that the optimal measurement projects onto the eigenspaces of the operator difference \(p_0\rho_0 - p_1\rho_1\). This is the quantum version of the Neyman-Pearson lemma: in classical hypothesis testing, the optimal test is the likelihood ratio test. Quantum mechanically, the eigenvectors of \(p_0\rho_0-p_1\rho_1\) play the role of the likelihood ratio threshold. States in the positive eigenspace are more likely to have come from \(\rho_0\); states in the negative eigenspace from \(\rho_1\).
The SDP formulation is illuminating. The discrimination probability is
\[ d = \sup_{M_0+M_1=I,\,M_0,M_1\geq 0}[p_0\langle M_0,\rho_0\rangle + p_1\langle M_1,\rho_1\rangle] = \tfrac{1}{2} + \tfrac{1}{2}\|p_0\rho_0-p_1\rho_1\|_1. \]The last equality is the Holevo-Helstrom theorem. Its dual SDP is \(\beta = \inf\operatorname{Tr}(Y)\) s.t. \(Y \geq p_0\rho_0\) and \(Y \geq p_1\rho_1\), and strong duality \(d=\beta\) follows since both are strictly feasible.
The optimal measurement projects onto the positive and negative eigenspaces of \(p_0\rho_0 - p_1\rho_1\): \(M_0 = \Pi^+\), \(M_1 = \Pi^- = I - \Pi^+\).
This is maximised by choosing \(T = \Pi^+ - \Pi^-\), and the variational formula \(\|A\|_1 = \max_{-I \leq T \leq I}\langle A, T\rangle\) gives \(\tfrac{1}{2}(1 + \|p_0\rho_0-p_1\rho_1\|_1)\). \(\square\)
With equal priors \(p_0 = p_1 = 1/2\), the maximum success probability is \(\tfrac{1}{2}(1 + \tfrac{1}{2}\|\rho_0 - \rho_1\|_1)\), so the trace distance \(\tfrac{1}{2}\|\rho_0-\rho_1\|_1\) is the bias above guessing. This gives the SDP formulation of trace distance.
2.4 Multi-State Discrimination
The binary discrimination problem extends naturally to \(n\) states. With states \(\{\rho_k\}_{k=1}^n\) prepared with priors \(\{p_k\}\), the optimal POVM satisfies the following.
Pretty-good measurement (PGM): For an ensemble \(\{p_k,|\psi_k\rangle\}\), the pretty-good measurement is \(M_k = p_k\rho^{-1/2}|\psi_k\rangle\langle\psi_k|\rho^{-1/2}\) where \(\rho = \sum_k p_k|\psi_k\rangle\langle\psi_k|\). It is not optimal in general but achieves within a constant factor of the optimal discrimination probability and is often optimal for symmetric ensembles.
2.5 Quantum Channels
- Positive if \(X \geq 0 \Rightarrow \Phi(X) \geq 0\).
- Completely positive (CP) if \(\Phi \otimes I_\mathcal{Z}\) is positive for every CES \(\mathcal{Z}\).
- Trace-preserving (TP) if \(\operatorname{Tr}(\Phi(X)) = \operatorname{Tr}(X)\) for all \(X\).
The distinction between positive and CP is important. The transpose map \(T(A) = A^T\) is positive (it maps PSD operators to PSD operators) but not CP: \((T \otimes I)(|\beta\rangle\langle\beta|)\) is not PSD for the Bell state \(|\beta\rangle\). This is why complete positivity — not just positivity — is the physically meaningful condition: any quantum channel must remain valid when applied to part of a larger system.
- Unitary channel: \(\mathcal{U}(\rho) = U\rho U^*\) for unitary \(U\). Noiseless reversible evolution.
- Dephasing channel: \(\Delta(\rho) = \sum_a |a\rangle\langle a|\rho|a\rangle\langle a|\). Destroys off-diagonal coherences; Kraus operators \(M_a = |a\rangle\langle a|\).
- Depolarising channel: \(\Omega_p(\rho) = (1-p)\rho + p\tfrac{I}{d}\). Mixes with the maximally mixed state.
- Amplitude damping (qubit): \(\Phi(\rho) = K_0\rho K_0^* + K_1\rho K_1^*\) with \(K_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}\), \(K_1 = \begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix}\). Models spontaneous emission.
- Partial trace \(\operatorname{Tr}_\mathcal{X}: L(\mathcal{X}\otimes\mathcal{Y}) \to L(\mathcal{Y})\): discarding subsystem \(\mathcal{X}\).
The map \(\Phi \mapsto J(\Phi)\) is a linear bijection (the Choi isomorphism). The inverse is \(\Phi(A) = \operatorname{Tr}_\mathcal{X}((I_\mathcal{Y}\otimes A^T)J(\Phi))\).
2.5 Representations of Channels and CP/TP Characterisations
Every quantum channel has four equivalent representations, each illuminating a different physical aspect.
- \(\Phi\) is completely positive.
- \(\Phi \otimes I_\mathcal{X}\) is positive.
- \(J(\Phi) \in \operatorname{Pos}(\mathcal{Y}\otimes\mathcal{X})\).
- \(\Phi\) has a Kraus representation: \(\Phi(M) = \sum_{a \in \Gamma} A_a M A_a^*\) for operators \(\{A_a\} \subset L(\mathcal{X},\mathcal{Y})\).
- As in (4), with exactly \(r = \operatorname{rank}(J(\Phi))\) Kraus operators.
- \(\Phi\) has a Stinespring dilation: \(\Phi(M) = \operatorname{Tr}_\mathcal{Z}(AMA^*)\) for isometry \(A \in L(\mathcal{X},\mathcal{Y}\otimes\mathcal{Z})\).
- As in (6), with \(\dim\mathcal{Z} = \operatorname{rank}(J(\Phi))\).
- \(\Phi\) is trace-preserving.
- The adjoint \(\Phi^*\) is unital: \(\Phi^*(I_\mathcal{Y}) = I_\mathcal{X}\).
- \(\operatorname{Tr}_\mathcal{Y}(J(\Phi)) = I_\mathcal{X}\).
- \(\sum_a A_a^* A_a = I_\mathcal{X}\).
The adjoint \(\Phi^*: L(\mathcal{Y}) \to L(\mathcal{X})\) is defined by \(\langle B, \Phi(A)\rangle = \langle \Phi^*(B), A\rangle\). For a Kraus map \(\Phi(M) = \sum_a A_a M A_a^*\), the adjoint is \(\Phi^*(B) = \sum_a A_a^* B A_a\). The TP condition \(\sum_a A_a^*A_a = I\) is equivalent to \(\Phi^*(I) = I\) (unital adjoint, the Heisenberg picture unitality).
The adjoint has the following physical interpretation. In the Schrödinger picture, \(\Phi\) evolves states: \(\rho \mapsto \Phi(\rho)\). In the Heisenberg picture, \(\Phi^*\) evolves observables: an observable \(B\) on \(\mathcal{Y}\) is “pulled back” to \(\Phi^*(B)\) on \(\mathcal{X}\), so that the expected value is \(\operatorname{Tr}(B\Phi(\rho)) = \operatorname{Tr}(\Phi^*(B)\rho)\). The TP condition ensures that the constant observable \(I_\mathcal{Y}\) (which always has expectation 1) is preserved: \(\Phi^*(I_\mathcal{Y}) = I_\mathcal{X}\). More generally, for a measurement POVM \(\{M(a)\}_a\) on the output space, the pullback \(\{\Phi^*(M(a))\}_a\) is the effective POVM on the input space.
Since \(\Delta(|0\rangle\langle 0|) = |0\rangle\langle 0|\), \(\Delta(|1\rangle\langle 1|) = |1\rangle\langle 1|\), and \(\Delta(|0\rangle\langle 1|) = \Delta(|1\rangle\langle 0|) = 0\):
\[ J(\Delta) = \begin{pmatrix}1&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&1\end{pmatrix}. \]This is PSD (eigenvalues \(1,1,0,0\)) with rank 2, consistent with two Kraus operators. Verifying TP: \(\operatorname{Tr}_{L(\mathcal{Y})}(J(\Delta)) = |0\rangle\langle 0| + |1\rangle\langle 1| = I\). \(\checkmark\)
The four representations of a channel form a cycle: Stinespring \(\to\) Kraus (group rows of the isometry), Kraus \(\to\) Choi (apply formula), Choi \(\to\) Kraus (spectral decompose), Kraus \(\to\) natural representation (vectorise). Each representation is uniquely useful: Kraus for computation, Choi for characterising CP, Stinespring for purification arguments.
The complementary channel captures exactly the information that “leaks” to the environment during the evolution. If \(\Phi\) is a noisy communication channel, \(\hat\Phi\) models the eavesdropper’s channel. The symmetry between \(\Phi\) and \(\hat\Phi\) is not complete: while \(\hat{(\hat\Phi)}\) is isometrically equivalent to \(\Phi\) (by the fact that the Stinespring dilation is pure on \(\mathcal{Y}\otimes\mathcal{Z}\)), the dimensions of \(\mathcal{Y}\) and \(\mathcal{Z}\) can differ.
- A channel \(\Phi\) is degradable if there exists a channel \(\mathcal{D}\) such that \(\hat\Phi = \mathcal{D}\circ\Phi\) (the environment can be obtained from the output by further processing).
- A channel is anti-degradable if \(\Phi = \mathcal{D}'\circ\hat\Phi\) for some channel \(\mathcal{D}'\) (the output can be obtained from the environment).
- Degradable implies zero coherent information gap: for degradable channels, the LSD theorem's regularisation is unnecessary and \(Q(\Phi) = \max_\rho I_c(\rho,\Phi)\) (single-letter).
- Anti-degradable implies zero quantum capacity: \(Q(\Phi) = 0\), since the environment always has more information than the output.
We compute \(J(\Phi)\) directly from the definition \(J(\Phi) = (\Phi\otimes I)(|\beta\rangle\langle\beta|)\) with \(|\beta\rangle = |00\rangle+|11\rangle\):
\[ J(\Phi) = \Phi(|0\rangle\langle 0|)\otimes|0\rangle\langle 0| + \Phi(|0\rangle\langle 1|)\otimes|0\rangle\langle 1| + \Phi(|1\rangle\langle 0|)\otimes|1\rangle\langle 0| + \Phi(|1\rangle\langle 1|)\otimes|1\rangle\langle 1|. \]Computing: \(\Phi(|0\rangle\langle 0|) = |0\rangle\langle 0|\), \(\Phi(|0\rangle\langle 1|) = \sqrt{1-\gamma}|0\rangle\langle 1|\), and \(\Phi(|1\rangle\langle 1|) = (1-\gamma)|1\rangle\langle 1| + \gamma|0\rangle\langle 0|\). Thus in the basis \(\{|00\rangle,|01\rangle,|10\rangle,|11\rangle\}\):
\[ J(\Phi) = \begin{pmatrix}1 & 0 & 0 & \sqrt{1-\gamma} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \gamma & 0 \\ \sqrt{1-\gamma} & 0 & 0 & 1-\gamma\end{pmatrix}. \]Now we recover the Kraus operators from the eigenvectors of \(J(\Phi)\). The eigenvalues of \(J(\Phi)\) are \(1, 1-\gamma, \gamma, 0\) (using the block structure). The eigenvector for eigenvalue 1 is \(|u_1\rangle = |00\rangle + \sqrt{1-\gamma}|11\rangle / \sqrt{2-\gamma}\)… Actually, it is more direct to recognise that \(J(\Phi) = \operatorname{vec}(K_0)\operatorname{vec}(K_0)^* + \operatorname{vec}(K_1)\operatorname{vec}(K_1)^*\). One checks: \(\operatorname{vec}(K_0) = (1,0,0,\sqrt{1-\gamma})^T\) and \(\operatorname{vec}(K_1) = (0,0,\sqrt{\gamma},0)^T\), and their outer products sum to \(J(\Phi)\). \(\checkmark\)
Then \(\Omega_p(\rho) = \operatorname{Tr}_{\mathbb{C}^4}(A\rho A^*)\). Check: \(A^*A = (1-p)I + \tfrac{p}{3}(X^2+Y^2+Z^2) = (1-p)I + pI = I\). ✓ The environment \(\mathbb{C}^4\) registers which Pauli error occurred (or no error). Tracing out the environment destroys this record, giving the depolarising noise.
2.5b Quantum Instruments and Post-Measurement States
The Kraus representation captures the marginal statistics of a measurement but not the post-measurement states. The more complete framework is the quantum instrument.
The key relationship: every POVM \(\{M(a)\}\) arises as the marginal of a quantum instrument. Specifically, \(M(a) = (\Phi_a)^*(I_\mathcal{Y})\) where \((\Phi_a)^*\) is the adjoint of \(\Phi_a\). The simplest instrument associated with POVM element \(M(a) = K_a^*K_a\) is the Lüders instrument \(\Phi_a(\rho) = K_a\rho K_a^*\). The corresponding post-measurement state is \(K_a\rho K_a^*/\operatorname{Tr}(K_a^*K_a\rho)\).
The classical register stores the measurement outcome; the quantum register stores the post-measurement state. The overall map \(\mathcal{J}\) is a CPTP channel.
2.5c Random Unitary Channels and Pauli Channels
A practically important class of quantum channels are the random unitary channels (also called unital channels of Pauli type), which model incoherent noise processes.
For a qubit, the most important random unitary channels are the Pauli channels:
\[ \Phi(\rho) = (1-p_x-p_y-p_z)\rho + p_x X\rho X + p_y Y\rho Y + p_z Z\rho Z, \]parameterised by error probabilities \(p_x, p_y, p_z \geq 0\) with \(p_x+p_y+p_z \leq 1\). Special cases: the depolarising channel has \(p_x = p_y = p_z = p/3\); the dephasing channel has \(p_x=p_y=0\), \(p_z=p\); the bit-flip channel has \(p_x=p\), \(p_y=p_z=0\).
The Pauli channel has a particularly clean Choi matrix: since \(\{I,X,Y,Z\}/\sqrt{2}\) is an orthonormal basis for \(L(\mathbb{C}^2)\), the Choi matrix decomposes as \(J(\Phi) = \sum_P\hat\Phi_P\, P\otimes P^T\), where \(\hat\Phi_P\) are the eigenvalues of \(\Phi\) in the Pauli basis. The CP condition on \(J(\Phi)\geq 0\) gives constraints on the probabilities \((p_x,p_y,p_z)\).
2.6 Discrete Weyl Operators and Quantum Teleportation
The generalised Pauli operators form an orthonormal basis for \(L(\mathcal{X})\) and underlie the quantum one-time pad and teleportation.
- Unitarity: Each \(W_{b,c}\) is unitary.
- Orthogonality: \(\langle W_{b,c}, W_{b',c'}\rangle = d\cdot\mathbf{1}[b=b',c=c']\), so \(\{W_{b,c}/\sqrt{d}\}_{b,c}\) is an orthonormal basis for \(L(\mathcal{X})\).
- Commutation: \(XZ = \omega ZX\), so \(W_{b,c}W_{b',c'} = \omega^{bc'} W_{b+b',c+c'}\).
- Depolarisation: \(\frac{1}{d^2}\sum_{b,c}W_{b,c}AW_{b,c}^* = \operatorname{Tr}(A)\cdot\frac{I}{d}\) for all \(A\).
Setup: Alice and Bob share the maximally entangled state \(|\beta_{00}\rangle = \tfrac{1}{\sqrt{d}}\sum_a|a\rangle_A|a\rangle_B\). Alice wishes to send unknown state \(|\psi\rangle_C = \sum_a\alpha_a|a\rangle_C\) to Bob using only classical communication.
Step 1: Alice measures registers \(C, A\) in the Bell basis \(\{(W_{b,c}\otimes I)|\beta_{00}\rangle\}_{b,c}\), obtaining outcome \((b,c)\) each with probability \(1/d^2\).
Step 2: Alice sends the classical outcome \((b,c)\) to Bob (costs \(2\log_2 d\) classical bits).
Step 3: Bob applies \(W_{b,c}^*\) to his register \(B\), recovering \(|\psi\rangle\).
Why it works: The pre-measurement state of \(C,A,B\) is \(\sum_{b,c}\tfrac{1}{d}(W_{b,c}\otimes I)|\beta_{00}\rangle_{AB}\otimes W_{b,c}^*|\psi\rangle_B\) (expansion in the Bell basis). Alice's measurement collapses \(B\) to \(W_{b,c}^*|\psi\rangle\) for the observed \((b,c)\).
Task: Alice wants to send 2 classical bits to Bob using only 1 qubit of quantum communication, with a pre-shared Bell pair.
Setup: Alice and Bob share \(|\beta_{00}\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle+|11\rangle)\).
Step 1: Alice encodes her 2-bit message \((b,c) \in \{0,1\}^2\) by applying \(W_{b,c}\) to her qubit:
- (0,0): apply \(I\) → joint state \(|\beta_{00}\rangle\)
- (0,1): apply \(Z\) → joint state \(|\beta_{01}\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle-|11\rangle)\)
- (1,0): apply \(X\) → joint state \(|\beta_{10}\rangle = \tfrac{1}{\sqrt{2}}(|01\rangle+|10\rangle)\)
- (1,1): apply \(iY\) → joint state \(|\beta_{11}\rangle = \tfrac{1}{\sqrt{2}}(|01\rangle-|10\rangle)\)
Step 3: Bob performs a Bell measurement on both qubits and identifies which of the four Bell states he received, recovering 2 classical bits.
Resource accounting: 1 pre-shared ebit + 1 qubit communication → 2 classical bits transmitted. Combined with teleportation: 2 cbits + 1 ebit → 1 qubit. These are inverse protocols, and together they confirm that 1 ebit + 1 qubit \(\leftrightarrow\) 2 cbits (in resource terms).
For \(\Phi\) to clone \(|+\rangle\), we’d need \(\Phi(|+\rangle\langle+|) = |++\rangle\langle++| = \tfrac{1}{4}(|00\rangle+|01\rangle+|10\rangle+|11\rangle)(\langle 00|+\langle 01|+\langle 10|+\langle 11|)\), which requires \(\Phi(|0\rangle\langle 1|) = \tfrac{1}{2}(|00\rangle\langle 11|+|01\rangle\langle 00|+|01\rangle\langle 11|+\ldots)\). But the Choi matrix of \(\Phi\) would fail to be positive semidefinite (one can check that the off-diagonal requirements are inconsistent with CP). More elegantly: the fidelity argument — if \(\Phi\) clones, then for any two non-orthogonal states \(|\psi\rangle,|\phi\rangle\), \(|\langle\psi|\phi\rangle|^2 = F(\Phi(|\psi\rangle\langle\psi|),\Phi(|\phi\rangle\langle\phi|)) = F(|\psi\psi\rangle\langle\psi\psi|,|\phi\phi\rangle\langle\phi\phi|) = |\langle\psi|\phi\rangle|^4\). This requires \(|\langle\psi|\phi\rangle|^2 = |\langle\psi|\phi\rangle|^4\), which holds only for \(|\langle\psi|\phi\rangle|\in\{0,1\}\) — i.e., orthogonal or equal states. Contradiction. \(\square\)
Chapter 3: Purifications and Fidelity
Purification is the idea that any mixed state of a system can be regarded as part of a pure state on a larger system. This perspective converts questions about mixed states into questions about pure states, often dramatically simplifying proofs. The fidelity function, which quantifies how close two quantum states are, has its deepest characterisation through purifications via Uhlmann’s theorem.
3.1 Purifications
The connection to the vec operator is fundamental: \(|u\rangle\) purifies \(P\) iff \(|u\rangle = \operatorname{vec}(A)\) for some \(A \in L(\mathcal{Y},\mathcal{X})\) with \(AA^* = P\). This uses the identity \(\operatorname{Tr}_\mathcal{Y}(\operatorname{vec}(A)\operatorname{vec}(A)^*) = AA^*\).
The unitary equivalence theorem (Theorem 3.2) is the quantum analogue of the fact that all probability distributions with the same marginal can be related by a coupling. Operationally: if you have access to only the \(\mathcal{X}\) part of a purification, the choice of purifying system \(\mathcal{Y}\) is irrelevant — all purifications of \(\rho\) look the same to Alice. Bob can rotate between them without affecting Alice.
where \(\{|e_k\rangle\}\) is orthonormal in \(\mathcal{X}\), \(\{|f_k\rangle\}\) is orthonormal in \(\mathcal{Y}\), and \(r = \operatorname{rank}(\operatorname{Tr}_\mathcal{Y}|u\rangle\langle u|)\) is the Schmidt rank. The Schmidt coefficients \(\{\lambda_k\}\) are the eigenvalues of \(\operatorname{Tr}_\mathcal{Y}|u\rangle\langle u|\).
all have Schmidt rank 2 and entropy of entanglement 1 ebit — they are maximally entangled. The state \(|00\rangle\) has Schmidt rank 1 — it is a product state. For an arbitrary pure state in \(\mathbb{C}^2\otimes\mathbb{C}^d\), the Schmidt rank determines whether the state is entangled: rank 1 iff product, rank \(\geq 2\) iff entangled. For a mixed state \(\rho^{AB}\), the Schmidt number is the minimum Schmidt rank across all pure-state decompositions \(\rho = \sum_k p_k|\psi_k\rangle\langle\psi_k|\). A separable state has Schmidt number 1; an entangled state has Schmidt number \(\geq 2\). Witnessing Schmidt number \(\geq k\) requires entanglement witnesses from the dual cone of states with Schmidt number \(< k\).
The purification of a bipartite state plays a crucial role in proofs about conditional entropy, quantum channel capacity, and Uhlmann’s theorem. A key identity: for any pure tripartite state \(|\psi\rangle^{ABR}\), \(S(\rho^A) = S(\rho^{BR})\) and \(S(\rho^{AB}) = S(\rho^R)\). These follow from the Schmidt decomposition across each bipartition and the purity of the global state.
- \(S(AB) = S(C)\) and \(S(AC) = S(B)\) and \(S(BC) = S(A)\).
- The conditional entropies satisfy \(S(A|B) = -S(A|C)\) and cyclic permutations.
- The quantum mutual information satisfies \(I(A:B) = S(A)+S(B)-S(C)\) and is symmetric in all cyclic permutations.
3.2 The Fidelity Function
- Symmetry: \(F(P,Q) = F(Q,P)\).
- Range: For \(\rho,\sigma \in D(\mathcal{X})\), \(0 \leq F(\rho,\sigma) \leq 1\).
- Pure-mixed: \(F(|\psi\rangle\langle\psi|,\sigma) = \sqrt{\langle\psi|\sigma|\psi\rangle}\).
- Pure-pure: \(F(|\psi\rangle\langle\psi|,|\phi\rangle\langle\phi|) = |\langle\psi|\phi\rangle|\).
- Multiplicativity: \(F(P_1\otimes P_2, Q_1\otimes Q_2) = F(P_1,Q_1)F(P_2,Q_2)\).
- Extremes: \(F(\rho,\sigma) = 1\) iff \(\rho = \sigma\); \(F(\rho,\sigma) = 0\) iff \(\operatorname{supp}(\rho)\perp\operatorname{supp}(\sigma)\).
3.3 Uhlmann’s Theorem
Uhlmann’s theorem gives the most elegant characterisation of fidelity: it is the maximal overlap between purifications. The theorem can be understood intuitively as follows: the fidelity \(F(\rho,\sigma)\) measures how “close” the two states are in terms of how much their purifications can be aligned. Since all purifications of \(\rho\) in a fixed system \(\mathcal{X}\otimes\mathcal{Y}\) are related by unitaries on \(\mathcal{Y}\), the question of maximising \(|\langle u|v\rangle|\) over purifications \(|v\rangle\) of \(\sigma\) reduces to the variational problem \(\max_V |\operatorname{Tr}(A^*BV^T)|\), which by the trace-norm variational formula equals \(\|A^*B\|_1\).
By the variational formula \(\|C\|_1 = \max_U |\operatorname{Tr}(CU)|\), maximising over unitaries \(V\) gives
\[ \max_V |\langle u|v\rangle| = \|A^*B\|_1 = \|\sqrt{P}\sqrt{Q}\|_1 = F(P,Q), \]using the SVD and the polar decomposition for the last equality. \(\square\)
The proof of Uhlmann’s theorem via SDPs proceeds differently and more directly shows the connection between fidelity and the primal optimisation. Following Watrous (Lecture 6, LN2011) and the in-course SDP notes from Debbie Leung, the key steps are:
where \(|u\rangle\) is a fixed purification of \(P\). The dual SDP is
\[ \beta = \inf_{Z \in \operatorname{Herm}(\mathcal{X})} \langle Q, Z\rangle \quad\text{s.t.}\quad Z\otimes I_\mathcal{Y} \geq uu^*. \]The dual constraint \(Z\otimes I_\mathcal{Y} \geq uu^*\) is equivalent, via a Schur complement argument, to \(\langle Z^{-1}, P\rangle \leq 1\) for \(Z > 0\). Restricting to \(\langle Z^{-1},P\rangle = 1\) (scaling invariance) and squaring gives \(\beta^2 = \inf_{Z>0}\langle P,Z^{-1}\rangle\langle Q,Z\rangle\). By Alberti’s theorem, this equals \(F(P,Q)^2\), so \(\beta = F(P,Q)\). Strong duality follows from the strictly feasible primal point \(W = Q\otimes\tfrac{I_\mathcal{Y}}{\dim\mathcal{Y}}\) and the dual point \(Z = \|u\|^2 I_\mathcal{X}\). The primal optimal value \(d = F(P,Q)^2\) is achieved by \(W = |v^*\rangle\langle v^*|\) where \(|v^*\rangle\) is the optimal purification of \(Q\) achieving the fidelity overlap. \(\square\)
Uhlmann’s theorem can also be derived from the SDP formulation of fidelity. The primal SDP for \(F(P,Q)\) is
\[ d = \sup\;\tfrac{1}{2}\operatorname{Tr}(X)+\tfrac{1}{2}\operatorname{Tr}(X^*)\quad\text{s.t.}\quad\begin{bmatrix}P & X \\ X^* & Q\end{bmatrix}\geq 0. \]The dual is \(\beta = \inf\;\tfrac{1}{2}\langle P,Y\rangle+\tfrac{1}{2}\langle Q,Z\rangle\) s.t. \(\begin{bmatrix}Y & I \\ I & Z\end{bmatrix}\geq 0\). Strong duality (\(d=\beta\), attained) is verified using Slater’s condition. The dual with the substitution \(Z = Y^{-1}\) (optimal when \(P,Q>0\)) becomes Alberti’s theorem. The primal with the Schur complement gives \(\|V\|_\infty\leq 1\) which connects to the variational formula \(F(P,Q) = \sup_{\|V\|_\infty\leq 1}\operatorname{Re}\operatorname{Tr}(V^*\sqrt{P}\sqrt{Q}) = \|\sqrt{P}\sqrt{Q}\|_1\). This is exactly the statement of Uhlmann’s theorem with \(V = U\) a unitary (the variational formula is maximised by a unitary).
3.3b Process Fidelity and the Entanglement Fidelity
Beyond the state fidelity \(F(\rho,\sigma)\), there is a natural fidelity measure for quantum channels that compares the “distance” between two quantum processes.
where \(|u\rangle \in \mathcal{X}\otimes\mathcal{X}\) is any purification of \(\rho\) with \(\operatorname{Tr}_\mathcal{X}(|u\rangle\langle u|) = \rho\). The entanglement fidelity measures how well \(\Phi\) preserves the entanglement of \(\rho\) with a reference system.
The entanglement fidelity has the following key properties. First, it is independent of the choice of purification \(|u\rangle\) (by the unitary equivalence of purifications and linearity). Second, for the identity channel \(\Phi = \operatorname{id}\), \(F_e(\rho,\operatorname{id}) = 1\) for any \(\rho\). Third, the average fidelity over all pure states is related to \(F_e\) by \(\bar F(\Phi) = \int F(\Phi(|\psi\rangle\langle\psi|),|\psi\rangle\langle\psi|)d|\psi\rangle = \frac{dF_e(\rho_{\text{mm}},\Phi)+1}{d+1}\) where \(\rho_{\text{mm}} = I/d\) (Horodecki-Horodecki-Horodecki 1999).
where \(J(\Phi), J(\Psi)\) are their Choi matrices and \(d = \dim\mathcal{X}\). The process fidelity measures how similar the Choi matrices are, and reduces to \(F(\rho,\sigma)\) when \(\Phi,\Psi\) are replacer channels \(\Phi(\cdot)=\rho, \Psi(\cdot)=\sigma\).
3.4 Alberti’s Theorem
While Uhlmann characterises fidelity through purifications, Alberti’s theorem gives a variational characterisation through an optimisation over positive definite operators — the form that naturally arises as the dual of the fidelity SDP.
3.5 Fuchs-van de Graaf Inequalities
The fidelity and trace distance are the two primary distance measures for quantum states. They are related by tight inequalities.
Upper bound: By Uhlmann, choose purifications \(|u\rangle,|v\rangle\) achieving \(F(\rho,\sigma) = |\langle u|v\rangle|\). Then \(\|\rho-\sigma\|_1 \leq \|uu^*-vv^*\|_1 = 2\sqrt{1-|\langle u|v\rangle|^2} = 2\sqrt{1-F^2}\) (using data processing under the partial trace and the pure-state trace distance formula). Rearranging: \(F \leq \sqrt{1-\tfrac{1}{4}\|\rho-\sigma\|_1^2}\). \(\square\)
3.6 Data Processing and Joint Concavity
- \(\|\Phi(P)-\Phi(Q)\|_1 \leq \|P-Q\|_1\).
- \(F(\Phi(P),\Phi(Q)) \geq F(P,Q)\).
(2): Let \(X\) be optimal for \(F(P,Q) = \tfrac{1}{2}\operatorname{Tr}(X)+\tfrac{1}{2}\operatorname{Tr}(X^*)\) s.t. \(\begin{bmatrix}P&X\\X^*&Q\end{bmatrix}\geq 0\). Since \(\Phi\) is CP, \(\begin{bmatrix}\Phi(P)&\Phi(X)\\\Phi(X)^*&\Phi(Q)\end{bmatrix}\geq 0\), making \(\Phi(X)\) feasible for \(F(\Phi(P),\Phi(Q))\). By TP, \(\tfrac{1}{2}\operatorname{Tr}(\Phi(X))+\tfrac{1}{2}\operatorname{Tr}(\Phi(X)^*) = \tfrac{1}{2}\operatorname{Tr}(X)+\tfrac{1}{2}\operatorname{Tr}(X^*) = F(P,Q)\). \(\square\)
Chapter 4: Semidefinite Programming
Semidefinite programming (SDP) is to quantum information what linear programming is to classical combinatorial optimisation. Many fundamental quantities — trace distance, fidelity, diamond norm, optimal state discrimination — are exactly expressible as SDPs, enabling both computation and duality-based proofs.
In quantum information, SDPs appear in two distinct roles. First, as computational tools: given the numerical data (Kraus operators, state descriptions, Hamiltonian parameters), SDPs compute exact optimal values efficiently using interior-point methods (e.g., SeDuMi, CVXPY, MOSEK in \(O(n^{3.5})\) time for an \(n\)-dimensional variable). Second, as proof techniques: the primal-dual structure gives certificates of optimality and allows compact proofs of quantum information theorems by finding explicit dual witnesses. The Holevo-Helstrom theorem, Uhlmann’s theorem, and the quantum capacity hierarchy all admit SDP-based proofs that are often more transparent than direct analytical arguments.
A key principle is that most quantum information quantities arising from optimisation over POVMs, channels, or states are SDPs because the feasible region (density matrices, POVMs, channels) is a spectrahedron — a convex set defined by linear matrix inequalities. The Choi isomorphism maps the CP condition to a PSD constraint, converting channel optimisation into standard SDP form.
The duality structure of SDPs has a direct physical interpretation in quantum information. The primal optimises over physical objects (states, measurements, channels). The dual provides witnesses: the dual feasible point certifies that the primal objective cannot exceed a certain value, and the dual optimal witnesses (entanglement witnesses, measurement witnesses) identify the physical constraints. This is why SDP duality is so powerful in quantum information: optimal protocols and impossibility proofs are two sides of the same SDP.
4.1 The Primal and Dual SDP
where \(A \in \operatorname{Herm}(\mathcal{X})\), \(B \in \operatorname{Herm}(\mathcal{Y})\), and \(\Phi: L(\mathcal{X})\to L(\mathcal{Y})\) is a Hermiticity-preserving linear map. The variable \(X \in \operatorname{Pos}(\mathcal{X})\) is the primal variable. Set \(d = -\infty\) if infeasible and \(d = +\infty\) if unbounded.
The variable \(Y \in \operatorname{Herm}(\mathcal{Y})\) is the dual variable. The adjoint \(\Phi^*\) satisfies \(\langle Y,\Phi(X)\rangle = \langle\Phi^*(Y),X\rangle\).
The dual is derived via Lagrange multipliers. Associate dual variable \(Y \in \operatorname{Herm}(\mathcal{Y})\) to the equality constraint \(\Phi(X)=B\) and \(S \in \operatorname{Pos}(\mathcal{X})\) to the inequality \(X \geq 0\). The Lagrangian is \(\langle A,X\rangle + \langle B-\Phi(X),Y\rangle + \langle X,S\rangle = \langle B,Y\rangle + \langle X, A - \Phi^*(Y) + S\rangle\). For this to be bounded as \(X\) varies over \(\operatorname{Pos}(\mathcal{X})\), we need \(A - \Phi^*(Y) + S = 0\), giving the dual constraint.
The sign convention: the dual variable \(Y\) for the equality constraint is free (unconstrained as a Hermitian matrix, since the constraint is an equality). The dual variable \(S\) for the inequality \(X \geq 0\) is positive semidefinite (since \(-S\) is the Lagrange multiplier for \(X \geq 0\), and positivity of \(-S\) would mean \(S \leq 0\), but we choose the sign so that \(S \geq 0\) pairs with \(\langle X,S\rangle \geq 0\)).
The notation \(\Phi^*\) for the adjoint of the constraint map should not be confused with the adjoint (Heisenberg-picture) channel. In the SDP context, \(\Phi: L(\mathcal{X})\to L(\mathcal{Y})\) is any Hermiticity-preserving linear map, and its adjoint \(\Phi^*: L(\mathcal{Y})\to L(\mathcal{X})\) satisfies \(\langle Y, \Phi(X)\rangle = \langle\Phi^*(Y), X\rangle\). When \(\Phi = \Phi_{\text{ch}}\) is a quantum channel, the adjoint of the constraint map and the adjoint of the channel coincide, but for general SDP constraint maps they differ.
The primal optimal is \(d = 0\) (take \(t\to 0^+\)) and the dual optimal is \(\beta = 0\), so \(d = \beta\). However, this particular instance does not attain strong duality — a gap exists between attainment in the primal and dual.
4.2 Weak and Strong Duality
The duality gap \(\beta - d\) can be strictly positive. Closing the gap requires a regularity condition. Example 4.1 shows a gap exists between attainment (not between the values \(d=\beta=0\)), but Example 4.4 below shows what happens when Slater’s condition fails: the primal optimal is approached but not attained. This is analogous to the LP degeneracy situation, but for matrix variables.
A useful test: if the SDP arises from a quantum information problem with a natural “maximally noisy” or “identity” solution, strict feasibility usually holds for both primal and dual — precisely the regime of most quantum information SDPs.
4.3 Complementary Slackness
Complementary slackness provides necessary and sufficient conditions for optimality when strong duality holds — it is the quantum analogue of complementary slackness in LP.
The complementary slackness conditions can be interpreted geometrically: \(XS = 0\) means the support of \(X\) is orthogonal to the support of \(S\). Since \(S = \Phi^*(Y) - A\), this says “the primal optimal \(X\) is supported only on the subspace where the dual inequality \(\Phi^*(Y) \geq A\) is tight (equality holds).” This is exactly analogous to the classical LP condition that a basic feasible solution has nonzero variables only where the corresponding dual constraints are active.
These say: \(M_0\) is supported only on the subspace where \(Y = \tfrac{1}{2}\rho_0\) (the “boundary” between the two hypotheses), and similarly for \(M_1\). In the Helstrom case with \(p_0=p_1=1/2\), the optimal \(Y = \tfrac{1}{2}\Pi^+\rho_0\Pi^+ + \tfrac{1}{2}\Pi^-\rho_1\Pi^-\) and the slackness conditions are immediately satisfied: \(M_0 = \Pi^+\) is supported on the positive eigenspace of \(\rho_0-\rho_1\), so \(M_0(Y-\tfrac{1}{2}\rho_0) = \Pi^+(\tfrac{1}{2}\Pi^+\rho_0\Pi^+ - \tfrac{1}{2}\rho_0)\Pi^+ = 0\) since \(\Pi^+\rho_0\Pi^+ = \rho_0\Pi^+\) on the positive eigenspace. ✓
Primal: The discrimination probability is \(d = \sup\tfrac{1}{2}[\langle M,\rho\rangle + \langle M',\sigma\rangle]\) s.t. \(M + M' = I\), \(M,M' \geq 0\). This is an SDP in the block variable \(X = \operatorname{diag}(M,M') \in L(\mathbb{C}^2\oplus\mathbb{C}^2)\).
Dual variable: Associate \(Y \in \operatorname{Herm}(\mathbb{C}^2)\) to the equality constraint \(M+M' = I\). The dual constraint is \(\Phi^*(Y) \geq A\) where \(A = \tfrac{1}{2}\operatorname{diag}(\rho,\sigma)\) and \(\Phi^*(Y) = \operatorname{diag}(Y,Y)\). So the constraint is \(Y \geq \tfrac{1}{2}\rho\) and \(Y \geq \tfrac{1}{2}\sigma\). The dual objective is \(\operatorname{Tr}(Y)\).
Dual SDP: \(\beta = \inf\operatorname{Tr}(Y)\) s.t. \(Y \geq \tfrac{1}{2}|0\rangle\langle 0|\) and \(Y \geq \tfrac{1}{2}|1\rangle\langle 1|\). Since \(\rho = |0\rangle\langle 0|\) and \(\sigma = |1\rangle\langle 1|\) are orthogonal, the minimum \(Y = \tfrac{1}{2}(|0\rangle\langle 0|+|1\rangle\langle 1|) = \tfrac{I}{2}\), giving \(\beta = 1\). The primal achieves this with \(M = |0\rangle\langle 0|\), \(M' = |1\rangle\langle 1|\): success probability \(= \tfrac{1}{2}(1+1) = 1\). Strong duality holds: \(d = \beta = 1\).
Complementary slackness check: With optimal \(M = |0\rangle\langle 0|\), \(M' = |1\rangle\langle 1|\), and \(Y = I/2\): The slackness operator for \(M\) is \(Y - \tfrac{1}{2}\rho = \tfrac{I}{2} - \tfrac{1}{2}|0\rangle\langle 0| = \tfrac{1}{2}|1\rangle\langle 1|\). We need \(M(Y-\tfrac{1}{2}\rho) = |0\rangle\langle 0|\cdot\tfrac{1}{2}|1\rangle\langle 1| = 0\). ✓
For boundedness over \(X \geq 0\), we need \(A - yI + S = 0\), i.e., \(S = yI - A\), which requires \(S \geq 0\), i.e., \(y \geq \lambda_{\max}(A)\). The dual is thus: \(\beta = \inf\{y : y \geq \lambda_{\max}(A)\} = \lambda_{\max}(A)\). The primal achieves this with \(X = |v\rangle\langle v|\) the top eigenvector of \(A\). This is the variational formula for the maximum eigenvalue: \(\lambda_{\max}(A) = \sup_{\rho \in D(\mathcal{X})}\operatorname{Tr}(A\rho) = \sup_{\|u\|=1}\langle u|A|u\rangle\). Strong duality is immediate since both sides equal \(\lambda_{\max}(A)\).
4.4 Computing Adjoint Maps
To use any specific SDP, one must compute the adjoint \(\Phi^*\) of the constraint map \(\Phi\). Two methods are available.
So \(\Phi^*(Z) = I_\mathcal{X}\otimes Z\). This confirms the standard result: the adjoint of the partial trace is the tensor-with-identity map.
(from the supplementary SDP notes). We use Method 2 (inner product) to find \(\Psi^*(Y)\). Compute \(\langle Y, \Psi(X)\rangle = \operatorname{Tr}(Y^*\Psi(X))\):
\[ \langle Y,\Psi(X)\rangle = Y_{11}(X_{11}+X_{23}+X_{32}) + Y_{22}X_{22} = Y_{11}X_{11} + Y_{22}X_{22} + Y_{11}X_{23} + Y_{11}X_{32}. \]This must equal \(\langle\Psi^*(Y),X\rangle = \sum_{a,b}[\Psi^*(Y)]_{ab}^*X_{ab}\). Reading off coefficients:
\[ [\Psi^*(Y)]_{11} = Y_{11},\quad [\Psi^*(Y)]_{22} = Y_{22},\quad [\Psi^*(Y)]_{23} = Y_{11},\quad [\Psi^*(Y)]_{32} = Y_{11}. \]All other entries of \(\Psi^*(Y)\) are 0. Hence
\[ \Psi^*(Y) = \begin{pmatrix}Y_{11} & 0 & 0 \\ 0 & Y_{22} & Y_{11} \\ 0 & Y_{11} & 0\end{pmatrix}. \]One can verify this from the Kraus representation: \(\Psi(X) = M_1XM_1^* + M_2XM_2^* + M_3XM_3^*\) where
\[ M_1 = \begin{pmatrix}1&0&0\\0&0&0\\0&0&0\end{pmatrix},\; M_2 = \begin{pmatrix}0&0&0\\0&1&0\\0&0&0\end{pmatrix},\; M_3 = \tfrac{1}{\sqrt{2}}\begin{pmatrix}0&0&1\\0&0&0\\1&0&0\end{pmatrix} + \tfrac{1}{\sqrt{2}}\begin{pmatrix}0&1&0\\0&0&0\\0&0&0\end{pmatrix}. \]The adjoint Kraus operators give \(\Psi^*(Y) = \sum_k M_k^*YM_k\), recovering the same formula.
4.4b Application: Quantum State Tomography as Semidefinite Program
Quantum state tomography (QST) is the process of estimating a quantum state \(\rho\) from the outcomes of multiple measurements. The key computational step — fitting the experimental frequencies to a valid density matrix — is naturally an SDP.
where \(n_k\) are the raw counts. The constraint \(\rho \in D(\mathcal{X})\) (i.e., \(\rho\geq 0, \operatorname{Tr}(\rho)=1\)) is a semidefinite constraint, making QST an SDP (for the least-squares version) or a convex optimisation (for maximum likelihood).
The regularised QST problem can also be formulated as a semidefinite program: \(\min_\rho \lambda\operatorname{Tr}(\rho^2) - \sum_k f_k\log\operatorname{Tr}(M_k\rho)\) s.t. \(\rho\geq 0\), \(\operatorname{Tr}(\rho)=1\). The Tikhonov regularisation term \(\lambda\operatorname{Tr}(\rho^2) = \lambda\|\rho\|_2^2\) penalises pure states and encourages mixed solutions, acting as a prior for the maximum entropy principle. The resulting SDP has unique solution when the measurement is informationally complete (i.e., the POVM spans \(\operatorname{Herm}(\mathcal{X})\)).
4.5 Application: Optimal State Discrimination
- \(Y = \sum_k p_k M_k\rho_k\) (Hermitian), and
- \(Y \geq p_k\rho_k\) for all \(k\).
Both primal (uniform POVM: \(M_k = I/n\)) and dual (\(Y = \lambda I\) for large \(\lambda\)) are strictly feasible, so strong duality holds and both optima are attained.
At optimality, complementary slackness \(M_k(Y - p_k\rho_k) = 0\) for each \(k\) gives \(M_kY = p_kM_k\rho_k\). Summing over \(k\) and using \(\sum_kM_k = I\): \(Y = \sum_k p_k M_k\rho_k\). This is Hermitian (the primal objective evaluated at the optimal), which is condition (1). Condition (2) is dual feasibility. The converse follows by reversing the argument: conditions (1) and (2) imply the primal and dual objectives coincide, so the POVM is optimal. \(\square\)
4.6 Application: Fidelity as an SDP
The dual SDP is \(\beta = \inf\{\tfrac{1}{2}\langle P,Y\rangle + \tfrac{1}{2}\langle Q,Z\rangle : \begin{bmatrix}Y & I \\ I & Z\end{bmatrix}\geq 0\}\), and the dual optimal with \(Y = Z^{-1}\) recovers Alberti’s theorem.
Maximising over \(V\) by the variational formula: \(\sup\operatorname{Re}\operatorname{Tr}(X) = \|P^{1/2}Q^{1/2}\|_1 \cdot (\text{phase})\). Choosing the phase of \(X\) to make \(\operatorname{Tr}(X) = \operatorname{Tr}(X^*)\) gives \(F(P,Q)\). Strong duality: \(X=0\) is primal feasible (for \(P,Q\geq 0\)) and \(Y=Z=2I\) is strictly dual feasible, so Slater applies. \(\square\)
4.7 The Diamond Norm
where the supremum over auxiliary systems \(\mathcal{Z}\) is attained at \(\mathcal{Z} \cong \mathcal{X}\) (Lemma 20.2 of LN2011).
When two channels \(\Phi_0,\Phi_1\) are given uniformly at random, the optimal probability of correctly identifying which was applied equals \(\tfrac{1}{2} + \tfrac{1}{4}\|\Phi_0-\Phi_1\|_\diamond\). The auxiliary system \(\mathcal{Z}\) is essential: without it (using only \(\|\cdot\|_1\)), the discrimination probability may be strictly lower.
This can be cast as an SDP. Writing the trace norm \(\|A\|_1 = \max_{U\in\mathcal{U}(\mathcal{Y}\otimes\mathcal{X})}\operatorname{Re}\operatorname{Tr}(A^*U)\), the diamond norm equals the value of the SDP:
\[ \tfrac{1}{2}\|\Phi\|_\diamond = \sup_{X,\rho}\Bigl\{\operatorname{Re}\operatorname{Tr}(X) : \begin{bmatrix}(\Phi\otimes I)(\rho) & X \\ X^* & (\Phi^{\dagger}\otimes I)(\rho)\end{bmatrix}\geq 0,\;\rho \in D(\mathcal{X}\otimes\mathcal{X})\Bigr\}. \]For the special case where \(\Phi_0,\Phi_1\) are quantum channels and one wants \(\|\Phi_0-\Phi_1\|_\diamond\), the auxiliary system restriction to \(\mathcal{Z}\cong\mathcal{X}\) (dim\(\mathcal{Z}=\) dim\(\mathcal{X}\)) is sufficient, making this a finite-dimensional SDP.
Take the maximally entangled input \(|\beta\rangle\langle\beta|\). Then \((\text{id}\otimes I)(|\beta\rangle\langle\beta|) = |\beta\rangle\langle\beta|\) and \((\Omega\otimes I)(|\beta\rangle\langle\beta|) = \tfrac{I}{d}\otimes\tfrac{I}{d}\). The trace norm is
\[ \||\beta\rangle\langle\beta| - \tfrac{I}{d^2}\|_1 = \bigl(1-\tfrac{1}{d^2}\bigr) + (d^2-1)\cdot\tfrac{1}{d^2} = 2\bigl(1-\tfrac{1}{d^2}\bigr). \]For a qubit (\(d=2\)): \(\|\text{id}-\Omega\|_\diamond = 2(1-1/4) = 3/2\). Since the identity and depolarising channels are “maximally distinguishable” (one preserves quantum information, the other destroys it), this large diamond norm value \(3/2\) confirms that a single use of the unknown channel with an entangled probe can reliably distinguish them.
4.8 Application: Quantum Channel Simulation and the Reverse Shannon Theorem
A natural question arising from the diamond norm is: when can one channel \(\Phi\) simulate another channel \(\Psi\) using shared entanglement and classical communication? The reverse Shannon theorem answers this question and provides a unified framework connecting channel simulation, classical capacity, and entanglement consumption.
The reverse Shannon theorem gives a duality between the forward problem (maximise communication rate given a channel) and the reverse problem (minimise the resources needed to simulate a channel). Together with the HSW theorem, it implies that the classical capacity equals the channel simulation cost in the entanglement-assisted regime.
Chapter 5: Quantum Entropy and Information
Shannon’s information theory and quantum mechanics meet in the study of entropy. The classical Shannon entropy governs the compression of classical data; its quantum generalisation, the von Neumann entropy, governs quantum data compression and lies at the heart of every capacity theorem for quantum channels.
5.1 Shannon Entropy and the Asymptotic Equipartition Property
in bits (logarithm base 2), with the convention \(0\log 0 = 0\). For a binary source with bias \(p\), \(H(p) = -p\log p - (1-p)\log(1-p)\) is the binary entropy function.
The typical set is \(T_{n,\delta} = \{x^n \in \Omega^n : x^n\text{ is }\delta\text{-typical}\}\).
- \(p(T_{n,\delta}) \geq 1-\varepsilon\) — the typical set has high probability.
- \((1-\varepsilon)2^{n(H-\delta)} \leq |T_{n,\delta}| \leq 2^{n(H+\delta)}\) — the typical set is exponentially smaller than \(\Omega^n\).
- For any \(A \subseteq \Omega^n\) with \(p(A) \geq 1-\varepsilon\), \(|A| \geq (1-2\varepsilon)2^{n(H-\delta)}\) — the typical set is the smallest high-probability set.
The AEP is the rigorous form of the intuitive statement that “most sequences from a source look the same.” More precisely: the typical sequences each have probability roughly \(2^{-nH}\), and there are roughly \(2^{nH}\) of them. The remaining \(|\Omega|^n - 2^{nH}\) atypical sequences collectively have negligible probability.
The AEP is the foundational tool in both classical and quantum Shannon theory. Conceptually, it says that a source with entropy \(H\) behaves, asymptotically, like a uniform distribution over a set of size \(2^{nH}\). All compression, channel coding, and capacity theorems follow from this: the source can be compressed to \(nH\) bits with negligible error (Shannon’s source coding theorem), and a channel with capacity \(C\) can transmit at rate \(R < C\) by identifying the \(2^{nR}\) messages with codewords in the typical set of the output (Shannon’s noisy channel theorem).
It measures the information shared between \(X\) and \(Y\): how much knowing one reduces uncertainty about the other. \(I(X:Y) \geq 0\) (from the non-negativity of KL divergence) with equality iff \(X\perp Y\).
5.2 Shannon’s Noiseless Coding Theorem
- Direct: If \(R > H\), there exist encoder-decoder pairs with \(nR\) bits achieving error probability \(\leq \varepsilon\) for all large \(n\).
- Converse: If \(R < H\), any encoder-decoder with \(nR\) bits has error probability bounded away from 0 for all large \(n\).
Converse: Any encoder with \(nR < n(H-\delta)\) bits maps onto at most \(2^{nR}\) codewords. These cover at most \(2^{nR}\) source sequences. By AEP property (3), any set of size \(< (1-2\varepsilon)2^{n(H-\delta)}\) has probability \(< 1-\varepsilon\), so the decoder fails with probability \(> \varepsilon\). \(\square\)
5.3 Von Neumann Entropy
- Range: \(0 \leq S(\rho) \leq \log\dim\mathcal{X}\).
- Purity: \(S(\rho) = 0\) iff \(\rho\) is pure. \(S(\rho) = \log d\) iff \(\rho = I/d\).
- Unitary invariance: \(S(U\rho U^*) = S(\rho)\).
- Additivity: \(S(\rho\otimes\sigma) = S(\rho)+S(\sigma)\).
- Concavity: \(S(\sum_k p_k\rho_k) \geq \sum_k p_k S(\rho_k)\).
- Subadditivity: For bipartite \(\rho^{AB}\), \(S(AB) \leq S(A)+S(B)\), with equality iff \(\rho^{AB} = \rho^A\otimes\rho^B\).
The von Neumann entropy has a maximum entropy interpretation: among all states \(\rho\) satisfying linear constraints \(\operatorname{Tr}(O_k\rho) = c_k\), the one maximising \(S(\rho)\) is the thermal (Gibbs) state \(\rho = Z^{-1}e^{-\sum_k\beta_k O_k}\), where \(Z = \operatorname{Tr}(e^{-\sum_k\beta_k O_k})\) and \(\{\beta_k\}\) are Lagrange multipliers. For a single energy constraint \(\operatorname{Tr}(H\rho) = E\): the maximum entropy state is the Boltzmann-Gibbs state \(\rho_\beta = e^{-\beta H}/\operatorname{Tr}(e^{-\beta H})\) with temperature \(T = 1/(k_B\beta)\). This is the quantum statistical mechanics principle of maximum entropy (Jaynes 1957).
where \(p_\beta = \tfrac{e^{-\beta\omega/2}}{2\cosh(\beta\omega/2)}\) is the ground state population. The entropy is \(S(\rho_\beta) = h(p_\beta)\). At infinite temperature \(\beta\to 0\): \(p_\beta\to 1/2\), \(S\to 1\) bit (maximally mixed). At zero temperature \(\beta\to\infty\): \(p_\beta\to 1\), \(S\to 0\) (pure ground state). The entropy is maximised at infinite temperature, confirming the Gibbs state as the maximum entropy state for the energy constraint.
Equality holds if and only if \(\rho^{AB} = \rho^A\otimes\rho^B\).
Hence \(S(\rho^{AB}) \leq S(\rho^A)+S(\rho^B)\). Equality iff \(\rho^{AB} = \rho^A\otimes\rho^B\) (by equality in Klein’s). \(\square\)
The Araki-Lieb inequality shows that quantum conditional entropy \(S(A|B) = S(AB)-S(B)\) satisfies \(S(A|B) \geq -S(A)\). This is remarkable: while classically \(H(A|B) \geq 0\) always, quantum conditional entropy can be as negative as \(-S(A)\), achieved when \(\rho^{AB}\) is pure (so \(S(AB)=0\) and \(S(A|B) = -S(A)\)).
5.3b Concavity of Entropy and the Subadditivity Chain
The concavity of von Neumann entropy \(S(\sum_k p_k\rho_k) \geq \sum_k p_k S(\rho_k)\) has a clean proof via quantum relative entropy: \(S(\bar\rho\|\bar\rho) - \sum_k p_k S(\rho_k\|\bar\rho) = S(\bar\rho)\cdot\sum_k p_k - \sum_k p_k S(\rho_k) = -\sum_k p_k[S(\rho_k)-S(\bar\rho)+D(\rho_k\|\bar\rho)]\). Since \(D\geq 0\) and \(S(\bar\rho) = S(\sum_kp_k\rho_k)\), this gives concavity.
More precisely, the concavity identity is:
\[ S\!\Bigl(\sum_k p_k\rho_k\Bigr) - \sum_k p_kS(\rho_k) = \sum_k p_k S(\rho_k\|\bar\rho) \geq 0. \]The right side is the Holevo information \(\chi\) of the ensemble \(\{(p_k,\rho_k)\}\) with average state \(\bar\rho = \sum_k p_k\rho_k\). This connects the concavity of entropy directly to the Holevo bound on accessible information (Theorem 5.11): the information gap between “knowing which \(\rho_k\)” and “having \(\bar\rho\)” exactly equals the Holevo information, which upper-bounds how much Alice can communicate to Bob through the ensemble.
5.4 Quantum Data Compression
The proof uses the typical subspace: the eigenspace of \(\rho^{\otimes n}\) spanned by eigenvectors with eigenvalue close to \(2^{-nS(\rho)}\). Projecting onto this subspace and transmitting the projection succeeds with high fidelity by the quantum AEP, using only \(n(S(\rho)+\delta)\) qubits. The converse uses Fannes’ inequality and Winter’s argument.
To make this precise: let \(\rho = \sum_x p(x)|e_x\rangle\langle e_x|\) be the spectral decomposition. The eigenvectors of \(\rho^{\otimes n}\) are \(|e_{x_1}\rangle\otimes\cdots\otimes|e_{x_n}\rangle\) with eigenvalue \(\prod_i p(x_i) = p(x^n)\). The \(\delta\)-typical subspace is
\[ \mathcal{S}_{n,\delta} = \operatorname{span}\{|e_{x_1}\rangle\otimes\cdots\otimes|e_{x_n}\rangle : x^n \in T_{n,\delta}\}, \]and the typical projector \(\Pi_{n,\delta}\) projects onto \(\mathcal{S}_{n,\delta}\).
- \(\operatorname{Tr}(\Pi_{n,\delta}\rho^{\otimes n}) \geq 1-\varepsilon\) — the typical subspace has high probability.
- \((1-\varepsilon)2^{n(S(\rho)-\delta)} \leq \dim\mathcal{S}_{n,\delta} \leq 2^{n(S(\rho)+\delta)}\).
- The typical projector satisfies \(2^{-n(S(\rho)+\delta)}I \leq \Pi_{n,\delta}\rho^{\otimes n}\Pi_{n,\delta} \leq 2^{-n(S(\rho)-\delta)}\Pi_{n,\delta}\).
The TTS (typical subspace transmission) protocol uses an isometry \(V: \mathcal{S}_{n,\delta}\to\mathbb{C}^{\lceil 2^{n(S+\delta)}\rceil}\) to compress into \(n(S+\delta)\) qubits. The decoder applies \(V^\dagger\) followed by replacement of the discarded part with a fixed state \(|0\rangle\). The fidelity with the input (referenced against a purification) is shown to be \(\geq 1-O(\varepsilon)\) by the quantum AEP.
where \(\mathcal{E}\) (compress) and \(\mathcal{D}\) (decompress) are the encoding/decoding channels. For the average input state \(\rho^{\otimes n}\): \(\operatorname{Tr}(\Pi_{n,\delta}\rho^{\otimes n}) \geq 1-\varepsilon\) by Quantum AEP property (1). By Fannes’ inequality, this implies fidelity close to 1.
More precisely: let \(|u\rangle = \operatorname{vec}(\sqrt{\rho^{\otimes n}}/\|\cdot\|)\) be a purification reference. The fidelity with the identity channel is \(F_e(\rho^{\otimes n},\text{id}) = 1\). The TTS channel has entanglement fidelity \(F_e(\rho^{\otimes n},\mathcal{E}\circ\mathcal{D}) = \operatorname{Tr}(\Pi_{n,\delta}\rho^{\otimes n}) \geq 1-\varepsilon\). By the Fuchs-van de Graaf inequality (applied to the compressed channel versus identity), the worst-case input fidelity also approaches 1. \(\square\)
5.5 Entanglement Concentration and Dilution
For bipartite pure state \(|\psi\rangle_{AB} = \sum_v\sqrt{p(v)}|v\rangle_A|v\rangle_B\) in Schmidt form, define the entropy of entanglement \(E(|\psi\rangle) = S(\operatorname{Tr}_B|\psi\rangle\langle\psi|) = H(p)\).
The equality of concentration and dilution rates means the entropy of entanglement is the unique entanglement measure for bipartite pure states under asymptotic LOCC. If classical communication is charged, dilution costs only \(O(\sqrt{n})\) extra cbits — savings quantified by the entanglement spread \(\Delta(|\psi\rangle) = \log\operatorname{rank}(\rho_A) - \log(1/\|\rho_A\|_\infty)\).
The eigenvalues of \(\rho\) are \(\lambda_\pm = \tfrac{1}{2}\bigl(1 \pm \sqrt{1 - 4\det\rho}\bigr)\). We have \(\det\rho = \tfrac{7}{64} - \tfrac{1}{64} = \tfrac{6}{64} = \tfrac{3}{32}\), so
\[ \lambda_\pm = \tfrac{1\pm\sqrt{1-3/8}}{2} = \tfrac{1\pm\sqrt{5/8}}{2}. \]Numerically \(\lambda_+ \approx 0.951\) and \(\lambda_- \approx 0.049\). The von Neumann entropy is \(S(\rho) = h(\lambda_+) \approx 0.287\) bits.
The Schumacher compression protocol requires only \(\approx 0.287n\) qubits to encode \(n\) copies of this source with high fidelity, compared to \(n\) qubits naively. The typical subspace has dimension \(\approx 2^{0.287n}\): for \(n=100\), roughly \(2^{28.7} \approx 4\times 10^8\) dimensions out of the full \(2^{100}\) — a dramatic compression.
To see this concretely for small \(n=2\): the eigenbasis of \(\rho^{\otimes 2}\) consists of products of eigenvectors \(\{|e_+\rangle,|e_-\rangle\}\) with eigenvalues \(\lambda_+^2 \approx 0.904\), \(\lambda_+\lambda_- \approx 0.047\) (twice), and \(\lambda_-^2 \approx 0.002\). The typical sequences at the threshold \(2^{-2S} \approx 2^{-0.574}\) are the two states \(|e_+e_-\rangle,|e_-e_+\rangle\) along with \(|e_+e_+\rangle\). Together these carry probability \(0.904 + 2\times 0.047 \approx 0.998\), confirming that the typical subspace has dimension 3 and captures nearly all the probability mass.
- Alice measures her 4 qubits in the eigenbasis of \(\rho_A = \tfrac{3}{4}|0\rangle\langle 0|+\tfrac{1}{4}|1\rangle\langle 1|\). She obtains a typical string \(x_1x_2x_3x_4 \in \{0,1\}^4\).
- The global joint state after her measurement collapses to a product of \(\sim nE = 3.24\) Schmidt components. A typical string with \(\sim n\cdot\tfrac{3}{4} = 3\) zeros and \(\sim n\cdot\tfrac{1}{4} = 1\) one corresponds to a state with Schmidt coefficient \(\approx 2^{-n H(3/4,1/4)} = 2^{-3.24}\).
- Alice communicates her measurement outcome (classical): \(4\) cbits. Bob applies a local unitary to his 4 qubits based on Alice's announcement.
- The result is approximately \(\lfloor nE\rfloor = 3\) maximally entangled pairs, with fidelity approaching 1 as \(n\to\infty\).
5.6 Quantum Relative Entropy
An immediate application: \(S(\rho\|I/d) = \log d - S(\rho) \geq 0\), giving the upper bound \(S(\rho) \leq \log d\).
This confirms the formula \(S(\rho\|I/d) = \log d - S(\rho)\). Note that \(S(\rho\|\sigma) > 0\) since \(\rho \neq \sigma\) (Klein’s inequality). If we had taken \(\sigma = \rho\), then \(S(\rho\|\rho) = 0\).
The quantum relative entropy also satisfies the data processing inequality: for any channel \(\Phi\), \(S(\Phi(\rho)\|\Phi(\sigma))\leq S(\rho\|\sigma)\). This is one of the most fundamental results in quantum information — it says that processing quantum information cannot increase the distinguishability of two states. The classical analogue is the data processing inequality for classical divergences.
A natural question is: when does equality hold? The answer is given by the Petz recovery map, which provides a universal sufficient condition for equality and connects data processing to approximate quantum error correction.
where \(\Phi^*\) is the adjoint (Heisenberg-picture) channel. The Petz map is designed so that if \(\Phi\) is a degrading of some quantum channel and \(\sigma\) is a fixed point, then \(\mathcal{P}_{\sigma,\Phi}\circ\Phi = \operatorname{id}\) — the channel is perfectly recoverable.
In words: the data processing inequality is tight if and only if the state \(\rho\) can be perfectly recovered from its image \(\Phi(\rho)\) using the Petz map built from \(\sigma\).
This characterisation has deep implications for approximate quantum error correction. If a channel \(\Phi\) only slightly decreases \(S(\rho\|\sigma)\), then there exists a near-perfect recovery map — and quantitatively, the Fawzi-Renner theorem (2015) shows that \(S(\rho\|\sigma) - S(\Phi(\rho)\|\Phi(\sigma)) \geq -2\log F(\rho, \mathcal{P}_{\sigma,\Phi}\circ\Phi(\rho))\), giving a recovery fidelity bound purely from the relative entropy decrease.
- Dephasing channel \(\Delta(\rho) = \rho\) (fixes diagonal states): \(S(\Delta(\rho)\|\Delta(\sigma)) = S(\rho\|\sigma) \approx 0.189\). Equality — dephasing is information-lossless here since both inputs are already diagonal.
- Erasure channel \(\Phi_p\) (erases with probability \(p=1/2\)): the output is a mixture of the original state and an erasure flag. The relative entropy is \(\approx (1-p) \cdot 0.189 + 0 = 0.095\) bits — strictly less. The erasure discards half the information.
- Completely depolarising channel \(\Omega\): maps every state to \(I/d\), so \(\Omega(\rho) = \Omega(\sigma) = I/2\) and \(S(\Omega(\rho)\|\Omega(\sigma)) = 0\). Maximally noisy channel destroys all distinguishability.
5.6b Quantum Hypothesis Testing and Stein’s Lemma
A fundamental task in statistics is hypothesis testing: given many copies of an unknown state, decide whether it is \(\rho\) (null hypothesis) or \(\sigma\) (alternative). The quantum relative entropy governs the optimal error exponents.
- Type I error (false alarm): \(\alpha_n = \operatorname{Tr}((I-T_n)\rho^{\otimes n})\) — probability of rejecting \(\rho\) when it is true.
- Type II error (missed detection): \(\beta_n = \operatorname{Tr}(T_n\sigma^{\otimes n})\) — probability of accepting \(\rho\) when \(\sigma\) is the true state.
In other words, the optimal Type II error decreases exponentially with rate exactly \(S(\rho\|\sigma)\), independent of the constraint \(\varepsilon\).
The quantum Stein’s lemma establishes the quantum relative entropy as the fundamental distinguishability measure in the asymptotic hypothesis testing regime. Its classical analogue (Stein’s lemma for classical distributions) uses the KL divergence. The proof of the quantum version is substantially harder and uses the operator inequality techniques developed for quantum entropy — specifically, Lemma 3.17 of Hiai-Petz which is equivalent to the data processing inequality.
5.7 Strong Subadditivity
equivalently \(S(X|YZ) \leq S(X|Z)\) (conditioning on more information reduces uncertainty).
- Quantum mutual information: \(S(X:Y)_\rho = S(\rho^X)+S(\rho^Y)-S(\rho^{XY}) \geq 0\).
- Conditional entropy: \(S(X|Y)_\rho = S(\rho^{XY})-S(\rho^Y)\) (can be negative for entangled states).
- SSA is equivalent to \(S(X:Y|Z) \geq 0\), i.e., conditioning on \(Z\) cannot increase mutual information.
This negative value has operational meaning: Bob’s knowledge of his half of the Bell state actually helps the classical transmission of Alice’s qubit — it provides a resource. In the quantum teleportation protocol, the ebit precisely compensates for the missing qubit communication: \(-1\) ebit of conditional entropy means Bob can reconstruct Alice’s qubit from the 2 classical bits she sends, using the pre-shared entanglement.
Contrast with a classically correlated state \(\sigma^{AB} = \tfrac{1}{2}(|00\rangle\langle 00|+|11\rangle\langle 11|)\): here \(S(\sigma^{AB}) = 1\), \(S(\sigma^B) = 1\), so \(S(A|B)_\sigma = 0\). Classical correlations always give non-negative conditional entropy.
where \(E\) is the environment of the Stinespring dilation and the conditional entropy \(S(A|E)\) is evaluated on the state after the channel acts. The single-letter hashing bound is \(Q \geq \max_\rho S(A|E)_\rho = \max_\rho[S(\Phi(\rho)) - S_e(\rho,\Phi)]\), where \(S_e\) is the entropy exchange. SSA implies this quantity is non-negative precisely when the channel can transmit quantum information.
5.8 Holevo’s Theorem
where \(h(q) = -q\log q-(1-q)\log(1-q)\) is the binary entropy.
Fano’s inequality lower-bounds the error rate in any decoding scheme. Combined with Holevo’s theorem, it implies the quantum Holevo bound: transmitting \(n\) classical bits reliably requires at least \(n\) qubits in the channel, even with optimal encoding and measurement.
The HSW (Holevo-Schumacher-Westmoreland) theorem (1997) establishes that the classical capacity of a quantum channel \(\Phi: L(\mathcal{X})\to L(\mathcal{Y})\) is
\[ C = \lim_{n\to\infty}\frac{1}{n}\chi_n(\Phi),\quad \chi_n(\Phi) = \max_{\mathcal{E}}\chi(\Phi^{\otimes n}(\mathcal{E})), \]where the max is over all ensembles on \(\mathcal{X}^{\otimes n}\). The regularisation over \(n\) is generally necessary (Hastings 2008 showed strict superadditivity of \(\chi\) for some channels), making the classical capacity uncomputable in general.
The typical subspace argument shows: with high probability, the output \(\Phi^{\otimes n}(\rho_{x^n})\) lies in a typical subspace of dimension \(\approx 2^{nS(\Phi(\bar\rho))}\), while the contribution from each codeword lies in a subspace of dimension \(\approx 2^{n\sum_x p(x)S(\Phi(\rho_x))}\). Since \(R < \chi = S(\Phi(\bar\rho)) - \sum_x p(x)S(\Phi(\rho_x))\), the number of codewords \(2^{nR}\) times the per-codeword subspace dimension \(2^{n\sum p(x)S(\Phi(\rho_x))}\) is still exponentially smaller than the typical subspace. Hence a random measurement succeeds with high probability in identifying the message. \(\square\)
Here \(Q\) is the quantum capacity, \(C\) the classical capacity, and \(C_E\) the entanglement-assisted classical capacity. The gap \(C_E - C\) measures how much entanglement assistance helps, and the gap \(C - Q\) measures the advantage of classical over quantum communication. All three are in general different: for the erasure channel with probability \(p\), \(Q = (1-2p)^+ \cdot 1\) qubit, \(C = (1-p)\) bit, and \(C_E = 2(1-p)\) bits (superdense coding doubling).
Chapter 6: Separability and Entanglement Theory
Entanglement is the defining resource of quantum information. This chapter develops the mathematical theory of entanglement: how to characterise it, how to detect it, and how it behaves under local operations and classical communication.
The distinction between separable and entangled states is not merely a mathematical curiosity — it has profound operational consequences. Entangled states violate Bell inequalities, which are constraints that any classically correlated (separable) state must satisfy. Detecting this violation operationally certifies entanglement without any characterisation of the measurement devices (device-independent certification).
The mathematical theory of entanglement is rich and still not fully understood. The separability problem is NP-hard; the connection between PPT and separability breaks down above \(2\times 3\) dimensions; the quantum capacities remain uncomputable in general; and the relationship between entanglement measures is complex. The framework of SDP relaxations (§6.3b) and convex optimisation over quantum states (§6.7, §6.8) provides the best current tools for these problems, and many of the key results in this chapter are proved using the SDP duality machinery of Chapter 4.
6.1 Separable Operators
The set of separable operators is \(\operatorname{Sep}(\mathcal{X}_A:\mathcal{X}_B)\). The separable density operators are \(\operatorname{SepD}(\mathcal{X}_A:\mathcal{X}_B) = \operatorname{Sep}\cap D(\mathcal{X}_A\otimes\mathcal{X}_B)\). Operators outside \(\operatorname{Sep}\) are entangled.
6.2 The Horodecki Criterion and Entanglement Witnesses
Before developing the algebraic theory of separability, it is worth understanding the operational test that motivated the field: the CHSH inequality.
Quantum mechanics allows violation up to the Tsirelson bound \(2\sqrt{2}\).
If \(\langle B_0+B_1\rangle_Q = \beta_+\) and \(\langle B_0-B_1\rangle_Q = \beta_-\), then \(|\beta_+| + |\beta_-| \leq 2\) (since \(|\langle B_j\rangle_Q| \leq 1\)). Thus the expression is \(\leq |\langle A_0\rangle|\,|\beta_+| + |\langle A_1\rangle|\,|\beta_-| \leq |\beta_+|+|\beta_-| \leq 2\). For separable mixtures, convexity preserves the bound. \(\square\)
This violates the CHSH inequality, proving that no separable state could produce these correlations. The value \(2\sqrt{2}\) is exactly the Tsirelson bound — the maximum achievable by any quantum state.
Deciding separability is NP-hard in general (Gurvits 2003). However, necessary conditions give powerful tests that suffice to detect most entangled states in practice.
The separating hyperplane theorem applied to the convex set \(\operatorname{Sep}\) gives the abstract existence of a witness for every entangled state. The constructive challenge is to find explicit witnesses that are practically measurable in the lab. An entanglement witness \(H\) must be measurable (i.e., expressible as a sum of local observables \(H = \sum_i a_i\otimes b_i\)) and satisfy \(\langle H,P\otimes Q\rangle \geq 0\) for all products \(P,Q\geq 0\).
A common construction: given a linear map \(\Lambda\) that is positive but not CP, define \(H = (I\otimes\Lambda^*)(\Pi^+)\) where \(\Pi^+ = |\beta\rangle\langle\beta|\) is the maximally entangled projector. Then \(\langle H,\rho\rangle = \langle\Pi^+,(I\otimes\Lambda)\rho\rangle = F(\Lambda(\rho)) \geq 0\) for separable \(\rho\) (since \(\Lambda\) is positive and \(\rho^A\geq 0\) for product states), but \(\langle H,|\beta\rangle\langle\beta|\rangle = \operatorname{Tr}(\Pi^+ (I\otimes\Lambda)(|\beta\rangle\langle\beta|)) < 0\) when \(\Lambda\) is not CP.
the maximally entangled Bell state with \(|\beta_{00}\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle+|11\rangle)\). The partial transpose acts as \((T\otimes I)\rho\): it transposes the \(A\) subsystem’s matrix elements, which exchanges \(|01\rangle\langle 10|\) and \(|10\rangle\langle 01|\):
\[ (T\otimes I)\rho = \begin{pmatrix}1/2 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 1/2\end{pmatrix}. \]The eigenvalues of this matrix are \(1/2, 1/2, 1/2, -1/2\). The negative eigenvalue \(-1/2\) proves the Bell state is entangled (not PPT), hence not separable. The optimal entanglement witness is the eigenvector of \((T\otimes I)\rho\) with the negative eigenvalue: \(H = |\psi^-\rangle\langle\psi^-|\) where \(|\psi^-\rangle = \tfrac{1}{\sqrt{2}}(|01\rangle-|10\rangle)\).
where \(\mathbb{F} = \sum_{i,j}|ij\rangle\langle ji|\) is the swap operator. Werner states are invariant under \((U\otimes U)\rho(U\otimes U)^*\) for any unitary \(U\). For \(\alpha > 0\), \(W_\alpha\) is entangled; for \(\alpha \leq 0\), \(W_\alpha\) is separable. For \(\alpha \in (0,1/d]\), \(W_\alpha\) is PPT and entangled (bound entangled in dimension \(d\geq 3\)); for \(\alpha > 1/d\), \(W_\alpha\) is NPT.
6.3 The Ball of Separable States
Theorem 6.3 says there is a Frobenius-norm ball of radius 1 around the identity operator \(I\) that lies entirely within \(\operatorname{Sep}\), after normalisation. This gives a concrete sufficient condition for separability and is used in the theory of quantum error correction (highly mixed states near the maximally mixed state are separable).
6.3b SDP Relaxations of Separability
Since exact separability testing is NP-hard, one uses SDP-based relaxations. The basic relaxation is the PPT test (Def. 6.3), which is an SDP feasibility check. More powerful is the DPS (Doherty-Parrilo-Spedalieri) hierarchy of SDP relaxations.
The DPS hierarchy satisfies \(\operatorname{Sep} \subset \cdots \subset \operatorname{Sep}^{(k+1)} \subset \operatorname{Sep}^{(k)} \subset \cdots \subset \operatorname{Sep}^{(1)} = \operatorname{PPT}\). The remarkable result (Brandão-Harrow 2013) is that for any \(\varepsilon > 0\), there exists \(k = O(\log(d)/\varepsilon^2)\) such that \(\operatorname{Sep}^{(k)} \subseteq \{\rho : \min_{\sigma\in\operatorname{Sep}}\tfrac{1}{2}\|\rho-\sigma\|_1 \leq \varepsilon\}\). This gives an efficient algorithm for approximating separability to within trace-norm precision, establishing that the separability problem has a quasipolynomial-time algorithm in fixed precision.
6.4 Separable Maps and Schmidt Rank
- \(\Phi\) has product Kraus operators: \(\Phi(M) = \sum_j(A_j\otimes B_j)M(A_j\otimes B_j)^*\).
- \(J(\Phi) \in \operatorname{Sep}(\mathcal{Y}_A\mathcal{X}_A:\mathcal{Y}_B\mathcal{X}_B)\).
- \(\Phi\) decomposes as a sum of product channels.
6.5 LOCC
The chain of inclusions is: \(\text{LO} \subset \text{LOCC}_1 \subset \text{LOCC}_N \subset \text{LOCC}^\infty \subset \text{LOCC} \subsetneq \text{SEP} \subset \text{PPT-preserving}\).
Protocol: Alice measures her qubit in the computational basis \(\{|0\rangle,|1\rangle\}\) and announces the outcome \(a\in\{0,1\}\). If the state is \(|\beta_{00}\rangle\): outcome \(a=0\) leaves Bob in \(|0\rangle\) (probability 1/2), and outcome \(a=1\) leaves Bob in \(|1\rangle\) (probability 1/2). If the state is \(|\beta_{01}\rangle\): outcome \(a=0\) leaves Bob in \(|1\rangle\), and outcome \(a=1\) leaves Bob in \(|0\rangle\). After Alice’s announcement: Bob sees \(|a\rangle\) iff state was \(|\beta_{00}\rangle\), and \(|1-a\rangle\) iff state was \(|\beta_{01}\rangle\). Bob measures his qubit: if his outcome agrees with \(a\), they have \(|\beta_{00}\rangle\); if it disagrees, they have \(|\beta_{01}\rangle\). This 1-round LOCC protocol succeeds with probability 1. It requires only 1 cbit of classical communication.
Contrast with 4 Bell states: Walgate-Hardy-Short-Vedral (2000) showed that any two orthogonal bipartite pure states can be perfectly discriminated by LOCC. But all four Bell states \(\{|\beta_{00}\rangle,|\beta_{01}\rangle,|\beta_{10}\rangle,|\beta_{11}\rangle\}\) cannot be discriminated perfectly by any LOCC protocol. This is the archetypal example of a separable operation outside LOCC.
(where \(|a-b\rangle = |a\rangle-|b\rangle\) unnormalised) together with 4 more form a basis with no additional product vector orthogonal to all of them. Hence the state \(\rho = \tfrac{1}{4}(I-\sum_{i=1}^5|u_i\rangle\langle u_i|)\) is PPT but entangled.
6.6 Nielsen’s Theorem
That is, \(|y\rangle\) is more entangled than \(|x\rangle\) iff the Schmidt coefficients of \(|y\rangle\) are more spread out (majorised by those of \(|x\rangle\)).
Explicit protocol: The doubly stochastic matrix mapping \(p\to q\) is \(\Lambda = \begin{pmatrix}1/2&1/2\\1/2&1/2\end{pmatrix}\), the maximally mixing matrix. By Birkhoff’s theorem: \(\Lambda = \tfrac{1}{2}P_{\text{id}} + \tfrac{1}{2}P_{\text{swap}}\).
Step 1 — Alice’s measurement: Alice measures in the eigenbasis \(\{|0\rangle_A,|1\rangle_A\}\) of \(\rho^A_x = \tfrac{3}{4}|0\rangle\langle 0|+\tfrac{1}{4}|1\rangle\langle 1|\):
- Outcome \(a=0\) (prob 3/4): state collapses to \(|00\rangle_B = |0\rangle_A|0\rangle_B\).
- Outcome \(a=1\) (prob 1/4): state collapses to \(|11\rangle_B = |1\rangle_A|1\rangle_B\).
Step 2 — Alice communicates \(a\): Bob receives the bit \(a\).
Step 3 — Bob’s correction: Bob applies a random unitary from the set \(\{I, X\}\) each with probability 1/2 (drawn independently of \(a\)):
- If he applies \(I\): state is \(|a\rangle_A|a\rangle_B\).
- If he applies \(X\): state is \(|a\rangle_A|(1-a)\rangle_B\).
6.7 Entanglement Measures
- Entanglement cost: \(E_c(\rho) = \inf\{r : \exists\text{ LOCC protocol converting ebit}^{\otimes nr}\to\rho^{\otimes n}\text{ for large }n\}\).
- Distillable entanglement: \(E_d(\rho) = \sup\{r : \exists\text{ LOCC protocol converting }\rho^{\otimes n}\to\text{ebit}^{\otimes nr}\text{ for large }n\}\).
where \(h\) is the binary entropy. For \(\alpha = 1\) (Bell state), \(E_f = 1\) ebit. For \(\alpha \leq 1/3\) the state is separable and \(E_f = 0\).
6.8 PPT States and Bound Entanglement
- PPT: because no product state is orthogonal to all five UPB vectors, \((T\otimes I)(\rho)\geq 0\).
- Entangled: the projector onto the orthogonal complement of the UPB entangles with any purification.
- Bound entangled: PPT implies \(E_d = 0\), but the entanglement cannot be distilled.
The concept of activation of bound entanglement (Brandão-Plenio 2008) shows that bound entangled states, despite having \(E_d = 0\), can still serve as a resource when used alongside other entangled states. In particular, a bound entangled state tensored with a small amount of free entanglement can sometimes yield more distillable entanglement than the free entanglement alone — “activating” the latent entanglement in the bound state. This demonstrates that the bound-free distinction is not a sharp resource boundary, and understanding when and how much bound entanglement can be activated is an active area of research. The Rains bound \(E_d(\rho) \leq R(\rho) = \min_{S:\,(T\otimes I)(S)\geq 0}S(\rho\|S)\) provides an SDP-computable upper bound on distillable entanglement and unifies several earlier results.
For qubit-qubit states \(\rho\) that are Bell-diagonal, \(R(\rho)\) can be computed explicitly using the structure of the PPT set in \(2\times 2\) systems. The Rains bound is strictly tighter than the logarithmic negativity \(E_N(\rho) = \log\|(T\otimes I)(\rho)\|_1\) for some states, but is harder to compute.
The partial transpose of the Bell state has eigenvalues \(+1/2,+1/2,+1/2,-1/2\). So \((T\otimes I)(W_\alpha)\) has eigenvalues \(\tfrac{1-\alpha}{4}+\tfrac{\alpha}{2} = \tfrac{1+\alpha}{4}\) (thrice) and \(\tfrac{1-\alpha}{4}-\tfrac{\alpha}{2} = \tfrac{1-3\alpha}{4}\). For \(\alpha > 1/3\) the last eigenvalue is negative. The trace norm is \(\|(T\otimes I)(W_\alpha)\|_1 = 3\cdot\tfrac{1+\alpha}{4} + |\tfrac{1-3\alpha}{4}| = \tfrac{3+3\alpha}{4}+\tfrac{3\alpha-1}{4} = \tfrac{2+6\alpha}{4} = \tfrac{1+3\alpha}{2}\) for \(\alpha > 1/3\). Thus the logarithmic negativity is
\[ E_N(W_\alpha) = \log\tfrac{1+3\alpha}{2}. \]For \(\alpha = 1\) (Bell state): \(E_N = \log 2 = 1\) ebit. For \(\alpha = 1/3\) (separability threshold): \(E_N = \log 1 = 0\). The logarithmic negativity continuously interpolates between 0 (separable) and 1 ebit (maximally entangled) for Werner states.
6.9 Monogamy of Entanglement
One of the most counterintuitive features of quantum entanglement is that it is monogamous: if Alice and Bob share a maximally entangled state, then Alice cannot be entangled with any third party Charlie. This contrasts sharply with classical correlations, which can be shared freely among many parties.
where \(\tau(\rho) = C(\rho)^2\) is the tangle (square of the concurrence \(C\)). The concurrence of a two-qubit state \(\rho\) is \(C(\rho) = \max(0, \lambda_1 - \lambda_2 - \lambda_3 - \lambda_4)\) where \(\lambda_1\geq\lambda_2\geq\lambda_3\geq\lambda_4\) are the square roots of the eigenvalues of \(\rho(Y\otimes Y)\bar\rho(Y\otimes Y)\).
The CKW inequality is the quantitative version of monogamy: Alice’s entanglement with Bob plus Alice’s entanglement with Charlie cannot exceed Alice’s entanglement with the joint system \(BC\). The slack \(\tau(A:BC) - \tau(A:B) - \tau(A:C)\) is called the residual tangle or 3-tangle, a genuine tripartite entanglement measure.
Chapter 7: Quantum Channel Capacities
The capacity theory for quantum channels is the quantum analogue of Shannon’s channel coding theorems, but far richer due to the multiplicity of resources (classical bits, qubits, entanglement) that can be transmitted and consumed.
7.1 Classical Capacity
The regularisation over \(n\) in the HSW theorem is necessary in general (Hastings 2008 proved that \(\chi\) is not additive), which makes \(C(\Phi)\) uncomputable. However, for degradable channels (channels \(\Phi\) for which there exists a map \(\mathcal{N}\) with \(\Phi_E = \mathcal{N}\circ\Phi\), where \(\Phi_E\) is the complementary channel), the Holevo information is additive and \(C = \chi(\Phi)\).
The relationship between the HSW capacity and the classical capacity of the quantum measurement channel is important. If one is restricted to product input states (no entanglement between uses), the maximum rate is \(\chi^{(1)}(\Phi) = \max_{\{p_k,\rho_k\}}\chi(\Phi(\mathcal{E}))\). The full classical capacity may be larger because entangled codewords across multiple uses can increase the Holevo information. Concretely: for the qubit depolarising channel \(\Omega_p\), entangled codewords provide no advantage (\(\chi\) is additive) and \(C(\Omega_p) = 1 - h(p) - p\log 3\). For the qubit erasure channel, no entanglement advantage either. The Hastings channel requires a specific carefully constructed channel where entanglement increases the Holevo information by a constant factor.
The dephasing channel contracts \(r_x\) and \(r_y\) by factor \((1-2p)\) but leaves \(r_z\) intact. The optimal encoding uses the computational basis states \(\{\rho_0 = |0\rangle\langle 0|, \rho_1 = |1\rangle\langle 1|\}\) with equal priors. Then \(\bar\rho = I/2\), \(S(\Delta_p(I/2)) = 1\), and \(S(\Delta_p(|0\rangle\langle 0|)) = S(\Delta_p(|1\rangle\langle 1|)) = 0\) (the computational basis is preserved). So \(\chi(\Delta_p) = 1\) bit regardless of \(p\)! The dephasing channel has full classical capacity — it only destroys off-diagonal quantum coherence, not the classical information in the diagonal.
7.2 Quantum Capacity and the LSD Theorem
The single-letter lower bound is the coherent information \(I_c(\rho,\Phi) = S(\Phi(\rho)) - S_e(\rho,\Phi)\), where \(S_e\) is the entropy exchange (equal to \(S(AE)_{\rho^{AE}}\)). SSA implies \(I_c \leq S(\Phi(\rho)) \leq \log\dim\mathcal{Y}\), and the LSD theorem says \(Q \geq \max_\rho I_c(\rho,\Phi)\), with equality for degradable channels.
7.3 Entanglement-Assisted Capacity
where \(I(A:B) = S(A) + S(B) - S(AB)\) is the quantum mutual information, and the maximisation is over input states. This is the fully quantum version of Shannon’s noisy channel theorem. Unlike \(C(\Phi)\) and \(Q(\Phi)\), \(C_E\) is always given by a single-letter formula with no regularisation needed.
- Teleportation: \(1Q + 1E \geq 2C\)
- Superdense coding: \(1Q + 1E \leq 2C\) (using entanglement to double classical capacity)
- Quantum capacity lower bound: \(C_E \geq C + Q\)
- All these equalities hold for specific channels (e.g., identity channel: \(C = Q = 1\), \(C_E = 2\))
where \(\Phi_E\) is the complementary channel. For the maximally mixed input \(\rho = I/2\): \(S(\Omega_p(I/2)) = 1\) (maximally mixed output since \(\Omega_p(I/2) = I/2\)). The environment after the Stinespring dilation of the depolarising channel has \(S_e(I/2, \Omega_p) = h(p) + (1-p)\log 3\) (the entropy of the error syndrome). So \(I(A:B) = 1 - h(p) - (1-p)\log 3\), which for \(p=0\) gives 2 bits (pure channel: \(C_E = 2\)), and for \(p=3/4\) gives 0 (completely depolarising). The entanglement-assisted capacity \(C_E = \max_\rho I(A:B)_\rho\) equals \(2 - 2h(p/2)\) by a more careful optimisation.
7.4 Private Classical Capacity
Classical information can be transmitted through a quantum channel not only reliably, but also privately — so that an eavesdropper (who holds the channel environment) obtains negligible information. This is the quantum-mechanical basis for quantum key distribution.
where \(\chi\) is the Holevo information and \(\hat\Phi\) is the complementary channel. The single-letter lower bound is the coherent information, showing that \(P(\Phi) \geq Q(\Phi)\).
The inequality \(P \geq Q\) has a beautiful physical interpretation: any rate at which quantum information can be reliably transmitted implies the same rate of private classical communication (encode the classical message in a qubit state). The deeper result is that \(P = Q\) for many natural channels — the quantum capacity of transmitting qubits is exactly the same as the private classical capacity. This connection was made precise by the Devetak-Shor (2005) equivalence: private codes can be converted to quantum codes and vice versa.
For \(p < 1/2\): private capacity equals quantum capacity \(Q = 1-2p\). For \(p \geq 1/2\): \(P = Q = 0\). The erasure channel achieves perfect privacy when \(p = 0\) (identity channel: eavesdropper gets nothing) and zero privacy for \(p > 1/2\) (eavesdropper has more information than the receiver).
The inequalities are in general strict. The channel may have \(C > 0\) but \(Q = 0\) (e.g., a classical replacement channel), or \(P < C\) (e.g., a channel with entanglement-breaking noise structure). The equality \(P = Q\) holds for degradable channels, and \(C = C_E/2\) holds when the channel is the identity (saturating the teleportation/superdense coding balance). Understanding all gaps between these capacities is an active research area.
7.5 Quantum Key Distribution and the BB84 Protocol
Quantum key distribution (QKD) is the application of quantum information theory to cryptography. It allows two parties to establish a shared secret key whose security is guaranteed by the laws of physics, not computational hardness assumptions.
- Correctness: Alice and Bob's keys agree with high probability.
- Security: The eavesdropper Eve has negligible mutual information about \(K\), even given control over the quantum channel.
- Preparation: Alice chooses random bits \(b_i\in\{0,1\}\) and random bases \(\theta_i\in\{Z,X\}\). She sends qubits: if \(\theta_i = Z\): send \(|b_i\rangle\); if \(\theta_i = X\): send \(|b_i\rangle_X = H|b_i\rangle\) where \(H\) is the Hadamard gate.
- Measurement: Bob measures each qubit in a random basis \(\theta'_i\in\{Z,X\}\).
- Sifting: Alice and Bob publicly announce their basis choices \(\theta_i, \theta'_i\) and keep only those bits where \(\theta_i = \theta'_i\) (half the bits on average). These form the sifted key.
- Error estimation: They sacrifice a random fraction of the sifted key to estimate the quantum bit error rate (QBER). If QBER exceeds a threshold \(\sim 11\%\), they abort.
- Classical post-processing: Error correction (information reconciliation) and privacy amplification produce the final key.
Security argument: Any eavesdropper who measures in the wrong basis disturbs the qubit, contributing to the QBER. The privacy amplification step extracts a shorter key from which Eve’s information is exponentially small. The security proof uses the BB84 key rate formula (Shor-Preskill 2000):
\[ \ell \geq n(1 - h(e_Z) - h(e_X)), \]where \(\ell\) is the secure key length, \(n\) is the sifted key length, \(e_Z\) is the bit error rate, and \(e_X\) is the phase error rate. This formula connects directly to the quantum capacity: the key rate equals the coherent information \(I_c\) of the depolarising channel with the measured error rates.
Chapter 8: Quantum Error Correction
Quantum error correction (QEC) is the theory of protecting quantum information against noise. The key insight — that quantum information can be encoded in larger Hilbert spaces in ways that allow errors to be detected and corrected without disturbing the encoded information — is the foundation of fault-tolerant quantum computation.
The existence of quantum error correction is perhaps the most surprising fact in quantum information theory: despite the no-cloning theorem (Theorem 2.4) and the no-broadcasting theorem for mixed states, it is still possible to protect quantum information against a restricted class of errors. The resolution of this apparent contradiction is that QEC does not copy the quantum information — it spreads it across many qubits in a highly entangled way, so that no single qubit carries enough information for the no-cloning argument to apply. The Knill-Laflamme conditions (Definition 8.2) make this precise: the error must leave no information about the logical codeword in the error syndrome.
The entropy connection is deep: correcting errors is dual to erasing entropy, and the channel capacity framework of Chapter 7 and the QEC framework of this chapter are two sides of the same coin — the quantum capacity \(Q(\Phi)\) equals the maximum code rate for which QEC is possible through \(\Phi\).
8.1 The Error Correction Framework
for some Hermitian matrix \(c = (c_{ij})\). That is, errors map \(\mathcal{C}\) to orthogonal subspaces (when \(c\) is diagonal) and leave no information about the logical codeword in the error syndrome.
8.2 Entropy and Error Correction
The connection between quantum error correction and quantum entropy is deep: the coherent information \(I_c\) is exactly the rate at which information survives a noisy channel, and the Knill-Laflamme conditions can be rephrased in terms of conditional entropy. Specifically: a code \(\mathcal{C}\) is exactly correctable for \(\Phi\) if and only if \(S(A|B)_\omega = -S(A)_\omega\) where \(\omega^{AB}\) is the state after sending half of the maximally entangled state on \(\mathcal{C}\otimes\mathcal{C}\) through \(\Phi\). This entropy formulation is equivalent to the Knill-Laflamme conditions and connects error correction directly to the negative conditional entropy criterion for entanglement (§5.3). Specifically, \(\{E_j\}\) are correctable for \(\mathcal{C}\) if and only if the “error syndrome” (which Kraus operator applied) is statistically independent of the logical information in \(\mathcal{C}\). This is the quantum analogue of the classical separation between parity check bits and message bits in linear codes.
It encodes 1 logical qubit into 5 physical qubits with distance 3, achieving the quantum Singleton bound \(5-1 = 4 = 2(3-1)\). The code is optimal: no \(((4,1,3))\) code exists. Its logical operators are \(\bar{X} = XXXXX\) and \(\bar{Z} = ZZZZZ\) (up to stabilisers).
8.3 CSS Codes and Fault Tolerance
The most practically important class of stabiliser codes are the CSS (Calderbank-Shor-Steane) codes, built from two classical linear codes with a nested structure. Their special form separates bit-flip and phase-flip error correction, making fault-tolerant gate implementation much simpler.
- \(X\)-type: \(X^{h}\) for each row \(h\) of \(H_2^T\) (equivalently, for each codeword in \(C_2^\perp\)).
- \(Z\)-type: \(Z^{h}\) for each row \(h\) of \(H_1\) (checks for \(C_1^\perp\)-errors).
The threshold theorem is the foundational result of fault-tolerant quantum computation. It says that if hardware errors are below a threshold (roughly \(10^{-3}\) to \(10^{-2}\) for modern codes), quantum computations of arbitrary length can be performed reliably by using quantum error correction with only a polylogarithmic overhead. The exponential suppression \((p/p_{\text{th}})^{2^L}\) comes from concatenation: at each level of the code hierarchy, the logical error rate squares, driving it towards zero.
- Only nearest-neighbour gates are needed (2D layout is hardware-friendly).
- The high threshold (\(\sim 1\%\)) is achievable with current superconducting qubit technology.
- Error syndromes can be measured repeatedly without collapsing the logical qubit.
8.4 Approximate Quantum Error Correction and the Knill-Laflamme Conditions Revisited
The exact Knill-Laflamme conditions require the error syndrome to carry zero information about the logical codeword. In practice, this is often relaxed to approximate error correction — tolerating a small fidelity loss.
for all \(|u\rangle,|v\rangle \in \mathcal{C}\). The Petz recovery map \(\mathcal{P}_{\sigma,\Phi}\) (with \(\sigma = P_\mathcal{C}/\dim\mathcal{C}\) the code-subspace maximally mixed state) provides the canonical approximate recovery.
With concatenated QEC, one can reduce this to \(\sim (c\theta^2)^{2^L}\) after \(L\) levels, exponentially suppressing the small-angle rotation error without needing to correct it exactly at the physical level.
Key Formulas Summary
Linear algebra:
- Vec tensor identity: \((A\otimes B)\operatorname{vec}(C) = \operatorname{vec}(ACB^T)\)
- Trace norm variational formula: \(\|A\|_1 = \max_{U\in\mathcal{U}}\operatorname{Re}\operatorname{Tr}(A^*U)\)
- Schatten duality: \(\|A\|_1 = \max\{\langle A,B\rangle : \|B\|_\infty\leq 1\}\)
Quantum states and channels:
- Choi isomorphism: \(\Phi\) is CP iff \(J(\Phi)\geq 0\); TP iff \(\operatorname{Tr}_\mathcal{Y}(J(\Phi)) = I_\mathcal{X}\)
- Kraus via Choi: \(J(\Phi) = \sum_k\operatorname{vec}(A_k)\operatorname{vec}(A_k)^*\)
- Heisenberg picture: \(\operatorname{Tr}(B\Phi(\rho)) = \operatorname{Tr}(\Phi^*(B)\rho)\)
Distance measures:
- Trace distance: \(\tfrac{1}{2}\|\rho-\sigma\|_1 = \max_{0\leq M\leq I}\operatorname{Tr}(M(\rho-\sigma))\)
- Fidelity SDP: \(F(P,Q) = \sup\{\tfrac{1}{2}\operatorname{Tr}(X)+\tfrac{1}{2}\operatorname{Tr}(X^*) : \begin{bmatrix}P&X\\X^*&Q\end{bmatrix}\geq 0\}\)
- Uhlmann's theorem: \(F(P,Q) = \max_{\text{purifications }|v\rangle\text{ of }Q}|\langle u|v\rangle|\)
- Alberti's theorem: \(F(P,Q)^2 = \inf_{Y>0}\langle P,Y^{-1}\rangle\langle Q,Y\rangle\)
- Fuchs-van de Graaf: \(1-\tfrac{1}{2}\|\rho-\sigma\|_1 \leq F(\rho,\sigma) \leq \sqrt{1-\tfrac{1}{4}\|\rho-\sigma\|_1^2}\)
Entropy and information:
- Von Neumann entropy: \(S(\rho) = -\operatorname{Tr}(\rho\log\rho)\)
- Subadditivity: \(S(AB)\leq S(A)+S(B)\); Strong subadditivity: \(S(XYZ)+S(Z)\leq S(XZ)+S(YZ)\)
- Araki-Lieb: \(|S(A)-S(B)|\leq S(AB)\)
- Klein's inequality: \(S(\rho\|\sigma) = \operatorname{Tr}[\rho(\log\rho-\log\sigma)]\geq 0\)
- Data processing: \(S(\Phi(\rho)\|\Phi(\sigma))\leq S(\rho\|\sigma)\)
- Holevo information: \(\chi(\mathcal{E}) = S(\sum_xp_x\rho_x) - \sum_xp_xS(\rho_x)\geq I_{\text{acc}}(\mathcal{E})\)
SDP duality:
- Weak duality: primal \(d\leq\beta\) dual always holds
- Slater's condition: strict feasibility (primal or dual) closes the gap and guarantees attainment
- Complementary slackness: at optimality with \(d=\beta\), \(\langle X,S\rangle = 0\), i.e., \(XS=0\)
- Diamond norm SDP: \(\|\Phi\|_\diamond = \max_{\rho\in D(\mathcal{X}\otimes\mathcal{X})}\|(\Phi\otimes I)(\rho)\|_1\)
Channel capacities:
- Classical capacity (HSW): \(C = \lim_n\tfrac{1}{n}\chi(\Phi^{\otimes n})\)
- Quantum capacity (LSD): \(Q = \lim_n\tfrac{1}{n}\max_\rho I_c(\rho,\Phi^{\otimes n})\)
- Entanglement-assisted: \(C_E = \max_\rho I(A:B)_\rho\) (single-letter, no regularisation)
- Hierarchy: \(Q\leq P\leq C\leq C_E\)
Entanglement:
- Separability: \(\rho\in\operatorname{Sep}\) iff \((\Phi\otimes I)(\rho)\geq 0\) for all positive maps \(\Phi\) (Horodecki)
- PPT is necessary for separability; sufficient in \(2\otimes 2\) and \(2\otimes 3\)
- Nielsen's theorem: LOCC maps \(|x\rangle\to|y\rangle\) iff \(\lambda(x)\prec\lambda(y)\) (majorisation)
- Monogamy (CKW): \(\tau(A:B)+\tau(A:C)\leq\tau(A:BC)\) for three-qubit states
Quantum error correction:
- Knill-Laflamme: errors \(\{E_j\}\) correctable iff \(\langle u|E_i^*E_j|v\rangle = c_{ij}\langle u|v\rangle\) for all \(|u\rangle,|v\rangle\in\mathcal{C}\)
- Quantum Singleton bound: \(n-k\geq 2(d-1)\)
- Fault-tolerance threshold: below \(p_{\text{th}}\approx 10^{-2}\), logical error rate \(\to 0\) exponentially with concatenation depth