CS 458/658: Computer Security and Privacy
N. Asokan, Miti Mazmudar
Estimated study time: 52 minutes
Table of contents
Note: This document covers Modules 1 and 2 only. Modules 3–7 (OS Security, Network Security, Crypto & Internet Security, Database Security, Policy Issues) are not included in this export.
Module 1: Introduction to Computer Security and Privacy
What Is Our Goal?
The primary goal of a course in computer security and privacy is to develop the ability to identify security and privacy issues as they arise across the full spectrum of computing: in the software programs themselves, in the operating systems that host those programs, in the computer networks over which they communicate, in Internet applications (which are instances of distributed systems), and in databases that store the data those applications use. A secondary goal is to take that identification ability and apply it constructively — to design systems that are more protective of security and privacy from the outset.
What Is Security?
In the context of computers, security is traditionally defined by three properties, often remembered by the acronym CIA:
- Confidentiality — sensitive information is not disclosed to anyone who is not authorized to learn it.
- Integrity — data and systems remain in their intended state; what you receive is what was actually sent, unmodified.
- Availability — data and systems are accessible when needed; denial-of-service attacks that prevent users from reaching a web service violate availability.
A computing system is said to be secure when it satisfies all three properties simultaneously, though in practice availability is the hardest to guarantee in the face of accidental or malicious failures. Notably, availability violations are often immediately apparent to users — a website that is down under a denial-of-service attack is unmistakably broken — whereas confidentiality and integrity violations may go undetected for long periods.
Security bears a close relationship to reliability as understood in engineering. A secure system is one that you can rely on to keep sensitive data confidential, to permit only authorized access or modifications to resources, to ensure the correctness of computed results, and to be available whenever you need it. These properties map directly onto the CIA triad.
What Is Privacy?
Privacy has many definitions, but a useful one frames it as informational self-determination: you, as the subject of information about yourself, have the right to control how that information is accessed and used. “Control” encompasses who gets to see your data, who gets to use it, what they can use it for, and who they can pass it on to.
Privacy is related to confidentiality but is not identical to it. Confidentiality is a technical property of a system — information is or is not accessible to a given party. Privacy is a broader concept that encompasses consent, purpose limitation, and the individual’s agency over their own information.
PIPEDA and Canadian Privacy Law
PIPEDA (the Personal Information Protection and Electronic Documents Act) is Canada’s private-sector privacy legislation and specifies ten Fair Information Principles that organizations must follow when handling personal data:
- Identify the purpose of data collection
- Obtain consent
- Limit collection
- Limit use, disclosure, and retention
- Use appropriate safeguards
- Give individuals access
- Be accurate
- Be open
- Be accountable
- Provide recourse
As of the Spring 2021 course, the Government of Canada had tabled Bill C-11, paving the way for the Consumer Privacy Protection Act (CPPA), which was expected to replace the personal-information-protection portion of PIPEDA. CPPA was designed to modernize the framework by incorporating ideas like meaningful consent (explanations in plain language), the right to erasure (allowing individuals to withdraw consent and require deletion of their data), stronger enforcement penalties, and a private right of action — granting individuals the ability to sue organizations that breach the act’s provisions. Internationally, the European GDPR (General Data Protection Regulation) provides a similar framework for any enterprise operating in the European Economic Area.
Other notable frameworks include Ontario’s PHIPA (Personal Health Information Protection Act) and the US HIPAA (Health Insurance Portability and Accountability Act), both of which govern security and privacy of health-related information.
Security Versus Privacy
A common misconception treats security and privacy as opposing forces — surveillance cameras improve physical security but invade privacy; contact tracing for infectious diseases is medically valuable but can be deeply privacy-invasive. The course argues that this tension is not inevitable. It is possible to design systems that achieve both properties simultaneously. Privacy-preserving contact tracing using Bluetooth (as implemented in Canada’s COVID Alert app and the Google/Apple Exposure Notification API) demonstrates that thoughtful design can satisfy both public-health and privacy requirements.
Similarly, authentication and access control can be designed with privacy in mind. A concert that authenticates attendees by verifying their legal identity at the gate learns everyone’s names — but the same admission problem can be solved by issuing anonymous tokens or QR codes that prove authorization without revealing identity.
Who Are the Adversaries?
To design effective defenses, you must first identify who you are trying to protect against. The course introduces the term adversary (used interchangeably with attacker) and distinguishes several classes:
- Murphy — not a sentient adversary but a stand-in for accidental and random failures. Murphy’s Axiom: anything that can go wrong, will go wrong.
- Amateurs — non-experts rattling doorknobs, probing systems out of curiosity or mischief.
- Script kiddies — non-experts who use pre-packaged attack tools (“scripts”) produced by expert crackers.
- Crackers — technical experts capable of breaking into systems and producing the tools that script kiddies use.
- Organized crime — well-funded, expert, profit-motivated actors.
- Government “cyberwarriors” / state actors — expert, well-funded, motivated by strategic national interests rather than profit. Revelations from figures like Edward Snowden demonstrated that state-level actors can be at least as large a threat as organized crime.
- Terrorists — rare but high-impact actors.
The key insight is captured in a single imperative that Prof. Asokan repeats throughout the course: learn to think like an attacker. You cannot defend a system you do not understand from an adversary’s perspective.
Assets, Vulnerabilities, Threats, Attacks, and Controls
These five terms form the core vocabulary of security analysis.
Assets are what you are trying to protect: hardware, software, and data — or anything of value to the system and its users.
Vulnerabilities are weaknesses in a system that an adversary could exploit to compromise an asset. A file server that does not authenticate access requests before granting them is a vulnerability.
Threats are the potential harms that could befall a system through exploitation of a vulnerability. The course identifies four major categories of threats:
| Category | CIA Property Violated | Example |
|---|---|---|
| Interception | Confidentiality | An eavesdropper reads network traffic |
| Interruption | Availability | A denial-of-service attack brings down a service |
| Modification | Integrity | An attacker alters a message in transit |
| Fabrication | Integrity | An attacker injects a forged message |
When analyzing or designing a system, you must articulate the threat model: which threats are you committing to defend against, and from which adversaries? “Whom do we want to prevent from doing what?” is the central question.
Attacks are the concrete actions by which an adversary exploits a vulnerability to realize a threat. Telling the file server you are a different user in order to read another user’s files is an attack against the authentication vulnerability.
Controls (or defenses) are technical or procedural measures that reduce or eliminate a vulnerability, thereby blocking the corresponding attack and preventing the threat from materializing. The goal of security engineering is to design controls that address the identified threats.
Methods of Defence
When a threat has been identified, there are five general strategies for dealing with it:
- Prevent — deploy technical means to stop the attack from happening entirely.
- Deter — make the attack harder or more expensive so the adversary loses the incentive to attempt it.
- Deflect — make yourself less attractive as a target; reduce the asset’s apparent value to the adversary.
- Detect — instrument the system to notice when an attack is occurring or has occurred, enabling an alarm or audit trail.
- Recover — arrange to mitigate the effects of a successful attack through backups, insurance, redundancy, or other means.
Good security practice typically deploys multiple strategies against any given threat, a philosophy known as defence in depth. Consider the threat of car theft: an immobilizer prevents the attack outright; a visible security sticker or parking in a secured facility deters or deflects; a car alarm detects; and insurance enables recovery. No single measure is sufficient, and the combination substantially raises the cost of a successful attack.
The Principle of Easiest Penetration
A system is only as strong as its weakest link. An attacker will never target the most heavily defended component when a less-protected path to the same asset exists. Defending against attacks on a database’s technical controls is futile if employees can be bribed or social-engineered. Security is not purely a technical discipline; it has legal, organizational, and psychological dimensions as well.
The Principle of Adequate Protection
Security is economics. It makes no sense to spend $100,000 protecting an asset that can cause at most $1,000 in damage. The cost of a control is not only monetary — requiring users to disconnect their computers from the Internet would dramatically improve security but at an intolerable cost to utility. Security must not harm utility; protection must be proportionate to the value of what is being protected.
Defence Mechanisms
Computer systems can be protected through five broad categories of mechanisms:
Cryptography
Cryptographic algorithms underpin many security controls. Encryption transforms data into an unreadable form for any party without the decryption key, providing confidentiality. Digital signatures authenticate the identity of a signer and ensure that signed data has not been tampered with. Message Authentication Codes (MACs) — essentially cryptographic checksums — verify the integrity of messages in transit, detecting any unauthorized modification. Cryptographic techniques also support privacy by allowing personal data to be stored in a form that automatically becomes unreadable after a specified retention period.
Software Controls
Passwords and other access-control mechanisms restrict who can use a system. Operating systems enforce isolation between users, preventing one user’s process from reading or modifying another’s data. Anomaly detection systems, including virus scanners and intrusion detection systems, watch for patterns that deviate from expected behavior. Development controls, such as code auditing and mandatory review processes, enforce quality standards on source code before it is deployed. Personal firewalls running on end-user machines provide another layer of software-based protection.
Hardware Controls
Fingerprint readers and face recognition systems provide biometric authentication, replacing or supplementing passwords with a factor derived from the user’s physical characteristics. Smart tokens — small hardware devices or smartphone apps that generate one-time codes — add a second factor to authentication, so that knowing a password is not sufficient on its own. Hardware firewalls and intrusion detection systems deployed at network perimeters inspect and filter traffic before it reaches internal systems. Trusted Execution Environments (TEEs), now widely deployed in smartphones and increasingly in laptop processors, allow sensitive computations to run in a hardened enclave isolated from the operating system and other applications.
Physical Controls
Hardware itself must be protected: locks on server rooms, security guards, and physical access controls prevent unauthorized parties from reaching the machines. Off-site backups ensure that data survives physical disasters. Common sense also applies: data centers should not be placed on earthquake fault lines, and nuclear facilities should not be sited in tsunami zones.
Policies and Procedures
Non-technical measures address classes of attack that technical controls cannot. Rules prohibiting employees from connecting personal Wi-Fi access points to corporate networks prevent accidental exposure. Password-choice policies enforce a baseline of credential strength. Training programs instill security awareness, reducing the risk of social engineering. The tension between strict security policies and operational flexibility (for example, the shift to “bring your own device” policies during the COVID-19 pandemic) is a recurring theme: policies must be balanced against practical needs.
Module 2: Program Security
Why Program Security Is Hard
Murphy’s Axiom — anything that can go wrong, will go wrong — applies with particular force to software. Programs have bugs; security-relevant programs have security bugs. The challenge of program security is to understand where these bugs come from, how attackers exploit them, and what controls can mitigate the risk.
Part 1 — Flaws, Faults, and Failures
A flaw is any defect in a program. A security flaw is a defect that affects one or more of the CIA properties. Flaws come in two distinct varieties:
- A fault is the underlying mistake — an error in the code, the data, the specification, or the development process. A fault is a potential problem, the programmer’s view of what went wrong “behind the scenes.”
- A failure is when something actually goes wrong from the user’s perspective — a deviation from desired or expected behavior. You log in to the library’s website and land in someone else’s account. The specification may itself be wrong, meaning a program can conform perfectly to its spec and still produce a failure.
The relationship is causal: faults can lead to failures, but many faults never manifest as user-observable failures.
Finding and Fixing Faults
When a user encounters a failure, the natural response is to work backwards to find the fault that caused it — this is debugging. But faults and failures can be far apart in a codebase; a login failure might originate in a memory allocation routine. More importantly, you should not wait for users to encounter failures in production. Testing — systematically exercising the system with different inputs to provoke potential failures — is the proactive approach. Security testing goes beyond functional testing by requiring the tester to think like an attacker: try the inputs that a hostile user would try, not just the well-formed inputs a cooperative user would provide. Fuzzing — submitting random or semi-random inputs to see if the program misbehaves — is one form of security-oriented testing.
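A minimal fuzz loop can be sketched in a few lines. Here `parse_len` is an invented stand-in target with a planted fault, and `fuzz` is a toy harness; neither name comes from the course materials.

```c
#include <stdlib.h>

/* Toy fuzzing target with a planted fault (names invented for this
   sketch): it returns a bogus negative length for inputs starting "ab". */
static int parse_len(const char *s, unsigned n) {
    if (n >= 2 && s[0] == 'a' && s[1] == 'b')
        return -1;              /* the planted fault */
    return (int)n;
}

/* Feed semi-random inputs and check an invariant after each call;
   return 1 as soon as a failure is provoked. */
int fuzz(unsigned trials) {
    char buf[8];
    srand(458);                 /* fixed seed for reproducibility */
    for (unsigned t = 0; t < trials; t++) {
        unsigned n = (unsigned)(rand() % (int)sizeof buf);
        for (unsigned i = 0; i < n; i++)
            buf[i] = (char)('a' + rand() % 4);
        if (parse_len(buf, n) < 0)
            return 1;           /* invariant violated: fault found */
    }
    return 0;
}
```

Random inputs find this fault quickly, but note that purely random fuzzing struggles with faults hidden behind rare input patterns; coverage-guided fuzzers address that.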
Once faults are found, they are fixed by patching — small edits to the program that address the identified problem. Microsoft’s “Patch Tuesday” is a well-known instance of regular patch releases. Unfortunately, patching has significant drawbacks. The time required to distribute and apply patches to all affected systems creates a window during which the vulnerability remains exposed. Worse, attackers who analyze a patch can infer the underlying fault and construct exploits that work against unpatched systems — hence the phenomenon of “Exploit Wednesdays.” Patches applied under time pressure can also introduce new faults or fix only the superficial symptom while leaving the underlying vulnerability intact.
Alternatives to ad-hoc patching include code reviews, red team exercises (where ethical hackers actively try to break the system), and formal verification — mathematically proving that the program satisfies its security specification. Formal proofs provide the strongest possible guarantee but are expensive, time-consuming, and only feasible for relatively simple programs or in domains where the stakes justify the effort. In practice, regression testing — re-running the full test suite after every patch — is the most common way to ensure that fixes do not break existing functionality.
Unexpected Behavior
A program’s specification describes what it should do; it typically says nothing about what the program must not do, and most implementors would not care if it also did additional things. From a security perspective, this is dangerous. A malicious implementation of the Unix ls command that lists directory contents as specified but also posts that list to a public website violates confidentiality. Or it might delete the files afterward, violating availability. Security evaluation must therefore ask not only “does the program do what it’s supposed to?” but also “does it do anything else?” Formal proofs that a program does exactly what its specification says and nothing more are the gold standard — and the hardest to achieve.
A Taxonomy of Flaws
Security flaws can be classified along several dimensions:
| Dimension | Categories |
|---|---|
| Origin | Intentional vs. Unintentional |
| Intent | Malicious vs. Non-malicious |
| Scope | General vs. Targeted |
Unintentional flaws — the largest category — are mistakes the developer made without hostile intent: buffer overflows, integer overflows, format string errors, incomplete input validation, and race conditions. Intentional non-malicious flaws are deliberate design choices that can nonetheless be exploited: side channels and debugging backdoors that were never removed before shipping. Intentional malicious flaws are inserted by attackers and subdivide into general (affecting all instances: intentional backdoors, viruses, worms) and targeted (activating only under specific conditions: Trojan horses, logic bombs, keyloggers).
Part 2 — Unintentional Security Flaws
Opening Examples
Two real-world bugs illustrate the breadth of unintentional flaws before the detailed taxonomy begins.
Heartbleed (2014) was a bug in OpenSSL, the widely-used open-source implementation of TLS (Transport Layer Security). TLS connections are expensive to set up, so TLS includes a heartbeat mechanism: one peer sends some random data along with a declared payload length, and the other echoes the same data back. The Heartbleed bug arose because the server failed to verify that the declared payload length matched the actual length of the data. A malicious client could declare a length of 64 KB while sending only a few bytes; the server would then copy up to 64 KB of its own memory — including data from other clients’ sessions and the server’s private cryptographic keys — back to the attacker. The fix was a single bounds check.
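The essence of that single bounds check can be sketched as follows. This is a simplified illustration, not the actual OpenSSL code, and the function name is invented:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the Heartbleed pattern: the server must not trust the
   client-declared payload length. Without the length check below, the
   memcpy would read declared_len bytes starting at payload, walking
   past the real payload into adjacent server memory (other sessions'
   data, private keys) and echoing it back to the attacker. */
unsigned char *build_heartbeat_response(const unsigned char *payload,
                                        size_t actual_len,
                                        size_t declared_len) {
    if (declared_len > actual_len)
        return NULL;                 /* the fix: discard malformed requests */

    unsigned char *resp = malloc(declared_len);
    if (resp != NULL)
        memcpy(resp, payload, declared_len);
    return resp;
}
```

A malicious heartbeat corresponds to calling this with, say, `actual_len == 3` and `declared_len == 65535`; with the check in place the request fails closed.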
Apple’s “goto fail” bug (2014) appeared in the TLS client code that verifies a server’s certificate. Due to a duplicated goto fail; statement, all certificate-verification checks after the second one were silently skipped — the function returned a success status without having actually verified the signature. An attacker performing a man-in-the-middle interception could therefore substitute their own (invalid) certificate and the client would accept it as genuine.
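The control-flow mistake is easy to reproduce in miniature. In this self-contained sketch the two stand-in checks play the role of Apple’s certificate-verification steps; the names are invented:

```c
/* Self-contained illustration of the "goto fail" pattern. */
static int check_ok(int v) { return v != 0; }

int verify_cert(int check1, int check2) {
    int err = 0;
    if ((err = check_ok(check1) ? 0 : -1) != 0)
        goto fail;
        goto fail;   /* duplicated line: always taken, with err still 0 */
    if ((err = check_ok(check2) ? 0 : -1) != 0)   /* never reached */
        goto fail;
fail:
    return err;      /* 0 means "verified" */
}
```

`verify_cert(1, 0)` returns 0 ("verified") even though the second check would have failed: the unindented second `goto fail` always executes, skipping every later check while `err` still holds the success value.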
Buffer Overflows
Buffer overflows are the single most commonly exploited class of security flaw, arising from the fact that C and C++ perform no automatic bounds checking on array accesses.
Memory Layout of a Process
To understand buffer overflows, one must first understand how a process’s memory is organized. From low addresses to high addresses:
- Text (program code, read-only)
- Data (initialized global variables and constants)
- BSS (uninitialized global data)
- Heap (dynamically allocated memory from malloc; grows upward)
- Stack (function call frames; starts at high addresses and grows downward)
The Stack Frame
Every function invocation creates a stack frame containing the function’s local variables, a saved copy of the caller’s frame pointer (also called the base pointer, rbp in x86-64), and the return address — the value the instruction pointer (rip) should have when the function returns to its caller. The call instruction automatically pushes the return address; the function prologue then saves the old frame pointer and adjusts the stack pointer to make room for local variables.
The critical layout point is: local buffers live on the stack below (at lower addresses than) the saved frame pointer and the return address. When a buffer grows toward higher addresses (as C arrays do), an overflow first clobbers other local variables, then the saved frame pointer, and finally the return address.
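The clobbering order can be made concrete with a deliberately simplified sketch: a struct stands in for a stack frame, with `saved` playing the role of the saved frame pointer or return address above the buffer. (A real exploit overruns the actual frame, which is undefined behavior and layout-dependent; this sketch keeps the write inside one object so the outcome is well-defined.)

```c
#include <string.h>

struct frame_sketch {
    char buf[8];            /* local buffer at the low end */
    unsigned int saved;     /* "return address" at the high end */
};

unsigned int overflow_demo(void) {
    struct frame_sketch f;
    f.saved = 0xAABBCCDD;   /* the legitimate value */
    /* 12 bytes copied into an 8-byte buffer: the last four bytes land
       on f.saved, just as an overrun clobbers the return address.
       (On common ABIs the struct is exactly 12 bytes.) */
    memcpy(&f, "AAAAAAAABBBB", 12);
    return f.saved;         /* now 0x42424242 ('B' == 0x42) */
}
```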
Exploiting a Buffer Overflow
The classic technique, described in Aleph One’s mandatory reading “Smashing the Stack for Fun and Profit” (1996), involves two steps: (1) place the attacker’s shellcode — typically a short sequence of machine instructions that spawn a command shell — inside the overflowed buffer, and (2) overwrite the return address with the address of the shellcode. When the function returns, control transfers to the shellcode rather than to the legitimate caller.
Practical obstacles include: eliminating null bytes from the shellcode (so that string-handling functions do not truncate it prematurely), and determining the correct address to write as the return address (which varies by machine and compiler). The paper discusses techniques for overcoming both.
If the attacker does not wish to inject new code, they can instead overwrite the return address to point to an existing region of program memory they want to execute — a technique called return-to-libc. When Data Execution Prevention (Windows) and W⊕X (Linux) emerged in the early 2000s to prevent execution of stack data, attackers responded with Return-Oriented Programming (ROP): chaining together short instruction sequences (“gadgets”) already present in the program’s code — each ending with a ret instruction — to perform arbitrary computation without ever needing to inject new executable code.
Defenses Against Buffer Overflows
Several layers of defense exist:
- Use memory-safe languages (Java, Rust, Python): bounds are checked automatically. This is not always feasible for legacy code or performance-critical systems.
- Stack canaries: the compiler inserts a random value (the “canary”) between the local variables and the saved return address. Before a function returns, it checks that the canary is intact; a modified canary signals an overflow. The name alludes to canaries in coal mines: small birds that would be affected by toxic gases before miners were, providing an early warning.
- Data Execution Prevention (DEP) / W⊕X: memory pages are marked either writable or executable, but not both. The stack is writable, so it cannot be executed; injecting shellcode onto the stack and jumping to it fails.
- Address Space Layout Randomization (ASLR): the loader places the stack, heap, and code at random addresses each time a process starts, making it difficult to write a reliable exploit with hardcoded addresses.
- ARM Pointer Authentication: the ARMv8.3 architecture allows a processor-held secret key to be used to compute a small cryptographic MAC over every pointer (including return addresses) before it is stored in memory. When the pointer is later read, the MAC is verified; an attacker who overwrites a return address cannot compute a valid MAC for the replacement value without knowing the key.
Integer Overflows
Integer overflows stem from the finite range of machine integer types. A 64-bit unsigned integer can represent values from 0 to 2⁶⁴ − 1; incrementing past the maximum wraps back to 0. A signed 8-bit integer spans −128 to 127; adding 1 to 127 yields −128 in two’s complement. Similar wrapping occurs with underflows and with type-narrowing casts (for example, assigning a 32-bit int to a signed char).
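These wrapping behaviors are easy to demonstrate. Note one C-specific subtlety: unsigned wraparound is well-defined, while signed overflow is undefined behavior, so the signed case below uses a narrowing conversion, whose result is implementation-defined but follows two's complement on all mainstream platforms.

```c
#include <limits.h>

unsigned char wrap_up(void) {
    unsigned char u = UCHAR_MAX;   /* 255 */
    u = (unsigned char)(u + 1);    /* wraps to 0: defined for unsigned */
    return u;
}

signed char narrow_255(void) {
    int big = 255;
    /* narrowing conversion: on two's-complement platforms, the low
       8 bits 0xFF are reinterpreted as -1 */
    return (signed char)big;
}
```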
Attackers exploit integer overflows in several ways. An overflow in a variable used as an array index causes the program to access a memory location far from the intended one, potentially reading or writing a sensitive location such as a return address. An overflow in a value passed to malloc() can result in allocating far less memory than intended, setting up a subsequent heap buffer overflow.
The mandatory reading “Basic Integer Overflows” and the code example in int-overflow-attack.c demonstrate a concrete case: a function printchar() that tries to keep callers away from the first two (secret) array elements by verifying that the supplied index is non-negative before applying an offset of 2, but performs the array access through a signed char. Passing 255 from a caller that holds an int passes the non-negative check and then triggers an implicit narrowing conversion: 255 interpreted as a signed 8-bit value is −1, which when offset by 2 addresses element 1 — exactly the protected secret element.
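The pattern described above might look roughly like this reconstruction; the actual file's names and layout may differ, and elements [0] and [1] stand in for the secret data:

```c
/* Rough reconstruction (not copied from int-overflow-attack.c). */
static char array[] = { 'S', '!', 'a', 'b', 'c' };  /* [0],[1] secret */

char printchar(int index) {
    if (index < 0)
        return '?';                       /* the guard: 255 passes */
    signed char i = (signed char)index;   /* narrowing: 255 becomes -1 */
    return array[i + 2];                  /* -1 + 2 == 1: a secret */
}
```

Callers are meant to reach only array[2] onward; printchar(255) instead yields the secret at element 1.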
Format String Errors
C’s family of printf-style output functions (printf, fprintf, sprintf, etc.) accept a format string as their first argument and any number of subsequent arguments to be formatted. Directives in the format string (%s, %x, %d, %n, etc.) consume those arguments. If a program passes user-supplied input as the format string without a separate format specifier — printf(buffer) instead of printf("%s", buffer) — the user can inject format directives that will be interpreted by the function.
The consequences are severe. %x and %s directives cause printf to read arbitrary values off the stack, leaking memory contents. The %n directive writes the count of characters printed so far to an address on the stack, enabling controlled writes to memory. Together, these allow an attacker to read sensitive data (including cryptographic keys or passwords) and to write to arbitrary memory locations, often leading to arbitrary code execution.
A subtler instance is passing a format string but omitting arguments; printf will consume whatever happens to be on the stack, which may include sensitive data.
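The write primitive behind %n can be shown in a benign, well-defined setting: printf stores the number of characters emitted so far through the corresponding pointer argument. In an attack, that pointer is not supplied deliberately; it is whatever value the injected directive finds on the stack.

```c
#include <stdio.h>

int count_written(void) {
    int count = 0;
    printf("1234%n\n", &count);   /* %n stores 4 into count */
    return count;
}

/* The vulnerable pattern, for contrast (do not do this):
       printf(user_input);        // directives in user_input execute
   versus the safe form:
       printf("%s", user_input);  // user_input is treated as pure data */
```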
Incomplete Mediation
Mediation is the process of checking that input conforms to expected structure before using it. Incomplete mediation occurs when a program accepts malformed or out-of-range input without validation.
The classic consequence is SQL injection. If a web application constructs a SQL query by directly concatenating user input — for example, building SELECT name FROM users WHERE dob = '<user_input>' — an attacker can supply a carefully crafted date string containing SQL code. Entering ' OR 1=1; DROP TABLE users; -- as a date of birth would result in the entire user table being deleted. The defense is prepared queries (parameterized statements): the SQL code is sent to the database engine separately from the user-supplied data, so data can never be interpreted as code.
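The mechanics are visible even without a database: the following sketch only builds and inspects the query text, never executes it, and the function name is invented.

```c
#include <stdio.h>

/* Naive concatenation: whatever user_input contains is spliced
   directly into the SQL text. */
void build_query(char *out, unsigned long out_len, const char *user_input) {
    snprintf(out, out_len,
             "SELECT name FROM users WHERE dob = '%s'", user_input);
}
```

With the malicious input above, `build_query` produces `SELECT name FROM users WHERE dob = '' OR 1=1; DROP TABLE users; --'`: the leading quote closes the string literal, and everything after it is interpreted as SQL code. With a prepared query (for example `sqlite3_prepare_v2` plus `sqlite3_bind_text` in SQLite's C API), the input is transmitted separately as data and can never change the statement's structure.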
Cross-Site Scripting (XSS) is a related attack against web applications. If a server stores user-supplied content and later includes it verbatim in pages served to other users, a malicious user can inject JavaScript into the stored content. When other users’ browsers render the page, the injected script executes in their browser context, potentially exfiltrating session cookies or performing actions on behalf of the victim. When the injected code is permanently saved by the server in this way, the attack is called stored XSS.
Client-side mediation — validating input with JavaScript on the client before submission — is useful for user experience but not for security, because an attacker can bypass the client entirely and send requests directly to the server. Moreover, users can disable JavaScript or modify client-side state (cookies, hidden form fields) using developer tools or crafted HTTP requests. The server must always perform its own validation. Server-side state integrity can be protected with cryptographic MACs, ensuring that state returned from the client has not been tampered with.
Time-of-Check to Time-of-Use (TOCTTOU)
A TOCTTOU vulnerability (also written TOCTOU) is a race condition that arises when a program checks whether an operation is permitted at one point in time and performs the operation at a later point, with a gap between the two. If an attacker can modify the relevant state during that gap, they can defeat the access check.
A historical example involves xterm, a Unix terminal emulator that runs with root privileges (setuid) and supports writing session output to a log file specified by the user. The sequence is: (1) xterm checks whether the user has permission to write to the log file, then (2) opens the file for writing. An attacker can create the log file as a symbolic link pointing to a file they own (so the permission check passes), then — between steps 1 and 2 — redirect the symlink to point to /etc/passwd or another privileged file. If the attacker wins the race, xterm will open the privileged file for writing with root permissions.
Defenses include using file descriptors rather than file names (check permissions on an already-open file descriptor, not on a path), using file locks to make check-and-use atomic, and designing APIs to eliminate the gap between check and use wherever possible.
Part 3 — Malicious Code: Malware
Malware (short for malicious software) is software written with the express purpose of causing harm. To have its effect, malware must execute — either because a user explicitly runs it (clicking an email attachment, downloading and running software) or because it exploits a vulnerability to execute itself without user action.
Viruses
A virus is malicious code that attaches itself to a host program or document. When the host is executed, the virus runs first, looks for additional files to infect, and then transfers control to the original host, allowing the host to appear to behave normally. Modern viruses target not just executable binaries but also documents that support macro execution (Word, Excel, PDF) — essentially any execution platform.
Viruses spread through email attachments, shared network drives, and social network links. A virus’s payload — the code that performs the actual malicious action — can cause local harm (vandalism, data deletion, keylogging to steal credentials, ransomware) or harm to the world (contributing the infected machine to a distributed denial-of-service attack or spam campaign).
Detecting Viruses
Detecting malware is theoretically an undecidable problem: one can prove by contradiction that no perfect virus detector exists. Assume a perfect detector V, and construct a program P* that invokes V on P* itself and does the opposite of V’s verdict: if V flags P* as a virus, P* exits harmlessly; if V declares it clean, P* infects a new target. Either way V’s verdict is wrong, a contradiction. In practice, however, detection is highly useful even if imperfect; it is simply an arms race.
Two broad detection strategies exist:
Signature-based detection maintains a database of known malware signatures (code samples, checksums, or other identifying attributes) and matches incoming files against them. It produces zero false positives for known malware but cannot detect zero-day malware (exploiting previously unknown vulnerabilities). Attackers defeat it with polymorphic viruses that mutate their code on each infection — encrypting with a random key, substituting equivalent instructions, inserting no-op padding — while preserving their payload’s effect.
Behavior-based (anomaly) detection models expected behavior and flags deviations. It can detect novel malware but is susceptible to false positives, which are not merely a nuisance: users who are inundated with warnings quickly become habituated to ignoring them, negating the security benefit.
Both strategies are subject to the base rate fallacy. Even a detector with a low false-positive rate (say 5%) will produce more false alarms than true detections when the base rate of infected files is very low (say 1 in 1,000). Applying Bayes’ theorem: with those numbers, fewer than 2% of files flagged as infected are actually infected.
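The arithmetic behind the base rate fallacy is worth working through once. The sketch below assumes (beyond what the text states) a 100% true-positive rate, which only makes the detector look better; even so, a flagged file is almost certainly clean.

```python
# Base rate fallacy, with the numbers from the text: 1 in 1,000
# files infected, 5% false-positive rate. The 100% true-positive
# rate is an added assumption that favors the detector.
base_rate = 0.001   # P(infected)
tpr = 1.0           # P(flagged | infected)
fpr = 0.05          # P(flagged | clean)

# Bayes' theorem: P(infected | flagged)
p_flagged = tpr * base_rate + fpr * (1 - base_rate)
p_infected_given_flag = (tpr * base_rate) / p_flagged

print(f"{p_infected_given_flag:.1%}")  # -> 2.0%
```

So roughly 98% of alarms are false, despite the "low" 5% false-positive rate, because clean files outnumber infected ones a thousand to one.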
Worms
A worm differs from a virus in that it propagates autonomously, without requiring user interaction. It exploits vulnerabilities in remote systems to install copies of itself, then searches for additional victims.
The Morris Worm (1988) was the first worm to have widespread impact on the Internet, infecting roughly 10% of the ~60,000 machines then connected (about 6,000 hosts). Robert Morris (then a Cornell student, now an MIT professor) designed it to spread via three mechanisms: a buffer overflow in the Unix fingerd daemon, a debug-mode backdoor in the sendmail mail server, and a dictionary attack against hashed Unix passwords. A bug in the worm caused infected machines to be re-infected repeatedly, leading to severe performance degradation.
Code Red (2001) exploited a buffer overflow in Microsoft IIS (Internet Information Services) and spread to a large fraction of vulnerable hosts within half a day.
Slammer (2003), nicknamed the Warhol worm (after Andy Warhol’s “fifteen minutes of fame”), infected 90% of vulnerable hosts within ten minutes, demonstrating how fast autonomous propagation can saturate a network.
Conficker infected millions of machines and assembled them into a botnet — an army of compromised hosts (bots, short for robots) coordinated by a command-and-control (C&C) infrastructure. Botnets can be directed at spam campaigns, DDoS attacks, or any other coordinated task. Conficker was ultimately thwarted by setting up sinkholes: because the worm was known to make DNS queries for certain domains, defenders registered those domains and directed their DNS responses to sinkhole IP addresses monitored by security researchers, allowing them to identify and clean infected machines.
Stuxnet (2010) represented a qualitative advance: a highly sophisticated, targeted worm attributed to US and Israeli intelligence services and aimed at SCADA (Supervisory Control and Data Acquisition) systems controlling Iranian nuclear centrifuges. Stuxnet used multiple zero-day exploits, spread via USB drives to reach air-gapped systems (networks not connected to the Internet), and sabotaged centrifuges while hiding its activity from operators.
Mirai targeted Linux-based IoT (Internet of Things) devices — smart kettles, cameras, routers — that were secured only by factory-default passwords. IoT devices are often built by manufacturers with no security expertise, sold at commodity prices with minimal resources, and rarely updated. Mirai assembled a massive botnet and used it to attack the web server of security journalist Brian Krebs and the DNS provider Dyn, making major services including Netflix temporarily unavailable to millions.
Trojans and Ransomware
A Trojan horse (named after the Greek myth) is software that appears to serve an innocent purpose while hiding malicious code inside. Unlike viruses and worms, Trojans do not spread autonomously; they rely on the victim choosing to install them. Scareware is a prominent form: a web page declares that the user’s system is infected and offers a free scan — the scan itself is the Trojan.
Ransomware is malware that encrypts the victim’s files using the attacker’s public key and demands payment (typically in cryptocurrency) in exchange for the private key needed to decrypt them. The concept was theorized by researchers in the 1990s and became a major threat in the 2010s. WannaCry (2017) was particularly notable: it spread using a Windows vulnerability that the NSA had previously weaponized for its own offensive operations, and which was stolen and published by the Shadow Brokers hacking group. A researcher discovered that WannaCry checked whether a specific domain existed before encrypting files — apparently an attacker-controlled kill switch — and neutralized it by registering the domain. WannaCry infected hospitals and critical infrastructure worldwide.
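The key arrangement behind ransomware can be sketched with textbook numbers. In practice files are encrypted with a fast symmetric cipher and only the session key is locked under the attacker's public key; the sketch below mimics that structure using toy RSA with tiny primes (61 × 53) and an XOR stand-in for the file cipher. Every number and the XOR "cipher" are purely illustrative, not a real cryptosystem.

```python
# Toy sketch of the ransomware key scheme: files are encrypted with
# a symmetric session key, and only that session key is encrypted
# under the attacker's public key. Textbook RSA with tiny primes
# stands in for a real cryptosystem.
n, e = 3233, 17      # attacker's public key (n = 61 * 53)
d = 2753             # attacker's private key, held for ransom

def xor_encrypt(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)  # stand-in file cipher

session_key = 42
ciphertext = xor_encrypt(b"victim's document", session_key % 256)
locked_key = pow(session_key, e, n)   # only d can undo this

# The victim cannot recover session_key from locked_key without d.
# After payment, the attacker computes:
recovered = pow(locked_key, d, n)
assert recovered == session_key
print(xor_encrypt(ciphertext, recovered % 256))  # -> b"victim's document"
```

The design matters for the defender: because only the attacker ever holds d, scanning the infected machine after the fact recovers nothing, which is what makes offline backups the practical defense.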
The disclosure of the NSA’s stockpile of vulnerabilities by the Shadow Brokers illustrates a deep ethical tension in security: intelligence agencies argue that hoarding zero-day vulnerabilities enables offensive operations against adversaries, but the WannaCry episode demonstrated that such hoards can be stolen and turned against the general public.
Logic Bombs
A logic bomb is malicious code that lies dormant until a specific trigger condition is met. A time bomb activates on a particular date (for example, on Christmas Day). Other triggers might be a specific sequence of inputs or the presence or absence of a particular file. Because logic bombs only manifest their payload when the trigger fires, routine testing is unlikely to discover them — the program appears completely normal the rest of the time. Logic bombs are typically planted by insiders with access to the system.
Part 4 — Other Malicious Code
Web Bugs
A web bug (also called a web beacon) is a tiny embedded object — often a 1×1 pixel transparent image — included in a web page or email. When a user’s browser renders the page, it fetches the web bug from a third-party server, revealing the user’s IP address, setting and retrieving cookies, and disclosing the page’s URL (via the HTTP Referer header). Because the web bug is invisible, users typically have no idea this is happening.
If multiple websites embed web bugs from the same third-party server (typically an advertising aggregator), that server can build a detailed profile of a user’s browsing history across many sites. Browsing history is highly sensitive: research has shown that it correlates with gender, financial status, political beliefs, and religious affiliation. Similar data collection occurs in mobile apps; Zoom was found to send data to Facebook’s Graph API even for users without Facebook accounts, and the Ring doorbell app exhibited comparable behavior.
Browser privacy extensions such as Privacy Badger can detect and block web bugs. Modern browsers like Firefox enable several such protections by default.
Backdoors
A backdoor is a mechanism for accessing a system or resource without going through normal authentication or access-control checks. Backdoors arise for several reasons:
- Developer convenience: a developer hardwires a secret account or password to ease debugging. If not removed before shipping, it becomes an exploitable vulnerability. A D-Link router vulnerability discovered in 2013 granted unauthenticated access to any user who set their browser’s User-Agent string to a specific value — reversing the string revealed “edit by 04882 Joel’s backdoor.”
- Field service access: equipment manufacturers sometimes include backdoor credentials for service technicians.
- Lawful interception: telecommunications law in many jurisdictions requires that carriers be capable of providing law enforcement with access to communications upon receipt of a warrant — effectively mandating a backdoor.
- Malicious insertion: an attacker who compromises the development or build process can insert a backdoor that will be distributed with the legitimate software.
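The D-Link case above is easy to verify. The magic User-Agent value widely reported in 2013 (reproduced here from those reports; treat the exact string as an assumption) reads as the developer's signature when reversed:

```python
# The magic User-Agent value reported for the 2013 D-Link backdoor;
# reversing it spells out the developer's note.
ua = "xmlset_roodkcableoj28840ybtide"
print(ua[::-1])  # -> editby04882joelbackdoor_teslmx
```

Any client sending this header was granted administrative access with no authentication at all.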
Ken Thompson’s 1984 Turing Award lecture, “Reflections on Trusting Trust” (mandatory reading), demonstrates that a backdoor can be inserted not in the source code of an application but in the compiler — so that auditing the source code, no matter how carefully, will never reveal it. The compiler silently injects the backdoor whenever it compiles certain programs.
An attempted backdoor in the Linux kernel was caught because Linux developers maintained two parallel version control repositories (Bitkeeper and CVS); a change appeared in CVS without a corresponding authorization record in Bitkeeper, triggering an investigation.
Salami Attacks
A salami attack skims small, potentially unnoticeable amounts from a large number of transactions, aggregating to a substantial total. A classic example is rounding credit card interest calculations down to the nearest cent and diverting the fractional cents to an attacker-controlled account — each individual theft is too small to notice on a statement, but summed across millions of accounts the total is significant. The name alludes to slicing salami: each thin slice seems trivial, but together they constitute the whole sausage.
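The arithmetic of the skim is easy to demonstrate. The balances, interest rate, and account count below are invented for illustration; the point is that sub-cent remainders, individually invisible on any statement, aggregate to real money.

```python
# Sketch of the fractional-cent skim: interest on each account is
# rounded *down* to the cent and the remainder diverted. All
# figures are invented for illustration.
from decimal import Decimal, ROUND_DOWN

rate = Decimal("0.0153")                      # monthly interest
balances = [Decimal("1234.56")] * 100_000     # 100,000 accounts

skimmed = Decimal(0)
for b in balances:
    interest = b * rate                       # e.g. 18.888768
    credited = interest.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    skimmed += interest - credited            # under a cent each

print(skimmed)  # -> 876.800000 -- hundreds of dollars from invisible slices
```

No individual customer loses more than a cent per statement, which is why salami attacks are caught by auditing totals, not by customer complaints.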
Privilege Escalation
Privilege escalation occurs when an attacker gains more permissions than they were originally granted, breaking the principle of least privilege (each user or process should have only the minimum privileges necessary to perform its task). Buffer overflows in setuid programs — programs that run with elevated (root) privileges regardless of who invokes them — are a classic vehicle for privilege escalation.
The telnet -l option illustrates a logic-error based escalation: supplying a username beginning with -f caused telnet to invoke login with a flag that forced login to skip password verification for the specified user, granting shell access without credentials. The same class of bug recurred in a Solaris variant as recently as 2007.
Rootkits
A rootkit combines two capabilities: code to escalate privileges (typically to root) and code to hide the rootkit itself. Hiding is achieved by modifying system tools or the kernel: ps (list processes) and ls (list files) may be replaced or patched so that the rootkit’s processes and files are invisible. Log entries recording the initial compromise may be deleted; remote logging (sending log entries to a different machine immediately) can preserve an audit trail that the attacker cannot reach.
The Sony XCP rootkit (2005) was the most infamous example: Sony shipped the rootkit on millions of audio CDs, installing it silently when a disc was inserted into a Windows PC. The intent was to prevent CD ripping, but it behaved like any other rootkit — hiding itself and creating vulnerabilities that other parties exploited (for example, to hide cheating tools in online games). After security researcher Mark Russinovich exposed it, Sony faced class-action suits and a massive recall; even its subsequent uninstaller left further vulnerabilities behind.
Keystroke Logging
A keystroke logger captures everything typed on a keyboard — passwords, email content, personal messages — and either stores it locally or exfiltrates it. Keyboard drivers process every keystroke before any application sees it, making them ideal interception points. Hardware keyloggers (physical devices inserted between the keyboard and the computer) are also used. Keyloggers are legitimately sold for parental monitoring but are also deployed by malware. Antivirus tools sometimes classify keylogging applications that are marketed for legitimate purposes as PUPs (Potentially Unwanted Programs) rather than outright malware.
Interface Illusions and Clickjacking
Users trust standard UI elements to behave consistently. An interface illusion exploits this expectation by manipulating the UI so that what the user sees is not what their action actually does. Clickjacking is a concrete form: an attacker overlays an invisible or misleading UI element over a legitimate control, so the user thinks they are clicking one thing but are actually clicking another.
The Conficker worm exemplified this: when an infected USB drive was inserted, the Windows AutoRun dialog was replaced by one that looked identical to the standard “Open folder to view files” option but actually invoked the worm’s installer. Subtle cues — “Publisher: Not Specified” — existed but were easy to overlook.
A general defense is the trusted path: certain UI elements, such as the browser chrome (the address bar, tab bar, and controls surrounding the web content area), cannot be controlled by the website or app. Code from a website can change the content area but not the browser chrome, so the URL and the HTTPS padlock in the address bar can be trusted. The tension between security and usability is acute on mobile: full-screen apps need the entire display, but reserving pixels for trusted path indicators is unacceptable to users and developers alike. One compromise is a Secure Attention Key — a key combination (like Ctrl-Alt-Delete on Windows) that always invokes the trusted operating environment regardless of what is on screen.
Phishing
Phishing is perhaps the most widespread form of interface illusion: creating a fake website that closely mimics a legitimate one (a bank, an email provider, a retailer) and luring users into entering their credentials there. Unicode support in domain names has made phishing more sophisticated — the Cyrillic letter “о” is visually indistinguishable from the Latin “o,” allowing an attacker to register a domain that looks exactly like amazon.com (or amazon.ca) to the human eye but is a completely different domain.
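The homograph trick is concrete enough to demonstrate. The two strings below render identically in most fonts, yet they are different domains; this minimal check uses only the standard library.

```python
# Homograph demo: the second domain substitutes Cyrillic "о"
# (U+043E) for Latin "o" (U+006F). Visually identical, but an
# entirely different domain.
import unicodedata

latin = "amazon.com"
cyrillic = "amaz\u043en.com"

print(latin == cyrillic)             # -> False
print(unicodedata.name(latin[4]))    # -> LATIN SMALL LETTER O
print(unicodedata.name(cyrillic[4])) # -> CYRILLIC SMALL LETTER O
```

This is why modern browsers display mixed-script domain names in their Punycode form rather than as rendered Unicode.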
The traditional defense of requiring HTTPS (and the associated certificate cost) as a proxy for legitimacy has been undermined by Let’s Encrypt, which provides TLS certificates to anyone at no cost. The padlock in the browser is no longer evidence that you are on the genuine website.
Phishing URLs are commonly distributed via email or social media. Blacklists maintained by antivirus vendors and browsers identify known phishing sites, but sophisticated phishing sites can detect web crawlers (by checking IP address, User-Agent, or request timing) and serve benign pages to them while serving phishing pages to humans. Client-side phishing detection — analyzing the page in real time inside the browser rather than relying on a pre-computed blacklist — offers a more robust approach.
Man-in-the-Middle Attacks
A man-in-the-middle (MitM) attack inserts the attacker between two communicating parties. The attacker can intercept, read, modify, and forge traffic in both directions, while each party believes they are communicating directly with the other. A MitM sitting between a user’s browser and their bank can allow legitimate transactions to succeed (so the user notices nothing amiss) while injecting unauthorized transactions on the side.
Defense against MitM relies on authentication of the remote party: verifying that you are actually talking to the intended server, not an impersonator. Widespread deployment of TLS, aided by Let’s Encrypt, makes MitM attacks harder because the attacker cannot easily obtain a valid certificate for a domain they do not control. However, this protection only holds if users pay attention to TLS indicators (the padlock icon, certificate warnings) and if certificate authorities are not themselves compromised.
Part 5 — Non-malicious Flaws: Covert and Side Channels
Non-malicious flaws are not bugs in the conventional sense but rather inherent properties of systems that an attacker can exploit to leak sensitive information without going through any intentional communication channel.
Covert Channels
A covert channel is an unauthorized transfer of information through a channel not intended for communication. The attacker controls both endpoints: one process encodes secret information into observable but apparently innocuous behavior (the formatting of a report, the exact count of network connections, the size of an output file), and another process controlled by the attacker decodes the information from that observable behavior.
Steganography is a classic example: a JPEG image of the Mona Lisa can have secret data encoded in the least-significant bits of its pixels, invisible to the eye but decodable by someone who knows how to look. Covert channels typically have low bandwidth — hiding information from a watchful defender necessarily limits how much can be transmitted — but they are extremely difficult to eliminate entirely.
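Least-significant-bit embedding can be shown end to end on synthetic data. The "pixels" below are just a list of integers standing in for image data; each secret byte is spread across the LSBs of eight pixel values.

```python
# Minimal LSB steganography over synthetic "pixel" values.
def embed(pixels: list[int], secret: bytes) -> list[int]:
    bits = [(byte >> i) & 1 for byte in secret for i in range(8)]
    out = pixels[:]
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit      # overwrite only the LSB
    return out

def extract(pixels: list[int], length: int) -> bytes:
    bits = [p & 1 for p in pixels[: 8 * length]]
    return bytes(
        sum(bits[8 * j + i] << i for i in range(8)) for j in range(length)
    )

cover = list(range(200))                  # stand-in pixel values
stego = embed(cover, b"hi")
print(extract(stego, 2))                  # -> b'hi'
# Each pixel value changes by at most 1 -- invisible to the eye.
print(max(abs(a - b) for a, b in zip(cover, stego)))  # -> 1
```

The per-pixel change of at most 1 out of 256 intensity levels is why the hidden data is imperceptible, and the 8:1 cover-to-secret ratio illustrates why covert channels have low bandwidth.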
Side Channels
A side channel differs from a covert channel in that the attacker does not control the encoding end. Instead, the attacker observes some side effect of the victim’s computation — information that leaks unintentionally through channels the system designers did not consider.
Reflection Side Channels
A user typing a password in public while concealing their screen may not notice that the screen’s display is reflected in their glasses, in the surface of their eyes, or in a nearby shiny object such as a kettle. An attacker with a telephoto camera or telescope can, at distances of 10–30 meters, reconstruct what is displayed. Defenses include wearing non-reflective lenses and avoiding entering sensitive credentials in public spaces.
Cache Timing Side Channels
Modern processors use a hierarchy of memory caches to bridge the speed gap between the processor and main memory. Level-1 cache is private to each core; Level-3 is typically shared across all cores. Caches are micro-architectural: programs do not see them in their memory model, but they affect execution timing. A cache hit (data already in cache) is fast; a cache miss (data must be fetched from main memory) is slow.
Because caches are shared between processes running on the same core or the same chip, one process’s memory accesses can influence which data is cached, and another process can infer what data the first process used by timing its own memory accesses. Techniques like Flush+Reload (flush the cache, let the victim run, time a reload to see what the victim cached) and Prime+Probe (fill the cache with known data, let the victim run, probe to see what was evicted) exploit this sharing.
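The inference step in Prime+Probe can be modeled without real hardware. The sketch below simulates a direct-mapped cache explicitly instead of measuring timing; the set count and the victim's secret-dependent access pattern are invented for illustration.

```python
# Toy model of Prime+Probe on a direct-mapped cache. Cache state is
# simulated explicitly; in a real attack, "miss" would be inferred
# from access timing.
NUM_SETS = 16

class Cache:
    def __init__(self):
        self.sets = {}              # set index -> (owner, addr)
    def access(self, addr, owner):
        s = addr % NUM_SETS         # direct-mapped: address picks one set
        hit = self.sets.get(s) == (owner, addr)
        self.sets[s] = (owner, addr)
        return hit

def victim(cache, secret):
    # Secret-dependent memory access: touches exactly one cache set.
    cache.access(secret, "victim")

def prime_probe(cache, run_victim):
    for s in range(NUM_SETS):       # Prime: fill every set
        cache.access(s, "attacker")
    run_victim(cache)
    # Probe: a miss means the victim evicted the attacker's line.
    return [s for s in range(NUM_SETS)
            if not cache.access(s, "attacker")]

leaked = prime_probe(Cache(), lambda c: victim(c, secret=11))
print(leaked)  # -> [11]
```

The attacker never reads the victim's memory; it recovers the secret purely from which of its own cache lines got evicted, which is exactly the sharing effect the text describes.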
Speculative execution — where a processor predicts a branch outcome and executes down the predicted path before the branch condition is resolved — leaves micro-architectural traces even when the speculated instructions are architecturally rolled back. The Spectre and Meltdown attacks (2018) used these micro-architectural traces to read arbitrary memory, including memory belonging to other processes and the operating system kernel, from unprivileged user code.
Other side channels include bandwidth consumption (the size and timing of encrypted network packets can reveal which app generated them and what action was taken, even over an encrypted TLS connection) and timing attacks on cryptographic primitives (if a cryptographic function takes different amounts of time depending on the key value, an attacker can infer key bits by measuring execution time).
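The timing leak in a naive comparison is visible directly in the code: an early exit means running time reveals how long the matching prefix is, letting an attacker guess a secret one byte at a time. The standard-library fix is `hmac.compare_digest`, whose running time does not depend on where the inputs differ. The secret value below is illustrative.

```python
# Why non-constant-time comparison leaks: the naive check returns
# as soon as bytes differ, so its running time reveals the length
# of the matching prefix of a guess.
import hmac

def naive_compare(secret: bytes, guess: bytes) -> bool:
    if len(secret) != len(guess):
        return False
    for a, b in zip(secret, guess):
        if a != b:          # early exit: timing depends on how
            return False    # many leading bytes matched
    return True

def safe_compare(secret: bytes, guess: bytes) -> bool:
    return hmac.compare_digest(secret, guess)  # constant-time

secret = b"hunter2"
print(naive_compare(secret, b"hunter2"))  # -> True
print(safe_compare(secret, b"xxxxxxx"))   # -> False
```

Both functions return the same answers; only their timing behavior differs, which is precisely the channel a remote attacker measures.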
Part 6 — Controls Against Security Flaws
Having surveyed the landscape of security flaws, the final section addresses what can be done. Controls must be applied throughout the entire software development lifecycle, from initial specification through deployment and maintenance.
Design-Time Controls
Modularity
Large systems are decomposed into smaller, independently understandable modules. Small modules are easier to analyze, verify, test, and maintain. Modules should have minimal coupling — each module should be as independent as possible.
Encapsulation
Modules interact only through well-defined interfaces. The developer of module A need not know — and cannot depend on — the internal implementation of module B, only its public API. This limits the blast radius of a compromised or buggy module.
Information Hiding
Beyond encapsulation, internal state should not even be visible to other modules’ developers. This prevents reliance on implementation details and reduces the opportunities for both accidental and malicious exploitation of internal state.
Mutual Suspicion
Each module should validate all input received through its interfaces, even input from other modules within the same system. The rationale is that if an attacker compromises one module, they cannot easily use it as a stepping stone to compromise others, because every module independently validates its inputs. The cost is performance: every cross-module input requires validation overhead.
Confinement
A module that must rely on an untrusted component can run the untrusted component in a sandboxed or confined environment — a virtual machine, container, or restricted execution environment. If the sandboxed component is malicious or compromised, the damage it can cause is limited to what is accessible within the sandbox. The trade-off is performance and functionality: sandboxing adds overhead and may restrict what the confined code can do.
Implementation-Time Controls
Static Analysis
Static analysis tools examine source or binary code without executing it, identifying potentially dangerous patterns such as unchecked buffer lengths, potentially-null pointer dereferences, or unvalidated format strings. Static analysis is valuable precisely because it finds potential vulnerabilities before the code ships, but it is subject to the same false-positive / false-negative trade-off as virus scanners: too many false positives and developers stop reading the warnings; too few and dangerous code slips through.
Hardware Assistance
Processor manufacturers have been adding security-relevant features at the hardware level. ARM’s Pointer Authentication (already deployed on Apple Silicon) attaches a small cryptographic MAC, the pointer authentication code, to pointers such as return addresses before they are stored in memory; an attacker who overwrites such a pointer with an arbitrary value cannot compute a valid MAC and is detected when the pointer is used. Intel CET (Control-flow Enforcement Technology) provides shadow stacks and indirect-branch tracking to defeat ROP attacks.
Formal Methods
Formal verification — mathematically proving that a program satisfies its security specification — provides the strongest possible assurance. The seL4 microkernel is the most prominent example: formally verified in terms of functional correctness and security properties, at the cost of many person-years of effort. Formal proofs are only as good as the model on which they are based; when Spectre and Meltdown were discovered, seL4’s proofs had to be revisited because they did not account for micro-architectural side channels.
Diversity and Randomization
Just as genetic diversity in a population can limit the spread of a pathogen, software diversity limits the reusability of exploits. ASLR (Address Space Layout Randomization) places the stack, heap, and code regions at random addresses each time a process starts, so that a working exploit on one machine is unlikely to work on another. Randomization does not prevent exploitation but substantially raises the difficulty and cost.
Testing
Black-Box Testing
Given only an interface (function signatures, an API, a web form), try to provoke misbehavior without knowing the implementation. Security-aware black-box testing means thinking like an attacker: supply oversized inputs to probe for buffer overflows, supply negative indices, inject SQL or HTML, try null pointers.
Fuzzing
Fuzz testing submits random or semi-random inputs to the system and observes whether it crashes or misbehaves. Simple random fuzzing is less effective than guided fuzzing, which prioritizes inputs that maximize code coverage (exercising previously unseen functions or branches). Evolutionary fuzzing remembers mutations that were productive (reached new code) and biases future mutations toward them.
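A random fuzzer fits in a dozen lines. The target below is an invented stand-in: it "crashes" on inputs longer than a 100-byte internal buffer, mimicking a buffer overflow; real fuzzers like AFL add the coverage guidance and mutation feedback described above, which this sketch omits.

```python
# Tiny random-fuzzing sketch against an invented target with a
# planted length bug (a stand-in for a buffer overflow).
import random

def parse(data: bytes) -> int:
    if len(data) > 100:                 # the planted flaw
        raise RuntimeError("buffer overflow!")
    return len(data)

def fuzz(target, iterations=1000, seed=0):
    rng = random.Random(seed)           # seeded for reproducibility
    for _ in range(iterations):
        size = rng.randrange(0, 200)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception:
            return data                 # crashing input found
    return None

crasher = fuzz(parse)
print(crasher is not None and len(crasher) > 100)  # -> True
```

Pure random generation finds this shallow bug almost immediately; bugs guarded by specific byte sequences are where coverage-guided and evolutionary fuzzing earn their keep.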
White-Box Testing
White-box testing uses full knowledge of the implementation — source code, design documents, architecture diagrams — to design targeted test cases. This knowledge enables regression test suites: comprehensive test sets that are re-run after every change to ensure that patches do not break existing functionality.
Change Management
Version Control
Source code control systems (Git, SVN, Bitkeeper) record every change to the codebase along with who made it and when. This creates an audit trail that can identify when a flaw was introduced and helps ensure accountability. It was version-control discrepancy detection that caught the attempted Linux kernel backdoor: a change appeared in CVS without a corresponding record in Bitkeeper.
Code Review
Code review is empirically among the most effective ways to find security flaws. “Given enough eyeballs, all bugs are shallow” (Linus’s Law) captures the intuition, though in practice the mere public availability of source code does not guarantee that it receives adequate scrutiny. Structured approaches include:
- Guided code review: the developer walks reviewers through each change, explaining the rationale and discussing possible interactions with other system components. Particularly effective for safety-critical systems.
- Easter-egg code review: intentional flaws are planted in the code before the review, so that reviewers know there are things to find even if they do not know what they are. This counteracts the tendency of reviewers to declare the code clean once they have found what they believe are all the problems.
Standards and Audits
Large organizations codify their security practices in organizational standards that specify what controls must be applied and exactly how each control must be implemented — who reviews what, in how much detail, under what circumstances. International standards such as ISO 9001 provide frameworks for quality management processes.
Audits verify that the specified processes are actually being followed. An audit does not evaluate whether the processes are effective; it only checks conformance to the documented standard. The combination of standards (which define the expected behavior) and audits (which verify conformance) provides a systematic mechanism for maintaining security discipline across a large organization over time.
Summary
Defending against the full spectrum of security flaws requires controls at every stage of the software lifecycle:
| Stage | Representative Controls |
|---|---|
| Specification | Threat modeling, adversary identification |
| Design | Modularity, encapsulation, information hiding, mutual suspicion, confinement |
| Implementation | Memory-safe languages, static analysis, hardware security features, formal methods, ASLR |
| Testing | Black-box, fuzzing, white-box, regression testing |
| Deployment & maintenance | Version control, code review, patching, standards, audits |
The attacker’s asymmetric advantage — they need to find only one path through all of these defenses — means that defense in depth, deploying multiple overlapping controls, is the only viable strategy. No single measure is sufficient; the goal is to raise the attacker’s cost high enough that the expected return from an attack does not justify the investment.
Module 3: Operating System Security
This module was not included in the course export. Content to be added when source materials become available.
Topics covered (per course schedule, taught by Prof. Asokan):
- Access control models (DAC, MAC, RBAC)
- Unix/Linux security mechanisms (file permissions, setuid, capabilities)
- Windows security architecture
- Virtualization and sandboxing
- Trusted computing and TPMs
Module 4: Network Security
This module was not included in the course export. Content to be added when source materials become available.
Topics covered (per course schedule, taught by Prof. Miti Mazmudar):
- Network threat model and protocol stack security
- TCP/IP vulnerabilities (spoofing, hijacking, SYN floods)
- Firewalls and intrusion detection systems
- VPNs and IPsec
- Wireless security (WEP, WPA, 802.11)
- Clickjacking and cross-origin attacks (expanded)
Module 5: Internet Application Security and Cryptography
This module was not included in the course export. Content to be added when source materials become available.
Topics covered (per course schedule, taught by Prof. Asokan and Prof. Mazmudar):
- Symmetric encryption and block ciphers
- Public-key cryptography and RSA
- Hash functions and MACs
- Digital signatures and certificates
- TLS/HTTPS in depth
- Web application security (CSRF, session management)
- Authentication protocols
(Diagrams not included in this export: RSA encryption flow; TLS handshake.)
Module 6: Database Security and Privacy
This module was not included in the course export. Content to be added when source materials become available.
Topics covered (per course schedule, taught by Prof. Mazmudar):
- Database access control and views
- SQL injection (expanded)
- Statistical database privacy and inference attacks
- Differential privacy
- Data anonymization and re-identification
Module 7: Non-Technical Aspects of Security and Privacy
This module was not included in the course export. Content to be added when source materials become available.
Topics covered (per course schedule, taught by Prof. Mazmudar):
- Legal frameworks for security and privacy
- Ethics and responsible disclosure
- Economics of security
- Security policy and governance
- Incident response and forensics