CS 136: Elementary Algorithm Design and Data Abstraction
Estimated reading time: 19 minutes
Table of contents
1 From Racket to C
CS 136 transitions from the functional paradigm (CS 135, Racket) to the imperative paradigm (C), then toward object-oriented (CS 246, C++).
Imperative Programming. A paradigm based on sequences of statements that change program state through side effects (I/O and mutation), as opposed to functional programming where functions have no side effects and values are immutable.
Key differences at a glance:
| Concept | Racket | C |
|---|---|---|
| Typing | Dynamic | Static |
| Paradigm | Functional | Imperative |
| Entry point | Top of file | main function |
| Equality | equal? | == (primitives only) |
| Comments | ;; or #| ... |# | // or /* ... */ |
| Application | (f x y) | f(x, y) |
A minimal C program:
#include "cs136.h"
int my_sqr(int n) {
return n * n;
}
int main(void) {
assert(my_sqr(7) == 49);
}
Every C program must have exactly one main function. It returns 0 for success.
2 The C Language
2.1 Types
| Type | Size | Range / Notes |
|---|---|---|
int | 4 bytes | \(-2^{31}\) to \(2^{31}-1\) |
char | 1 byte | \(-128\) to \(127\); ASCII characters |
bool | (integer) | 0 (false) or 1 (true); requires <stdbool.h> |
float | 4 bytes | Inexact; 24-bit mantissa, 8-bit exponent |
double | 8 bytes | Better precision than float |
Integer overflow is undefined behavior in C. The constants INT_MIN and INT_MAX are provided by <limits.h>.
2.2 Operators and Precedence
From highest to lowest precedence:
| Precedence | Operators |
|---|---|
| Negation | ! |
| Multiplicative | * / % |
| Additive | + - |
| Comparison | < <= >= > |
| Equality | == != |
| Logical AND | && |
| Logical OR | || |
Integer division (/) truncates toward zero. The remainder operator (%) has the same sign as the dividend.
The conditional (ternary) operator: q ? a : b produces a if q is true, b otherwise.
2.3 Control Flow
if / else if / else:
if (n < 0) {
return -n;
} else if (n == 0) {
return 0;
} else {
return n;
}
while loop:
while (condition) {
// body
}
for loop (condensed while):
for (int i = 0; i < n; ++i) {
// body
}
do … while executes at least once:
do {
// body
} while (condition);
break exits the innermost loop. continue skips to the next iteration. In a for loop, continue executes the update statement before checking the condition.
2.4 Functions
// purpose statement
// requires: preconditions
// effects: side effects (if any)
// time: O(...)
int my_func(int x, int y) {
return x + y;
}
Use void for no parameters or no return value. C uses pass by value: arguments are copied into the stack frame.
2.5 Variables and Mutation
int x = 5; // initialization
x = 10; // assignment (mutation)
x += 3; // compound assignment: x = x + 3
++x; // increment
const int c = 7; // immutable constant
Side Effect. An observable change to program state beyond computing a return value. The two kinds of side effects in this course are I/O (input/output) and mutation (changing stored values). Side effects must be documented with an effects: section.
2.6 I/O
printf("x is %d\n", x); // output int
printf("char: %c\n", ch); // output char
printf("string: %s\n", str); // output string
int n;
scanf("%d", &n); // read int (note &)
char c;
scanf(" %c", &c); // read char, skip whitespace
printf returns the number of characters printed. scanf returns the number of items successfully read, or EOF.
3 Pointers and Memory
Pointer. A variable that stores the memory address of another variable. Declared with , the address-of operator is &, and the dereference (indirection) operator is .
int i = 42;
int *p = &i; // p points at i
*p = 99; // changes i to 99
int **pp = &p; // pointer to pointer
NULL is a special pointer value meaning “points to nothing.” Dereferencing NULL crashes the program.
3.1 Arrow Operator
For pointers to structs, ptr->field is equivalent to (*ptr).field.
struct posn { int x; int y; };
struct posn p = {3, 4};
struct posn *pp = &p;
pp->x = 10; // equivalent to (*pp).x = 10
3.2 Pointer Arithmetic
If p is a pointer of type T *, then p + i advances the address by i * sizeof(T) bytes.
int a[6] = {4, 8, 15, 16, 23, 42};
// a[i] is equivalent to *(a + i)
// &a[i] is equivalent to (a + i)
3.3 Pass by Reference (Emulated)
C only has pass by value. To modify a caller’s variable, pass its address:
void swap(int *px, int *py) {
int temp = *px;
*px = *py;
*py = temp;
}
// Usage: swap(&a, &b);
3.4 const with Pointers
const int *p; // cannot modify *p
int * const p = &i; // cannot change where p points
const int * const p = &i; // neither
It is good style to use const for pointer parameters that should not be modified.
4 Memory Model
C Memory Sections. The five sections of memory (low to high addresses): Code (machine instructions), Read-Only Data (constants, string literals), Global Data (mutable globals), Heap (dynamic allocation, grows upward), and Stack (function frames, grows downward).
4.1 Stack Frames
Each function call creates a stack frame containing:
- copies of argument values (pass by value)
- local variables
- the return address
When the function returns, its frame is popped. Stack overflow occurs when deep or infinite recursion exhausts the stack.
4.2 Heap (Dynamic Memory)
int *a = malloc(n * sizeof(int)); // allocate
a = realloc(a, 2 * n * sizeof(int)); // resize
free(a); // deallocate
a = NULL; // good practice
Memory Leak. Occurs when dynamically allocated memory is never freed. The address is lost, so the memory can never be reclaimed. Programs that leak memory may degrade over time.
Dangling Pointer. A pointer that refers to memory that has already been freed. Dereferencing a dangling pointer is undefined behavior.
malloc returns uninitialized memory and is \(O(1)\). realloc is \(O(n)\) where \(n\) is the new size. free is \(O(1)\).
Doubling strategy: When a dynamic array is full, double its capacity. This yields amortized \(O(1)\) per insertion because the total cost of all resizes for \(n\) insertions is \(O(n)\).
5 Structures
struct posn {
int x;
int y;
};
struct posn p = {3, 4}; // initialization
struct posn q = {.y = 4, .x = 3}; // C99 designated init
p.x = 10; // field access
p = q; // copies all fields
sizeof(struct) may be larger than the sum of field sizes due to padding/alignment. The == operator does not work on structs; write your own equality function.
6 Modularization
Module. A collection of related functions provided through an interface. Modules enable re-usability, maintainability, and abstraction. A good module has high cohesion (related functions) and low coupling (minimal dependencies on other modules).
6.1 Interface (.h) and Implementation (.c)
// mymod.h (INTERFACE)
// purpose of module
int my_func(int x);
// mymod.c (IMPLEMENTATION)
#include "mymod.h"
int my_func(int x) { return x * 2; }
// main.c (CLIENT)
#include "mymod.h"
int main(void) { assert(my_func(3) == 6); }
6.2 Scope
- Local (block) scope: variables inside
{ }. - Module scope:
staticglobal identifiers, visible only in their file. - Program scope: default for globals, visible across files via
externdeclarations.
6.3 Information Hiding and Opaque Structures
An opaque structure is declared without fields in the interface:
// widget.h
struct widget;
struct widget *widget_create(void);
void widget_destroy(struct widget *w);
The client can only use pointers to the struct and must call provided create/destroy functions. This provides security (prevents tampering) and flexibility (implementation can change freely).
7 Abstract Data Types
Abstract Data Type (ADT). A mathematical model for data storage that defines operations (interface) without specifying the underlying data structure (implementation). The implementation is hidden from the client, providing flexibility and security.
Collection ADT. An ADT designed to store an arbitrary number of items. Common collection ADTs include stacks, queues, sequences, and dictionaries.
8 Arrays
int a[6] = {4, 8, 15, 16, 23, 42};
int b[5] = {0}; // all zeros
int c[] = {1, 2, 3}; // length inferred as 3
- Elements are contiguous in memory. Accessing
a[i]is \(O(1)\). - C does not track array length or perform bounds checking.
- When passed to a function, only the address is copied (not the array).
int sum_array(const int a[], int len) {
int sum = 0;
for (int i = 0; i < len; ++i) {
sum += a[i];
}
return sum;
}
8.1 Strings
A C string is a null-terminated ('\0') character array.
char s[] = "cat"; // length 4 (includes '\0')
int len = strlen(s); // 3 (excludes '\0')
strcmp(s1, s2); // 0 if equal, <0 if s1 precedes s2
strcpy(dest, src); // copy src into dest
strcat(dest, src); // append src to dest
Buffer overrun: writing beyond array bounds. A serious security vulnerability in C.
String literals (e.g., "hello") are stored in the read-only data section and must not be modified.
9 Linked Lists
Linked List. A dynamic data structure where each node contains an item and a pointer to the next node. The last node’s pointer is NULL. Unlike arrays, elements are not contiguous and accessing the \(i\)-th element requires \(O(i)\) traversal.
struct llnode {
int item;
struct llnode *next;
};
struct llist {
struct llnode *front;
};
9.1 Core Operations
// Create an empty list
struct llist *list_create(void) {
struct llist *lst = malloc(sizeof(struct llist));
lst->front = NULL;
return lst;
}
// Insert at front: O(1)
void add_front(int i, struct llist *lst) {
struct llnode *node = malloc(sizeof(struct llnode));
node->item = i;
node->next = lst->front;
lst->front = node;
}
// Traversal: O(n)
int length(const struct llist *lst) {
int len = 0;
struct llnode *node = lst->front;
while (node) {
++len;
node = node->next;
}
return len;
}
// Destroy: O(n)
void list_destroy(struct llist *lst) {
struct llnode *cur = lst->front;
while (cur) {
struct llnode *next = cur->next;
free(cur);
cur = next;
}
free(lst);
}
9.2 Linked List vs. Array
| Operation | Dynamic Array | Linked List |
|---|---|---|
| Access by index | \(O(1)\) | \(O(n)\) |
| Insert at front | \(O(n)\) | \(O(1)\) |
| Insert at back | \(O(1)\) amortized | \(O(n)\) (\(O(1)\) with back pointer) |
| Search | \(O(n)\) | \(O(n)\) |
| Remove front | \(O(n)\) | \(O(1)\) |
10 Stacks
Stack ADT. A LIFO (last in, first out) collection. Items are pushed onto and popped from the top. Only the top item is accessible. Used in call stacks, undo histories, and expression evaluation.
Operations: push, pop, top, is_empty – all \(O(1)\).
Array-based implementation (with doubling):
struct stack {
int len;
int maxlen;
int *data;
};
void stack_push(int item, struct stack *s) {
if (s->len == s->maxlen) {
s->maxlen *= 2;
s->data = realloc(s->data, s->maxlen * sizeof(int));
}
s->data[s->len] = item;
s->len += 1;
}
int stack_pop(struct stack *s) {
assert(s->len > 0);
s->len -= 1;
return s->data[s->len];
}
11 Queues
Queue ADT. A FIFO (first in, first out) collection. Items are added at the back and removed from the front, like a lineup.
Operations: add_back, remove_front, front, is_empty – all \(O(1)\) with a linked list and back pointer.
struct llnode {
int item;
struct llnode *next;
};
struct queue {
struct llnode *front;
struct llnode *back;
};
void queue_add_back(int i, struct queue *q) {
struct llnode *node = malloc(sizeof(struct llnode));
node->item = i;
node->next = NULL;
if (q->front == NULL) {
q->front = node;
} else {
q->back->next = node;
}
q->back = node;
}
int queue_remove_front(struct queue *q) {
assert(q->front);
int val = q->front->item;
struct llnode *old = q->front;
q->front = q->front->next;
free(old);
if (q->front == NULL) q->back = NULL;
return val;
}
A doubly linked list with back pointer supports add/remove from both ends in \(O(1)\), implementing a deque (double-ended queue).
12 Trees
Binary Search Tree (BST). A binary tree where for every node with item \(i\), all items in the left subtree are less than \(i\) and all items in the right subtree are greater than \(i\).
struct bstnode {
int item;
struct bstnode *left;
struct bstnode *right;
};
struct bst {
struct bstnode *root;
};
12.1 BST Operations
Search – follow left/right links: \(O(h)\).
Insert (recursive):
struct bstnode *insert_node(int i, struct bstnode *node) {
if (node == NULL) return new_leaf(i);
if (i < node->item)
node->left = insert_node(i, node->left);
else if (i > node->item)
node->right = insert_node(i, node->right);
return node;
}
Remove: Three cases:
- Leaf: simply free.
- One child: promote the child.
- Two children: replace with the next-largest key (smallest in right subtree), then remove that node.
12.2 Tree Traversals
- In-order (left, root, right): produces sorted order for a BST.
- Pre-order (root, left, right): useful for copying a tree.
- Post-order (left, right, root): useful for freeing a tree.
12.3 Balance and Efficiency
Balanced Tree. A tree where the height \(h\) is \(O(\log n)\). Standard BST operations (search, insert, remove) are \(O(h)\), so they are \(O(\log n)\) for balanced trees but \(O(n)\) for degenerate (unbalanced) trees.
A self-balancing BST (e.g., AVL tree, red-black tree) rearranges nodes to maintain balance. Covered in CS 240.
Node augmentation: store extra info per node (e.g., subtree size for \(O(h)\) select).
Array-based tree: root at a[0], left child of a[i] at a[2*i+1], right at a[2*i+2].
13 Efficiency
Big O Notation. \(f(n) \in O(g(n))\) if there exist constants \(c > 0\) and \(n_0 > 0\) such that for all \(n \ge n_0\), \(f(n) \le c \cdot g(n)\). It describes an upper bound on the growth rate of a function.
Common orders (smallest to largest):
\[O(1) \subset O(\log n) \subset O(n) \subset O(n \log n) \subset O(n^2) \subset O(n^3) \subset O(2^n)\]Big O arithmetic:
- \(O(f) + O(g) = O(\max(f, g))\)
- \(O(f) \times O(g) = O(f \cdot g)\)
13.1 Related Notations
| Notation | Meaning |
|---|---|
| \(O(g(n))\) | Upper bound: \(f(n) \le c \cdot g(n)\) |
| \(\Omega(g(n))\) | Lower bound: \(f(n) \ge c \cdot g(n)\) |
| \(\Theta(g(n))\) | Tight bound: \(c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n)\) |
Unless specified, running time refers to the worst case.
13.2 Recurrence Relations
| Recurrence | Closed Form |
|---|---|
| \(T(n) = O(1) + T(n - k)\) | \(O(n)\) |
| \(T(n) = O(n) + T(n - k)\) | \(O(n^2)\) |
| \(T(n) = O(n^2) + T(n - k)\) | \(O(n^3)\) |
| \(T(n) = O(1) + T(n/k)\) | \(O(\log n)\) |
| \(T(n) = O(1) + k \cdot T(n/k)\) | \(O(n)\) |
| \(T(n) = O(n) + k \cdot T(n/k)\) | \(O(n \log n)\) |
| \(T(n) = O(1) + T(n-k_1) + T(n-k_2)\) | \(O(2^n)\) |
where \(k, k_1, k_2 \ge 1\) and \(k > 1\) for division.
13.3 Analysis Techniques
Iterative: work from innermost loop outward; count iterations and cost per iteration.
Recursive: identify non-recursive cost and recursive call sizes, form recurrence relation, look up closed form.
13.4 Space Complexity
The amount of additional memory required. Tail recursion (recursive call is the last operation) allows stack frame reuse, achieving \(O(1)\) space.
14 Sorting
Sorting Complexity Summary.
| Algorithm | Best Case | Worst Case | Space |
|---|---|---|---|
| Insertion sort | \(O(n)\) | \(O(n^2)\) | \(O(1)\) |
| Selection sort | \(O(n^2)\) | \(O(n^2)\) | \(O(1)\) |
| Merge sort | \(O(n \log n)\) | \(O(n \log n)\) | \(O(n)\) |
| Quick sort | \(O(n \log n)\) | \(O(n^2)\) | \(O(\log n)\) |
14.1 Selection Sort
Find the minimum in the unsorted portion, swap it to the front. Repeat.
void selection_sort(int a[], int len) {
for (int i = 0; i < len - 1; ++i) {
int pos = i;
for (int j = i + 1; j < len; ++j) {
if (a[j] < a[pos]) pos = j;
}
swap(&a[i], &a[pos]);
}
}
14.2 Insertion Sort
Build a sorted prefix by inserting each element into its correct position.
\(T(n) = O(n) + T(n-1) = O(n^2)\) worst case; \(O(n)\) if nearly sorted.
14.3 Merge Sort
Divide the array in half, sort each half recursively, merge the sorted halves.
void merge_sort(int a[], int len) {
if (len <= 1) return;
int llen = len / 2;
int rlen = len - llen;
int *left = malloc(llen * sizeof(int));
int *right = malloc(rlen * sizeof(int));
for (int i = 0; i < llen; ++i) left[i] = a[i];
for (int i = 0; i < rlen; ++i) right[i] = a[i + llen];
merge_sort(left, llen);
merge_sort(right, rlen);
merge(a, left, llen, right, rlen);
free(left);
free(right);
}
Merge Sort Complexity. Merge sort runs in \(O(n \log n)\) time in all cases, since \(T(n) = O(n) + 2T(n/2) = O(n \log n)\).
14.4 Quick Sort
Select a pivot, partition elements into less-than and greater-than groups, recurse.
void quick_sort_range(int a[], int first, int last) {
if (last <= first) return;
int pivot = a[first];
int pos = last;
for (int i = last; i > first; --i) {
if (a[i] > pivot) {
swap(&a[pos], &a[i]);
--pos;
}
}
swap(&a[first], &a[pos]);
quick_sort_range(a, first, pos - 1);
quick_sort_range(a, pos + 1, last);
}
Best case (balanced partitions): \(T(n) = O(n) + 2T(n/2) = O(n \log n)\). Worst case (sorted input, first-element pivot): \(T(n) = O(n) + T(n-1) = O(n^2)\).
14.5 Binary Search
On a sorted array, repeatedly halve the search range:
int binary_search(int item, const int a[], int len) {
int low = 0, high = len - 1;
while (low <= high) {
int mid = (low + high) / 2;
if (a[mid] == item) return mid;
else if (a[mid] < item) low = mid + 1;
else high = mid - 1;
}
return -1;
}
Binary Search Complexity. Binary search on a sorted array of \(n\) elements runs in \(O(\log n)\) time.
15 Hash Tables
Hash Table. A data structure that maps keys to values using a hash function to compute an index into an array of buckets. With a good hash function, average-case lookup, insert, and delete are \(O(1)\).
A hash function \(h(k)\) maps a key \(k\) to an index in range \([0, m-1]\) where \(m\) is the table size.
Load factor: \(\alpha = n/m\), where \(n\) is the number of stored items.
15.1 Collision Resolution
Chaining (separate chaining): Each bucket is a linked list. Insert appends to the list. Average search time is \(O(1 + \alpha)\).
Open addressing (probing): On collision, probe subsequent slots:
- Linear probing: check \(h(k)+1, h(k)+2, \ldots\) (suffers from clustering).
- Quadratic probing: check \(h(k)+1^2, h(k)+2^2, \ldots\)
- Double hashing: use a second hash function for the step size.
When the load factor exceeds a threshold (e.g., 0.75), resize the table and rehash all entries.
Hash Table Average Complexity. With a good hash function and bounded load factor, hash table operations (insert, lookup, remove) run in \(O(1)\) expected (average) time. Worst case is \(O(n)\) when all keys collide.
16 Graphs
Graph. A data structure consisting of nodes (vertices) connected by edges. Graphs may be directed or undirected, weighted or unweighted, cyclic or acyclic. Linked lists and trees are special cases of graphs.
16.1 Representations
Adjacency matrix: A 2D array where entry \((i,j)\) indicates an edge from vertex \(i\) to vertex \(j\). Space: \(O(V^2)\). Edge lookup: \(O(1)\).
Adjacency list: An array of linked lists, where list \(i\) stores the neighbors of vertex \(i\). Space: \(O(V + E)\). Better for sparse graphs.
16.2 BFS and DFS
Breadth-First Search (BFS): Uses a queue. Explores all neighbors at the current depth before moving deeper. Finds shortest paths in unweighted graphs. Time: \(O(V + E)\).
Depth-First Search (DFS): Uses a stack (or recursion). Explores as far as possible along each branch before backtracking. Time: \(O(V + E)\).
17 ADT Design and Data Structure Selection
17.1 Sequenced vs. Unsequenced Data
- Sequenced data (order matters): use arrays or linked lists.
- Unsequenced data (rearrangeable): use BSTs or hash tables.
17.2 Comparison Table: Sequenced Data
| Operation | Dynamic Array | Linked List |
|---|---|---|
| item_at(i) | \(O(1)\) | \(O(n)\) |
| insert_front | \(O(n)\) | \(O(1)\) |
| insert_back | \(O(1)\)* | \(O(1)\)** |
| remove_front | \(O(n)\) | \(O(1)\) |
| search | \(O(n)\) | \(O(n)\) |
* amortized with doubling. ** with back pointer.
17.3 Comparison Table: Unsequenced (Sorted) Data
| Operation | Sorted Array | Sorted LL | BST | Balanced BST |
|---|---|---|---|---|
| search | \(O(\log n)\) | \(O(n)\) | \(O(h)\) | \(O(\log n)\) |
| insert | \(O(n)\) | \(O(n)\) | \(O(h)\) | \(O(\log n)\) |
| remove | \(O(n)\) | \(O(n)\) | \(O(h)\) | \(O(\log n)\) |
| select(k) | \(O(1)\) | \(O(n)\) | \(O(h)\)* | \(O(\log n)\)* |
* with size augmentation.
17.4 Typical ADT Implementations
| ADT | Common Implementation |
|---|---|
| Stack | Dynamic array or linked list |
| Queue | Linked list (with back pointer) |
| Sequence | Dynamic array or linked list |
| Dictionary / Set | Self-balancing BST or hash table |
17.5 Generic ADTs in C
typedef strategy: Define typedef int ItemType; in a header and use ItemType throughout the ADT.
void pointer strategy: Use void * for items. Requires comparison function pointers and clear memory ownership protocol.
// comparison function convention (like strcmp)
int compare_ints(const void *a, const void *b) {
const int *ia = a;
const int *ib = b;
if (*ia < *ib) return -1;
if (*ia > *ib) return 1;
return 0;
}
C’s standard library provides generic qsort and bsearch:
qsort(arr, len, sizeof(int), compare_ints);
int *result = bsearch(&key, arr, len, sizeof(int), compare_ints);
18 The Dictionary ADT
Dictionary (Map). A collection of unique key-value pairs supporting lookup, insert, and remove by key. Also known as a map, associative array, or symbol table.
// dictionary.h
struct dictionary;
struct dictionary *dict_create(void);
void dict_insert(int key, const char *val, struct dictionary *d);
const char *dict_lookup(int key, struct dictionary *d);
void dict_remove(int key, struct dictionary *d);
void dict_destroy(struct dictionary *d);
Can be implemented with a BST (operations are \(O(h)\)) or a hash table (operations are \(O(1)\) average).
19 Compilation and Beyond
19.1 Compilation Pipeline
- Preprocessing:
#includeinserts header contents;#defineperforms text substitution. - Compiling: Each
.cfile is converted to an object file (.o) containing machine code with unresolved references. - Linking: Object files are combined, addresses are resolved, producing an executable.
gcc -c module.c # compile only -> module.o
gcc main.c module.o -o prog # compile + link
./prog # execute
19.2 Command-Line Arguments
int main(int argc, char *argv[]) {
// argc = number of arguments (including program name)
// argv[0] = program name
// argv[1]...argv[argc-1] = arguments
}
19.3 I/O Redirection
./prog < input.txt # redirect stdin from file
./prog > output.txt # redirect stdout to file
./prog1 | ./prog2 # pipe stdout of prog1 to stdin of prog2