CS 136: Elementary Algorithm Design and Data Abstraction

Estimated reading time: 19 minutes

Table of contents

1 From Racket to C

CS 136 transitions from the functional paradigm (CS 135, Racket) to the imperative paradigm (C), then toward object-oriented (CS 246, C++).

Imperative Programming. A paradigm based on sequences of statements that change program state through side effects (I/O and mutation), as opposed to functional programming where functions have no side effects and values are immutable.

Key differences at a glance:

Concept	Racket	C
Typing	Dynamic	Static
Paradigm	Functional	Imperative
Entry point	Top of file	`main` function
Equality	`equal?`	`==` (primitives only)
Comments	`;;` or `#\| ... \|#`	`//` or `/* ... */`
Application	`(f x y)`	`f(x, y)`

A minimal C program:

#include "cs136.h"

int my_sqr(int n) {
  return n * n;
}

int main(void) {
  assert(my_sqr(7) == 49);
}

Every C program must have exactly one main function. It returns 0 for success.

2 The C Language

2.1 Types

Type	Size	Range / Notes
`int`	4 bytes	\(-2^{31}\) to \(2^{31}-1\)
`char`	1 byte	\(-128\) to \(127\); ASCII characters
`bool`	(integer)	`0` (false) or `1` (true); requires `<stdbool.h>`
`float`	4 bytes	Inexact; 24-bit mantissa, 8-bit exponent
`double`	8 bytes	Better precision than `float`

Integer overflow is undefined behavior in C. The constants INT_MIN and INT_MAX are provided by <limits.h>.

2.2 Operators and Precedence

From highest to lowest precedence:

Precedence	Operators
Negation	`!`
Multiplicative	`* / %`
Additive	`+ -`
Comparison	`< <= >= >`
Equality	`== !=`
Logical AND	`&&`
Logical OR	`\|\|`

Integer division (/) truncates toward zero. The remainder operator (%) has the same sign as the dividend.

The conditional (ternary) operator: q ? a : b produces a if q is true, b otherwise.

2.3 Control Flow

if / else if / else:

if (n < 0) {
  return -n;
} else if (n == 0) {
  return 0;
} else {
  return n;
}

while loop:

while (condition) {
  // body
}

for loop (condensed while):

for (int i = 0; i < n; ++i) {
  // body
}

do … while executes at least once:

do {
  // body
} while (condition);

break exits the innermost loop. continue skips to the next iteration. In a for loop, continue executes the update statement before checking the condition.

2.4 Functions

// purpose statement
// requires: preconditions
// effects: side effects (if any)
// time: O(...)
int my_func(int x, int y) {
  return x + y;
}

Use void for no parameters or no return value. C uses pass by value: arguments are copied into the stack frame.

2.5 Variables and Mutation

int x = 5;       // initialization
x = 10;          // assignment (mutation)
x += 3;          // compound assignment: x = x + 3
++x;             // increment
const int c = 7; // immutable constant

Side Effect. An observable change to program state beyond computing a return value. The two kinds of side effects in this course are I/O (input/output) and mutation (changing stored values). Side effects must be documented with an effects: section.

2.6 I/O

printf("x is %d\n", x);          // output int
printf("char: %c\n", ch);        // output char
printf("string: %s\n", str);     // output string

int n;
scanf("%d", &n);                  // read int (note &)
char c;
scanf(" %c", &c);                // read char, skip whitespace

printf returns the number of characters printed. scanf returns the number of items successfully read, or EOF.

3 Pointers and Memory

Pointer. A variable that stores the memory address of another variable. Declared with , the address-of operator is &, and the dereference (indirection) operator is .

int i = 42;
int *p = &i;    // p points at i
*p = 99;        // changes i to 99

int **pp = &p;  // pointer to pointer

NULL is a special pointer value meaning “points to nothing.” Dereferencing NULL crashes the program.

3.1 Arrow Operator

For pointers to structs, ptr->field is equivalent to (*ptr).field.

struct posn { int x; int y; };
struct posn p = {3, 4};
struct posn *pp = &p;
pp->x = 10;  // equivalent to (*pp).x = 10

3.2 Pointer Arithmetic

If p is a pointer of type T *, then p + i advances the address by i * sizeof(T) bytes.

int a[6] = {4, 8, 15, 16, 23, 42};
// a[i] is equivalent to *(a + i)
// &a[i] is equivalent to (a + i)

3.3 Pass by Reference (Emulated)

C only has pass by value. To modify a caller’s variable, pass its address:

void swap(int *px, int *py) {
  int temp = *px;
  *px = *py;
  *py = temp;
}
// Usage: swap(&a, &b);

3.4 const with Pointers

const int *p;           // cannot modify *p
int * const p = &i;     // cannot change where p points
const int * const p = &i; // neither

It is good style to use const for pointer parameters that should not be modified.

4 Memory Model

C Memory Sections. The five sections of memory (low to high addresses): Code (machine instructions), Read-Only Data (constants, string literals), Global Data (mutable globals), Heap (dynamic allocation, grows upward), and Stack (function frames, grows downward).

4.1 Stack Frames

Each function call creates a stack frame containing:

copies of argument values (pass by value)
local variables
the return address

When the function returns, its frame is popped. Stack overflow occurs when deep or infinite recursion exhausts the stack.

4.2 Heap (Dynamic Memory)

int *a = malloc(n * sizeof(int));   // allocate
a = realloc(a, 2 * n * sizeof(int)); // resize
free(a);                             // deallocate
a = NULL;                            // good practice

Memory Leak. Occurs when dynamically allocated memory is never freed. The address is lost, so the memory can never be reclaimed. Programs that leak memory may degrade over time.

Dangling Pointer. A pointer that refers to memory that has already been freed. Dereferencing a dangling pointer is undefined behavior.

malloc returns uninitialized memory and is \(O(1)\). realloc is \(O(n)\) where \(n\) is the new size. free is \(O(1)\).

Doubling strategy: When a dynamic array is full, double its capacity. This yields amortized \(O(1)\) per insertion because the total cost of all resizes for \(n\) insertions is \(O(n)\).

5 Structures

struct posn {
  int x;
  int y;
};

struct posn p = {3, 4};       // initialization
struct posn q = {.y = 4, .x = 3}; // C99 designated init
p.x = 10;                    // field access
p = q;                       // copies all fields

sizeof(struct) may be larger than the sum of field sizes due to padding/alignment. The == operator does not work on structs; write your own equality function.

6 Modularization

Module. A collection of related functions provided through an interface. Modules enable re-usability, maintainability, and abstraction. A good module has high cohesion (related functions) and low coupling (minimal dependencies on other modules).

6.1 Interface (.h) and Implementation (.c)

// mymod.h (INTERFACE)
// purpose of module
int my_func(int x);

// mymod.c (IMPLEMENTATION)
#include "mymod.h"
int my_func(int x) { return x * 2; }

// main.c (CLIENT)
#include "mymod.h"
int main(void) { assert(my_func(3) == 6); }

6.2 Scope

Local (block) scope: variables inside { }.
Module scope: static global identifiers, visible only in their file.
Program scope: default for globals, visible across files via extern declarations.

6.3 Information Hiding and Opaque Structures

An opaque structure is declared without fields in the interface:

// widget.h
struct widget;
struct widget *widget_create(void);
void widget_destroy(struct widget *w);

The client can only use pointers to the struct and must call provided create/destroy functions. This provides security (prevents tampering) and flexibility (implementation can change freely).

7 Abstract Data Types

Abstract Data Type (ADT). A mathematical model for data storage that defines operations (interface) without specifying the underlying data structure (implementation). The implementation is hidden from the client, providing flexibility and security.

Collection ADT. An ADT designed to store an arbitrary number of items. Common collection ADTs include stacks, queues, sequences, and dictionaries.

8 Arrays

int a[6] = {4, 8, 15, 16, 23, 42};
int b[5] = {0};      // all zeros
int c[] = {1, 2, 3};  // length inferred as 3

Elements are contiguous in memory. Accessing a[i] is \(O(1)\).
C does not track array length or perform bounds checking.
When passed to a function, only the address is copied (not the array).

int sum_array(const int a[], int len) {
  int sum = 0;
  for (int i = 0; i < len; ++i) {
    sum += a[i];
  }
  return sum;
}

8.1 Strings

A C string is a null-terminated ('\0') character array.

char s[] = "cat";           // length 4 (includes '\0')
int len = strlen(s);        // 3 (excludes '\0')
strcmp(s1, s2);             // 0 if equal, <0 if s1 precedes s2
strcpy(dest, src);          // copy src into dest
strcat(dest, src);          // append src to dest

Buffer overrun: writing beyond array bounds. A serious security vulnerability in C.

String literals (e.g., "hello") are stored in the read-only data section and must not be modified.

9 Linked Lists

Linked List. A dynamic data structure where each node contains an item and a pointer to the next node. The last node’s pointer is NULL. Unlike arrays, elements are not contiguous and accessing the \(i\)-th element requires \(O(i)\) traversal.

struct llnode {
  int item;
  struct llnode *next;
};

struct llist {
  struct llnode *front;
};

9.1 Core Operations

// Create an empty list
struct llist *list_create(void) {
  struct llist *lst = malloc(sizeof(struct llist));
  lst->front = NULL;
  return lst;
}

// Insert at front: O(1)
void add_front(int i, struct llist *lst) {
  struct llnode *node = malloc(sizeof(struct llnode));
  node->item = i;
  node->next = lst->front;
  lst->front = node;
}

// Traversal: O(n)
int length(const struct llist *lst) {
  int len = 0;
  struct llnode *node = lst->front;
  while (node) {
    ++len;
    node = node->next;
  }
  return len;
}

// Destroy: O(n)
void list_destroy(struct llist *lst) {
  struct llnode *cur = lst->front;
  while (cur) {
    struct llnode *next = cur->next;
    free(cur);
    cur = next;
  }
  free(lst);
}

9.2 Linked List vs. Array

Operation	Dynamic Array	Linked List
Access by index	\(O(1)\)	\(O(n)\)
Insert at front	\(O(n)\)	\(O(1)\)
Insert at back	\(O(1)\) amortized	\(O(n)\) (\(O(1)\) with back pointer)
Search	\(O(n)\)	\(O(n)\)
Remove front	\(O(n)\)	\(O(1)\)

10 Stacks

Stack ADT. A LIFO (last in, first out) collection. Items are pushed onto and popped from the top. Only the top item is accessible. Used in call stacks, undo histories, and expression evaluation.

Operations: push, pop, top, is_empty – all \(O(1)\).

Array-based implementation (with doubling):

struct stack {
  int len;
  int maxlen;
  int *data;
};

void stack_push(int item, struct stack *s) {
  if (s->len == s->maxlen) {
    s->maxlen *= 2;
    s->data = realloc(s->data, s->maxlen * sizeof(int));
  }
  s->data[s->len] = item;
  s->len += 1;
}

int stack_pop(struct stack *s) {
  assert(s->len > 0);
  s->len -= 1;
  return s->data[s->len];
}

11 Queues

Queue ADT. A FIFO (first in, first out) collection. Items are added at the back and removed from the front, like a lineup.

Operations: add_back, remove_front, front, is_empty – all \(O(1)\) with a linked list and back pointer.

struct llnode {
  int item;
  struct llnode *next;
};

struct queue {
  struct llnode *front;
  struct llnode *back;
};

void queue_add_back(int i, struct queue *q) {
  struct llnode *node = malloc(sizeof(struct llnode));
  node->item = i;
  node->next = NULL;
  if (q->front == NULL) {
    q->front = node;
  } else {
    q->back->next = node;
  }
  q->back = node;
}

int queue_remove_front(struct queue *q) {
  assert(q->front);
  int val = q->front->item;
  struct llnode *old = q->front;
  q->front = q->front->next;
  free(old);
  if (q->front == NULL) q->back = NULL;
  return val;
}

A doubly linked list with back pointer supports add/remove from both ends in \(O(1)\), implementing a deque (double-ended queue).

12 Trees

Binary Search Tree (BST). A binary tree where for every node with item \(i\), all items in the left subtree are less than \(i\) and all items in the right subtree are greater than \(i\).

struct bstnode {
  int item;
  struct bstnode *left;
  struct bstnode *right;
};

struct bst {
  struct bstnode *root;
};

12.1 BST Operations

Search – follow left/right links: \(O(h)\).

Insert (recursive):

struct bstnode *insert_node(int i, struct bstnode *node) {
  if (node == NULL) return new_leaf(i);
  if (i < node->item)
    node->left = insert_node(i, node->left);
  else if (i > node->item)
    node->right = insert_node(i, node->right);
  return node;
}

Remove: Three cases:

Leaf: simply free.
One child: promote the child.
Two children: replace with the next-largest key (smallest in right subtree), then remove that node.

12.2 Tree Traversals

In-order (left, root, right): produces sorted order for a BST.
Pre-order (root, left, right): useful for copying a tree.
Post-order (left, right, root): useful for freeing a tree.

12.3 Balance and Efficiency

Balanced Tree. A tree where the height \(h\) is \(O(\log n)\). Standard BST operations (search, insert, remove) are \(O(h)\), so they are \(O(\log n)\) for balanced trees but \(O(n)\) for degenerate (unbalanced) trees.

A self-balancing BST (e.g., AVL tree, red-black tree) rearranges nodes to maintain balance. Covered in CS 240.

Node augmentation: store extra info per node (e.g., subtree size for \(O(h)\) select).

Array-based tree: root at a[0], left child of a[i] at a[2*i+1], right at a[2*i+2].

13 Efficiency

Big O Notation. \(f(n) \in O(g(n))\) if there exist constants \(c > 0\) and \(n_0 > 0\) such that for all \(n \ge n_0\), \(f(n) \le c \cdot g(n)\). It describes an upper bound on the growth rate of a function.

Common orders (smallest to largest):

\[O(1) \subset O(\log n) \subset O(n) \subset O(n \log n) \subset O(n^2) \subset O(n^3) \subset O(2^n)\]

Big O arithmetic:

\(O(f) + O(g) = O(\max(f, g))\)
\(O(f) \times O(g) = O(f \cdot g)\)

Notation	Meaning
\(O(g(n))\)	Upper bound: \(f(n) \le c \cdot g(n)\)
\(\Omega(g(n))\)	Lower bound: \(f(n) \ge c \cdot g(n)\)
\(\Theta(g(n))\)	Tight bound: \(c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n)\)

Unless specified, running time refers to the worst case.

13.2 Recurrence Relations

Recurrence	Closed Form
\(T(n) = O(1) + T(n - k)\)	\(O(n)\)
\(T(n) = O(n) + T(n - k)\)	\(O(n^2)\)
\(T(n) = O(n^2) + T(n - k)\)	\(O(n^3)\)
\(T(n) = O(1) + T(n/k)\)	\(O(\log n)\)
\(T(n) = O(1) + k \cdot T(n/k)\)	\(O(n)\)
\(T(n) = O(n) + k \cdot T(n/k)\)	\(O(n \log n)\)
\(T(n) = O(1) + T(n-k_1) + T(n-k_2)\)	\(O(2^n)\)

where \(k, k_1, k_2 \ge 1\) and \(k > 1\) for division.

13.3 Analysis Techniques

Iterative: work from innermost loop outward; count iterations and cost per iteration.

Recursive: identify non-recursive cost and recursive call sizes, form recurrence relation, look up closed form.

13.4 Space Complexity

The amount of additional memory required. Tail recursion (recursive call is the last operation) allows stack frame reuse, achieving \(O(1)\) space.

14 Sorting

Sorting Complexity Summary.

Algorithm	Best Case	Worst Case	Space
Insertion sort	\(O(n)\)	\(O(n^2)\)	\(O(1)\)
Selection sort	\(O(n^2)\)	\(O(n^2)\)	\(O(1)\)
Merge sort	\(O(n \log n)\)	\(O(n \log n)\)	\(O(n)\)
Quick sort	\(O(n \log n)\)	\(O(n^2)\)	\(O(\log n)\)

14.1 Selection Sort

Find the minimum in the unsorted portion, swap it to the front. Repeat.

void selection_sort(int a[], int len) {
  for (int i = 0; i < len - 1; ++i) {
    int pos = i;
    for (int j = i + 1; j < len; ++j) {
      if (a[j] < a[pos]) pos = j;
    }
    swap(&a[i], &a[pos]);
  }
}

14.2 Insertion Sort

Build a sorted prefix by inserting each element into its correct position.

\(T(n) = O(n) + T(n-1) = O(n^2)\) worst case; \(O(n)\) if nearly sorted.

14.3 Merge Sort

Divide the array in half, sort each half recursively, merge the sorted halves.

void merge_sort(int a[], int len) {
  if (len <= 1) return;
  int llen = len / 2;
  int rlen = len - llen;
  int *left = malloc(llen * sizeof(int));
  int *right = malloc(rlen * sizeof(int));
  for (int i = 0; i < llen; ++i) left[i] = a[i];
  for (int i = 0; i < rlen; ++i) right[i] = a[i + llen];
  merge_sort(left, llen);
  merge_sort(right, rlen);
  merge(a, left, llen, right, rlen);
  free(left);
  free(right);
}

Merge Sort Complexity. Merge sort runs in \(O(n \log n)\) time in all cases, since \(T(n) = O(n) + 2T(n/2) = O(n \log n)\).

14.4 Quick Sort

Select a pivot, partition elements into less-than and greater-than groups, recurse.

void quick_sort_range(int a[], int first, int last) {
  if (last <= first) return;
  int pivot = a[first];
  int pos = last;
  for (int i = last; i > first; --i) {
    if (a[i] > pivot) {
      swap(&a[pos], &a[i]);
      --pos;
    }
  }
  swap(&a[first], &a[pos]);
  quick_sort_range(a, first, pos - 1);
  quick_sort_range(a, pos + 1, last);
}

Best case (balanced partitions): \(T(n) = O(n) + 2T(n/2) = O(n \log n)\). Worst case (sorted input, first-element pivot): \(T(n) = O(n) + T(n-1) = O(n^2)\).

14.5 Binary Search

On a sorted array, repeatedly halve the search range:

int binary_search(int item, const int a[], int len) {
  int low = 0, high = len - 1;
  while (low <= high) {
    int mid = (low + high) / 2;
    if (a[mid] == item) return mid;
    else if (a[mid] < item) low = mid + 1;
    else high = mid - 1;
  }
  return -1;
}

Binary Search Complexity. Binary search on a sorted array of \(n\) elements runs in \(O(\log n)\) time.

15 Hash Tables

Hash Table. A data structure that maps keys to values using a hash function to compute an index into an array of buckets. With a good hash function, average-case lookup, insert, and delete are \(O(1)\).

A hash function \(h(k)\) maps a key \(k\) to an index in range \([0, m-1]\) where \(m\) is the table size.

Load factor: \(\alpha = n/m\), where \(n\) is the number of stored items.

15.1 Collision Resolution

Chaining (separate chaining): Each bucket is a linked list. Insert appends to the list. Average search time is \(O(1 + \alpha)\).

Open addressing (probing): On collision, probe subsequent slots:

Linear probing: check \(h(k)+1, h(k)+2, \ldots\) (suffers from clustering).
Quadratic probing: check \(h(k)+1^2, h(k)+2^2, \ldots\)
Double hashing: use a second hash function for the step size.

When the load factor exceeds a threshold (e.g., 0.75), resize the table and rehash all entries.

Hash Table Average Complexity. With a good hash function and bounded load factor, hash table operations (insert, lookup, remove) run in \(O(1)\) expected (average) time. Worst case is \(O(n)\) when all keys collide.

16 Graphs

Graph. A data structure consisting of nodes (vertices) connected by edges. Graphs may be directed or undirected, weighted or unweighted, cyclic or acyclic. Linked lists and trees are special cases of graphs.

16.1 Representations

Adjacency matrix: A 2D array where entry \((i,j)\) indicates an edge from vertex \(i\) to vertex \(j\). Space: \(O(V^2)\). Edge lookup: \(O(1)\).

Adjacency list: An array of linked lists, where list \(i\) stores the neighbors of vertex \(i\). Space: \(O(V + E)\). Better for sparse graphs.

16.2 BFS and DFS

Breadth-First Search (BFS): Uses a queue. Explores all neighbors at the current depth before moving deeper. Finds shortest paths in unweighted graphs. Time: \(O(V + E)\).

Depth-First Search (DFS): Uses a stack (or recursion). Explores as far as possible along each branch before backtracking. Time: \(O(V + E)\).

17 ADT Design and Data Structure Selection

17.1 Sequenced vs. Unsequenced Data

Sequenced data (order matters): use arrays or linked lists.
Unsequenced data (rearrangeable): use BSTs or hash tables.

17.2 Comparison Table: Sequenced Data

Operation	Dynamic Array	Linked List
item_at(i)	\(O(1)\)	\(O(n)\)
insert_front	\(O(n)\)	\(O(1)\)
insert_back	\(O(1)\)*	\(O(1)\)**
remove_front	\(O(n)\)	\(O(1)\)
search	\(O(n)\)	\(O(n)\)

* amortized with doubling. ** with back pointer.

17.3 Comparison Table: Unsequenced (Sorted) Data

Operation	Sorted Array	Sorted LL	BST	Balanced BST
search	\(O(\log n)\)	\(O(n)\)	\(O(h)\)	\(O(\log n)\)
insert	\(O(n)\)	\(O(n)\)	\(O(h)\)	\(O(\log n)\)
remove	\(O(n)\)	\(O(n)\)	\(O(h)\)	\(O(\log n)\)
select(k)	\(O(1)\)	\(O(n)\)	\(O(h)\)*	\(O(\log n)\)*

* with size augmentation.

17.4 Typical ADT Implementations

ADT	Common Implementation
Stack	Dynamic array or linked list
Queue	Linked list (with back pointer)
Sequence	Dynamic array or linked list
Dictionary / Set	Self-balancing BST or hash table

17.5 Generic ADTs in C

typedef strategy: Define typedef int ItemType; in a header and use ItemType throughout the ADT.

void pointer strategy: Use void * for items. Requires comparison function pointers and clear memory ownership protocol.

// comparison function convention (like strcmp)
int compare_ints(const void *a, const void *b) {
  const int *ia = a;
  const int *ib = b;
  if (*ia < *ib) return -1;
  if (*ia > *ib) return 1;
  return 0;
}

C’s standard library provides generic qsort and bsearch:

qsort(arr, len, sizeof(int), compare_ints);
int *result = bsearch(&key, arr, len, sizeof(int), compare_ints);

18 The Dictionary ADT

Dictionary (Map). A collection of unique key-value pairs supporting lookup, insert, and remove by key. Also known as a map, associative array, or symbol table.

// dictionary.h
struct dictionary;
struct dictionary *dict_create(void);
void dict_insert(int key, const char *val, struct dictionary *d);
const char *dict_lookup(int key, struct dictionary *d);
void dict_remove(int key, struct dictionary *d);
void dict_destroy(struct dictionary *d);

Can be implemented with a BST (operations are \(O(h)\)) or a hash table (operations are \(O(1)\) average).

19 Compilation and Beyond

19.1 Compilation Pipeline

Preprocessing: #include inserts header contents; #define performs text substitution.
Compiling: Each .c file is converted to an object file (.o) containing machine code with unresolved references.
Linking: Object files are combined, addresses are resolved, producing an executable.

gcc -c module.c           # compile only -> module.o
gcc main.c module.o -o prog  # compile + link
./prog                       # execute

19.2 Command-Line Arguments

int main(int argc, char *argv[]) {
  // argc = number of arguments (including program name)
  // argv[0] = program name
  // argv[1]...argv[argc-1] = arguments
}

19.3 I/O Redirection

./prog < input.txt           # redirect stdin from file
./prog > output.txt          # redirect stdout to file
./prog1 | ./prog2            # pipe stdout of prog1 to stdin of prog2