DAC 202: Multi-modal Communication Design
Terry O'Neill
Estimated study time: 39 minutes
Sources and References
Primary textbook — Steven Ascher & Edward Pincus, The Filmmaker’s Handbook: A Comprehensive Guide for the Digital Age, 5th ed. (Plume). Supplementary texts — Walter Murch, In the Blink of an Eye; Bruce Block, The Visual Story; Blain Brown, Cinematography: Theory and Practice; David Bordwell & Kristin Thompson, Film Art. Online resources — Adobe Premiere Pro and DaVinci Resolve official training; Vimeo Video School; CalArts open materials.
Chapter 1 — Time-Based Media as Social Communication
A photograph stops time; a video unfolds inside it. That single difference rewires everything about how a message lands. When an image persists in front of a viewer for three seconds rather than three hundred milliseconds, the maker takes responsibility not only for what is shown but for when it is shown, in what order, against what sound, and with what rhythm. Bordwell and Thompson call this the formal system of cinema: a patterned organization of elements across time that produces meaning the viewer cannot get from any single frame. Lev Manovich, writing about digital media, extends the idea. In The Language of New Media he argues that moving images on networked platforms are no longer finished artifacts broadcast one-to-many; they are objects that audiences crop, remix, caption, and re-share, so that the original maker’s framing is only the first of many framings. A twenty-first-century video designer works inside that remix economy from the first shot.
Multi-modal communication means that meaning is carried simultaneously by more than one channel — image, speech, ambient sound, music, on-screen text, motion, color, duration — and that these channels can agree, contradict, or ironize each other. Gunther Kress and Theo van Leeuwen, in the social-semiotics tradition, treat each channel as a mode with its own grammar, and treat video as the orchestration of modes in time. A news anchor’s calm voice over shaky handheld combat footage is not the same message as calm voice over calm studio b-roll; the modes do not add, they interact. Design, then, is the deliberate management of those interactions.
Social context is never an afterthought. A ten-second vertical clip scrolled past on a phone at seven in the morning sits inside a different communicative frame than the same footage cut as a two-minute horizontal mini-doc on a laptop. Platform, aspect ratio, sound-on defaults, caption habits, and watch-duration norms belong to the design brief before a single shot is recorded. Treating the moving image as a social artifact also raises questions of authorship and ethics — whose voices, who holds the camera, whose likeness, who holds the rights — that sit inside the craft and return in every chapter that follows.
Finally, time itself is a material. Walter Murch, in In the Blink of an Eye, writes that film is made out of decisions about when to cut, and that those decisions mirror the blinks and micro-attention shifts of the human nervous system. A viewer’s attention is a budget the maker spends and replenishes, shot by shot.
Chapter 2 — Video Pre-Production: Ideation, Treatments, Storyboards
Pre-production is where problems are cheap to fix. A script rewrite costs a coffee and an afternoon; a reshoot costs a crew, a location, and the goodwill of everyone involved. Ascher and Pincus open The Filmmaker’s Handbook with the reminder that every dollar or hour saved in production was usually bought by an hour spent in pre-production, and the rule scales down to a one-person social video team just as cleanly as it scales up to a feature crew.
Ideation begins with a single, tight sentence: what is this video trying to make the viewer feel, know, or do? If the answer takes a paragraph, the project is still unresolved. From that sentence emerges the logline — a one- or two-sentence summary that names the subject, the situation, and the turn that makes the story worth watching. For a non-fiction social video the logline might be “A first-generation student walks us through the morning routine that keeps her grounded during finals.” For a brand piece it might be “A local bakery’s five a.m. loaf is an act of stubborn craft in a delivery-app world.” Either way, the logline is the contract everyone on the project signs, silently, with everyone else.
The treatment expands the logline into prose. A page or two, written in present tense, describes what the audience will see and hear from the first frame to the last. Treatments do not look like scripts; they read like short stories told through a camera. The point is to let collaborators, subjects, and clients understand the emotional arc before a storyboard commits to any particular shot. Ascher and Pincus recommend writing the treatment early enough that it can still be thrown out, because any treatment that cannot be thrown out has stopped serving the project.
The shot list and storyboard translate intent into logistics. A shot list is a numbered table: shot number, location, frame size, lens, camera movement, audio source, notes. A storyboard is a sequence of rough drawings — stick figures are fine — that show the frame the camera will actually record. Storyboards are not a demonstration of drawing skill; they are a thinking tool. Drawing forces the maker to answer questions the prose elided: where is the light, where is the eye-line, what is in the foreground, does this cut match on action or on concept. Bruce Block’s The Visual Story adds a second layer: sketch the graphic structure of each frame — line, shape, contrast, space — so that the storyboard reveals the visual rhythm of the cut, not just the staging.
For short-form social video the pre-production deliverables shrink but never disappear. A TikTok creator still benefits from a three-line treatment and a six-panel board scribbled on a phone notes app. The question “what is the hook in the first second” belongs to pre-production, not to editing. A hook sheet — three candidate openings written before shooting — is arguably the single most valuable pre-production artifact for platform-native video, because it forces the maker to design for the scroll.
Location scouting, permissions, release forms, and risk assessment round out the pre-production kit. Good pre-production is ninety percent boring paperwork that buys ten percent creative freedom on the day.
Chapter 3 — The Camera: Exposure, Focal Length, Framing
A camera is a box that trades three variables for a single image: aperture, shutter, and sensitivity. Aperture, measured in f-stops, controls how wide the iris opens; wide apertures (small f-numbers like f/2) let in more light and throw the background out of focus, while narrow apertures (large f-numbers like f/11) let in less light and keep more of the scene sharp. Shutter speed controls how long each frame is exposed. For video, the traditional rule is to set the shutter to roughly twice the frame rate — 1/50 second at 25 fps, 1/60 at 30 fps — which yields the familiar cinematic motion blur. Going faster produces a crisp, strobing look associated with news and sports; going slower produces dreamy smear. Sensitivity, or ISO, is the sensor’s gain; raising it brightens the image at the cost of noise. Blain Brown’s Cinematography: Theory and Practice describes this trio as the exposure triangle, and the cinematographer’s craft as knowing which corner to sacrifice for the shot in front of them.
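The shutter rule reduces to simple arithmetic. A minimal sketch in Python (the function name is ours, purely illustrative):

```python
# 180-degree shutter rule: expose each frame for half its duration,
# so shutter speed is roughly 1 / (2 * frame rate).
def shutter_for(fps: float) -> str:
    return f"1/{round(2 * fps)} s"

for fps in (24, 25, 30, 60):
    print(f"{fps} fps -> {shutter_for(fps)}")
# 24 fps -> 1/48 s (cameras often offer 1/50 as the nearest setting)
# 25 fps -> 1/50 s, 30 fps -> 1/60 s, 60 fps -> 1/120 s
```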
Modern cameras add two digital controls that shape the recorded image as decisively as exposure does: white balance and picture profile. White balance tells the camera what “white” looks like under the current light — tungsten bulbs look orange compared to daylight, fluorescents skew green — and setting it wrongly tints the entire frame. Picture profiles (Rec.709, S-Log3, V-Log, Cine D) choose how the sensor’s raw data is mapped into the recorded file; flat log profiles preserve more highlight and shadow information at the cost of requiring color grading later, while standard profiles look ready-to-post straight out of the camera. For a student one-person crew shooting for social platforms, a standard Rec.709 profile with a neutral look often beats a log profile that will get crushed by social-platform compression anyway.
Focal length is the other axis. A short focal length (a wide lens, say 24 mm on a full-frame camera) takes in a large field of view, exaggerates depth, stretches faces near the edge of the frame, and makes the viewer feel inside the space. A long focal length (a telephoto, say 85 mm or 135 mm) narrows the field of view, compresses distance so near and far feel stacked, and flatters faces. The choice is not decorative; it builds the psychological geography of the shot. An interview on an 85 mm lens feels intimate and contained; the same interview on a 24 mm feels confrontational and exposed.
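Field of view follows from focal length and sensor width by a standard formula for rectilinear lenses. A sketch, assuming a full-frame sensor 36 mm wide:

```python
import math

# Horizontal angle of view: fov = 2 * atan(sensor width / (2 * focal length)).
def fov_degrees(focal_mm: float, sensor_mm: float = 36.0) -> float:
    return math.degrees(2 * math.atan(sensor_mm / (2 * focal_mm)))

print(round(fov_degrees(24), 1))   # ~73.7 degrees: wide, immersive
print(round(fov_degrees(85), 1))   # ~23.9 degrees: tight, compressed
```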
Framing is a language, and every language has common words. The extreme wide shot places a subject in an environment and tells the viewer “here is where we are.” The wide shot shows the whole body with some context. The medium shot cuts at the waist and is the workhorse of dialogue. The medium close-up cuts at the chest and is the workhorse of the talking head. The close-up isolates a face or an object; the extreme close-up isolates an eye, a mouth, a detail — and by detaching it from context, it asks the viewer to read meaning into it. The rule for learners is that a scene usually wants a variety of sizes so an editor has material to build rhythm, and that the same size held for too long fatigues the eye.
Learning to expose by eye — watching highlights for clipping, watching shadows for crushed blacks, reading a histogram or false-color display — matters more than learning any single camera’s menu system. A well-exposed, well-framed shot on a mid-range phone routinely beats a sloppy shot on an expensive camera.
Chapter 4 — Composition for the Moving Image
Composition is the arrangement of visual elements inside the frame. For the moving image it is also the arrangement of those elements across frames, because the eye remembers where things were and tracks where they are going. Bruce Block organizes the problem around seven visual components — space, line, shape, tone, color, movement, and rhythm — and argues that a story’s visual intensity should follow its narrative intensity. A calm scene uses soft lines, balanced shapes, reduced tonal contrast, and narrow color palettes; a climactic scene uses sharp diagonals, unstable shapes, wide contrast, and saturated opposition. Design the components deliberately and the image supports the story even when the audience cannot name what they are seeing.
The classical guides still apply. The rule of thirds divides the frame into a three-by-three grid and suggests placing key subjects on the intersections rather than dead center, because the eye prefers asymmetry. Leading lines — a road, a corridor, a row of windows — draw the eye toward the subject and carry meaning about direction and momentum. Negative space isolates a subject and gives a shot room to breathe. Frame within a frame — a doorway, a window, a rearview mirror — nests the subject and adds layers of looking. These are heuristics, not laws; centered framing can be powerful (Wes Anderson has built a career on it) and cluttered frames can be intentional.
The 180-degree rule, or line of action, is the one compositional law that crosses into editing. Draw an imaginary line between two subjects in conversation; keep the camera on one side of it; and their eye-lines will remain consistent from cut to cut. Cross the line and the viewer suddenly thinks the subjects have swapped places. The rule can be broken, but breaking it requires a motivated camera move that carries the viewer across, otherwise the geography of the scene collapses.
Headroom, lead room, and eye-line complete the practical vocabulary. Headroom is the space between the top of the subject’s head and the top of the frame; too much feels weightless, too little feels claustrophobic. Lead room is the space in front of a subject looking or moving in a direction; leaving that room honors the direction of attention. Eye-line is where the subject is looking; for a direct-address piece the eye-line should be close to the lens so the viewer feels addressed, while for an interview the eye-line goes to an interviewer just off-camera so the viewer feels invited to listen in.
Vertical framing is its own compositional regime. A 9:16 frame privileges the human body over the landscape, removes most horizontal lead room, and pushes the maker toward close work, centered subjects, and stacked graphics. Designing for vertical means designing against the habits of a century of widescreen cinema; Vimeo’s Video School and Frame.io’s blog have both argued that the best vertical work accepts the format rather than cropping in from horizontal masters.
Chapter 5 — Lighting for Video
Light is the material cinematography sculpts. Without controlled light, the camera records whatever the room happens to give it, and most rooms give it something ugly. The craft is to shape light intentionally, whether by adding fixtures, modifying existing sources, or scheduling around the sun.
The starting vocabulary is three-point lighting. A key light is the main source, placed at roughly forty-five degrees to the subject, and it establishes the dominant direction and quality of light. A fill light sits opposite the key, softer and dimmer, and reduces the shadow depth on the shadow side of the face. A backlight (or rim light) sits behind the subject and edges them with a thin line of light that separates them from the background. Varying the key-to-fill ratio controls mood: a 2:1 ratio looks gentle and documentary; an 8:1 ratio looks cinematic and dramatic. Backlight is what stops subjects from disappearing into dark backgrounds.
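Key-to-fill ratios translate directly into photographic stops, since each stop is a doubling of light. A quick illustration (function name ours):

```python
import math

# Key-to-fill ratio in stops: each stop doubles the light,
# so stops = log2(ratio).
def ratio_to_stops(key_to_fill: float) -> float:
    return math.log2(key_to_fill)

print(ratio_to_stops(2))  # 1.0 stop  -> gentle, documentary look
print(ratio_to_stops(8))  # 3.0 stops -> dramatic, cinematic look
```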
Quality matters as much as quantity. A hard source — bare bulb, direct sun, small LED panel held close — throws sharp-edged shadows and reveals skin texture. A soft source — sun through clouds, large softbox, bounce off a white wall — wraps around the face and flatters. Softness is a function of the source’s apparent size from the subject’s point of view: a small source held far away is hard; a large source held close is soft. Bouncing a light off a wall, diffusing it through a shower curtain, or draping it with a silk frame are all ways of trading intensity for softness.
Color temperature is the second control. Tungsten bulbs are warm, around 3200 K; daylight is cooler, around 5600 K; fluorescents and older LEDs sit at mixed, often greenish temperatures. Mixing sources of different color temperatures in the same shot produces muddled skin tones unless the mix is intentional. The working rule is to match the camera’s white balance to the dominant source and either gel the other sources to match or embrace the contrast as a creative choice.
Natural light is free, abundant, and uncontrollable. The golden hour, the hour after sunrise and before sunset, gives warm, low, directional light that flatters most subjects. Blue hour, just before sunrise and after sunset, gives cool, soft ambient light at levels low enough to demand wide apertures or raised gain. Midday sun is harsh and top-lit and usually wants diffusion or shade. Overcast days are gigantic softboxes and are the unsung friend of the one-person documentary crew.
For social-video production the practical kit is modest: a single LED panel, a bounce card, a roll of neutral-density gel for windows, and a lamp borrowed from the room. The goal is consistency from shot to shot so an editor can cut between them. Shooting a reference frame with a gray card at the start of every setup saves hours in post.
Chapter 6 — Capturing Sound: Dialogue, Ambient, Foley
Amateur video is usually recognizable first by its audio. Viewers will forgive a slightly soft image long before they forgive a scratchy, echoey voice, because bad sound actively hurts to listen to. The rule that Ascher and Pincus repeat — sound is half the picture — is not a slogan; it is an accurate accounting of cognitive load.
Three microphone types cover most production needs. A shotgun microphone is a directional condenser on a boom pole; it captures what it is pointed at and rejects most of what it is not. Shotguns are the default for on-location dialogue because they can sit out of frame while still being close to the speaker. A lavalier, or lav, is a small omnidirectional mic clipped to the subject’s clothing. Lavs give intimate, consistent level regardless of head movement and are the default for talking heads and run-and-gun interviews. A cardioid handheld is the stage and interview microphone; it favors the source in front of it and is less fussy about placement.
Proximity is the single biggest determinant of sound quality. Doubling the distance between microphone and source drops the direct sound by about 6 dB relative to the room noise, which does not move. A shotgun eighteen inches above the subject will always beat a shotgun six feet above the subject, all else equal. For lavaliers, clipping at mid-chest — around a hand’s width below the chin — balances clarity and clothing noise.
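The distance claim is the inverse-square law in disguise. A small sketch of the arithmetic (illustrative, not a measurement tool):

```python
import math

# Direct sound pressure falls off with distance; the level change is
# 20 * log10(d2 / d1) dB. Doubling distance costs about 6 dB.
def level_change_db(d1: float, d2: float) -> float:
    return -20 * math.log10(d2 / d1)

print(round(level_change_db(18, 36), 1))   # -6.0 dB: distance doubled
print(round(level_change_db(18, 72), 1))   # -12.0 dB: 18 in vs 6 ft overhead
```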
Monitoring is non-negotiable. A set of closed-back headphones plugged into the recorder lets the operator hear what the camera is actually recording: the hum from an air conditioner, the clothing rustle, the neighbor’s lawnmower, the moment the battery dies. Video makers who shoot without headphones are gambling, and the house usually wins. Levels should peak around −12 to −6 dBFS on the meter, leaving headroom for surprise shouts.
Beyond dialogue, a video needs room tone and ambient tracks. Room tone is thirty seconds of silence recorded in the actual shooting location with everyone still, used by the editor to patch over edits in dialogue so the audio bed does not jump. Ambient is a longer wild recording of the environment — a café murmur, a street, a park — that gives the scene a sonic place. Skipping these feels efficient on set and ruins the edit.
Foley and sound effects are the third layer. Foley is the craft of re-recording performance sound — footsteps, cloth movement, prop handling — in post, often because the production sound was busy with dialogue. Ric Viers, in The Sound Effects Bible, organizes the craft around libraries, recordists, and the design discipline of matching effect to action. For a social-video student the practical takeaway is smaller: a phone recording of a door closing, matched carefully to the picture, can rescue a scene where the production door sounded like a cardboard box. Treat every off-camera sound as a design decision, not an accident.
Chapter 7 — Sound Design and Music
Sound design is the organization of the audio channels so they carry meaning in parallel with the picture. Where the cinematographer composes the frame, the sound designer composes the soundscape, and the good ones treat silence as a color in the palette. A common working structure divides the audio mix into four layers: dialogue, ambience, sound effects, and music. Each layer has its own submix and its own logic, and the final stereo bounce is a balance across the four.
Dialogue sits at the top of the hierarchy because it carries information the viewer cannot afford to miss. In practice this means the dialogue bus is gently compressed (a 3:1 ratio, a slow attack, a medium release) so loud and soft phrases sit within a few dB of each other; equalized to remove mud around 200–400 Hz and to add presence around 3–5 kHz; and automated so music ducks under it during lines and rises in the gaps. For broadcast and most streaming platforms a final loudness between −23 and −16 LUFS integrated is a reasonable target, with broadcast specifications near −23 and streaming platforms nearer −16; platform specifications differ, but the principle is stable: dialogue should be comfortably above the noise floor without ever fighting the music.
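Once a loudness meter has reported the integrated value (Premiere and Resolve both include one), hitting a target is plain subtraction. A sketch of that offset math (function name ours; the measurement itself is assumed):

```python
# Gain needed to move a measured mix to a loudness target:
# gain in dB = target LUFS - measured LUFS.
def normalization_gain_db(measured_lufs: float, target_lufs: float) -> float:
    return target_lufs - measured_lufs

print(normalization_gain_db(-19.5, -16.0))  # +3.5 dB: raise the mix
print(normalization_gain_db(-12.0, -23.0))  # -11.0 dB: broadcast delivery
```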
Ambience glues cuts together. When two shots from a café scene are joined, a continuous ambience track underneath hides the seam. When the scene changes location, the ambience changes, signaling the move to the viewer even before the image confirms it. A designer who treats ambience as a character gets a richer piece than one who treats it as filler.
Sound effects are the punctuation marks of the mix. A door slam on the downbeat of a cut hits harder than the same slam two frames late. The design question is always what does the viewer need to feel here, and whether that feeling is achieved best by a literal effect, an exaggerated one, or a substituted one. A punch in a fight scene is almost never a fist hitting a face; it is usually a combination of a celery stalk snapping, a thud on a leather couch, and a low-end whoomp. Honesty in sound design is not the same thing as realism.
Music is the most dangerous layer because it is the quickest to feel wrong. A well-chosen cue underlines what the picture is already doing; a poorly chosen cue argues with it. The working rule is to let the picture suggest the feeling first and to add music only when it raises the stakes the picture has already set. For social video the practical question is licensing: platform-safe music libraries (Epidemic Sound, Artlist, Musicbed, and the free tiers inside YouTube’s audio library) supply cleared tracks that will not trigger copyright claims. Using a pop song without a license is not a neutral risk; it is a choice to let the platform mute, demonetize, or take down the video.
Silence deserves its own paragraph. A moment with no music and minimal ambience demands attention and signals importance. Used often, silence becomes boring; used once in a three-minute piece, it becomes a room the viewer leans into.
Chapter 8 — Editing Theory: Continuity, Montage, Rhythm
Editing is the invisible craft. A good cut passes under the viewer’s conscious radar while doing three things at once: advancing the story, preserving the geography of the scene, and controlling the rhythm of attention. Walter Murch’s famous rule of six ranks the criteria a cut should satisfy, in descending order of importance: emotion (51 percent), story (23 percent), rhythm (10 percent), eye-trace (7 percent), two-dimensional plane of the screen (5 percent), three-dimensional space of action (4 percent). The numbers are approximate but the ordering is instructive: a cut that preserves spatial continuity at the cost of emotion is the wrong cut.
Continuity editing is the classical Hollywood system. Cuts on action — a subject begins to sit in shot A and completes the sitting in shot B — hide the join under the motion. Match cuts line up shape, color, or motion across cuts to create a sense of connection. The eye-line match uses a subject’s gaze in one shot and the thing they are looking at in the next to build spatial logic without an establishing wide. The shot / reverse-shot pattern carries dialogue by alternating angles on the two speakers. Continuity editing works because it respects the viewer’s instinct to build a coherent mental model of the scene.
Montage, by contrast, foregrounds the cut. Sergei Eisenstein argued that meaning in cinema is produced by the collision between shots; two images in sequence generate a third idea that neither image holds alone. A shot of a soldier and a shot of a baby carriage rolling down stairs become, together, a scene about the collapse of civilian life. Montage is not merely fast cutting; it is the use of juxtaposition as an argument. Short-form social video lives in this tradition more than the continuity tradition: a TikTok essay about a city cuts between disparate images whose logic is thematic, not geographic.
J-cuts and L-cuts are the workhorses of dialogue editing. In a J-cut the audio of the next shot arrives before the picture; in an L-cut the audio of the previous shot continues after the picture has changed. Used gently, they hide the seams of a conversation and give the viewer’s ear the lead. A cut that changes picture and sound together is a hard cut; cutting picture and sound independently is the difference between a teenager’s first edit and a working professional’s.
Rhythm is where editing becomes music. Each shot has a natural length — too short and the viewer has not yet understood what they are looking at, too long and the viewer has understood and grown bored. The skill is feeling that length in relation to the surrounding shots. Murch describes editing as closer to dance than to writing: the editor moves with the material, not against it. A practical discipline is to view a cut with the picture muted, then with the sound muted, to test whether the rhythm of each channel holds on its own. If the picture rhythm collapses without sound, the sound is doing the editor’s job.
Chapter 9 — The NLE Workflow: Premiere Pro and DaVinci Resolve
A non-linear editor, or NLE, is the software that turns the pile of footage into a finished piece. Adobe Premiere Pro and Blackmagic DaVinci Resolve are the two most common choices for social-video work; Final Cut Pro is a third for Mac-only teams. The tools differ in their user interfaces and in which parts of the pipeline they do best, but the conceptual workflow is shared, and learning it once translates across NLEs with manageable friction.
The first step is ingest and organize. Every shoot produces a nested mess of camera cards, audio recorder cards, and reference files. The editor copies those files into a project folder structured by type — footage, audio, graphics, music, stills, exports — and backs the folder up to at least one other drive. Inside the NLE, clips are imported into bins, which are folders in the project. Bin structure matters: an editor who can find a specific clip in three seconds edits three times faster than one who cannot. Renaming clips with meaningful labels (INT_CAFE_MS_SARAH_01) beats leaving them as A001C0012.
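The folder habit is easy to automate. A minimal sketch, assuming the subfolder names used above (they are conventions, not a standard, and the drive path is hypothetical):

```python
from pathlib import Path

# One project folder, one subfolder per media type.
SUBFOLDERS = ["footage", "audio", "graphics", "music", "stills", "exports"]

def make_project(root: str, name: str) -> Path:
    project = Path(root) / name
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project

# Hypothetical drive and project name, purely for illustration:
make_project("/Volumes/EditDrive", "2024-05_bakery_minidoc")
```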
The second step is assembly. The editor lays out the rough order of shots on a timeline. The rough assembly is always too long and that is fine; the point is to see the shape. Premiere calls its timeline by that name; Resolve calls it the Edit page timeline. Both offer three-point editing: choose a clip with in/out points in the source viewer, mark a target point on the timeline, and insert or overwrite. Keyboard shortcuts — J, K, L for play controls; I and O for in and out; the comma and period keys for insert and overwrite — are the difference between editing in flow and fighting the mouse.
The third step is rough cut to fine cut. The rough cut answers the question “does the story work.” The fine cut answers “is each cut the right cut.” Fine-cutting means tightening the head and tail of each clip by a few frames, replacing a weaker angle with a stronger one, and trimming through the lens of Murch’s rule of six. Resolve has a dedicated Cut page designed for fast rough assembly; Premiere users lean on keyboard-driven trimming in the Program monitor. Either way, the craft is more important than the tool.
The fourth step is sound design and mix. Multi-track audio lives on dedicated channels beneath the picture: one or two for dialogue, one or two for ambience, several for effects, a stereo pair for music. Resolve’s Fairlight page and Premiere’s Audio Workspace give dedicated mixing interfaces; an EQ, a compressor, and a limiter on each channel are the baseline processing chain. Automation lanes let the editor draw volume over time so music ducks under dialogue without having to hand-ride every frame.
The fifth step is color, treated in its own chapter below. The sixth is graphics and titling, treated in Chapter 11. The seventh is export. Export settings should match the destination: H.264 in an MP4 container for most social platforms, with a bitrate appropriate to the resolution (a commonly cited target is around 10–20 Mbps for 1080p and 35–45 Mbps for 4K), a frame rate matching the source, and audio at 48 kHz, stereo, 192 kbps AAC or higher. Resolve and Premiere both ship with platform-named presets, and those presets are usually a sensible starting point.
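Bitrate targets translate into file sizes with back-of-envelope arithmetic, which is worth knowing before an upload stalls. A sketch (function name ours):

```python
# Approximate file size: (video + audio bitrate in Mbps) * seconds / 8.
def export_size_mb(video_mbps: float, audio_kbps: float, seconds: float) -> float:
    total_mbps = video_mbps + audio_kbps / 1000
    return total_mbps * seconds / 8

print(round(export_size_mb(16, 192, 120)))   # ~243 MB: 2 min of 1080p at 16 Mbps
print(round(export_size_mb(40, 192, 120)))   # ~603 MB: 2 min of 4K at 40 Mbps
```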
The last habit of a professional NLE workflow is to back up everything — project file, source media, and exports — across at least two drives, ideally with a third cloud copy.
Chapter 10 — Color Correction and Grading
Color work has two stages, and conflating them is the commonest beginner mistake. Correction is the technical job of making the footage look the way it would look if the production had gone perfectly: matching white balance across shots, fixing exposure drift, removing casts, and placing the image properly on the waveform and vectorscope. Grading is the creative job of shaping the look: the palette, the contrast curve, the mood. A finished piece has been corrected first and graded second.
Scopes are the reliable witness. A waveform monitor plots luminance against horizontal screen position, showing where highlights clip (above 100 IRE) and where shadows crush (below 0 IRE). A vectorscope plots the chroma information on a wheel whose angle is hue and whose radius is saturation; skin tones fall along a characteristic skin-tone line between red and yellow, and deviations from that line reveal a cast. A parade displays the red, green, and blue channels side by side, making white-balance errors obvious. Learning to read these instruments separates intuition from guesswork and protects against a monitor that lies.
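The quantity a waveform monitor plots is easy to compute. A sketch of Rec.709 luma, the weighting most HD scopes assume (illustrative, not any NLE's internals):

```python
import numpy as np

# Rec.709 luma: Y = 0.2126 R + 0.7152 G + 0.0722 B.
def luma(frame: np.ndarray) -> np.ndarray:
    """frame: HxWx3 float array in [0, 1]; returns an HxW luma image."""
    return frame @ np.array([0.2126, 0.7152, 0.0722])

frame = np.random.rand(1080, 1920, 3)
y = luma(frame)
print(y.min(), y.max())  # values pinned near 0 or 1 flag crushing or clipping
```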
The standard correction workflow begins with primary adjustments: lift, gamma, and gain (or shadows, mids, and highlights), plus an overall contrast and saturation pass. Lift raises the darkest values; gain raises the brightest; gamma bends the middle. A color wheel UI lets the colorist push each range toward or away from a hue while adjusting its level. After primaries come secondary adjustments: masks and qualifiers that isolate a specific element — a face, a sky, a jacket — and apply a targeted change without affecting the rest of the frame.
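Tools differ in the exact math behind the wheels, so the following is one common simplified formulation, not any particular NLE's formula:

```python
import numpy as np

# Primary correction, simplified: scale by gain, offset by lift,
# then bend the middle with a gamma power curve.
def primary_correct(img, lift=0.0, gamma=1.0, gain=1.0):
    out = np.clip(img * gain + lift, 0.0, 1.0)
    return out ** (1.0 / gamma)

img = np.random.rand(4, 4, 3)
brighter_mids = primary_correct(img, gamma=1.2)   # raises the middle values
deeper_blacks = primary_correct(img, lift=-0.05)  # lowers the darkest values
```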
DaVinci Resolve is the widely acknowledged leader in this part of the pipeline. Its Color page uses a node-based graph in which each node is a grade layer; the colorist routes the image through a sequence of corrections and compositing operations, turning nodes on and off to compare states. The node graph makes complex work auditable: a year later, a returning colorist can see exactly which nodes did what. Premiere Pro’s Lumetri Color panel is simpler and sufficient for most social-video needs, with tabs for basic correction, creative looks, curves, wheels, HSL secondaries, and vignettes.
Grading, the creative stage, is where look comes in. A classic teal-and-orange grade pushes shadows toward cyan and highlights toward warm skin, exploiting a complementary-color relationship to pop faces off backgrounds. A high-key grade lifts blacks and drops contrast for a soft, airy feel. A bleach-bypass grade desaturates and bumps contrast for a gritty documentary tone. Whatever the look, it should survive the platform’s compression; heavy saturation and delicate gradients tend to posterize on low-bitrate playback, so a grade that reads well on a calibrated grading monitor may need to be dialed back for a feed.
A final word on consistency. The viewer should not notice that the grade exists; they should notice how the scene feels. Rapid shifts in tone, color temperature, or contrast across cuts break the spell. The discipline of matching shots — using reference frames, scopes, and split-screen comparisons — is unglamorous and is where the bulk of a colorist’s day actually goes.
Chapter 11 — Motion Graphics and Titling
Type on screen is a design problem. A title that appears at the wrong moment, at the wrong size, in the wrong typeface, carrying the wrong message, will damage an otherwise strong piece more than a mediocre cut would. Motion graphics — animated typography, lower thirds, transitions, end cards, kinetic text — are a whole sub-discipline of video design, and for DAC 202 the goal is to establish a workable vocabulary rather than master the field.
Adobe After Effects is the dominant tool for motion graphics; Apple Motion, Blackmagic Fusion (now embedded inside Resolve), and Blender’s video editor are the main alternatives. The shared mental model is the layer timeline: every element sits on its own layer, each layer has properties (position, scale, rotation, opacity, anchor point, and effect parameters), and each property can be keyframed — given values at specific times so that it interpolates between them. The most important early skill is easing: setting the interpolation between keyframes so motion starts slowly, accelerates, and settles, rather than jerking linearly. Ease-in, ease-out, and ease-in-out curves make the difference between motion that feels intentional and motion that feels like a default preset.
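Easing is simply a curve applied to the interpolation parameter. A sketch of the standard cubic ease-in-out (function names ours):

```python
# Cubic ease-in-out: starts slowly, accelerates, settles.
# t runs from 0 to 1 across the keyframe interval.
def ease_in_out(t: float) -> float:
    return 4 * t**3 if t < 0.5 else 1 - (-2 * t + 2) ** 3 / 2

def interpolate(start: float, end: float, t: float) -> float:
    return start + (end - start) * ease_in_out(t)

# A title sliding from x = -200 (off-screen) to x = 100 over 30 frames:
positions = [interpolate(-200, 100, f / 30) for f in range(31)]
```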
Typography for the moving image follows its own rules. A typeface that reads well in print may smear on a compressed video feed; sans-serif faces with open counters and generous x-heights tend to survive better. Type size should be checked on the smallest expected playback device: a title that is comfortable on a desktop monitor may be unreadable on a phone. Contrast against the background matters as much as color choice: a drop shadow, a subtle background plate, or a slight blur behind the text keeps letterforms legible over busy footage. The safe area — roughly the inner 90 percent of the frame — protects critical text from being clipped by platform UI or letterboxing.
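The safe area is easy to compute for any frame size. A small sketch (function name ours):

```python
# Safe-area bounds for critical text: the inner 90 percent of the frame.
def safe_area(width: int, height: int, fraction: float = 0.90):
    mx = round(width * (1 - fraction) / 2)
    my = round(height * (1 - fraction) / 2)
    return mx, my, width - mx, height - my  # left, top, right, bottom

print(safe_area(1080, 1920))  # (54, 96, 1026, 1824) for a 9:16 frame
```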
Lower thirds — the name-and-title cards over interview footage — are the workhorse of factual video. A good lower third enters on a motivated beat (the subject begins to speak), holds long enough to be read twice at a comfortable pace, and exits before it overstays. Animation should support the reading, not distract from it: a gentle slide with an ease is almost always better than a spin, a bounce, or a particle explosion.
Kinetic typography — text that animates in sync with speech — is a social-video staple. Premiere Pro’s speech-to-text captioning and DaVinci Resolve’s audio transcription can auto-generate captions, which creators then style and animate with tools like the Essential Graphics panel or Text+. The craft here is rhythm: the text should land on the emphasized word, not drift across a generic timeline. Overusing kinetic type (every word highlighted, every line bouncing) turns the screen into visual noise; using it for moments of emphasis preserves its power.
A final reminder on restraint. A piece with fewer, better graphic moments almost always beats a piece with many mediocre ones. The test is whether each graphic earns its place by doing something the picture and sound cannot do alone.
Chapter 12 — Designing for Digital Platforms
A video is not a generic object that exists in the abstract; it exists inside a platform, and each platform imposes a grammar. TikTok, Instagram Reels, YouTube Shorts, YouTube long-form, Vimeo, LinkedIn, and a brand’s own website each prefer different aspect ratios, durations, sound defaults, caption habits, thumbnail conventions, and algorithmic reward structures. A designer who treats these as deployment details rather than design constraints will produce work that technically plays everywhere and lands well nowhere.
Aspect ratio is the first constraint. Horizontal 16:9 remains the default for YouTube long-form, desktop viewing, and most embedded players. Vertical 9:16 dominates TikTok, Reels, and Shorts, and has its own compositional logic discussed in Chapter 4. Square 1:1 is a compromise for platforms where viewers may hold a phone in either orientation and for feeds that still auto-crop. A piece shot 16:9 and cropped to 9:16 is usually worse than a piece shot 9:16 in the first place, because the camera operator framed the wrong thing. When a piece needs multiple aspect ratios, the protective strategy is to shoot at a higher resolution than the delivery target (4K for a 1080p deliverable) with safe areas for all intended crops marked on the monitor.
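The cost of cropping is easy to quantify. A sketch of the pixel arithmetic (function name ours):

```python
# Cropping a horizontal master to vertical: how many pixels survive?
def crop_width_for(height: int, aspect_w: int, aspect_h: int) -> int:
    return round(height * aspect_w / aspect_h)

# A 3840x2160 (16:9 UHD) master cropped to 9:16 keeps a strip
# only 1215 px wide, which is why 9:16 work is better shot 9:16.
print(crop_width_for(2160, 9, 16))   # 1215
print(crop_width_for(1080, 9, 16))   # 608: well below a 1080x1920 deliverable
```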
Duration is the second. Short-form platforms reward retention percentage; a thirty-second video watched to 90 percent outperforms a three-minute video watched to 40 percent. The design implication is that every second has to earn its place, and that padding is punished. For long-form YouTube the calculus inverts: watch-time in absolute minutes contributes to ranking, so a ten-minute video that holds attention can outperform a three-minute one. Vimeo has historically been the platform most tolerant of long, quiet, crafted work, because its audience opts in for that. A maker should ask, before the first shot, which platform the piece is for, and design the duration accordingly.
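The comparison above is worth working through as numbers, because it shows why the two ranking regimes reward opposite designs:

```python
# Absolute watch time = duration * retention fraction.
def watch_seconds(duration_s: float, retention: float) -> float:
    return duration_s * retention

print(watch_seconds(30, 0.90))    # 27.0 s: strong retention percentage
print(watch_seconds(180, 0.40))   # 72.0 s: weak percentage, more minutes
# Short-form ranking favors the first; long-form watch-time favors the second.
```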
Sound defaults shape the opening seconds. TikTok plays with sound on. Instagram feeds often do not. A piece designed for a feed with sound-off defaults needs to communicate within the first three seconds using visual and captioned information alone, with the audio as an additional reward for those who unmute. Captions, discussed below, become part of the design rather than a post-production afterthought.
Algorithmic affordances matter even though the algorithm is a moving target. Hooks — the first second of a video — matter disproportionately because the viewer’s decision to keep watching is usually made there. A hook can be a visual surprise, a direct-address question, a pattern break, a pre-announcement of a payoff, or a cold-open scene. A retention curve — the platform-provided graph of how many viewers are still watching at each second — is the diagnostic tool; sharp drops reveal exactly where the design failed.
Thumbnails and cover images are the still-image argument for clicking. A YouTube thumbnail competes in a grid against a dozen others, and the best thumbnails combine a high-contrast subject, a legible piece of text, and an expression or situation that promises emotion. A Vimeo cover image is a more restrained affair. A TikTok cover sits inside a profile grid and is most often read as part of the grid’s overall look, so brands and creators design the grid, not just individual covers.
Branding and tone carry across platforms. The visual identity — typeface, color palette, lower-third style, sonic logo — should feel related across channels. Consistency is how an audience learns to recognize a maker on a feed full of strangers.
Chapter 13 — Accessibility, Distribution, and Analytics
A piece that cannot be consumed by everyone who wants to consume it is a piece whose design is incomplete. Accessibility is partly a legal requirement — the Accessibility for Ontarians with Disabilities Act and the European Accessibility Act impose obligations that apply to much public-facing video — and partly an ethical and practical one, because accessible video is almost always better video for everyone.
Captions are the first accessibility layer. Closed captions are a separate text track that a viewer can toggle; open captions are burned into the picture and cannot be turned off. For platforms that reliably support closed captions (YouTube, Vimeo), uploading a separate SRT or VTT file is the right move; for platforms whose caption support is weaker, or for pieces that rely on caption styling as part of the design, open captions become necessary. Auto-generated captions are a useful starting point and a dangerous final product: they miss proper nouns, mishear technical terms, and misplace punctuation. Every auto-generated transcript should be reviewed and corrected by a human who knows the content. Caption quality is also a search and retention issue: platforms surface video via caption text, and viewers with sound-off retain better when captions are accurate and well-paced.
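The SRT format itself is simple enough to write by hand: numbered cues, HH:MM:SS,mmm timestamps, blank-line separators. A minimal, hand-rolled writer (illustrative; real projects usually export from the NLE or a captioning tool):

```python
# Convert seconds to an SRT timestamp, HH:MM:SS,mmm.
def srt_timestamp(seconds: float) -> str:
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def write_srt(cues, path):
    """cues: list of (start_seconds, end_seconds, text) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(cues, 1):
            f.write(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n")

write_srt([(0.0, 2.4, "The oven goes on at five."),
           (2.6, 5.0, "Everything else waits.")], "captions.srt")
```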
Audio description is the second layer. A separate narration track, inserted into gaps between dialogue, describes visual information a blind or low-vision viewer would otherwise miss — who is on screen, what they are doing, where they are. For a dialogue-driven piece the effort is moderate; for a visually driven montage it is substantial and should be budgeted from the start. The WCAG guidelines give technical targets; platforms like YouTube allow separate audio tracks that viewers can select.
Color and contrast round out the accessibility basics. Text on video should meet WCAG contrast ratios (4.5:1 for normal text, 3:1 for large text) against the parts of the background it actually sits on, not against an average. Colorblind viewers will miss information carried by hue alone; a design that uses red and green to distinguish elements should also use shape, position, or text labels as a secondary channel. Flashing content should respect the three-flashes-per-second limit to avoid triggering photosensitive epilepsy; an otherwise harmless transition effect can be genuinely harmful if it fails this test.
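The WCAG ratio is a published formula and can be checked in a few lines. A sketch, assuming 8-bit sRGB values:

```python
# WCAG 2.x contrast ratio: (L1 + 0.05) / (L2 + 0.05), where L1 and L2
# are the relative luminances of the lighter and darker colors.
def channel_lin(c8: int) -> float:
    c = c8 / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb) -> float:
    r, g, b = (channel_lin(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(a, b) -> float:
    la, lb = sorted((luminance(a), luminance(b)), reverse=True)
    return (la + 0.05) / (lb + 0.05)

print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 1))        # 21.0, the maximum
print(round(contrast_ratio((255, 255, 255), (119, 119, 119)), 2))  # ~4.48, borderline for normal text
```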
Distribution follows design. A finished piece has a launch plan: where it goes first, where it is cross-posted, what supporting posts frame it, what the caption says, what the thumbnail is, and what calls to action it carries. Cross-posting is not duplication; each platform’s post should be tuned for its native audience, and the same video can carry a different caption, a different thumbnail, and a different hook-trim across platforms.
Analytics is the feedback loop. Platforms provide dashboards — YouTube Studio, Meta Business Suite, TikTok Analytics, Vimeo Stats — that report views, impressions, click-through rates, average watch duration, retention curves, audience demographics, and traffic sources. A maker who reads analytics learns where viewers drop off, which hooks earn a click, which titles get ignored, and which thumbnails beat which. The discipline is to read analytics with curiosity rather than vanity: the numbers are a conversation with the audience, not a scoreboard. A single underperforming video reveals what the audience was not ready for; a single overperforming video reveals what they were.
Ethical responsibility closes the loop. Analytics can be a surveillance tool as much as a feedback tool, and the maker should not reduce every decision to maximizing engagement metrics. The most durable work respects the subject’s consent, honors the audience’s attention, and says something true. A course in multi-modal communication design is ultimately a course in taking responsibility for what a moving image says to the people who watch it, on whatever screen, in whatever moment it reaches them.