MUSIC 231: Music Cognition
John Brownell
Estimated study time: 40 minutes
The psychology of music draws two kinds of researchers: musicians curious about psychology, and psychologists curious about music. This course examines music from the perspective of the receptor rather than the transmitter — the listener rather than the composer or performer. As John Sloboda wrote in The Musical Mind (1985):
“The reason that most of us take part in musical activity, be it composing, performing, or listening, is that music is capable of arousing in us deep and significant emotions . . . If emotional factors are fundamental to the existence of music, then the fundamental question for a psychological investigation into music is how music is able to affect people.”
The disciplines of musicology and music theory have generally regarded music — especially music of the cultivated European tradition — as consisting of autonomous structures whose aesthetic beauty and value lie in the works themselves. This tendency reached its peak in the mid-twentieth century with the total serialism of Milton Babbitt and his contemporaries, for whom a composition could be analyzed as a self-contained logical object independent of any listener’s experience.
Coincidentally, the mid-twentieth century also saw the beginnings of psychological research into the reception and cognition of music. Memory, perception, and processing of musical information were rightly regarded as uniquely human abilities similar to language skills. The listener was seen as an active participant in the production of musical meaning rather than simply a passive receiver.
The course loosely follows the order of William Forde Thompson’s Music, Thought, and Feeling: Understanding the Psychology of Music (2nd edition, Oxford University Press, 2015).
Chapter 1: Introduction — Areas of Intersection
Thompson identifies three “abiding controversies” in the psychology of music that have shaped the field since its origins. But before exploring those debates, it is worth mapping the broad territory where music and psychology meet. There are at least six major areas of intersection.
Six Areas of Intersection
1. Performance. Music is a performance art embedded in social dynamics. The psychology of musical performance overlaps with sports psychology in significant ways — issues of performance anxiety (colloquially “stage fright”), peak performance, visualization techniques, stress management, and the relationship between arousal and quality of execution all find parallels in athletic competition. Developmental psychology also enters here: how do we acquire musical skills? Most responses to music are learned rather than innate, raising questions about the nature of musical education, talent, and expertise.
2. Perception, Cognition, and Memory. People differ in how they perceive the same auditory phenomenon. Some listeners have exquisite discrimination for pitch, while others struggle to distinguish intervals. The perceptual processes that allow us to parse a complex orchestral texture into separate melodic lines, to recognize a melody transposed to a new key, or to detect a wrong note in a familiar piece — all of these are cognitive achievements that psychologists study. Memory for music raises its own puzzles: why can we recall a song heard decades ago yet forget a spoken sentence within minutes?
3. Music and Language. Music and language share striking surface similarities — both are universal to human cultures, both unfold in time, both use pitch and rhythm, and both are governed by rule systems that speakers or listeners internalize without explicit instruction. Yet there are critical differences. Language conveys concepts and propositions; music conveys something closer to emotion and motion. Language requires that both the speaker and the listener share the same code; music, by contrast, can move a listener profoundly even when the listener knows nothing about the tradition from which a piece comes. The gap between producing and receiving music is far wider than the gap between speaking and hearing a sentence.
4. Social Psychology. Music serves as a powerful marker for social identity and group membership. Musical preferences often correlate more strongly with peer groups, subcultures, and generational cohorts than with any individual personality trait. Music is integral to certain cultural practices — religious rituals, political rallies, sporting events, coming-of-age ceremonies — and in many societies the choice of what music to listen to is a declaration of identity. Social psychologists study how music facilitates bonding, regulates group emotion, and signals belonging.
5. Music and Emotion. Perhaps the most fundamental intersection. Why does music make us feel things? In the twentieth century, some composers — particularly the serialists — attempted to strip emotion from the compositional process, treating music as pure form, structure, and combinatorial logic. But listeners stubbornly persisted in having emotional responses. The question of how abstract patterns of sound can arouse joy, sadness, tension, or ecstasy remains one of the deepest puzzles in the field.
6. The Problem of Creativity. Where do musical ideas come from? How are they generated? What distinguishes the working process of a Mozart — who seemed to conceive entire movements in a flash — from that of a Beethoven, who laboured over sketchbooks for years? Creativity is notoriously difficult to study scientifically, yet it lies at the very heart of music as a human activity.
The Science of Sound
Before exploring any of these intersections, we need the physical vocabulary. Acoustics — the branch of physics concerned with mechanical waves — studies sound as pressure waves propagating through a medium. All sounds are pressure waves, but musical sound has a distinctive property. A tone is a sound dominated by a particular frequency — a periodic vibration that the ear interprets as having a definite pitch. Pure tones (single-frequency sine waves) do not exist in nature; every natural sound is a composite of multiple frequencies. The frequencies above the fundamental are called overtones (or, when they are whole-number multiples of the fundamental, harmonics), and their relative strengths give each instrument its characteristic timbre — the quality that allows us to distinguish a violin from a clarinet playing the same pitch at the same loudness.
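The additive picture of a tone is easy to make concrete. Below is a minimal sketch in Python (assuming numpy is available) that builds two tones with the same fundamental but different harmonic weightings; the weights are invented for illustration, not measured from real instruments:

```python
import numpy as np

SR = 44100  # sample rate in Hz

def complex_tone(f0, weights, dur=1.0, sr=SR):
    """Sum sine-wave harmonics of fundamental f0.

    weights[k] is the relative amplitude of harmonic k+1; changing
    the weight profile changes the timbre without changing the pitch.
    """
    t = np.arange(int(dur * sr)) / sr
    tone = sum(w * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, w in enumerate(weights))
    return tone / np.max(np.abs(tone))  # normalize to [-1, 1]

# Same pitch (220 Hz), two different spectra: emphasizing odd harmonics
# gives a hollow, clarinet-like quality; a smooth roll-off sounds rounder.
hollow = complex_tone(220, [1.0, 0.0, 0.6, 0.0, 0.4, 0.0, 0.2])
rounded = complex_tone(220, [1.0, 0.7, 0.5, 0.35, 0.25, 0.18, 0.12])
```

Played back at the same loudness, the two arrays sound at the same pitch but with clearly different colour.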
The human ear is sensitive to frequencies roughly between 20 Hz and 20,000 Hz, though this upper limit declines with age. Within this range, the ear is a remarkably sensitive instrument: it can detect pressure variations as small as 20 micropascals and can discriminate between frequencies differing by as little as 0.2%.
Chapter 2: Musical Building Blocks — Pitch, Intervals, and Consonance
Pitch and Frequency
Pitch is the perceptual correlate of frequency — the psychological experience of how “high” or “low” a sound seems. The relationship between frequency and perceived pitch is approximately logarithmic: doubling the frequency raises the pitch by one octave, and this relationship holds regardless of the starting frequency. The note A above middle C is standardized at 440 Hz (A4); the A one octave higher is 880 Hz, and the A one octave lower is 220 Hz.
This logarithmic relationship is one of the most fundamental facts in music perception. It means that musical intervals — the perceived distance between two pitches — correspond to ratios of frequencies rather than arithmetic differences. A perfect fifth, for instance, corresponds to a frequency ratio of approximately 3:2 regardless of what the starting pitch is.
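The point can be verified in a few lines. The sketch below (plain Python) measures pitch distance in octaves as the base-2 logarithm of the frequency ratio, showing that a fixed ratio spans a fixed perceived distance wherever it occurs:

```python
import math

def octaves_between(f1, f2):
    """Perceived pitch distance, in octaves, between two frequencies.
    Pitch is logarithmic in frequency, so the distance depends only on
    the ratio f2/f1, never on the difference f2 - f1."""
    return math.log2(f2 / f1)

print(octaves_between(220, 440))    # 1.0    (A3 to A4 is one octave)
print(octaves_between(440, 880))    # 1.0    (so is A4 to A5, though the Hz gap doubles)
print(octaves_between(440, 660))    # ~0.585 (a pure fifth, ratio 3:2)
print(octaves_between(1000, 1500))  # ~0.585 (the same fifth anywhere on the axis)
```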
Intervals and Consonance
The notion that simple frequency ratios produce pleasant combinations of tones goes back to Pythagoras, who is said to have discovered that a string stopped at half its length (ratio 2:1) produces an octave, at two-thirds its length (3:2) a fifth, and at three-quarters (4:3) a fourth. These intervals, perceived as maximally consonant, form the backbone of virtually all musical systems worldwide.
Consonance and dissonance — the perceived stability or instability of tone combinations — are among the most studied phenomena in music psychology. Hermann von Helmholtz proposed in the nineteenth century that dissonance arises from beating — the interference pattern created when two tones of similar frequency are sounded together, producing a pulsating, rough sensation. More recent work by William Sethares has extended this idea, showing that the perceived consonance of an interval depends on the timbre of the instruments producing it: instruments with harmonic overtone series (strings, winds) favour simple-ratio intervals, while instruments with inharmonic spectra (bells, metallophones) favour different tunings entirely.
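Helmholtz’s proposal can be caricatured in a few lines of code. The toy sketch below (an illustration of the beating idea only, not Sethares’s actual model) counts pairs of partials from two idealized harmonic tones that fall close enough together to beat roughly; real models treat the roughness window as frequency-dependent rather than the fixed 3–40 Hz band assumed here:

```python
import itertools

def partials(f0, n=6):
    """First n harmonics of an idealized harmonic tone with fundamental f0."""
    return [f0 * k for k in range(1, n + 1)]

def rough_pairs(f_low, ratio, min_beat=3.0, max_beat=40.0):
    """Pairs of partials from two tones (fundamentals f_low and
    f_low * ratio) whose beat rate falls in a crude 'roughness' band."""
    a, b = partials(f_low), partials(f_low * ratio)
    return [(p, q) for p, q in itertools.product(a, b)
            if min_beat < abs(p - q) < max_beat]

# A perfect fifth above 220 Hz: partials either coincide or lie far
# apart, so nothing beats roughly. A minor second (16:15) puts several
# partials close-but-not-coinciding, the recipe for roughness.
print(len(rough_pairs(220, 3 / 2)))    # 0
print(len(rough_pairs(220, 16 / 15)))  # 2
```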
Timbre
Timbre (from the French, pronounced roughly “TAM-bər”) is sometimes called the “colour” of a sound. It is what distinguishes a trumpet from an oboe playing the same pitch at the same dynamic level. Physically, timbre depends on the relative amplitudes of the harmonics in a tone’s spectrum, the way those harmonics evolve over time (the spectral envelope), and the characteristics of the tone’s attack (onset) and decay.
The attack transient is particularly important for identification. In classic experiments, listeners who heard recordings of instruments with their attacks removed often could not identify the instrument, even when the sustained portion of the tone was otherwise unaltered. This suggests that much of what we think of as an instrument’s “sound” is actually concentrated in the first few tens of milliseconds.
Chapter 3: Scales, Tonality, and the Origins of Music
Scales and Tuning Systems
A scale is an ordered collection of pitches within an octave that provides the raw material for melody and harmony. Scales exist in every musical culture, but their construction varies enormously. The most fundamental question in the psychology of music is: why these particular pitches and not others?
The earliest known tuning system in the Western tradition is Pythagorean tuning, built entirely from stacked perfect fifths (ratio 3:2). Starting from any pitch and ascending by twelve perfect fifths eventually returns to a pitch that is almost — but not quite — the same as the starting pitch raised by seven octaves. The discrepancy, known as the Pythagorean comma (about 23.5 cents, or roughly a quarter of a semitone), plagued Western music for centuries. Pythagorean tuning produces beautifully pure fifths but leaves certain thirds sounding harsh and several keys essentially unusable.
The solution that eventually triumphed was equal temperament, given precise mathematical form around the end of the sixteenth century: the Pythagorean comma is distributed equally across the twelve fifths, so that each semitone has a frequency ratio of exactly \(2^{1/12} \approx 1.05946\). The result is a system in which every key sounds equally good (or, purists would say, equally imperfect). Equal temperament’s great practical advantage is that it enables transposition — moving a piece from one key to another without any change in the relationships between its tones. Johann Sebastian Bach famously demonstrated the freedom of tempered tuning in The Well-Tempered Clavier (1722, 1742), a collection of preludes and fugues in all twenty-four major and minor keys (though scholars debate whether Bach’s own tuning was strictly equal).
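Both the comma and the tempered intervals are easy to verify numerically using the cent measure (1200 cents per octave, so a ratio r spans 1200·log2(r) cents). A minimal sketch:

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

# Twelve pure fifths overshoot seven octaves by the Pythagorean comma.
comma = (3 / 2) ** 12 / 2 ** 7
print(cents(comma))          # ~23.46 cents

semitone = 2 ** (1 / 12)
print(cents(semitone))       # 100.0 cents, by construction
print(cents(semitone ** 7))  # 700.0: the tempered fifth
print(cents(3 / 2))          # ~701.96: the pure fifth, about 2 cents wider
```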
Tonal Hierarchy
Not all tones in a scale are created equal. Listeners perceive a hierarchy among the tones, with some sounding more stable, important, or “resolved” than others. The first degree of the scale — the tonic — occupies the apex of this hierarchy; the fifth degree (the dominant) and the third are the next most stable; the remaining diatonic tones come next; and chromatic tones (those outside the scale) are perceived as least stable and most in need of resolution.
This hierarchy is not merely a convention taught in music theory classes — it is a psychological reality that can be measured experimentally. Carol Krumhansl and her colleagues developed the probe-tone method in the 1980s: listeners hear a musical context (a scale, a chord progression, or a short passage) followed by a single “probe” tone, and they rate how well that tone “fits” with what preceded it. The resulting tonal profiles are remarkably consistent across listeners and closely mirror the predictions of music theory. Major-key profiles show a sharp peak on the tonic, a secondary peak on the fifth, and progressively lower ratings for non-diatonic tones.
Krumhansl’s work, published in her landmark book Cognitive Foundations of Musical Pitch (1990), established that listeners internalize tonal hierarchies through exposure to their musical culture — even listeners without formal training show the same profiles, suggesting that statistical regularities in the music they hear are sufficient to shape their perceptual expectations.
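A well-known application of these profiles is the Krumhansl-Schmuckler key-finding algorithm: correlate the pitch-class distribution of a passage against all twelve rotations of the profile and pick the tonic that fits best. Below is a minimal sketch for major keys only, using the published Krumhansl-Kessler major profile; the input distribution is an invented toy example, not real data:

```python
import numpy as np

# Krumhansl-Kessler major-key probe-tone profile, indexed from the tonic
# (C = index 0 for the C-major profile); ratings on a 1-7 scale.
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def best_major_key(pc_distribution):
    """Correlate a 12-element pitch-class distribution (e.g. total
    duration per pitch class) with every rotation of the major profile
    and return the best-matching tonic."""
    scores = [np.corrcoef(pc_distribution,
                          np.roll(MAJOR_PROFILE, tonic))[0, 1]
              for tonic in range(12)]
    return NOTE_NAMES[int(np.argmax(scores))]

# Hypothetical durations (in beats) per pitch class for a G-major tune:
# heavy on G, B, and D, empty on the chromatic notes of G major.
durations = [2, 0, 4, 0, 1, 1, 2, 6, 0, 3, 0, 4]
print(best_major_key(durations))  # "G"
```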
Maximally Even Distributions
The specific arrangement of intervals within the most common scales — the diatonic major and minor scales, the pentatonic scale — is not arbitrary. Music theorist John Clough and mathematician Jack Douthett showed that these scales are maximally even distributions of their constituent intervals around the octave. A seven-note diatonic scale distributes its five whole steps and two half steps as evenly as possible within twelve chromatic positions; a five-note pentatonic scale distributes its intervals even more symmetrically. This property may help explain why these particular scales recur independently across so many musical cultures.
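Maximal evenness also has a simple generative formula. A minimal sketch of Clough and Douthett’s J-function (with its offset parameter fixed at zero, which yields one particular rotation of each scale):

```python
def maximally_even(c, d):
    """Maximally even d-note subset of c chromatic positions:
    the Clough-Douthett J-function { floor(i * c / d) : i = 0..d-1 }."""
    return sorted(i * c // d for i in range(d))

print(maximally_even(12, 7))  # [0, 1, 3, 5, 6, 8, 10] - a diatonic collection
print(maximally_even(12, 5))  # [0, 2, 4, 7, 9]        - the major pentatonic
```

The seven-note output has the diatonic interval pattern (two step sizes, spread as far apart as possible), and the five-note output is the familiar major pentatonic scale.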
Melodic Universals
Cross-cultural surveys of melody reveal certain near-universal constraints. Melodies tend to have a limited range — typically spanning about an octave, rarely more than two. Melodic motion proceeds predominantly by small intervals (steps rather than leaps), and leaps tend to be followed by stepwise motion in the opposite direction. Phrases tend to follow an arch contour, rising and then falling. These regularities likely reflect constraints of both vocal production and perceptual processing.
The Origins of Music
Music is a characteristic behaviour of the human species — and, intriguingly, possibly of a few other species such as humpback whales, certain songbirds, and gibbons. Its universality raises a fundamental evolutionary question: why do humans make music?
Charles Darwin proposed that music evolved through sexual selection, analogous to birdsong: musical ability served as a signal of fitness, and individuals who could sing or play well attracted more mates. In The Descent of Man (1871), Darwin speculated that a musical protolanguage preceded the evolution of spoken language — that our ancestors sang before they spoke.
Steven Mithen developed this idea in The Singing Neanderthals: The Origins of Music, Language, Mind, and Body (2005), proposing that early hominins communicated through a holistic, musical-emotional protolanguage he calls “Hmmmmm” (Holistic, Multi-modal, Manipulative, Musical, and Mimetic). On this view, music and language diverged from a common ancestor: language specialized for referential, propositional communication, while music retained the emotional, social-bonding functions of the original system.
Not everyone agrees that music is an evolutionary adaptation. Steven Pinker famously dismissed music as “auditory cheesecake” — a by-product of cognitive systems that evolved for other purposes (language processing, auditory scene analysis, emotional regulation) that happens to tickle the brain’s pleasure centres. On this view, music has no survival value of its own; it is a technology of pleasure that exploits pre-existing neural machinery.
Others take a middle position. The musicologist and cognitive scientist Ian Cross argues that music’s primary adaptive value lies in its capacity to promote social cohesion — synchronized rhythmic activity (drumming, dancing, singing together) releases endorphins, builds trust, and coordinates group action. Evidence from developmental psychology supports this: infants respond to music earlier and more reliably than to language, and musical play between caregivers and infants appears to be a universal feature of human parenting.
The debate remains unresolved, but the mere fact that every known human culture has music — even cultures that lack many other features of “complex” societies — strongly suggests that music is not a frivolous luxury but something deeply embedded in our biology.
Chapter 4: Music and Emotion
Music’s capacity to arouse emotion is perhaps its most remarkable and least understood property. Two weeks of the course are devoted to this topic — a measure of both its importance and its complexity.
The Philosophical Background
The idea that music expresses or arouses emotion has a long philosophical history. The ancient Greeks attributed powerful emotional effects to different musical modes — the Dorian mode was considered dignified and martial, the Phrygian ecstatic, the Mixolydian plaintive — and Plato warned in the Republic that the wrong kinds of music could corrupt the character of citizens.
In the modern era, the philosopher Susanne Langer argued in Philosophy in a New Key (1942) that music is a “presentational symbol” of the forms of feeling — not the feelings themselves, but their temporal shapes: tension and release, rising and falling, acceleration and deceleration. Music does not say “I am sad” the way a sentence does; rather, its temporal structure isomorphically resembles the dynamic forms of human emotional experience.
How Does Music Arouse Emotion?
Patrik Juslin and Daniel Västfjäll proposed a set of psychological mechanisms that Juslin later consolidated as the BRECVEMA model, identifying eight ways in which music can evoke emotional responses:
Brain stem reflex — Sudden, loud, or dissonant sounds trigger an automatic startle or arousal response mediated by the brain stem. This is the most primitive mechanism, shared with other animals, and does not require any musical knowledge.
Rhythmic entrainment — The listener’s body synchronizes with the pulse of the music. Strong, regular rhythms can accelerate or decelerate heart rate, breathing, and motor activity. Entrainment is why we tap our feet, bob our heads, and feel energized by fast tempos.
Evaluative conditioning — A piece of music becomes associated with a positive or negative stimulus through repeated pairing. The music played at your wedding, the song that was on the radio during a car accident — these acquire emotional valence through Pavlovian conditioning, regardless of their intrinsic musical properties.
Contagion — Music that sounds like an emotional vocal expression tends to induce a corresponding emotion in the listener. A slow, descending melody with a narrow range and soft dynamics mimics the vocal characteristics of sadness and tends to make listeners feel sad. This mechanism likely depends on the mirror neuron system.
Visual imagery — Music may evoke visual images (a stormy sea, a pastoral landscape, a funeral procession) that in turn arouse emotion. Program music and film music exploit this mechanism explicitly, but even abstract music can trigger imagery in susceptible listeners.
Episodic memory — A piece of music may trigger a vivid autobiographical memory, and the emotion felt is really the emotion associated with the recalled event. This is the “our song” phenomenon — the music itself is merely the trigger.
Musical expectancy — As listeners familiar with a musical style develop expectations about what will happen next, violations of those expectations create tension, surprise, and emotional arousal. The resolution of tension is experienced as satisfaction or pleasure. This mechanism depends heavily on the listener’s familiarity with the style and was explored extensively by Leonard Meyer in Emotion and Meaning in Music (1956).
Aesthetic judgement — The listener evaluates the music as beautiful, profound, innovative, or masterful, and this appraisal generates an emotion (admiration, awe, pleasure). This mechanism requires the most cognitive processing and is most dependent on musical expertise.
Emotional Expression versus Emotional Induction
An important distinction must be maintained between music that expresses an emotion (the music “sounds sad”) and music that induces an emotion (the listener feels sad). These are not the same thing, and the relationship between them is complex. Listeners can readily identify the emotional character of a piece — labelling it as happy, sad, angry, or peaceful — without necessarily feeling the corresponding emotion themselves. Conversely, the emotion induced by a piece may differ from the emotion it expresses: a skilled performance of a desperately sad piece may induce admiration or even joy rather than sadness.
The Role of Expectation
Leonard Meyer’s theory of musical emotion, developed in Emotion and Meaning in Music (1956), places expectation at the centre of the listener’s emotional experience. According to Meyer, meaning in music arises when an expected continuation is delayed, redirected, or denied. The delay creates tension; the eventual resolution (or sometimes the permanent denial of resolution) creates the affective response.
David Huron elaborated Meyer’s framework in Sweet Anticipation: Music and the Psychology of Expectation (2006), proposing the ITPRA theory. Huron argues that our response to a musical event is the sum of five temporally distinct reactions: Imagination (pre-outcome anticipation), Tension (pre-outcome arousal), Prediction (the immediate assessment of whether the event matched our prediction), Reaction (the fast, automatic evaluation of the event), and Appraisal (the slower, conscious evaluation). This model explains why the same musical event can produce different emotions depending on the listener’s expectations — a deceptive cadence, for instance, frustrates the prediction response but may delight the appraisal response if the listener finds the surprise aesthetically pleasing.
Chapter 5: Perceiving Musical Structure
Auditory Scene Analysis
In everyday life, our ears are bombarded by a complex mixture of sounds from many sources. The cocktail party is the classic example: dozens of people talking simultaneously, music playing, glasses clinking — yet we can follow a single conversation. How does the auditory system parse this chaos into separate perceptual objects?
Albert Bregman addressed this question in his foundational work Auditory Scene Analysis (1990). Bregman proposed that the auditory system uses two kinds of processes to segregate sound sources: primitive (automatic, pre-attentive) processes that rely on physical regularities in sound, and schema-based (top-down, learned) processes that use knowledge and expectation.
The primitive processes follow several grouping principles analogous to the Gestalt principles in visual perception:
- Proximity in frequency: Tones close in pitch tend to be grouped into the same stream. If alternating high and low tones are presented rapidly, they split into two separate perceptual streams — high and low — rather than being heard as a single alternating melody (the sketch after this list synthesizes this classic demonstration).
- Proximity in time: Sounds close together in time are grouped together. Wide temporal gaps signal boundaries between groups.
- Timbre similarity: Sounds with similar spectral characteristics (same instrument, same voice) are grouped together.
- Common fate: Sounds that change in the same way at the same time (e.g., all harmonics rising in frequency together) are heard as belonging to the same source.
- Good continuation: A smooth trajectory of pitch or loudness tends to be perceived as a single stream even when other sounds intervene.
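The classic streaming stimulus is easy to synthesize. The sketch below (assuming numpy; all frequencies and durations are illustrative choices) writes a WAV file of alternating high and low tones. Lengthen tone_dur and the sequence is heard as one bouncing melody; shorten it and it splits into two streams:

```python
import wave
import numpy as np

SR = 22050  # sample rate in Hz

def tone(freq, dur):
    """A sine tone with short linear fades to avoid clicks."""
    t = np.arange(int(SR * dur)) / SR
    y = np.sin(2 * np.pi * freq * t)
    ramp = min(len(y) // 10, 220)
    env = np.ones_like(y)
    env[:ramp] = np.linspace(0, 1, ramp)
    env[-ramp:] = np.linspace(1, 0, ramp)
    return y * env

def streaming_demo(high=1000.0, low=400.0, tone_dur=0.08, repeats=20):
    """Alternate a high and a low tone. Fast alternation between
    distant frequencies tends to split into two perceptual streams."""
    pair = np.concatenate([tone(high, tone_dur), tone(low, tone_dur)])
    return np.tile(pair, repeats)

signal = streaming_demo()
with wave.open("streaming_demo.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)  # 16-bit samples
    f.setframerate(SR)
    f.writeframes((signal * 0.8 * 32767).astype(np.int16).tobytes())
```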
Grouping in Music
Composers have exploited these perceptual principles for centuries, often without knowing the science behind them. Bach’s solo violin and cello works create the illusion of multiple simultaneous voices from a single melodic line by rapidly alternating between widely spaced pitch registers — a technique called compound melody or pseudopolyphony. The listener’s auditory system, following the proximity principle, splits the single line into two or more perceptual streams.
The Gestalt principle of closure operates in music as well: if a melody is interrupted by a brief silence or a loud noise, listeners tend to “fill in” the missing portion, perceiving the melody as continuous. Similarly, the principle of common region explains why we hear all the notes within a phrase as belonging together — the phrase boundary (marked by a pause, a cadence, or a change of register) defines the “region.”
Perceiving Rhythm and Meter
The perception of meter — the hierarchical pattern of strong and weak beats — is a constructive act by the listener rather than a simple registration of physical accents. Fred Lerdahl and Ray Jackendoff’s A Generative Theory of Tonal Music (1983) proposed a formal model of how listeners infer metrical structure from musical surfaces, using preference rules analogous to the rules of linguistic syntax.
Listeners can extract meter from remarkably impoverished stimuli. Even a monotone series of identical clicks will be perceived as metrically grouped — typically in twos or threes — a phenomenon known as subjective rhythmization. When real music is heard, cues such as dynamic accents, agogic accents (longer notes), melodic accents (higher or lower notes), and harmonic rhythm (the rate of chord change) all contribute to the perception of meter, though they may sometimes conflict with each other, creating the tension of syncopation.
Chapter 6: Music and the Brain
Neuroscience of Music Processing
The study of how the brain processes music has been revolutionized by neuroimaging technologies — fMRI, PET, EEG, and MEG — that allow researchers to observe the living brain while it listens to, performs, or imagines music.
Music processing is widely distributed across the brain rather than localized to a single “music centre.” This contrasts with early expectations modelled on language, where regions such as Broca’s area and Wernicke’s area were thought to serve relatively specialized functions. Nevertheless, certain brain regions play particularly important roles:
- Primary auditory cortex (Heschl’s gyrus, in the temporal lobe): The first cortical station for auditory processing. Responds to basic features of sound — frequency, intensity, duration.
- Planum temporale: A region adjacent to the primary auditory cortex that is larger in the left hemisphere in most people. In musicians — and especially in those with absolute pitch — the leftward asymmetry of the planum temporale is exaggerated. This region appears to be involved in spectral processing and pitch analysis.
- Superior temporal gyrus: Processes more complex auditory features, including melodic contour, harmonic relationships, and timbre.
- Frontal cortex: Involved in working memory for music, expectation, and the processing of musical syntax — the hierarchical structure of chords and keys. Stefan Koelsch’s work has shown that syntactic violations in music (unexpected chords) elicit an early right anterior negativity (ERAN) in EEG, analogous to the language-related event-related potentials elicited by grammatical violations.
- Cerebellum: Crucial for rhythm perception and production, motor timing, and the coordination of musical performance.
- Limbic system (amygdala, hippocampus, nucleus accumbens): Mediates the emotional responses to music. The nucleus accumbens, a key node of the brain’s reward circuit, shows increased dopamine release during moments of peak musical pleasure — the “chills” or “frisson” that listeners report. This was demonstrated in a landmark study by Valorie Salimpoor and Robert Zatorre (2011) using a combination of PET and fMRI imaging.
Amusia
Amusia — sometimes called “tone-deafness” — is a disorder of music perception that can be either acquired (through brain damage, typically to the right temporal lobe) or congenital (present from birth without any known brain lesion). Individuals with congenital amusia, studied extensively by Isabelle Peretz and her colleagues in Montreal, have difficulty discriminating pitches that differ by less than about two semitones, cannot detect wrong notes in familiar melodies, and often do not experience the emotional responses to music that most listeners take for granted.
Remarkably, congenital amusics typically have normal hearing, normal intelligence, and normal language abilities — suggesting that the brain mechanisms for music perception are at least partially independent of those for language and general auditory processing. This dissociation is one of the strongest pieces of evidence for the existence of domain-specific neural circuitry for music.
Musicians’ Brains
Musicians who begin training in early childhood show measurable structural and functional differences in the brain compared to non-musicians. The corpus callosum (the band of fibres connecting the two hemispheres) is larger in musicians, particularly those who started before age seven. Motor cortex areas representing the fingers are enlarged, and the degree of enlargement correlates with the age at which training began. These findings suggest a sensitive period in childhood during which musical training has its greatest impact on brain development — paralleling the sensitive periods observed for language acquisition and visual development.
Chapter 7: Music Acquisition and Development
The Development of Musical Abilities
Musical development begins before birth. The auditory system is functional by the third trimester of pregnancy, and newborns show preferences for musical passages they heard in utero. By two months, infants can detect changes in pitch and rhythm. By six months, they can discriminate between consonant and dissonant intervals and between different meters.
The developmental trajectory of musical abilities reveals an intriguing pattern of initial universality followed by cultural specialization. Infants up to about eight months of age can detect mistuned notes equally well in both their own culture’s scale system and in unfamiliar foreign scales — just as infants can discriminate foreign-language phonemes that adults cannot. By twelve months, this flexibility has narrowed: Western infants become better at detecting mistunings in Western scales and worse at detecting them in, say, Javanese pelog scales. This perceptual narrowing closely parallels the trajectory of language development and suggests that the brain is tuning its perceptual categories to the statistical regularities of the ambient musical environment.
Absolute Pitch
Absolute pitch (AP) — the ability to identify or produce a specific musical pitch without an external reference — is a rare ability possessed by an estimated 1 in 10,000 people in the general population, though it is much more common among professional musicians (estimates range from 1 in 5 to 1 in 20 for conservatory students). AP is strongly associated with early musical training: virtually all possessors of AP began training before age six or seven, and the ability is almost never acquired after puberty.
This pattern has led researchers to propose that AP depends on a critical period — a window of developmental time during which the brain is maximally plastic for acquiring this particular perceptual skill. Diana Deutsch and colleagues have additionally noted that AP is far more common among speakers of tonal languages (Mandarin, Cantonese, Vietnamese), suggesting that early experience with pitch as a linguistic dimension may facilitate the development of AP.
Whether AP confers any general musical advantage is debatable. Some AP possessors report that equal temperament sounds “out of tune” to them, and transposition can be disorienting — they experience a piece in D major as fundamentally different from the same piece in E-flat major, in a way that relative-pitch listeners do not.
Formal Training and Informal Learning
Most musical learning occurs informally — through exposure, enculturation, and play rather than through explicit instruction. By the time a child begins formal music lessons, they have already internalized the tonal hierarchy of their culture, developed expectations about melodic contour and phrase structure, and acquired a repertoire of songs. Formal training builds upon and refines these implicitly acquired competences, developing skills in notation reading, deliberate listening, technical control of an instrument, and explicit theoretical knowledge.
Dina Kirnarskaya, in The Natural Musician: On Abilities, Giftedness, and Talent (2009), distinguishes between productive abilities (creating and performing music) and receptive abilities (perceiving and understanding music), arguing that the two can develop somewhat independently.
Chapter 8: Music and Well-being
Therapeutic Uses of Music
The use of music as a therapeutic tool has ancient roots — the Old Testament describes David soothing King Saul’s distress with harp playing — but music therapy as a formalized clinical discipline dates to the aftermath of World War II, when musicians visited veterans’ hospitals and observed the remarkable effects of music on soldiers suffering from physical and emotional trauma.
Modern music therapy operates through several mechanisms:
- The iso principle: The therapist begins by matching the music to the client’s current emotional state (if the client is agitated, the music begins at a fast tempo and high energy level) and then gradually shifts the music toward the desired state (slowing, softening). This technique exploits the entrainment mechanisms discussed in Chapter 4 — the listener’s physiological state tends to follow the music.
- Distraction and pain management: Music can reduce the subjective intensity of pain by competing for attentional resources. Numerous clinical studies have found that music listening reduces the need for analgesic medication during and after surgical procedures.
- Motor rehabilitation: Rhythmic auditory stimulation (RAS) exploits the tight coupling between auditory rhythm and motor timing to help stroke patients and individuals with Parkinson’s disease improve their gait. Walking to a steady rhythmic stimulus helps regularize step timing and increases walking speed.
- Cognitive stimulation in dementia: One of the most striking clinical observations is that patients with advanced Alzheimer’s disease, who may be unable to recognize family members or carry on a conversation, can often still sing familiar songs from their youth. Musical memory appears to be preserved longer than other forms of memory, possibly because it is stored in distributed networks less vulnerable to the focal degeneration of Alzheimer’s pathology.
Music and Everyday Well-being
Beyond clinical settings, music plays a pervasive role in regulating mood and well-being in everyday life. Diary studies show that people use music strategically: to energize themselves in the morning, to relax after work, to intensify positive moods, and — more controversially — to dwell in negative moods when they are already sad. The latter phenomenon, sometimes called “sad music enjoyment,” has attracted considerable research attention: why do people voluntarily seek out music that makes them feel sad?
Several explanations have been proposed. One is that the sadness induced by music is somehow pleasurable because it is “safe” — there is no real loss, no genuine threat, so the emotion can be savoured aesthetically. Another is that sad music induces not sadness per se but a complex mixture of emotions including nostalgia, tenderness, and a sense of being understood. A third possibility is that sad music triggers the release of prolactin, a hormone associated with consolation and comfort.
Chapter 9: Performing Music
Expertise and Deliberate Practice
What distinguishes an expert musician from a competent amateur? K. Anders Ericsson and his colleagues proposed that the key factor is deliberate practice — practice that is specifically designed to improve performance, involves sustained effort, and targets weaknesses rather than reinforcing strengths. The famous “10,000-hour rule” (a label popularized by Malcolm Gladwell rather than coined by Ericsson, who objected to it) suggests that roughly ten years or 10,000 hours of deliberate practice are required to achieve expert-level performance in any complex skill, including music.
However, the 10,000-hour figure is a rough average, not a guarantee. Individual differences in aptitude, quality of instruction, and type of practice all matter. Some musicians achieve exceptional levels with significantly fewer hours; others practise far more without reaching the highest levels. Recent meta-analyses suggest that deliberate practice accounts for roughly 20–25% of the variance in musical performance quality — important, but far from the whole story.
Performance Anxiety
Music performance anxiety (MPA) — commonly known as stage fright — affects musicians at all levels, from students to internationally acclaimed soloists. Surveys suggest that between 15% and 25% of professional musicians experience debilitating anxiety that interferes with their performance. The violinist Jascha Heifetz reportedly remarked that anyone who claims never to feel nervous before a performance is either lying or not very good.
MPA involves a cluster of cognitive, physiological, and behavioural symptoms: catastrophic thoughts (“I’m going to forget the notes,” “Everyone will hear the mistake”), elevated heart rate and blood pressure, trembling hands, dry mouth, shallow breathing, and — in severe cases — avoidance of performance situations altogether.
Treatment approaches include cognitive-behavioural therapy (addressing the catastrophic thought patterns), beta-blockers (which reduce the physiological symptoms without sedation — controversial but widely used, particularly among orchestral musicians), and performance psychology techniques borrowed from sports: visualization, progressive muscle relaxation, centering routines, and simulated performance practice (rehearsing under conditions that mimic the stress of a concert).
Expressive Performance
A technically perfect but expressionless performance is musically dead. What makes a performance “expressive”? Research by Bruno Repp, Alf Gabrielsson, and others has identified several dimensions of expressive variation:
- Timing: Performers systematically deviate from strict metronomic timing. They slow down at phrase boundaries (ritardando), speed up in passages of increasing intensity, and introduce micro-timing variations that give the music a sense of breath and life. These deviations follow patterns that are remarkably consistent across performers and can be partly predicted by the musical structure.
- Dynamics: Performers shape phrases with crescendos and diminuendos that go beyond the composer’s written instructions, using loudness to highlight structural features and create narrative arc.
- Articulation: The way notes are connected (legato) or separated (staccato), and the precise shaping of each note’s attack and decay.
- Vibrato and timbre: Singers and string players vary the speed and width of vibrato to add emotional colour.
The expressiveness of a performance is not arbitrary — it is constrained by the musical structure — but it is also not fully determined by the score. The space between what is written and what is heard is where artistry lives.
Chapter 10: Composing Music
The Psychology of Musical Creativity
Composing music — generating new musical ideas and shaping them into coherent works — is perhaps the most mysterious of all musical activities. How do new melodies, harmonies, and forms emerge from the composer’s mind?
One influential framework comes from Graham Wallas, who proposed four stages of creative thought in The Art of Thought (1926):
- Preparation — The composer immerses themselves in the problem: studying the genre, analyzing existing works, sketching ideas, exploring possibilities at the instrument. This stage requires expertise — you cannot compose in a style you have not thoroughly absorbed.
- Incubation — The composer steps away from the problem and lets the unconscious mind work. Many composers report that ideas come to them during walks, in the shower, or upon waking — suggesting that some crucial processing occurs outside conscious awareness.
- Illumination — The “aha!” moment when a solution or a compelling idea suddenly appears in consciousness. Mozart famously described (in a letter whose authenticity is disputed) how entire movements would appear to him “all at once,” as if hearing a complete piece in a single instant.
- Verification — The composer evaluates, revises, and refines the idea, testing it against aesthetic criteria and technical constraints.
Not all composers work the same way. The contrast between Mozart and Beethoven is instructive. Mozart’s surviving manuscripts are remarkably clean, suggesting that much of the compositional work happened in his head before he put pen to paper. Beethoven’s sketchbooks, by contrast, reveal an agonizing process of revision — themes are tried, rejected, modified, turned upside down, combined, and gradually hammered into their final form over months or years.
Constraints and Creativity
A paradox of musical creativity is that constraints often facilitate rather than impede the creative process. Composers working within a strict formal framework — sonata form, the twelve-bar blues, the 32-bar AABA song form — often report that the constraints free them to focus on expression rather than architecture. Igor Stravinsky expressed this forcefully: “The more constraints one imposes, the more one frees one’s self of the chains that shackle the spirit.”
This observation aligns with psychological research on creativity more broadly. Studies of problem-solving show that well-defined constraints help focus the search for solutions, reduce the paralysis of infinite possibility, and allow the problem-solver to evaluate candidate solutions more efficiently.
Improvisation
Improvisation — composing in real time during performance — represents the most compressed form of the creative process, where preparation, incubation, illumination, and verification are telescoped into a single moment. Neuroimaging studies of jazz improvisation by Charles Limb and Allen Braun (2008) found that improvisation is associated with increased activity in the medial prefrontal cortex (a region associated with self-expression and autobiographical narrative) and decreased activity in the dorsolateral prefrontal cortex (associated with self-monitoring and inhibition). In other words, improvising musicians appear to “turn off” the inner critic and “turn on” the inner storyteller.
Chapter 11: Music and Other Abilities
The Mozart Effect
In 1993, Frances Rauscher, Gordon Shaw, and Catherine Ky published a study in Nature reporting that college students who listened to ten minutes of a Mozart sonata (K. 448) subsequently performed better on a spatial-temporal reasoning task than students who listened to silence or relaxation instructions. The effect was modest (about 8–9 IQ points on a specific spatial subtest) and temporary (it disappeared after 10–15 minutes), but the media ran with the story, spawning a cottage industry of “Mozart for Babies” products and even prompting the governor of Georgia to distribute free classical music CDs to every newborn in the state.
Subsequent research has substantially qualified the original finding. Many attempted replications have failed, and those that succeeded have generally found the effect to be small and not specific to Mozart — any music that the listener enjoys and finds arousing produces a similar short-term boost in spatial reasoning. The current consensus is that the “Mozart effect” is best explained by arousal and mood: stimulating, enjoyable music puts listeners in a state of heightened arousal and positive mood, which temporarily enhances performance on certain cognitive tasks. There is no evidence that passive listening to music produces lasting gains in intelligence.
Music Training and Cognitive Transfer
A more interesting question is whether active musical training — learning to play an instrument — produces lasting cognitive benefits beyond music itself. The evidence here is more promising, though still debated.
Correlational studies consistently find that musically trained children outperform untrained children on a range of cognitive measures: verbal memory, reading ability, executive function, and spatial reasoning. However, correlation does not imply causation — children who take music lessons may differ from those who do not in family income, parental education, general motivation, and pre-existing cognitive ability.
The strongest evidence comes from a smaller number of randomized controlled trials. E. Glenn Schellenberg’s (2004) study randomly assigned six-year-olds to keyboard lessons, voice lessons, drama lessons, or no lessons for one year and found small but significant IQ gains in the music groups compared to the control groups. Sylvain Moreno and colleagues found that short-term music training improved verbal intelligence and executive function in preschoolers.
The proposed mechanisms for these transfer effects include:
- Executive function: Musical performance demands sustained attention, working memory (holding multiple voices, upcoming passages, and one’s place in the score simultaneously), and inhibitory control (suppressing incorrect responses). Regular exercise of these capacities may strengthen them generally.
- Auditory processing: Musical training sharpens the ability to detect subtle acoustic features — pitch, timing, timbre — and these enhanced perceptual skills may transfer to speech perception, facilitating language learning and reading acquisition (which depends on the ability to discriminate speech sounds).
- Fine motor skills and sensorimotor integration: Playing an instrument trains the precise coordination of multiple effectors (fingers, hands, arms, breath) with auditory feedback, potentially enhancing general motor control and the integration of sensory and motor processing.
Music and Language Revisited
The relationship between music and language has come full circle. Aniruddh Patel, in Music, Language, and the Brain (2008), argues that while music and language rely on distinct representational systems (music does not have semantics in the linguistic sense), they share many processing resources — particularly in the domains of syntax (hierarchical structural processing), prosody (the melodic and rhythmic contour of speech), and working memory. This shared resources hypothesis predicts exactly the kind of transfer effects that researchers have observed: training the brain to process complex musical structures should, as a side effect, improve its ability to process complex linguistic structures.
Daniel Levitin’s popular book This Is Your Brain on Music: The Science of a Human Obsession (2006) brought many of these ideas to a wide audience, arguing that music is not a frivolous entertainment but a fundamental feature of human cognition that engages virtually every area of the brain and touches virtually every aspect of mental life — perception, attention, memory, emotion, motor control, social cognition, and creativity.
Primary text: Thompson, William Forde. Music, Thought, and Feeling: Understanding the Psychology of Music (2nd edition). Oxford University Press, 2015.
Recommended reading:
- Huron, David. Sweet Anticipation: Music and the Psychology of Expectation. MIT Press, 2006.
- Levitin, Daniel J. This Is Your Brain on Music. Dutton, 2006.
- Mithen, Steven. The Singing Neanderthals. Harvard University Press, 2005.
- Patel, Aniruddh. Music, Language, and the Brain. Oxford University Press, 2008.
- Juslin, Patrik N. and John A. Sloboda, eds. Handbook of Music and Emotion: Theory, Research, Applications. Oxford University Press, 2010.
- Kirnarskaya, Dina. The Natural Musician. Oxford University Press, 2009.