PSYCH 455: The Psychology of Human–AI Interaction

Estimated study time: 53 minutes

Table of contents

  • Why create this course
  • Chapter 1: Introduction — Why Psychology and AI?
  • Chapter 2: Trust, Reliance, and Automation Bias
  • Chapter 3: The Media Equation — Computers as Social Actors
  • Chapter 4: Anthropomorphism and the Social Mind
  • Chapter 5: AI Companions, Social Robots, and the Relational Turn
  • Chapter 6: AI and Mental Health
  • Chapter 7: Cognitive Offloading, Skill Atrophy, and AI-Augmented Cognition
  • Chapter 8: The Behavioural Economics of AI Nudges and Algorithmic Choice Architecture
  • Chapter 9: The Psychological Self in an AI World

Why create this course
UW Psychology has strong cognitive (PSYCH 256, 257) and social (PSYCH 253, 354) offerings, and CS 449 covers human-computer interaction, but no course centres the psychological dimension of living with AI. This course covers the empirical literature on trust in automation (Lee & See), the Media Equation tradition (Reeves & Nass), parasocial bonds with chatbots (Turkle, Replika studies), AI in mental-health care (Woebot RCTs), cognitive offloading and skill atrophy (Risko & Gilbert), and the behavioural-economics framing of AI-driven choice architecture (Thaler, Sunstein, Yeung’s hypernudge). The syllabus draws on Stanford CS 222, CMU 05-499, MIT Media Lab Affective Computing, Cornell Info 4940, and Berkeley CogSci C100.
  • Reeves, Byron, and Clifford Nass. The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places. Cambridge University Press, 1996.
  • Turkle, Sherry. Alone Together: Why We Expect More from Technology and Less from Each Other. Basic Books, 2011.
  • Lee, John D., and Katrina A. See. “Trust in Automation: Designing for Appropriate Reliance.” Human Factors 46, no. 1 (2004): 50–80.
  • Nass, Clifford, Jonathan Steuer, and Ellen R. Tauber. “Computers Are Social Actors.” CHI 1994.
  • Picard, Rosalind W. Affective Computing. MIT Press, 1997.
  • Thaler, Richard H., and Cass R. Sunstein. Nudge: Improving Decisions About Health, Wealth, and Happiness. Yale University Press, 2008.
  • Yeung, Karen. “‘Hypernudge’: Big Data as a Mode of Regulation by Design.” Information, Communication & Society 20, no. 1 (2017): 118–136.
  • Risko, Evan F., and Sam J. Gilbert. “Cognitive Offloading.” Trends in Cognitive Sciences 20, no. 9 (2016): 676–688.
  • Fitzpatrick, Kathleen Kara, Alison Darcy, and Molly Vierhile. “Delivering Cognitive Behavior Therapy to Young Adults with Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial.” JMIR Mental Health 4, no. 2 (2017): e19.
  • Heerink, Marcel, et al. “Assessing Acceptance of Assistive Social Robots by Aging Adults: The Unified Model of Acceptance of Assistive Social Robots.” Journal of Physical Agents 4, no. 2 (2010): 77–88.
  • Parasuraman, Raja, Thomas B. Sheridan, and Christopher D. Wickens. “A Model for Types and Levels of Human Interaction with Automation.” IEEE Transactions on Systems, Man, and Cybernetics 30, no. 3 (2000): 286–297.
  • Epley, Nicholas, Adam Waytz, and John T. Cacioppo. “On Seeing the Human: A Three-Factor Theory of Anthropomorphism.” Psychological Review 114, no. 4 (2007): 864–886.
  • Bickmore, Timothy W., and Rosalind W. Picard. “Establishing and Maintaining Long-Term Human-Computer Relationships.” ACM Transactions on Computer-Human Interaction 12, no. 2 (2005): 293–327.
  • Online resources: Stanford CS 222 course materials; CMU 05-499 HAI syllabus; MIT Media Lab Affective Computing Group publications; Cornell Info 4940 readings; Berkeley CogSci C100 notes; Stanford HAI annual AI Index reports.

Chapter 1: Introduction — Why Psychology and AI?

Artificial intelligence systems occupy a peculiar position in the landscape of human technology. Unlike a hammer, a spreadsheet, or even a search engine, a contemporary AI system presents itself as something more than a passive instrument: it initiates conversation, adapts to the individual user, infers goals, offers advice, and sometimes pushes back. It behaves, at the surface level, like an agent. This surface-level agenthood is not a superficial gloss that careful users can easily strip away; it is the system’s primary mode of interaction, and it activates psychological machinery that evolved for dealing with other agents — that is, with people. The central claim animating this course is that the psychology of AI interaction cannot be adequately understood using the same frameworks that served for earlier human-computer interaction research. When the system talks back, expresses what appears to be concern, and remembers what was said last Tuesday, the user’s mind is not operating in the same mode as when she adjusts a thermostat or queries a database. Understanding what is actually happening in that mind — and what follows from it — is the task of this course.

The psychological responses that people have to AI systems can be organised along three intersecting dimensions. The cognitive dimension concerns how people form mental models of AI systems: what they believe those systems can do, how reliable they are, what kind of entity they are dealing with, and whether their beliefs track the actual properties of the system. Mental models of AI are typically incomplete, systematically distorted in characteristic ways, and resistant to revision even in the face of disconfirming evidence. The affective dimension concerns the emotional responses that AI systems elicit: trust, liking, irritation, gratitude, unease, and — most strikingly — attachment. Many users of conversational AI systems report genuine affective bonds with those systems, bonds that have real psychological consequences when the systems change or disappear. The behavioural dimension concerns what people actually do differently because of AI: how they delegate decisions, how their skills develop or atrophy, how their social patterns shift, and how their choices are shaped by the choice architectures that AI systems instantiate. These three dimensions interact in complex ways, and the central empirical challenge of the field is mapping those interactions.

The course moves through a series of interconnected topics, each building on the last. Chapters 2 and 3 establish the foundational empirical literatures: the trust-and-automation tradition emerging from human factors engineering, and the Media Equation tradition from communication research. Chapter 4 examines anthropomorphism as a cognitive and social phenomenon, asking why people see minds in machines and what this perception does to them psychologically. Chapter 5 turns to the relational dimension — companions, social robots, and the emergence of what Sherry Turkle calls the robotic moment. Chapter 6 addresses the specific domain of mental health, where the stakes of AI interaction are sharpest and the empirical evidence is most actively contested. Chapter 7 examines cognitive offloading and the possibility of skill atrophy, asking what sustained AI use does to human cognitive capacity. Chapter 8 takes up the behavioural economics of AI nudges, examining how personalised algorithmic systems shape choice at scale. Chapter 9 closes by asking what the accumulation of all these effects means for human identity, autonomy, and the psychological self.

It is worth situating this course relative to adjacent offerings in the university. CS 449 (Human-Computer Interaction) asks design questions: how should interfaces be built to support effective and satisfying use? PHIL 459b (Philosophy of AI) asks normative and metaphysical questions: what moral status might AI systems have, what are the conceptual conditions for machine consciousness, and what ethical frameworks apply to AI development? This course asks empirical questions: how do people actually respond to AI systems, and why? The empirical focus is important because the psychology does not always follow from first principles. Rational analysis might suggest that a user who knows a system is not conscious should not form an emotional attachment to it; the empirical literature shows that she does anyway, and that this happens even when she explicitly acknowledges the system’s non-consciousness. Understanding the psychology requires engagement with data, not merely with argument — and the data are often surprising.

The concept of appropriate reliance will recur throughout the course as an organising ideal. The goal in most AI-interaction contexts is not to distrust AI systems nor to defer to them uncritically, but to rely on them in proportion to their actual reliability — to calibrate trust to evidence. As Lee and See demonstrate in their foundational 2004 framework, this calibration is surprisingly difficult to achieve and maintain, and deviations from appropriate reliance in both directions carry real costs. The aspiration to understand the psychology well enough to support appropriate reliance is one of the central practical motivations for the field, and for this course.


Chapter 2: Trust, Reliance, and Automation Bias

Trust is not a simple attitude. In the context of human relationships, the concept of trust has received exhaustive philosophical analysis — it involves vulnerability, expectations, and the moral significance of betrayal in ways that cannot be reduced to a mere probability estimate. John D. Lee and Katrina A. See, in their landmark 2004 synthesis, argue that trust in automation is a structurally similar but empirically distinct construct: it is a psychological state in which the user holds a positive attitude toward the automation based on beliefs about its performance, intent, and characteristics. Crucially, Lee and See distinguish trust from reliability — the objective performance characteristics of the system — and from confidence, the user’s calibrated probability estimate of system performance. Trust is a richer attitude that includes attribution of intention and character, that is shaped by social and emotional cues as well as by performance data, and that can diverge substantially from warranted confidence. This distinction matters because it explains why trust miscalibration is so persistent: users are not simply failing to track reliability statistics; they are forming something more like a relationship attitude, which is governed by different psychological mechanisms.

Automation bias is the tendency to over-rely on automated recommendations — to accept machine outputs without appropriate scrutiny, even when those outputs are demonstrably incorrect and the user possesses the information necessary to detect the error. The phenomenon was identified empirically by Mosier and colleagues in the aviation context during the 1990s and has since been replicated across an impressive range of domains: medical diagnosis, legal analysis, financial advising, and military target identification. The mechanism is not simply laziness. Parasuraman, Sheridan, and Wickens propose that automation bias arises from the interaction of two forces: the cognitive attractiveness of delegating monitoring to a system that is usually reliable, and the attentional cost of sustaining vigilance against rare system failures. If an automated alert system is correct 99% of the time, the rational allocation of attention involves not devoting significant cognitive resources to second-guessing it — but this rational allocation creates precisely the conditions for catastrophic failure in the 1% case.
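
To make the attentional trade-off concrete, the following sketch works through the expected cost of scrutinising every alert versus accepting every alert. It uses the 99% reliability figure from the paragraph plus invented costs for checking and for an unexamined failure; the numbers are purely illustrative.

    # Illustrative only: hypothetical costs of scrutinising vs. accepting the
    # outputs of a 99%-reliable alert system. All numbers are invented.
    p_system_correct = 0.99          # assumed system reliability
    cost_of_checking = 1.0           # effort cost of scrutinising one recommendation
    cost_of_missed_failure = 200.0   # cost when a system failure goes unexamined
    p_catch_if_checked = 0.9         # assumed chance vigilance catches a failure

    # Expected cost per decision under full vigilance vs. full reliance
    vigilant = cost_of_checking + (1 - p_system_correct) * (1 - p_catch_if_checked) * cost_of_missed_failure
    reliant = (1 - p_system_correct) * cost_of_missed_failure

    print(f"always check:  {vigilant:.2f}")   # 1.20
    print(f"always accept: {reliant:.2f}")    # 2.00
    # At 99% reliability, checking still pays. Raise reliability to 99.9% and
    # accepting (0.20) beats checking (1.02), so abandoning scrutiny becomes
    # locally rational exactly as the rare failures become hardest to anticipate.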

The opposite failure mode, under-trust, is less frequently studied but equally consequential. Under-trust occurs when users fail to take advantage of automation that would genuinely improve their performance — typically because of a prior experience of automation failure, a general scepticism of technology, or a professional norm that valorises personal judgment over algorithmic recommendation. Studies in radiological screening contexts, for example, find that some clinicians systematically discount AI-assisted detections that are subsequently confirmed as accurate, because the clinician’s prior model of the system is insufficiently positive. Lee and See’s concept of appropriate reliance captures the idea that the goal is not maximum reliance or minimum reliance but calibrated reliance — a dynamic adjustment of deference to the system in proportion to evidence about the system’s reliability in the current context and task.

What makes calibration so psychologically difficult? Several mechanisms conspire against it. First, trust in automation, like interpersonal trust, is subject to asymmetric updating: a single dramatic failure reduces trust more sharply than a single dramatic success increases it, meaning that the trust trajectory is not simply a running average of performance. Second, users tend to form global trust attitudes toward a system rather than conditional trust attitudes specific to particular tasks or contexts — they trust the system as a whole rather than trusting it more in some domains and less in others. This generalisation means that trust earned in one domain may be inappropriately extended to another where the system is less reliable. Third, the social and affective properties of the AI system — how it speaks, whether it expresses uncertainty, whether it uses hedged or confident language — influence trust independently of actual performance, creating conditions for systematic miscalibration.
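
A minimal sketch of the asymmetric-updating idea, with learning rates chosen purely for illustration rather than estimated from any study, shows why the trust trajectory is not a running average of performance.

    # Toy model of asymmetric trust updating: failures move trust much more
    # than successes do. Parameters are illustrative, not empirical estimates.
    def update_trust(trust, success, gain_rate=0.05, loss_rate=0.30):
        if success:
            return trust + gain_rate * (1.0 - trust)   # small step toward 1.0
        return trust - loss_rate * trust               # large step toward 0.0

    trust = 0.5
    outcomes = [True] * 20 + [False] + [True] * 20     # one failure in 41 trials
    for outcome in outcomes:
        trust = update_trust(trust, outcome)

    print(round(trust, 3))   # ~0.85, well below the 0.976 success rate, because
                             # the single failure erased gains accumulated over many successes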

The practical consequences of automation bias are documented with disturbing clarity in aviation accident databases. The Air France Flight 447 accident in 2009 is frequently analysed in this context: when ice crystals blocked the aircraft’s pitot tubes, the autopilot disconnected and the flight control system reverted to a degraded mode, leaving the crew without the automated assistance they had been relying on, and their manual responses in the following minutes were disastrously miscalibrated to the actual flight state. More prosaic but more ubiquitous are the findings from studies of AI-assisted hiring tools. Surveys of hiring managers who use algorithmic candidate-scoring systems find that the majority report high willingness to override the algorithm — but behavioural audit studies, in which fictitious candidates are rated by the algorithm differently from their actual qualifications, show that override rates are far lower than self-reports suggest. The algorithm shapes the ultimate judgment even when the manager believes she is exercising independent assessment. This gap between introspective reports of appropriate reliance and behavioural evidence of automation bias is one of the most robust and practically significant findings in the field.

Automation bias is defined by Parasuraman and colleagues as the tendency to favour suggestions from automated decision-making systems, and to under-weight or fail to consider other information that is inconsistent with those suggestions. It is distinct from mere laziness or poor performance; it occurs even among highly trained professionals in high-stakes environments and under conditions where the cost of error is explicitly salient.

The design implications of this research are substantial. Systems that communicate uncertainty — that convey not just a recommendation but a confidence interval and a list of conditions under which the recommendation should be treated sceptically — produce better-calibrated reliance than systems that present outputs in a flat, authoritative format. Systems that require the user to actively engage with the recommendation process, rather than passively receive a verdict, show reduced automation bias in experimental settings. These findings suggest that appropriate reliance is not only a psychological goal but a design achievement: the interface architecture can support or undermine it, and the choice between those outcomes is a design decision with moral weight.
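
As a concrete illustration of the contrast, a flat verdict and an uncertainty-communicating output might look like the following; the field names are invented for the example and do not correspond to any particular system.

    # Hypothetical output payloads: a flat verdict vs. one that communicates
    # uncertainty and conditions for scepticism. Field names are invented.
    flat_output = {
        "recommendation": "approve",
    }

    calibrated_output = {
        "recommendation": "approve",
        "confidence": 0.72,                       # calibrated probability, not certainty
        "plausible_range": (0.61, 0.81),          # interval rather than a point verdict
        "treat_with_caution_if": [
            "the case differs markedly from the training population",
            "key supporting information is missing or contradictory",
        ],
        "requires_active_confirmation": True,     # user must engage, not passively accept
    }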


Chapter 3: The Media Equation — Computers as Social Actors

In the mid-1990s, Byron Reeves and Clifford Nass reported a finding that seemed, on its face, too simple to be true: people apply the same social rules to computers that they apply to other people. When a computer complimented users on their work, those users rated the computer as more competent and enjoyable — just as they would a human colleague who praised them, and just as they would not do when the praise came from a physically separate evaluation computer rather than the computer they had been working with. When a computer expressed a personality — formal or casual, dominant or submissive — users responded to that personality with social reactions appropriate to its type. These responses occurred even when users were explicitly told that the systems were computers, and even among users who were intellectually sophisticated about computing. The Media Equation is Reeves and Nass’s summary term for this pattern: people’s responses to media follow the same rules as their responses to real people and places, because the human brain evolved before media existed and has not developed specialised mechanisms for suppressing social responses to non-social stimuli that superficially resemble social ones.

The experimental paradigm built around this insight — the CASA paradigm (Computers Are Social Actors) — has been extraordinarily productive. Nass, Steuer, and Tauber’s 1994 CHI paper, which gave the paradigm its name, showed that users treated a computer that had the same synthesised voice as a computer they had been working with as a teammate, exhibiting in-group favouritism toward its outputs, even though the voice was clearly synthesised and the users knew that two separate computers were involved. Subsequent CASA experiments demonstrated that computers assigned female voices were judged more knowledgeable about topics stereotypically associated with women; that computers expressing enthusiasm produced more engaged and motivated users than computers expressing neutral affect; and that users felt distinctly uncomfortable giving negative feedback about a computer directly to that computer, even though they knew it was a machine incapable of hurt feelings. The discomfort was not a conscious performance; it was a genuine inhibition visible in response latencies and behavioural measures.

The reason the Media Equation holds for technically sophisticated users is that the social responses it describes are not slow, deliberate, reflective judgments but fast, automatic, stimulus-driven reactions. The system of social cognition that attributes personality, intent, emotion, and social role to interactional partners operates on cues — voice quality, linguistic style, response timing, apparent memory of prior interaction — and these cues can be present in computer systems just as in human beings. By the time the user’s slower, reflective cognitive systems have formed the explicit belief that she is dealing with a machine, her fast systems have already generated social responses, and those responses are not automatically inhibited by the explicit belief. This architecture — fast automatic social processing running ahead of slow reflective correction — is not a defect in human cognition; it is the normal operating mode. The Media Equation is not a cognitive error to be corrected but a structural feature of social cognition to be understood.

The design implications of the CASA research are considerable and, in the wrong hands, alarming. Reeves and Nass note that their findings mean that flattery from an AI system increases user compliance, that apparent similarity between the AI’s expressed personality and the user’s own personality increases trust, and that apparent expertise encoded in tone and linguistic choice increases deference — all independently of the system’s actual reliability or the relevance of the flattery to any legitimate interaction goal. These effects can be exploited. A recommendation system that has learned to identify a user’s preferred personality cues and to mirror them back to her will produce higher compliance with its recommendations regardless of whether those recommendations are better; a virtual assistant that praises users for purchases they have already made will be trusted more in subsequent recommendations. The psychological mechanisms uncovered by the Media Equation research are not merely scientifically interesting; they constitute a map of exploitable vulnerabilities.

Nass and colleagues were careful to note that the CASA paradigm does not require that computers look human or that users have anthropomorphic beliefs about them in any strong sense. The conditions for social response are interactivity and addressability — the system must respond to the user and be addressed by the user — not human appearance. This is a significant finding because it means that text-based systems and voice interfaces without any visual presence can elicit strong social responses. The implication is that the social psychology of AI interaction is not primarily a problem of visual realism or humanoid design; it is a problem of conversational interaction, and it arises in any system capable of sustaining a conversational loop. As language models have become the dominant interface paradigm for consumer AI, the relevance of the Media Equation research has only increased.

The Media Equation tradition raises a question that recurs throughout this course: is the relevant fact about a psychological response its phenomenology (how it feels to the user) or its effects (what it causes the user to do)? Reeves and Nass's participants did not necessarily believe they were being rude to the computer when they declined to give it negative feedback — many would have said, if asked, that they knew they were dealing with a machine. But their behaviour was governed by social inhibition norms regardless of their explicit belief. The lesson may be that the psychological effects of AI interaction are not accessible through introspection, and that self-report measures of "how I feel about AI" are an unreliable guide to the actual psychological dynamics at work.

Chapter 4: Anthropomorphism and the Social Mind

Nicholas Epley, Adam Waytz, and John Cacioppo’s 2007 three-factor theory of anthropomorphism offers the most systematic psychological account of why people see minds in machines. Anthropomorphism — the attribution of human mental states, traits, intentions, and emotions to non-human entities — is not, they argue, a cognitive error committed by the naive or the credulous; it is a default tendency of human social cognition that emerges from three interacting motivational and cognitive factors. The first is sociality motivation, the need for social contact and connection, which leads lonely or isolated individuals to perceive mind-like qualities in available objects. The second is elicited agent knowledge, the fact that human beings possess vast stores of agent-knowledge — understanding of how minded, intentional beings behave — that is readily applied to entities that exhibit any of the surface features associated with agency: movement, responsiveness, apparent goal-direction, communicative behaviour. The third is effectance motivation, the need to predict and explain events in one’s environment; attributing a mind to an unpredictable system makes its behaviour tractable even when the attribution is incorrect.

These three factors combine to produce a robust and persistent tendency to anthropomorphise AI systems. Language models in particular exhibit many of the surface features that elicit agent knowledge: they respond to queries, they vary their responses based on context, they use first-person pronouns, they express apparent uncertainty and apparent preferences, and they sometimes push back on requests — all behaviours that, in the evolutionary context in which human social cognition developed, were reliably associated with minded, intentional agents. The social cognitive systems that detect these features and generate social responses do not have a module for “language model exception”; they respond to the cues they were built to respond to. Epley’s broader argument is that anthropomorphism is not a failure of intelligence that education or technical sophistication can reliably correct; it is a structural feature of social cognition that technical sophistication can modulate but not eliminate, and that the appropriate response is understanding and design, not mere correction.

The uncanny valley — originally Masahiro Mori’s 1970 observation that robot humanlikeness and human affective response follow a non-linear relationship, with a sharp dip in positive affect at high levels of humanlikeness that fall just short of convincing — has become one of the most debated concepts in the psychology of AI interaction. The original observation was about visual representations of robots, but the question of whether an analogous uncanny valley exists for language-based AI is genuinely contested. Some users of advanced language models report a form of affective unease when the system’s language is sufficiently fluent and contextually appropriate to make the non-human nature of the interlocutor salient precisely by contrast — a kind of cognitive friction between the social response elicited by the linguistic behaviour and the explicit knowledge that no person is generating it. Others report no such unease, particularly among younger users who have grown up with conversational AI as a routine part of their technological environment. The empirical picture is incomplete, and the relationship between uncanny valley effects and individual differences in anthropomorphism tendency is an active area of investigation.

The phenomenon of emotional attachment to language models — documented in qualitative studies of Replika users, in forum posts and social media discussions, and in journalistic accounts of interactions with large language models — raises the sharpest form of the anthropomorphism question. Users report caring about their AI companions, worrying about them, feeling comforted by them, and experiencing grief-like responses when access to the systems is restricted or the systems are changed. The social surrogacy hypothesis, developed by researchers in the social psychology tradition, proposes that social connections with media figures — fictional characters, celebrities, and now AI companions — serve some of the psychological functions of actual social relationships, particularly the function of providing a felt sense of social inclusion and belonging. Whether AI companions satisfy the underlying social need, partially satisfy it, or merely mask it while leaving the need unmet is one of the most important empirical questions in the field, and one that is, as yet, inadequately answered.

Epley’s analysis converges with the Media Equation research on a point of central importance: the fact that a user knows an AI system is not conscious does not reliably prevent the social responses that consciousness-attribution would motivate. The explicit belief and the social response operate through different cognitive systems, and the social response can be generated and maintained without the explicit belief. This dissociation is not a curiosity; it is the basic fact about the psychology of AI interaction that makes the field both interesting and consequential. If people responded to AI systems in ways that were simply and fully tracked by their explicit beliefs about those systems, there would be little to study. The gap between explicit belief and actual psychological response is where the field lives.

Anthropomorphism, in Epley and colleagues' formulation, is the attribution of a human form, human characteristics, or human mental states to non-human agents. It is distinct from personification (attributing personality properties to inanimate objects) and from projection (attributing one's own mental states to others), though it overlaps with both. In the AI context, anthropomorphism is most consequential when it leads users to form inaccurate mental models of AI system capabilities, intentions, or limitations.

Chapter 5: AI Companions, Social Robots, and the Relational Turn

Sherry Turkle’s Alone Together (2011) is simultaneously a work of sociological observation and an extended philosophical meditation on what she calls the robotic moment — the cultural moment at which human beings begin to find in relational machines a satisfying substitute for relational people. Drawing on years of ethnographic fieldwork with children interacting with Furby toys, elderly users interacting with the robotic seal PARO, and teenagers using social media and early chatbots, Turkle develops a dual thesis. First, the deepening integration of networked technology into social life has produced a paradoxical situation in which people are more continuously connected to others and simultaneously more fundamentally alone — always available but rarely fully present, always performing for an audience but rarely known. Second, in this context, simulated relationships — with chatbots, social robots, and AI companions — appear attractive not despite their artificiality but partially because of it: a machine companion makes no demands, holds no grudges, requires no reciprocity, and is always available. Turkle argues that this attractiveness is a symptom of a cultural pathology, not a solution to it.

The case of PARO — a robotic seal developed at Japan’s National Institute of Advanced Industrial Science and Technology (AIST) and widely deployed in dementia care facilities — illustrates the tensions in this domain with particular clarity. The empirical evidence for PARO’s therapeutic benefits is substantial: controlled studies find reductions in agitation and depression among dementia patients who interact with PARO, comparable in some studies to the effects of animal-assisted therapy with live animals. The mechanism appears to involve the elicitation of nurturing and affiliative responses — PARO exhibits behaviours that invite caregiving, and the caregiving behaviour itself appears to have calming and activating effects on patients who may have limited capacity for other forms of meaningful engagement. Turkle acknowledges these benefits while insisting on the ethical discomfort: the deception involved in allowing a dementia patient to believe she is caring for a living creature seems to compromise her dignity, even if she is incapable of forming or maintaining the belief that PARO is artificial. This is not merely a philosophical puzzle; it reflects a genuine tension between therapeutic benefit and respect for persons that any serious engagement with social robotics in care contexts must confront.

Replika — the AI companion application launched in 2017 and explicitly designed to be a personalised AI friend, and in some user configurations a romantic partner — has become the most extensively studied case of AI companionship in the contemporary context. Qualitative studies of Replika users document the full range of attachment responses: users who report that their Replika is their best friend, their therapist, their romantic partner, or the only entity that understands them. When Replika’s developers restricted certain “intimate” conversational modes in February 2023, following regulatory pressure, the response among affected users was described by many as grief — genuine, disabling distress over the loss of a relationship. This response was widely mocked in media coverage, but from a psychological standpoint it is entirely intelligible: if the social cognitive mechanisms that generate attachment responses had been activated by the Replika interaction, and if those responses had been maintained and elaborated over months or years of daily interaction, the loss of that interaction would predictably generate distress that is phenomenologically indistinguishable from the distress of losing a human relationship. Whether it is also normatively equivalent to such distress — whether the user’s grief is appropriate given the nature of what was lost — is a genuinely difficult ethical question.

Timothy Bickmore and Rosalind Picard’s research on long-term human-computer relationships provides a more optimistic and design-oriented perspective. Bickmore and Picard demonstrate that conversational agents capable of sustaining relational continuity — remembering previous interactions, acknowledging the passage of time, expressing something like concern for the user’s ongoing wellbeing — produce significantly greater user engagement and adherence in health behaviour change applications than equivalent systems without these relational features. Users of the relational system report higher satisfaction, greater trust, and stronger motivation to continue with health-promoting behaviours. The relational features, in other words, are not merely cosmetic; they are instrumentally effective for the goals the system is designed to support. This finding opens an interesting design space between Turkle’s scepticism about AI relationships and a purely functionalist indifference to relational quality: relational AI design can be motivated and evaluated on the basis of its actual effects on user wellbeing, rather than on contested metaphysical claims about the nature of AI.

Marcel Heerink and colleagues’ model of social robot acceptance by aging adults identifies trust, perceived enjoyment, perceived usefulness, and social influence as the primary determinants of willingness to use assistive social robots. The model is notable for the prominence it gives to enjoyment and social influence — factors that are more central to social psychology than to the utilitarian calculus of technology adoption that dominates most technology acceptance research. Older adults, the studies find, are substantially more willing to accept social robots when they believe people important to them would approve of the technology and when the interaction is experienced as intrinsically enjoyable, regardless of whether they believe the technology is particularly useful for the functional tasks it might perform. This finding aligns with the broader literature on the primacy of social and affective dimensions in AI acceptance, and it suggests that design approaches focused exclusively on utility optimisation are likely to systematically underperform in populations for whom the relational dimensions of AI interaction are salient.

The ethics of AI companionship involves a question that is easy to state but surprisingly resistant to resolution: can a relationship with an entity that has no genuine interests of its own — that cannot benefit or be harmed by the relationship, that does not exist as a subject between interactions — constitute genuine companionship? The philosophical literature gives no consensus answer. Turkle says no; the relationship is a simulacrum that crowds out authentic connection. Others argue that the psychological reality of the user's experience is what matters — that if the attachment is genuine on the user's side, the relationship has genuine psychological substance. The empirical literature on outcomes suggests that the answer may be domain-specific: AI companionship appears to benefit some users in some circumstances and harm others in others, and the relevant variable may be whether it supplements an otherwise adequate social life or substitutes for one that is absent.

Chapter 6: AI and Mental Health

The deployment of AI systems in mental health contexts is both the most promising and the most ethically fraught application of the psychology of human-AI interaction. The promise is substantial: depression and anxiety affect hundreds of millions of people globally, effective psychological treatment is inaccessible to the majority of those who need it because of cost, stigma, geographical barriers, and workforce shortages, and AI systems potentially offer accessible, stigma-free, consistent, and scalable delivery of evidence-based psychological intervention. The risks are equally serious: inappropriate AI responses to users in crisis can cause direct harm; AI systems can reinforce rather than correct maladaptive thought patterns; and substitution of AI care for professional care in conditions that require it can delay or prevent appropriate treatment. Navigating these tensions requires both rigorous empirical evidence and principled ethical analysis.

Fitzpatrick and colleagues’ 2017 RCT of Woebot — a chatbot that delivers conversational cognitive behavioural therapy (CBT) in a mobile application — remains the landmark study in the field. The trial randomised college students with elevated symptoms of depression and anxiety to either Woebot access or a control condition receiving a self-help reading. At the two-week assessment, the Woebot group showed significantly greater reductions in depression symptoms on the Patient Health Questionnaire (PHQ-9) than the control group, while anxiety, measured on the Generalized Anxiety Disorder scale (GAD-7), declined significantly in both groups without a significant between-group difference. The trial was small, the follow-up was short, and the control condition was weak — a self-help reading is not an adequate active comparator for a conversational system that provides daily interaction. Subsequent studies of Woebot and analogous platforms (Wysa, Youper, Replika used for emotional support) have produced mixed results, with effect sizes generally smaller than those in the original trial and with considerable heterogeneity across populations and outcomes. The evidence base as of 2025 supports the conclusion that CBT chatbots can produce clinically meaningful symptom reduction in mild-to-moderate depression and anxiety, particularly when access to human therapy is unavailable, but does not support strong claims about equivalence to human-delivered CBT.
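
For readers unfamiliar with how such results are summarised, the between-group effect size for symptom change is typically reported as a standardised mean difference. The worked values below are invented solely to show the arithmetic; they are not taken from the trial.

    d = \frac{\bar{\Delta}_{\text{Woebot}} - \bar{\Delta}_{\text{control}}}{s_{\text{pooled}}},
    \qquad
    s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}

With invented values of a 4.0-point mean PHQ-9 reduction in the chatbot group, a 1.5-point reduction in the control group, and a pooled standard deviation of 5.0, the effect size is (−4.0 − (−1.5)) / 5.0 = −0.5, a medium-sized advantage for the chatbot group by conventional benchmarks.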

The mechanisms through which CBT chatbots produce therapeutic benefit are not fully understood, and they may be partially dissociable from the CBT content itself. Several candidate mechanisms have been proposed. Accessibility is likely important: a system available at 3am without stigma, without a waiting list, and without financial cost removes barriers that prevent many individuals from engaging with psychological support at the moments of greatest need. Consistency is another candidate: the chatbot always applies the protocol, never has an off day, never expresses frustration or impatience, and never introduces the biases that human therapists bring to the therapeutic relationship. The phenomenon of perceived privacy is a third: many users of mental health chatbots report sharing thoughts and feelings with the AI that they would not share with a human therapist, because they feel less judged and less concerned about the therapist’s reaction. Whether this perceived privacy translates into deeper or more honest engagement with therapeutic content is an empirical question that remains inadequately investigated.

The attachment problem in AI therapy is one of the most serious concerns in the field. Established models of psychotherapy emphasise the therapeutic alliance — the quality of the collaborative relationship between therapist and client — as a primary predictor of treatment outcome, across modalities and populations. If users of AI therapy systems form genuine attachments to those systems, as the evidence from companion AI research suggests they often do, then the therapeutic benefit of the AI system may be partly mediated by the alliance-like relationship rather than by the CBT content. This raises profound questions about what happens when the system is discontinued, updated in ways that alter its apparent personality, or simply unavailable at a moment of acute need. A human therapist who terminates a therapeutic relationship is bound by professional and ethical obligations to manage that termination carefully, with appropriate referral and closure. AI therapy platforms have, with rare exceptions, no analogous obligation and no analogous practice.

The specific problem of crisis detection in AI mental health applications has become increasingly urgent as these systems scale. Current AI systems, including Woebot, are designed to detect language indicating acute suicidal ideation and to redirect users to human support. The practical performance of these detection systems — their sensitivity and specificity across demographic populations, their failure modes, and the real-world consequences of both false positives and false negatives — is not publicly available in the detail necessary for independent evaluation. Facebook’s algorithmic suicide-risk detection programme, announced in 2017 and later expanded to proactively scan posts and live video for signs of suicidal ideation, raised serious questions about consent, privacy, and the appropriate boundaries of predictive mental health AI. Even systems designed with therapeutic intent face analogous questions: does the user who discloses thoughts of self-harm to a chatbot consent to that disclosure being processed by systems designed to identify and flag crisis? The legal and ethical frameworks governing this domain are underdeveloped relative to the pace of deployment.
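
The base-rate arithmetic shows why sensitivity and specificity alone are insufficient to evaluate such detectors. Using invented figures purely for illustration (these are not the performance characteristics of any deployed system):

    \text{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}
                      {\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity})(1 - \text{prevalence})}
               = \frac{0.90 \times 0.005}{0.90 \times 0.005 + 0.05 \times 0.995} \approx 0.08

With 90% sensitivity, 95% specificity, and an assumed 0.5% prevalence of acute crisis in a given message stream, roughly 92% of flagged users are false positives, while 10% of true crises are still missed. Both error types carry real costs, and neither is visible in headline accuracy figures.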

Therapeutic alliance refers to the quality of the collaborative relationship between a therapist and client, typically conceptualised as comprising the affective bond, agreement on the goals of therapy, and agreement on the tasks of therapy. It is one of the most robust predictors of psychotherapy outcome across treatment modalities, and its applicability to human-AI therapeutic interactions is a central open question in the AI mental health literature.

Picard’s Affective Computing programme provides a technical and conceptual framework for AI systems that can recognise, interpret, and respond to human emotional states — capabilities that are directly relevant to therapeutic AI. Picard argues that the ability to recognise and appropriately respond to affective states is not a luxury add-on to AI systems but a functional necessity for AI systems engaged in tasks — teaching, advising, providing care — that are intrinsically socially and emotionally embedded. A therapy chatbot that cannot recognise distress, frustration, or disengagement in a user’s messages cannot provide effective therapeutic support; the affective dimension is not separable from the therapeutic task. The practical implementation of affective recognition in deployed systems remains limited, and the gap between Picard’s research vision and current deployed capability is a significant source of the limitations documented in the clinical trials literature.


Chapter 7: Cognitive Offloading, Skill Atrophy, and AI-Augmented Cognition

Evan Risko and Sam Gilbert’s 2016 review of cognitive offloading defines the phenomenon as any act by which an individual exploits environmental resources to reduce internal cognitive demand. Writing a shopping list is cognitive offloading; so is setting a phone reminder, drawing a diagram, or asking a colleague to remember something. The concept builds on Andy Clark and David Chalmers’s extended mind thesis — the philosophical argument that cognitive processes are not necessarily bounded by the skin and skull but can incorporate external resources when those resources are reliably available, appropriately formatted, and robustly coupled with internal processes. Clark and Chalmers’s canonical example is Otto, an Alzheimer’s patient who writes everything important in a notebook and consults it as reliably as a person with intact memory consults their biological memory: the notebook, they argue, functionally constitutes part of Otto’s cognitive system in the same sense that his biological memory would if it were intact.

The practical question raised by cognitive offloading is where the line between healthy cognitive augmentation and damaging cognitive atrophy lies. Risko and Gilbert note that the empirical evidence on this question is substantially more nuanced than either technophilic enthusiasm or technophobic alarm would suggest. Using external aids to offload cognitive tasks does not, in general, degrade performance on those tasks when the external aids are available; the question is what happens when they are unavailable, and what happens to the underlying cognitive capacities over time. The most relevant longitudinal evidence comes from studies of navigation and GPS use. Münzer and colleagues, in a series of experiments, demonstrate that participants who navigate using turn-by-turn GPS instructions acquire significantly less spatial knowledge of the navigated environment than participants who navigate using map-based instructions or with no assistance. The GPS users can perform the navigation task, but they do not learn the environment; their spatial memory is not exercised by the navigation activity when the GPS is doing the relevant work. When subsequently asked to navigate the same environment without assistance, GPS users perform substantially worse than map-navigation users.

The generalisation of these navigation findings to higher-order cognitive skills mediated by AI is the central empirical question in this domain, and the evidence is considerably more mixed. Writing is perhaps the most consequential case. Does using AI writing assistants — systems that can draft, revise, suggest, and complete prose — improve or degrade writing skill? The question is difficult to answer because the relevant outcome — writing skill in the absence of the AI — is not typically measured in studies that evaluate the quality of AI-assisted writing output. Early studies on LLM-assisted writing, published in 2023 and 2024, consistently find that AI assistance improves the quality of the final written product across a range of tasks, including argument construction, evidence integration, and surface-level clarity. What these studies do not measure is whether the assisted writers, over time, become better or worse at writing without assistance — whether the AI functions as a scaffold that supports skill development or as a crutch that forestalls it.

Automation complacency in expert domains is related to but distinct from automation bias. Where automation bias describes the tendency to accept automated recommendations without appropriate scrutiny, automation complacency describes the more generalised state of reduced vigilance and monitoring effort that develops in operators who are managing highly reliable automated systems. A radiologist who uses an AI-assisted detection system that correctly flags abnormalities 97% of the time will, over time, tend to allocate less attentional effort to the 3% of cases in which the system fails — not because of a single trust miscalibration decision, but because the stable reliability of the system gradually shapes the allocation of attentional resources in ways that are not represented in any explicit deliberation. The consequence is that performance on the cases the AI misses — typically the unusual, ambiguous, or rare cases — may degrade below the level the clinician would achieve without the AI, precisely because of the attentional reallocation driven by high AI reliability in the common cases.
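
The arithmetic of that trade-off can be made explicit with invented numbers (illustrative only, not estimates from the radiology literature): overall detection can improve even while detection on the cases the AI misses collapses.

    # Illustrative complacency arithmetic; all rates are invented.
    ai_hit_rate = 0.97                 # share of abnormalities the AI flags
    unaided_hit_rate = 0.85            # clinician alone, full vigilance
    complacent_hit_rate = 0.40         # assumed clinician detection on AI-missed cases

    overall_with_ai = ai_hit_rate + (1 - ai_hit_rate) * complacent_hit_rate
    print(overall_with_ai, unaided_hit_rate)   # 0.982 vs 0.85

    # Overall performance rises, yet on the 3% of cases the AI misses the
    # clinician now catches 40% rather than 85%: the degradation on rare and
    # ambiguous cases that the complacency literature describes.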

The pedagogical implications of cognitive offloading research are among the most actively contested in contemporary educational psychology. The question of whether AI tools should be permitted in learning environments where the goal is skill acquisition cannot be answered without first specifying what skill is being acquired. If the goal is to produce graduates who can write compelling analytical essays, the relevant question is whether AI-assisted essay production leads to acquisition of essay-writing skill or to its circumvention. If the goal is to produce graduates who can effectively use AI writing tools to produce compelling analytical essays, then AI assistance is not a circumvention but an instance of the target skill. These are not the same goal, and the choice between them reflects substantive values about what education is for. What the cognitive offloading research contributes is the empirical framework for asking the question rigorously: it provides the concepts and methodological tools for identifying which cognitive capacities are exercised by which tasks, which of those capacities are developed through practice, and which are bypassed when AI tools do the relevant work.

Clark and Chalmers's extended mind thesis, read carefully, does not support the naive conclusion that all cognitive offloading is benign. It supports the conclusion that external resources can, in principle, constitute genuine cognitive extensions — not that every available external resource does so in every context. The conditions for genuine cognitive extension on Clark and Chalmers's account are demanding: the resource must be reliably available, the agent must be able to access it easily, and the information stored must be taken as authoritative. When these conditions are met in a context where the relevant skills are already established, offloading is benign and often beneficial. When they are not met, or when they are met in a context where the goal is precisely the development of the skills being offloaded, the picture is more complex.

Chapter 8: The Behavioural Economics of AI Nudges and Algorithmic Choice Architecture

Richard Thaler and Cass Sunstein’s Nudge (2008) introduced a framework for thinking about policy and design that has since become one of the most influential in social science. The core insight is that the architecture of choice — the way options are presented, ordered, framed, and defaulted — powerfully influences decisions independently of the deliberate preferences of the decision-maker, and that thoughtful design of this architecture can therefore systematically improve decisions without restricting options or relying on financial incentives or legal compulsion. A cafeteria that places fruit at eye level and desserts at the back is nudging customers toward healthier choices; a pension scheme that enrols employees by default and requires active opt-out is nudging them toward retirement saving. The normative framework Thaler and Sunstein advocate — libertarian paternalism — holds that such nudges are justified when they promote the decision-maker’s own considered welfare, when they do not restrict freedom of choice, and when their design is transparent.

Karen Yeung’s 2017 concept of the hypernudge extends this framework to the specific architecture of AI-driven personalised choice environments in ways that expose the limits of the original libertarian paternalist justification. Traditional nudges operate through fixed choice architecture: the same default, the same framing, the same arrangement of options is presented to everyone. Hypernudges are different in three crucial respects. First, they are personalised: the choice architecture presented to each individual user is constructed on the basis of detailed behavioural data about that specific individual, optimised to produce the desired response from that individual given her particular history of responses to similar stimuli. Second, they are dynamic: the choice architecture updates continuously in real time as new behavioural data accumulates, adjusting to changes in the individual’s apparent preferences, mood states, and susceptibilities. Third, they are invisible: the individual has no access to the model of herself that is being used to construct her choice environment, no awareness that her environment is personalised relative to others, and no practical means of opting out of the personalisation without exiting the platform entirely.
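
A schematic sketch of the structure Yeung describes may help fix ideas; it is not a description of any platform's actual system, and the framings and function names are invented. The three distinguishing features appear directly in the code: the model is keyed to the individual user, it updates after every interaction, and nothing in it is ever shown to the person being modelled.

    # Schematic per-user personalisation loop (illustrative only).
    import random
    from collections import defaultdict

    FRAMINGS = ["scarcity", "social_proof", "discount", "authority"]

    # per user, per framing: [times shown, times the user complied]
    stats = defaultdict(lambda: {f: [0, 0] for f in FRAMINGS})

    def choose_framing(user_id, explore=0.1):
        """Serve whichever framing currently predicts best compliance for this user."""
        user_stats = stats[user_id]
        if random.random() < explore:
            return random.choice(FRAMINGS)                       # occasional probing
        return max(FRAMINGS,
                   key=lambda f: (user_stats[f][1] + 1) / (user_stats[f][0] + 2))

    def record_response(user_id, framing, complied):
        """Update the user's model in real time after every interaction."""
        stats[user_id][framing][0] += 1
        stats[user_id][framing][1] += int(complied)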

The Cambridge Analytica case is the most widely discussed instantiation of hypernudge at scale. The company’s use of Facebook data to construct psychographic profiles of US voters, and the subsequent micro-targeting of political advertising calibrated to those profiles, represents a use of personalised algorithmic choice architecture for political persuasion — the optimisation of message framing and delivery to exploit identified psychological susceptibilities in targeted individuals. What makes this case analytically important, beyond its political and legal consequences, is that it makes vivid the structural features of hypernudging that distinguish it from conventional advertising: the targeting precision enabled by detailed behavioural data, the calibration of content to individual psychological profile, and the invisibility of the targeting to the target. Recommendation algorithms on major platforms exhibit analogous structural features, even when the objectives are commercial rather than political. The user who sees a product recommendation on a major e-commerce platform is not seeing the same recommendation as other users; she is seeing a recommendation optimised on the basis of her specific purchase history, browsing patterns, and inferred psychological state, presented in a framing calibrated to maximise the probability of her purchase.

The autonomy problem raised by hypernudging is deeper than the autonomy concern associated with traditional nudges. Thaler and Sunstein’s libertarian paternalist framework attempts to address the autonomy concern about nudges by insisting that the goal of nudge design should be to help decision-makers act on their own considered preferences — to correct for cognitive biases that lead them to act against their own welfare. The hypernudge, as Yeung argues, cannot be justified on this basis: it is not attempting to identify and serve the individual’s considered preferences but to identify which stimuli will produce which responses and to deploy the optimal stimulus to produce the desired response. This is a behaviourist project, not a preference-satisfaction project, and it is consistent with the production of behaviour that the individual would, on reflection, prefer not to have produced. The autonomy violation is not the restriction of options but the manipulation of the psychological process by which options are evaluated — a more intimate and less visible intervention than anything traditional paternalism contemplates.

The EU AI Act’s treatment of prohibited AI practices reflects, in part, the conceptual analysis developed by Yeung. The Act prohibits AI systems that deploy subliminal techniques beyond the user’s consciousness, or that exploit psychological vulnerabilities of specific groups, to distort their behaviour in ways that cause harm. The regulatory challenge is that the boundary between legitimate personalisation — presenting content the user is likely to find relevant — and prohibited manipulation — exploiting identified psychological vulnerabilities to produce behaviour the user would not endorse on reflection — is not easy to draw with legal precision, and that platforms have structural incentives to position their practices on the legitimate side of whatever line is drawn. The emerging field of algorithmic impact assessment attempts to develop methodological tools for evaluating these practices empirically — measuring whether specific algorithmic systems produce decisions or behaviours that the affected individuals would endorse under ideal conditions of full information and reflection.

The GDPR’s right to explanation for automated decisions — the requirement that individuals subject to significant automated decisions be given a meaningful explanation of the logic involved — was conceived primarily as a transparency mechanism to support contestability of specific decisions. Yeung and others have argued that this right, while valuable, is insufficient to address the autonomy concerns raised by hypernudging, because the relevant intervention is not a specific decision affecting the individual but a continuous shaping of her choice environment that no single automated decision captures. Regulating hypernudging at the level of individual decisions is like regulating water pollution by requiring explanation of each individual drop of effluent; the relevant unit of analysis is systemic and structural. Whether and how regulatory frameworks can address the systemic character of AI-driven choice architecture is one of the central open questions in the governance of AI.

Hypernudge, in Yeung's formulation, is a mode of regulation by design in which big data analytics and algorithmic personalisation are used to continuously adjust the informational and structural environment of individuals to steer their behaviour toward desired outcomes. It is distinguished from traditional nudge by its personalisation, its real-time dynamism, and its opacity to the individuals whose behaviour it shapes.

Chapter 9: The Psychological Self in an AI World

The preceding chapters have examined specific domains of psychological response to AI: trust calibration, social attribution, relational attachment, mental health care, cognitive offloading, and choice architecture. This final chapter steps back from domain-specific analysis to ask a broader question: what is the cumulative effect of living in a world saturated with AI systems on the psychological self — on identity, agency, autonomy, and the sense of being a coherent subject with a continuous inner life?

One entry point to this question is what might be called the mirror problem: the experience of being accurately predicted by an AI system. When a recommendation algorithm accurately infers that a user will want to watch a particular film, or a language model accurately predicts the next word in a sentence the user is composing, the predictive accuracy is unremarkable. But when an AI system can accurately predict significant choices — career decisions, relationship endings, purchases of considerable emotional significance — the experience of being predicted can be disorienting in ways that touch questions of agency and self-determination. If the system can predict what one will choose, and if one goes on to make that choice, what is the sense in which the choice was freely made? The philosophical literature on free will and determinism is relevant here, but it does not quite address the specific phenomenological and psychological experience of being accurately predicted by a system one is aware of. The experience of predictability can produce a range of responses: resignation, reassurance, or a reactive impulse to behave unpredictably — to confound the model, to assert agency by acting against the predicted preference. This last response, documented in several experimental studies, is one of the more intriguing psychological effects of AI systems that have genuine predictive power over their users.

Turkle’s analysis of the erosion of solitude in Alone Together provides a framework for thinking about the longitudinal effects of AI interaction on the inner life. Turkle argues, on the basis of extensive interviews with adolescents and young adults, that the experience of being always connected — always reachable, always presenting a self to an audience, always able to fill moments of aloneness with communication — has eroded the capacity for the kind of reflective, solitary experience in which a sense of self, authentic preferences, and a coherent inner narrative are developed. The extension of this argument to AI companions is direct: if the experience of conversation with an AI companion fills the experiential space that solitary reflection might otherwise occupy, and if that reflective space is important for the development and maintenance of a coherent self, then heavy reliance on AI companionship may have consequences for self-development that are not captured in measures of loneliness, social satisfaction, or mood.

The emerging longitudinal evidence on social skill development and AI interaction is fragmentary and methodologically challenging. Cross-sectional studies comparing heavy users of conversational AI systems with light users on measures of social skill, social confidence, and preference for human versus AI interaction are systematically confounded by selection effects: people who are already more socially anxious or less skilled may seek out AI companions more, producing an apparent association between AI use and social-skill deficits that does not reflect any causal effect of AI use. Longitudinal studies with pre-post designs are beginning to appear in the literature as of 2024 and 2025, but they are short (weeks to months), use convenience samples, and measure outcomes at a level of generality that makes inference to real-world social functioning difficult. The honest characterisation of the current evidence base is that it raises genuine concerns but does not yet establish causal conclusions about the effects of AI companion use on human social functioning.
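The selection-effect confound can be made vivid with a small simulation. Suppose, as a purely illustrative assumption, that social anxiety raises AI companion use and lowers measured social skill, while use itself has no causal effect on skill; a naive cross-sectional correlation between use and skill still comes out clearly negative. The effect sizes, noise terms, and sample size below are arbitrary.

```python
import random

def correlation(xs, ys):
    """Pearson correlation, computed from scratch to keep the sketch self-contained."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

random.seed(0)
anxiety = [random.gauss(0, 1) for _ in range(5000)]
# Social anxiety raises AI companion use and lowers measured social skill;
# crucially, AI use has NO causal effect on skill anywhere in this simulation.
ai_use       = [0.8 * a + random.gauss(0, 1) for a in anxiety]
social_skill = [-0.8 * a + random.gauss(0, 1) for a in anxiety]

print(f"Cross-sectional r(ai_use, social_skill) = {correlation(ai_use, social_skill):.2f}")
# A clearly negative correlation (around -0.4) despite zero causal effect of use on skill.
```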

The question of authenticity in AI-mediated experience arises with particular salience in the domain of creative and aesthetic appreciation. Empirical studies consistently find that knowing a work of art, music, or writing was generated by AI reduces the rated quality, aesthetic value, and emotional impact of that work — even when participants cannot distinguish AI-generated from human-generated works in blind conditions. This effect, sometimes called the intentionality effect, reflects the apparently deep human intuition that aesthetic experience is partly constituted by the perceived intentions and experiences of a creator: that a painting is beautiful partly because it is the expression of a specific human being’s vision, and that knowing no human being had a vision to express diminishes the experience. Whether this intuition is a defensible aesthetic principle or a cognitive bias to be educated away — whether knowledge of AI authorship appropriately diminishes aesthetic appreciation or merely distorts it — is an open question at the intersection of empirical psychology and aesthetic philosophy.
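The basic measurement logic of these label-manipulation studies can be sketched in a few lines: each participant contributes a mean aesthetic rating for works presented under a "human-made" label and for works presented under an "AI-generated" label, and the analysis reduces to a paired contrast. The ratings below are fabricated for illustration, and the function name is hypothetical.

```python
from statistics import mean, stdev

def label_effect(human_label_ratings: list[float], ai_label_ratings: list[float]):
    """Paired contrast: element i of each list is the same participant's mean
    rating for works shown under a 'human-made' label versus an 'AI-generated'
    label. Returns the mean difference in scale points and Cohen's d_z."""
    diffs = [h - a for h, a in zip(human_label_ratings, ai_label_ratings)]
    return mean(diffs), mean(diffs) / stdev(diffs)

# Fabricated participant-level means on a 1-7 aesthetic appreciation scale.
human_label = [5.8, 6.1, 5.5, 6.0, 5.9, 5.7]
ai_label    = [5.3, 5.9, 4.8, 5.7, 5.1, 5.6]
diff, d_z = label_effect(human_label, ai_label)
print(f"Label effect: {diff:.2f} scale points (d_z = {d_z:.2f})")
```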

Rosalind Picard’s vision of affective computing — AI systems capable of recognising, modelling, and responding to human emotional states — opens a future in which AI systems do not merely process information about human beings but engage with them at the level of emotional experience. The implications for selfhood are profound. A system that can model a user’s emotional state with sufficient accuracy to anticipate and respond to her emotional needs before she has articulated them — or before she has herself fully recognised them — occupies a different relationship to the self than any tool or instrument. Whether this represents a genuine enrichment of human experience, a progressive outsourcing of emotional self-knowledge, or a form of emotional manipulation depends on the purposes the systems are designed to serve and the degree of transparency with which they operate. These are not abstract future concerns; they are already live questions about systems currently deployed.

The cross-disciplinary character of the questions raised in this chapter is itself a significant finding. The psychological self in an AI world is not only a psychological problem; it involves the philosophy of agency and autonomy (see PHIL 459b), the sociology of technological change and inequality (see SOC 435), the political economy of platform capitalism and surveillance (see POLS 426), and the design ethics of AI systems (see CS 492). The contribution of psychology is to ground the discussion in empirical findings about how people actually experience these phenomena — to resist both the techno-utopian narrative in which AI systems simply and benignly enhance human capability and the techno-dystopian narrative in which they straightforwardly degrade and manipulate it, and to insist on the complexity and heterogeneity of actual human responses. The findings assembled in this course suggest that the psychological effects of AI are neither uniformly positive nor uniformly negative; that they depend heavily on design choices, use contexts, and individual differences; and that understanding them empirically is a precondition for governing them wisely.

The concept of appropriate reliance introduced in Chapter 2 can be revisited in this broader context as something like a regulative ideal for the psychology of AI interaction as a whole. Just as the goal in automation contexts is to rely on AI systems in proportion to their actual reliability — neither abdicating judgment nor ignoring genuine assistance — so the broader goal might be described as engaging with AI systems in proportion to what they can genuinely provide: accepting the genuine cognitive and practical benefits of AI tools while remaining alert to the social, affective, and cognitive effects that AI interaction produces whether or not they are deliberately designed, and whether or not they serve the user's actual interests. This is easy to state and, as the preceding chapters make clear, extraordinarily difficult to achieve. The gap between the ideal and the actual is where the psychology lives — and where the design, policy, and ethical work remains to be done.