FINE 430: Generative AI and Creative Practice

Estimated study time: 1 hr 28 min

Table of contents

  • Why make it up
  • Chapter 1: The Generative Turn — What Changed in 2022
  • Chapter 2: How Generative Models Work — Enough to Think With
  • Chapter 3: Prompt as Medium — Aesthetic Practice in the Age of Text-to-Image
  • Chapter 4: The Philosophy of Machine Creativity
  • Chapter 5: Key Artists and Practices
  • Chapter 6: The Legal and Economic Storm
  • Chapter 7: Critical Frameworks — Art, Technology, and Power
  • Chapter 8: New Forms and New Questions — Video, 3D, Music, and Text

Why make it up
FINE 130 (Digital Imaging) and GBDA 101 (Digital Media Design) sit firmly in the pre-generative era, and CS 798 (AI Music Generation) is music-only. UW has nothing on the visual and cross-medium generative wave that has restructured artistic practice since 2022. This course teaches GANs (Goodfellow et al.), latent diffusion (Rombach et al.), Elgammal’s Creative Adversarial Networks, and the philosophy of machine creativity (Boden, McCormack and d’Inverno, Manovich) alongside the legal-cultural firestorm (Andersen v. Stability AI, Getty v. Stability AI, Théâtre D’opéra Spatial) and the artists who shaped the response (Refik Anadol, Holly Herndon, Memo Akten, Trevor Paglen). The syllabus is anchored in CMU 60-419, NYU ITP’s AI Arts, RCA Computational Arts, SAIC AI in Art, and MIT 4.S58.
  • Goodfellow, Ian, et al. “Generative Adversarial Nets.” NIPS 2014.
  • Rombach, Robin, et al. “High-Resolution Image Synthesis with Latent Diffusion Models.” CVPR 2022.
  • Elgammal, Ahmed, et al. “CAN: Creative Adversarial Networks, Generating ‘Art’ by Learning About Styles and Deviating from Style Norms.” ICCC 2017.
  • Boden, Margaret A. The Creative Mind: Myths and Mechanisms, 2nd ed. Routledge, 2004.
  • McCormack, Jon, and Mark d’Inverno, eds. Computers and Creativity. Springer, 2012.
  • Manovich, Lev. AI Aesthetics. Strelka Press, 2018.
  • Bishop, Claire. Artificial Hells: Participatory Art and the Politics of Spectatorship. Verso, 2012.
  • Benjamin, Walter. “The Work of Art in the Age of Mechanical Reproduction.” 1935. In Illuminations, ed. Hannah Arendt. Schocken Books, 1969.
  • Ai, Weiwei. Humanity. Princeton University Press, 2018.
  • Paglen, Trevor. “Invisible Images (Your Pictures Are Looking at You).” The New Inquiry, December 2016.
  • Herndon, Holly, and Mat Dryhurst. “Spawning and the Politics of AI Training Data.” 2023. spawning.ai.
  • Andersen v. Stability AI Ltd. No. 3:23-cv-00201 (N.D. Cal. 2023).
  • Getty Images v. Stability AI Ltd. No. 1:23-cv-00135 (D. Del. 2023).
  • Chung, Seung-Yeon, et al. “Théâtre D’opéra Spatial and the Colorado State Fair Controversy.” 2022. Discussion in ARTnews and The Atlantic.
  • Cave, Nick. “ChatGPT Wrote a Nick Cave Song. Here’s His Response.” The Red Hand Files, Issue 218, January 2023. theredhandfiles.com.
  • Klingemann, Mario. “Art in the Age of Neural Networks.” NeurIPS Creative AI Workshop, Montreal, 2018.
  • Steyerl, Hito. “A Sea of Data: Apophenia and Pattern (Mis-)Recognition.” e-flux journal 72, April 2016.
  • Steyerl, Hito. “Mean Images.” New Left Review 140/141, March–June 2023.
  • Grimes. “I’ll split 50% royalties on any successful AI generated song that uses my voice.” Post on X (formerly Twitter), April 24, 2023.
  • McCartney, Paul. Interview with BBC Radio 4 Today programme. June 13, 2023. On AI and the completion of “Now and Then.”
  • Authors Guild. “Open Letter on Artificial Intelligence.” Authors Guild, July 2023. authorsguild.org.
  • Online resources: CMU 60-419 course materials; NYU ITP AI Arts blog; RCA Computational Arts programme descriptions; SAIC New Arts Journalism programme; MIT 4.S58 materials; Refik Anadol Studio documentation; Holly Herndon official site; Memo Akten studio notes; Trevor Paglen project documentation; The Red Hand Files archive; Civitai community documentation; ComfyUI project documentation.

Chapter 1: The Generative Turn — What Changed in 2022

The history of art made with algorithms stretches back further than the discourse of 2022 tends to acknowledge. Vera Molnár, working in Paris from the late 1960s onward, produced plotted drawings by specifying geometric transformations in code and introducing controlled randomness into their parameters — works that were already asking whether systematic rule-following could generate aesthetic surprise. Harold Cohen built AARON, a rule-based program capable of producing original drawings that he exhibited internationally, and spent decades refining its capacity to handle colour, shading, and compositional variety. By the time Processing was released in 2001 and openFrameworks followed in 2004, a generation of artists had access to lightweight, community-supported environments for writing generative visual code, and an entire sub-genre of computational art — latticed, flowing, data-driven — had established itself in galleries, festivals, and academic programmes. The aesthetic vocabulary of that era, oriented toward code-as-craft, required the artist to understand algorithms deeply and to shape their visual output through careful technical intervention. The gap between intending an image and producing it remained substantial, and it was precisely in navigating that gap that the artistic practice lived.

What changed between April and August 2022 was the collapse of that gap for images produced in a different way. OpenAI’s DALL-E 2 in April 2022, Midjourney’s public beta in July, and the open-source release of Stable Diffusion in August represented not incremental refinement but a threshold crossing: for the first time, producing a high-quality, photorealistic, or stylistically sophisticated visual image required no technical training, no programming knowledge, and no manual craft skill. It required only the ability to write a sentence in English. The models behind these systems — diffusion models trained on billions of image-text pairs scraped from the internet, with Stable Diffusion built on the latent diffusion architecture covered in Chapter 2 — had compressed an enormous archive of human visual culture into a statistical structure from which new images could be summoned by natural language description. The speed of adoption was itself a cultural event: within weeks of Stable Diffusion’s release, millions of images were being generated daily, and within months the economic disruption of creative industries had become a policy question.

The democratisation argument emerged immediately and forcefully. If producing a polished illustration or concept painting previously required years of training and hours of execution, and now required only a few words and a few seconds, then the barriers to visual expression had been dramatically lowered. This argument has genuine force: people who have ideas but no hand-craft can now realise those ideas visually, and the communicative power of images has been distributed more widely. But the democratisation framing also conceals a substitution. The barrier that has been lowered is not the barrier between having an idea and expressing it; it is the barrier between being a skilled visual professional and competing with one. The profession of illustrator, stock photographer, concept artist, and visual designer had not been under serious algorithmic threat before 2022. The speed with which it came under that threat, and the casualness with which the tools were released without any transition mechanisms for affected workers, is one of the defining ethical facts of the moment.

The destabilisation of the creative economy proceeded along several axes simultaneously. Stock photography platforms reported dramatic declines in submission quality and volume as buyers discovered they could generate custom images for specific needs; the business model of stock licensing, which had survived the transition from analogue to digital photography, faced genuine existential pressure. Illustrators who had built careers on a recognisable style found that models fine-tuned on their published work could produce images in their style without compensation or consent. Concept artists working for game studios and film productions began to see their roles redefined from primary creators to refiners and supervisors of AI-generated material. None of these effects were hypothetical by late 2022; they were already in the job listings and the commission requests. The generation of artists graduating from fine arts programmes in 2022 and 2023 entered a professional landscape that had changed more in eighteen months than it had in the previous two decades.

This course situates itself in the practice-centred questions that this moment raises rather than in the historical or philosophical questions that, while important, tend to dissolve the urgency of what artists are actually confronting. The relevant question is not simply whether machines can be creative — a philosophical puzzle that has been posed and variously answered for decades — but what artists do with generative AI tools, how they evaluate what they are doing, and what resources exist for doing it ethically and critically. The course draws on a tradition of artists who have worked at the intersection of technology and visual culture with rigorous intellectual intent: David Rokeby’s Very Nervous System (1986–1990) as an early interactive work that makes the body’s relationship to machine perception its subject; Paglen’s forensic investigations into the politics of machine vision; Anadol’s large-scale data sculptures; Herndon’s practice-based inquiry into AI and the singing voice. These practices do not simply use technology; they think with and against it.

Generative AI in the context of this course refers to machine learning systems trained to produce novel outputs — images, text, sound, video — by learning the statistical structure of large training datasets and sampling from that learned distribution. The key distinction from earlier computational art is that the generative process is learned from data rather than explicitly programmed by the artist. This shift relocates the creative intervention from the specification of rules to the crafting of prompts, the curation of training data, and the design of post-generation workflows.
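
To make the distinction concrete, here is a deliberately toy sketch in Python (all names and numbers are illustrative, not any production system): the first function generates by explicit rule, in the spirit of plotter-era generative art; the second fits a simple statistical model to example data and samples from it, in the spirit of learned generation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def rule_based_points(n=200):
    """Explicitly programmed generation: a circle with controlled jitter,
    in the spirit of plotter-era generative art."""
    t = np.linspace(0, 2 * np.pi, n)
    jitter = rng.normal(scale=0.05, size=(n, 2))  # controlled randomness
    return np.stack([np.cos(t), np.sin(t)], axis=1) + jitter

def learned_points(examples, n=200):
    """Learned generation: estimate the statistics of example data
    (here, just a Gaussian) and sample new points from that estimate."""
    mean = examples.mean(axis=0)
    cov = np.cov(examples, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n)

# The 'training data' for the learned generator is the rule-based output:
# the learned system knows nothing of circles, only of statistics.
samples = learned_points(rule_based_points())
```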

Chapter 2: How Generative Models Work — Enough to Think With

The lineage of modern image-generative AI begins with a simple and elegant adversarial idea. In 2014, Ian Goodfellow and collaborators at the Université de Montréal proposed what they called a generative adversarial network, a training architecture in which two neural networks — a generator and a discriminator — are placed in competition. The generator takes a random noise vector and attempts to produce a realistic image; the discriminator takes either a real image from the training set or a generated image and attempts to classify it as real or fake. The two networks train simultaneously, each improving in response to the other’s improvements: as the discriminator gets better at detecting fakes, the generator must get better at making convincing ones. The training process converges, in principle, to a generator that produces outputs indistinguishable from real training data by any discriminator. In practice, GANs proved difficult to train stably, prone to mode collapse — a failure mode in which the generator learns to produce only a narrow range of outputs that reliably fool the discriminator — and limited in the diversity of imagery they could plausibly generate. Despite these limitations, GAN-based systems produced striking results in specific domains: face generation with StyleGAN2 became a cultural touchstone (the site thispersondoesnotexist.com ran on GAN outputs), and class-conditional GANs produced recognisable images of objects across hundreds of categories. But the dream of a fully general image-generative system remained out of reach for the GAN paradigm.
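
For readers who want to see the adversarial idea in code, a minimal sketch of one training step in PyTorch follows; the generator G and discriminator D stand in for any suitable networks, and all shapes, labels, and hyperparameters are illustrative rather than a reconstruction of Goodfellow et al.’s experiments.

```python
import torch
import torch.nn.functional as F

def gan_training_step(G, D, real_images, opt_g, opt_d, latent_dim=100):
    """One adversarial update: D learns to separate real from fake,
    G learns to fool D. G and D are any generator/critic networks
    where D outputs one real-vs-fake logit per image."""
    batch = real_images.size(0)
    ones = torch.ones(batch, 1)    # label: "real"
    zeros = torch.zeros(batch, 1)  # label: "fake"

    # --- Discriminator step: improve accuracy on real vs. generated ---
    z = torch.randn(batch, latent_dim)
    fake_images = G(z).detach()    # do not backprop into G here
    d_loss = (F.binary_cross_entropy_with_logits(D(real_images), ones) +
              F.binary_cross_entropy_with_logits(D(fake_images), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- Generator step: produce images that D classifies as real ---
    z = torch.randn(batch, latent_dim)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```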

The breakthrough came with a different generative principle, one grounded not in adversarial training but in the reversal of a known statistical process. Diffusion models, developed in the framework of denoising score matching and later formalised in the DDPM paper by Ho et al. (2020), learn to reverse a gradual noise-addition process: given an image, the forward process adds Gaussian noise in small increments until the image becomes pure noise; the model learns the reverse — given a noisy image at any stage of this process, predict what the slightly-less-noisy version looks like. At inference time, the model starts from pure noise and iterates the learned reverse process, progressively refining a coherent image from randomness. This formulation avoids the training instabilities of GANs and naturally produces diverse outputs by sampling different noise starting points. The critical practical advance came from Rombach et al.’s latent diffusion model paper at CVPR 2022: rather than running the diffusion process in the full pixel space of the image — computationally expensive and memory-intensive — the model first encodes the image into a compressed latent representation using a variational autoencoder, runs the diffusion process in that lower-dimensional latent space, and then decodes the refined latent back into pixels. This compression step reduced the computational cost of training and inference by an order of magnitude, making large-scale text-conditioned image generation practically feasible for the first time.
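
A compressed code rendering of both processes follows, assuming the DDPM formulation described above: the model argument stands in for any network trained to predict the added noise at a given step, the schedule constants follow the DDPM paper’s commonly cited defaults, and everything else is illustrative.

```python
import torch

# Forward process: a fixed variance schedule adds Gaussian noise step by step.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def add_noise(x0, t):
    """Jump directly to noise level t (a property of compounding Gaussians)."""
    noise = torch.randn_like(x0)
    ab = alphas_bar[t]
    return ab.sqrt() * x0 + (1 - ab).sqrt() * noise, noise

@torch.no_grad()
def sample(model, shape):
    """Reverse process: start from pure noise and repeatedly subtract the
    model's noise prediction, refining a coherent image from randomness."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        eps = model(x, torch.tensor([t]))       # predicted noise at step t
        alpha, ab = 1.0 - betas[t], alphas_bar[t]
        x = (x - betas[t] / (1 - ab).sqrt() * eps) / alpha.sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)  # stochastic step
    return x
```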

The connection between natural language and image generation depends on a separate but crucial piece of infrastructure: CLIP (Contrastive Language–Image Pre-training), developed by OpenAI and published in 2021. CLIP is trained on a massive dataset of image-text pairs to learn a shared embedding space in which semantically similar images and text descriptions have nearby representations, even if they come from different modalities. By conditioning a diffusion model’s denoising process on the CLIP text encoder’s representation of a prompt, Rombach et al. enabled text-to-image generation: the model learns not only how to reverse noise but how to reverse it in a direction consistent with the semantic content of the text. The resulting architecture — VAE encoder, CLIP text encoder, latent diffusion backbone, VAE decoder — is the core of Stable Diffusion and its successors. Understanding this architecture does not require knowing the mathematics of score matching or the details of transformer attention; what matters for artistic practice is the conceptual structure. The model has compressed a vast visual archive into a high-dimensional latent space in which points near each other tend to look visually similar, and in which different regions correspond, loosely, to different aesthetic territories. A text prompt navigates this space by selecting a direction; sampling adds controlled randomness.
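
In practice the whole stack is packaged behind open-source wrappers. A minimal usage sketch with Hugging Face’s diffusers library follows; the model identifier, prompt, and parameter values are illustrative, the API changes between versions, and a GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline

# The pipeline bundles the components named above: a CLIP text encoder,
# a U-Net diffusion backbone operating in latent space, and a VAE decoder.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "an oil painting of a lighthouse at dusk",
    num_inference_steps=30,   # how many reverse-diffusion steps to run
    guidance_scale=7.5,       # how strongly to follow the text prompt
).images[0]
image.save("lighthouse.png")
```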

Ahmed Elgammal and colleagues at Rutgers introduced an important variation on the GAN framework in their 2017 ICCC paper on Creative Adversarial Networks, abbreviated CAN. Standard GANs trained on art datasets tend to produce images that closely resemble the dominant styles in the training data — they minimise discriminator loss by staying close to the mean of the art distribution. Elgammal’s insight was to add a second adversarial signal: alongside the standard realism discriminator, a style classifier judges whether the generated image’s style can be identified as one of the known styles in the training data. The generator is trained to simultaneously fool the realism discriminator (produce something that looks like art) and fool the style classifier (produce something that does not look like any known style). This creates pressure toward novelty: the model must produce images that are recognisable as art while deviating from established aesthetic norms. The results, evaluated in a controlled experiment against both human-made abstract art and standard GAN outputs, were rated by naive observers as more interesting and more likely to have been made by a human artist. CAN is theoretically significant because it is arguably the first system to operationalise creativity as a training objective — to build the production of novelty directly into the loss function, rather than treating creativity as an emergent accident.
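
The style-ambiguity idea translates directly into a loss term. The sketch below follows the paper’s description rather than its released code; the tensor names and the unweighted sum of the two terms are assumptions.

```python
import torch
import torch.nn.functional as F

def can_generator_loss(d_realfake_logit, d_style_logits):
    """CAN-style generator objective: look like art (fool the realism
    discriminator) while belonging to no known style (maximise the style
    classifier's uncertainty), per Elgammal et al. 2017."""
    # Realism term: push the discriminator toward judging "real".
    real_labels = torch.ones_like(d_realfake_logit)
    realism = F.binary_cross_entropy_with_logits(d_realfake_logit, real_labels)

    # Style-ambiguity term: cross-entropy against the uniform distribution,
    # so the predicted style is maximally ambiguous across all k classes.
    k = d_style_logits.shape[-1]
    uniform = torch.full_like(d_style_logits, 1.0 / k)
    log_probs = F.log_softmax(d_style_logits, dim=-1)
    ambiguity = -(uniform * log_probs).sum(dim=-1).mean()

    return realism + ambiguity
```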

For the artist working with these tools, the conceptual vocabulary that matters most is the topology of latent space. Because the latent space is a continuous, high-dimensional manifold, it is possible to interpolate between two points — two images, two styles, two concepts — and trace a path of semantically smooth transitions between them. It is possible to add and subtract concepts in the latent representation, as in the famous word-vector arithmetic that demonstrated gender and geographic associations in word embeddings. Prompt engineering becomes, on this view, a practice of navigation and assembly in latent space: choosing anchor points, modulating weights between them, and exploiting the model’s learned associations to discover images that exist in the compressed archive of all visual culture without having been explicitly put there. The question that this vocabulary makes unavoidable — whose images built that archive, and what political valences are embedded in its geometry — is one that every serious practitioner must eventually confront.
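
Interpolation, at least, is easy to make concrete. Here is a sketch of spherical interpolation (slerp) between two latent vectors, which practitioners often prefer to linear blending for Gaussian latents; it assumes both endpoints come from the same model’s latent space, and the decode call in the comment is hypothetical.

```python
import torch

def slerp(z0, z1, alpha):
    """Spherical interpolation: walk along the arc between two latent
    points rather than cutting through the low-density interior."""
    z0_n = z0 / z0.norm()
    z1_n = z1 / z1.norm()
    omega = torch.acos((z0_n * z1_n).sum().clamp(-1 + 1e-7, 1 - 1e-7))
    return (torch.sin((1 - alpha) * omega) * z0 +
            torch.sin(alpha * omega) * z1) / torch.sin(omega)

# Decoding a path of intermediate latents shows one image morph into another:
# frames = [decode(slerp(z_a, z_b, a)) for a in torch.linspace(0, 1, 24)]
```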

The distinction between training-time and inference-time intervention matters enormously for how we evaluate generative AI artworks. A practitioner who uses an off-the-shelf model with text prompts alone occupies a fundamentally different creative position from one who assembles a custom training dataset, fine-tunes a base model, and develops bespoke sampling procedures. Neither position is inherently more or less artistic — photography and printing both involve using pre-existing optical and chemical systems — but the distinction affects how questions of authorship, responsibility, and credit should be allocated.

Chapter 3: Prompt as Medium — Aesthetic Practice in the Age of Text-to-Image

When practitioners began describing the skill of writing effective text-to-image inputs as prompt engineering, the term simultaneously flattered and mystified what was happening. The engineering framing suggested systematic, reproducible expertise: learn the right incantations — “highly detailed, trending on ArtStation, 4K, concept art, by Greg Rutkowski” — and reliably produce high-quality outputs. This framing was partially accurate and partially misleading. It was accurate in that certain text patterns reliably activated aesthetic associations in the model’s latent space, and that practitioners developed genuine expertise in those patterns through systematic experimentation. It was misleading in that it suggested the model’s response to prompts was rule-governed in a way it is not: the relationship between prompt and output is probabilistic, context-sensitive, and shaped by training data associations that are opaque to the user. A practitioner cannot inspect the model’s weights and reason from first principles about why “oil painting” activates different regions of latent space than “oil on canvas.” The expertise is empirical and intuitive, not deductive.

Lev Manovich offers the most searching theoretical account of what text-to-image generation does to aesthetics. In AI Aesthetics, published in 2018 but conceptually anticipating the generative moment with remarkable accuracy, Manovich argues that a cultural technology trained on the average of all prior cultural production will tend to produce outputs that are stylistically smooth, compositionally conventional, and resolutely within the plausible centre of the aesthetic distribution it was trained on. The model cannot produce genuinely new aesthetic forms because new aesthetic forms, by definition, lie outside the training distribution; it can only produce weighted combinations of existing forms, however surprising those combinations might appear. Manovich calls this the aesthetics of the average, and associates it with a broader cultural logic in which algorithmic systems optimised for engagement systematically favour the recognisable over the challenging, the smooth over the abrasive, the pleasant over the unsettling. The stunning imagery produced by early text-to-image systems had a characteristic quality that many observers noticed without being able to name: it was simultaneously impressive and somehow vacant, hyper-detailed and somehow shallow, the visual equivalent of a sentence that is grammatically perfect and semantically empty.

Refik Anadol’s practice represents a different relationship to the trained model that partially complicates Manovich’s critique. In works like Unsupervised (MoMA, 2022), Anadol does not use a text-to-image model to produce still images matching verbal descriptions; instead, he trains or adapts models on curated datasets — in this case, the digital archive of MoMA’s collection — and then creates real-time video installations that explore the latent structure of that specific corpus. The claim is not that the model averages the collection but that it makes the collection’s latent organisation visible: its clusters and gradients, the visual affinities that run through different periods and media, the strange neighbours that proximity in latent space reveals. Whether this claim survives scrutiny is a legitimate question — critics have argued that Anadol’s work aestheticises data in ways that naturalise the technological apparatus rather than critically interrogating it — but it represents a meaningfully different artistic strategy from prompt-and-select. The artist’s creative intervention is in the construction of the training corpus and the design of the generative procedure, not in the selection of output images from a general-purpose model.

The working methods of serious generative artists involve layers of intervention that the discourse of “just typing words” consistently obscures. Inpainting and outpainting allow the artist to selectively regenerate parts of an image while holding other parts fixed, enabling iterative refinement of specific areas without losing the overall composition. ControlNet, introduced in 2023, allows the artist to specify structural constraints — pose, depth map, edge map, sketch — that guide the generation process toward a predetermined spatial arrangement while leaving stylistic choices to the model. Fine-tuning and techniques like LoRA (Low-Rank Adaptation) allow the artist to train a relatively small set of additional model weights on a curated set of reference images, steering the model’s outputs toward a specific aesthetic territory that does not exist as such in the base model’s training data. These tools together mean that the practice of generative image-making, for a professional practitioner, involves sustained technical engagement, iterative refinement, and aesthetic decisions distributed across multiple stages of the workflow.
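
The mechanics of LoRA are simple enough to sketch: freeze a pretrained weight matrix and learn only a small low-rank correction to it. The minimal PyTorch illustration below is a conceptual sketch; real implementations wrap a model’s attention layers rather than a single linear layer, and the rank and scaling values are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update:
    output = W x + (alpha / r) * B(A(x)), where A and B are small."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # original weights stay fixed
        self.down = nn.Linear(base.in_features, r, bias=False)   # A
        self.up = nn.Linear(r, base.out_features, bias=False)    # B
        nn.init.zeros_(self.up.weight)       # start as an exact no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Fine-tuning then trains only the small A/B matrices: a few megabytes
# that steer a multi-gigabyte model toward a specific aesthetic territory.
```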

The authorship question generated by prompt-based work is genuine and legally unresolved. If the artist writes a prompt, selects from a batch of generated images, and presents the selected image as a work, the chain of creative decisions includes: the choice of subject and mood, the choice of model, the specific language of the prompt, the selection among outputs, possibly the post-processing of the selected output. If the artist writes a prompt, generates ten thousand images, selects one, and presents it, the selection process begins to resemble the curatorial model of found photography: the artist’s contribution is the eye that recognises something worth presenting in material they did not originate. If the artist trains a custom model on their own extensive body of hand-made drawings, generates images with that model, and selects among them, the model is more like a printing process applied to the artist’s own aesthetic — the creative intervention is upstream, in the drawing. The US Copyright Office’s position — that “sufficient human authorship” is required but must be assessed case by case — does not resolve these distinctions; it simply relocates the argument.

The analogy between prompt engineering and photographic composition is often invoked to claim that both involve creative choices despite not originating the raw material. The analogy is partly apt and partly obscures a difference of degree that may be a difference in kind. The photographer choosing a frame does not determine the physics of light; the photographer choosing a filter does not determine the chemistry of the emulsion. But the photographer chooses the subject, the moment, the placement, and the framing with specificity that a text-to-image prompt cannot match, because the model's interpretation of text is mediated by training associations the user cannot fully predict or control. The question of whether this difference matters for creative credit depends on whether creative credit attaches to specification or to outcome.

Chapter 4: The Philosophy of Machine Creativity

Margaret Boden’s The Creative Mind remains the most systematic philosophical account of what creativity is and the most influential attempt to assess whether machines can have it. Boden identifies three modes of creativity that she argues describe the full range of creative achievement in humans. Combinational creativity involves producing novel combinations of existing ideas — a new metaphor, a new melody built from familiar motifs, a new composition of known visual elements. Exploratory creativity involves producing outputs that exist within an existing conceptual space by systematically exploring it — finding the unusual corners, the unexpected juxtapositions, the previously unvisited regions of a space whose rules are already known. Transformational creativity involves changing the rules of the conceptual space itself — not just exploring a known aesthetic territory but redefining what counts as a valid move within it, creating new possibilities that did not previously exist. Boden argues that only transformational creativity is genuinely radical in the philosophical sense, and that this form is the hardest to attribute to computational systems because it requires not only generating outputs but altering the framework within which outputs are evaluated.

McCormack and d’Inverno’s edited volume Computers and Creativity assembles a diverse range of perspectives on these questions from computational creativity researchers, cognitive scientists, and philosophers of mind. The consensus, if one can be identified in such a heterogeneous collection, is roughly as follows: computational systems are clearly capable of combinational creativity and arguably capable of exploratory creativity; the transformational case remains genuinely contested and philosophically deep. The difficulty is that transformational creativity, as Boden defines it, involves the creator recognising the limitations of a conceptual space and inventing new rules that transcend them. This recognition appears to require something like self-awareness about one’s own creative process — an awareness that current generative AI systems conspicuously lack. A diffusion model trained on the history of visual art cannot recognise that it is generating images within a particular aesthetic paradigm and choose to transgress that paradigm; it can only sample from what it has learned. Whether Elgammal’s CAN architecture, by building style-deviation into the training objective, achieves a computational proxy for this kind of transgression — or merely a statistical simulation of it — is a question the field has not resolved.

The intentionality objection is philosophically the most direct challenge to attributing creativity to generative AI. Most philosophical accounts of creativity include an intentional component: the creator intends to produce something that satisfies certain aesthetic criteria, pursues that intention through a process of evaluation and revision, and succeeds or fails in a way that is meaningful precisely because there was a goal. On this account, a rock worn smooth by a river into a beautiful shape is not the product of a creative act, because no agent intended the result; a human sculptor’s making of a smooth form with similar aesthetic properties is a creative act precisely because the form was intended. Diffusion models have no intentions — they are parametric functions that map noise vectors to image tensors, and the quality of the output is not an achievement relative to any internal goal. This does not mean that diffusion models cannot produce beautiful or novel images; it means that the beauty and novelty are in the eye of the beholder, not the product of the model’s striving. The artist who uses the model’s outputs as raw material may be creative; the model itself, on intentionalist accounts, is not.

Defenders of machine creativity typically respond by arguing for what might be called an outcome-based account of creativity: what matters is whether the output is novel and valuable, not whether it arose from an intentional process. On this view, the question of whether a rock or a machine produced the smooth form is irrelevant to whether the form is beautiful; similarly, the question of whether a human or a diffusion model produced the striking image is irrelevant to whether the image is creative, if creativity attaches to outcomes rather than processes. Boden herself is not entirely satisfied with this response, because she argues that the value of creative outputs is bound up with what it means for an agent to have produced them — the biographical and intentional context is not separable from the aesthetic evaluation. But the outcome-based position has considerable intuitive support, and it is worth noting that everyday attributions of creativity often do not inspect the process: a musician who generates melodies “by accident” through improvisation is still credited with creativity.

Manovich’s provocation about the aesthetics of the average is worth returning to at this philosophical level. If a diffusion model produces statistically plausible interpolations in aesthetic space, and if genuine artistic creativity consists precisely in deviating from statistical expectation — in producing the image that breaks the rule, surprises the eye, refuses the comfortable average — then there is a structural argument that the outputs of standard text-to-image generation are necessarily not creatively significant, however technically impressive they may be. This argument has significant critical purchase, but it requires some care: fine-tuning, ControlNet-guided generation, and CAN-style adversarial style-deviation all create pressure toward less average outputs, and it is not clear that the “average” claim applies with equal force to carefully curated and trained custom models. The Benjamin connection makes the argument more historically layered: just as Walter Benjamin observed that mechanical reproduction transforms the aura of the original artwork by producing copies that are no longer tied to a specific time and place of origin, mass AI image generation transforms the value relationship between images and their originality — but in the direction of infinite individualised variation rather than mass copies of one original. Every AI image is in some sense unique — a specific sample from a high-dimensional distribution — but none of them has aura in Benjamin’s sense, because none of them is the trace of a particular human encounter with the world.

Aura, in Benjamin's account, refers to the quality of an original artwork that derives from its embeddedness in a specific place, time, and tradition — its presence, its singularity, its historical testimony to the moment of its creation. Mechanical reproduction strips the aura from the original by making it available everywhere, detached from its original context. AI image generation is a different kind of aura-dissolution: the generated image has never had aura to lose, because it was never embedded in a specific human history. Whether this makes AI-generated images aesthetically less significant or merely differently significant is one of the productive questions this course does not resolve but keeps open.

Chapter 5: Key Artists and Practices

Refik Anadol has become, in the early 2020s, the most publicly prominent artist working with machine learning as a primary medium. His practice is organised around what he calls data sculpture: taking large, institutional datasets — the MoMA archive, NASA imagery, climate data, urban sensor streams — training machine learning models on them, and producing large-scale architectural projections that render the latent structure of the data as a continuously evolving visual field. The Unsupervised installation at MoMA, which opened in November 2022 and ran for most of 2023, placed a monumental LED wall in the museum’s ground-floor lobby and used a custom model trained on the digital archive of the museum’s collection to generate a real-time flow of morphing, colour-saturated forms that referenced the visual styles and compositional patterns of the collection without directly reproducing any work. The critical reception was divided in ways that track the broader theoretical disputes about AI art: enthusiasts saw a genuine new form of collective memory-making, a technology that made the latent organisation of cultural heritage visible in a democratically accessible format; critics saw a spectacular aestheticisation of data that naturalised the computational apparatus rather than questioning it, and that converted the museum’s collection into raw material for a technological demonstration without any of the critical interrogation that engagement with machine vision demands.

Holly Herndon’s practice represents a practice-based inquiry into the specific question of the singing voice and AI, and is distinguished by the rigour with which it follows through its own commitments. Her release of Holly+ in 2021 — a model trained on her own voice that allows anyone to transform audio into her voice — is often described as a radical gesture of openness, but Herndon’s framing is more careful than that description suggests. She describes the model as an instrument that she has made available to others, but which carries the implied condition of attribution and the explicit desire for reciprocity: users who make music with Holly+ contribute to a shared understanding of what AI voice transformation does to identity, authorship, and artistic signature. Her work with Mat Dryhurst on Spawning, the initiative that built an opt-out registry for artists who do not want their work used in AI training datasets, represents a practical extension of the same ethical orientation: if voice and visual style are forms of identity rather than merely technique, then consent and attribution are not bureaucratic niceties but ethical prerequisites. The Spawning project and its associated tools, including the Have I Been Trained? search and opt-out service, represent the most developed attempt to create market and legal infrastructure for an alternative to the extraction model that characterises current AI training.

Memo Akten occupies a position that is simultaneously inside and outside the generative AI world. His video work FIGHT (2022) and the ongoing Distributed Consciousness series use custom machine learning systems to make the model’s internal states visible — to show the viewer not a polished output image but the process of pattern completion, the network’s uncertain and tentative visual guesses as it receives partial information and attempts to fill in the rest. This forensic aesthetic strategy, which makes the machine’s inference process rather than its outputs the subject of the work, refuses the black-box opacity that characterises commercial image-generation systems and insists that understanding how generative AI works is itself an aesthetic and political act. Akten writes and lectures extensively on what he calls the “computational sublime” — the tendency to respond to powerful generative outputs with awe rather than interrogation — and his practice constitutes a deliberate counter-programme: works that provoke interrogation rather than awe, that make the strangeness of machine vision legible as strangeness rather than naturalising it as magic.

Trevor Paglen’s work is the most explicit of any established artist in using AI as a forensic instrument for revealing the politics embedded in training data and machine vision systems. ImageNet Roulette (2019), made in collaboration with Kate Crawford, used a web interface to classify user-submitted photographs according to the ImageNet dataset’s person-labelling taxonomy, revealing the extent to which that taxonomy — drawn from WordNet’s hierarchical categorisation of human beings — encoded racist, sexist, and otherwise dehumanising labels directly into the infrastructure of computer vision. The project was a critical success, and ImageNet removed the offensive person categories shortly after its publication. Paglen’s essay “Invisible Images (Your Pictures Are Looking at You)” (2016) provides the theoretical framework for this forensic practice: the claim is that the images proliferating through networked infrastructure are not primarily communications between humans but inputs into machine vision systems — that they are seen by machines in ways that humans cannot observe, and that these machine-readings carry political consequences that the human eye cannot perceive. This argument extends naturally to the context of AI training data: the billions of images that trained Stable Diffusion, DALL-E, and Midjourney were “seen” by those models in a specific way that encoded specific aesthetic, cultural, and political values into the model’s latent space. Making those values visible is, for Paglen, itself an artistic and political practice.

Anna Ridler’s work is perhaps the most searching response to the scale problem that underlies all the critiques of generative AI. Where the dominant mode of image-generation AI is trained on datasets of hundreds of millions of images scraped without consent from the internet, Ridler’s practice is organised around the creation of hand-made, carefully curated training datasets at deliberately small scale. Her Myriad (Tulips) project (2018) involved photographing 10,000 tulips, hand-labelling every image in the resulting dataset, and training a GAN on that corpus; the resulting generated images have a quality of intimacy and imperfection that is entirely absent from the smooth outputs of large commercial models, because the training data is itself intimate and imperfect. Ridler’s practice makes the scale assumption of AI visible as an assumption — the idea that more data is always better, that the internet constitutes a reasonable corpus, that the artist’s relationship to training data can be mediated entirely through algorithmic filtering — and demonstrates that refusing this assumption produces meaningfully different aesthetic and ethical results. The collective authorship problem — that most generative AI art depends on models trained on millions of unconsented images — is not resolved by Ridler’s approach, but her practice provides a working existence proof that it can be avoided.

The distinction between artists who use AI as a production tool and artists who use AI as a critical subject is theoretically useful but in practice rarely clean. Anadol uses large commercial datasets while also producing work that raises questions about data and memory. Herndon makes AI music while also building infrastructure to protect artists from unconsented AI training. Paglen critiques training data while also using machine learning systems to produce forensic visualisations. The more interesting question, in most cases, is not which side of the line a given practice falls on but where and how the critique is located within the practice — whether the critical interrogation is built into the method or added as a discursive supplement.

Chapter 6: The Legal and Economic Storm

The legal challenge to generative AI began in earnest in January 2023 when a class of visual artists including Sarah Andersen, Kelly McKernan, and Karla Ortiz filed suit in the Northern District of California against Stability AI, Midjourney, and DeviantArt. Andersen v. Stability AI argues that the training of Stable Diffusion on scraped internet images constitutes copyright infringement under US law, and that the model’s ability to generate images in the style of named artists — even when the original images are not reproduced — constitutes a continuing infringement in its outputs. The case turns on several contested legal questions. The most fundamental is whether the training process itself — downloading images and using them to compute gradients that update model weights — constitutes a reproduction in the legally relevant sense, or whether the model weights are so far removed from any specific training image that no reproduction has occurred. A related question is whether training falls under the fair use doctrine, which permits certain uses of copyrighted material for purposes including comment, criticism, research, and transformation. The defendant companies argue that AI training is transformative in the same way that a human artist learning from studying thousands of existing works is transformative; the plaintiffs argue that the analogy fails because the model can reproduce stylistic signatures to a degree that harms the market for the original artists’ work — one of the four fair use factors.

Getty Images filed a separate suit in the District of Delaware in February 2023, relying on a particularly striking piece of evidence: Stable Diffusion outputs sometimes contain visible, garbled versions of the Getty Images watermark, suggesting that the training data included watermarked Getty images and that the model had learned to reproduce the watermark as part of the visual style of “professional stock photography.” This evidence is significant not only as proof that specific copyrighted images were included in training but as a demonstration of the model’s capacity to reproduce statistical patterns from training data in outputs, even patterns as arbitrary and non-semantic as a watermark. Getty’s suit also raises the question of statutory damages under US copyright law, which can be substantial per infringed work if wilfulness is found — a calculation that, applied to billions of training images, generates liability numbers large enough to threaten the commercial viability of the defendant companies. Both suits were, as of the writing of these notes, still in early procedural stages; they are expected to generate significant legal precedent on questions that no prior case has addressed.

The Théâtre D’opéra Spatial controversy of August 2022 raised the cultural and ethical dimensions of AI-generated art in a more accessible register. Jason Allen, a game designer from Colorado, submitted an image generated using Midjourney to the Colorado State Fair fine arts competition under the category “Digital Arts/Digitally Manipulated Photography” and won first place; the extent of the AI’s role in the work became widely known after the award. The resulting controversy — covered extensively in ARTnews, The Atlantic, and mainstream news outlets — crystallised several questions simultaneously: Was it ethical to enter an AI-generated image in a competition implicitly understood to be about human artistic skill? Did the competition’s rules, as written, actually prohibit AI-generated entries? What disclosure obligations should competition organisers, galleries, and exhibition venues impose? And more fundamentally, had something genuinely new and artistically significant happened, or had Allen simply gamed a category to which he was not entitled? Allen maintained that his creative contributions — the extensive prompt engineering, the iterative selection and refinement across hundreds of generated images, the final post-processing in Photoshop — constituted genuine artistic authorship. Critics responded that the point of the competition was to recognise the development of craft skill, and that AI-assisted work, however much time it took, bypassed the cultivation of that skill entirely.

The US Copyright Office clarified its position in a series of guidance documents in 2023, taking the view that copyright in a work requires human authorship and that content generated entirely by AI without sufficient human creative control is not copyrightable. The Office was careful, however, not to exclude all AI-assisted works: a work that involves substantial human creative decisions — in the arrangement of generated elements, in the selection and editing of outputs, in the combination of AI-generated material with human-created material — may still qualify for protection in its human-authored elements. The case of Zarya of the Dawn, a graphic novel in which the text and arrangement were authored by Kristina Kashtanova but the images were generated by Midjourney, resulted in a registration that covered the text and arrangement but excluded the individual images from copyright protection. This outcome — partial protection for works with mixed human and AI authorship — is likely to become the dominant framework in US law, though its application to specific cases remains highly uncertain.

The economic picture for creative professionals is documented in a body of survey and market data that, while methodologically imperfect, is coherent in its direction. Getty Images reported a 25% decline in editorial illustration submissions in the eighteen months following Stable Diffusion’s release. Independent surveys of commercial illustrators found that a significant proportion reported reduced commission volume and downward price pressure attributable to AI competition. The game industry’s adoption of AI tools for concept art and asset generation was widely reported by artists who contracted for those services. The economic effect is not evenly distributed: highly distinctive artists with strong personal followings appear to be less affected than generalist illustrators working in common commercial styles, because their work depends on a specific voice that is harder to replicate by fine-tuning a general model. But the structural pressure is real, and it is being absorbed primarily by the workers rather than the technology developers.

The legal framework for AI and copyright will almost certainly be different by the time students taking this course enter professional practice than it is as these notes are written. The principles at stake — what counts as infringement, what counts as fair use, what counts as sufficient human authorship — are being litigated and legislated in multiple jurisdictions simultaneously. What this course can offer is not a definitive legal map but a set of conceptual frameworks for evaluating whatever legal framework eventually crystallises: the distinction between training-time and inference-time harm, the distinction between style and expression, the relationship between economic injury and cultural value.

Chapter 7: Critical Frameworks — Art, Technology, and Power

Walter Benjamin’s 1935 essay “The Work of Art in the Age of Mechanical Reproduction” is the foundational text for thinking about how new reproductive technologies transform the political economy of art, and it rewards rereading in the context of generative AI even though — and precisely because — the analogy is imperfect. Benjamin’s central observation is that mechanical reproduction destroys the aura of the original artwork by severing it from its embeddedness in a specific place and time, making it available everywhere as a reproduced image rather than as a unique presence. The political consequences, for Benjamin, cut both ways: on one hand, the destruction of aura could liberate art from its ritual function — its status as a quasi-sacred object belonging to an elite — and enable a mass political art that mobilises its audiences rather than contemplating them from a distance. On the other hand, the same reproducibility enables fascist aesthetics, the conversion of politics into spectacle, the management of mass emotion through technically sophisticated imagery that does not require any authentic encounter with the world. The question Benjamin poses is not whether mechanical reproduction is good or bad but what political uses will be made of the aesthetic transformation it enables. Generative AI sharpens this question: the tool that allows an artist to produce a politically engaged image in minutes also allows a political operative to produce synthetic propaganda at scale.

Claire Bishop’s Artificial Hells develops a different critical framework centred on the politics of spectatorship and participation in contemporary art. Bishop traces a history of participatory art — from the Situationists and Happenings of the 1960s through the relational aesthetics of the 1990s to the more recent turn toward social practice — and argues that the critical value of participatory work depends on how it constructs its participants as subjects. Work that mobilises audiences as co-creators with genuine creative agency is politically different from work that mobilises them as passive vessels for the artist’s vision, even when both are described using the vocabulary of participation. Generative AI complicates Bishop’s categories in productive ways. Text-to-image systems ostensibly democratise image-making by making everyone a potential “creator,” but the creativity on offer is fundamentally constrained by the model’s training data, the interface design, and the platform’s content policies. The participant in a Midjourney session is a co-creator in a technically meaningful sense and a consumer in a politically meaningful one: their creative choices are channelled through a system designed to produce commercially viable outputs within a specific aesthetic range. What kind of agency is being exercised, and for whose benefit, are questions Bishop’s framework brings into focus.

The decolonial critique of generative AI is among the most developed and urgently needed critical frameworks, and it follows directly from the composition of large training datasets. The internet, from which most generative AI training data is scraped, systematically overrepresents English-language, American, and Western European cultural production relative to the full breadth of world cultural output. Image datasets derived from internet scraping thus encode a specific geographic, linguistic, and cultural perspective as a statistical norm. When a model trained on such data generates “a village,” “a wedding,” “a god,” “a warrior,” or “a beautiful woman,” it generates images that reflect the statistical distribution of these subjects as they appear in Western English-language image repositories — which is a culturally specific distribution masquerading as a universal one. Artists, scholars, and activists working in this critical tradition — among them Kate Crawford, Timnit Gebru, and the contributors to the DAIR Institute — have documented these biases extensively and argued that they cannot be patched by adding more diverse images to existing datasets without also addressing the structural conditions under which visual culture is produced and distributed globally.

The labour critique extends the decolonial analysis to the conditions of production that underlie both the training data and the filtering processes that make large generative models deployable. The human artists whose images trained the models were not compensated or consulted, as the Andersen and Getty suits foreground. But the labour story does not end there. The content moderation work required to filter traumatic, illegal, and otherwise unsuitable content from training datasets and model outputs has been performed in many cases by workers in Kenya and the Philippines working under conditions of extreme psychological stress and economic precarity, as documented by TIME and other journalists. The technical achievement of a usable image-generation system rests on this invisible labour, which is systematically excluded from the origin stories that technology companies tell about their products. The “democratisation” narrative — in which AI image generation is celebrated as making creative power available to everyone — thus glosses over a distribution of power in which the economic gains flow primarily to technology companies while the costs, including psychological harm, are borne by workers in the Global South.

The environmental critique is the third axis of material concern and is perhaps the least visible in public discourse because it is structurally difficult to attribute. Training a large generative model requires substantial computational resources — the training run for a large diffusion model may consume energy equivalent to hundreds of transatlantic flights — and running the model at inference scale to serve millions of daily users adds ongoing energy and water costs at data centres. The aesthetic pleasures of infinite image generation are not free: they are subsidised by fossil fuel consumption and water use in regions that often face water stress already. This critique does not single out generative AI as uniquely harmful relative to other computational activities — social media, video streaming, and cloud computing all have significant environmental footprints — but it does raise the question of proportionality and necessity: what is the environmental cost of what is being produced, and who bears it? For artists whose practice is explicitly concerned with ecological questions, the tool selection question has a direct ethical dimension.

The decolonial critique of AI, as applied to generative image models, is not simply the claim that training data is biased — a claim that can be addressed by dataset curation and supplementation. It is the more fundamental claim that the very framework of a universal image-generation model, trained on an aggregated global visual corpus and optimised to produce outputs that satisfy a mass market, reproduces the logic of cultural extraction that characterises colonial epistemology: treating the cultural production of non-Western, non-English-speaking peoples as raw material to be incorporated into a Western technological product whose design decisions and economic benefits belong to others.

Chapter 8: New Forms and New Questions — Video, 3D, Music, and Text

The extension of latent diffusion from still images to video was technically foreseeable from the moment image generation became fluent, but the pace of development was still striking. Runway’s Gen-2 model, released in 2023, demonstrated text-to-video generation at a quality level that made short, stylised clips visually convincing within their own aesthetic register — surreal, painterly, dreamlike, clearly “AI” in its characteristic temporal instabilities but controllably so. The release of OpenAI’s Sora in February 2024 raised the bar dramatically: Sora’s outputs demonstrated physical plausibility, multi-shot temporal coherence, and a degree of photorealism that, in cherry-picked examples, approached cinematic quality. The implications for documentary filmmaking and legal evidence are profound and immediately concerning. If a model can generate a video of a person saying and doing something they never said or did, with a fidelity that makes technical detection difficult, then the evidentiary status of video footage — which has been foundational to journalism, legal process, and historical record — is fundamentally compromised. This is not a future risk; it is a present condition. Deepfakes based on earlier GAN technology had already demonstrated the political and legal dangers of synthetic video; latent diffusion brings those dangers to a qualitatively new level of accessibility and fidelity.

Three-dimensional space is the next frontier that generative AI is restructuring. Neural Radiance Fields, abbreviated NeRF, introduced by Mildenhall et al. in 2020, demonstrated that a neural network trained on a set of 2D photographs of a scene taken from multiple angles could learn to represent the scene’s 3D structure and lighting as a continuous function, enabling the synthesis of novel viewpoints that are geometrically and photometrically consistent with the original photographs. Subsequent work on Gaussian Splatting made similar 3D reconstruction fast enough for real-time rendering. These techniques have immediate applications in architecture, product design, cultural heritage documentation, and forensic reconstruction — and they have implications for spatial computing platforms like Apple Vision Pro and Meta Quest, which require rich 3D representations of environments. For the artist, they offer a means of generating immersive 3D environments from 2D sketches or photographs, and of bringing the latent-diffusion aesthetics of 2D image generation into three-dimensional space. The question of what authentic spatial presence means in an environment generated by a model trained on photographed spaces is one that architectural and experience-design practices are only beginning to confront.
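
The core of NeRF is compact enough to outline in code: a coordinate network queried along camera rays. The schematic sketch below gives the positional encoding and an MLP from position and view direction to colour and density; layer sizes are illustrative, the view direction is left unencoded for brevity, and the volume-rendering integral that turns these outputs into pixels is omitted.

```python
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=10):
    """Map raw coordinates to sines and cosines at octave frequencies so
    the MLP can represent fine spatial detail (Mildenhall et al. 2020)."""
    out = [x]
    for i in range(n_freqs):
        out += [torch.sin(2.0**i * x), torch.cos(2.0**i * x)]
    return torch.cat(out, dim=-1)

class TinyNeRF(nn.Module):
    """Position (x, y, z) -> density; position + view direction -> colour."""
    def __init__(self, n_freqs=10, hidden=256):
        super().__init__()
        in_dim = 3 * (1 + 2 * n_freqs)
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density = nn.Linear(hidden, 1)
        self.colour = nn.Sequential(
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, xyz, view_dir):
        h = self.trunk(positional_encoding(xyz))
        sigma = torch.relu(self.density(h))   # how much "stuff" is here
        rgb = self.colour(torch.cat([h, view_dir], dim=-1))
        return rgb, sigma

# Training fits this function so that volume-rendered rays reproduce the
# input photographs; novel viewpoints then come from querying new rays.
```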

The music generation landscape has developed in parallel with the visual domain, and students whose primary interest is in music-technology intersections should also engage with the CS 798 course materials for a more technically detailed treatment. The emergence of end-to-end song generation — systems like Suno and Udio that produce a full song with vocals, instrumentation, arrangement, and lyrics from a short text prompt — represents, for music, approximately the same threshold-crossing that DALL-E 2 represented for images in April 2022. Before these systems, AI music tools assisted with specific aspects of composition or production — drum programming, melody harmonisation, stem separation — but required substantial musical skill to use effectively. Suno and Udio produce a finished song. The musical quality of the outputs is highly variable and often exposes the model’s limitations in long-form structural coherence, but the baseline is already high enough to be commercially threatening in specific genres — stock music, jingles, background music for video — and the trajectory of improvement is consistent with the image-generation precedent. Holly Herndon’s argument that the musician response to AI should not be resistance but rather the development of new practices that use AI to extend rather than replace the human voice is harder to maintain in the face of systems that generate fully produced vocals, but it remains a more generative response than the demand for prohibition.

The role of AI in literary production raises questions that partially parallel those in visual and musical art but also diverge in important ways. The large language models that power AI writing tools — GPT-4, Claude, Gemini — are trained on text in roughly the same way that image models are trained on images, with the same unresolved questions about consent, compensation, and cultural bias in training data. The distinctive challenge for literature is that the threshold between AI-assisted editing and AI-generated text is particularly difficult to police or even to define, because the technology has been absorbed into the workflow of countless writers without the kind of sharp “before and after” moment that characterised image generation in 2022. The Amazon reviewer problem — in which the flood of AI-generated reviews, product descriptions, and cheap fiction has degraded the quality of platforms that depend on authentic user-generated text — is a concrete example of how AI text generation creates market failures even without any individual act of fraud. The literary establishment has been slower than the visual arts to engage these questions critically, in part because the effects are more diffuse and in part because writers have been among the beneficiaries of AI writing tools in ways that illustrators, whose tools were more directly substituted, have not.

The multimodal turn — the development of models that can reason across image, text, audio, and video simultaneously — represents the convergence point of all these domain-specific developments and introduces a qualitatively new figure: the generalist creative AI that can, in principle, originate and execute a creative work across multiple media from a single natural language description. GPT-4V, Gemini Ultra, and their successors accept image and text inputs and produce text outputs; future systems are moving toward full multimodal input-output across all media. For the artist, this convergence does not simplify the ethical and aesthetic questions but multiplies them: the training data questions, the authorship questions, the environmental questions, and the labour questions now apply to a system that can generate not just images or songs or stories but fully produced multimedia works. The figure of the generalist creative AI is both a genuinely new creative possibility and a concentration of all the unresolved tensions of the earlier domain-specific tools.

The word "democratisation" recurs throughout the discourse of generative AI across all media domains, and it is worth pausing on what the concept actually entails. Democratisation in the most literal sense means the distribution of power to previously excluded people. Generative AI distributes the capacity to produce certain kinds of images, music, and text to people who previously lacked the technical skill to produce them. But it does not distribute the ownership of the tools, the control over the training data, the revenue from commercial deployment, or the governance decisions about which content will be generated and which will not. The democratic capacity to produce an image is nested inside a set of undemocratic structures that shape and constrain it. Attending to both levels simultaneously is the beginning of a critical relationship with generative AI tools.

Chapter 9: Practice, Responsibility, and the Artist’s Position

Three positions are available to artists working with generative AI, and the distinctions between them are analytically useful even though in practice they are rarely found in pure form. The first position is that of the uncritical tool-user: the artist who adopts generative AI as a production tool in the same spirit that a previous generation adopted Photoshop or 3D rendering software — as a means of producing images more efficiently, exploring visual ideas more rapidly, or achieving effects that would be impractical by hand. This position is entirely coherent as a studio practice, and many commercially successful generative AI artists occupy it without apology. Its limitation is not aesthetic but ethical and political: the uncritical tool-user implicitly accepts the extraction model, the labour conditions, and the environmental costs that underlie the tools, and their practice provides normalising cultural cover for those conditions without contributing anything to addressing them. The second position is that of the critical practitioner: the artist who works with and against the model’s tendencies simultaneously, using the technology’s outputs as material while subjecting the technology itself to aesthetic and conceptual scrutiny. This is the position of Akten, Paglen, and Ridler, and it requires a degree of reflexivity about process that is demanding but not technically prohibitive. The third position is that of the advocate or activist: the artist who uses the work to make the politics of the technology visible, building structures that change how the technology operates rather than just commenting on it. Herndon and Dryhurst’s Spawning project is the clearest example of an artist-initiated infrastructure intervention.

The disclosure question is practically urgent and theoretically unresolved. Exhibition venues, auction houses, competition organisers, collectors, and audiences all have potential interests in knowing whether and how AI was involved in producing a work. The strongest case for mandatory disclosure is made on the analogy with other forms of attribution: if a photographer used an assistant to set up the shot, this is typically considered material information; if a conceptual artist had a fabricator execute their designs, this is disclosed in exhibition contexts even when no one contests the conceptual authorship of the artist. AI involvement is at least as material to understanding a work as these precedents. The counterargument is that disclosure requirements impose an asymmetric burden on AI-assisted art that is not applied to other kinds of technological mediation: no one requires disclosure that a painting was made with synthetic pigments, or that a photograph was taken with a camera that includes AI-powered image stabilisation. The question of where to draw the line — what degree of AI involvement requires disclosure — is one that no authoritative body has yet resolved, and on which reasonable people substantially disagree.

The consent-and-compensation question is the one on which the most concrete work has been done, even if nothing like a settled regime exists. Holly Herndon and Mat Dryhurst’s Spawning initiative created an opt-out registry — Have I Been Trained? — that allows artists to check whether their work appeared in common AI training datasets and to register preferences about future use. The Fairly Trained certification, issued to companies that commit to using only consensually obtained training data, provides a market signal for artists and institutions that want to use AI tools built on a different ethical foundation. Legislative approaches have been proposed in the European Union and several US states, varying from transparency requirements — training data must be disclosed — to consent requirements — data cannot be used for training without permission — to compensation schemes modelled on collective licensing regimes in music. None of these approaches has achieved broad adoption as of the time of writing, but the direction of travel in most jurisdictions is toward greater transparency at minimum, and the political pressure from the creative industries lobby — far better organised than the individual plaintiffs in the current litigation — is pushing toward more substantive requirements.

The education question is one that fine arts programmes are confronting with varying degrees of deliberateness. At one end of the spectrum are programmes that have introduced prompt engineering and AI tool literacy as core studio skills, on the grounds that students who do not develop fluency with generative AI will be professionally disadvantaged in a market that has adopted these tools widely. At the other end are programmes that have characterised AI-assisted work as a form of academic dishonesty and prohibited it in studio courses, on the grounds that the cultivation of hand-craft skill and the development of a personal visual voice require processes that generative AI short-circuits rather than supports. Neither extreme is satisfying. The prohibition position ignores the reality of professional practice and the genuine creative possibilities that AI opens; the uncritical adoption position reproduces the tool-use stance discussed above, failing to provide students with the critical resources they need to practise ethically and to understand the technology they are using. The analogy with photography is historically instructive: when photography emerged in the nineteenth century, its relationship to painting was initially characterised as parasitic and threatening, then as a separate and lesser medium, then as an artistic practice in its own right with its own standards and possibilities. The trajectory was not linear, and the institutional response was slow and contested. The photography analogy does not tell us what to do about AI, but it does suggest that the question will be settled by practice rather than by declaration.

The question of visual voice and stylistic identity cuts to something deep about what an artistic career means. For most of the history of professional visual art, the development of a distinctive personal style — a way of seeing and marking that is recognisably one’s own — has been both the measure of artistic maturity and the economic basis of artistic reputation. Commissions, gallery representation, critical attention, and collector interest all depend on the ability to identify a body of work as the product of a particular sensibility. If AI models can, given a sufficient quantity of published work, generate images in the style of a named artist with enough fidelity to satisfy casual observers, then stylistic identity is no longer a secure basis for reputation or livelihood. The responses to this observation vary: some argue that style was always the wrong thing to build a career on, and that what AI exposes is the inadequacy of style as a proxy for artistic value; others frame the problem as one of exposure, concluding that artists should publish less work so their styles cannot be replicated, a response that would have perverse effects on the circulation of art; others point to the legal framework, arguing that while style cannot currently be copyrighted, the legal situation may change. The most honest observation is that generative AI has destabilised a relationship between artistic identity and artistic property that was previously taken for granted, and that no adequate response can avoid grappling with that instability.

For the philosophical questions of authorship and creativity that this chapter opens onto — the conditions under which an artist can be said to own a creative act, the relationship between intention and value, the conditions under which collaboration (with a human, with a machine) modifies or transfers creative credit — the sustained philosophical investigation belongs to PHIL 459b (Philosophy of AI). For the legal governance questions at scale — the regulatory frameworks being developed internationally, the liability structures being litigated in the courts, the IP reform proposals being advanced in legislatures — PHIL 451 (AI Ethics, Law, and Governance) provides the systematic treatment. This course takes the position that artistic practice is not downstream of philosophy and law but is itself a form of working through these questions — that the artist who works carefully with generative AI, who is honest about process, who engages with the politics of the tools, and who uses the work to make something worth having in the world is contributing to the resolution of questions that cannot be resolved by argument alone.

Stylistic identity, in the context of this course, refers to the constellation of aesthetic choices — colour, mark, composition, subject, mood — that makes a body of work recognisably attributable to a particular artist. It is distinct from copyright, which protects specific expressions, and from trademark, which protects commercial identity markers. The claim that an AI model can replicate stylistic identity is not a legal claim; it is an observation about the relationship between statistical pattern and aesthetic signature. Whether this observation should change the legal framework — by extending some form of protection to style in the context of AI training — is an open question in ongoing legislative and policy debates.

Chapter 10: Artist Voices — Interviews, Letters, and Manifestos

The discourse around generative AI and art has been shaped as much by artists’ direct statements — their letters, interviews, and manifestos — as by critical scholarship or legal proceedings. These statements form a body of primary material that is analytically irreplaceable: they reveal what is actually at stake for the people whose working lives and creative identities are most directly affected, and they often formulate the philosophical and ethical questions with a precision that academic commentary reaches only retrospectively. This chapter surveys the most significant of these statements, not as curiosities or as propaganda for one side of the debate, but as texts to be read with the same care as any other form of primary evidence.

Nick Cave’s response to a fan who submitted an AI-generated song written “in the style of Nick Cave,” published in January 2023 in The Red Hand Files, is probably the single most widely read artistic statement about AI-generated creativity. The fan had used ChatGPT to produce a verse-and-chorus lyric that they found impressive; Cave’s response is a model of measured but unambiguous disagreement. Cave begins by observing that the lyric is “a mess of words” with “a clear absence of meaning, of life, of any authentic striving,” and then moves from aesthetic observation to philosophical argument. The core of his position is that a song — not merely a competent lyric but a song in the fullest sense — is not a pattern generated from a large corpus of prior songs but an act of testimony. “Songs arise out of suffering,” Cave writes, “by which I mean they are predicated upon the complex, internal human struggle of creation and, well, as far as I know, algorithms don’t feel.” What AI models do, he argues, is a kind of mimicry that is philosophically distinct from creation: they reproduce the statistical surface of what songs look like without possessing any of the conditions — vulnerability, mortality, the desire to communicate with another human being across the silence — that make songs matter. Cave calls the ChatGPT lyric “a grotesque mockery of what it is to be human,” and he means something specific by “mockery”: not insult but a kind of caricature, an image that resembles its subject closely enough to be unsettling but lacks what the subject is actually made of. Cave’s letter circulated extraordinarily widely and is cited in most subsequent serious discussions of AI and artistic creativity precisely because it articulates, from within a distinguished artistic practice, an argument that the academic literature on machine creativity had approached from the outside.

Cave’s position should be read alongside, and not simply collapsed with, that of Holly Herndon, whose response to the generative moment is structurally different. Herndon does not deny that AI can produce music that is impressive by many measures; she argues instead that the question of what to do about it admits of more than one answer. Her Holly+ project, and the subsequent Spawning initiative, represent a position of conditional openness: she is willing to have AI trained on her voice and her music, but on terms that include attribution, reciprocity, and a model of consent that she controls. The philosophical difference from Cave’s position is not that Herndon thinks AI creativity is real and Cave thinks it is not; it is that Herndon thinks the political and economic conditions under which AI training and generation happen are what is primarily at stake, and that changing those conditions is a more tractable and more productive response than refusing the technology. Both responses are coherent, and the difference between them is not a disagreement about what AI does but about what artists should do about it.

Grimes’s April 2023 announcement that she would split royalties equally with anyone who used AI to generate music in her voice — and that she had released a voice model “GrimesAI” for this purpose — was framed by some commentators as a radical gesture of generosity and by others as a calculated branding move in a declining career. Neither framing captures what is most interesting about Grimes’s position, which is the explicit adoption of what she called an “open source” model for her own creative identity. Her stated rationale was that artists have always built on each other’s sounds and styles, that attribution and lineage are already acknowledged in the music world through sampling, cover versions, and acknowledged influences, and that the appropriate response to a technology that enables voice synthesis is not prohibition but a framework of fair dealing that acknowledges the creative debt. Grimes also expressed a genuine interest in what would be made with the model — a curiosity about the space of possible music that her voice could inhabit in other artists’ hands, framed as an artistic rather than a commercial proposition. Whether this interest survived the discovery that some AI-generated “GrimesAI” outputs were used in ways she found distasteful is a matter of subsequent record; the point is that her initial position articulated a coherent aesthetic philosophy of creative identity as a resource to be shared rather than a property to be defended.

Ai Weiwei’s engagement with AI questions draws on a practice already organised around the political dimensions of mass production, data, and state surveillance. His Sunflower Seeds installation (Tate Modern, 2010), in which the gallery floor was covered with 100 million hand-crafted porcelain seeds, each individually painted by artisans in Jingdezhen, was explicitly about the relationship between mass production, collective labour, and the experience of the individual — about what it means that something appears to be identical and is in fact unique, and what happens to human value when labour is scaled to industrial quantity. Ai has been clear, in multiple interviews and public statements, that AI raises a version of the same question in a different register: it scales the production of images and text to a quantity that makes individual authorship appear irrelevant, and it does so using the labour of the artists and writers whose work trained the models, who go uncompensated even as their work is incorporated into a system that now competes with them. Ai draws a direct line between the extraction of cultural labour in AI training and the forms of extraction — physical, economic, political — that he has spent his career documenting and resisting. He is also, notably, sceptical of the idea that AI produces anything that can be called imagination: “AI,” he has said in several contexts, “is very smart but has no fantasy.” By which he means not that AI lacks computation but that it lacks the capacity to be genuinely surprised by what it produces, to encounter what it makes as something strange that exceeds intention — which is, for Ai, what imagination essentially is.

David Hockney’s position on AI is the most emphatically traditionalist among major living artists, and it is worth engaging rather than dismissing, because it rests on a coherent philosophy of visual perception that Hockney has developed over decades in his writings and teaching. Hockney’s argument, expressed in numerous interviews and elaborated in his book Secret Knowledge (2001) on optics and painting history, is that the human eye is the organ of the most sophisticated form of attention that exists, and that the history of art is the history of finding ways to record and communicate what that attention has discovered. His critique of photography — and by extension of AI-generated imagery — is that it captures light mechanically without the cognitive and emotional processing that makes seeing meaningful. In interviews given after the rise of text-to-image systems, Hockney has returned consistently to the theme of the hand and the mark: the drawn line is not simply a record of a visual observation but an index of a human encounter with the world, a trace of attention and intention that is physically inscribed in the object. AI images, for Hockney, are images without marks — surfaces without touch — and the difference is not simply aesthetic but metaphysical. “The hand, the eye, the heart,” as Hockney has summarised it: all three are absent from the AI image, and their absence is what makes it, however beautiful, philosophically inert as art.

Brian Eno’s engagement with AI and creativity is shaped by his long-standing interest in generative music and ambient composition — he coined the term “ambient music” in 1978 and has worked with rule-based and chance-based compositional systems for most of his career. Eno’s position is therefore not hostile to the idea of machine-generated audio in principle; his Music for Airports and its successors were made using tape loops and deliberately non-deterministic processes that bear more than a superficial resemblance to the sampling-from-a-distribution logic of diffusion models. His critique of contemporary AI creativity is accordingly more specific and more interesting than a blanket rejection. In a widely discussed 2023 interview, Eno drew a distinction between the kinds of generative systems he had worked with — systems in which the composer sets up rules and constraints and then releases control to discover what the system produces — and large neural-network-based generators trained on human cultural production. His objection to the latter is that they short-circuit what he calls the “slip of consciousness” — the moment in which the artist, having set something in motion, is surprised by where it goes, and discovers in that surprise something about themselves they did not previously know. In Eno’s account, this is the creative moment: not the execution of a plan but the encounter with an unintended consequence of one’s own decisions. Systems trained on human production and prompted by natural language instructions, he argues, do not produce this kind of surprise — they produce competent extrapolation from the training distribution, which feels like surprise to users who are unfamiliar with the training distribution, but is not the same thing as the artist encountering what they did not know they meant. Whether Eno’s distinction holds up under scrutiny — whether the AI artist who carefully assembles a training corpus and a generative procedure is in a different relationship to surprise than Eno suggests — is a question that the course encourages readers to investigate in their own practice.

Mario Klingemann, who works under the name Quasimondo, is among the most technically sophisticated and philosophically articulate artists to have been working with neural image generation since the GAN era. Klingemann’s practice predates the latent diffusion moment by several years: his Memories of Passersby I (2018), a GAN-based installation that continuously generates unique human portraits from noise, was exhibited at Sotheby’s — an institutional signal that AI-generated art had arrived in the traditional art market — and attracted substantial critical attention. Klingemann’s statements and lectures on AI creativity are notable for their refusal of both the enthusiast and the sceptic position. He argues, from direct experience, that working with neural networks introduces a genuine form of unpredictability and discovery that is meaningfully different from working with explicitly programmed systems: the model has learned regularities that the artist did not put there, and those regularities produce outputs that the artist finds genuinely surprising. But he is also clear that the model’s “creativity” is, in his phrase, latent in both the technical and the metaphorical sense — it is a potential that the artist activates rather than a drive that the model exercises on its own. The role of the artist, in Klingemann’s account, is to design the conditions under which the model’s latent creativity is most productively released — to be, as he has said, “the director of the AI’s dreams.” This framing is more nuanced than either the “AI is creative” or the “AI is just a tool” position, and it connects naturally to the philosophical debates about combinational and exploratory creativity that Chapter 4 examines.

Hito Steyerl is a filmmaker and writer whose theoretical engagement with machine vision, data, and labour predates the generative AI wave and provides some of the most rigorous critical resources for understanding it. Her 2016 e-flux essay “A Sea of Data: Apophenia and Pattern (Mis-)Recognition” argues that contemporary machine vision systems do not see the world but detect patterns in data that represent the world — a distinction with profound political consequences, because the patterns the system is trained to detect reflect the political priorities of those who labelled the training data. Her 2023 essay “Mean Images” in New Left Review addresses the generative AI moment directly, with the concept of the mean image: the product of a generative model, she argues, is a statistical mean of its training data — an average of all the ways a subject has been depicted in the corpus — and it is therefore an image of what was considered representable, photographable, and uploadable in the culture that produced the training data. Mean images are not neutral summaries; they are the residues of power — of who had cameras, who had internet access, whose depictions were considered worth preserving and sharing. Steyerl’s critique connects directly to the decolonial analysis in Chapter 7, but adds a specifically semiotic dimension: the problem is not only whose images were taken but what those images meant to those who took them, and whether the model’s compression of those meanings into a statistical structure preserves or destroys them.

Paul McCartney’s use of AI to complete “Now and Then” — a John Lennon home recording from approximately 1978 that had been impossible to use because Lennon’s vocal was inextricably mixed with piano noise — represents perhaps the most emotionally complex case study in the debate about AI and artistic authenticity. Using audio machine learning software originally developed for the Get Back documentary (the same technology that separated Lennon’s voice from ambient noise in rehearsal recordings), McCartney and the surviving Beatles were able to isolate Lennon’s vocal and use it as the lead performance on a final Beatles recording released in November 2023. McCartney described the process in a June 2023 BBC interview: “We were able to take John’s voice and get it pure through this AI,” and he was careful to frame it as completion rather than creation, as fidelity to an existing intention rather than manufacture of a new one. The case is interesting precisely because it tests the limits of the standard critique of AI voice synthesis: the AI did not generate Lennon’s voice from statistical patterns learned from his recordings, as a GrimesAI-style synthesis would; it separated a performance that Lennon had actually made and that was physically captured on tape. The result is an artefact that raises, in acute form, the question of what presence an artist needs to have in a work for the work to be authentically theirs — a question that has no easy answer in this case, and that is illuminated rather than resolved by comparison with the GrimesAI situation in which the voice is generated rather than recovered. The emotional response to “Now and Then” — widely described as moving and even cathartic by listeners who grew up with the Beatles — is itself evidence that the question of authenticity is not purely philosophical but is bound up with the social relationships that art mediates.

The open letter tradition in the creative professions has produced a body of collective statements that complement the individual voices above. The Authors Guild open letter of July 2023, signed by over 10,000 writers including Margaret Atwood, Jonathan Franzen, Roxane Gay, George R. R. Martin, Jodi Picoult, and John Grisham, called on AI companies to acknowledge that their large language models were trained on copyrighted text without permission, to seek consent from authors before training on their works, and to compensate authors fairly for the use of their work. The letter is notable for its explicit connection between the philosophical question (whose creativity does the model embody?) and the economic question (who should benefit from that creativity?): the authors argue that the two are not separable, that AI companies are profiting from the compression of human creative intelligence without sharing that profit with the humans whose intelligence was compressed. Visual artists have made analogous arguments in the context of the Andersen litigation and in collective statements by organisations including the Concept Art Association and the Visual Artists Alliance. The pattern across all these collective statements is the same: a rejection of the claim that AI training is equivalent to human learning, combined with demands for consent mechanisms and economic participation. Whether these demands will produce legal or regulatory change remains to be seen; what is clear is that the creative professions have organised around them in ways that have historical precedent in the collective responses to earlier technological disruptions — photography, photocopying, digital music distribution — even if the form of the present disruption is more fundamental.

What makes Cave’s letter, Herndon’s Spawning project, Eno’s “slip of consciousness” critique, Klingemann’s “director of AI dreams,” Steyerl’s “mean images,” and McCartney’s “Now and Then” jointly interesting is that they all engage, from different angles, with the same core question: what is the relationship between process and meaning in artistic production? Cave says the process — suffering, striving, the encounter with mortality — is what gives the meaning; Herndon says the process can be shared if the terms are right; Eno says the process requires a specific kind of surprise that AI prompting does not provide; Klingemann says the process involves the artist in a genuine relationship with the model’s latent creativity; Steyerl says the process of statistical averaging produces political meaning whether or not anyone intends it; McCartney says the process of recovering what was already there is different in kind from the process of generating what was not. None of these positions is obviously right or wrong; they represent genuine philosophical alternatives that students should be able to articulate and evaluate from within their own practice.

Chapter 11: Tools, Communities, and the Generative Ecosystem

Understanding the practice of AI image generation requires a working knowledge of the tool landscape, which has evolved with unusual rapidity and continues to do so. The major publicly available systems as of 2024 divide roughly into two categories: closed API systems operated as commercial services with curated content policies, and open-weight models whose parameters are publicly released and which can be run locally or adapted freely. The first category includes Midjourney, DALL-E 3 (accessible through ChatGPT and the OpenAI API), and Adobe Firefly; the second includes the Stable Diffusion family and, since 2024, Flux. Understanding the practical and political differences between these categories matters for anyone whose practice depends on knowing what they are using and under what conditions.
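The practical difference between the two categories is visible even in a first script. Below is a minimal sketch, assuming the `openai` and `diffusers` Python packages, an OpenAI API key in the environment, and a CUDA GPU; the model identifiers are illustrative rather than recommendations, and hosted availability changes over time:

```python
# Closed API: the model runs on the vendor's servers, behind their content policy.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
result = client.images.generate(
    model="dall-e-3",
    prompt="a woodcut print of a lighthouse at dusk",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # a hosted image; the weights are never yours

# Open weights: the same kind of request runs locally on your own hardware.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint name
    torch_dtype=torch.float16,
).to("cuda")
image = pipe("a woodcut print of a lighthouse at dusk",
             num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("local_output.png")  # a file on your own disk, from weights you can inspect
```

The closed call returns a URL to an image held on the vendor’s infrastructure; the open-weight call produces a file on the practitioner’s own machine, from weights they can examine, fine-tune, or redistribute within the licence terms.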

Midjourney’s aesthetic evolution offers a case study in how a single commercial system’s design decisions shaped a generation of practitioners. Version 1, released in February 2022, produced images that were visually distinctive in a way that was immediately recognisable — soft, painterly, slightly dreamlike, with the characteristic visual artefacts of an immature diffusion architecture. Version 4 (late 2022) and Version 5 (March 2023) produced a dramatic leap in photorealism and compositional sophistication that coincided with the peak of public attention to AI-generated imagery. Midjourney’s characteristic aesthetic signature — high contrast, rich detail, dramatic lighting, a tendency toward the grandiose and the cinematic — was not a neutral technical outcome but a design choice embedded in the model’s training curation and the fine-tuning decisions of its team. The “MJ5 look,” as practitioners described it, became so pervasive on social media that it produced its own aesthetic fatigue, and Version 6 (December 2023) was specifically designed to expand the system’s aesthetic range, reduce the frequency of the characteristic grandiose treatment, and improve fidelity to text prompts. The evolution of Midjourney’s versions illustrates a principle that is easy to overlook: the aesthetic character of a generative system is not purely a function of scale and training data size but of the specific creative decisions made in training curation, fine-tuning, and model evaluation.

The Stable Diffusion ecosystem represents a different and in many ways more philosophically interesting model. The original Stable Diffusion release in August 2022, developed by Stability AI in collaboration with academic research groups, was released under a licence that permitted free use, modification, and redistribution subject to some restrictions on harmful use. This release decision — which Stability AI’s then-CEO Emad Mostaque framed as a gesture of radical democratisation — had consequences far beyond what any commercial gating policy would have allowed. Within months of the release, a global community of developers, artists, and hobbyists had produced thousands of fine-tuned model variants, embedding tools, prompt exploration interfaces, and extensions. The community that formed around Stable Diffusion was not primarily commercial; it was driven by curiosity, technical interest, and a genuine commitment to open exploration of what the technology could do. The primary community interfaces — Automatic1111 (formally “Stable Diffusion web UI,” universally known by its developer’s GitHub handle), a browser-based interface that consolidated most of the technical capabilities into a single, accessible application, and ComfyUI, a node-based graphical interface that allows fine-grained control over the generation pipeline — became the de facto standard environments for serious practitioners. ComfyUI in particular, with its explicit representation of the generation pipeline as a graph of interconnected processes, has the educational virtue of making the architecture of generation visible and manipulable rather than hiding it behind a conversational interface.
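What a node graph in ComfyUI makes visible can also be written out by hand. The sketch below follows the pipeline-deconstruction pattern shown in the Hugging Face diffusers documentation (text encoding, iterative denoising with classifier-free guidance, then VAE decoding) as a teaching illustration; the model identifier is again illustrative, and error handling is omitted:

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

repo = "runwayml/stable-diffusion-v1-5"  # illustrative; any SD 1.x checkpoint with this layout
device = "cuda"

# Each component below corresponds to a node in a ComfyUI graph.
tokenizer = CLIPTokenizer.from_pretrained(repo, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder").to(device)
unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet").to(device)
vae = AutoencoderKL.from_pretrained(repo, subfolder="vae").to(device)
scheduler = PNDMScheduler.from_pretrained(repo, subfolder="scheduler")

prompt, steps, guidance_scale = ["a charcoal study of a weathered rowboat"], 30, 7.5

def embed(texts):
    tokens = tokenizer(texts, padding="max_length", truncation=True,
                       max_length=tokenizer.model_max_length, return_tensors="pt")
    with torch.no_grad():
        return text_encoder(tokens.input_ids.to(device))[0]

# Stage 1: encode the prompt, plus an empty prompt for classifier-free guidance.
text_embeddings = torch.cat([embed([""]), embed(prompt)])

# Stage 2: iterative denoising in latent space.
latents = torch.randn(1, unet.config.in_channels, 64, 64, device=device)
scheduler.set_timesteps(steps)
latents = latents * scheduler.init_noise_sigma
for t in scheduler.timesteps:
    latent_input = scheduler.scale_model_input(torch.cat([latents] * 2), t)
    with torch.no_grad():
        noise_pred = unet(latent_input, t, encoder_hidden_states=text_embeddings).sample
    uncond, cond = noise_pred.chunk(2)
    noise_pred = uncond + guidance_scale * (cond - uncond)        # guided prediction
    latents = scheduler.step(noise_pred, t, latents).prev_sample  # one denoising step

# Stage 3: decode the final latents to pixels with the VAE.
with torch.no_grad():
    image = vae.decode(latents / vae.config.scaling_factor).sample  # tensor in [-1, 1]
```

Every line in the denoising loop corresponds to a wire in the visual graph, which is precisely the pedagogical point: the conversational interfaces hide these stages, the node-based ones expose them.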

The LoRA (Low-Rank Adaptation) technique, introduced by Hu et al. in 2021 and rapidly adopted by the Stable Diffusion community, represents perhaps the most important development for artistic practice since the original diffusion model release. LoRA allows a practitioner to train a small number of additional model weights — far fewer than the billions of parameters in the base model — on a custom dataset of perhaps twenty to one hundred images, and to load these trained weights as a lightweight modifier that steers the base model toward the aesthetic territory represented in the custom dataset. The practical effect is that any artist can train a LoRA on their own body of work — their drawings, paintings, photographs, or illustrations — and generate new images in the aesthetic territory they have developed through years of manual practice. This has been described, from opposite political positions, as either the artist finally being able to use AI without losing their voice (because the model is adapting to them rather than them adapting to the model) or as the final completion of the extraction logic (because the artist must provide their own work as training data for a system that will then compete with them commercially). Both observations have force, and the choice between them depends on whether the artist controls the weights they train and under what conditions they share or commercialise them.
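The mechanics are compact enough to show directly. In the formulation of Hu et al., a frozen weight matrix W gains a trainable low-rank update, W′ = W + (α/r)·B·A, so that only the small factors B and A are trained and distributed. The following is a minimal PyTorch sketch of the idea, not the API of any particular LoRA library; the class and its names are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W' = W + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base model's weights stay frozen

        d_out, d_in = base.weight.shape
        # Only rank * (d_in + d_out) new parameters are trained and shipped.
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # zero init: update starts as a no-op
        self.scale = alpha / rank                              # conventional alpha/r scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank correction, without materialising B @ A.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Wrapping one 768-dimensional projection of a hypothetical base model:
layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in layer.parameters() if not p.requires_grad)
print(trainable, frozen)  # 12288 trainable vs. 590592 frozen
```

At rank 8 on a 768-dimensional projection, the update adds 12,288 trainable weights against roughly 590,000 frozen ones, which is why LoRA files are measured in megabytes where base checkpoints are measured in gigabytes.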

Civitai (an abbreviation of Civilian AI, somewhat humorously named), a community platform launched in late 2022, became within a year the dominant distribution hub for Stable Diffusion fine-tuned models, LoRAs, embeddings, and sample images. The platform operates under a permissive philosophy that has made it the repository of choice for practitioners wanting to share work-in-progress, experimental models, and style-specific adaptations — and also a site of significant controversy. The platform’s permissive content policies allowed the distribution of models fine-tuned specifically to generate sexual content of named real persons, including celebrities, which resulted in policy changes and partial content restrictions after sustained public criticism. The Civitai controversy is instructive as a case study in the governance of open-weight model ecosystems: when model weights are freely distributable and the tools to fine-tune them are freely available, no single platform can effectively gate their use, and the choice of where to draw the content policy line is as much a statement of community values as a technical enforcement decision. Artists who participated in the Civitai community frequently found their own model-training work shared alongside content they found ethically objectionable, raising questions about complicity and association that have no clear institutional resolution.

The political climate of the AI art community has been marked, since late 2022, by sustained conflict between practitioners who adopt generative AI tools and practitioners who oppose them, a conflict most visibly played out on ArtStation — the primary portfolio and community platform for commercial illustrators and concept artists. In December 2022, a coordinated protest campaign led to thousands of users uploading images bearing “No AI Art” logos as a gesture against the presence of AI-generated images on the platform; ArtStation responded by introducing a voluntary opt-out tag that allowed artists to mark their work as not consenting to AI training, while stopping short of banning AI-generated work from the platform. The protest produced no policy change of substance but demonstrated the depth of feeling and the organisational capacity of the anti-AI art movement within the professional illustration community. The specific grievances were not abstract: ArtStation had been one of the primary sources of images in widely used AI training datasets, and many artists found their work being used to train models that then competed for the commissions they depended on. The platform-level conflict on ArtStation mirrors and contextualises the litigation being pursued in the courts; both are expressions of the same underlying political conflict over who controls the cultural archive of professional visual art.

Adobe Firefly, launched in beta in March 2023 and integrated into Photoshop and Illustrator in subsequent updates, represents a deliberate attempt to build a commercially viable generative AI system on a foundation of consensual training data. Firefly is trained on Adobe Stock images, openly licensed content, and content in the public domain — explicitly excluding images scraped from the broader internet without consent. Adobe has also introduced a Content Credentials system, based on the open C2PA standard (Coalition for Content Provenance and Authenticity), that embeds machine-readable provenance information in generated images indicating that they were produced with AI. The Content Credentials initiative represents the most developed corporate attempt to address the disclosure and provenance problems that the court cases and community conflicts have raised; it does not solve the consent problem for the majority of training data used by other systems, but it establishes a practical standard that legislators and platform operators can point to as evidence that provenance tracking is technically feasible. For professional practitioners who need to demonstrate to clients, publishers, and platforms that their work meets emerging AI-disclosure standards, Firefly’s infrastructure offers a concrete and commercially supported solution.

Flux, developed by Black Forest Labs (a company founded by the researchers who originally developed the Stable Diffusion architecture) and released in August 2024, represents the state of the art in open-weight text-to-image generation at the time of writing. Flux’s architecture moves beyond the latent diffusion approach of Stable Diffusion to a flow matching framework that produces higher-quality outputs with faster sampling and better text-following than diffusion-based predecessors. The release of Flux as an open-weight model — with variants at different sizes for different use cases — continued the tradition of open-weight image generation established by Stable Diffusion and extended by SDXL. The rapid adoption of Flux by the ComfyUI and Automatic1111 communities demonstrated the resilience of the open-weight ecosystem: within weeks of release, Flux was the dominant model for technically sophisticated practitioners, and the LoRA and fine-tuning infrastructure had been adapted to work with the new architecture. For the artist, the practical implication is that the state-of-the-art tool is freely available, locally runnable on a high-end consumer GPU, and customisable to a degree that closed API systems do not permit. The trade-off is a steeper technical learning curve and the absence of the safety filtering that commercial systems build in — which places more responsibility on the practitioner to make considered choices about what they generate and share.
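The objective behind flow matching is compact enough to state here. In the rectified-flow formulation this family of models builds on (the notation below is the standard one from the flow-matching literature, not a description of Flux’s undisclosed training specifics), a network v_θ learns the velocity of a straight-line path between a data sample and Gaussian noise:

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \quad x_1 \sim \mathcal{N}(0, I), \quad t \sim \mathcal{U}[0, 1]

\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, x_1,\, t}\,
  \big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2
```

Generation then integrates the learned velocity field from noise back to data with an ordinary ODE solver; because the target paths are straight, fewer solver steps are needed than along curved diffusion trajectories, which is one source of the faster sampling noted above.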

The adoption of generative AI in professional game development and film production has proceeded along a trajectory that initially prioritised concept art generation — using AI tools to rapidly iterate on visual ideas before committing to expensive hand-crafted artwork — and has since expanded to asset generation, environmental design, and in some cases final production imagery. The professional discourse around this adoption has been characterised by a tension between two truths that are both real and difficult to hold simultaneously: that the tools genuinely accelerate certain parts of the creative pipeline and enable exploration that would be economically impractical otherwise, and that their widespread adoption has reduced the economic demand for entry-level and mid-level illustration and concept art work. Companies that adopt AI tools for concept art typically describe them as liberating artists from routine production tasks to focus on higher-level creative decisions; artists who have lost work to those same tools typically describe their former roles not as routine but as the foundation of professional development — the “lower-level” work through which junior artists developed the skills and relationships necessary to advance. The divergence between these perspectives is not purely a difference of interests; it is also a difference in how the concept of creative work is understood. Whether the profession restructures around AI in a way that preserves viable entry paths for new practitioners, or concentrates skill-intensive work among a smaller number of senior artists while AI handles the volume, is a question that the next five years of practice will begin to answer.

The contrast between the Midjourney/DALL-E 3/Firefly closed-API ecosystem and the Stable Diffusion/Flux open-weight ecosystem is not merely technical but political. Closed systems make content filtering decisions on behalf of their users, control the training data without disclosure, capture the commercial value of user prompts as feedback for model improvement, and provide infrastructure that makes regulation feasible because there is a single identifiable actor to regulate. Open systems distribute those decisions — and those responsibilities — across the entire community of users and developers, making regulation difficult but also distributing the benefits of the technology more widely and the creative control more fully to the individual practitioner. The choice between these ecosystems is, for the serious practitioner, also a political choice about what kind of AI development trajectory they want to participate in and support.