Dear Artists: Do Not Fear AI Image Generators

True, new systems devalue craft, shift power, and wreck cultures and scenes. But didn’t the piano do that to the harpsichord?
ILLUSTRATION: ELENA LACEY

In 1992, the poet Anne Carson published a little book called Short Talks. It’s a series of micro-essays, ranging in length from a sentence to a paragraph, on seemingly disconnected subjects—orchids, rain, the mythic Andean vicuña. Her “Short Talk on the Sensation of Airplane Takeoff” is what it sounds like. Her “Short Talk on Trout” is mostly about the types of trout that appear in haiku. In what passes for the book’s introduction, Carson writes, with dry Canadian relatability, “I will do anything to avoid boredom. It is the task of a lifetime.” Right about when she published that, the internet started to take off.

Fast-forward 30 years and one of the latest ways to avoid boredom, at least for me, is to stay up late and goof around with AI image generation. Tools such as DALL-E 2, Midjourney, and Stable Diffusion can be instructed, with textual prompts, to produce ersatz oil paintings of dogs in hats in the style of Titian, or simulated photos of plasticine models of astronauts riding horses. When I first started playing with Stable Diffusion—which is open source and very fun—I was reminded of Carson’s talks. I went back to them to figure out why. Pretty quickly I realized that the resemblance had something to do with form.

Everyone says content is king, but the secret monarch of the content economy is form—constraints, rules, minima and maxima. You grow up learning form. A high school essay is five paragraphs. Sitcoms leave eight minutes in the half hour for ads. Novels are long. Tweets are capped at 280 characters.

What differentiates my tweet or essay or studio film from yours? The choices each of us makes within the form. In a word, our style. Carson’s book takes a familiar form, the little lecture, and subverts it, manipulates it, until as the reader you start to feel like you’re inside her wonderful brain, scrolling through her mental browser history, joining her in hyperlinked fancies and half-abandoned rabbit holes. Image generation is kind of like that—but instead of communing with a single brilliant Canadian brain, you’re communing with a giant idiot world-brain. (A less neurological way to put it: vast numbers of data objects grouped in layers, connected together to an incomprehensible degree, like string-and-nail wall art of a many-masted clipper ship but on fire with the flow of data.)

In general, humans like using machine learning to assist pathologists, sharpen a phone photo, or make a better map. But the AI generators bug a lot of people. These tools work by spidering images from across the internet, absorbing the visual culture contained within them by scanning their captions, then adding fizzy visual noise to them until they look like static. To make a new image, the AI starts with a caption and some static, then runs the process backward, removing noise until an image appears that lines up with the caption, more or less. (It’s bad at drawing hands, but so am I.)
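
For the curious, here is a toy sketch of that noise-and-reverse idea in Python. It is not how any particular generator is built. The linear noise schedule, the function names (`alpha_bar`, `add_noise`, `remove_noise`), and the five-pixel "image" are all assumptions made for illustration. The hard part, which a real system hands to a trained neural network guided by the caption, is faked here by reusing the exact static that was added.

```python
import math
import random

def alpha_bar(t, steps, beta_start=1e-4, beta_end=0.02):
    """Share of the original image's signal left after t noising steps.

    Assumes a simple linear schedule of per-step noise amounts (betas).
    """
    remaining = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (steps - 1)
        remaining *= 1.0 - beta
    return remaining

def add_noise(pixels, t, steps, rng):
    """Forward direction: blend the image with Gaussian static."""
    a = alpha_bar(t, steps)
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(a) * p + math.sqrt(1.0 - a) * n
             for p, n in zip(pixels, noise)]
    return noisy, noise

def remove_noise(noisy, predicted_noise, t, steps):
    """Backward direction: subtract the (predicted) static and rescale."""
    a = alpha_bar(t, steps)
    return [(x - math.sqrt(1.0 - a) * n) / math.sqrt(a)
            for x, n in zip(noisy, predicted_noise)]

rng = random.Random(0)                 # fixed seed so runs are repeatable
image = [0.0, 0.25, 0.5, 0.75, 1.0]    # hypothetical five-pixel "image"
steps = 1000

noisy, noise = add_noise(image, 500, steps, rng)
# Stand-in for the trained network: we cheat and pass in the true noise.
restored = remove_noise(noisy, noise, 500, steps)
```

With the true static handed back, `restored` matches `image` up to floating-point error. A real generator never gets that luxury: it has to guess the static from the caption and the noisy picture, a little at a time over many steps, which is where the training on all those scraped images comes in.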

This feels gross. It feels gross to see artists databased into oblivion. It feels gross that someone could say to a computer, “I want a portrait of Alex Jones in the style of Frida Kahlo,” and the computer would do it without moral judgment. These systems roll scenes, territories, cultures—things people thought of as “theirs,” “their living,” and “their craft”—into a 4-gigabyte, open source tarball that you can download onto a Mac in order to make a baseball-playing penguin in the style of Hayao Miyazaki. The people who can use the new tools will have new power. The people who were great at the old tools (paintbrushes, cameras, Adobe Illustrator) will be thanked for their service and rendered into Soylent. It’s as if a guy wearing Allbirds has stumbled into a residential neighborhood where everyone is just barely holding on and says, “I love this place, it’s so quirky! Siri, play my Quirky playlist. And open a Blue Bottle on the corner!”

So naturally, people are upset. Art websites are banning AI-generated work, at least for now; stock image services are refusing it too. Prominent bloggers who experimented with having an AI illustrate their writing have been chastened on Twitter and have promised not to do it again. AI companies are talking a lot about ethics, which always makes me suspicious, and certain words are banned from the image generator’s interface, which is sad because I wanted to ask the bot to paint a “busty” cottage in the style of Thomas Kinkade. (One must confront one’s deepest fears.)

Don’t cancel the messenger, but come on: Image generators are going to be baked in everywhere, used for an enormous range of good, evil, or horny purposes. In a decade, or 10 minutes (time is blurry around this stuff), we’ll be saying things like, “Computer, make a version of Die Hard where all the characters are corgis.” Then we’ll post it to YouTube, which will use machine learning to make sure the film studio gets its pre-negotiated cut for the audio track. Then other systems will download the video and decide that there is a connection between the voice of arch-terrorist Hans Gruber (portrayed by Alan Rickman) and corgis, which will result in a rogue AI-enhanced compression algorithm replacing all instances of Snape in Harry Potter with a corgi, leading to the Great Corgi Cinematic Snowball Virus of 2024, after which all filmed entertainment will feature only corgis and the occasional crossbreed like corgipoos and borgles. This will ruin Game of Thrones but will make The Purge adorable.

Remember: In the days of powdered wigs, musicians who liked the pluckiness of the harpsichord complained that the piano sounded soft and dull. Much later, musicians (and their unions) fought the synthesizer, fearing it would bleep-bloop careers into oblivion. New systems always seem, at first, to devalue craft, shift power, and wreck cultures and scenes. This is because they do all of that. And we, downstream in time, invariably fall prey to the historical fallacy and go, Oh, those worrywarts! How stubbornly they clung to their harpsichords. We know that, without the piano, there’d be no Shostakovich or Satie or Margaret Leng Tan; without synths, no Wendy Carlos, Kraftwerk, or Pet Shop Boys.

I asked GPT-3, an AI text generator, to write me “a Short Talk on trout in the style of Anne Carson.” It replied: “Trout are most active in the early morning and late evening, so those are the best times to go fishing.” I went back to the original. Of the trout found in haiku, Carson writes: “Worn out, completely exhausted, they are going down to the sea.” I think we can agree that the Canadian brain wins this one. But we do not have to choose between, on the one hand, an unthinking digital pseudobrain and, on the other, the artifacts of a single human mind. The miracle of the age is that we can learn from both, whenever we like. Anything to avoid boredom.



This article appears in the November 2022 issue.