WA authors are teaching AI how to write — without their consent

Companies like Meta and Bloomberg draw upon a database of 191,000 books to train their generative-AI tools. Local writers aren’t happy, and lawsuits are in the works.

Many Northwest authors recently discovered that their books — including the eight above — are being used without permission to train generative AI via the Books3 database.

For many authors, the act of writing is deeply personal. There’s vulnerability in placing each punctuation mark, in the construction of each sentence, in the creation of an entire book — not to mention the life experiences brought to bear in the text. 

So what happens when writers find out that work that took years, maybe decades, to create is being used without their permission to train futuristic robots that might one day replace them?

It sounds like something out of a dystopian novel, but it’s the reality that thousands of authors are living today. 

In July, authors Sarah Silverman, Richard Kadrey and Christopher Golden made news when they filed a lawsuit accusing Meta of violating copyright laws by using their books to train Meta’s language AI, called LLaMA. Freelance writer and programmer Alex Reisner investigated. What he found was, for many authors, their worst nightmare.

In August, Reisner obtained and revealed a database, known as Books3, that includes published works used to train generative-AI programs for Meta, Bloomberg and other companies. The authors of the books were not asked for permission; the database is pulled from pirated e-books published in the past two decades.

Reisner shared in a story for The Atlantic that books by Silverman and others filing the lawsuit were not the only ones training LLaMA — the database contained approximately 191,000 titles, 183,000 of which included author information. But the contents of Books3 were still a mystery to the public.

So in late September, Reisner published a search tool that allows anyone to plug in a name to determine if their works are in the database, which includes novels, short stories, nonfiction, poetry collections and other books.

Works by Northwest authors in the Books3 database range from Rick Steves’ travel guides to Julia Quinn’s Bridgerton romance novels to Ijeoma Oluo’s bestselling nonfiction title So You Want to Talk About Race.

Reactions from authors on social media have been mixed.

Seattle science fiction writer Cherie Priest wrote on X (formerly known as Twitter), “F.I.F.T.E.E.N. of my books were used w/o permission to train an algorithm to imitate me.” Others have suggested they’re flattered to have been included (“I have arrived,” joked Tacoma-raised essayist Briallen Hopper on Instagram). 

Canadian poet Christian Bök, author of the collection Eunoia, wrote on X: “I am honoured to discover that EUNOIA appears among the dataset of Books3 ... used to train the minds of our futurist machines (which, like any of our children, do not need our permission to become literate).”

Crosscut checked in with a few local writers about how it felt to discover their books used in service of AI without their permission. Of those who responded, most are not pleased — some expressed anger and feelings of violation, others felt helpless.

Still others — including Bainbridge-based novelist Jonathan Evison, whose titles Lawn Boy, West of Here and two more are in Books3 — voiced pessimistic thoughts about the possibility of AI replacing authors altogether. “If Marvel movies can replace real cinema, who knows?” he told Crosscut. “The bar is only getting lower.”

Responses have been condensed for length.


Charles R. Cross

Genre: Nonfiction
Books in database: Heavier Than Heaven: A Biography of Kurt Cobain; Here We Are Now: The Lasting Impact of Kurt Cobain; Kicking & Dreaming: A Story of Heart, Soul, and Rock & Roll; Classic Rock Albums: Nevermind: Nirvana 

“It’s difficult as it is to make a living as a writer,” Charles R. Cross wrote in an email. “I’ve had several of my books ‘bootlegged’ in countries where the books weren’t released yet … but that’s small potatoes to the idea that a tech company could use AI to copy a writer’s ‘style,’ or ‘voice,’ the one thing that is central to the work of an author.”

But Cross isn’t worried about AI replacing human authors anytime soon. Referencing his upcoming book about Seattle’s cultural scene in the 1980s and ’90s, he noted, “AI will never know what it smelled like in the bathroom of [clubs like] the Vogue or the Offramp. And trust me, nothing AI can come up with could compare to the real world.” 

David Guterson

Genres: Fiction and nonfiction
Books in database: Snow Falling on Cedars; Descent: A Memoir of Madness; Ed King; Our Lady of the Forest; Problems With People; The Country Ahead of Us, The Country Behind; The Other

“I figured that with 180,000-plus books, there was a good chance mine might be one of them,” Bainbridge-based author David Guterson wrote in an email to Crosscut. “I feel like it would be better for them to ask permission and pay a fee.” As for the future, he posited, “I think we’re going to end up with books written by both humans and AI.”

Kira Jane Buxton

Genre: Science fiction
Book in database: Hollow Kingdom

“It is a little ironic for me given that an allegorical theme in my novel is the dark side of tech dependency,” wrote Kira Jane Buxton. “A writer’s voice takes time to develop. Years of practice, perseverance, rejection.”

Buxton is hopeful that AI won’t render authors obsolete, but she’s wary. “I like to imagine that there isn’t an artificial substitute for the … human life that each writer brings to the page,” she wrote. “That readers will always demand a depth of emotional resonance and creative thinking that is uniquely human. But AI is already very good at replication.”

Will Taylor

Genres: Children’s and middle-grade fiction
Books in database: Maggie & Abby and the Shipwreck Treehouse; Maggie & Abby’s Neverending Pillow Fort

Children’s book author Will Taylor wrote that he didn’t check the Books3 database to see if his works were included because he knew he was powerless.

“It feels like my work is being reduced to a mulch or chicken feed being shoveled into a machine to make someone else money,” Taylor wrote. “To have someone else take my finished product and treat it like their own starting point is deeply frustrating.”

Taylor shares Guterson’s view that the future likely holds books written entirely by AI. Still, he believes readers will gravitate toward books written by humans.

“AI cannot create, it can only rearrange, and human lives are inherently creative,” he wrote. “We will continue to want to see ourselves in books and stories, and hear from other humans, and share the lived experiences AI will never be able to do more than imitate.”

Cherie Priest

Genre: Science fiction
Books in database: Boneshaker; Dreadnought; Fiddlehead; Ganymede; Hellbent and 10 others

“AI cannot and will not replace human creativity because it fundamentally lacks any creativity of its own,” wrote Seattle-based author Cherie Priest. “It can’t generate anything new; it can only hoover up, remix, and spit out muddied approximations of existing media at the expense of those who wrote it — and who depend upon it to earn a living. It’s less ‘artificial intelligence’ than ‘thieving predictive text algorithm.’”

Laura Anne Gilman

Genre: Fantasy
Books in database: Flesh and Fire; Red Waters Rising; The Cold Eye and six others

Fantasy author Laura Anne Gilman agrees with Priest, saying the use of her works without consent and without compensation is theft.

“The real irony of this, to my mind, is that if they had approached authors ahead of time, offering even a token payment for use, they probably would’ve gotten a lot of takers,” she wrote. “They might’ve even gotten it for free from some folk, just to be part of the experiment. But they didn’t ask. They stole. And now we want compensation.”

Kat Richardson

Genres: Fantasy and mystery
Books in database: Six books in the Greywalker series; Indigo: A Novel; Mean Streets

“I recognize that AI is something which we cannot escape and which is not, of itself ‘evil,’” wrote Seattle-based author Kat Richardson. “It’s naive to expect that AI writing will simply go away because we object to it. It won’t. What I dislike and object to profoundly is the assumption of ownership committed by the corporations involved. That pisses me off.”

What’s next? 

The Books3 news has prompted many writers to take a stand. “Writers deserve and therefore must demand better protections and permissions for our work,” Buxton wrote. 

Richardson noted the goal should be “to create guidelines for the ethical application of AI to creative fields, and to craft law and practice that benefit, support, and encourage human creativity, that protect the creators’ ownership of their work and ensure that they can profit from it ahead of all others … like copyright was initially intended to do.”

Several current lawsuits have been brought by authors against AI companies, including the one filed in July against Meta and OpenAI by the group that includes Sarah Silverman.

The Authors Guild, along with internationally known authors such as John Grisham and Jodi Picoult, filed a class-action lawsuit against OpenAI for copyright infringement in September, a few days before the entirety of the Books3 database was revealed. Though the Guild is not naming more plaintiffs at this time, it said a positive outcome for authors should benefit all writers.

Many of the Washington authors affected support the legal action. Cross wrote in an email to Crosscut that he’ll join any class-action lawsuit that goes forward; Guterson wrote that he’s happy to be represented by the Authors Guild, as he wouldn’t know how to navigate fighting back on his own. Gilman believes these lawsuits should only be the start, emphasizing that this sort of usage and AI training should go forward only when creators consent to it.

Alex Alben, co-founder of Seattle-based think tank The AI Forum and a professor of privacy and cybersecurity at the University of Washington and the University of California, Los Angeles, said the main issue these lawsuits should address is whether copyright was violated.

“I’m an author of both fiction and nonfiction and I can understand how the idea of somebody using my book in some way in a dataset might be off-putting, but I think the real issue is whether an AI tool or any other user is violating the copyright in my work,” Alben said in a phone interview. 

“If a database has hundreds of thousands of books in it, and if a user asks a tool such as ChatGPT to generate an answer to a query,” he said, “the issue should be whether the copyright in any given book has been violated.” 

As a fellow author, Alben is sympathetic to writers, but said there’s a central question to answer: Is training AI with published materials any different from authors drawing on the works they’ve read and consumed, their own “human training set,” when writing their own books?

In other words: Is it different from a writer penning a story based on Pride and Prejudice? From styling an authorial voice based on Sylvia Plath’s poetry or Charles Dickens’ satire without credit? 

Perhaps many will say yes, standing firmly on the side of human cultural transmission. But as generative AI continues to advance, the boundary is likely to shift regarding what’s fair use and what is not — and what should be legal and what should not.
