Artificial intelligence can imitate the style of famous authors after training on only two of their books
According to the Iran Book News Agency (IBNA), citing The Decoder, a joint study by Stony Brook University and Columbia Law School finds that artificial intelligence models trained on as few as two works by an author can produce texts that surpass those of professional writers hired to imitate that author, in both style and quality. These findings could have significant implications for copyright law and for ongoing legal disputes in the United States.
In the study, professional writers and three advanced AI systems, namely GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, were asked to write texts in the style of fifty well-known authors, including Nobel and Booker Prize laureates. Then 159 participants, 28 writing experts and 131 general readers recruited through the Prolific platform, evaluated the quality and stylistic resemblance of the texts without knowing whether a human or a machine had written them.
The results showed that when the AI models were simply prompted with examples (so-called in-context learning), readers preferred the human-written texts. Versions of GPT-4o fine-tuned on each author's work, however, were rated significantly better: experts were eight times more likely to choose the AI-generated texts for stylistic fidelity and twice as likely to prefer them for writing quality.
According to the researchers, the amount of an author's work available for training did not affect the outcome: authors with only two published books, such as Tony Tulathimutte, were as easy to imitate as prolific, renowned authors such as Haruki Murakami.
The cost difference is also striking. Training such a model for a single author costs an estimated $81, whereas commissioning comparable work from a professional writer costs at least $25,000. Since $81 is about 0.3 percent of $25,000, this amounts to a cost reduction of roughly 99.7 percent.
The researchers stress that this capability could create new challenges for copyright law. US courts are currently hearing lawsuits over the unauthorized use of authors' works to train AI models. In one such case, against Anthropic, it emerged that millions of books had been obtained from pirate sources such as LibGen and Pirate Library Mirror and used for training.
The study's authors warn that if readers prefer AI-generated imitations to the original authors' work, this could count as "market harm," a key consideration in the fair-use analysis under US copyright law.
They argue that a distinction should be drawn between general-purpose AI models and models trained specifically to imitate a particular author. Their recommendation is either to prohibit training on an individual author's works for the purpose of imitation or to require clear labeling of AI-generated texts.