
Like every other technology company out there, Adobe has leaned heavily on AI over the past few years. The software company has launched several AI projects since 2023, including Firefly, an AI-powered media-generation suite. Now, however, the company's full embrace of the technology may have led to problems, as a new lawsuit alleges that it used copyrighted books to train one of its AI models.
The lawsuit, filed on behalf of Elizabeth Lyon, an Oregon author, alleges that Adobe used dozens of books, including her own, to train the company's SlimLM model.
Adobe describes SlimLM as a series of small language models "optimized to support text on mobile devices." The company says that SlimLM was pre-trained on SlimPajama-627B, a "deduplicated, multi-corpora, open source dataset" released by Cerebras in June of 2023. Lyon, who has written several books on the craft of writing fiction, says that some of her work was included in the training dataset that Adobe used.
The Lyon case, which was first reported by Reuters, says her books were included in a modified version of the dataset that underpins Adobe's model: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3)," the lawsuit says. "Therefore, because it copies books from the RedPajama dataset, SlimPajama contains the Books3 dataset, including the works of the Plaintiff and Class members."
"Books3," a collection of roughly 191,000 books that has been used to train genAI systems, has been a recurring source of legal trouble for the tech industry. RedPajama has also been cited in several lawsuits. In September, a case against Apple alleged that the company used copyrighted material to train its Apple Intelligence models. That lawsuit accused the technology company of copying copyrighted works "without permission and without credit or compensation." In October, a similar case against Salesforce also claimed the company used RedPajama for training.
Unfortunately for the tech industry, cases like this have become all too common. AI models are trained on large datasets and, in some cases, those datasets are said to include pirated material. In September, Anthropic agreed to pay $1.5 billion to a group of authors who accused it of using pirated books to train its chatbot, Claude. The settlement was seen as a turning point in the ongoing litigation over copyright in AI training data, of which there is plenty.