- The News Media Alliance (NMA) alleges that AI developers are illegally scraping copyrighted news content.
- The NMA says AI training datasets disproportionately rely on publisher materials, which can lead to copyright infringement.
- Generative AI developers have been scraping publisher content without consent for model training.
An industry trade group alleges that artificial intelligence developers rely heavily on copyrighted news content, scraped illegally and without permission, to train their models.
In a new 77-page white paper, the News Media Alliance (NMA) states AI datasets disproportionately use publisher materials versus other sources. As a result, AI outputs can infringe on news copyrights and compete directly with outlets.
“Many generative AI developers have chosen to scrape publisher content without permission and use it for model training and in real-time to create competing products,” the NMA wrote.
The group argues that while publishers take the risks and make the investments, AI developers reap the rewards in users, data, branding, and ads. As a result, publishers face declining revenues, job losses, and eroding audience trust, the NMA says.
NMA urges U.S. Copyright Office to issue declaration
To combat this, the NMA urged the U.S. Copyright Office to declare that monetizing AI with scraped news content harms publishers. It recommends licensing models and transparency rules to curb the ingestion of copyrighted materials.
The NMA acknowledged AI’s benefits but said its training methods have sparked lawsuits, including recent cases against Google and OpenAI over alleged copyright violations.
Google has said it will assume the legal risk if customers face allegations over its AI offerings. However, its Bard chatbot is not covered by that protection.
As AI proliferates, the battle around copyrighted training data will likely intensify. News outlets argue their investments deserve compensation if AI directly copies and profits from their work. But implementing copyright measures while enabling AI innovation remains a complex balancing act.