The Copyright War Has Begun. It's The New York Times That Has Fired The First Shot At OpenAI, Microsoft

Therha

Editor

You are paying OpenAI to get ‘smart’ answers from ChatGPT, yet OpenAI paid nothing to The New York Times when “copying and using” millions of its articles to train ChatGPT in the first place.

The New York Times has sued ChatGPT creator OpenAI and Microsoft for copyright infringement. With usage of generative AI tools like ChatGPT regularly hitting new peaks, one big question remains unanswered: how were these tools created, or rather trained, in the first place?

Creators like OpenAI and investors like Microsoft have been unwilling to explain clearly how the content used to train large language models over the years was procured. But The New York Times claims to have found the tip of the iceberg. In its lawsuit, The Times says that OpenAI “copied” articles, reports, in-depth investigations, opinion pieces, reviews and how-to guides, among other content, to train the large language models (LLMs) powering ChatGPT and chatbots like Bing Chat, without any prior “permission or payment”.

ChatGPT is a revolutionary technology. No doubt about that, at least in 2023. Riding this hype, OpenAI was quick to roll out subscription plans and start taking money from users. Microsoft, as always, seized the market opportunity and invested $10 billion. If you care to read through the lawsuit, you may be open to an alternative thought: you are paying OpenAI to get ‘smart’ answers from ChatGPT, while OpenAI paid nothing to The New York Times when “copying and using” millions of its content pieces to train ChatGPT in the first place.

The lawsuit claims that OpenAI “engaged in wide scale copying” from many media organisations, not just The Times, but that content from the NYT was given “particular emphasis”. Interestingly, OpenAI has not contested this claim. In a way, it accepts it as fact: it had been trying to find “an amicable resolution”, which would presumably mean paying up or signing a commercial agreement with the NYT. However, the talks failed, and no deal has been reached yet.

While Microsoft has chosen to remain tight-lipped about the lawsuit, OpenAI spokeswoman Lindsey Held said the lawsuit had “surprised and disappointed” OpenAI and that the company had been “moving forward constructively” toward an agreement.

“We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from A.I. technology and new revenue models,” Held said in a statement. “We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.”

Put plainly, OpenAI used content for free to train its models and was quick to take subscription money from users. Now that questions are being asked, it says it wants to work out a new revenue-sharing model together with publishers. That may hold for big brands like The New York Times, but what about smaller media platforms? And how would anyone even know which content, from which source, OpenAI used to train its models? Like Google, OpenAI doesn’t own any of this content. It presents information sourced from others over the years, seemingly as its own.

For the media fraternity, this lawsuit is a reminder of how Google landed in trouble with Google News and why the service was pulled from countries like Spain, and of why Facebook has been forced to retreat from the news business.

Media companies invest heavily to build news networks and hire the right people. If technology companies simply copy that content and make money by presenting it on their own platforms, the already strained media industry may soon cease to exist.

In the lawsuit, The Times also said that Microsoft’s Bing search copies and categorizes its online content “to generate responses that contain verbatim excerpts and detailed summaries of Times articles that are significantly longer and more detailed than those returned by traditional search engines”.

It added that “using the valuable intellectual property of others in these ways without paying for it has been extremely lucrative for Defendants. Microsoft’s deployment of Times-trained LLMs throughout its product line helped boost its market capitalization by a trillion dollars in the past year alone. And OpenAI’s release of ChatGPT has driven its valuation to as high as $90 billion.”

Another interesting aspect of this lawsuit is that The Times does not demand a specific amount of money. It simply says that OpenAI and Microsoft “should be held responsible for billions of dollars in statutory and actual damages” for copyright infringement.

While OpenAI’s copying of content to train ChatGPT may seem to concern content published in the past, the lawsuit also raises a question about the future. If readers get news and information from AI chatbots that are essentially rehashing content from online media platforms, why would they bother clicking on links to visit the original source? That points to a loss of visitors and, ultimately, a loss of ad revenue.