Reddit’s Strategic Play Lands it a $60M Content Licensing Deal Before IPO

Almost a year after Reddit made difficult changes to monetize its data at the expense of losing some important users, the company has reportedly secured a $60 million deal. Content licensing deals for AI training are thus the new business model on the block. Such deals can offer a legal high ground and allow companies like Reddit to reduce their advertising dependency, much to the delight of users.

February 21, 2024

  • Almost a year after Reddit made difficult changes to monetize its data at the expense of losing some of its important users, the company has reportedly secured a $60 million deal, which is 7.4% of its 2023 revenue.
  • Reddit’s agreement follows OpenAI’s deals with the Associated Press and Axel Springer to use their articles as training content for its large language models.
  • While the AI company Reddit is in business with remains unnamed, content licensing deals for AI training are thus the new business model on the block.

Reddit has reportedly bagged a deal worth $60 million for its data. According to Bloomberg, Reddit signed the deal with an unnamed AI company, which will use content from the platform to train artificial intelligence (AI) models.

Reddit, which generated $810 million in revenue in 2023 (a year-on-year increase of 20%), is expected to head for its initial public offering (IPO) in March 2024. The online social forum should be keen to bank on the opportune timing of the lucrative deal, especially after going against its community by instituting changes to its API pricing policy in July 2023, which caused much furor among users and third-party developers.

Reddit is one of several social media platforms, data aggregators, and content-based services that contend they should receive appropriate compensation for data that, until recently, was freely available for AI companies to train data-intensive large language models.

Per Originality.ai data, 63.4% of the top 1,000 websites have blocked web crawlers from OpenAI (GPTBot), Google-Extended, the Common Crawl Bot, and anthropic-ai.

Percentage of the Top 1,000 Websites Blocking AI Web Crawlers

Source: Originality.ai

As a result, OpenAI previously struck deals with the Associated Press and Axel Springer to use articles and other content as fodder for its AI training. Others, like The New York Times, CNN, BBC, Daily Mail, WebMD, Washington Post, USA Today, LA Times, CBS News, Investopedia, Stack Overflow, Bloomberg, and NY Post, continue to block OpenAI’s crawler.

Reuters allows GPTBot but has blocked CCBot, Google-Extended, and anthropic-ai.
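For readers who want to check this kind of per-crawler policy themselves, here is a minimal Python sketch. It assumes the blocking is expressed in a site's robots.txt file, which is the usual mechanism for these crawler tokens but is not something the figures above spell out. The sample robots.txt content and the `blocked_crawlers` helper are hypothetical illustrations, not any real site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the article.
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai"]

# Hypothetical robots.txt (an assumption for illustration only): it blocks
# three of the four crawlers and leaves everything else, including GPTBot,
# allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: *
Allow: /
"""


def blocked_crawlers(robots_txt, path="/"):
    """Return {crawler_name: True if the crawler may NOT fetch `path`}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: not parser.can_fetch(bot, path) for bot in AI_CRAWLERS}


if __name__ == "__main__":
    for bot, blocked in blocked_crawlers(SAMPLE_ROBOTS_TXT).items():
        print(f"{bot}: {'blocked' if blocked else 'allowed'}")
```

To inspect a live site instead, the same parser can be pointed at a real robots.txt with `set_url()` followed by `read()`. Note that robots.txt is a voluntary convention, so a result of "blocked" only describes the site's stated preference.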

Content licensing deals for AI training are thus the new business model on the block. However, how AI companies and content and data providers address any privacy and copyright issues that may arise remains to be seen. For instance, Reddit opening up its platform for AI training may not be illegal per se, but what if a model trained on it spews out an answer that replicates or misappropriates the proprietary information of an author who posted it on Reddit?

Content attribution becomes more pertinent to discussion forums like Reddit or social media sites like Facebook because data ownership there is a gray area, unlike on media publishers’ websites, where data ownership is usually clear. A possible workaround to this problem is limiting which subreddits the AI models can use for training.

However, data deals offer a legal high ground, given that regulations on training data scraped from the web for large language and other models are largely undefined or nonexistent, which has been a cause of several lawsuits in the past year. They can also allow companies like Reddit to reduce their advertising dependency, much to the delight of users.

“The Reddit corpus of data is really valuable, but we don’t need to give all of that value to some of the largest companies in the world for free,” Reddit CEO Steve Huffman said in April 2023.

Will training data deals become the norm moving forward? Share with us on LinkedIn, X, or Facebook. We’d love to hear from you!

Image source: Shutterstock

Sumeet Wadhwani

Asst. Editor, Spiceworks Ziff Davis

An earnest copywriter at heart, Sumeet is what you'd call a jack of all trades, rather techs. A self-proclaimed 'half-engineer', he dropped out of Computer Engineering to answer his creative calling pertaining to all things digital. He now writes what techies engineer. As a technology editor and writer for News and Feature articles on Spiceworks (formerly Toolbox), Sumeet covers a broad range of topics from cybersecurity, cloud, AI, emerging tech innovation, hardware, semiconductors, et al. Sumeet compounds his geopolitical interests with cartophilia and antiquarianism, not to mention the economics of current world affairs. He bleeds Blue for Chelsea and Team India! To share quotes or your inputs for stories, please get in touch on sumeet_wadhwani@swzd.com