Tumblr is selling user data to train AI. Things could get weird.

Wed, 02/28/2024 - 05:00

Tech Insider

Tumblr's parent company is making a deal with OpenAI and Midjourney to train AI on Tumblr posts.
There will be an opt-out option for users who don't want their content being used for training.
The upside? OpenAI will know a lot more about what might happen if Sirius Black and Remus Lupin were a couple.

404 Media reported that Automattic, the company that owns WordPress and Tumblr, is making a deal to provide data from its sites to help train OpenAI's and Midjourney's AI models.

After the 404 Media article ran, I asked Automattic for comment, and a representative pointed me to a public blog post. The blog post says that Automattic's sites currently block AI crawlers, but that when the company starts sharing data with AI companies, it will offer users a way to opt out.

"We are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control," the blog post says. "Our partnerships will respect all opt-out settings."

404 Media's report included internal Automattic employee messages describing how engineers were tasked with compiling posts from 2014 to 2023 but made some mistakes along the way. The employees included posts from deleted or suspended blogs, private posts on public blogs, and private answers from the "Ask" function, the report said.

Most notably, they also included content marked NSFW or "mature," even though that content was supposed to be excluded. Tumblr banned pornography and nudity in 2018, but in 2022 it loosened those rules to allow nudity (but still not sexually explicit images). It's worth reading 404's story on what Automattic is or isn't doing about these apparent errors.

ChatGPT will be introduced to fanfic

Meanwhile, anyone who has spent any time on Tumblr knows that there is a beautiful cornucopia of weird and niche stuff — especially among fandoms. So now ChatGPT will be able to write even better Fawnlock fanfic. (Yes, that's a version of Sherlock Holmes fanfiction where Sherlock and Watson are part deer.) Progress?

Tumblr is not the only social platform making deals like this. Reddit has a $60 million-a-year deal to license its data to Google for training Google's AI. Facebook and Instagram, of course, are already using data for Meta's own internal AI tools.

This can be controversial for some users, who feel uncomfortable about their content — on Tumblr, this is often personal writing or photography or art — being used to train AI.

Business Insider, through its parent company, also has a deal with OpenAI to use our news coverage in training AI. But that's a little different — I'm getting paid to write this, after all.

When platforms with user-generated content are selling that content to train AI, it feels, well, understandably weird.

I suppose one upside of this is knowing that Midjourney is going to be exposed to a lot more drawings of Sonic and Tails kissing.

Read the original article on Business Insider

