  • A dataset used to train AI image generators contains pictures of child sexual abuse, a study found.
  • The findings add to fears that AI tools could spark a wave of AI-generated child sexual abuse content.
  • The LAION database has now been taken offline while its creators remove the images. 

A dataset used to train AI image generators contains thousands of suspected images of child sexual abuse, according to a new report.

An investigation by the Stanford University Cyber Policy Center found that the LAION-5B database, which has been used to train several AI image generators, including Stable Diffusion 1.5, an earlier version of an AI model by Stability AI, contained more than 3,200 suspected images of child sexual abuse.

Just over 1,000 of those images were confirmed to be child sexual abuse material. The report warned that their presence in the dataset could allow generative AI tools built on this data to be used to create new child abuse content.

LAION-5B is a massive public archive of around five billion images scraped from the open web.

It has been used by a variety of AI companies, which require huge amounts of data to train generative AI models that can produce new images in seconds.

Experts have long warned that AI image generators risk unleashing a tsunami of ultra-realistic AI-generated images of child sexual abuse, with the Internet Watch Foundation (IWF) warning that such images are already widely circulating on the dark web.

Online safety organizations in the UK, meanwhile, have called for "urgent action" over instances of children using AI image generators at school to create indecent content of their fellow pupils.

AI apps that use similar technology to "undress" women in photos by creating fake nude images of them are also surging in popularity, according to data from Graphika.

Stable Diffusion is the most well-known model to use the LAION database. The tool, which was developed by UK-based firm Stability AI, has become one of the most widespread and popular image generators since it launched in 2022.

A spokesperson for German non-profit LAION, behind the dataset, told Bloomberg that the group was temporarily removing its datasets from the internet while it scrubbed them of illegal content.

A Stability AI spokesperson, meanwhile, told Business Insider that the company has introduced filters to prevent users from creating illegal content with Stable Diffusion and that Stability AI's models were only trained on a filtered subset of LAION-5B.

Last year the company released Stable Diffusion 2.0, which was also trained on a subset of the LAION-5B data but has tighter controls on sexually explicit content. However, version 1.5, which does generate explicit content, is still available on the open web.

The spokesperson said Stable Diffusion 1.5 was released by AI video startup RunwayML, not Stability AI. Runway, however, told Bloomberg that the release was a collaboration between the two companies.

LAION-400M, a previous version of the 5B database which the Stanford researchers noted had been found to contain "pornographic imagery, racist slurs, and harmful social stereotypes," was used by Google to train an early version of its Imagen text-to-image generator. Google has since stopped using LAION datasets.

BI reached out to LAION for further comment but didn't immediately hear back.

December 22, 2023: This story has been updated to include the version number of the Stable Diffusion AI model referenced in the second paragraph.
