HELLFIRE
MVM
join:2009-11-25

3 recommendations

It's possible to extract copies of images used to train generative AI models

»www.theregister.com/2023 ··· extract/
»arxiv.org/abs/2301.13188
quote:
Generative AI models can memorize images from their training data, possibly allowing users to extract private copyrighted data, according to research. ... Now research led by researchers working at Google, DeepMind, the University of California, Berkeley, ETH Zurich, and Princeton University demonstrates that images used to train these models can be extracted. Generative AI models memorize images and can generate precise copies of them, raising new copyright and privacy concerns.

"In a real attack, where an adversary wants to extract private information, they would guess the label or caption that was used for an image," co-authors of the study told The Register. "Fortunately for the attacker, our method can sometimes work even if the guess is not perfect. For example, we can extract the portrait of Ann Graham Lotz by just prompting Stable Diffusion with her name, instead of the full caption from the training set ('Living in the light with Ann Graham Lotz')."

Only images memorized by the model can be extracted, and how much a model memorizes varies with factors like its training data and size. Copies of the same image are more likely to be memorized, and models containing more parameters are more likely to be able to remember images too.

The team was able to extract 94 images from 350,000 examples used to train Stable Diffusion, and 23 images from 1,000 examples from Google's Imagen model. For comparison, Stable Diffusion has 890 million parameters and was trained on 160 million images, while Imagen has two billion parameters – it's not clear how many images were used to train it exactly.
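The attack the quote describes boils down to: prompt the model with a (guessed) training caption, generate many samples, and flag any image that a large cluster of independent generations collapses onto, since that suggests memorization rather than fresh synthesis. Here is a minimal toy sketch of that idea in Python. The function names (`extract_candidates`, `toy_model`) and the stand-in "model" are hypothetical, images are flattened pixel lists rather than real diffusion outputs, and the paper's actual pipeline uses far more samples and a more robust similarity measure:

```python
import random

def l2(a, b):
    """Mean squared distance between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def extract_candidates(generate, prompt, n_samples=50,
                       dist_thresh=0.01, min_clique=10):
    """Sample the model repeatedly with one caption and flag a near-duplicate
    cluster: if many independent generations land on (almost) the same image,
    that image was plausibly memorized from the training set."""
    samples = [generate(prompt) for _ in range(n_samples)]
    for i, img in enumerate(samples):
        # Count how many other generations fall within the distance threshold.
        clique = sum(1 for j, other in enumerate(samples)
                     if j != i and l2(img, other) < dist_thresh)
        if clique >= min_clique:
            return [img]  # one representative of the memorized cluster
    return []

# Toy stand-in for a diffusion model: for its training caption it returns a
# "memorized" image with high probability, otherwise fresh noise.
MEMORIZED = [0.5] * 16
_rng = random.Random(0)

def toy_model(prompt):
    if prompt == "Ann Graham Lotz" and _rng.random() < 0.6:
        return list(MEMORIZED)
    return [_rng.random() for _ in range(16)]
```

With the memorized caption, roughly 60% of the 50 generations are identical copies, so the clique test fires; with an unrelated caption, every sample is independent noise and nothing is flagged. Real attacks replace the raw pixel distance with an embedding-based similarity, but the clustering logic is the same.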
Take what you will from this. I'm filing this under "is this a bug or a feature?"

Regards