The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios. We find that, contrary to previous beliefs, it is neither necessary nor convenient to use a large domain-specific dataset for training. On the contrary, by using only a handful of example images from a single generative model, a CLIP-based detector exhibits surprising generalization ability and high robustness across different architectures, including recent commercial tools such as Dalle-3, Midjourney v5, and Firefly. We match the state-of-the-art (SoTA) on in-distribution data and significantly improve upon it in terms of generalization to out-of-distribution data (+6% AUC) and robustness to impaired/laundered data (+13%). Our project is available at https://grip-unina.github.io/ClipBased-SyntheticImageDetection/

Raising the Bar of AI-generated Image Detection with CLIP / Cozzolino, Davide; Poggi, Giovanni; Corvi, Riccardo; Nießner, Matthias; Verdoliva, Luisa. - (2024), pp. 4356-4366. ( 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024 usa 2024) [10.1109/cvprw63382.2024.00439].

Raising the Bar of AI-generated Image Detection with CLIP

Cozzolino, Davide
Primo
;
Poggi, Giovanni
Secondo
;
Corvi, Riccardo;Verdoliva, Luisa
Ultimo
2024

Abstract

The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios. We find that, contrary to previous beliefs, it is neither necessary nor convenient to use a large domain-specific dataset for training. On the contrary, by using only a handful of example images from a single generative model, a CLIP-based detector exhibits surprising generalization ability and high robustness across different architectures, including recent commercial tools such as Dalle-3, Midjourney v5, and Firefly. We match the state-of-the-art (SoTA) on in-distribution data and significantly improve upon it in terms of generalization to out-of-distribution data (+6% AUC) and robustness to impaired/laundered data (+13%). Our project is available at https://grip-unina.github.io/ClipBased-SyntheticImageDetection/
2024
Raising the Bar of AI-generated Image Detection with CLIP / Cozzolino, Davide; Poggi, Giovanni; Corvi, Riccardo; Nießner, Matthias; Verdoliva, Luisa. - (2024), pp. 4356-4366. ( 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024 usa 2024) [10.1109/cvprw63382.2024.00439].
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11588/1004276
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 43
  • ???jsp.display-item.citation.isi??? 19
social impact