The recent hype around text-to-image models (Stable Diffusion, Dall-E, Midjourney) and earlier style-transfer methods (Pix2Pix) has opened up the possibility of including synthetic images in art-history datasets and using them to train deep learning models. Such an application is not unproblematic, as it risks spurious contamination of the dataset, yet it is useful, partially mitigating the common problem of data scarcity. In this respect, I will ask whether synthetic images can be fruitfully adopted in art-historical datasets, discussing the case study of art authentication. In our research, we introduced data generated by GANs and Stable Diffusion 'in the style of' the artist to be authenticated. These images were added to the set contrasting the artist's authentic works, a set defined as anything that is not authentic; as such, it quite naturally permits the presence of non-real data. Even in this case, however, the synthetic data could unwittingly introduce features not present in reality: the usefulness and the risks fall into a grey area. To what point can synthetic data fecundly extend and augment the information in our dataset? What knowledge could algorithms acquire from such synthetic data? Would it raise questions about the integrity and contamination of the dataset? Can this study generalize to a wide range of other applications and fields?
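The dataset construction described above, with the artist's authentic works as one class and a contrast class of everything non-authentic extended by synthetic 'in the style of' images, can be sketched as follows. This is a minimal illustration only: the file names, labelling convention (1 = authentic, 0 = contrast), and helper function are hypothetical and not taken from the research itself.

```python
def build_authentication_dataset(authentic, contrast_real, contrast_synthetic):
    """Assemble labelled samples for binary art authentication.

    Class 1: authentic works by the artist.
    Class 0: the contrast set -- anything that is not authentic,
    here real works by other hands plus synthetic images
    generated 'in the style of' the artist.
    """
    samples = [(path, 1) for path in authentic]
    samples += [(path, 0) for path in contrast_real]
    # Synthetic images join the contrast class alongside real negatives.
    samples += [(path, 0) for path in contrast_synthetic]
    return samples

# Hypothetical file lists for illustration.
authentic = ["vangogh_001.jpg", "vangogh_002.jpg"]
contrast_real = ["imitator_001.jpg"]
contrast_synthetic = ["sd_vangogh_style_001.png"]  # e.g. a Stable Diffusion output

dataset = build_authentication_dataset(authentic, contrast_real, contrast_synthetic)
```

The conceptual point the talk raises sits exactly in the last line of the builder: because the contrast class is defined negatively, synthetic images fit into it naturally, but any features peculiar to the generator (rather than to inauthenticity) are silently learned along with them.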
Presentation by Ludovica Schaerf at the symposium 'From Hype to Reality: Artificial Intelligence in the Study of Art and Culture'.