Mimesis being a central problem for the arts, cultural production, and epistemics, it appears necessary to address it in light of large vision-language models (e.g., diffusion models, or contrastive-learning-based models such as CLIP or ALIGN), especially when applied to image and text generation. In this article, we inquire into the modes in which such models operate and discuss to what extent they can be said to engage in processes of projective imagination. We propose a computational pipeline for investigating the cultural landscape of a city through the eyes of a machine, and for questioning the modes of embedding culture in machine learning models. Using Rome as a pivotal case study, we examine the visual features and textual properties extracted by these models. More specifically, we feed 360° equirectangular panoramic images into OpenAI's CLIP and Stable Diffusion, and analyze how mainstream culture might be captured and expressed in these models through their outputs. In this machine-triggered urban experiment, we investigate overlaps between history and machinic interpretation, and whether relevant temporal correlations can be captured through generic urban images alone. Furthermore, we discuss the process in light of early modern theories of imagination, particularly those of Marsilio Ficino and Giordano Bruno, and articulate whether or not these new models propose a new paradigm. As a way of contextualizing this approach within the analysis of the cultural relevance of recent multimodal machine learning models, we consider whether these models can be said to constitute acts of societal aesthetic appraisal.
Presentation by Darío Negueruela del Castillo and Iacopo Neri at the Symposium From Hype to Reality: Artificial Intelligence in the Study of Art and Culture