Martina G. Vilas

Computer Science PhD student

CVAI Lab, Goethe University Frankfurt

Ernst Strüngmann Institute for Neuroscience

About Me

I’m a computer science PhD student at Goethe University (Frankfurt am Main, Germany), with a background in Cognitive Neuroscience. I work under the supervision of Prof. Gemma Roig and I’m also part of the Ernst Strüngmann Institute for Neuroscience (in Cooperation with Max Planck Society).

My current research focuses on studying how AI systems abstract semantic knowledge from unimodal and multimodal sources of information. I’m generally interested in the development of tools for reverse engineering the cognitive capacities of deep learning models.

Interests

Artificial Intelligence
Mechanistic Interpretability
Explainability
Multimodal Learning
Computer Vision
NLP

Publications

(selected)

Martina G. Vilas, Federico Adolfi, David Poeppel, Gemma Roig

March, 2024 In ICML

Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience

Inner Interpretability is a promising emerging field tasked with uncovering the inner mechanisms of AI systems, though how to develop these mechanistic theories is still much debated. Moreover, recent critiques raise issues that question its usefulness to advance the broader goals of AI. However, it has been overlooked that these issues resemble those that have been grappled with in another field, Cognitive Neuroscience. Here we draw the relevant connections and highlight lessons that can be transferred productively between fields. Based on these, we propose a general conceptual framework and give concrete methodological strategies for building mechanistic explanations in AI inner interpretability research. With this conceptual framework, Inner Interpretability can fend off critiques and position itself on a productive path to explain AI systems.

Martina G. Vilas, Timothy Schaumlöffel, Gemma Roig

September, 2023 In NeurIPS

Analyzing vision transformers for image classification in class embedding space

Despite the growing use of transformer models in computer vision, a mechanistic understanding of these networks is still needed. This work introduces a method to reverse-engineer Vision Transformers trained to solve image classification tasks. Inspired by previous research in NLP, we demonstrate how the inner representations at any level of the hierarchy can be projected onto the learned class embedding space to uncover how these networks build categorical representations for their predictions. We use our framework to show how image tokens develop class-specific representations that depend on attention mechanisms and contextual information, and give insights on how self-attention and MLP layers differentially contribute to this categorical composition. We additionally demonstrate that this method (1) can be used to determine the parts of an image that would be important for detecting the class of interest, and (2) exhibits significant advantages over traditional linear probing approaches. Taken together, our results position our proposed framework as a powerful tool for mechanistic interpretability and explainability research.

Domenic Bersch, Kshitij Dwivedi, Martina G. Vilas, Radoslaw M. Cichy, Gemma Roig

August, 2022 In CCN

Net2Brain: A toolbox to compare artificial vision models with human brain responses

We introduce Net2Brain, a graphical and command-line user interface toolbox for comparing the representational spaces of artificial deep neural networks (DNNs) and human brain recordings. While different toolboxes facilitate only single functionalities or only focus on a small subset of supervised image classification models, Net2Brain allows the extraction of activations of more than 600 DNNs trained to perform a diverse range of vision-related tasks (e.g semantic segmentation, depth estimation, action recognition, etc.), over both image and video datasets. The toolbox computes the representational dissimilarity matrices (RDMs) over those activations and compares them to brain recordings using representational similarity analysis (RSA), weighted RSA, both in specific ROIs and with searchlight search. In addition, it is possible to add a new data set of stimuli and brain recordings to the toolbox for evaluation. We demonstrate the functionality and advantages of Net2Brain with an example showcasing how it can be used to test hypotheses of cognitive computational neuroscience.

Martina G. Vilas, Ryszard Auksztulewicz, Lucia Melloni

August, 2021 Review of Philosophy and Psychology, 13(4)

Active inference as a computational framework for consciousness

Recently, the mechanistic framework of active inference has been put forward as a principled foundation to develop an overarching theory of consciousness which would help address conceptual disparities in the field (Wiese 2018; Hohwy and Seth 2020). For that promise to bear out, we argue that current proposals resting on the active inference scheme need refinement to become a process theory of consciousness. One way of improving a theory in mechanistic terms is to use formalisms such as computational models that implement, attune and validate the conceptual notions put forward. Here, we examine how computational modelling approaches have been used to refine the theoretical proposals linking active inference and consciousness, with a focus on the extent and success to which they have been developed to accommodate different facets of consciousness and experimental paradigms, as well as how simulations and empirical data have been used to test and improve these computational models. While current attempts using this approach have shown promising results, we argue they remain preliminary in nature. To refine their predictive and structural validity, testing those models against empirical data is needed i.e., new and unobserved neural data. A remaining challenge for active inference to become a theory of consciousness is to generalize the model to accommodate the broad range of consciousness explananda; and in particular to account for the phenomenological aspects of experience. Notwithstanding these gaps, this approach has proven to be a valuable avenue for theory advancement and holds great potential for future research.