Stefano Sarao Mannelli

I am a tenure-track Assistant Professor in the Data Science and AI division of the Computer Science department at Chalmers University of Technology. My research interests lie in building a fundamental understanding of learning in ML systems, with a particular focus on bias generation and amplification.

Research Directions

  • Bias generation and amplification in ML systems. What is behind biases in an ML system? What is the impact of our design choices?
  • Learning differences in biological and artificial neural networks. Curriculum learning, continual learning, transfer learning: the same concepts produce completely different results in animals and in machines. Why?
  • Optimisation in rough landscapes. Connecting dynamics and landscape properties in optimisation.

Keen to know more? Check out the research page or read my most recent publications.

Latest News

Sept 2024
First day at Chalmers as Assistant Professor!
Aug 2024
Today starts the new edition of Analytical Connectionism at the Flatiron Institute in NYC. We have a great set of speakers, TAs and project mentors! Two exciting weeks ahead!
Jul 2024
Our paper on transfer learning, probing the dependence of the best source representation on data abundance and similarity, has been accepted to TMLR! Congrats everybody!
Jul 2024
I am excited to announce two available positions: one for a PhD student and one for a Postdoc. Click the links to learn more and apply!
Jun 2024
I am delighted that our paper on cognitive fatigue has been accepted to CogSci 2024 as an oral contribution!
May 2024
Last week to apply for the 2nd edition of Analytical Connectionism! The application deadline is Friday the 17th.
May 2024
I'll soon open two positions, one for a Postdoc and one for a PhD student. Stay tuned!
May 2024
ICML decisions are out and I got two papers accepted!
Apr 2024
I am teaching a short lecture series on Statistical Physics of Learning at Wits University.

You can find all the news here.

Recent publications

  • A meta-learning framework for rationalizing cognitive fatigue in neural systems

    Yujun Li, Rodrigo Carrasco-Davis, Younes Strittmatter, Stefano Sarao Mannelli, Sebastian Musslick
    The ability to exert cognitive control is central to human brain function, facilitating goal-directed task performance. However, humans exhibit limitations in the duration over which they can exert cognitive control, a phenomenon referred to as cognitive fatigue. This study explores a computational rationale for cognitive fatigue in continual learning scenarios: cognitive fatigue serves to limit the extended performance of one task to avoid the forgetting of previously learned tasks. Our study employs a meta-learning framework, wherein cognitive control is optimally allocated to balance immediate task performance with forgetting of other tasks. We demonstrate that this model replicates common patterns of... [Read Article]
  • Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training

    Anchit Jain, Rozhin Nobahari, Aristide Baratin, Stefano Sarao Mannelli
    Machine learning systems often acquire biases by leveraging undesired features in the data, impacting accuracy variably across different sub-populations. Current understanding of bias formation mostly focuses on the initial and final stages of learning, leaving a gap in knowledge regarding the transient dynamics. To address this gap, this paper explores the evolution of bias in a teacher-student setup modeling different data sub-populations with a Gaussian-mixture model. We provide an analytical description of the stochastic gradient descent dynamics of a linear classifier in this setting, which we prove to be exact in high dimension. Notably, our analysis reveals how different properties... [Read Article]
  • Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks

    Stefano Sarao Mannelli, Yaraslau Ivashinka, Andrew Saxe, Luca Saglietti
    A wide range of empirical and theoretical works have shown that overparameterisation can amplify the performance of neural networks. According to the lottery ticket hypothesis, overparameterised networks have an increased chance of containing a sub-network that is well-initialised to solve the task at hand. A more parsimonious approach, inspired by animal learning, consists in guiding the learner towards solving the task by curating the order of the examples, i.e. providing a curriculum. However, this learning strategy seems to be hardly beneficial in deep learning applications. In this work, we propose an analytical study that connects curriculum learning and overparameterisation. In... [Read Article]
  • Why Do Animals Need Shaping? A Theory of Task Composition and Curriculum Learning

    Jin Hwa Lee, Stefano Sarao Mannelli, Andrew Saxe
    Diverse studies in systems neuroscience begin with extended periods of training known as 'shaping' procedures. These involve progressively studying component parts of more complex tasks, and can make the difference between learning a task quickly, slowly or not at all. Despite the importance of shaping to the acquisition of complex tasks, there is as yet no theory that can help guide the design of shaping procedures, or more fundamentally, provide insight into its key role in learning. Modern deep reinforcement learning systems might implicitly learn compositional primitives within their multilayer policy networks. Inspired by these models, we propose and analyse... [Read Article]
  • The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions

    Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt, Andrew Saxe
    Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional model of RL that can capture a variety of learning protocols, and derive its typical dynamics as a set of closed-form ordinary differential equations (ODEs). We derive optimal schedules for the learning rates... [Read Article]
  • Explore other publications here.