My name is Anastasia Psarou. I am a PhD student currently working as part of the COeXISTENCE team towards training machines to drive better than humans using Reinforcement Learning. I earned my master's degree in Electrical and Computer Engineering at the University of Thessaly in Greece, specializing in software development and artificial intelligence. Beyond my academic pursuits, I enjoy travelling and immersing myself in nature through activities like hiking.
I have a keen interest in software development, particularly exploring the integration of artificial intelligence into various aspects of our daily lives. My master's thesis focused on the task of "translating" videos featuring individuals using Greek sign language to facilitate understanding of its meaning.
List of main publications and preprints
-
Autonomous vehicles need social awareness to find optima in multi-agent reinforcement learning routing games
Psarou, Anastasia,
Gorczyca, Łukasz,
Gaweł, Dominik,
and Kucharski, Rafał
arXiv preprint arXiv:2510.11410
2025
Previous work has shown that when multiple selfish Autonomous Vehicles (AVs) are introduced to future cities and start learning optimal routing strategies using Multi-Agent Reinforcement Learning (MARL), they may destabilize traffic systems, as they would require a significant amount of time to converge to the optimal solution, equivalent to years of real-world commuting. We demonstrate that moving beyond the selfish component in the reward significantly relieves this issue. If each AV, apart from minimizing its own travel time, aims to reduce its impact on the system, this will be beneficial not only for the system-wide performance but also for each individual player in this routing game. By introducing an intrinsic reward signal based on the marginal cost matrix, we significantly reduce training time and achieve convergence more reliably. Marginal cost quantifies the impact of each individual action (route-choice) on the system (total travel time). Including it as one of the components of the reward can reduce the degree of non-stationarity by aligning agents' objectives. Notably, the proposed counterfactual formulation preserves the system's equilibria and avoids oscillations. Our experiments show that training MARL algorithms with our novel reward formulation enables the agents to converge to the optimal solution, whereas the baseline algorithms fail to do so. We show these effects in both a toy network and the real-world network of Saint-Arnoult. Our results optimistically indicate that social awareness (i.e., including marginal costs in routing decisions) improves both the system-wide and individual performance of future urban systems with AVs.
-
Autonomous Vehicles Using Multi-Agent Reinforcement Learning for Routing Decisions Can Harm Urban Traffic
Psarou, Anastasia,
Akman, Ahmet Onur,
Gorczyca, Łukasz,
Hoffmann, Michał,
Varga, Zoltán György,
Jamróz, Grzegorz,
and Kucharski, Rafał
arXiv preprint arXiv:2502.13188
2025
Autonomous vehicles (AVs) using Multi-Agent Reinforcement Learning (MARL) for simultaneous route optimization may destabilize traffic environments, with human drivers possibly experiencing longer travel times. We study this interaction by simulating human drivers and AVs. Our experiments with standard MARL algorithms reveal that, even in trivial cases, policies often fail to converge to an optimal solution or require long training periods. The problem is amplified by the fact that we cannot rely entirely on simulated training, as there are no accurate models of human routing behavior. At the same time, real-world training in cities risks destabilizing urban traffic systems, increasing externalities, such as CO2 emissions, and introducing non-stationarity as human drivers adapt unpredictably to AV behaviors. Centralization can improve convergence in some cases, however, it raises privacy concerns for the travelers' destination data. In this position paper, we argue that future research must prioritize realistic benchmarks, cautious deployment strategies, and tools for monitoring and regulating AV routing behaviors to ensure sustainable and equitable urban mobility systems.