AEGIS is a large-scale dataset and benchmark for detecting errors in Multi-Agent Systems (MAS). It provides systematically generated failure scenarios with verifiable ground-truth labels across ...