AEGIS is a large-scale dataset and benchmark for detecting errors in Multi-Agent Systems (MAS). It provides systematically generated failure scenarios with verifiable ground-truth labels across ...