Agents. Benchmarks. Systems. Repeat.
Benchmarking How Well Agent Skills Work Across Diverse Tasks
A comprehensive benchmark that evaluates agent skills across heterogeneous task distributions and shows why curated skill composition matters.
Xiangyi Li et al. (2026)