More Skills, Worse Results? The Hidden Physics of Agent Skill Libraries
We systematically investigate how agent skill libraries behave at scale by testing 11 leading LLMs across 989 real-world skills with over 3 million API calls. We find that routing accuracy degrades logarithmically with library size ($R^2 > 0.97$) and that this degradation compounds exponentially across multi-step pipelines. Routing errors are not random: 86% stay within the correct functional cluster, concentrating in a dangerous similarity band of [0.55, 0.75). We identify a "black hole" effect in which abstract skills hijack 20–35% of routing under ambiguous prompts, yet are completely neutralized by concrete operational anchors. Crucially, routing and execution obey different physics: routing is memoryless, while successful execution rescues downstream steps by ~4x through state concretization. These findings yield actionable design principles that determine whether an agent gracefully handles 500 skills or collapses at 50: skill descriptions that are distinguishable and synergistic (description quality accounts for a 67% accuracy gap), prompt specificity, and pipeline coupling structure.
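The two scaling claims above (logarithmic decay of single-step routing accuracy, exponential compounding over pipeline steps) can be illustrated with a minimal sketch. The functional form `a - b * ln(N)` and the coefficients `a` and `b` are placeholder assumptions chosen for illustration, not fitted values from this study, and the function names are hypothetical. Treating pipeline steps as independent follows from the statement that routing is memoryless.

```python
import math


def routing_accuracy(n_skills: int, a: float = 1.0, b: float = 0.08) -> float:
    """Single-step routing accuracy under an assumed log-linear model.

    accuracy = a - b * ln(n_skills), clipped to [0, 1].
    `a` and `b` are illustrative placeholders, not values reported in the paper.
    """
    if n_skills < 1:
        raise ValueError("n_skills must be at least 1")
    return max(0.0, min(1.0, a - b * math.log(n_skills)))


def pipeline_success(n_skills: int, n_steps: int,
                     a: float = 1.0, b: float = 0.08) -> float:
    """Probability that every routing decision in an n_steps pipeline is correct.

    Assumes each step is routed independently (memoryless routing), so the
    per-step accuracy is simply raised to the power n_steps.
    """
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return routing_accuracy(n_skills, a, b) ** n_steps


if __name__ == "__main__":
    # Compare a small and a large library across pipeline lengths.
    for n_skills in (50, 500):
        for n_steps in (1, 3, 5):
            p = pipeline_success(n_skills, n_steps)
            print(f"skills={n_skills:4d} steps={n_steps} success={p:.3f}")
```

Under this toy model, a modest per-step loss from a larger library turns into a much larger end-to-end loss as the pipeline gets longer. The sketch covers routing only; it does not model the execution-side rescue effect described above.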