How many skills can a Claude Code or Cursor library hold before discovery starts to break?

Empirical evidence puts the practical ceiling between 50 and 95 skills. A 95-skill library reports about 2,007 tokens per turn in available_skills metadata, and routing accuracy with metadata-only matching is roughly 31 to 44 percentage points lower than with full-body retrieval. The exact number depends on how distinguishable your skill descriptions are; libraries with overlapping verbs degrade earlier.

What is the discovery ceiling in AI agent skill libraries?

The discovery ceiling is the point at which adding more skills makes existing skills harder for the agent to find and invoke. It's a property of the token budget allocated to skill listings combined with the signal-to-noise ratio among one-line descriptions. Past the ceiling, more skills means worse hit rates on the ones you already shipped.

Why does my AI agent ignore skills I added recently?

Most commonly because skill listings exceed the per-turn budget and the model never sees the new skill's metadata. In Claude Code the undocumented skillListingBudgetFraction setting silently truncates listings as the conversation grows; skills at the bottom of the listing order disappear from context first. A second cause is description overlap with an existing skill, which makes neither one clearly the right match.

What lives below the SKILL.md layer in a production AI agent?

Roughly 85 percent of what makes an agent work in production isn't in any SKILL.md. It lives in schema migration history, incident postmortems, runtime telemetry baselines, permission graphs, and the accumulated knowledge of third-party API quirks. Skills delegate to that layer; they cannot contain it.

How do I decide which skills to retire?

Three signals matter: skills that haven't fired in 30 days, skills whose descriptions overlap a sibling skill by more than 60 percent of salient verbs, and skills classified by static analysis as dead, bloated, conflicting, stale, or cyclic. Quarterly retirement audits keep the survivors distinguishable enough for the agent to route correctly.
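
The first two signals can be approximated from data you may already have. The sketch below assumes a hypothetical record per skill holding a last-fired date and a hand-curated set of salient verbs; the class and field names are invented for illustration, and the 30-day and 60 percent thresholds are the ones named above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class SkillRecord:
    """Hypothetical per-skill record; not a format any tool emits."""
    name: str
    last_fired: Optional[date]   # None means the skill was never invoked
    verbs: FrozenSet[str]        # salient verbs taken from the description


def verb_overlap(skill: SkillRecord, sibling: SkillRecord) -> float:
    """Fraction of this skill's verbs that a sibling also claims."""
    if not skill.verbs:
        return 0.0
    return len(skill.verbs & sibling.verbs) / len(skill.verbs)


def retirement_candidates(skills: List[SkillRecord], today: date,
                          idle_days: int = 30,
                          overlap_limit: float = 0.6) -> List[str]:
    """Names of skills that are idle or that crowd a sibling's verbs."""
    cutoff = today - timedelta(days=idle_days)
    flagged = []
    for skill in skills:
        idle = skill.last_fired is None or skill.last_fired < cutoff
        crowded = any(
            verb_overlap(skill, other) > overlap_limit
            for other in skills
            if other is not skill
        )
        if idle or crowded:
            flagged.append(skill.name)
    return flagged
```

Both sides of an overlapping pair get flagged on purpose: the code can say two skills collide, but which one keeps its slot is a judgment call.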

Your 50th Skill Makes the First 49 Less Reliable

May 27, 2026 · 4 min read · ai-skills, claude-code, ai-engineering, production-ai

You added a skill last Tuesday. The agent hasn’t called it once. Each new skill silently weakens the discovery odds of the ones you already have.

You assume it’s a description problem. It isn’t.

Everyone’s pushing toward 50 skills, or past it. Vercel ships a plugin with 40. Google’s gws brings 95. I do it too. My local registry is at 38.

Each one I added lowered the odds that the others get reached when I need them.

The Catalog Sits in Context Every Turn

The three-tier loading model promises that bodies stream in on demand, but the metadata (name plus one-line description) sits in the system prompt every turn, for every installed skill. The math is mechanical: Anthropic’s spec budgets roughly 30 to 50 tokens per skill plus 109 characters of overhead, and that adds up fast as the library grows.

A 95-skill library reports 2,007 tokens per turn in available_skills. At Opus pricing over a typical workday, that’s about three dollars per developer per day for listings the agent cannot distinguish anymore.
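
The arithmetic behind a figure like that can be sketched in a few lines. Only the 95-skill and 2,007-token numbers come from the text above; the turn count and the per-token price are illustrative assumptions, not values from any price list, and the function names are made up for this example.

```python
# Back-of-envelope cost of carrying skill listings in every turn.
# SKILLS and LISTING_TOKENS are the figures quoted in the post.
# TURNS_PER_DAY and USD_PER_MILLION_INPUT_TOKENS are assumed placeholders.

SKILLS = 95
LISTING_TOKENS = 2_007
TURNS_PER_DAY = 100                   # assumption: one developer's workday
USD_PER_MILLION_INPUT_TOKENS = 15.0   # assumption: substitute your model's rate


def tokens_per_skill(listing_tokens: int, skills: int) -> float:
    """Average listing tokens each installed skill costs per turn."""
    return listing_tokens / skills


def daily_listing_cost(listing_tokens: int, turns: int,
                       usd_per_million: float) -> float:
    """Dollars per day spent re-sending the listing, ignoring prompt caching."""
    return listing_tokens * turns * usd_per_million / 1_000_000


if __name__ == "__main__":
    per_skill = tokens_per_skill(LISTING_TOKENS, SKILLS)
    per_day = daily_listing_cost(
        LISTING_TOKENS, TURNS_PER_DAY, USD_PER_MILLION_INPUT_TOKENS
    )
    print(f"{per_skill:.1f} listing tokens per skill per turn")
    print(f"${per_day:.2f} per developer per day")
```

Under these assumed inputs the daily figure lands near three dollars. Prompt caching or a cheaper model would pull it down, which is one more reason the token bill is the smaller half of the problem.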

The dollars are not the issue. The signal-to-noise ratio is.

Routing Degrades With Catalog Size

A study published this month measured retrieval against an 80,000-skill catalog. Best-in-class routing hit 74% at top-1, which means roughly one task in four picks the wrong skill or no skill at all. Drop to metadata-only matching, which is the default at session start, and accuracy falls another 31 to 44 percentage points across every method tested.

There is no fix coming from better embeddings. The problem is that fifty good one-line descriptions stop being distinguishable from each other. Adding the 51st makes “publish a draft” and “ship a draft” and “deploy a post” harder to tell apart, not easier.
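
One rough way to see the crowding is to measure word overlap between descriptions. The sketch below uses Jaccard similarity on lowercase word sets, a stand-in chosen for illustration only; it is not how any agent actually routes, and the helper names are hypothetical.

```python
import re
from itertools import combinations


def words(description: str) -> frozenset:
    """Lowercase alphanumeric words in a one-line description."""
    return frozenset(re.findall(r"[a-z0-9]+", description.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by total distinct words across both descriptions."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


if __name__ == "__main__":
    descriptions = ["publish a draft", "ship a draft", "deploy a post"]
    for left, right in combinations(descriptions, 2):
        print(f"{left!r} vs {right!r}: {jaccard(left, right):.2f}")
    # 'publish a draft' vs 'ship a draft': 0.50
    # 'publish a draft' vs 'deploy a post': 0.20
    # 'ship a draft' vs 'deploy a post': 0.20
```

The lexical score understates the collision. "Deploy a post" shares only the word "a" with the other two, yet a reader, or a model, would treat all three as near-synonyms.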

The undocumented setting skillListingBudgetFraction in Claude Code makes this worse without warning. As remaining context shrinks during a long session, the absolute budget for listings shrinks proportionally. Skills at the bottom of the listing order get truncated. The model never sees them. It cannot invoke what it cannot see.
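
A toy model makes the failure mode concrete. This is not Claude Code's implementation: the function name, the proportional formula, and the token counts are all assumptions for illustration, and the real setting may behave differently.

```python
from typing import List, Tuple

Skill = Tuple[str, int]  # (name, listing tokens for name plus description)


def visible_skills(skills: List[Skill], remaining_context: int,
                   budget_fraction: float) -> List[str]:
    """Return the skills whose listings fit, in listing order.

    Hypothetical model: the listing budget is a fixed fraction of the
    remaining context, and the listing is cut off once the budget runs out.
    """
    budget = int(remaining_context * budget_fraction)
    shown: List[str] = []
    used = 0
    for name, cost in skills:
        if used + cost > budget:
            break  # this skill and everything after it is truncated
        shown.append(name)
        used += cost
    return shown


if __name__ == "__main__":
    # 95 skills at an assumed 21 listing tokens each.
    library = [(f"skill-{i:02d}", 21) for i in range(1, 96)]
    for remaining in (4_000, 2_000, 1_000):
        shown = visible_skills(library, remaining, budget_fraction=0.5)
        print(f"{remaining:>5} tokens left: {len(shown)} of {len(library)} listed")
```

In this model nothing about a skill's quality decides whether it survives. Position in the listing order does, so the most recently added skill is often the first to vanish.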

You added the skill. The agent does not know it exists.

The 85% Lives Below the Skill Layer

A reader who runs an 89,000-page programmatic SEO site mentioned in a thread that skills cover maybe 15% of what the system actually does. The other 85% he called “scar tissue code”: the conditions nobody writes until they get paged at 2 AM because a third-party API returned malformed JSON at month-end.

That 85% is real. It looks like this:

A redundant null check in payment validation because one vendor sends "N/A" for a specific contract type at month-end.
A Stripe webhook timeout set to 18 seconds because their p95 was 14 and you wanted headroom.
A retry policy on the embedding queue that backs off harder for one endpoint because its rate limiter returns 200 before timing out at 60 seconds.

None of that is in a SKILL.md. None of it can be. It lives in the schema migration history, the incident postmortems, the runtime telemetry baselines, the permission graphs, the third-party API quirk log that nobody calls a log. Skills delegate to that layer. They do not contain it.

The author who ships a thousand-skill library is not adding capability. They are adding a discovery problem the user solves at runtime.

The Metric Was Always Subtraction

The right engineering metric for an AI runtime is not skills added. It is lines of imperative code deleted per skill kept. Cursor’s team published numbers last month: 12,000 lines of TypeScript replaced by 200 lines of skill markdown. That is a 60x ratio. Code that no longer exists has zero bugs, zero CI cost, zero migration burden.

That ratio cuts both ways. Every skill that does not delete real code is overhead.

A static analyzer published in April classifies skill failures into five types: dead, bloated, conflicting (one says “use jq” and another says “never use jq”), stale, cyclic. Run it against a 50-skill library and 30% to 40% typically sit in one of those buckets. They are not earning their seat in every session’s system prompt.

The Library Became a Portfolio Metric

The skill library has become a portfolio metric. Founders ship 200-skill libraries for the same reason juniors ship 4,000-line PRs. Visible output beats invisible discipline. The discipline doesn’t show up on the GitHub README.

But you don’t optimize a runtime for the README. You optimize it for whether the agent can find the right skill on the third turn of a real session. Every skill you keep that doesn’t earn its slot is a quiet vote against every other skill in the library.

Which three would you retire tomorrow? Not because they’re bad. Because their slots are worth more than they’re paying.

Skills are the menu. The kitchen is everywhere else. I build kitchens.

Rene
