Your 50th Skill Makes the First 49 Less Reliable

May 27, 2026 · 4 min read · ai-skills, claude-code, ai-engineering, production-ai

You added a skill last Tuesday. The agent hasn’t called it once. Each new skill silently weakens the discovery odds of the ones you already have.

You assume it’s a description problem. It isn’t.

Everyone’s pushing past 50 skills now. Vercel ships a plugin with 40. Google’s gws brings 95. I do it too. My local registry is at 38.

Each one I added lowered the odds that the others get reached when I need them.

The Catalog Sits in Context Every Turn

The three-tier loading model promises that bodies stream in on demand, but the metadata (name plus one-line description) sits in the system prompt every turn, for every installed skill. The math is mechanical: Anthropic’s spec budgets roughly 30 to 50 tokens per skill plus 109 characters of overhead, and that cost grows linearly with every skill you install.

A 95-skill library reports 2,007 tokens per turn in available_skills. At Opus pricing over a typical workday, that’s about three dollars per developer per day for listings the agent cannot distinguish anymore.
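The three-dollar figure is easy to sanity-check. The sketch below is back-of-envelope arithmetic only: the 2,007 tokens come from the library above, while the input price of $15 per million tokens and the 100 turns per workday are my assumptions for illustration. Prompt caching or a different price tier would change the result.

```python
def catalog_cost_per_day(listing_tokens: int,
                         turns_per_day: int,
                         usd_per_million_input: float) -> float:
    """Dollars spent per day re-sending the skill listing on every turn."""
    return listing_tokens * turns_per_day * usd_per_million_input / 1_000_000


LISTING_TOKENS = 2_007        # reported by the 95-skill library
TURNS_PER_DAY = 100           # assumption: one developer, one workday
USD_PER_MILLION_INPUT = 15.0  # assumption: illustrative input price, no caching

daily_cost = catalog_cost_per_day(LISTING_TOKENS, TURNS_PER_DAY, USD_PER_MILLION_INPUT)
print(f"${daily_cost:.2f} per developer per day")
```

Under those assumptions the listing costs about three cents per turn, which is where the daily figure comes from.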

The dollars are not the issue. The signal-to-noise ratio is.

Routing Degrades With Catalog Size

A study published this month measured retrieval against an 80,000-skill catalog. Best-in-class routing hit 74% at top-1. That leaves roughly one task in four picking the wrong skill or no skill at all. Drop to metadata-only matching, which is the default at session start, and accuracy falls another 31 to 44 percentage points across every method tested.

There is no fix coming from better embeddings. The problem is that fifty good one-line descriptions stop being distinguishable from each other. Adding the 51st makes “publish a draft” and “ship a draft” and “deploy a post” harder to tell apart, not easier.
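You can at least measure how crowded your own descriptions are. Here is a minimal sketch that flags description pairs by word overlap (Jaccard similarity). The function names, the stopword list, and the 0.3 threshold are all hypothetical choices of mine, not part of any tool. It is a crude lint, not a fix.

```python
from itertools import combinations

STOPWORDS = {"a", "an", "the", "to"}  # assumption: tiny illustrative list


def content_tokens(description: str) -> set[str]:
    """Lowercased words of a description, minus stopwords."""
    return {w for w in description.lower().split() if w not in STOPWORDS}


def jaccard(a: str, b: str) -> float:
    """Shared words divided by total distinct words across both descriptions."""
    ta, tb = content_tokens(a), content_tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def confusable_pairs(descriptions: list[str], threshold: float = 0.3):
    """Return (a, b, score) for every pair whose overlap meets the threshold."""
    return [
        (a, b, jaccard(a, b))
        for a, b in combinations(descriptions, 2)
        if jaccard(a, b) >= threshold
    ]


pairs = confusable_pairs(["publish a draft", "ship a draft", "deploy a post"])
for a, b, score in pairs:
    print(f"{a!r} vs {b!r}: {score:.2f}")
```

Note what it misses: “publish a draft” and “ship a draft” get flagged, but “deploy a post” shares no words with either and slips through, even though a router could easily confuse all three. Word overlap is a lower bound on the collision problem.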

The undocumented setting skillListingBudgetFraction in Claude Code makes this worse without warning. As remaining context shrinks during a long session, the absolute budget for listings shrinks proportionally. Skills at the bottom of the listing order get truncated. The model never sees them. It cannot invoke what it cannot see.
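To make the mechanism concrete, here is a toy model of a proportional listing budget. This is not Claude Code’s implementation, which is undocumented. The function, the cutoff rule, the 40-token cost per listing, and the 0.02 fraction are all assumptions I picked to illustrate the shape of the problem.

```python
def visible_skills(skills: list[tuple[str, int]],
                   remaining_context_tokens: int,
                   budget_fraction: float) -> list[str]:
    """Names of the skills that fit in the listing budget, in listing order.

    Hypothetical model: the budget is a fixed fraction of the remaining
    context, and listing stops at the first skill that would exceed it.
    """
    budget = int(remaining_context_tokens * budget_fraction)
    visible, used = [], 0
    for name, token_cost in skills:
        if used + token_cost > budget:
            break  # everything below this point is never shown to the model
        visible.append(name)
        used += token_cost
    return visible


# Assumption: 50 skills at 40 tokens of metadata each.
skills = [(f"skill-{i:02d}", 40) for i in range(50)]

early_session = visible_skills(skills, remaining_context_tokens=200_000, budget_fraction=0.02)
late_session = visible_skills(skills, remaining_context_tokens=40_000, budget_fraction=0.02)

print(len(early_session), "skills visible early in the session")
print(len(late_session), "skills visible late in the session")
```

In this toy model the same library shows every skill at the start of a session and fewer than half of them once the context has filled up. Nothing errors. The bottom of the list just disappears.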

You added the skill. The agent does not know it exists.

The 85% Lives Below the Skill Layer

A reader who runs an 89,000-page programmatic SEO site mentioned in a thread that skills cover maybe 15% of what the system actually does. The other 85% he called “scar tissue code”: the conditions nobody writes until they get paged at 2 AM because a third-party API returned malformed JSON at month-end.

That 85% is real. It looks like this:

  1. A redundant null check in payment validation because one vendor sends "N/A" for a specific contract type at month-end.
  2. A Stripe webhook timeout set to 18 seconds because their p95 was 14 and you wanted headroom.
  3. A retry policy on the embedding queue that backs off harder for one endpoint because its rate limiter returns 200 before timing out at 60 seconds.

None of that is in a SKILL.md. None of it can be. It lives in the schema migration history, the incident postmortems, the runtime telemetry baselines, the permission graphs, the third-party API quirk log that nobody calls a log. Skills delegate to that layer. They do not contain it.

The author who ships a thousand-skill library is not adding capability. They are adding a discovery problem the user solves at runtime.

The Metric Was Always Subtraction

The right engineering metric for an AI runtime is not skills added. It is lines of imperative code deleted per skill kept. Cursor’s team published numbers last month: 12,000 lines of TypeScript replaced by 200 lines of skill markdown. That is a 60x ratio. Code that no longer exists has zero bugs, zero CI cost, zero migration burden.

That ratio cuts both ways. Every skill that does not delete real code is overhead.

A static analyzer published in April classifies skill failures into five types: dead, bloated, conflicting (one says “use jq” and another says “never use jq”), stale, and cyclic. Run it against a 50-skill library and 30% to 40% of the skills typically sit in one of those buckets. They are not earning their seat in every session’s system prompt.
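The “conflicting” bucket is the easiest one to approximate yourself. The sketch below is not that analyzer. It is a hypothetical regex pass of my own that only catches the literal “use X” versus “never use X” pattern across skill bodies, and the skill names and bodies are made up for the example.

```python
import re
from collections import defaultdict

# Matches "use <tool>" and its negations; the negations are listed first so
# "never use jq" is not read as a plain "use jq".
DIRECTIVE = re.compile(
    r"\b(never use|do not use|don't use|use)\s+([a-z0-9_-]+)",
    re.IGNORECASE,
)


def find_conflicts(skills: dict[str, str]) -> dict[str, dict[str, set[str]]]:
    """Map each tool to the skills that recommend it and the skills that forbid it.

    Only tools with at least one skill on each side are returned.
    """
    stance = defaultdict(lambda: {"use": set(), "avoid": set()})
    for skill_name, body in skills.items():
        for verb, tool in DIRECTIVE.findall(body):
            side = "use" if verb.lower() == "use" else "avoid"
            stance[tool.lower()][side].add(skill_name)
    return {tool: s for tool, s in stance.items() if s["use"] and s["avoid"]}


# Hypothetical skill bodies for illustration.
example_skills = {
    "parse-json": "Use jq to filter API responses.",
    "safe-shell": "Never use jq; call the Python helper instead.",
    "lint": "Use ruff for linting.",
}

conflicts = find_conflicts(example_skills)
for tool, sides in conflicts.items():
    print(f"{tool}: recommended by {sorted(sides['use'])}, forbidden by {sorted(sides['avoid'])}")
```

A pass this naive will miss paraphrased conflicts and produce noise on prose like “use the formatter,” so treat a hit as a prompt to go read both skills, not as a verdict.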

The Library Became a Portfolio Metric

The skill library has become a portfolio metric. Founders ship 200-skill libraries for the same reason juniors ship 4,000-line PRs. Visible output beats invisible discipline. The discipline doesn’t show up on the GitHub README.

But you don’t optimize a runtime for the README. You optimize it for whether the agent can find the right skill on the third turn of a real session. Every skill you keep that doesn’t earn its slot is a quiet vote against every other skill in the library.

Which three would you retire tomorrow? Not because they’re bad. Because their slots are worth more than they’re paying.

Skills are the menu. The kitchen is everywhere else. I build kitchens.

Rene
