Production-Ai on René Zander | AI Automation Consultant

AI Agent Model Evaluation: 5 Tests Before the Night Shift

Thu, 11 Jun 2026 10:00:00 +0000

A model upgrade used to be good news for anyone running agents overnight.

Now the next model arrives before the last one has finished probation.

Anthropic released Opus 4.8 on May 28. Twelve days later, Fable 5 arrived with longer autonomous runs and another page of benchmark wins.

The night shift is getting easier to hire.

Wed, 27 May 2026 07:00:00 +0000

You added a skill last Tuesday. The agent hasn’t called it once. Each new skill silently weakens the discovery odds of the ones you already have.

You assume it’s a description problem. It isn’t.

Everyone’s pushing past 50 skills now. Vercel ships a plugin with 40. Google’s gws brings 95. I do it too. My local registry is at 38.

Each one I added lowered the odds that the others get reached when I need them.