Tagged: Production-AI

2 posts

AI Agent Model Evaluation: 5 Tests Before the Night Shift

June 11, 2026 · 4 min read · blog
A five-test protocol to catch regressions, compare cost, and canary a new model before it runs an AI agent unattended.

Your 50th Skill Makes the First 49 Less Reliable

May 27, 2026 · 4 min read · blog
Past a token-budget threshold, each new skill silently lowers reliability of the rest. Where the work actually lives is below the skill layer.