Tagged: Production-AI

2 posts

AI Agent Model Evaluation: 5 Tests Before the Night Shift

June 11, 2026 · 4 min read · blog

A five-test protocol to catch regressions, compare cost, and canary a new model before it runs an AI agent unattended.

May 27, 2026 · 4 min read · blog

Past a token-budget threshold, each new skill silently lowers the reliability of the rest. The real work lives below the skill layer.