Blog

Engineering notes on AI agents, automation, and the infrastructure behind them.

Why Execution-Only AI Agents Fail: Add a Steering Layer

June 21, 2026 · 4 min read
Execution-only AI agents finish every task but never ask if the task was worth doing. The fix is a steering layer over your work.
AI Agent Best Practices: Trust Your Own Results Before Google

June 16, 2026 · 3 min read
Your AI agent reaches for googled best practices before your own proven fixes. Wire a trust order into your CLAUDE.md and agent loop instead.
Why AI Coding Agents Skip Your Definition of Done

June 16, 2026 · 4 min read
AI coding agents agree to your process, then skip it. Why review can't catch it, and the one fix that works: a deterministic finish-line gate.
Your Task Manager Is the Best Agent Memory You're Not Using

June 13, 2026 · 4 min read
Agent memory without a new vector DB. Your task app is years of curated, ranked context. ATS gives your agent a hybrid-retrieval channel into it.
AI Agent Model Evaluation: 5 Tests Before the Night Shift

June 11, 2026 · 4 min read
A five-test protocol to catch regressions, compare cost, and canary a new model before it runs an AI agent unattended.
Build the Harness Once With Your Best Model. Run It on a Cheap One.

June 3, 2026 · 4 min read
Agents forget and good ones cost. The fix is not a better model. Put the goal in deterministic scripts and run a cheap model against them.
Most of Your AI Skills Will Rot. Here's Which Ones Compound.

June 3, 2026 · 4 min read
A skill's lifespan is set by what it couples to, not how good the prompt is. Why most AI skills rot, which parts compound, and how to tell.
Claude Code Stops Following Your CLAUDE.md: Read-Once Rules and Hooks

June 2, 2026 · 4 min read
Claude Code reads your CLAUDE.md once at startup, so rules decay as the session fills up. Move the ones that must never break into hooks.
Claude Opus 4.8 Is Out. The Number I Care About Isn't on the Benchmark Chart.

May 29, 2026 · 3 min read
Opus 4.8 shipped May 28. For unattended cron agents, the upgrades that matter are not the benchmark scores. A use-case breakdown from real builds.
Your 50th Skill Makes the First 49 Less Reliable

May 27, 2026 · 4 min read
Past a token-budget threshold, each new skill silently lowers the reliability of the rest. The work actually lives below the skill layer.
Self-Hosted Voice AI: Why GDPR Is the Wrong Test (NIS2 Is the Real One)

May 21, 2026 · 4 min read
A GDPR tick isn't a NIS2 test. What you really need to verify with hosted voice AI vendors before NIS2 puts the board on the hook personally.
Splitting Grounding from Reasoning in Browser-Agent Stacks

May 19, 2026 · 4 min read
Browser-agent stacks bundle grounding and reasoning. A local 2B parser splits them, beats GPT-4o on ScreenSpot-v2 by 2.5x, and costs $4 to train.