Why Execution-Only AI Agents Fail: Add a Steering Layer
My AI agents could finish any task I handed them. Not one of them could tell me the task was a waste of a month.
That gap was never about model quality. It was about which layer I aimed them at. I had handed over execution: write the draft, run the sync, ship the change. Steering I kept for myself: deciding what is worth doing, and in what order, before the field moves underneath me. And in a fast field, my own judgment is the part that ages fastest.
I run a lot of projects at once, in a field that reprices itself every few weeks. My task system was a task manager with semantic search bolted on. It could find any task in a second. It could not tell me that one project had been blocked for a week on a decision I had never made in another project.
Retrieval Is Not Structure
Semantic search gives you recall. You think of a thing, it finds the thing. That felt like intelligence until I noticed what it could never do: see that two of my goals depended on each other.
A flat list, no matter how searchable, has no shape. Every task looks equally ready. The one blocked three steps back looks exactly like the one I can start now. What I needed was not better recall. It was a graph.
The Dependency Layer
I found beads, a Git-backed dependency graph built as memory for AI coding agents. I put it under my own human workflow instead.
The command that changed things was bd ready. Instead of staring at every open task across ten projects, I get only the unblocked frontier: the steps I can act on now, with everything waiting on something else hidden until it clears. The first time I ran it, I could finally see which of my goals were standing on top of each other.
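To make the idea of an unblocked frontier concrete, here is a minimal sketch in Python. It is not how beads is implemented, and it does not call bd. The Task class, the ready_frontier function, and the sample task names are all hypothetical, chosen only to show the principle: a task is ready when it is still open and everything it depends on is done.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A toy task node (hypothetical model, not the beads schema)."""
    id: str
    done: bool = False
    blocked_by: list[str] = field(default_factory=list)


def ready_frontier(tasks: dict[str, Task]) -> list[str]:
    """Return ids of open tasks whose blockers are all done."""
    return sorted(
        t.id
        for t in tasks.values()
        if not t.done and all(tasks[b].done for b in t.blocked_by)
    )


# Illustrative data: two projects, one hidden dependency chain.
tasks = {
    "decide-pricing": Task("decide-pricing"),
    "draft-landing": Task("draft-landing", blocked_by=["decide-pricing"]),
    "ship-landing": Task("ship-landing", blocked_by=["draft-landing"]),
    "write-newsletter": Task("write-newsletter"),
}

print(ready_frontier(tasks))  # ['decide-pricing', 'write-newsletter']
```

In a flat list all four tasks look equally ready. In the graph, only two are, and the landing page visibly waits on a decision that lives somewhere else.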
That fixed order. It did not fix direction.
A Graph Still Trusts Your Plan
beads enforces the sequence I declared. It assumes the goals themselves are still the right goals. In a slow field that assumption holds. In a fast one it is the actual risk: executing a perfectly ordered plan toward a destination that stopped mattering three weeks ago.
So I moved the agent up a layer. Off execution. Onto steering.
The Drift Audit
Now an agent reads my whole task graph on a schedule and asks one thing: where am I drifting from what I said I wanted? Weekly, it catches tactical drift: the half-finished thread, the project I have not touched. Monthly, it catches the strategic kind: the goal I keep funding out of habit.
It is not checking whether I did the work. It is checking whether the work still points where I claimed.
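The judgment part of the audit belongs to the agent, but the tactical half has a mechanical core that can be sketched in a few lines of Python. This is an assumption about one way to pre-filter the graph, not a description of my actual setup: find_stale, the idle thresholds, the task names, and the dates are all made up for illustration.

```python
from datetime import date, timedelta


def find_stale(last_touched: dict[str, date], today: date,
               max_idle_days: int = 7) -> list[str]:
    """Return open items untouched for more than max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name for name, touched in last_touched.items() if touched < cutoff
    )


# Hypothetical snapshot: item name -> date it was last touched.
last_touched = {
    "newsletter-draft": date(2024, 5, 1),
    "sync-script": date(2024, 5, 13),
}

# Weekly pass: anything idle for more than 7 days.
print(find_stale(last_touched, today=date(2024, 5, 15)))  # ['newsletter-draft']
```

A list like this is only the input. The agent still has to ask whether a stale item is neglected or simply no longer worth doing, and that question is the audit.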
What I Don’t Know I’m Missing
Here is the uncomfortable part. I add tasks that make complete sense to me the moment I add them. But my knowledge has an edge, and the edge moves without telling me.
So a second agent scans my open tasks the way a recommendation feed scans your history, except it reads them against what actually shipped in the field this week. It flags the paths the world quietly made obsolete, and the ones it made cheap overnight. It keeps me off dead roads I would have happily walked for another month.
Feeding the Loop With My Own Receipts
The last piece came from a plain question: how do people running beads track whether any of this works?
The answer was to stop steering on vibes. My metrics dashboard and the hours I track every day now feed straight back into the steering layer. One month it showed me a project I had named my top priority had eaten a stack of tracked hours and shipped nothing. I had not noticed. The numbers had.
That is the part that still unsettles me. Once an agent steers on my own receipts, the most dangerous task on my list is no longer the one I keep avoiding. It is the one I am finishing fastest, toward a goal that quietly stopped being worth it.
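The receipts check can be sketched the same way. This is a minimal Python illustration under stated assumptions, not my real dashboard: ProjectReceipt, flag_unproductive, the 20-hour threshold, and every number in the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProjectReceipt:
    """One month of evidence for a project (hypothetical shape)."""
    name: str
    priority: int       # 1 = declared top priority
    hours: float        # tracked hours this month
    shipped: int        # things actually shipped this month


def flag_unproductive(receipts: list[ProjectReceipt],
                      min_hours: float = 20.0) -> list[str]:
    """Return projects that consumed real hours and shipped nothing,
    ordered by declared priority so the most awkward case comes first."""
    flagged = [r for r in receipts if r.hours >= min_hours and r.shipped == 0]
    return [r.name for r in sorted(flagged, key=lambda r: r.priority)]


# Made-up numbers for illustration only.
receipts = [
    ProjectReceipt("flagship-rewrite", priority=1, hours=46.0, shipped=0),
    ProjectReceipt("newsletter", priority=3, hours=8.0, shipped=4),
    ProjectReceipt("side-tool", priority=2, hours=25.0, shipped=0),
]

print(flag_unproductive(receipts))  # ['flagship-rewrite', 'side-tool']
```

Sorting by declared priority is the design choice that matters here: the project I called most important is the first one asked to justify its hours.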
The execution layer was never the hard part. It is maybe a tenth of the judgment that matters. Everything that decides whether a task deserved to exist sits one layer up.
So here is the question worth sitting with. If your AI can finish every item on your list, who is checking that the list is still worth finishing?