Claude Code SDK Agents: Build Production Agents Without the Loop

April 1, 2026 · 11 min read · claude-code, agents, claude-sdk, automation, mcp

Most “build an agent with Claude” tutorials hand you a while-loop around client.messages.create, a hand-rolled tool dispatcher, and a promise that you’ll wire up file reads and shell execution yourself. That works. It also means you spend two weeks rebuilding the same plumbing that Claude Code already ships with.

The Claude Code SDK, sometimes called the Claude Agent SDK, is the shortcut. Same runtime as the claude CLI, exposed as a library in TypeScript and Python, plus a print mode you can call from a bash cron job. You get file tools, bash, MCP client, subagents, hooks, and permission modes without writing any of it.

My verdict up front: use the Claude Code SDK when you want Claude Code’s tool stack inside your own app or script. Use the raw Anthropic API when you need tight-loop inference, custom routing, or token-sensitive workloads where every byte of context matters. I run ten production agents as bash scripts calling claude -p, documented in I run 10 AI agents in production and they’re all bash scripts. The pattern works because the SDK already has the plumbing I would otherwise be maintaining.

Three ways to build an agent with Claude

There are three distinct surfaces, and people mix them up constantly.

The raw Claude API (@anthropic-ai/sdk in TypeScript, anthropic in Python). This is client.messages.create(...). You send messages, you get a response, you handle tool calls yourself, you implement the loop. Nothing is hidden. Nothing is included. If you want the model to read a file, you define a read_file tool, implement the handler, route the tool call back into the next turn, and manage context size. Total control, total responsibility.
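To make that loop concrete, here is a minimal sketch of what you sign up for with the raw API. It hits the Messages endpoint with plain fetch so every moving part stays visible; the single read_file tool, the ten-turn cap, and the model name are illustrative choices, not requirements.

```typescript
import { readFile } from "node:fs/promises";

// The tool handler. With the raw API, you own this dispatch entirely.
async function handleToolCall(name: string, input: any): Promise<string> {
  if (name === "read_file") {
    return await readFile(input.path, "utf8");
  }
  throw new Error(`unknown tool: ${name}`);
}

// The loop: call the API, execute tool_use blocks, feed results back,
// repeat until the model stops asking for tools.
async function runLoop(prompt: string, apiKey: string): Promise<string> {
  const messages: any[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < 10; turn++) {
    const res = await fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        tools: [{
          name: "read_file",
          description: "Read a file from disk",
          input_schema: {
            type: "object",
            properties: { path: { type: "string" } },
            required: ["path"],
          },
        }],
        messages,
      }),
    });
    if (!res.ok) throw new Error(`API error ${res.status}`);
    const data: any = await res.json();
    messages.push({ role: "assistant", content: data.content });

    const toolUses = data.content.filter((b: any) => b.type === "tool_use");
    if (toolUses.length === 0) {
      // No more tool calls: the text block is the final answer.
      return data.content.find((b: any) => b.type === "text")?.text ?? "";
    }
    // Route every tool result back in as the next user turn.
    messages.push({
      role: "user",
      content: await Promise.all(toolUses.map(async (t: any) => ({
        type: "tool_result",
        tool_use_id: t.id,
        content: await handleToolCall(t.name, t.input),
      }))),
    });
  }
  throw new Error("agent loop did not terminate");
}
```

Roughly forty lines before error handling, retries, or context management, and this is the part the Claude Code runtime already ships.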

The Claude Code CLI in print mode (claude -p "<prompt>"). This is the claude binary running non-interactively. Under the hood it is already an agent: the tool loop, the file tools, the bash tool, MCP support, subagents, and hooks are all wired up. You hand it a prompt and flags, it runs until the task is done, it prints the result. This is what I use in cron scripts. Shell in, shell out.

The Claude Code SDK (@anthropic-ai/claude-code in TypeScript, claude-code-sdk in Python). This is the same runtime as the CLI, but importable. You call query({ prompt, options }), you iterate over messages as they stream, and you get every capability the CLI has from inside your TS or Python app. When I need agent behavior inside a longer-running Node service instead of a bash one-shot, this is the tool.

The three sit on a spectrum. Raw API at one end, full-featured Claude Code at the other, and the SDK lets you pick how much of the runtime you want.

When the SDK beats raw API calls

Here is the decision I run every time I start a new agent project.

Use raw API when: you need a tight inference loop (classification, extraction, chat completion), you are doing single-shot calls where tools are either unnecessary or small and specific, you care about latency to the millisecond, or you need to fit the request into a Lambda-style function with no subprocess budget. Prompt caching, structured output via tool_use, extended thinking, all of that is easier to configure directly on the message. If you want the deep end of that stack, I wrote up Claude API prompt caching and Claude extended thinking separately.

Use claude -p when: you want an agent to run from cron, a shell script, a Telegram bot webhook, or a systemd unit. You want to write a prompt in plain text, shell it out, and read stdout or JSON. You do not want a long-running Node process just to run a five-minute task. This is the shape of my morning briefing agent, my agenda follow-up agent, my weekly planning agent. Ten of them. All bash.

Use the Claude Code SDK when: you are building a TS or Python application that needs agent behavior inline. A web service that accepts a job and runs a research task. A CI/CD step that analyzes a diff. A Slack bot that needs to read files in the workspace. You want to stream messages to a UI. You want hooks to run your own code before or after a tool call. You want to inject MCP servers programmatically, not via config file.

If you are weighing the full build vs buy question for agent platforms, I keep a separate guide at AI agents build vs buy that compares managed agent runtimes to rolling your own.

Quick start: your first headless agent

Install the CLI and the SDK from npm.

npm install -g @anthropic-ai/claude-code   # the claude CLI
npm install @anthropic-ai/claude-code      # the SDK, inside your project
export ANTHROPIC_API_KEY=sk-ant-...
# or run `claude auth` for an interactive login stored in the keychain

The simplest possible agent is one line of bash.

claude -p "Summarize the last 10 commits on this repo and flag anything that looks like a breaking change." \
  --model claude-sonnet-4-6 \
  --permission-mode bypassPermissions \
  --output-format json

This runs the full Claude Code agent: it will call git log, maybe git diff, read files if it needs to, and print a JSON blob with the final message. --permission-mode bypassPermissions skips the interactive “allow this tool?” prompts, which is what you want in cron. --output-format json gives you something parseable.

Here is the equivalent from TypeScript using the SDK.

import { query } from "@anthropic-ai/claude-code";

const prompt = "Summarize the last 10 commits on this repo and flag anything that looks like a breaking change.";

for await (const message of query({
  prompt,
  options: {
    model: "claude-sonnet-4-6",
    permissionMode: "bypassPermissions",
    cwd: process.cwd(),
  },
})) {
  if (message.type === "assistant") {
    for (const block of message.message.content) {
      if (block.type === "text") {
        process.stdout.write(block.text);
      }
    }
  }
  if (message.type === "result") {
    console.log("\n---");
    console.log(`Tokens: ${message.usage?.input_tokens} in, ${message.usage?.output_tokens} out`);
    console.log(`Cost: $${message.total_cost_usd?.toFixed(4)}`);
  }
}

Same agent, same tools, same behavior. One you call from a shell, the other you call from a Node process. Pick the surface that fits where the rest of your code lives.

One quirk worth flagging. If your TS or bash script is itself being executed inside a Claude Code session (for example, a sub-task spawned by another agent), the environment variables CLAUDECODE and CLAUDE_CODE_ENTRYPOINT are already set, and the CLI will refuse to nest. Unset them before the recursive call.

env -u CLAUDECODE -u CLAUDE_CODE_ENTRYPOINT claude -p "..." --model claude-sonnet-4-6

I learned that the hard way when a scheduled script started failing silently because it was triggered from an interactive session during testing.
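If the recursive call happens from Node instead of bash, the same fix applies before spawning the CLI. A small sketch; the helper name is my own:

```typescript
// Return a copy of the environment without the nesting guards,
// so a spawned `claude -p` call will actually start.
function withoutNestingGuards(
  env: Record<string, string | undefined>
): Record<string, string | undefined> {
  const rest = { ...env };
  delete rest.CLAUDECODE;
  delete rest.CLAUDE_CODE_ENTRYPOINT;
  return rest;
}

// Usage with node:child_process:
// spawn("claude", ["-p", prompt, "--model", "claude-sonnet-4-6"], {
//   env: withoutNestingGuards(process.env),
//   stdio: "inherit",
// });
```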

Using MCP servers from the agent

The best part of the Claude Code runtime is that MCP is first class. You do not have to implement tool routing for your MCP server. You register the server, and the agent sees every tool it exposes.

There are two ways to pass MCP config.

From the CLI, put your servers in ~/.claude.json or a project-local .mcp.json, then they are available by default. You can also pass --mcp-config path/to/config.json for a one-off.
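For a project-local .mcp.json, the shape is a top-level mcpServers map. A minimal sketch, with the server path a placeholder:

```json
{
  "mcpServers": {
    "ticktick": {
      "type": "stdio",
      "command": "node",
      "args": ["/home/user/ticktick-mcp/ticktick-mcp-server.js"]
    }
  }
}
```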

From the SDK, pass mcpServers directly.

import { query } from "@anthropic-ai/claude-code";

for await (const message of query({
  prompt: "Pull my TickTick tasks due today and summarize them by project.",
  options: {
    model: "claude-sonnet-4-6",
    permissionMode: "bypassPermissions",
    mcpServers: {
      ticktick: {
        type: "stdio",
        command: "node",
        args: ["/home/user/ticktick-mcp/ticktick-mcp-server.js"],
      },
    },
  },
})) {
  if (message.type === "assistant") {
    // handle streamed assistant messages
  }
}

That is the pattern I use for my own production agents. The agent sees mcp__ticktick__get_tasks_by_date, mcp__ticktick__search_tasks_semantic, and every other tool my server exposes, without any routing code on my side.

If you want to write the MCP server itself, I covered the TypeScript side in Build an MCP server in TypeScript. The SDK plus your own MCP server is the pattern I would recommend for any agent that needs to touch a specific system: calendar, CRM, internal API, whatever.

Budget, permissions, and safety

Headless agents will do whatever you ask. That is the point. It is also the risk. Three knobs keep production agents safe.

Budget caps. The --max-budget-usd flag stops the run when cumulative cost hits a ceiling. I set it on every cron agent.

claude -p "Do the weekly planning review." \
  --model claude-opus-4-7 \
  --max-budget-usd 1.50 \
  --permission-mode bypassPermissions

If a prompt unexpectedly spirals (which happens when the model hits an edge case and keeps retrying), the cap ends the run. Opus is expensive. A cap is not optional.

Permission modes. There are four. default prompts before every risky tool call, which is right for interactive development. acceptEdits auto-approves file edits but still prompts for everything else, including bash. bypassPermissions is fully unattended; use this for cron only. plan is interesting: the agent plans what it would do and lists the tool calls, but never executes anything. Useful for dry runs.

For any agent that writes to a production system, I start in default mode, watch it work through a real task while approving each tool call, then graduate to bypassPermissions once I trust it.

Structured output. --output-format json (or stream-json for streaming) makes downstream parsing safe. The shell output from a Claude agent is not a contract; the JSON is. Parse the JSON, check the result status, then act on it.

claude -p "$PROMPT" --output-format json --permission-mode bypassPermissions \
  | jq -r '.result'
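In a Node service, the same check is a few lines. The field names come from the result message shape the runtime emits (type, subtype, result), which the SDK example later in this post also relies on:

```typescript
// Parse and validate a `claude -p --output-format json` result
// before acting on it. Throws instead of silently using bad output.
function extractResult(raw: string): string {
  const msg = JSON.parse(raw);
  if (msg.type !== "result" || msg.subtype !== "success") {
    throw new Error(`agent run failed: ${msg.subtype ?? msg.type ?? "unknown"}`);
  }
  return msg.result;
}
```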

You can go further and wire up hooks. The SDK supports PreToolUse and PostToolUse hooks that run your own code when the agent is about to call a tool or has just finished one. I use them to log every bash command a cron agent runs, separately from the agent's own transcript, so I have an audit trail even when the agent's summary is wrong.
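The audit logic itself is small; the part worth sketching is keeping it pure so you can test it without the SDK in the loop. The hook wiring in the trailing comment is an assumption about the options shape, so check the SDK's hook types before copying it:

```typescript
import { appendFileSync } from "node:fs";

// Append one audit line per Bash tool call, kept separate from the
// agent's own transcript. Returns true when a line was written.
function auditToolUse(
  toolName: string,
  toolInput: { command?: string },
  logPath: string
): boolean {
  if (toolName !== "Bash" || !toolInput.command) return false;
  appendFileSync(logPath, `${new Date().toISOString()} ${toolInput.command}\n`);
  return true;
}

// Hypothetical wiring inside query options (shape unverified, hook
// event names from the Claude Code hook docs):
// hooks: {
//   PreToolUse: [{ matcher: "Bash", hooks: [async (input) => {
//     auditToolUse(input.tool_name, input.tool_input, "/var/log/agent-audit.log");
//     return { continue: true };
//   }] }],
// },
```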

Headless vs human-in-the-loop

There is a design question before the code: should this agent run unattended, or should a person see the plan before execution?

Headless is right for: scheduled reports, log monitoring, deploy verification, data fetching, summarization, anything where “do it again tomorrow” is the spec. The morning briefing agent on my VPS runs at 6:30 Madrid time, checks calendar and tasks, writes a summary, Telegrams it to me. I read the summary. If it is wrong, the blast radius is zero.

Human-in-the-loop is right for: anything that changes external state in an irreversible way. Sending emails. Cancelling bookings. Running kubectl delete. Moving files out of a protected directory. Writing to prod databases. For these, I either keep the agent in default mode and approve each tool call, or I use plan mode to generate the plan, show it to myself on Telegram, and only run the execution pass after a thumbs-up.

A good heuristic: if the worst case from the agent doing something wrong is “I have to read a bad summary”, go headless. If the worst case is “I have to restore from backup”, keep a human in the loop.

Which approach for which workload

Here is how I map workloads to surfaces.

| Workload | Tool |
| --- | --- |
| Extract structured fields from a PDF | Raw API with tool_use |
| Classify incoming emails | Raw API, small Haiku prompt, cached |
| Nightly report with file and git access | claude -p in cron |
| Telegram bot that runs arbitrary tasks for me | claude -p with MCP servers |
| CI job that reviews a PR diff | SDK from a Node script |
| Internal dashboard that runs agent tasks on demand | SDK from a web service |
| Research assistant with subagents | SDK, it already knows how to dispatch subagents |
| Deploy verifier that reads logs and posts to Slack | claude -p or SDK, both fit |

The raw API still wins when you know exactly what you want the model to do and you want to pay for only that. The SDK and CLI win when you want tools, filesystem awareness, and MCP to come included.

For a concrete example of the last row, here is a deploy verifier I can run from either surface.

#!/usr/bin/env bash
set -euo pipefail

PROMPT=$(cat <<'EOF'
Check the latest Cloudflare Pages deploy for renezander.com:
1. Read the last entry from /var/log/cf-deploy.log
2. Fetch the homepage and check it returns 200
3. If anything looks wrong, summarize what broke.
4. Post a one-line status to Telegram via /home/user/bin/telegram-notify.sh
EOF
)

claude -p "$PROMPT" \
  --model claude-sonnet-4-6 \
  --permission-mode bypassPermissions \
  --max-budget-usd 0.50 \
  --output-format json \
  | jq -r '.result'

Same idea from TypeScript, if you want it inside a longer Node service.

import { query } from "@anthropic-ai/claude-code";

async function verifyDeploy(): Promise<string> {
  const prompt = `
    Check the latest Cloudflare Pages deploy for renezander.com:
    1. Read the last entry from /var/log/cf-deploy.log
    2. Fetch the homepage and check it returns 200
    3. If anything looks wrong, summarize what broke.
    4. Return a one-line status.
  `;

  let finalText = "";

  for await (const message of query({
    prompt,
    options: {
      model: "claude-sonnet-4-6",
      permissionMode: "bypassPermissions",
      maxBudgetUsd: 0.5,
    },
  })) {
    if (message.type === "result" && message.subtype === "success") {
      finalText = message.result;
    }
  }

  return finalText;
}

Both run the same agent. The bash one is 15 lines. The TS one plugs into the rest of your Node app. Pick based on where the agent lives, not based on which feels more “real”.

One last thing. The Claude Code SDK is where Anthropic keeps adding agent capability. Subagents, hooks, permission modes, MCP improvements: they all land in the Claude Code runtime first. If you build on the raw API, you are rebuilding these features every time they ship. If you build on the SDK, you get them for free on the next npm update. That is why most of my production agents are not on the raw API anymore. The velocity is on the SDK side.

Download the AI Automation Checklist (PDF)