Claude API Tool Use: Function Calling Guide for Production

March 26, 2026 · 17 min read · claude-api, tool-use, function-calling, llm-tools
Tool use is the default pattern for any Claude workload beyond chat. If you are building anything that reads from a database, hits an API, writes a file, or decides between branches of logic, you should be using tools. If you are not, you are probably over-prompting and under-engineering.

I run ten Claude-powered agents in production as bash scripts on a Debian VPS. Every one of them uses tool use, not prompt chaining, to decide what to do next. The model picks a tool, I execute it, I feed the result back, the model continues. That loop is boring, predictable, and debuggable. It beats “parse the JSON out of the model’s free-form answer” every time.

This guide is the thing I wish existed when I first wired Claude up to real systems. Schemas, the call loop, parallel calls, streaming, MCP, and the failure modes that will bite you in week three.

What tool use actually is

Tools are typed function signatures you hand to the model. The model does not execute them. It requests a call. You intercept that request, execute whatever code you want in your runtime, return the result, and the model continues reasoning with that result in its context.

That distinction matters. The model is not running code inside Anthropic’s servers. Every tool call is a contract between the model’s output and your runtime. You choose what the tool does. You choose what counts as a valid result. You choose whether to retry, to refuse, or to escalate to a human.

This is the mental model you need:

user message
  > model: "I want to call tool X with args {...}"
  > your code: execute X, get result
  > back to model: "here is the result"
  > model: either calls more tools, or returns final answer

A tool can be anything that can be expressed as a function. Look up a user by email. Charge a card. Query a vector DB. Read a file. Send a Telegram message. If your runtime can do it, the model can request it.

Anatomy of a tool definition

Every tool you send to Claude has three fields:

{
  name: "lookup_order",
  description: "Look up an order by its order ID...",
  input_schema: {
    type: "object",
    properties: { order_id: { type: "string" } },
    required: ["order_id"],
  },
}

The name is an identifier. Use snake_case and keep it stable across versions.

The description is what the model actually reads when it decides whether to call this tool. Treat it like a function docstring. Write when to use it, when not to use it, and include a short example for edge cases. A vague description (“looks up orders”) will get you a model that calls the tool at the wrong time or misses the right moment entirely. A precise description (“Look up an order by its internal order ID. Use this when the user provides an order ID that starts with ORD-. Do not use this for customer email lookups; use lookup_customer for that.”) gives the model something to work with.

The input_schema is JSON Schema. This is where most people leave value on the table. If a field should only be one of five values, use enum. If a string has a format, declare it. If a field is required, list it in required. The tighter your schema, the less the model hallucinates shapes you can’t handle.

A solid tool definition looks like this:

{
  name: "create_ticket",
  description:
    "Create a customer support ticket. Use this after you have " +
    "confirmed the customer's identity and collected the issue " +
    "description. Do not call this to log internal notes; use " +
    "`add_internal_note` for that. Example: user reports a broken " +
    "checkout flow, you call this with category='checkout', " +
    "priority='high'.",
  input_schema: {
    type: "object",
    properties: {
      customer_id: { type: "string", description: "Internal customer ID." },
      category: {
        type: "string",
        enum: ["checkout", "shipping", "billing", "account", "other"],
      },
      priority: {
        type: "string",
        enum: ["low", "normal", "high", "urgent"],
      },
      summary: { type: "string", maxLength: 200 },
    },
    required: ["customer_id", "category", "priority", "summary"],
  },
}

Every production tool I have shipped followed roughly that shape. If your schema would not survive a TypeScript compile, it is not tight enough.

Tool choice strategies

The tool_choice parameter controls how the model decides whether to use tools at all.

{ "type": "auto" } is the default. The model decides. It may respond directly without calling any tool, or it may call one or more tools. Use this for general agents that sometimes chat and sometimes act.

{ "type": "any" } forces the model to call some tool, but lets it pick which one. Use this when you want the model to make a decision expressed as a tool call, and you never want a free-form reply.

{ "type": "tool", "name": "specific_tool" } forces a specific tool. This is the pattern I use for structured output. Give the model one tool whose schema is the shape you want back, force that tool, parse the arguments. More reliable than prefill-and-parse when you truly need the schema. See Claude API structured output for the full comparison of tool-forcing versus prefill.

{ "type": "none" } disables tools for that turn. Useful when you want the model to summarize or explain without taking action.
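Forcing a single tool is how I get guaranteed-schema output. Here is a minimal sketch of that pattern; the `record_sentiment` tool and its schema are hypothetical examples, and `extractToolInput` uses a loose local type rather than the SDK's content-block types:

```typescript
// Build the request body for a forced-tool structured-output call.
// Pass the result to client.messages.create(); the answer comes back
// as a tool_use block whose input matches the schema, never free text.
function buildStructuredRequest(text: string) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    tools: [
      {
        name: "record_sentiment", // hypothetical example tool
        description: "Record the sentiment of the given text.",
        input_schema: {
          type: "object" as const,
          properties: {
            sentiment: {
              type: "string",
              enum: ["positive", "neutral", "negative"],
            },
          },
          required: ["sentiment"],
        },
      },
    ],
    // Force this exact tool so the model cannot reply in prose.
    tool_choice: { type: "tool" as const, name: "record_sentiment" },
    messages: [{ role: "user" as const, content: text }],
  };
}

// Pull the forced call's arguments out of the response content array.
function extractToolInput(
  content: Array<{ type: string; name?: string; input?: unknown }>,
  toolName: string,
): unknown {
  const block = content.find(
    (b) => b.type === "tool_use" && b.name === toolName,
  );
  return block ? block.input : null;
}
```

The arguments object is your structured output; validate it against the same schema before trusting it.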

The call loop

Here is the loop every tool-using agent runs, in pseudocode:

messages = [user_message]
iterations = 0
while iterations < budget:
  response = claude.messages.create(tools=TOOLS, messages=messages, ...)
  messages.append({ role: "assistant", content: response.content })
  if response.stop_reason == "end_turn":
    return response
  if response.stop_reason == "tool_use":
    tool_results = []
    for block in response.content:
      if block.type == "tool_use":
        result = execute(block.name, block.input)
        tool_results.append({
          type: "tool_result",
          tool_use_id: block.id,
          content: result,
        })
    messages.append({ role: "user", content: tool_results })
  iterations += 1

Two stop reasons matter here. end_turn means the model is done and the last text block is the answer. tool_use means the model wants to call one or more tools. On tool_use you execute every tool_use block in the response, wrap each result in a tool_result block tagged with the matching tool_use_id, and send them back as a single user message.

In TypeScript with the Anthropic SDK:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const MAX_ITERATIONS = 10;

async function runAgent(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 4096,
      tools: TOOLS,
      messages,
    });

    messages.push({ role: "assistant", content: response.content });

    if (response.stop_reason === "end_turn") {
      return response;
    }

    if (response.stop_reason === "tool_use") {
      const toolResults: Anthropic.ToolResultBlockParam[] = [];
      for (const block of response.content) {
        if (block.type === "tool_use") {
          const result = await execute(block.name, block.input);
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: JSON.stringify(result),
            is_error: result.error ? true : false,
          });
        }
      }
      messages.push({ role: "user", content: toolResults });
      continue;
    }

    break;
  }
  throw new Error("Iteration budget exhausted");
}

That is the entire shape of a tool-using agent. Everything else is schema design, error handling, and guardrails.

Parallel tool calls

The model can emit multiple tool_use blocks in one response. If a user asks “look up order ORD-123 and also check whether [email protected] has a ticket open”, Claude will often return both calls at once. Execute them in parallel. Return every result in the next user turn, each tagged with the right tool_use_id.

const toolResults = await Promise.all(
  response.content
    .filter((b): b is Anthropic.ToolUseBlock => b.type === "tool_use")
    .map(async (block) => {
      const result = await execute(block.name, block.input);
      return {
        type: "tool_result" as const,
        tool_use_id: block.id,
        content: JSON.stringify(result),
        is_error: !!result.error,
      };
    }),
);

You must return a tool_result for every tool_use block in the same message. Miss one and the API rejects the next request. This is the most common mistake I see. If a call fails, still return a tool_result with is_error: true and a short message explaining the failure.

Error handling

Set is_error: true on the tool_result when something went wrong. Keep the content short and actionable. The model reads it and usually retries with different arguments.

{
  type: "tool_result",
  tool_use_id: block.id,
  content: "Order not found. The ID 'ORD-xyz' did not match any order. Check the format: valid IDs start with ORD- followed by 6 digits.",
  is_error: true,
}

Do not return stack traces. The model does not care and you are burning tokens on noise. Return the one sentence a junior engineer would need: what failed, why, what to try instead.

For transient failures (timeouts, 429s, upstream flaps) you have two choices. Retry inside the tool (hidden from the model) or surface the failure and let the model decide. I prefer hidden retries with an exponential backoff, and I only surface the error after three attempts fail. The model does not need to know your upstream is flaky.
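The hidden-retry wrapper is a few lines. A sketch, with attempt count and base delay as arbitrary defaults:

```typescript
// Retry a flaky tool up to `attempts` times with exponential backoff
// plus jitter, surfacing the error to the model only after all
// attempts fail.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // 250ms, 500ms, 1000ms, ... plus up to 100ms of jitter
        const delay = baseDelayMs * 2 ** i + Math.random() * 100;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Wrap the upstream call inside `execute`, not the whole loop, so a retry never re-runs a side effect that already succeeded.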

Disambiguation via schemas

Most “the model called the wrong tool” problems are schema problems, not prompting problems. If you have two tools that could both plausibly fit a request, the model will guess. Tighten the schemas until they cannot be confused.

Bad:

{ name: "send_message", input_schema: { properties: { message: { type: "string" } } } }
{ name: "create_note", input_schema: { properties: { text: { type: "string" } } } }

Better:

{
  name: "send_customer_email",
  description: "Send an email to the customer's registered address. Use only when the customer has asked for a written response.",
  input_schema: { type: "object", properties: { customer_id: { type: "string" }, subject: { type: "string" }, body: { type: "string" } }, required: ["customer_id", "subject", "body"] }
}
{
  name: "add_internal_note",
  description: "Add a note visible only to support agents. The customer will not see this. Use for triage context.",
  input_schema: { type: "object", properties: { ticket_id: { type: "string" }, note: { type: "string", maxLength: 1000 } }, required: ["ticket_id", "note"] }
}

Enums and required fields do most of the work. If your schema lets the model guess a string, it will guess wrong eventually.

Streaming tool calls

Tool arguments can be long. If you are generating a 2KB JSON blob as a tool input, streaming lets you show progress or start parsing early. Claude emits input_json_delta events inside content_block_delta events while the tool call is being generated.

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  tools: TOOLS,
  messages,
});

let partialJson = "";
for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "input_json_delta") {
    partialJson += event.delta.partial_json;
    // parse partial_json with a tolerant parser if you need live progress
  }
}
const finalMessage = await stream.finalMessage();

Streaming is worth it when tool arguments are large or when you want to surface progress to a user. For most tool calls (short arguments, fast model), plain request/response is simpler and just as fast.

Interactions that matter

Tool use composes with everything else Claude offers. These three combinations come up constantly.

Tool use and prompt caching. Tool definitions are often the largest static chunk in your system. Cache them. Add cache_control: { type: "ephemeral" } to the last tool in the array (or to your system prompt if tools are small) and you pay 10% of input cost on cache hits. For an agent that runs in a loop with the same tool definitions across a hundred iterations, this matters a lot. See Claude prompt caching for the full breakdown.
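Marking the cache breakpoint is a one-line transform on the tool array. A sketch; check the prompt-caching docs for the exact `cache_control` placement rules in your SDK version:

```typescript
// Add a cache breakpoint to the last tool so the whole tool array is
// served from cache on subsequent loop iterations.
function withToolCaching<T extends Record<string, unknown>>(tools: T[]) {
  if (tools.length === 0) return tools;
  const last = {
    ...tools[tools.length - 1],
    cache_control: { type: "ephemeral" as const },
  };
  return [...tools.slice(0, -1), last];
}
```

Call it once when you build the request, e.g. `tools: withToolCaching(TOOLS)`.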

Tool use and extended thinking. With extended thinking enabled, the model plans in its thinking block before deciding which tool to call. You get visibly better tool selection on multi-step problems: the model reasons “I need to first look up the customer, then check recent tickets, then decide whether to escalate” inside the thinking block, and then dispatches the right tool. Enable thinking for agents that need to chain three or more tools to reach an answer.
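Enabling thinking is a request-params change. A sketch; the `thinking` field names follow the Messages API at time of writing, and the budget split here is an arbitrary example, so verify against your SDK version:

```typescript
// Request params with extended thinking plus tools. budget_tokens caps
// the thinking block and must be smaller than max_tokens, so leave
// headroom for the visible answer after thinking.
function buildThinkingRequest(
  tools: unknown[],
  messages: unknown[],
  thinkingBudget = 8000,
) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: thinkingBudget + 4096, // room for the answer itself
    thinking: { type: "enabled" as const, budget_tokens: thinkingBudget },
    tools,
    messages,
  };
}
```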

Tool use and MCP. MCP (Model Context Protocol) is a standard for defining tools once and plugging them into any client. When Claude calls an MCP tool, you see a normal tool_use block. The difference is that the MCP server, not your application code, handles execution. I run my own TickTick MCP server for personal task management, and the client side code treats those tools identically to hand-defined ones. If you are shipping a tool surface you want other Claude clients to reuse, write it as an MCP server. See Build an MCP server in TypeScript.

Common failure modes and fixes

Every tool-using agent fails in roughly the same ways. Here are the ones I hit most often and what fixes them.

Model calls the tool with the wrong shape. The argument is missing a required field, or a string is passed where an array is expected. Fix: tighten the schema. Add required, add enum, add maxLength. If the schema cannot be looser than the truth, the model cannot violate it.

Model refuses to call a tool you want it to call. It keeps answering in free text. Fix: the description is too restrictive, or the model does not think the user’s request matches. Weaken the “do not use when…” clauses in the description. If the model should always act, switch to tool_choice: { type: "any" } and drop the chat-only branch.

Model hallucinates a tool name. It invents search_customers when you only defined lookup_customer_by_email. Fix: use tool_choice with a specific tool when you want a specific call. Or rename your tool to the phrase the model keeps reaching for. The model is telling you what it expected to find.

Infinite loop. The agent calls the same tool with the same arguments forever. Fix: always cap iterations. I default to 10, sometimes 20. Also detect exact repeats: if the last two assistant turns are byte-identical, stop and return the last result. Something upstream is wrong and more iterations will not fix it.
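The repeat check can be as simple as comparing serialized tool calls between consecutive assistant turns. A sketch with a minimal local type instead of the SDK's:

```typescript
// True when the model issued exactly the same tool calls, with the
// same arguments, as on the previous iteration: a stuck loop.
type ToolCall = { name: string; input: unknown };

function isExactRepeat(
  prev: ToolCall[] | null,
  current: ToolCall[],
): boolean {
  if (!prev || prev.length !== current.length) return false;
  return prev.every(
    (call, i) =>
      call.name === current[i].name &&
      JSON.stringify(call.input) === JSON.stringify(current[i].input),
  );
}
```

In the loop, keep the previous iteration's tool calls in a variable and break out when `isExactRepeat` fires.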

Schema mismatch silently succeeds. Tool call returns data in a shape the model did not expect, model produces nonsense. Fix: validate tool results against a response schema before returning them. Do not trust your own upstream.
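A result check can be a type guard you run before serializing the tool result. A hand-rolled sketch against a hypothetical order shape; in practice a schema library such as zod does this with less boilerplate:

```typescript
// The shape lookup_order promises the model. Illustrative fields only.
type OrderResult = { order_id: string; status: string; customer_id: string };

// Narrowing type guard: reject anything upstream that drifts from
// the promised shape before the model ever sees it.
function isOrderResult(value: unknown): value is OrderResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.order_id === "string" &&
    typeof v.status === "string" &&
    typeof v.customer_id === "string"
  );
}
```

If the guard fails, return a `tool_result` with `is_error: true` instead of the malformed data.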

A realistic example: customer support triage

Here is the full loop for a support agent with three tools. It looks up an order, creates a ticket if needed, and emails the customer. It demonstrates parallel calls, error handling, and iteration budget.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const MAX_ITERATIONS = 8;

const TOOLS: Anthropic.Tool[] = [
  {
    name: "lookup_order",
    description:
      "Look up an order by its order ID. Use this when the user " +
      "references an order ID (format: ORD- followed by 6 digits). " +
      "Returns order status, shipping info, and customer ID.",
    input_schema: {
      type: "object",
      properties: {
        order_id: { type: "string", pattern: "^ORD-[0-9]{6}$" },
      },
      required: ["order_id"],
    },
  },
  {
    name: "create_ticket",
    description:
      "Create a customer support ticket. Call this after you have " +
      "identified the customer and understood the issue. Do not call " +
      "for general questions that can be answered directly.",
    input_schema: {
      type: "object",
      properties: {
        customer_id: { type: "string" },
        category: {
          type: "string",
          enum: ["shipping", "refund", "damaged", "wrong_item", "other"],
        },
        priority: {
          type: "string",
          enum: ["low", "normal", "high", "urgent"],
        },
        summary: { type: "string", maxLength: 280 },
      },
      required: ["customer_id", "category", "priority", "summary"],
    },
  },
  {
    name: "email_customer",
    description:
      "Send an email to the customer. Use once the ticket is created " +
      "and you have a concrete next step to communicate. Keep the body " +
      "short, courteous, and free of internal jargon.",
    input_schema: {
      type: "object",
      properties: {
        customer_id: { type: "string" },
        subject: { type: "string", maxLength: 100 },
        body: { type: "string", maxLength: 2000 },
      },
      required: ["customer_id", "subject", "body"],
    },
  },
];

async function execute(name: string, input: unknown): Promise<{ ok: boolean; data?: unknown; error?: string }> {
  try {
    switch (name) {
      case "lookup_order":
        return { ok: true, data: await lookupOrder((input as { order_id: string }).order_id) };
      case "create_ticket":
        return { ok: true, data: await createTicket(input as CreateTicketInput) };
      case "email_customer":
        return { ok: true, data: await emailCustomer(input as EmailCustomerInput) };
      default:
        return { ok: false, error: `Unknown tool: ${name}` };
    }
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

export async function runSupportAgent(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 4096,
      system:
        "You are a customer support triage agent. When a customer " +
        "reports an issue, look up the relevant order, create a ticket, " +
        "and send a brief confirmation email. Keep tone direct and " +
        "helpful. If you cannot find the order, ask the customer for " +
        "clarification rather than guessing.",
      tools: TOOLS,
      messages,
    });

    messages.push({ role: "assistant", content: response.content });

    if (response.stop_reason === "end_turn") {
      return response;
    }

    if (response.stop_reason === "tool_use") {
      const toolBlocks = response.content.filter(
        (b): b is Anthropic.ToolUseBlock => b.type === "tool_use",
      );
      const toolResults = await Promise.all(
        toolBlocks.map(async (block) => {
          const result = await execute(block.name, block.input);
          return {
            type: "tool_result" as const,
            tool_use_id: block.id,
            content: result.ok
              ? JSON.stringify(result.data)
              : `Error: ${result.error}`,
            is_error: !result.ok,
          };
        }),
      );
      messages.push({ role: "user", content: toolResults });
      continue;
    }

    break;
  }

  throw new Error("Support agent exhausted iteration budget");
}

That is a complete agent. Parallel execution via Promise.all, error handling via is_error, iteration cap, tight schemas. Drop it into an HTTP handler, a queue worker, or a Telegram bot and you have something a team can actually use.

For more on when to build this kind of agent versus buying a vendor offering, see AI agents: build vs buy. For when to pick Claude over OpenAI for this pattern, Claude API vs OpenAI for business automation.

Production guardrails

Everything above gets you a working agent. These are the things that keep it alive.

Iteration budget. Always cap iterations. 10 is a reasonable default. Log the iteration number on every loop. If you consistently hit the cap, your tools are wrong, not the model.

Cost cap per invocation. Track input plus output tokens per run. Set a hard limit. Return an error when exceeded rather than letting a runaway loop bill you at Opus prices.
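A token-denominated cap is the simplest version of this. A sketch; the `usage` field names match the Messages API response object, while the cap value is an arbitrary example:

```typescript
// Hard token cap per agent run. Call record() with response.usage
// after every API call inside the loop.
class CostGuard {
  private total = 0;
  constructor(private readonly maxTokens: number) {}

  record(usage: { input_tokens: number; output_tokens: number }) {
    this.total += usage.input_tokens + usage.output_tokens;
    if (this.total > this.maxTokens) {
      throw new Error(
        `Token budget exceeded: ${this.total} > ${this.maxTokens}`,
      );
    }
  }
}
```

Multiply by your model's per-token price if you want the cap in currency instead of tokens.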

PII redaction in tool_results. If a tool returns a raw email, phone number, or payment detail, the model sees it and may echo it in its final answer. Redact on the way in (mask customer data in the tool result) or filter on the way out (strip regex matches from the final text).

Audit logging. Every tool call gets a row in a log: timestamp, user ID, tool name, input, output, latency, error. If something goes wrong in production you need the full call trace. Model response IDs help correlate logs to Anthropic-side usage data.

Rate limits per user. The model will happily call email_customer ten times in ten seconds if the loop goes wrong. Put a rate limit in front of any side-effecting tool.
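A sliding-window limiter in front of the dispatch switch is enough for a single process. An in-memory sketch; use Redis or similar once you run more than one worker:

```typescript
// Sliding-window rate limiter keyed by user (or user+tool).
// allow() returns false when the key has hit `max` calls inside
// the last `windowMs` milliseconds.
class RateLimiter {
  private hits = new Map<string, number[]>();
  constructor(
    private readonly max: number,
    private readonly windowMs: number,
  ) {}

  allow(key: string, now = Date.now()): boolean {
    const recent = (this.hits.get(key) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    if (recent.length >= this.max) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

When `allow` returns false, return a `tool_result` with `is_error: true` saying the action is rate-limited; the model will usually stop retrying.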

Human-in-the-loop for high-stakes actions. Refund, delete, send-to-production: do not let the model execute these directly. Have the tool queue the action for human approval and return “queued, awaiting approval”. The model’s plan stays clear, the risk stays bounded.

Testing tool-calling agents

Tool-calling agents need real tests, not just eyeballing outputs.

Mock the tools. Do not hit real APIs in unit tests. Mock execute to return canned results. Assert the model called the right tools in the right order with the right arguments.

Assert call shape, not just final text. The final assistant message can be correct for wrong reasons. Check that the model called lookup_order before create_ticket. Check that create_ticket received the customer ID from the lookup result.
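The mock can be a thin recorder around the same `execute` signature the agent already uses. A sketch; the canned results and tool names below are test fixtures, not part of the post's agent:

```typescript
// Wrap execute with a recorder: tests swap this in for the real
// dispatcher, then assert on the call log's order and arguments.
type ToolCallRecord = { name: string; input: unknown };

function recordingExecute(
  canned: Record<string, unknown>,
  log: ToolCallRecord[],
) {
  return async (name: string, input: unknown) => {
    log.push({ name, input });
    return canned[name] ?? { ok: false, error: `no canned result for ${name}` };
  };
}
```

After a test run, assert things like `log[0].name === "lookup_order"` and that `log[1].input` carries the customer ID from the first result.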

Fuzz the schemas. Send oddly shaped user inputs. Emoji. Wrong order ID formats. Mixed languages. Make sure the model either calls the right tool or asks for clarification, and never crashes your execute function with an unexpected input shape.

Replay recorded conversations. Capture real user conversations (with consent), scrub PII, and replay them against new prompt or tool changes. This is the single most valuable regression test I have for agents.

For agents built on top of Claude Code itself, see Claude Code SDK agents for a pattern that handles multi-turn, multi-tool flows out of the box.

When tool use is the wrong pattern

Tool use is the default, but it is not always right.

For pure text generation (summaries, translations, rewrites), skip tools and return text. Tools add latency and tokens.

For structured output where you already know the schema, force a single tool (tool_choice: { type: "tool", name: "..." }) rather than running a multi-turn agent. One call in, one call out.

For anything where a static prompt plus a template would work, use the template. Do not build an agent because the word “agent” is fashionable.
