Migrate OpenAI to Claude: API Migration Guide for 2026

April 4, 2026 · 13 min read · claude-api, openai, migration, llm-engineering
Migrate OpenAI to Claude: API Migration Guide for 2026

Most teams I talk to arrive at the same moment: the OpenAI bill crosses $500/month, an agent loop that worked on GPT-4o starts fumbling tool calls, or legal raises an eyebrow about single-provider risk. Then the question lands in my inbox: what does it actually take to migrate OpenAI to Claude?

Short answer: a weekend if you have one endpoint, two weeks if you have a real product. The SDKs are similar enough that the ported code looks boring. The interesting work is in the prompts, the tool use loop, and the parts of your codebase that silently depend on OpenAI-specific behavior like seed, logprobs, or the response_format JSON schema flag.

This guide walks the full openai to claude migration. API shape deltas, feature map, model selection, side-by-side code, prompt adjustments, cost math, and the gotchas I have hit shipping both providers into production. My verdict: if you are paying more than $500/month to OpenAI and your workload leans on reasoning or tool use, you will come out ahead on Claude. If you are running cheap extraction at tight latency, stay where you are until the ecosystem tooling around Haiku 4.5 matures.

Why teams migrate

Five reasons come up over and over.

Cost. Sonnet 4.6 runs $3 input / $15 output per million tokens. GPT-4o is $2.50 / $10. On paper OpenAI is cheaper. In practice Claude’s prompt caching (90% read discount) flips that math the moment you have a system prompt above a few thousand tokens, and tool-heavy workflows spend most tokens on inputs, not outputs.

Reasoning quality on long tool loops. Claude degrades more gracefully across 20+ tool calls. I have watched GPT-4o forget what the user asked after step 12. Claude narrates its way back to the original goal.

Tool use reliability. Claude’s tool use returns a structured tool_use content block with the arguments already parsed. No stringified JSON to re-parse, no “sometimes the model forgets to close a brace” retries.

Contract compliance. OpenAI’s ToS and data handling rules have shifted more than once in the last two years. Teams on enterprise contracts want a second provider in the loop before the next change.

Multi-provider risk reduction. Single-vendor dependency on anything, including an LLM, is a production risk. Having a working Claude path means you can move traffic in an afternoon when you need to.

The API shape deltas

Install the SDK.

npm install @anthropic-ai/sdk      # TypeScript
pip install anthropic              # Python

Auth uses ANTHROPIC_API_KEY instead of OPENAI_API_KEY. Same env var pattern, different name.

The endpoint mapping is the single most important thing to internalize:

client.chat.completions.create()  >  client.messages.create()

From there, the request shape differs in five places.

System prompts. OpenAI encodes them as {role: "system", content: "..."} inside the messages array. Claude takes system as a top-level parameter. If your codebase has a buildMessages() helper that prepends a system message, rip that logic out and pass system separately.

No n parameter. Claude returns one completion per call. If you were using n: 3 for diversity, you now loop the call three times or use a higher temperature with a single call.

max_tokens is required. OpenAI lets you omit it and defaults to a model-specific cap. Claude makes you declare the output budget upfront. Pick a number that fits your use case (2000 for chat, 8000 for long generation, up to 64000 for Sonnet 4.6 with the right header).

Response shape. OpenAI returns choices[0].message.content as a single string. Claude returns content as an array of content blocks. Each block has a type of text, tool_use, or thinking. Even for a plain chat call you index into the array:

const response = await client.messages.create({...});
const text = response.content[0].type === "text"
  ? response.content[0].text
  : "";

Streaming events. OpenAI streams delta tokens. Claude streams typed events: message_start, content_block_start, content_block_delta, content_block_stop, message_stop. If you have a custom streaming UI, the event handler gets rewritten. Worth it for the cleaner semantics.

Feature map

OpenAI featureClaude equivalentNotes
response_format: json_objectTool use patternDefine a tool with the target schema, force it with tool_choice. See the structured output guide.
response_format: json_schema (strict)Tool use with input_schemaClaude enforces the schema at the tool level.
Function callingTool useSame concept, the field is input_schema instead of parameters.
tools arraytools arrayNear-identical structure.
temperature, top_p, max_tokensSame namesmax_tokens is now required.
seedNot supportedTests that assert on exact output need mocking.
logprobsNot supportedNo token-level probabilities.
response_format: audio / imageNot supportedText and vision input only.
Vision inputSupportedDifferent content block shape, covered below.
Prompt cachingNative cache_control blocksExplicit cache points, 90% read discount. See prompt caching.
reasoning_effort (o1, o3)Extended thinkingthinking: {type: "enabled", budget_tokens: 10000}.
Assistants API / Agents SDKClaude Code SDKDifferent agent model, maps cleanly for most workflows.
Files API (retrieval)Files API + CitationsComparable, different shape.

The single biggest behavioral shift is structured output. OpenAI trained you to set response_format and get JSON. On Claude you define a tool, force it with tool_choice: {type: "tool", name: "extract"}, and read the arguments out of the tool_use block. This is covered in depth in the tool use guide and the structured output post. Once you get it, it becomes the default pattern even when you are not calling real tools.

Model selection

Do not map models by name. Map by job.

JobOpenAIClaude
Default balanced workhorsegpt-4oclaude-sonnet-4-6
Cost-sensitive extractiongpt-4o-miniclaude-haiku-4-5-20251001
Heavy reasoning, planningo1, o3claude-opus-4-7 with extended thinking
Legacy budget tiergpt-3.5-turboclaude-haiku-4-5-20251001

Sonnet 4.6 is where most of your traffic should land. Haiku 4.5 is fast and cheap enough to replace mini-tier work, with the caveat that ecosystem tooling (vector DB integrations, observability defaults) still ships with GPT-4o-mini as the example. You will write a few more lines of config.

Opus 4.7 with extended thinking replaces the o-series for anything that benefits from visible reasoning. The key difference: OpenAI hides the reasoning chain, Claude returns it as a thinking content block you can log, audit, or strip before showing the user.

For the full pricing breakdown, see the LLM API cost comparison.

Code: before and after

Basic chat

Before (OpenAI):

import OpenAI from "openai";
const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize this in 20 words: ..." },
  ],
});

const text = response.choices[0].message.content;

After (Claude):

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: "You are a helpful assistant.",
  messages: [
    { role: "user", content: "Summarize this in 20 words: ..." },
  ],
});

const text = response.content[0].type === "text"
  ? response.content[0].text
  : "";

Three deltas: system is top-level, max_tokens is required, content is an array.

JSON output

Before (OpenAI):

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Extract the name and age from: ..." }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "person",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
        },
        required: ["name", "age"],
      },
      strict: true,
    },
  },
});

const parsed = JSON.parse(response.choices[0].message.content);

After (Claude):

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools: [{
    name: "extract_person",
    description: "Extract person details from the input",
    input_schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        age: { type: "number" },
      },
      required: ["name", "age"],
    },
  }],
  tool_choice: { type: "tool", name: "extract_person" },
  messages: [{ role: "user", content: "Extract the name and age from: ..." }],
});

const toolUse = response.content.find(b => b.type === "tool_use");
const parsed = toolUse?.input;  // already an object, no JSON.parse

The Claude version feels heavier because you write a tool, but you also get a parsed object back instead of a string you have to validate. I run this pattern in every production extraction pipeline I ship.

Function calling (full loop)

Before (OpenAI):

const messages = [{ role: "user", content: "Weather in Madrid?" }];

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages,
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      parameters: { type: "object", properties: { city: { type: "string" } } },
    },
  }],
});

const call = response.choices[0].message.tool_calls?.[0];
if (call) {
  const args = JSON.parse(call.function.arguments);
  const result = await getWeather(args.city);
  messages.push(response.choices[0].message);
  messages.push({ role: "tool", tool_call_id: call.id, content: result });
  // call again with the result
}

After (Claude):

const messages = [{ role: "user", content: "Weather in Madrid?" }];

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools: [{
    name: "get_weather",
    input_schema: { type: "object", properties: { city: { type: "string" } } },
  }],
  messages,
});

const toolUse = response.content.find(b => b.type === "tool_use");
if (toolUse) {
  const result = await getWeather(toolUse.input.city);
  messages.push({ role: "assistant", content: response.content });
  messages.push({
    role: "user",
    content: [{ type: "tool_result", tool_use_id: toolUse.id, content: result }],
  });
  // call again with the result
}

Two things to notice. Arguments come pre-parsed as an object, no JSON.parse. And the tool result goes back as a user message with a tool_result content block, not a dedicated tool role.

Streaming

After (Claude, TypeScript):

const stream = await client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a haiku." }],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

Vision

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{
    role: "user",
    content: [
      { type: "image", source: { type: "base64", media_type: "image/png", data: b64 } },
      { type: "text", text: "What is in this image?" },
    ],
  }],
});

OpenAI puts images inside a user message as image_url content parts. Claude uses image with a source object. The porting is mechanical.

Prompt migration tips

Your OpenAI prompts will mostly work on Claude. Small changes get you 10-20% better outputs.

Drop “You are a helpful assistant.” Claude is already helpful. That sentence wastes context. Put your actual instructions in system instead: role, constraints, output format, edge cases.

Use XML tags for structured inputs. Claude was trained heavily on XML-delimited content. <context>...</context>, <examples>...</examples>, <user_input>...</user_input> work better than loose markdown. This is a measurable quality bump on long prompts.

Make reasoning explicit. Even without extended thinking, Claude responds well to “Before answering, think through X, Y, Z in <thinking> tags, then give your final answer in <answer> tags.” You strip the thinking block before showing the user.

Recalibrate temperature. Claude at temperature: 1.0 behaves roughly like OpenAI at 0.7 for creative tasks. Start at 1.0 and lower if output drifts.

Remove retry-on-parse hacks. If your code catches JSON.parse failures and retries, you no longer need that layer once you move to the tool-use pattern. Arguments arrive as objects.

Cost math

For a workload of 1M input tokens and 200K output tokens per day:

ModelInput $/MOutput $/MDaily cost
gpt-4o$2.50$10.00$4.50
claude-sonnet-4-6$3.00$15.00$6.00
claude-sonnet-4-6 (90% cached)$0.30$15.00$3.30
gpt-4o-mini$0.15$0.60$0.27
claude-haiku-4-5$0.80$4.00$1.60

Sonnet looks more expensive until you cache. If your system prompt and tool definitions are stable across calls (they usually are in production), prompt caching brings Sonnet below GPT-4o for input-heavy work. Haiku 4.5 is not a GPT-4o-mini replacement on price alone. It wins when tool use reliability matters.

Pull the full cost comparison for the rest of the models.

Migration strategies

Four patterns. Pick one. Do not mix.

Big-bang. Change the SDK, change the endpoint, redeploy. Works for small apps with one LLM call, a staging environment, and a rollback plan. I have done this for internal tools in a 2-hour afternoon.

Shadow traffic. Route 1% of production calls to Claude in parallel with OpenAI. Compare outputs offline. Ramp to 10%, 50%, 100% over a week. Mandatory for anything user-facing or revenue-bearing. The observability cost is a week, the de-risking is worth it.

Feature by feature. Migrate the lowest-risk endpoint first. Summarization is usually a gift. Complex multi-tool agents last. This is what I recommend for most mid-size products.

Hybrid long-term. Keep both providers permanently. Claude for reasoning and tool use, OpenAI for cheap extraction and embeddings. The engineering cost is an abstraction layer. The payoff is a hedge against pricing, outage, or ToS changes.

A realistic 2-week plan for a mid-size app:

  • Day 1-2: audit every OpenAI call site. List features in use (tools, response_format, seed, streaming, vision). Flag anything unsupported.
  • Day 3-4: build an abstraction layer with a generateText() and generateStructured() shape that both SDKs implement. Keep OpenAI as the default.
  • Day 5-6: port the first endpoint. Run eval pairs (same input, both providers). Tune prompts.
  • Day 7-8: port the next three endpoints. Set up cost monitoring for Claude.
  • Day 9-10: shadow traffic 10% of production. Compare outputs, error rates, latency.
  • Day 11-12: ramp to 50%. Fix the 2-3 edge cases you will discover.
  • Day 13-14: ramp to 100%. Keep OpenAI as a feature-flag fallback for one more sprint.

The gotchas list

No seed parameter. Tests that asserted on exact Claude output need mocking or golden-snapshot tolerance.

Tool use narration. Claude often writes a sentence of text before calling a tool (“I will look that up for you.”). OpenAI jumps straight to function_call. If your UI hides pre-tool text, it will need to keep doing that, just on a different content block structure.

{{system}} template variables. Prompt templates that inject a system message into the messages array need to route to the top-level system parameter instead. Check every template.

Rate limits work differently. Claude tracks input tokens per minute, output tokens per minute, and requests per minute as separate limits. OpenAI combines more of them. A workload that fits under OpenAI’s TPM might hit Claude’s input limit first, or vice versa. Check tier headroom before cutover.

Error shapes. Claude errors come back with error.type values like invalid_request_error, rate_limit_error, overloaded_error. OpenAI’s are structured differently. Error handlers that switch on error codes need updating.

No logprobs. If you were using token probabilities for confidence scoring or filtering, there is no drop-in replacement. You rebuild that feature with a self-evaluation call or a separate classifier.

Thinking token budget. Extended thinking consumes output tokens from max_tokens. Set max_tokens high enough to cover thinking plus the answer, or the response cuts off mid-reason.

Image format. OpenAI accepts image_url for both URLs and base64. Claude splits into source.type: "url" and source.type: "base64". Your upload pipeline picks the right one.

Multi-turn tool use. When you pass the response’s content blocks back as an assistant message, pass the entire array, not just the tool_use block. Claude needs the full prior turn including any narrative text.

Should you migrate?

Migrate if:

  • You spend more than $500/month on OpenAI and have stable system prompts (caching will win you money back).
  • Your product uses tool calling or multi-step agent loops (Claude’s reliability shows up here).
  • You need extended thinking for planning or debugging workflows.
  • You need a second provider for compliance, contract negotiation, or outage resilience.

Stay on OpenAI if:

  • You are running cheap, high-volume extraction where every tenth of a cent matters and you are already using GPT-4o-mini under its TPM limit.
  • You depend on seed, logprobs, or audio/image output.
  • Your team has zero bandwidth for a prompt retuning pass, and the current outputs are good enough.

For the deeper provider comparison, see Claude API vs OpenAI for business automation.

Download the AI Automation Checklist (PDF)

Checkliste herunterladen Download the checklist

Kostenloses 2-seitiges PDF. Kein Spam. Free 2-page PDF. No spam.

Kein Newsletter. Keine Weitergabe. Nur die Checkliste. No newsletter. No sharing. Just the checklist.

Ihre Checkliste ist bereit Your checklist is ready

Klicken Sie unten zum Herunterladen. Click below to download.

PDF herunterladen Download PDF Ergebnisse gemeinsam durchgehen? → Walk through your results together? →