GitHub Issue Management AI: Build Claude-Powered Triage That Works

April 9, 2026 · 12 min read · github, claude, ai-agents, automation, open-source

Maintainers do not ship software on Tuesday mornings. They triage. They read a new issue, check whether it is a duplicate of something filed three weeks ago, decide whether it is a bug or a question, pick a priority, add two or three labels, and sometimes write a polite comment asking for a repro. Then they do it again for the next issue in the queue. The job is pure admin, and on any active repo it eats real hours every week.

This walkthrough shows how to replace most of that work with a Claude agent that reads the issue, pulls in repo context, and uses tool use to return a strict triage payload your code can act on. You will leave with a working mental model, a tool schema you can paste into your service, and an opinion on whether to build it or skip straight to a hosted solution.


The problem maintainers know too well

Issue triage looks cheap in isolation. One issue, two minutes. Ten issues a day, that is twenty minutes. On a popular OSS repo or a busy internal monorepo, the number is closer to sixty or ninety minutes of fragmented attention. Fragmented, because every issue pulls you out of whatever you were building.

The work itself is pattern matching. Is this a bug or a feature request. Is it a duplicate. Which component is affected. Is it p1 or p3. Do we need a repro. These are exactly the classification tasks that language models are good at, and the structured output patterns from Claude API tool use make the output reliable enough to act on without a human in the middle for every step.

What AI changes in issue triage

A well-designed Claude agent reads the issue body, scans the comments, pulls the last 50 issues for context, checks the linked PRs, and considers the repo README plus existing label taxonomy. In one call it returns a category, a priority, suggested labels, a duplicate candidate, a draft comment, and a confidence score. Your code decides what to do with that payload based on thresholds.

The difference from classic ML label classifiers: you are not training anything. You are writing a prompt, pointing it at a tool schema, and letting the model reason over free-form context. The prompt is the program. When you want to change behaviour, you edit English, not a retraining pipeline.

This is the same agent pattern covered in detail in the production AI agent architecture guide. Issue triage is the cleanest first agent to ship because the inputs are bounded, the outputs are structured, and the cost of a wrong call is low if you gate on confidence.

Three levels of automation to choose from

Pick the one that matches your risk appetite.

Level 1: Suggest-only. Claude drafts a triage comment on every new issue. The maintainer reads, edits, applies labels manually. You get context pre-chewed without losing control. This is where everyone should start.

Level 2: Auto-label, manual close. Claude applies labels above a confidence threshold (0.8 works as a default). Duplicate closures stay human because closing someone’s issue incorrectly is a bad look. Draft comments get posted as suggestions.

Level 3: Fully automated. Labels, links to duplicates, and closures all run on confidence thresholds. Reserved for internal repos where blast radius is small and you have observability in place.

Most teams live at Level 2 in production. I would recommend staying there for at least a month before even considering Level 3.

The minimal stack

Nothing fancy. The pieces:

  • GitHub webhook on issues.opened and issues.reopened, received by a GitHub App
  • A small Node or Python service, or an n8n workflow if you prefer low-code
  • Claude API with tool use for structured output
  • GitHub REST API for applying labels, posting comments, linking duplicates

GitHub App over PAT every time. You get repo-scoped installation tokens, no user impersonation, and a cleaner permission audit. Webhook signature verification comes free with the App's webhook secret.

Walkthrough: context → Claude → action

The flow in five steps.

Step 1: GitHub App and webhook

Register a GitHub App, subscribe to Issues events, grant Issues: read & write and Contents: read. Install it on the repo. The App sends a signed POST to your receiver on every matching event. Verify the X-Hub-Signature-256 header before doing anything else.
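Verification is a constant-time HMAC comparison over the raw request body. A minimal sketch with Node's built-in crypto module; the handler framing (Express, Hono, etc.) is up to you:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify GitHub's HMAC-SHA256 webhook signature against the raw request body.
// `secret` is the webhook secret you set when registering the App.
function verifySignature(
  secret: string,
  rawBody: string,
  signatureHeader: string,
): boolean {
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Compute the HMAC over the raw bytes, not the parsed JSON: re-serialising the body can change whitespace and break the comparison.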

Step 2: context collection

Before you call Claude, you pull:

  • The issue body and title
  • The comments on the issue (cap at the last 20 to stay cheap)
  • The last 50 issues on the repo (titles only, plus numbers and state)
  • The repo README, CONTRIBUTING.md, and the full list of existing labels

This context block is what makes the triage call intelligent. Without it, Claude is guessing at your label taxonomy. With it, the model learns your conventions in-context.
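A sketch of the collector, assuming an @octokit/rest-style client authenticated with the App's installation token. The `GitHubClient` type below is just a structural stand-in for the three calls used, so the sketch stays self-contained:

```typescript
// Structural stand-in for the @octokit/rest calls this sketch uses.
type GitHubClient = {
  rest: {
    repos: { getReadme(p: object): Promise<{ data: string }> };
    issues: {
      listLabelsForRepo(p: object): Promise<{ data: { name: string }[] }>;
      listForRepo(
        p: object,
      ): Promise<{ data: { number: number; state: string; title: string }[] }>;
    };
  };
};

// Pull the repo-level context that goes into the cached system block.
async function collectRepoContext(
  gh: GitHubClient,
  owner: string,
  repo: string,
): Promise<string> {
  const [readme, labels, recent] = await Promise.all([
    // format: "raw" makes getReadme return the file body as a string
    gh.rest.repos.getReadme({ owner, repo, mediaType: { format: "raw" } }),
    gh.rest.issues.listLabelsForRepo({ owner, repo, per_page: 100 }),
    gh.rest.issues.listForRepo({
      owner, repo, state: "all", sort: "created", direction: "desc", per_page: 50,
    }),
  ]);
  return [
    `README:\n${readme.data}`,
    `Existing labels: ${labels.data.map((l) => l.name).join(", ")}`,
    `Last 50 issues:\n${recent.data
      .map((i) => `#${i.number} [${i.state}] ${i.title}`)
      .join("\n")}`,
  ].join("\n\n");
}
```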

Step 3: Claude call with tool use

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const triageTool = {
  name: "triage_issue",
  description: "Classify a GitHub issue and recommend triage actions.",
  input_schema: {
    type: "object" as const,
    properties: {
      category: {
        type: "string",
        enum: ["bug", "feature", "question", "duplicate"],
      },
      priority: {
        type: "string",
        enum: ["p0", "p1", "p2", "p3"],
      },
      labels_to_add: {
        type: "array",
        items: { type: "string" },
        description: "Labels from the repo's existing taxonomy only.",
      },
      is_duplicate_of: {
        type: "number",
        description: "Issue number of the duplicate target. Omit if not a duplicate.",
      },
      draft_comment: {
        type: "string",
        description: "A polite, edit-ready triage comment.",
      },
      confidence: {
        type: "number",
        description: "0.0 to 1.0 confidence in this classification.",
      },
    },
    required: ["category", "priority", "confidence"],
  },
};

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools: [triageTool],
  tool_choice: { type: "tool", name: "triage_issue" },
  system: [
    {
      type: "text",
      text:
        "You are a triage assistant for a GitHub repo. Use the tool to classify. " +
        "Be conservative on duplicates. Only use labels that exist in the provided taxonomy.",
    },
    {
      type: "text",
      text: repoContextBlock,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    {
      role: "user",
      content: `New issue #${issue.number}: ${issue.title}\n\n${issue.body}\n\nRecent comments:\n${commentsBlock}\n\nLast 50 issues:\n${recentIssuesBlock}`,
    },
  ],
});

Note the cache_control on the repo context block. The README, CONTRIBUTING, and label list do not change between issues. Cache it. On the second call in the same 5-minute window you pay roughly one-tenth the price for that chunk. See the Claude API prompt caching post for the full caching rules.

Step 4: act on the structured response

Pull the tool input out of the response. Act on it with thresholds.

const toolUse = response.content.find(
  (b): b is Anthropic.ToolUseBlock => b.type === "tool_use",
);
if (!toolUse) throw new Error("No tool_use block in the response");
const triage = toolUse.input as TriageOutput;

if (triage.confidence >= 0.8) {
  await octokit.rest.issues.addLabels({
    owner, repo,
    issue_number: issue.number,
    labels: triage.labels_to_add,
  });
}

if (triage.is_duplicate_of && triage.confidence >= 0.9) {
  await octokit.rest.issues.createComment({
    owner, repo,
    issue_number: issue.number,
    body: `This looks like a duplicate of #${triage.is_duplicate_of}. Closing, reopen if I got this wrong.`,
  });
  // Actually close it; "not_planned" is GitHub's state reason for duplicates.
  await octokit.rest.issues.update({
    owner, repo,
    issue_number: issue.number,
    state: "closed",
    state_reason: "not_planned",
  });
}

if (triage.draft_comment) {
  await octokit.rest.issues.createComment({
    owner, repo,
    issue_number: issue.number,
    body: triage.draft_comment,
  });
}
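The TriageOutput type the snippet casts to is nothing more than the tool schema written as a TypeScript interface, for example:

```typescript
// Mirrors the triage_issue tool schema. Optional fields correspond to the
// schema properties that are not in `required`.
interface TriageOutput {
  category: "bug" | "feature" | "question" | "duplicate";
  priority: "p0" | "p1" | "p2" | "p3";
  labels_to_add?: string[];
  is_duplicate_of?: number;
  draft_comment?: string;
  confidence: number; // 0.0 to 1.0
}
```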

Step 5: edge cases

Private repos need the App installed with read scope on contents. GitHub API rate limits sit at 5,000 requests per hour per installation, which is plenty for triage volume. Claude rate limits depend on your tier. Large issues with long threads: paginate the comments endpoint and truncate the oldest comments before they blow your token budget.
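Truncation can be as simple as keeping the newest comments and capping total characters; roughly four characters per token is a workable heuristic. A sketch:

```typescript
// Keep only the most recent comments and cap total size so long threads
// don't blow the token budget. ~4 chars per token is a rough heuristic.
function truncateComments(
  comments: { user: string; body: string }[],
  maxComments = 20,
  maxChars = 20_000,
): string {
  const recent = comments.slice(-maxComments);
  let block = recent.map((c) => `@${c.user}: ${c.body}`).join("\n---\n");
  if (block.length > maxChars) {
    // Drop the oldest text first; the tail of the thread matters most.
    block = "[earlier comments truncated]\n" + block.slice(-maxChars);
  }
  return block;
}
```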

The triage_issue tool schema

The schema above in one block, ready to paste.

{
  "name": "triage_issue",
  "description": "Classify a GitHub issue and recommend triage actions.",
  "input_schema": {
    "type": "object",
    "properties": {
      "category": {"type": "string", "enum": ["bug", "feature", "question", "duplicate"]},
      "priority": {"type": "string", "enum": ["p0", "p1", "p2", "p3"]},
      "labels_to_add": {"type": "array", "items": {"type": "string"}},
      "is_duplicate_of": {"type": "number", "description": "Issue number if duplicate, else null"},
      "draft_comment": {"type": "string"},
      "confidence": {"type": "number", "description": "0.0 to 1.0"}
    },
    "required": ["category", "priority", "confidence"]
  }
}

Forcing tool use (tool_choice: {type: "tool", name: "triage_issue"}) removes ambiguity. The model cannot drift into prose. You either get a valid tool call or an API error, which is much easier to handle than parsing free text.

Duplicate detection strategies

Three options, trading cost against recall.

Claude-based. Pass the titles of the last 50 to 100 issues into the prompt and let the model spot candidates. Simple, no extra infrastructure, works well up to a few hundred historical issues. Starts to miss older dupes past that.

Embedding-based. Embed every issue when it is created, store vectors in a DB, and run a similarity search on each new issue. Top 3 results go to Claude for confirmation. Scales to tens of thousands of issues. Needs a vector store and an embedding model.

Hybrid. Embeddings narrow to 20 candidates, Claude confirms which (if any) is a true duplicate. Best recall, still cheap to run. This is what I would ship on a repo with over a thousand lifetime issues.
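The embedding half of the hybrid needs only a similarity ranking; the embedding call itself is whatever provider you choose and is omitted here. A sketch of the narrowing step over stored vectors:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Narrow stored issues to the top-k duplicate candidates.
// Claude confirms (or rejects) the final call on this short list.
function topCandidates(
  newVec: number[],
  stored: { number: number; vec: number[] }[],
  k = 20,
): { number: number; score: number }[] {
  return stored
    .map((s) => ({ number: s.number, score: cosine(newVec, s.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In production you would run the ranking inside a vector store rather than in application code; the logic is the same.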

Auto-labeling without false positives

Threshold work is where triage bots earn their keep or become annoying.

Start at confidence 0.8 for auto-apply, 0.6 for suggest-only. Run in Level 1 suggest-only mode for the first week, log every decision the model made, then eyeball the outputs. Tune the system prompt against the errors you see. Common fixes: “never apply a label not in the provided list”, “treat bug reports without a reproduction as needs-repro not bug”, “questions go to discussion, not question, if the repo has discussion”.

Do not attempt to dial in the thresholds before you have 50 real outputs to look at. You will just be fitting to noise.

Cost per issue

With Sonnet 4.6 pricing and a realistic context payload:

  • Input: around 5,000 tokens (repo context + issue + recent issues)
  • Output: around 500 tokens (the tool call)
  • Uncached: roughly $0.02 per issue
  • With prompt caching on the repo context block: roughly $0.005 per issue on warm calls

Run the math on your volume. A repo with 500 new issues per month lands at $2.50 to $10 in API cost. Compare to 30 seconds to 2 minutes of maintainer time per issue at whatever your loaded hourly rate is. The ratio is not close.

Production hardening

The list of mistakes I would rather you skip.

  • One bot comment per issue, maximum. Track whether the bot has already commented and short-circuit. Nothing kills trust in a bot faster than double-commenting.
  • Respect CODEOWNERS. If a file path pattern is owned by a team, route the priority assignment to them rather than guessing.
  • No auto-close in week one. Run in suggest mode. Gate closures behind manual review until you have a confidence calibration you trust.
  • Log every decision. Store the tool call input, the confidence, and the actions taken. You will want this when you tune thresholds and when someone asks why their issue got labelled needs-repro.
  • Rate limit yourself. A wave of 200 issues from a spam bot should not trigger 200 Claude calls. Deduplicate by author and add a per-repo ceiling.
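The one-comment rule from the list above is a single API check before posting. A sketch, where BOT_LOGIN is your App's bot account (the name here is hypothetical) and the client type is a structural stand-in for the one Octokit call used:

```typescript
const BOT_LOGIN = "triage-bot[bot]"; // your App's bot login; this name is a placeholder

// Structural stand-in for octokit.rest.issues.listComments.
type IssueCommentsClient = {
  rest: {
    issues: {
      listComments(
        p: object,
      ): Promise<{ data: { user: { login: string } | null }[] }>;
    };
  };
};

// Returns true if the bot has already commented, so callers can short-circuit
// before drafting or posting a second comment.
async function botAlreadyCommented(
  gh: IssueCommentsClient,
  owner: string,
  repo: string,
  issue_number: number,
): Promise<boolean> {
  const { data } = await gh.rest.issues.listComments({
    owner, repo, issue_number, per_page: 100,
  });
  return data.some((c) => c.user?.login === BOT_LOGIN);
}
```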

When to build vs use an off-the-shelf product

Build if you have three or more repos with consistent triage needs, ops capacity to run a service, and willingness to babysit prompt drift when you change your label taxonomy. Build if you want the agent to hook into internal systems (Slack, PagerDuty, your own CRM) that no off-the-shelf tool will integrate with.

Skip the build and use something hosted if you want this running this week, if you do not have time to tune thresholds over a month, or if you just want the triage output and the review loop without owning the infrastructure. FixClaw is the hosted version of exactly this architecture, productised, with the hardening already baked in. Same mental model, zero maintenance.

For a deeper breakdown of the architecture FixClaw ships, see the FixClaw AI GitHub triage guide.

Extensions worth building

Once the base triage agent is running, the obvious next moves:

  • Auto-assign reviewers for issues tagged with specific components, using a mapping from label to GitHub team.
  • Weekly triage digest posted to the maintainer Slack: new issues, stale issues, unlabeled issues, duplicates flagged but not acted on.
  • Stale issue handling with a respectful comment (“no activity for 90 days, closing, reopen any time”). Cheap wins for backlog hygiene.
  • Related PR linking on issue open: search for open PRs whose body mentions the issue keywords and link them.

Most of these are small extensions of the same tool-use pattern. Add a tool, add a system prompt instruction, ship.

If you want to expose the triage capability itself as a reusable server that Claude Code, Cursor, or any MCP client can call on demand, see the TypeScript MCP server guide. And if you are building the agent loop in Claude Code rather than as a standalone service, the Claude Code SDK agents post covers the harness side.

A weekend-build roadmap

A realistic path from zero to shipped.

Saturday. Register the GitHub App. Stand up a webhook receiver (Hono, Express, Fastify, whatever). Verify signatures. Write the first version of the tool schema. Hardcode a test issue payload and get a Claude call returning valid tool input.

Sunday. Wire the context collectors (README, CONTRIBUTING, recent issues, labels). Deploy the receiver somewhere with a public URL (Cloudflare Workers, Railway, a Hetzner VPS with a tunnel). Install the App on one repo and watch the first real issue flow through in suggest mode.

Week 1. Run in Level 1 suggest-only. Log every decision. Eyeball the outputs daily. Tune the system prompt against the errors you see.

Week 2. Move to Level 2. Turn on auto-label at 0.8 confidence. Keep duplicate closures manual. Refine the duplicate detection strategy based on what you have learned.

After two weekends and two weeks of tuning, you have a working triage agent on one repo. Rolling to the second repo is a config change. Rolling to the tenth is the same config change.

The build is not hard. The tuning is the work. If that month of tuning is the part you do not have time for, that is the moment to use a hosted solution instead of owning the operations.

Let FixClaw handle your issue triage