Field Note #2: HubSpot lead scoring told us who was engaged. Not who would pay.

Replacing weighted-sum lead scoring with reasoning-traced AI prioritization in HubSpot. Five custom properties, one nightly batch, reps who actually trust the score.

Subject: HubSpot revenue prioritization
Industry: B2B SaaS / DACH mid-market
Stack: HubSpot workflows, Claude Sonnet, webhook + nightly batch

The case

HubSpot native lead scoring is a weighted sum of engagement behaviors: email opened, pricing page visited, demo requested. It tells you who is curious. It does not tell you who will pay.

Sales reps figure that out within a month and stop trusting the score. The actual prioritization moves to a Tuesday-morning meeting where the manager picks ten accounts from instinct, the team dials them, and the other 10,000 contacts sit untouched.

The fix is not a better weighted sum. It is replacing the score with an LLM that reads the contact’s HubSpot fields, scores the contact against an explicit ICP, and writes back a tier, a confidence, the matched signals, and a one-sentence rationale, all as HubSpot properties next to the contact.

The numbers

Live state on one contact (Lena Bergmann, VP Digital Transformation, novatech-ag.com):

HubSpot property      Value
ai_icp_fit            Strong
ai_score              85
ai_pipeline_rank      12 (of 247 open contacts)
ai_next_action        “Personal note from AE within 24h; reference scaling signal”
ai_reasoning_trace    “B2B SaaS, 200 employees, VP role, repeat pricing-page visits in last 7 days, no deal-breakers.”

Operational metrics: nightly batch over 12k contacts costs ~€180/month on Claude Sonnet, ~€18/month on Haiku. Re-score latency: ~15 min for full base. Rep adoption: scores read on 94% of contacts touched in week 4, vs 11% for the native HubSpot score in the month before.

What worked

  • Reasoning trace as a HubSpot property, not a CloudWatch log. Reps trust scores they can audit at the moment of decision. If the trace lives in another system, it does not exist.
  • Five properties, not one. ai_icp_fit, ai_expected_revenue, ai_next_action, ai_reasoning_trace, ai_pipeline_rank. Each is a different question; reps reach for different ones.
  • Nightly batch over per-event scoring. 10x cheaper, accurate enough for a daily pipeline review. Per-event would have been ~€1,800/month for marginal benefit.
  • ICP as a versioned system-prompt line. Single file, version number, prompt loads at runtime. Whenever sales updates the ICP, the next scoring run uses the new one.
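
The versioned ICP file can be sketched roughly as follows. This is a minimal illustration, not the production setup: the JSON layout, the `version` and `icp` keys, and the prompt wording are all assumptions made for the example.

```python
import json
from pathlib import Path


def load_icp(path: Path) -> dict:
    """Read the ICP file fresh on every run so edits apply to the next batch."""
    icp = json.loads(path.read_text(encoding="utf-8"))
    # Fail loudly if the file lacks the fields the prompt depends on.
    # "version" and "icp" are hypothetical key names.
    for key in ("version", "icp"):
        if key not in icp:
            raise ValueError(f"ICP file is missing required key: {key}")
    return icp


def build_system_prompt(icp: dict) -> str:
    """Embed the ICP line and its version so each score traces back to an ICP version."""
    return (
        f"You score CRM contacts against this ICP (version {icp['version']}): "
        f"{icp['icp']} "
        "Return JSON only."
    )


# Usage (hypothetical file name):
# prompt = build_system_prompt(load_icp(Path("icp.json")))
```

Putting the version into the prompt is a design choice for this sketch: it lets a later audit tie a given reasoning trace to the ICP that produced it.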

What failed

  • Per-event re-scoring on every engagement. Cost blew through the monthly cap in 11 days. The model was no more accurate, just more expensive.
  • Letting the model choose its own cost ceiling. It will not set one. The fix: a kill switch in the workflow that halts at €500/month and pings the owner.
  • Surfacing the rank without the rationale. Reps in the first 2 weeks said “I do not trust this.” Surfacing ai_reasoning_trace next to ai_score flipped adoption inside a fortnight.
  • A single weighted formula, no floor rule. The initial version greenlit a high-value contact who had churned twice. A floor rule (no Tier A if any dimension < 3) prevents single-signal dominance.
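
The floor rule from the last point can be sketched like this. The 1-5 scale, the dimension names, the two-tier output, and the Tier A average of 4.0 are assumptions for illustration; only the floor of 3 comes from the note.

```python
from typing import Mapping

FLOOR = 3  # from the note: no Tier A if any dimension scores below 3


def assign_tier(dimensions: Mapping[str, int], tier_a_min_avg: float = 4.0) -> str:
    """Return 'A' or 'B' from per-dimension scores on an assumed 1-5 scale."""
    if not dimensions:
        raise ValueError("at least one dimension score is required")
    # Floor rule: one weak dimension blocks Tier A regardless of the average.
    if min(dimensions.values()) < FLOOR:
        return "B"
    average = sum(dimensions.values()) / len(dimensions)
    return "A" if average >= tier_a_min_avg else "B"


# Hypothetical contact: strong on value signals, poor retention history.
churned_twice = {"company_fit": 5, "role_fit": 5, "engagement": 5, "retention_history": 1}
print(assign_tier(churned_twice))  # B: the average is 4.0, but the floor rule blocks Tier A
```

Without the floor check, this contact's average of 4.0 would clear the assumed Tier A bar, which is the single-signal dominance the rule is meant to stop.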

The architecture

HubSpot contact event ───► Trigger (workflow or nightly batch)
                              │
                              ▼
                       AI scoring endpoint
                       (Claude Sonnet + ICP file)
                              │
                              ▼
                   Structured JSON output:
                   { icp_fit, expected_revenue,
                     next_action, reasoning_trace,
                     pipeline_rank_inputs }
                              │
                              ▼
                  Webhook writes 5 properties
                       back to HubSpot
                              │
                              ▼
        Reps see ranked queue + reasoning trace
        in HubSpot's standard contact view
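
The step from structured JSON to the five properties might look like the sketch below. The JSON keys and property names come from this note; the function name is hypothetical, and the assumption that the rank is computed in a separate batch step (from `pipeline_rank_inputs` across all open contacts) is mine. The `{"properties": {...}}` wrapper follows the body shape HubSpot's CRM object update endpoint expects, but check the current API docs before relying on it.

```python
import json

REQUIRED_KEYS = (
    "icp_fit",
    "expected_revenue",
    "next_action",
    "reasoning_trace",
    "pipeline_rank_inputs",
)


def to_hubspot_properties(raw_model_output: str, pipeline_rank: int) -> dict:
    """Validate the model's JSON and map it to the five ai_* properties."""
    data = json.loads(raw_model_output)  # raises ValueError on non-JSON output
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"model output is missing keys: {missing}")
    return {
        "properties": {
            "ai_icp_fit": data["icp_fit"],
            "ai_expected_revenue": data["expected_revenue"],
            "ai_next_action": data["next_action"],
            "ai_reasoning_trace": data["reasoning_trace"],
            # Rank is relative to other contacts, so it is assumed to be
            # computed in a separate batch step and passed in here.
            "ai_pipeline_rank": pipeline_rank,
        }
    }
```

Rejecting incomplete output before the write-back keeps a malformed model response from blanking a property that reps rely on.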

Cost gate: monthly cap as workflow halt + owner ping. Eval set: 50 won + 50 lost deals from the last 12 months. AUC threshold: ≥ 0.80 vs ground truth.
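
A rough sketch of the cost gate, assuming the scoring job can estimate the cost of each call before or after making it. The class name, the `notify_owner` hook, and the in-memory spend counter are illustrative; a real job would persist the running total and reset it each month.

```python
from dataclasses import dataclass
from typing import Callable

MONTHLY_CAP_EUR = 500.0  # cap quoted in this note


class BudgetExceeded(RuntimeError):
    """Raised to halt the batch once the monthly cap would be breached."""


@dataclass
class CostGate:
    notify_owner: Callable[[str], None]  # hypothetical alert hook (email, Slack, ...)
    cap_eur: float = MONTHLY_CAP_EUR
    spent_eur: float = 0.0

    def charge(self, cost_eur: float) -> None:
        """Record one call's cost; alert and halt before the cap is crossed."""
        if self.spent_eur + cost_eur > self.cap_eur:
            self.notify_owner(
                f"AI scoring halted: {self.spent_eur:.2f} EUR spent, "
                f"cap is {self.cap_eur:.2f} EUR"
            )
            raise BudgetExceeded("monthly cap reached")
        self.spent_eur += cost_eur


# Usage: call gate.charge(estimated_cost) around each scoring request and
# let BudgetExceeded stop the batch loop.
```

The gate lives outside the model on purpose: the halt is a hard check in the job, not an instruction in the prompt.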

Next-step checklist

  • Write the ICP as a single line (industry, company size, role). One file, versioned.
  • List fit indicators (positive) and deal-breakers (negative). 4-6 each.
  • Pick HubSpot object (contact / company / deal). For B2B SaaS, default to company.
  • Create the 5 properties in HubSpot property setup. Type each to match the JSON schema.
  • Write the prompt. Define the output schema. Test on 5 known-won + 5 known-lost contacts.
  • Set up nightly batch. Cap the monthly budget. Pipe owner alerts on cap-breach.
  • Eval AUC on 50 won + 50 lost deals. Threshold ≥ 0.80 before flipping live.
  • Wire pipeline_rank into the rep’s queue view. Sort DESC. Top 20-30 per rep per week.
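
The AUC check in the list above can be run without extra libraries. This sketch uses the pairwise definition of AUC (the share of won/lost pairs where the won deal scores higher, ties counting half), which equals the area under the ROC curve. The function names and the example scores are hypothetical; the 0.80 threshold is from this note.

```python
from typing import Sequence


def auc(won_scores: Sequence[float], lost_scores: Sequence[float]) -> float:
    """Probability that a random won deal outscores a random lost deal."""
    if not won_scores or not lost_scores:
        raise ValueError("need at least one won and one lost deal")
    wins = 0.0
    for won in won_scores:
        for lost in lost_scores:
            if won > lost:
                wins += 1.0
            elif won == lost:
                wins += 0.5  # ties count as half a win
    return wins / (len(won_scores) * len(lost_scores))


def ready_to_go_live(
    won_scores: Sequence[float],
    lost_scores: Sequence[float],
    threshold: float = 0.80,
) -> bool:
    """Gate the launch on the eval-set AUC."""
    return auc(won_scores, lost_scores) >= threshold


# Made-up scores: 3 of the 4 won/lost pairs are ordered correctly.
print(auc([80, 60], [70, 50]))  # 0.75
```

With 50 won and 50 lost deals this is 2,500 comparisons, so the quadratic loop is not a concern at this eval-set size.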

Full case study: AI Revenue Prioritization in HubSpot CRM. Free templates: HubSpot Revenue Prioritization Template, AI Lead Scoring Pack.
