Field Note #2: HubSpot lead scoring told us who was engaged. Not who would pay.
Replacing weighted-sum lead scoring with reasoning-traced AI prioritization in HubSpot. Five custom properties, one nightly batch, reps that actually trust the score.
- Subject
- HubSpot revenue prioritization
- Industry
- B2B SaaS / DACH mid-market
- Stack
HubSpot workflows, Claude Sonnet, webhook + nightly batch
The case
HubSpot native lead scoring is a weighted sum of engagement behaviors: email opened, pricing page visited, demo requested. It tells you who is curious. It does not tell you who will pay.
Sales reps figure that out within a month and stop trusting the score. The actual prioritization moves to a Tuesday-morning meeting where the manager picks ten accounts from instinct, the team dials them, and the other 10,000 contacts sit untouched.
The fix is not a better weighted sum. It is replacing the score with an LLM that reads the contact’s HubSpot fields, scores them against an explicit ICP, and writes back a tier, a confidence, the matched signals, and a one-sentence rationale — all as HubSpot properties next to the contact.
The numbers
Live state on one contact (Lena Bergmann, VP Digital Transformation, novatech-ag.com):
| HubSpot property | Value |
|---|---|
ai_icp_fit | Strong |
ai_score | 85 |
ai_pipeline_rank | 12 (of 247 open contacts) |
ai_next_action | “Personal note from AE within 24h; reference scaling signal” |
ai_reasoning_trace | “B2B SaaS, 200 employees, VP role, repeat pricing-page visits in last 7 days, no deal-breakers.” |
Operational metrics: nightly batch over 12k contacts costs ~€180/month on Claude Sonnet, ~€18/month on Haiku. Re-score latency: ~15 min for full base. Rep adoption: scores read on 94% of contacts touched in week 4, vs 11% for the native HubSpot score in the month before.
What worked
- Reasoning trace as a HubSpot property, not a CloudWatch log. Reps trust scores they can audit at the moment of decision. If the trace lives in another system, it does not exist.
- Five properties, not one.
ai_icp_fit,ai_expected_revenue,ai_next_action,ai_reasoning_trace,ai_pipeline_rank. Each is a different question; reps reach for different ones. - Nightly batch over per-event scoring. 10x cheaper, accurate enough for a daily pipeline review. Per-event would have been ~€1,800/month for marginal benefit.
- ICP as a versioned system-prompt line. Single file, version number, prompt loads at runtime. Whenever sales updates the ICP, the model immediately uses the new one.
What failed
- Per-event re-scoring on every engagement. Cost blew through the monthly cap in 11 days. The model was no more accurate, just more expensive.
- Letting the model choose its own cost ceiling. It does not. Kill switch in the workflow that halts at €500/month and pings the owner.
- Surfacing the rank without the rationale. Reps in the first 2 weeks said “I do not trust this.” Surfacing
ai_reasoning_tracenext toai_scoreflipped adoption inside a fortnight. - A single weighted formula, no floor rule. Initial version greenlit a high-value contact who had churned out twice. Floor rule (no Tier A if any dimension < 3) prevents single-signal dominance.
The architecture
HubSpot contact event ───► Trigger (workflow or nightly batch)
│
▼
AI scoring endpoint
(Claude Sonnet + ICP file)
│
▼
Structured JSON output:
{ icp_fit, expected_revenue,
next_action, reasoning_trace,
pipeline_rank_inputs }
│
▼
Webhook writes 5 properties
back to HubSpot
│
▼
Reps see ranked queue + reasoning trace
in HubSpot's standard contact view
Cost gate: monthly cap as workflow halt + owner ping. Eval set: 50 won + 50 lost deals from the last 12 months. AUC threshold: ≥ 0.80 vs ground truth.
Next-step checklist
- Write the ICP as a single line (industry, company size, role). One file, versioned.
- List fit indicators (positive) and deal-breakers (negative). 4-6 each.
- Pick HubSpot object (contact / company / deal). For B2B SaaS, default to company.
- Create the 5 properties in HubSpot property setup. Type each to match the JSON schema.
- Write the prompt. Schema for output. Test on 5 known-won + 5 known-lost contacts.
- Set up nightly batch. Cap the monthly budget. Pipe owner alerts on cap-breach.
- Eval AUC on 50 won + 50 lost deals. Threshold ≥ 0.80 before flipping live.
- Wire pipeline_rank into the rep’s queue view. Sort DESC. Top 20-30 per rep per week.
Full case study: AI Revenue Prioritization in HubSpot CRM. Free templates: HubSpot Revenue Prioritization Template, AI Lead Scoring Pack.
Does this shape match what you're building?
If you want me to scope a similar system for you — I respond in 24 hours.
Request a scope