Building a PQL Scoring Model That Adapts to Each ACV Band
A step-by-step framework for designing PQL scoring models that weight behavioral signals differently based on your customer's ACV tier — with scoring formulas, threshold calibration, and routing logic.
A product-qualified lead score is only useful if it routes the right accounts to the right motion at the right moment. Most teams build a single PQL model — one set of signal weights, one threshold, one routing rule — and apply it uniformly across every deal size. That model is, by construction, optimized for the average deal. It systematically misroutes the two segments that matter most: the fast-moving SMB buyer who has already decided, and the slow-building enterprise account where a high score this week is noise, not intent.
The fix is not a more complex model. It is a model that knows which annual contract value (ACV) band a lead belongs to and weights signals, sets thresholds, and times routing accordingly. This guide gives the band definitions, the weighting formulas, the calibration process, and the routing rules to build one.
Why a Global PQL Score Misroutes Leads
Consider two accounts that both reach a global PQL score of 80 on a 100-point scale. The first is a 4-person startup that signed up two days ago, connected one integration, and ran the core workflow eleven times. The second is a 1,200-person enterprise that signed up two days ago, provisioned 40 seats, but has had only one user touch the product, running the core workflow twice.
A global model treats these identically. It should not. The startup is a self-serve buyer who will likely convert without a rep. The enterprise has a single champion and no internal consensus; routing a rep now means a conversation about a deal that does not yet exist inside the customer's organization.
OpenView's annual Product Benchmarks report has consistently found that PLG companies with multi-segment go-to-market motions outperform single-motion peers on net revenue retention, in large part because they match the sales touch to the buyer's actual evaluation process rather than to an averaged score (OpenView Partners, Product Benchmarks). The averaged score is the problem. The buyer's process is the signal.
This is the same structural reasoning behind segmenting a PQL definition by stage: the qualifying behavior is not a fixed line but a function of who the buyer is and where they are in their journey. ACV band is the most operationally useful axis for that segmentation because it maps directly to which sales resource — if any — should engage.
Defining ACV Bands for Scoring
Before weighting any signal, define the bands. Three bands are sufficient for most B2B SaaS; a fourth (self-serve, no sales touch ever) is worth carving out explicitly so the model can suppress routing rather than score it.
| Band | Predicted ACV | Typical buyer | Evaluation length | Decision unit | Sales motion |
|---|---|---|---|---|---|
| Self-serve | < $3K | Individual / very small team | Hours to days | One person | None — pure PLG |
| SMB | $3K–$15K | Small team, single decision-maker | Days to 2 weeks | 1–2 people | Light sales assist |
| Mid-market | $15K–$50K | Department, 2–4 stakeholders | 2–6 weeks | Buying committee forming | Inside sales |
| Enterprise | > $50K | Multiple departments, procurement | 6–16 weeks | Formal committee + procurement | Field sales / AE |
Predict the band from a blend of firmographics and early product signals. Firmographic enrichment (employee count, industry, funding) gives a prior; early product signals (seats provisioned, SSO connected, data volume imported, number of distinct active users) update it. The prediction does not need to be perfect — it needs to be directionally correct often enough that band-specific weighting beats a global model. In practice, even a coarse three-bucket prediction with 70% accuracy outperforms a uniform model.
A common mistake is to wait for the confirmed ACV — which only exists after the deal closes — and so never apply band logic to live leads. Use the prediction, and let it sharpen over the account lifetime.
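The prior-plus-update idea can be sketched in a few lines of Python. This is a minimal illustration, not a recommended model: the `AccountSnapshot` fields, the employee-count cut-offs, and the update rules are all hypothetical placeholders that a team would replace with values fitted to its own data.

```python
from dataclasses import dataclass

# Band order, lowest to highest predicted ACV.
BANDS = ["self_serve", "smb", "mid_market", "enterprise"]


@dataclass
class AccountSnapshot:
    employee_count: int         # firmographic enrichment
    seats_provisioned: int      # early product signal
    distinct_active_users: int  # early product signal (unused here, kept for extension)
    sso_connected: bool         # early product signal


def firmographic_prior(employee_count: int) -> int:
    """Return an index into BANDS from company size alone.

    The cut-offs are hypothetical placeholders, not values from this guide.
    """
    if employee_count < 5:
        return 0
    if employee_count < 200:
        return 1
    if employee_count < 1000:
        return 2
    return 3


def predict_band(acct: AccountSnapshot) -> str:
    """Start from the firmographic prior, then nudge it with product signals."""
    idx = firmographic_prior(acct.employee_count)
    # Hypothetical update rules: strong early signals move the band upward.
    if acct.sso_connected:
        idx = max(idx, 2)        # SSO setup suggests at least mid-market
    if acct.seats_provisioned >= 25:
        idx = min(idx + 1, 3)    # large seat count bumps the band one step
    return BANDS[idx]


startup = AccountSnapshot(employee_count=4, seats_provisioned=4,
                          distinct_active_users=3, sso_connected=False)
enterprise = AccountSnapshot(employee_count=1200, seats_provisioned=40,
                             distinct_active_users=1, sso_connected=False)
print(predict_band(startup), predict_band(enterprise))
```

Because the function only reads a snapshot, it can be re-run whenever new signals arrive, which is how the prediction sharpens over the account lifetime.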
Weighting Signals Differently by Band
The core insight is that the same behavioral signal carries different predictive weight depending on the band. Two signal families dominate, and they trade places in importance as ACV rises:
- Usage depth — how intensively a small set of users engages the core workflow (sessions per user, advanced features touched, workflow completions per active user).
- Usage breadth — how widely adoption spreads across users and job functions (distinct active users, departments represented, integrations spanning multiple systems).
At low ACV, breadth is the stronger predictor: a small company where several people use the product is one that has embedded it into a team process and will convert. At high ACV, depth is the stronger predictor early, because enterprise expansion follows a land-and-expand pattern — a champion goes deep first, and breadth comes later through procurement-driven rollout. Weighting breadth too heavily at the enterprise band penalizes exactly the accounts that are progressing normally.
A workable weighting structure, expressed as the contribution each signal family makes to the 100-point score:
| Signal family | Self-serve | SMB | Mid-market | Enterprise |
|---|---|---|---|---|
| Usage depth | 30 | 30 | 35 | 45 |
| Usage breadth | 35 | 35 | 30 | 20 |
| Intent events (pricing, limits, upgrade clicks) | 25 | 25 | 20 | 15 |
| Account fit / firmographic strength | 10 | 10 | 15 | 20 |
Read down a column and the philosophy is visible. The self-serve and SMB columns lean on breadth and intent — fast buyers who explore widely and react to monetization triggers. The enterprise column leans on depth and fit — a champion building a case inside a high-quality account, where a pricing-page visit is far less meaningful than a power user running the core workflow daily.
The scoring formula
For each account, compute the band-weighted score as a normalized sum:
PQL Score (band b) =
w_depth(b) × NormDepth
+ w_breadth(b) × NormBreadth
+ w_intent(b) × NormIntent
+ w_fit(b) × NormFit
Each Norm term is the account's raw signal scaled to a 0–1 range against that band's own distribution — not a global distribution. Scaling depth against the global distribution would crush enterprise depth scores, because enterprise power users run more workflows than the average user across all bands. Normalize within band so that an account scores high relative to its peers, which is the comparison that actually predicts conversion.
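A short Python sketch of the formula follows. The weights are copied from the signal-family table above; the choice of percentile rank as the within-band scaling method is an assumption (the guide only requires a 0–1 value relative to band peers), and the sample depth values are made up.

```python
from bisect import bisect_right

# Weights from the signal-family table; each band's weights sum to 100.
WEIGHTS = {
    "self_serve": {"depth": 30, "breadth": 35, "intent": 25, "fit": 10},
    "smb":        {"depth": 30, "breadth": 35, "intent": 25, "fit": 10},
    "mid_market": {"depth": 35, "breadth": 30, "intent": 20, "fit": 15},
    "enterprise": {"depth": 45, "breadth": 20, "intent": 15, "fit": 20},
}


def percentile_rank(value: float, band_values: list[float]) -> float:
    """Share of same-band accounts whose raw value is <= `value` (0 to 1)."""
    if not band_values:
        return 0.0
    ordered = sorted(band_values)
    return bisect_right(ordered, value) / len(ordered)


def pql_score(band: str, norm: dict[str, float]) -> float:
    """Weighted sum of the four normalized (0-1) signal families."""
    weights = WEIGHTS[band]
    return sum(weights[family] * norm[family] for family in weights)


# Made-up raw depth values for accounts in the enterprise band.
enterprise_depth = [2, 4, 6, 9, 12]
norm = {
    "depth": percentile_rank(9, enterprise_depth),  # 4 of 5 peers at or below
    "breadth": 0.25,
    "intent": 0.5,
    "fit": 1.0,
}
as_enterprise = pql_score("enterprise", norm)
as_smb = pql_score("smb", norm)
print(round(as_enterprise, 2), round(as_smb, 2))
```

The same normalized profile scores higher under the enterprise weights than under the SMB weights, because a deep, high-fit account with little breadth is what the enterprise column is built to reward.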
The intent term deserves a note: not all intent events are equal, and the strongest are compound. A single pricing-page view is weak; three pricing views plus an upgrade-button click within 48 hours is strong. This compound-event logic is the same principle covered in depth for sales-assist trigger instrumentation — single events are noise, clustered events are signal.
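One way to detect a compound intent cluster is a sliding window over timestamped events, sketched below in Python. The event names `pricing_view` and `upgrade_click` are hypothetical labels; the defaults (three pricing views plus one upgrade click within 48 hours) mirror the example in the paragraph above.

```python
from datetime import datetime, timedelta


def has_compound_intent(events, window=timedelta(hours=48),
                        min_pricing_views=3, min_upgrade_clicks=1):
    """Return True if some `window`-long span holds enough clustered intent.

    `events` is a list of (timestamp, event_name) tuples. The event names
    "pricing_view" and "upgrade_click" are hypothetical labels.
    """
    intent = sorted((ts, name) for ts, name in events
                    if name in ("pricing_view", "upgrade_click"))
    # Try a window starting at each intent event in turn.
    for i, (start, _) in enumerate(intent):
        in_window = [name for ts, name in intent[i:] if ts - start <= window]
        if (in_window.count("pricing_view") >= min_pricing_views
                and in_window.count("upgrade_click") >= min_upgrade_clicks):
            return True
    return False


t0 = datetime(2024, 1, 1, 9, 0)
clustered = [
    (t0, "pricing_view"),
    (t0 + timedelta(hours=5), "pricing_view"),
    (t0 + timedelta(hours=20), "pricing_view"),
    (t0 + timedelta(hours=30), "upgrade_click"),
]
scattered = [
    (t0, "pricing_view"),
    (t0 + timedelta(days=3), "pricing_view"),
    (t0 + timedelta(days=6), "pricing_view"),
    (t0 + timedelta(days=9), "upgrade_click"),
]
print(has_compound_intent(clustered), has_compound_intent(scattered))
```

Both event lists contain the same four events; only the clustered one counts as strong intent.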
Example Scoring Rubrics
Concrete rubrics make the weighting tangible. Each rubric below shows the raw thresholds that map a signal to its normalized value within band.
SMB rubric ($3K–$15K)
| Signal | 0 points | Partial | Full normalized value |
|---|---|---|---|
| Distinct active users (breadth) | 1 | 2 | 3+ within 14 days |
| Workflow completions per user (depth) | < 3 | 3–9 | 10+ |
| Integrations connected | 0 | 1 | 2+ |
| Intent (pricing views + upgrade clicks) | 0 | 1–2 | 3+ in 48h |
| Fit (employee count in band) | out of band | adjacent | in band |
Mid-market rubric ($15K–$50K)
| Signal | 0 points | Partial | Full normalized value |
|---|---|---|---|
| Distinct active users (breadth) | 1–2 | 3–5 | 6+ within 21 days |
| Departments represented | 1 | 2 | 3+ |
| Workflow completions per active user (depth) | < 5 | 5–14 | 15+ |
| Intent (demo request, pricing, limits) | 0 | 1 | 2+ distinct types |
| Fit (firmographic ICP match) | low | medium | high |
Enterprise rubric (> $50K)
| Signal | 0 points | Partial | Full normalized value |
|---|---|---|---|
| Power-user depth (sessions/week, top user) | < 3 | 3–7 | 8+ |
| Advanced features adopted | 0–1 | 2–3 | 4+ |
| SSO / admin / security config touched | none | viewed | configured |
| Breadth (distinct active users) | 1 | 2–4 | 5+ within 45 days |
| Fit (ICP + procurement signals) | low | medium | high |
Notice how the enterprise rubric tolerates low breadth far longer (the breadth full-value window extends to 45 days, against 14 for SMB) and rewards security and admin configuration — signals that a serious evaluation, not a casual trial, is underway. These distinctions are exactly what a global model erases.
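A rubric of this shape maps naturally onto a small lookup table. The Python sketch below encodes the count-based rows of the SMB rubric; treating the "Partial" column as 0.5 is an assumption (the rubric does not assign it a number), and the signal names are hypothetical. The time windows and the categorical fit row are left out for brevity.

```python
def tier_value(raw: float, partial_from: float, full_from: float) -> float:
    """Map a raw count to 0.0, 0.5 (partial), or 1.0 (full)."""
    if raw >= full_from:
        return 1.0
    if raw >= partial_from:
        return 0.5  # assumed value for the "Partial" column
    return 0.0


# (partial_from, full_from) thresholds copied from the SMB rubric.
SMB_RUBRIC = {
    "distinct_active_users": (2, 3),
    "workflow_completions_per_user": (3, 10),
    "integrations_connected": (1, 2),
    "intent_events_48h": (1, 3),
}


def apply_rubric(rubric, raw_signals):
    """Return the normalized value of every signal named in the rubric."""
    return {name: tier_value(raw_signals[name], *bounds)
            for name, bounds in rubric.items()}


# Made-up raw signals for one SMB account.
raw = {
    "distinct_active_users": 2,
    "workflow_completions_per_user": 11,
    "integrations_connected": 1,
    "intent_events_48h": 0,
}
print(apply_rubric(SMB_RUBRIC, raw))
```

The mid-market and enterprise rubrics would be further dictionaries with their own bounds, so the band-specific differences live in data rather than in branching code.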
Calibrating Thresholds Per Band
A score is meaningless without a threshold that converts it into a routing decision. The threshold is where most teams undermine an otherwise sound model: they pick a round number (say, 70) and apply it everywhere. The correct threshold is derived empirically, per band, from the conversion outcomes of past accounts.
The calibration process:
- Assemble a labeled dataset per band. Pull historical accounts in each predicted band, with their PQL scores at a fixed point (for example, score as of day 14) and their eventual outcome (converted to paid within the band's typical sales cycle: yes/no).
- Plot conversion rate by score decile within band. For each band, bucket accounts into score deciles and compute the conversion rate of each decile. The relationship should be monotonic — higher score, higher conversion. If it is not, the weighting is wrong, not the threshold.
- Find the inflection point. Identify the score at which conversion rate jumps materially above the band's base rate. That inflection is the candidate threshold. For SMB it often sits lower (fast buyers convert from moderate scores); for enterprise it sits higher and later.
- Apply a precision/recall trade-off explicit to the band. SMB sales assist is cheap, so favor recall — a lower threshold that catches more buyers even at the cost of some false positives. Enterprise field sales is expensive, so favor precision — a higher threshold that only routes accounts with strong consensus signals. SaaS Capital's research on go-to-market efficiency underscores that the cost of a sales touch must be matched to deal size; over-investing reps in low-ACV deals is one of the most common efficiency leaks in scaling SaaS (SaaS Capital, B2B SaaS metrics research).
- Lock thresholds and set a recalibration cadence. Record the threshold per band and the date. Recalibrate quarterly and after any onboarding, pricing, or packaging change.
The output is a small table that the routing layer consumes:
| Band | Threshold (of 100) | Optimization | Expected precision | Expected recall |
|---|---|---|---|---|
| SMB | 55 | Recall | ~45% | ~75% |
| Mid-market | 65 | Balanced | ~55% | ~60% |
| Enterprise | 75 | Precision | ~70% | ~45% |
These precision and recall figures are illustrative starting points; the calibration analysis produces the real ones from the team's own data.
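The decile and precision/recall steps of the calibration process can be sketched in plain Python. The input format, a list of `(score, converted)` pairs for one band, is an assumption, and the sample data is synthetic, chosen only to make the arithmetic easy to follow.

```python
def conversion_by_decile(accounts):
    """accounts: list of (score, converted) pairs for ONE band.

    Returns (lowest_score_in_bucket, conversion_rate) for up to ten
    equal-sized score buckets, lowest scores first.
    """
    ranked = sorted(accounts, key=lambda a: a[0])
    n = len(ranked)
    out = []
    for d in range(10):
        bucket = ranked[d * n // 10:(d + 1) * n // 10]
        if bucket:
            rate = sum(1 for _, conv in bucket if conv) / len(bucket)
            out.append((bucket[0][0], rate))
    return out


def is_monotonic(rates):
    """True if conversion never falls as the score bucket rises."""
    return all(a <= b for a, b in zip(rates, rates[1:]))


def precision_recall(accounts, threshold):
    """Precision and recall of routing every account at or above threshold."""
    routed = [conv for score, conv in accounts if score >= threshold]
    true_pos = sum(routed)
    total_pos = sum(conv for _, conv in accounts)
    precision = true_pos / len(routed) if routed else 0.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall


# Synthetic band data: scores 0, 5, ..., 95; accounts at 60+ converted.
accounts = [(s, s >= 60) for s in range(0, 100, 5)]
deciles = conversion_by_decile(accounts)
print(is_monotonic([rate for _, rate in deciles]))
print(precision_recall(accounts, 50))  # lower threshold: favors recall
print(precision_recall(accounts, 70))  # higher threshold: favors precision
```

Running the same functions separately on each band's history yields one threshold row per band. If `is_monotonic` returns False for a band, the weighting needs revisiting before any threshold is chosen.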
Routing Logic and the Hold-for-Scoring Window
A score crossing a threshold should not always fire a route immediately. Routing logic adds two pieces of timing control: a hold-for-scoring window and a band-specific rep assignment.
The hold-for-scoring window is a deliberate delay before routing, sized to each band's activation speed. SMB accounts activate in days, so a short hold (24–48 hours) lets the score stabilize without losing the buying window. Enterprise accounts activate over weeks, so an early threshold crossing is more likely to be a single eager user than genuine account momentum — a longer hold (5–7 days) with a requirement that the score hold above threshold across the window filters out flukes.
The rep assignment routes by band and by trigger type. Not every PQL belongs in the same queue. The principle of routing different signals to different pools is the foundation of a clean PLG-to-sales-led handoff, and it applies here directly.
| Band | Score ≥ threshold | Hold window | Routes to | Touch type |
|---|---|---|---|---|
| Self-serve | (suppressed) | n/a | No rep | Automated nurture only |
| SMB | Yes | 24–48h | SDR pool, round-robin | Templated email + optional call |
| Mid-market | Yes | 3 days | Inside sales | Personalized outreach |
| Enterprise | Yes, sustained | 5–7 days | Named AE for account | Champion-focused, multi-thread |
| Any band | Hard intent (demo request) | None | Fast lane | Immediate human response |
The bottom row is the override: an explicit, unambiguous buying signal — a demo request, a contact-sales click — bypasses the hold window in any band. These are not behavioral inferences; they are direct asks, and they deserve a fast lane.
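The routing table translates into a small decision function. In the Python sketch below, the thresholds come from the illustrative calibration table and the hold windows take the upper end of each stated range. The pool names and return labels are hypothetical, and requiring the score to stay above threshold for the whole hold window in every band (not only enterprise) is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RoutingRule:
    threshold: int    # minimum score, out of 100
    hold: timedelta   # how long the score must stay at or above threshold
    pool: str         # hypothetical queue name


RULES = {
    "smb": RoutingRule(55, timedelta(hours=48), "sdr_round_robin"),
    "mid_market": RoutingRule(65, timedelta(days=3), "inside_sales"),
    "enterprise": RoutingRule(75, timedelta(days=7), "named_ae"),
}


def route(band, score, held_for, hard_intent=False):
    """Return the routing decision for one account.

    `held_for` is how long the score has continuously been at or above
    the band threshold.
    """
    if hard_intent:
        return "fast_lane"          # demo request or contact-sales click
    if band == "self_serve":
        return "automated_nurture"  # routing suppressed for this band
    rule = RULES[band]
    if score < rule.threshold:
        return "no_route"
    if held_for < rule.hold:
        return "hold"               # wait for the score to prove itself
    return rule.pool


print(route("enterprise", 80, timedelta(days=2)))
print(route("enterprise", 80, timedelta(days=7)))
print(route("smb", 60, timedelta(hours=48)))
print(route("enterprise", 10, timedelta(0), hard_intent=True))
```

Checking `hard_intent` first is what makes the override unconditional: a direct ask reaches the fast lane regardless of band, score, or hold state.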
Score Decay: Keeping Signals Fresh
Behavioral signals are perishable. An account that hit a usage limit yesterday is hotter than one that hit it three weeks ago and went quiet. Without decay, scores ratchet up monotonically and never reflect cooling interest, which leads to sales outreach landing on accounts that have already moved on.
Apply an exponential decay to each behavioral signal based on time since the event:
weighted_signal = raw_signal × e^(−λ × days_since_event)
The decay constant λ should differ by band, mirroring the activation timelines. SMB signals decay fast because the buying window is short — a half-life of around 7 days. Enterprise signals decay slowly because the evaluation is long — a half-life of around 30 days. A signal's half-life is the number of days for its weight to fall to 50%; convert to λ with λ = ln(2) / half_life.
| Band | Signal half-life | Implied λ | Rationale |
|---|---|---|---|
| SMB | ~7 days | 0.099 | Short buying window, stale fast |
| Mid-market | ~14 days | 0.050 | Moderate cycle |
| Enterprise | ~30 days | 0.023 | Long evaluation, signals persist |
Decay also serves a hygiene function: it naturally retires accounts that crossed a threshold once and never re-engaged, preventing the pipeline from clogging with stale high scores. ProfitWell's analysis of expansion and retention motions repeatedly finds that timing of outreach — not just targeting — separates effective from ineffective sales-assist programs (ProfitWell / Paddle research). Decay is the mechanism that encodes timing into the score itself.
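The decay formula and the half-life conversion fit in a few lines of Python. The half-lives are the ones from the table above; the function and dictionary names are hypothetical.

```python
import math

# Half-lives in days, from the decay table.
HALF_LIFE_DAYS = {"smb": 7, "mid_market": 14, "enterprise": 30}


def decay_constant(half_life_days: float) -> float:
    """Convert a half-life to lambda: lambda = ln(2) / half_life."""
    return math.log(2) / half_life_days


def decayed_signal(raw_signal: float, days_since_event: float, band: str) -> float:
    """Apply raw_signal * e^(-lambda * days_since_event) for the band."""
    lam = decay_constant(HALF_LIFE_DAYS[band])
    return raw_signal * math.exp(-lam * days_since_event)


for band, half_life in HALF_LIFE_DAYS.items():
    print(band, round(decay_constant(half_life), 3))

# A usage-limit hit from three weeks ago, full weight 1.0 at event time.
print(round(decayed_signal(1.0, 21, "smb"), 3))
print(round(decayed_signal(1.0, 21, "enterprise"), 3))
```

For the SMB band, 21 days is three half-lives, so the signal retains one eighth of its weight. The same event in the enterprise band still carries roughly 60% of its weight, which is the intended asymmetry.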
Common Failure Modes
Even a well-designed model fails in predictable ways. Watch for these:
- Global normalization. Scaling signals against a global distribution rather than within band crushes enterprise depth and inflates self-serve breadth. Always normalize within band.
- A single threshold across bands. The most common error. A threshold tuned to the average over-fires for enterprise and under-fires for SMB.
- No decay. Scores ratchet up and route stale accounts. Outreach lands after the window has closed.
- Confusing fit with intent. A high-fit, low-PQL account is a nurture target, not a sales-ready lead. A high-PQL, low-fit account is a self-serve customer, not an opportunity. Keep the two scores separate and combine them only at the routing step.
- Routing all bands to one rep pool. Sending a $5K SMB lead to a field AE wastes an expensive resource; sending a $75K enterprise account to an SDR templated-email queue squanders the largest deal. Assign by band.
- Never recalibrating. Onboarding and pricing changes shift the behavior distribution. A threshold set against last quarter's distribution drifts out of tune within a release cycle.
- Scoring before band prediction is possible. Firing a band-specific model before any band signal exists forces a default band — usually the average — reintroducing the global-model problem for brand-new accounts. Hold scoring until at least a coarse band prediction is available.
Frequently Asked Questions
What is a PQL scoring model?
A PQL (product-qualified lead) scoring model assigns a numerical score to each user or account based on in-product behavior that predicts purchase readiness. Unlike a marketing-qualified lead score, which weights demographic and marketing-engagement signals, a PQL score weights product usage signals — feature adoption, usage depth, usage breadth, and intent events like viewing pricing or hitting a usage limit. The output is a routing decision: who gets a sales touch, when, and from which rep pool.
Why should a PQL score change based on ACV band?
Buying behavior differs structurally by deal size. A $5K-ACV SMB buyer often evaluates and decides in days, with a single decision-maker exploring a broad set of features. A $75K-ACV enterprise buyer evaluates over weeks, with multiple stakeholders going deep on a narrow set of workflows. A single global model optimized for the average will fire too late for fast SMB buyers and too early for slow enterprise buyers who have not yet built internal consensus.
How do you determine the ACV band for a lead before they have purchased?
Use a predicted band, not a confirmed one. Combine firmographic enrichment (company size, industry, funding stage) with early product signals (seats provisioned, integrations connected, data volume imported). A 5-person company importing 200 records is almost certainly SMB; a 2,000-employee company connecting SSO in week one is almost certainly enterprise. Refine the prediction as more signal arrives.
What signals belong in a PQL score versus a fit score?
Keep them separate. A fit score answers whether an account is worth selling to at all, using firmographics and ICP match. A PQL score answers whether the account is ready to buy now, using behavioral signals. Routing is the step that combines them: high-PQL/low-fit is a self-serve customer, high-fit/low-PQL is a nurture target, and only an account that is high on both is a sales-ready opportunity.
How often should PQL thresholds be recalibrated?
Per band, at least quarterly, and immediately after any change to onboarding, pricing, or packaging. Thresholds drift because the behavior distribution drifts — a new onboarding flow that pushes more users to connect an integration inflates scores, and a threshold set against the old distribution over-fires.
What is score decay and why does it matter?
Score decay reduces a signal's weight as time passes since the event. A user who hit a usage limit yesterday is hotter than one who hit it three weeks ago. Without decay, stale signals accumulate and trigger outreach after the buying window has closed, wasting rep time and annoying prospects.
Can a PQL scoring model work without a data warehouse?
A basic model can run inside a product analytics tool or a CRM with computed fields, but ACV-band-aware scoring with decay functions and per-segment calibration is far easier to maintain in a warehouse where event data, firmographic enrichment, and CRM outcomes live together. The warehouse also enables the retrospective calibration analysis that sets thresholds correctly.
Conclusion
A PQL score is a routing instrument, and a routing instrument tuned to the average routes the average well and everything else poorly. ACV band is the axis that fixes this, because it maps directly to how the buyer evaluates and which sales resource — if any — should respond. The build is concrete: predict the band, weight depth and breadth in proportion to it, normalize within band, calibrate a separate threshold per band against real conversion outcomes, route through a hold window sized to activation speed, and decay every signal so the score reflects current interest rather than accumulated history.
The discipline that makes it work is empirical calibration, not model complexity. Start by separating the enterprise threshold from the SMB threshold and measuring the lift; that single change typically recovers more misrouted pipeline than any amount of additional signal engineering. From there, the natural next step is rolling individual user scores up to the account level — the subject of the companion guide on product-qualified account rollup design — and instrumenting the sales-assist triggers that turn a high score into a well-timed human conversation.