
Building a PQL Scoring Model That Adapts to Each ACV Band

A step-by-step framework for designing PQL scoring models that weight behavioral signals differently based on your customer's ACV tier — with scoring formulas, threshold calibration, and routing logic.

SaaS Science Team | June 14, 2026 | 17 min read
Tags: pql scoring, product-qualified lead, acv, plg, sales routing, lead scoring, saas metrics

A product-qualified lead score is only useful if it routes the right accounts to the right motion at the right moment. Most teams build a single PQL model — one set of signal weights, one threshold, one routing rule — and apply it uniformly across every deal size. That model is, by construction, optimized for the average deal. It systematically misroutes the two segments that matter most: the fast-moving SMB buyer who has already decided, and the slow-building enterprise account where a high score this week is noise, not intent.

The fix is not a more complex model. It is a model that knows which annual contract value (ACV) band a lead belongs to and weights signals, sets thresholds, and times routing accordingly. This guide gives the band definitions, the weighting formulas, the calibration process, and the routing rules to build one.


Why a Global PQL Score Misroutes Leads

Consider two accounts that both reach a global PQL score of 80 on a 100-point scale. The first is a 4-person startup that signed up two days ago, connected one integration, and ran the core workflow eleven times. The second is a 1,200-person enterprise that signed up two days ago, provisioned 40 seats, but has had only one user touch the product, running the core workflow twice.

A global model treats these identically. It should not. The startup is a self-serve buyer who will likely convert without a rep. The enterprise has a single champion and no internal consensus; routing a rep now means a conversation about a deal that does not yet exist inside the customer's organization.

OpenView's annual Product Benchmarks report has consistently found that PLG companies with multi-segment go-to-market motions outperform single-motion peers on net revenue retention, in large part because they match the sales touch to the buyer's actual evaluation process rather than to an averaged score (OpenView Partners, Product Benchmarks). The averaged score is the problem. The buyer's process is the signal.

This is the same structural reasoning behind segmenting a PQL definition by stage: the qualifying behavior is not a fixed line but a function of who the buyer is and where they are in their journey. ACV band is the most operationally useful axis for that segmentation because it maps directly to which sales resource — if any — should engage.

Defining ACV Bands for Scoring

Before weighting any signal, define the bands. Three bands are sufficient for most B2B SaaS; a fourth (self-serve, no sales touch ever) is worth carving out explicitly so the model can suppress routing rather than score it.

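| Band       | Predicted ACV | Typical buyer                     | Evaluation length | Decision unit                  | Sales motion       |
|------------|---------------|-----------------------------------|-------------------|--------------------------------|--------------------|
| Self-serve | < $3K         | Individual / very small team      | Hours to days     | One person                     | None (pure PLG)    |
| SMB        | $3K–$15K      | Small team, single decision-maker | Days to 2 weeks   | 1–2 people                     | Light sales assist |
| Mid-market | $15K–$50K     | Department, 2–4 stakeholders      | 2–6 weeks         | Buying committee forming       | Inside sales       |
| Enterprise | > $50K        | Multiple departments, procurement | 6–16 weeks        | Formal committee + procurement | Field sales / AE   |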

Predict the band from a blend of firmographics and early product signals. Firmographic enrichment (employee count, industry, funding) gives a prior; early product signals (seats provisioned, SSO connected, data volume imported, number of distinct active users) update it. The prediction does not need to be perfect — it needs to be directionally correct often enough that band-specific weighting beats a global model. In practice, even a coarse three-bucket prediction with 70% accuracy outperforms a uniform model.

A common mistake is to wait for the confirmed ACV — which only exists after the deal closes — and so never apply band logic to live leads. Use the prediction, and let it sharpen over the account lifetime.
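
As a rough illustration of "firmographic prior, product-signal update", the sketch below predicts a band from a handful of fields. Every cut-off in it (the employee-count boundaries, the 25-seat trigger) and every field name is a hypothetical placeholder, not a value from this guide; a real predictor would be fit to the team's own closed-won data.

```python
from dataclasses import dataclass

# Bands ordered from lowest to highest predicted ACV.
BANDS = ["self_serve", "smb", "mid_market", "enterprise"]


@dataclass
class AccountSignals:
    employee_count: int      # from firmographic enrichment
    seats_provisioned: int   # early product signal
    sso_connected: bool      # early product signal


def firmographic_prior(employee_count: int) -> int:
    """Return a starting band index from company size (cut-offs are assumed)."""
    if employee_count < 3:
        return 0
    if employee_count < 50:
        return 1
    if employee_count < 500:
        return 2
    return 3


def predict_band(acct: AccountSignals) -> str:
    """Start from the firmographic prior, then let product signals move it."""
    idx = firmographic_prior(acct.employee_count)
    # Evidence of a larger rollout nudges the prediction up one band (assumed rule).
    if acct.sso_connected or acct.seats_provisioned >= 25:
        idx = min(idx + 1, len(BANDS) - 1)
    return BANDS[idx]
```

Rerunning `predict_band` whenever enrichment or product data changes is one simple way to let the prediction sharpen over the account lifetime.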

Weighting Signals Differently by Band

The core insight is that the same behavioral signal carries different predictive weight depending on the band. Two signal families dominate, and they trade places in importance as ACV rises:

  • Usage depth — how intensively a small set of users engages the core workflow (sessions per user, advanced features touched, workflow completions per active user).
  • Usage breadth — how widely adoption spreads across users and job functions (distinct active users, departments represented, integrations spanning multiple systems).

At low ACV, breadth is the stronger predictor: a small company where several people use the product is one that has embedded it into a team process and will convert. At high ACV, depth is the stronger predictor early, because enterprise expansion follows a land-and-expand pattern — a champion goes deep first, and breadth comes later through procurement-driven rollout. Weighting breadth too heavily at the enterprise band penalizes exactly the accounts that are progressing normally.

A workable weighting structure, expressed as the contribution each signal family makes to the 100-point score:

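| Signal family                                   | Self-serve | SMB | Mid-market | Enterprise |
|-------------------------------------------------|------------|-----|------------|------------|
| Usage depth                                     | 30         | 30  | 35         | 45         |
| Usage breadth                                   | 35         | 35  | 30         | 20         |
| Intent events (pricing, limits, upgrade clicks) | 25         | 25  | 20         | 15         |
| Account fit / firmographic strength             | 10         | 10  | 15         | 20         |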

Read down a column and the philosophy is visible. The self-serve and SMB columns lean on breadth and intent — fast buyers who explore widely and react to monetization triggers. The enterprise column leans on depth and fit — a champion building a case inside a high-quality account, where a pricing-page visit is far less meaningful than a power user running the core workflow daily.

The scoring formula

For each account, compute the band-weighted score as a normalized sum:

PQL Score (band b) =
    w_depth(b)   × NormDepth
  + w_breadth(b) × NormBreadth
  + w_intent(b)  × NormIntent
  + w_fit(b)     × NormFit

Each Norm term is the account's raw signal scaled to a 0–1 range against that band's own distribution, not a global distribution. Scaling depth against the global distribution would flatten enterprise depth scores against the ceiling: enterprise power users run more workflows than the average user across all bands, so nearly every enterprise account would land near the top of the range and the score would stop separating strong accounts from weak ones. Normalize within band so that an account scores high only relative to its peers, which is the comparison that actually predicts conversion.
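
The formula and the within-band normalization can be sketched in a few lines. The weights come from the signal-family table; the min-max scaling is only one possible way to map a raw signal onto 0–1 (a percentile rank would also work), and the function and key names are illustrative assumptions.

```python
# Weights copied from the signal-family table; each band sums to 100.
WEIGHTS = {
    "self_serve": {"depth": 30, "breadth": 35, "intent": 25, "fit": 10},
    "smb":        {"depth": 30, "breadth": 35, "intent": 25, "fit": 10},
    "mid_market": {"depth": 35, "breadth": 30, "intent": 20, "fit": 15},
    "enterprise": {"depth": 45, "breadth": 20, "intent": 15, "fit": 20},
}


def normalize_within_band(raw: float, band_values: list[float]) -> float:
    """Min-max scale a raw signal against its own band's observed range.

    Min-max is an assumed choice; the guide only requires a 0-1 scale
    computed from the band's own distribution.
    """
    lo, hi = min(band_values), max(band_values)
    if hi == lo:
        return 0.0  # no spread in the band, so the signal carries no information
    return min(max((raw - lo) / (hi - lo), 0.0), 1.0)


def pql_score(band: str, norm: dict[str, float]) -> float:
    """Weighted sum of the four normalized (0-1) signal families, out of 100."""
    weights = WEIGHTS[band]
    return sum(weights[family] * norm[family] for family in weights)
```

With normalized values of depth 0.9, breadth 0.2, intent 0.4, and fit 0.8, the same account scores 52 under the SMB weights and 66.5 under the enterprise weights: the deep-but-narrow pattern is rewarded only where it is the normal path to a deal.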

The intent term deserves a note: not all intent events are equal, and the strongest are compound. A single pricing-page view is weak; three pricing views plus an upgrade-button click within 48 hours is strong. The same principle governs sales-assist trigger instrumentation: single events are noise, clustered events are signal.
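
A compound-intent check along these lines might look like the following sketch. The event names `pricing_view` and `upgrade_click` are hypothetical labels, and the rule encoded is just the example from the paragraph above (three pricing views plus an upgrade click inside 48 hours).

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)


def is_compound_intent(events: list[tuple[datetime, str]]) -> bool:
    """True if some 48-hour window holds 3+ pricing views and an upgrade click."""
    events = sorted(events)
    # Any qualifying window can be slid to start at its earliest event,
    # so it is enough to test windows anchored at each event.
    for i, (start, _) in enumerate(events):
        in_window = [kind for ts, kind in events[i:] if ts - start <= WINDOW]
        if in_window.count("pricing_view") >= 3 and "upgrade_click" in in_window:
            return True
    return False
```

The same events spread over several weeks return `False`, which is the point: the cluster, not the count, is what feeds the intent term.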

Example Scoring Rubrics

Concrete rubrics make the weighting tangible. Each rubric below shows the raw thresholds that map a signal to its normalized value within band.

SMB rubric ($3K–$15K)

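| Signal                                  | 0 points    | Partial  | Full normalized value |
|-----------------------------------------|-------------|----------|-----------------------|
| Distinct active users (breadth)         | 1           | 2        | 3+ within 14 days     |
| Workflow completions per user (depth)   | < 3         | 3–9      | 10+                   |
| Integrations connected                  | 0           | 1        | 2+                    |
| Intent (pricing views + upgrade clicks) | 0           | 1–2      | 3+ in 48h             |
| Fit (employee count in band)            | out of band | adjacent | in band               |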
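
One way to turn a rubric like this into normalized values is a small tiering function. In the sketch below the cut-offs are read from the SMB rubric, but the 0.5 assigned to the "Partial" column and the signal key names are assumptions; the categorical fit row is left out for brevity.

```python
def tier_value(raw: float, partial_at: float, full_at: float) -> float:
    """Map a raw count to 0.0, 0.5 (partial), or 1.0 (full) using two cut-offs."""
    if raw >= full_at:
        return 1.0
    if raw >= partial_at:
        return 0.5  # assumed value for the "Partial" column
    return 0.0


# SMB cut-offs from the rubric: (partial_at, full_at) per signal.
SMB_RUBRIC = {
    "distinct_active_users_14d": (2, 3),
    "workflow_completions_per_user": (3, 10),
    "integrations_connected": (1, 2),
    "intent_events_48h": (1, 3),
}


def smb_normalized(raw_signals: dict[str, float]) -> dict[str, float]:
    """Apply the SMB rubric to a dict of raw signal counts."""
    return {
        name: tier_value(raw_signals[name], *cuts)
        for name, cuts in SMB_RUBRIC.items()
    }
```

The mid-market and enterprise rubrics fit the same shape with their own cut-off dictionaries.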

Mid-market rubric ($15K–$50K)

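| Signal                                       | 0 points | Partial | Full normalized value |
|----------------------------------------------|----------|---------|-----------------------|
| Distinct active users (breadth)              | 1–2      | 3–5     | 6+ within 21 days     |
| Departments represented                      | 1        | 2       | 3+                    |
| Workflow completions per active user (depth) | < 5      | 5–14    | 15+                   |
| Intent (demo request, pricing, limits)       | 0        | 1       | 2+ distinct types     |
| Fit (firmographic ICP match)                 | low      | medium  | high                  |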

Enterprise rubric (> $50K)

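| Signal                                     | 0 points | Partial | Full normalized value |
|--------------------------------------------|----------|---------|-----------------------|
| Power-user depth (sessions/week, top user) | < 3      | 3–7     | 8+                    |
| Advanced features adopted                  | 0–1      | 2–3     | 4+                    |
| SSO / admin / security config touched      | none     | viewed  | configured            |
| Breadth (distinct active users)            | 1        | 2–4     | 5+ within 45 days     |
| Fit (ICP + procurement signals)            | low      | medium  | high                  |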

Notice how the enterprise rubric tolerates low breadth far longer (the breadth full-value window extends to 45 days, against 14 for SMB) and rewards security and admin configuration — signals that a serious evaluation, not a casual trial, is underway. These distinctions are exactly what a global model erases.

Calibrating Thresholds Per Band

A score is meaningless without a threshold that converts it into a routing decision. The threshold is where most teams undermine an otherwise sound model: they pick a round number (say, 70) and apply it everywhere. The correct threshold is derived empirically, per band, from the conversion outcomes of past accounts.

The calibration process:

  1. Assemble a labeled dataset per band. Pull historical accounts in each predicted band, with their PQL scores at a fixed point (for example, score as of day 14) and their eventual outcome (converted to paid within the band's typical sales cycle: yes/no).
  2. Plot conversion rate by score decile within band. For each band, bucket accounts into score deciles and compute the conversion rate of each decile. The relationship should be monotonic — higher score, higher conversion. If it is not, the weighting is wrong, not the threshold.
  3. Find the inflection point. Identify the score at which conversion rate jumps materially above the band's base rate. That inflection is the candidate threshold. For SMB it often sits lower (fast buyers convert from moderate scores); for enterprise it sits higher and later.
  4. Apply a precision/recall trade-off explicit to the band. SMB sales assist is cheap, so favor recall — a lower threshold that catches more buyers even at the cost of some false positives. Enterprise field sales is expensive, so favor precision — a higher threshold that only routes accounts with strong consensus signals. SaaS Capital's research on go-to-market efficiency underscores that the cost of a sales touch must be matched to deal size; over-investing reps in low-ACV deals is one of the most common efficiency leaks in scaling SaaS (SaaS Capital, B2B SaaS metrics research).
  5. Lock thresholds and set a recalibration cadence. Record the threshold per band and the date. Recalibrate quarterly and after any onboarding, pricing, or packaging change.
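
Steps 2 and 3 can be sketched for a single band as follows. The equal-sized buckets and the `lift` multiplier that defines "materially above the base rate" are assumptions chosen for illustration; the guide does not prescribe a specific rule for locating the inflection.

```python
from typing import Optional


def bucket_rates(scores: list[float], converted: list[bool],
                 n_buckets: int = 10) -> list[float]:
    """Conversion rate per score bucket, lowest-scoring bucket first."""
    if len(scores) < n_buckets:
        raise ValueError("need at least one account per bucket")
    ranked = sorted(zip(scores, converted))
    size = len(ranked) // n_buckets
    rates = []
    for b in range(n_buckets):
        # The last bucket absorbs any remainder.
        end = (b + 1) * size if b < n_buckets - 1 else len(ranked)
        chunk = ranked[b * size:end]
        rates.append(sum(c for _, c in chunk) / len(chunk))
    return rates


def is_monotonic(rates: list[float]) -> bool:
    """Step 2 sanity check: conversion should not fall as score rises."""
    return all(a <= b for a, b in zip(rates, rates[1:]))


def candidate_threshold(scores: list[float], converted: list[bool],
                        lift: float = 1.5, n_buckets: int = 10) -> Optional[float]:
    """Lowest score of the first bucket converting at >= lift x the base rate."""
    base = sum(converted) / len(converted)
    if base == 0:
        return None  # nothing converted, so there is no inflection to find
    ranked = sorted(zip(scores, converted))
    size = len(ranked) // n_buckets
    for b, rate in enumerate(bucket_rates(scores, converted, n_buckets)):
        if rate >= lift * base:
            return ranked[b * size][0]
    return None
```

Run it once per band on that band's labeled accounts. If `is_monotonic` fails, revisit the weights before touching the threshold.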

The output is a small table that the routing layer consumes:

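| Band       | Threshold (of 100) | Optimization | Expected precision | Expected recall |
|------------|--------------------|--------------|--------------------|-----------------|
| SMB        | 55                 | Recall       | ~45%               | ~75%            |
| Mid-market | 65                 | Balanced     | ~55%               | ~60%            |
| Enterprise | 75                 | Precision    | ~70%               | ~45%            |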

These precision and recall figures are illustrative starting points; the calibration analysis produces the real ones from the team's own data.
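
Filling in the real precision and recall columns takes only a few lines once the labeled dataset exists. This is a minimal sketch; the function name and the convention that a score at or above the threshold counts as "routed" are assumptions.

```python
def precision_recall(scores: list[float], converted: list[bool],
                     threshold: float) -> tuple[float, float]:
    """Precision and recall of routing every account scoring >= threshold."""
    routed = [c for s, c in zip(scores, converted) if s >= threshold]
    true_pos = sum(routed)             # routed accounts that converted
    total_converted = sum(converted)   # all accounts that converted
    precision = true_pos / len(routed) if routed else 0.0
    recall = true_pos / total_converted if total_converted else 0.0
    return precision, recall
```

Sweeping `threshold` across a band's historical scores makes the trade-off visible: raising it generally buys precision at the cost of recall, and the band's cost of a sales touch decides where to stop.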

Routing Logic and the Hold-for-Scoring Window

A score crossing a threshold should not always fire a route immediately. Routing logic adds two pieces of timing control: a hold-for-scoring window and a band-specific rep assignment.

The hold-for-scoring window is a deliberate delay before routing, sized to each band's activation speed. SMB accounts activate in days, so a short hold (24–48 hours) lets the score stabilize without losing the buying window. Enterprise accounts activate over weeks, so an early threshold crossing is more likely to be a single eager user than genuine account momentum — a longer hold (5–7 days) with a requirement that the score hold above threshold across the window filters out flukes.
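
The hold check itself is small. The sketch below assumes one score snapshot per day, recorded from the day the account first crossed its threshold; the `sustained` flag models the stricter enterprise rule, and all names are hypothetical.

```python
def ready_to_route(scores_since_crossing: list[float], threshold: float,
                   hold_days: int, sustained: bool) -> bool:
    """Decide whether an account has cleared its hold-for-scoring window.

    scores_since_crossing: daily scores, oldest first, starting on the day
    the account first crossed the threshold (an assumed data shape).
    """
    if len(scores_since_crossing) < hold_days:
        return False  # the hold window has not elapsed yet
    window = scores_since_crossing[-hold_days:]
    if sustained:
        # Enterprise-style rule: every day in the window must stay above threshold.
        return all(s >= threshold for s in window)
    # Lighter rule: only the score at the end of the window has to hold.
    return window[-1] >= threshold
```

An enterprise account that dips below 75 on any day of a 7-day hold is filtered out under the sustained rule, while an SMB account needs only to still be above 55 when its 2-day hold ends.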

The rep assignment routes by band and by trigger type. Not every PQL belongs in the same queue. The principle of routing different signals to different pools is the foundation of a clean PLG-to-sales-led handoff, and it applies here directly.

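| Band       | Score ≥ threshold          | Hold window | Routes to             | Touch type                      |
|------------|----------------------------|-------------|-----------------------|---------------------------------|
| Self-serve | (suppressed)               | n/a         | No rep                | Automated nurture only          |
| SMB        | Yes                        | 24–48h      | SDR pool, round-robin | Templated email + optional call |
| Mid-market | Yes                        | 3 days      | Inside sales          | Personalized outreach           |
| Enterprise | Yes, sustained             | 5–7 days    | Named AE for account  | Champion-focused, multi-thread  |
| Any band   | Hard intent (demo request) | None        | Fast lane             | Immediate human response        |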

The bottom row is the override: an explicit, unambiguous buying signal — a demo request, a contact-sales click — bypasses the hold window in any band. These are not behavioral inferences; they are direct asks, and they deserve a fast lane.
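
The routing table translates almost directly into a lookup with an override. In this sketch the queue names are hypothetical, the thresholds are the illustrative figures from the calibration table, and the hold windows use the upper end of each range (48 hours for SMB, 168 hours for enterprise) as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    queue: str       # hypothetical queue identifier
    hold_hours: int  # hold-for-scoring window before the rep is notified


# Illustrative thresholds; real ones come from per-band calibration.
THRESHOLDS = {"smb": 55, "mid_market": 65, "enterprise": 75}

ROUTES = {
    "smb": Route("sdr_round_robin", 48),
    "mid_market": Route("inside_sales", 72),
    "enterprise": Route("named_ae", 168),
}


def route(band: str, score: float, hard_intent: bool) -> Optional[Route]:
    """Return the route for an account, or None if no rep should engage."""
    if hard_intent:
        # Demo request or contact-sales click: bypass the hold in any band.
        return Route("fast_lane", 0)
    if band == "self_serve":
        return None  # routing suppressed; automated nurture only
    if score >= THRESHOLDS[band]:
        return ROUTES[band]
    return None
```

Checking `hard_intent` first keeps the override from being shadowed by the self-serve suppression rule.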

Score Decay: Keeping Signals Fresh

Behavioral signals are perishable. An account that hit a usage limit yesterday is hotter than one that hit it three weeks ago and went quiet. Without decay, scores ratchet up monotonically and never reflect cooling interest, which leads to sales outreach landing on accounts that have already moved on.

Apply an exponential decay to each behavioral signal based on time since the event:

weighted_signal = raw_signal × e^(−λ × days_since_event)

The decay constant λ should differ by band, mirroring the activation timelines. SMB signals decay fast because the buying window is short — a half-life of around 7 days. Enterprise signals decay slowly because the evaluation is long — a half-life of around 30 days. A signal's half-life is the number of days for its weight to fall to 50%; convert to λ with λ = ln(2) / half_life.

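| Band       | Signal half-life | Implied λ (per day) | Rationale                        |
|------------|------------------|---------------------|----------------------------------|
| SMB        | ~7 days          | 0.099               | Short buying window, stale fast  |
| Mid-market | ~14 days         | 0.050               | Moderate cycle                   |
| Enterprise | ~30 days         | 0.023               | Long evaluation, signals persist |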
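
The decay formula and the half-life conversion can be checked with a short sketch. The half-lives are the ones in the table; the function names are assumptions.

```python
import math

# Half-lives in days, taken from the table above.
HALF_LIFE_DAYS = {"smb": 7, "mid_market": 14, "enterprise": 30}


def decay_constant(half_life_days: float) -> float:
    """Convert a half-life to the decay constant: lambda = ln(2) / half_life."""
    return math.log(2) / half_life_days


def decayed_signal(raw_signal: float, days_since_event: float, band: str) -> float:
    """weighted_signal = raw_signal * e^(-lambda * days_since_event)."""
    lam = decay_constant(HALF_LIFE_DAYS[band])
    return raw_signal * math.exp(-lam * days_since_event)
```

A three-week-old event keeps 12.5% of its weight under the SMB half-life (three half-lives have passed) but roughly 62% under the enterprise half-life, which is the asymmetry the band-specific constants are meant to encode.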

Decay also serves a hygiene function: it naturally retires accounts that crossed a threshold once and never re-engaged, preventing the pipeline from clogging with stale high scores. ProfitWell's analysis of expansion and retention motions repeatedly finds that timing of outreach — not just targeting — separates effective from ineffective sales-assist programs (ProfitWell / Paddle research). Decay is the mechanism that encodes timing into the score itself.

Common Failure Modes

Even a well-designed model fails in predictable ways. Watch for these:

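  1. Global normalization. Scaling signals against a global distribution rather than within band distorts both ends: enterprise depth saturates near the top of the range, and small-account breadth bunches near the bottom, so neither separates strong accounts from weak ones. Always normalize within band.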
  2. A single threshold across bands. The most common error. A threshold tuned to the average over-fires for enterprise and under-fires for SMB.
  3. No decay. Scores ratchet up and route stale accounts. Outreach lands after the window has closed.
  4. Confusing fit with intent. A high-fit, low-PQL account is a nurture target, not a sales-ready lead. A high-PQL, low-fit account is a self-serve customer, not an opportunity. Keep the two scores separate and combine them only at the routing step.
  5. Routing all bands to one rep pool. Sending a $5K SMB lead to a field AE wastes an expensive resource; sending a $75K enterprise account to an SDR templated-email queue squanders the largest deal. Assign by band.
  6. Never recalibrating. Onboarding and pricing changes shift the behavior distribution. A threshold set against last quarter's distribution drifts out of tune within a release cycle.
  7. Scoring before band prediction is possible. Firing a band-specific model before any band signal exists forces a default band — usually the average — reintroducing the global-model problem for brand-new accounts. Hold scoring until at least a coarse band prediction is available.

Frequently Asked Questions

What is a PQL scoring model?

A PQL (product-qualified lead) scoring model assigns a numerical score to each user or account based on in-product behavior that predicts purchase readiness. Unlike a marketing-qualified lead score, which weights demographic and marketing-engagement signals, a PQL score weights product usage signals — feature adoption, usage depth, usage breadth, and intent events like viewing pricing or hitting a usage limit. The output is a routing decision: who gets a sales touch, when, and from which rep pool.

Why should a PQL score change based on ACV band?

Buying behavior differs structurally by deal size. A $5K-ACV SMB buyer often evaluates and decides in days, with a single decision-maker exploring a broad set of features. A $75K-ACV enterprise buyer evaluates over weeks, with multiple stakeholders going deep on a narrow set of workflows. A single global model optimized for the average will fire too late for fast SMB buyers and too early for slow enterprise buyers who have not yet built internal consensus.

How do you determine the ACV band for a lead before they have purchased?

Use a predicted band, not a confirmed one. Combine firmographic enrichment (company size, industry, funding stage) with early product signals (seats provisioned, integrations connected, data volume imported). A 5-person company importing 200 records is almost certainly SMB; a 2,000-employee company connecting SSO in week one is almost certainly enterprise. Refine the prediction as more signal arrives.

What signals belong in a PQL score versus a fit score?

Keep them separate. A fit score answers whether an account is worth selling to at all, using firmographics and ICP match. A PQL score answers whether the account is ready to buy now, using behavioral signals. Routing combines them: a high-PQL, low-fit account is a self-serve customer, and a high-fit, low-PQL account is a nurture target.

How often should PQL thresholds be recalibrated?

Per band, at least quarterly, and immediately after any change to onboarding, pricing, or packaging. Thresholds drift because the behavior distribution drifts — a new onboarding flow that pushes more users to connect an integration inflates scores, and a threshold set against the old distribution over-fires.

What is score decay and why does it matter?

Score decay reduces a signal's weight as time passes since the event. A user who hit a usage limit yesterday is hotter than one who hit it three weeks ago. Without decay, stale signals accumulate and trigger outreach after the buying window has closed, wasting rep time and annoying prospects.

Can a PQL scoring model work without a data warehouse?

A basic model can run inside a product analytics tool or a CRM with computed fields, but ACV-band-aware scoring with decay functions and per-segment calibration is far easier to maintain in a warehouse where event data, firmographic enrichment, and CRM outcomes live together. The warehouse also enables the retrospective calibration analysis that sets thresholds correctly.


Conclusion

A PQL score is a routing instrument, and a routing instrument tuned to the average routes the average well and everything else poorly. ACV band is the axis that fixes this, because it maps directly to how the buyer evaluates and which sales resource — if any — should respond. The build is concrete: predict the band, weight depth and breadth in proportion to it, normalize within band, calibrate a separate threshold per band against real conversion outcomes, route through a hold window sized to activation speed, and decay every signal so the score reflects current interest rather than accumulated history.

The discipline that makes it work is empirical calibration, not model complexity. Start by separating the enterprise threshold from the SMB threshold and measuring the lift; that single change typically recovers more misrouted pipeline than any amount of additional signal engineering. From there, the natural next step is rolling individual user scores up to the account level — the subject of the companion guide on product-qualified account rollup design — and instrumenting the sales-assist triggers that turn a high score into a well-timed human conversation.
