Pricing

Rolling Out a Pricing Experiment Without Cannibalizing Existing ARR

How to architect pricing experiments that expose only new cohorts, define grandfathering mechanics before rollout, avoid segment-mix confounds, and measure results with the correct observation window.

SaaS Science TeamJune 14, 202613 min read
pricing experimentpricing rolloutpricing testingsaas pricinga/b testingexisting customers

Rolling Out a Pricing Experiment Without Cannibalizing Existing ARR

Key Takeaways

  • Pricing experiments on existing customers generate churn risk and NRR distortion — new-cohort-only exposure is the required architecture.
  • Grandfathering mechanics must be defined before the experiment launches, not as a follow-up after it succeeds.
  • Segment mix confound is the most common reason pricing experiments produce uninterpretable results: the experimental condition attracts a different buyer segment, not a better-converting version of the same segment.
  • The correct observation window is 90 days minimum — 30-day conversion improvements can reverse at first renewal.
  • Revenue per new signup (conversion rate × ACV) is the right primary metric, not conversion rate alone.

Pricing is the highest-leverage variable in SaaS growth, and pricing experiments have a uniquely high potential to both generate revenue and destroy it simultaneously. The same experiment that increases new customer conversion rate by 15% can simultaneously trigger churn among existing customers who discover the new lower price and demand to be moved to it. The experiment that tests a premium tier may drive up-tier conversions while suppressing the number of customers who convert to the base product.

Running pricing experiments safely requires an architecture that isolates the experiment from existing revenue, a statistical framework that produces interpretable results, and a post-experiment plan that is designed before the first data point is collected.

See Your Growth Ceiling NowTry Free

The New-Cohort-Only Architecture

The foundational rule of pricing experiments in SaaS is: existing customers must never be exposed to experimental pricing. The reasons are both commercial and statistical.

Commercial: Existing customers who discover they are paying more than new customers under an experimental pricing condition will demand the lower price. Depending on the gap and the customer's relationship with the vendor, this demand can be delivered as a CS inquiry, a renewal threat, or an immediate downgrade request. In enterprise contexts, it can surface in contract renegotiation. Any of these outcomes generates revenue impact from the existing ARR base that is not part of the experiment's intended scope.

Statistical: If existing customers are included in the experimental condition, the treatment group mixes the effects of new-customer conversion (where pricing is most impactful) with the effects of existing-customer reaction (which reflects relationship dynamics, switching costs, and historical value perception — very different from conversion dynamics). The combined signal is uninterpretable.

The new-cohort-only architecture:

  1. Define the experiment start date. All customers who signed up before this date stay on current pricing unconditionally, for the duration of the experiment and beyond.
  2. Randomize new signups at account creation. Each new account is randomly assigned to the control condition (current pricing) or the experimental condition (new pricing) at the moment of signup. The assignment is stored in the account record and applied consistently to all pricing page views, upgrade flows, and billing events for that account.
  3. Serve pricing pages conditionally. The pricing page renders based on the account's experiment assignment. A new account assigned to the experimental condition sees experimental pricing on every visit. An existing account assigned to the control condition (or signed up before the experiment start date) sees current pricing.

The assignment must be account-level (not session-level or user-level) and must persist for the lifetime of the account, not just the experiment period. A customer who was assigned to the experimental condition and converted at experimental pricing must continue on experimental pricing until a deliberate pricing change is made.

Defining Grandfathering Mechanics Before Launch

The most common mistake in pricing experiments is treating grandfathering as a follow-up decision that happens after the experiment produces results. If the experiment succeeds and the new pricing becomes the default, what happens to existing customers? If this question does not have an explicit answer before the experiment launches, the answer will be made under time pressure after the results are in — and the answer made under time pressure is rarely the optimal one.

The three grandfathering scenarios to define before launch:

Scenario A: Experimental pricing is higher than current pricing (price increase test). If the experiment succeeds and higher pricing is adopted: existing customers are grandfathered at current pricing for a defined period (typically until their next annual renewal), after which they migrate to the new pricing with appropriate notice. The grandfathering period should be stated in the experiment design document, not determined post-hoc.

Scenario B: Experimental pricing is lower than current pricing (price decrease test). If the experiment succeeds and lower pricing is adopted: existing customers on the current (higher) pricing present a customer success risk — they are paying more than new customers for the same product. The options: (1) automatically migrate all existing customers to the new lower pricing at their next renewal (most customer-friendly, highest ARR impact); (2) proactively migrate immediately with a credit for the overpaid period (best for customer trust, highest operational complexity); (3) do not migrate existing customers — grandfathered at current pricing indefinitely (protects revenue but creates two-tier pricing that is difficult to maintain). Most teams choose option 1, planned in advance.

Scenario C: Experimental pricing restructures the tier model (not just a price level change). If the experiment changes the plan structure (e.g., testing a new Growth tier at a different feature composition), existing customers on plans that no longer exist must be mapped to equivalent new plans at renewal. The mapping must be explicit (not "we'll figure it out at renewal") and must be disclosed to customers before migration.

For context on how grandfathering decisions interact with longer-term pricing strategy, the framework in SaaS grandfathering pricing strategy provides the relevant principles — the decisions made during a pricing experiment's grandfathering design are a smaller-scale version of the same strategic choices.

The Segment Mix Confound

The most underappreciated threat to pricing experiment validity is segment mix confound: the experimental pricing attracts a different customer segment than the control pricing, making it impossible to attribute results to pricing rather than segment composition.

The confound mechanism: a lower-priced experimental condition attracts more price-sensitive customers. These customers may have higher conversion rates in the first 30 days but lower retention rates at 90 days, lower expansion rates, and lower LTV. The experiment appears to show a pricing improvement (higher conversion) but actually shows a segment shift (lower-quality customers converting at a rate that will not sustain).

The detection method: track not just conversion rate and initial ACV, but also customer characteristics (company size, ICP score, self-reported use case, channel attribution) across experimental conditions. If the experimental condition customers are systematically different from the control condition customers in ways that predict LTV, the pricing effect is confounded with the segment effect.

The mitigation method: stratified randomization. Before randomizing accounts to conditions, define the strata that matter for LTV (company size tier, channel, product line, geography) and ensure that randomization is balanced within each stratum. A balanced stratum means that 50% of large-company signups go to control and 50% go to experimental, 50% of inbound channel signups go to control, and so on. This prevents accidental imbalances that would confound the segment and pricing effects.

Stratified randomization is more complex to implement than pure randomization but produces cleaner results in markets where conversion volume is moderate (hundreds per month rather than thousands). In high-volume self-serve products with thousands of monthly signups, pure randomization typically achieves balance by volume alone.

Statistical Power and Sample Size

Pricing experiments are often run without a sample size calculation — the team collects data until the results look significant (or until patience runs out) and calls it done. This approach has a name: it is called optional stopping, and it reliably produces false positive results.

The sample size calculation for a pricing experiment requires:

  • Baseline conversion rate (current: what percentage of free/trial accounts convert to paid under current pricing)
  • Minimum detectable effect (MDE: what is the smallest conversion rate change that would be practically significant enough to justify a pricing change)
  • Statistical power (1 - β: typically 80–90%, meaning an 80–90% probability of detecting the effect if it exists)
  • Significance level (α: typically 5%, meaning a 5% probability of a false positive)

For a baseline conversion rate of 5% and an MDE of 1 percentage point (detecting a change from 5% to 6%), with 80% power and 5% significance, the required sample size is approximately 4,000 accounts per condition — 8,000 total. If the product acquires 200 new accounts per month, the experiment requires 40 months to reach significance. This is impractical.

The practical resolution is often to increase the MDE (accept that smaller effects are undetectable) or to run the experiment on a higher-traffic pricing page element rather than end-to-end conversion. The statistical power framework for SaaS pricing tests covers the full calculation methodology and the tradeoffs between effect size, sample size, and experiment duration.

The 90-Day Observation Window

The most common error in interpreting pricing experiment results is stopping data collection too early. A 30-day window captures conversion rate — which is useful — but misses the most important revenue signal: first-renewal behavior.

Price-sensitive customers who convert under a lower-priced experimental condition often convert at rates that look identical to the control condition at 30 days. At first renewal (day 31 for monthly billing, day 61 for the second renewal), a subset of these customers churn at higher rates than the control. The experiment that showed a 15% conversion rate improvement at 30 days may show a 2% net revenue improvement at 90 days, or no improvement at all, once first-renewal churn is accounted for.

The 90-day window is the minimum for experiments involving free-to-paid conversion with monthly billing. For annual billing (where first renewal occurs at month 12–13), the observation window must be extended accordingly — which makes annual billing pricing experiments effectively multi-year commitments.

The 90-day requirement is not academic. ProfitWell's analysis of SaaS pricing experiment results found that 40% of pricing experiments that showed positive results at 30 days showed neutral or negative results at 90 days, primarily due to first-renewal churn effects. Running experiments to 90 days before making production pricing decisions is a quality standard, not optional.

Interpreting Results and Making the Production Decision

When the experiment period concludes and the observation window closes, the primary decision metric is not conversion rate — it is revenue per new account at 90 days.

Revenue per new account at 90 days = (conversion rate) × (average ACV at conversion) × (90-day retention rate)

This metric integrates conversion, pricing level, and early retention into a single number that reflects the actual revenue value of a customer acquired under each condition. An experimental condition with higher conversion rate but lower ACV or lower retention may or may not produce higher revenue per new account at 90 days — the calculation is the only way to know.

Secondary considerations before making the production decision:

  • Support load: Does the experimental condition generate more or fewer support tickets? A lower-priced condition that attracts less sophisticated customers may increase support cost per customer.
  • Expansion trajectory: Are customers in the experimental condition expanding (adding seats, increasing usage) at the same rate as the control? If not, the 90-day revenue comparison understates the long-term value difference.
  • Channel effects: Is the experiment result consistent across acquisition channels? A pricing change that improves conversion for organic traffic but worsens it for paid traffic may need channel-specific pricing or rollout sequencing.

The connection to pricing page design is direct: the pricing experiment results determine which pricing page variant goes live. The mechanics of pricing page conversion experiments and the statistical rigor of pricing A/B test design both apply to the production rollout decision.

Frequently Asked Questions

What is the correct experimental unit for a SaaS pricing A/B test?

The experimental unit is the account (or company), not the individual user. Pricing in SaaS is a company-level decision — the plan price is the same for every user within an account. Randomizing at the user level creates situations where two users in the same company see different prices, which is commercially untenable. All randomization for pricing experiments must occur at the account level.

How do you prevent pricing experiment leakage?

Leakage mitigation: limit the geographic or channel scope of the experiment to reduce cross-contamination; do not publicize pricing changes until the experiment concludes; ensure all direct pricing page URLs carry the correct experiment condition; monitor for complaints from control customers about pricing discrepancies, which indicate leakage.

What metrics should be tracked in a pricing experiment?

Primary metric: revenue per new account at 90 days (conversion rate × average ACV × 90-day retention rate). Secondary metrics: conversion rate, average plan tier at conversion, trial-to-paid conversion rate, time-to-convert, and churn rate at first renewal. Avoid using total ARR as a primary metric — it conflates pricing effect with business growth rate.

How do you handle a pricing experiment that produces significant conversion rate improvement but lower average contract value?

Calculate revenue per new signup (conversion rate × average ACV) for both conditions. If the product has high expansion potential, project LTV by including the expansion revenue trajectory of each cohort. A condition with higher conversion at lower initial ACV may have higher LTV if expansion rates are proportionally higher — but this requires empirical measurement, not assumption.

What is the minimum observation window for a SaaS pricing experiment?

The minimum observation window is 90 days from the start of new customer acquisition under each condition. This window captures: conversion rate within the standard trial period, the first billing cycle, any immediate post-conversion churn, and (for monthly billing) the second renewal where price-sensitive churn often surfaces. Experiments measured at 30 days frequently show conversion improvements that reverse at 90 days.

How do you design the grandfathering policy for a successful pricing experiment?

The grandfathering policy must be decided before the experiment launches. For price increase experiments: grandfather existing customers at current pricing until next annual renewal. For price decrease experiments: options include automatic migration at next renewal, proactive migration with credit, or indefinite grandfathering at current pricing. The choice affects both revenue and customer trust.

Can you run a pricing experiment on enterprise customers?

Enterprise pricing experiments are significantly more complex because prices are often individually negotiated. A "pricing experiment" in enterprise contexts more accurately describes structured sales motion testing — presenting different initial offer structures to different prospective accounts and measuring close rates and ACV. Statistical significance requires large sample sizes that enterprise deal volumes rarely support within a reasonable timeframe.

See Your Growth Ceiling Now

Calculate when your SaaS growth will plateau — free, no signup required.

Calculate Your Growth Ceiling

Conclusion

Pricing experiments are one of the highest-leverage activities in SaaS revenue growth, and they are also one of the easiest to get wrong. The failure modes — exposing existing customers, stopping too early, misinterpreting segment confounds, skipping grandfathering design — all share a common cause: treating pricing experiments as informal tests rather than rigorous scientific investigations with defined protocols.

The protocols are not complex. New-cohort-only exposure prevents existing ARR damage. Pre-defined grandfathering mechanics prevent post-experiment policy chaos. Stratified randomization prevents segment confound. A 90-day observation window prevents premature conclusions. Revenue per new account as the primary metric prevents conversion-rate optimism that hides retention problems.

Products that apply these protocols run pricing experiments that produce actionable, trustworthy results. Products that skip them run pricing experiments that produce interesting-looking data and uncertain decisions.

Frequently Asked Questions

What is the correct experimental unit for a SaaS pricing A/B test?
The experimental unit is the account (or company), not the individual user. Pricing in SaaS is a company-level decision — the plan price is the same for every user within an account. Randomizing at the user level within an account creates situations where two users in the same company see different prices, which is commercially untenable and will surface in sales conversations. All randomization for pricing experiments must occur at the account level, which typically means randomizing at first account creation (signup) or first pricing page visit.
How do you prevent pricing experiment leakage?
Pricing experiment leakage occurs when a customer assigned to the control condition discovers the experimental pricing (e.g., through a friend who signed up under the experiment, through a social media post, or through a referral link that carries experiment parameters). Mitigation: (1) limit the geographic or channel scope of the experiment to reduce cross-contamination; (2) do not publicize pricing changes until the experiment concludes; (3) ensure that all direct URLs to pricing pages that include experiment parameters carry the correct condition, not the default; (4) monitor for direct complaints or questions about pricing from control customers, which indicate leakage.
What metrics should be tracked in a pricing experiment?
Primary metric: MRR per new account within 30 days of signup (for conversion experiments) or MRR per account at 90 days (to capture first-renewal effects). Secondary metrics: conversion rate from free/trial to paid (if applicable), average plan tier at conversion, trial-to-paid conversion rate, time-to-convert, and churn rate at first renewal. Avoid using total ARR as a primary metric in a pricing experiment — it conflates the pricing effect with the natural growth rate of the business during the experiment period.
How do you handle a pricing experiment that produces significant conversion rate improvement but lower average contract value?
This is the classic conversion-vs.-ACV tradeoff in pricing experiments. A lower price converts more customers but at lower revenue per customer. The correct analysis: calculate the revenue per new signup (conversion rate × average ACV) for both conditions. If the product has a network effect or high expansion potential, also project LTV by including the expansion revenue trajectory of each cohort. A condition with higher conversion at lower initial ACV may have higher LTV if expanded seats, usage, or upsell rates are proportionally higher.
What is the minimum observation window for a SaaS pricing experiment?
The minimum observation window is 90 days from the start of new customer acquisition under each condition. This window captures: (1) the conversion rate from free/trial to paid within the standard trial period (typically 14–30 days); (2) the first full billing cycle at the paid plan; (3) any churn that occurs immediately after conversion (the highest-risk churn window for price-sensitive customers); and (4) for monthly billing, the second renewal — which is where price-sensitive churn often surfaces. Experiments measured at 30 days frequently show conversion rate improvements that reverse at the 60–90 day mark as price-sensitive customers churn.
How do you design the grandfathering policy for a successful pricing experiment?
If a pricing experiment produces positive results and the experimental pricing is adopted as the new default, existing customers fall into three categories: (1) new-cohort customers who signed up under the experimental pricing — they are already on the new pricing; (2) existing customers whose current pricing is lower than the experimental pricing — they need a grandfathering decision; (3) existing customers whose current pricing is higher than the experimental pricing (if the experiment tested a price decrease) — they may be moved to the new lower pricing proactively or at renewal. The grandfathering policy should be decided before the experiment is launched, not after it succeeds.
Can you run a pricing experiment on enterprise customers?
Enterprise pricing experiments are significantly more complex than self-serve experiments because enterprise prices are often negotiated individually and the customer is aware of the pricing they were quoted. A 'pricing experiment' in enterprise contexts more accurately describes structured sales motion testing — presenting different initial offer structures to different prospective enterprise accounts and measuring close rates and ACV. This requires careful governance to ensure the same rep does not see wildly different approved offer structures, and requires a larger sample (enterprise deal volume is low) to achieve statistical significance.

Related Posts