Run Better Experiments

February 24, 2024

In this post, we’ll explore some useful frameworks to help you run better experiments at a startup. But first, let’s recap what a startup actually is. As Steve Blank aptly put it, “A startup is a temporary organization designed to search for a repeatable and scalable business model.” But 75% of these temporary organisations fail. How can you improve your search and increase your odds of survival? By running better experiments. In other words, by getting better at Build ⇒ Measure ⇒ Learn.

The lean startup approach turned the conventional wisdom of favouring planning, intuition, and ‘design up-front’ methods on its head. Ventures now often adopt the principles of failing fast and learning continually through experimentation. From Calliper’s work with more than 100 startups, we’ve seen first-hand that successful startups experiment aggressively with data. A lean startup is essentially a stack of assumptions, and experimenting helps you validate them in your search for a scalable and repeatable business model. For VC-backed startups, this assumption stack typically looks like this:

Seed Stage

We can find a unique insight into an important problem
We can clearly define our target customer
We can build a 10x solution

Series A Stage

We can get customers to buy our product
We can find a sustainable and repeatable acquisition channel
We can generate profit

Growth Stage

We can fend off the competition
We can take a meaningful share of this market

Crucially, the stack will only balance if the previous stage’s assumptions provide a sturdy foundation. Generally, as a prioritisation rule, you should focus on the riskiest assumptions relevant to your stage. To validate these assumptions, you’ll have to do plenty of experimenting. But how should you go about this?

Experiment Process

Lay out your riskiest assumptions
Ideate experiments to test these assumptions
Prioritise experiments using ‘RICE’
Run a regular review using ‘GAINS’

The first two steps are self-explanatory, so let’s start with the third: prioritising experiments.

Prioritising Experiments

One useful method for prioritising growth and product experiments is the RICE framework. Developed by Intercom, RICE is a prioritisation model designed to improve internal decision-making. To use it, you score each experiment on Reach, Impact, Confidence, and Effort, then combine those scores into a single aggregate score with a formula. Here’s how to score each component:

Reach

First, estimate how many people your experiment could reach within a set timeframe. You’ll need to define what counts as ‘Reach’ and the timeframe over which you’ll measure it. For example, Reach could refer to the number of new customers, free-trial signups, or feature usage, and the timeframe could be a week, a month, or a quarter. If you choose to measure New Customers and expect an experiment to generate 150 new customers in that timeframe, its Reach score is 150. Note that Reach must use the same base measure across experiments for them to be comparable. You can still estimate New Customers for one experiment and Free Trials for another, as long as you convert Free Trials into New Customers using the expected trial-to-customer conversion rate.
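To make the conversion concrete, here is a minimal Python sketch that expresses a free-trial Reach estimate in New Customers. The function name and the 600-trial, 25% conversion figures are hypothetical, chosen only so the result lines up with the 150-customer example above.

```python
def reach_in_new_customers(expected_free_trials: float, trial_to_customer_rate: float) -> float:
    """Convert a Reach estimate in free trials into New Customers.

    Hypothetical helper: multiplies expected trials by an assumed
    trial-to-customer conversion rate (a fraction between 0 and 1).
    """
    if expected_free_trials < 0:
        raise ValueError("expected_free_trials cannot be negative")
    if not 0.0 <= trial_to_customer_rate <= 1.0:
        raise ValueError("trial_to_customer_rate must be between 0 and 1")
    return expected_free_trials * trial_to_customer_rate


# Hypothetical experiment: 600 expected free trials, 25% of which convert.
reach = reach_in_new_customers(600, 0.25)
print(reach)  # 150.0, directly comparable with a New Customers Reach of 150
```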

Impact

Next, estimate how much the experiment will affect outcomes. Impact is often a qualitative judgment about the size of the change the experiment will bring. It can be tangible, like a lower CAC, or intangible, like improved customer sentiment. Accurately isolating one experiment’s impact from other factors is extremely difficult, so simplify things by scoring potential impact on a basic ordinal scale. Intercom developed a five-tier scale for this:

3 = massive impact

2 = high impact

1 = medium impact

0.5 = low impact

0.25 = minimal impact
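If you track scores in a script or spreadsheet export, it can help to accept only the five tiers above, so nobody slips in an off-scale value such as 1.5. A minimal Python sketch; the `IMPACT_SCALE` dictionary and `impact_score` function are our own hypothetical names, not part of Intercom’s framework:

```python
# Intercom's five impact tiers, keyed by label.
IMPACT_SCALE = {
    "massive": 3.0,
    "high": 2.0,
    "medium": 1.0,
    "low": 0.5,
    "minimal": 0.25,
}


def impact_score(label: str) -> float:
    """Look up the Impact score for a tier label (case-insensitive)."""
    try:
        return IMPACT_SCALE[label.strip().lower()]
    except KeyError:
        # Reject anything that is not one of the five tiers.
        raise ValueError(
            f"unknown impact tier {label!r}; expected one of {list(IMPACT_SCALE)}"
        ) from None


print(impact_score("High"))     # 2.0
print(impact_score("minimal"))  # 0.25
```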

Confidence

This component assesses how confident you are in your Reach and Impact estimates, based on how much evidence sits behind them. Are they backed by intuition or by data? For example, if your Impact score is data-backed but your Reach score is based on intuition, a lower Confidence score accounts for this. Be blunt in assessing how much hard data, versus intuitive guesses, contributes to your Reach and Impact scores. Intercom created a set of discrete percentage tiers for scoring confidence, so you don’t get lost in the weeds:

100% = high confidence

80% = medium confidence

50% = low confidence

The general rule of thumb is that if your confidence is less than 50%, you should sack the experiment and focus efforts elsewhere.
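One way to apply the tiers and the 50% rule together is to round a raw confidence estimate down to the nearest tier and drop anything below 50%. Rounding down, rather than to the nearest tier, is our own conservative assumption and not part of Intercom’s definition; the `confidence_score` function is hypothetical.

```python
from typing import Optional

# Intercom's discrete confidence tiers, expressed as fractions.
CONFIDENCE_TIERS = (1.0, 0.8, 0.5)


def confidence_score(raw_confidence: float) -> Optional[float]:
    """Snap a raw confidence estimate (0 to 1) down to the nearest tier.

    Returns None when the estimate is below 50%, signalling that the
    experiment should be dropped under the rule of thumb.
    """
    if not 0.0 <= raw_confidence <= 1.0:
        raise ValueError("raw_confidence must be between 0 and 1")
    if raw_confidence < 0.5:
        return None
    # Highest tier that does not exceed the raw estimate.
    return max(tier for tier in CONFIDENCE_TIERS if tier <= raw_confidence)


print(confidence_score(0.9))  # 0.8
print(confidence_score(0.4))  # None
```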

Effort

Effort estimates the amount of work required to implement the experiment. You want to minimise it: Effort is the formula’s denominator, so, all else being equal, the more effort an experiment needs, the lower its priority. To estimate Effort, define your unit of measurement and then estimate the total resources needed to run the experiment. The unit could be money, time, or some other resource; ‘person-months’ is typical. If you estimate that an experiment will take five people two months, its Effort score is 5 people x 2 months = 10 person-months.
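The person-months arithmetic is simple, but writing it down keeps the unit explicit and rules out zero or negative estimates. A minimal sketch; the `effort_person_months` function is hypothetical:

```python
def effort_person_months(people: float, months: float) -> float:
    """Effort in person-months: team size multiplied by duration."""
    if people <= 0 or months <= 0:
        raise ValueError("people and months must both be positive")
    return people * months


print(effort_person_months(5, 2))    # 10 (the five-people, two-month example)
print(effort_person_months(1, 0.5))  # 0.5 (one person for half a month)
```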

Once you’ve scored each component, calculate the aggregate RICE score using this formula: (Reach x Impact x Confidence) / Effort, with Confidence expressed as a fraction (80% = 0.8). Then sort the RICE scores in descending order to identify the experiments you expect to have the greatest overall impact relative to their effort. Now you have a shortlist of experiments, but how do you run one?
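The scoring and sorting steps can be sketched in a few lines of Python. This is a minimal illustration, not Intercom’s tooling: the `Experiment` class, the `rank_by_rice` function, and the three-experiment backlog with its scores are all hypothetical. Confidence is entered as a fraction and Effort in person-months.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Experiment:
    name: str
    reach: float       # e.g. new customers per quarter
    impact: float      # 3, 2, 1, 0.5 or 0.25
    confidence: float  # 1.0, 0.8 or 0.5
    effort: float      # person-months, must be positive

    @property
    def rice(self) -> float:
        """(Reach x Impact x Confidence) / Effort."""
        if self.effort <= 0:
            raise ValueError(f"{self.name}: effort must be positive")
        return self.reach * self.impact * self.confidence / self.effort


def rank_by_rice(experiments: Iterable[Experiment]) -> List[Experiment]:
    """Sort experiments by RICE score, highest first."""
    return sorted(experiments, key=lambda e: e.rice, reverse=True)


# Hypothetical backlog; every number here is made up for illustration.
backlog = [
    Experiment("Pause subscription option", reach=150, impact=2, confidence=0.8, effort=10),
    Experiment("Onboarding email sequence", reach=400, impact=0.5, confidence=1.0, effort=2),
    Experiment("Referral programme", reach=300, impact=1, confidence=0.5, effort=6),
]

for experiment in rank_by_rice(backlog):
    print(f"{experiment.name}: {experiment.rice:.1f}")
# Onboarding email sequence: 100.0
# Referral programme: 25.0
# Pause subscription option: 24.0
```

Note how the cheap onboarding experiment outranks the pause option despite its lower Impact score: dividing by Effort rewards quick wins.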

Running Experiments

In general, it’s important to apply the scientific method when experimenting in business. Great experiments start with a strong hypothesis, structured like this:

Because we saw that {data/feedback}
We expect that {change}
Will cause {impact}
We'll measure this using {metric}

An example growth experiment hypothesis using this structure:

Because we saw that “customers are cancelling their subscriptions”
We expect that “a pause subscription option”
Will cause “an increase in customer retention”
We'll measure this using “Churn”
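If you log experiments in a tool or script, storing the four parts of the hypothesis separately makes it harder to skip one. A minimal Python sketch; the `Hypothesis` class is hypothetical, and it simply refuses empty fields and renders the template above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    observation: str  # {data/feedback}
    change: str       # {change}
    impact: str       # {impact}
    metric: str       # {metric}

    def __post_init__(self) -> None:
        # A hypothesis with a missing part is not testable, so reject it.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"hypothesis field {name!r} is empty")

    def render(self) -> str:
        return (
            f'Because we saw that "{self.observation}"\n'
            f'We expect that "{self.change}"\n'
            f'Will cause "{self.impact}"\n'
            f"We'll measure this using \"{self.metric}\""
        )


pause = Hypothesis(
    observation="customers are cancelling their subscriptions",
    change="a pause subscription option",
    impact="an increase in customer retention",
    metric="Churn",
)
print(pause.render())
```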

Once you’ve written your hypothesis and started your experiment, evaluate the results regularly using Calliper’s GAINS framework, which gives your evaluation a clear structure:

Goal

What was our hypothesis? Remind yourself of the experiment’s original expected change and impact. It’s important to concentrate on the overall goal, because it’s easy to get distracted by knock-on effects and more meta-level thinking.

Analysis

What did you do or change? Evaluate the actions that have been or are being taken in conducting the experiment. Has the ‘change’ been properly implemented? Is it behind, on track, ahead, or complete?

Insights

What did you learn? This relates to parts three and four of the hypothesis: the intended impact and how it is measured. Try to remain objective, and document all changes, expected and unexpected, as they could be useful in future. Once you’ve extracted insights, be proactive: double down on wins and cut losses quickly.

Next Steps

What are we doing next? Based on your analysis and insights, assign action items and owners.
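To keep reviews consistent from one session to the next, the GAINS questions can be captured as a simple record. A minimal Python sketch; the `Status`, `ActionItem`, and `GainsReview` names are hypothetical, and the `unowned_actions` check simply flags next steps that still need an owner:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    """Progress of the 'change', as asked in the Analysis step."""
    BEHIND = "behind"
    ON_TRACK = "on track"
    AHEAD = "ahead"
    COMPLETE = "complete"


@dataclass
class ActionItem:
    description: str
    owner: str = ""  # empty until someone takes it on


@dataclass
class GainsReview:
    goal: str      # Goal: the original hypothesis
    analysis: str  # Analysis: what was done or changed
    status: Status
    insights: List[str] = field(default_factory=list)            # Insights
    next_steps: List[ActionItem] = field(default_factory=list)   # Next Steps

    def unowned_actions(self) -> List[ActionItem]:
        """Next steps that still have no owner assigned."""
        return [item for item in self.next_steps if not item.owner.strip()]


# Hypothetical mid-experiment review of the pause-subscription test.
review = GainsReview(
    goal="A pause subscription option will increase retention, measured by Churn",
    analysis="Pause option added to the cancellation flow",
    status=Status.ON_TRACK,
    insights=[],  # filled in as results come in
    next_steps=[ActionItem("Check churn again at the next review")],
)
print([item.description for item in review.unowned_actions()])
```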


P.S. If you liked this, check out our article on ‘Why startups should be data-driven’
