What's the difference between a comparison and a score?

A score tells you the user's current value (e.g. sleep score 0.65). A comparison tells you how that value relates to a reference point: the user's own 30-day baseline, peers in the same age and gender bracket, or the global population. The score is the number; the comparison is the context that makes the number meaningful.

Why is the baseline percentile always null?

Baseline is 'you vs you' — there's no population to be at the 70th percentile of. Instead of a percentile, baseline comparisons return absolute and percentage differences from the user's own historical average. The state label (very_low to very_high) is derived from how unusual today's value is relative to the user's own variability.

How do I tell whether 'low' is bad?

Read state together with isHigherBetter. A 'low' resting heart rate is good (isHigherBetter is false). A 'low' HRV is concerning (isHigherBetter is true). Always combine the two — the state alone tells you the magnitude of deviation, not whether it's favorable.

Which lens should I show users by default?

Baseline. It's the most actionable, the most personal, and the least prone to feeling like judgment. 'You slept less than usual' lands better than 'You slept less than people in their 30s.' Demographic and global are great as on-tap context, but baseline should be the default surface for most apps.

What metrics support comparisons?

All five scores (activity, sleep, readiness, wellbeing, mental_wellbeing) plus six biomarkers today: steps, sleep_duration, heart_rate_resting, heart_rate_variability_sdnn, heart_rate_variability_rmssd, and vo2_max. The supported set continues to grow — the Sahha Comparisons docs page maintains the live list.

Comparison Insights Explained: Baseline deviations, Demographic percentile, Global percentile

Two users both slept 7 hours last night. One of them usually sleeps 8 hours; the other usually sleeps 6. The same value means two completely different things — and a daily score on its own can’t tell them apart. The first user is undersleeping; the second is having a strong night. The number is identical. The story isn’t.

Comparisons solve this. They benchmark today’s value against three reference points so the same number lands differently depending on context: against the user’s own 30-day baseline (“you vs you”), against people with similar age and gender (“you vs people like you”), and against the global population (“you vs everyone”). Each lens answers a different question, and together they turn raw values into something users actually want to read.

State	What It Means
Very high	Significantly above the reference group
High	Above the reference group
Average	Within the typical range of the reference group
Low	Below the reference group
Very low	Significantly below the reference group

The ladder describes magnitude, not value judgment. Whether low is good or bad depends entirely on the metric — low resting heart rate is favorable; low HRV is concerning. The polarity field isHigherBetter resolves this, and we’ll come back to it below.

At a glance: A comparison is computed daily for an eligible metric. Each comparison includes the user’s value, the reference group’s value, a percentile (where applicable), absolute and percentage differences, and a state label from very_low to very_high. Up to three lenses — baseline, demographic, global — are returned in a single response.

The Three Lenses

Each lens uses a different reference group, and each is suited to a different product question.

Baseline — You vs You

The personalization lens. Today’s value is compared against the user’s own historical average over the last 30 days.

This is the most actionable of the three because it’s grounded in the user’s own behavior. “You slept less than usual” is something the user can immediately situate — they know what their usual is. There’s no comparison group to feel judged against, no demographic assumption, no shaming.

Baseline is also the only lens that doesn’t return a percentile. There’s no population to be at the 70th percentile of — it’s just the user’s value vs the user’s average. Instead, baseline comparisons return:

value — the reference group’s average (i.e. the user’s own 30-day average)
difference — absolute difference between today’s value and the average
percentageDifference — relative difference
state — derived from how unusual today’s deviation is relative to the user’s own variability
properties.windowDays — the lookback window (currently 30)

Use baseline for: daily “how am I doing” cards, drift detection (“your usual is slipping”), per-user personalization in messaging.
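
The two difference fields listed above can be reproduced from the user's value and the reference average. A minimal sketch, assuming percentageDifference is the difference divided by the reference average and expressed as a fraction; that formula is inferred from the numbers in this article's sample response, not from a documented definition, and the helper name describeDeviation is hypothetical:

```javascript
// Hypothetical helper: recompute the difference fields for one lens entry.
// Assumption: percentageDifference = difference / reference average,
// expressed as a fraction (-0.107 means about 10.7% below).
function describeDeviation(userValue, referenceValue) {
  const difference = userValue - referenceValue;
  // Guard against dividing by a zero reference average.
  const percentageDifference =
    referenceValue === 0 ? null : difference / referenceValue;
  return { difference, percentageDifference };
}

// Readiness 0.75 today against a 30-day average of 0.84.
const deviation = describeDeviation(0.75, 0.84);
console.log(deviation.difference.toFixed(2)); // "-0.09"
console.log(deviation.percentageDifference.toFixed(3)); // "-0.107"
```

If that assumption holds, multiply by 100 before showing the value to users as a percentage.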

Demographic — You vs People Like You

The context lens. Today’s value is compared against the average for users in the same demographic cohort — age range and gender, surfaced in the response.

Demographic is the right lens when users want to know what’s normal. “Is my resting heart rate normal for someone my age?” is one of the most common questions a health app fields. Demographic comparisons answer it directly. The cohort properties are returned alongside the comparison so you can render them honestly: “vs women aged 30–44”.

Demographic returns:

value — the cohort average
percentile — where the user sits within the cohort (0–100)
difference / percentageDifference — vs the cohort average
state — derived from percentile position
properties — ageMin, ageMax, gender

Use demographic for: contextual “is this normal?” framing, on-tap detail beneath a baseline comparison, personalized health context that respects the user’s age and gender.
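
Rendering the cohort honestly mostly comes down to turning properties into a label. A small sketch, assuming gender arrives as a string such as "female" (as in the sample response) and that "male" is another possible value; the helper name cohortLabel and the wording are hypothetical:

```javascript
// Hypothetical helper: build a human-readable cohort label from the
// demographic lens `properties` object ({ ageMin, ageMax, gender }).
// Assumption: the set of gender strings beyond "female" is not confirmed here.
const COHORT_NOUNS = new Map([
  ["female", "women"],
  ["male", "men"],
]);

function cohortLabel(properties) {
  if (!properties) return null;
  const { ageMin, ageMax, gender } = properties;
  // Fall back to a neutral noun for any gender value we do not recognize.
  const who = COHORT_NOUNS.get(gender) ?? "people";
  if (ageMin == null || ageMax == null) return `vs ${who}`;
  return `vs ${who} aged ${ageMin}–${ageMax}`;
}

console.log(cohortLabel({ ageMin: 30, ageMax: 44, gender: "female" }));
// vs women aged 30–44
```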

Global — You vs Everyone

The novelty lens. Today’s value is compared against the global Sahha population average across all ages, genders, and demographics.

Global is the most conversational of the three — it’s the lens that makes for shareable moments and curiosity-driving copy. “Your sleep was longer than 73% of the population last night.” Use it sparingly. Without a demographic anchor, global comparisons can mislead (a healthy resting heart rate for a 60-year-old looks “high” against a global mean that’s dominated by younger users) and they can feel performative for users who’d rather not be ranked at all.

Global returns the same structure as demographic minus the demographic properties.

Use global for: novelty moments, social or shareable content, marketing copy. Avoid for sensitive metrics like mental_wellbeing where ranking against strangers can land poorly.

The State Ladder

Each comparison entry comes with a state label from very_low to very_high. The ladder is consistent across all lenses, which means you can write copy templates against state without branching on lens type.

For demographic and global lenses, state is derived from percentile position. Roughly: extreme percentiles map to very_low and very_high, the middle of the distribution maps to average, and the in-between bands map to low and high.

For baseline, there’s no percentile to bucket. Instead, state reflects how unusual today’s value is relative to the user’s own typical variability over the 30-day window. A 5% deviation from baseline is unremarkable for steps (which vary a lot day-to-day) but significant for resting heart rate (which is far more stable). The state label captures this metric-aware sense of “unusualness.”
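
To see why the same 5% deviation can read differently for two metrics, a z-score is a handy analogy. This is illustrative only: the article does not describe how the baseline state is actually derived, and the histories below are made-up numbers chosen to show the effect.

```javascript
// Illustrative only: a z-score is one common way to express "unusualness".
// This is an analogy, not the method used to derive the baseline state.
function zScore(history, today) {
  if (history.length === 0) return null;
  const mean = history.reduce((sum, v) => sum + v, 0) / history.length;
  const variance =
    history.reduce((sum, v) => sum + (v - mean) ** 2, 0) / history.length;
  const sd = Math.sqrt(variance);
  return sd === 0 ? 0 : (today - mean) / sd;
}

// Made-up histories: steps swing widely, resting heart rate barely moves.
const stepsHistory = [6000, 9000, 12000, 7000, 11000]; // mean 9000
const restingHrHistory = [59, 60, 61, 60, 60]; // mean 60

// Both "today" values sit 5% above the respective mean.
const stepsZ = zScore(stepsHistory, 9450);
const restingHrZ = zScore(restingHrHistory, 63);
console.log(stepsZ < 0.5, restingHrZ > 2); // true true
```

The steps value is well inside normal day-to-day swing, while the resting heart rate value is several standard deviations out, even though both are 5% above average.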

In product copy, the state ladder is much friendlier than raw percentile. “Higher than usual” lands instantly; “73rd percentile” requires a beat of interpretation. Lead with state. Surface percentile as a secondary detail for users who want it.
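
Because the ladder is shared across lenses, one template keyed by state can serve all three. A sketch of that idea; the copy strings, the reference phrases, and the helper name headline are hypothetical examples rather than recommended wording:

```javascript
// Hypothetical copy templates keyed by state; wording is an example only.
const STATE_COPY = {
  very_high: "much higher than",
  high: "higher than",
  average: "in line with",
  low: "lower than",
  very_low: "much lower than",
};

// Hypothetical reference phrases keyed by lens type.
const LENS_REFERENCE = {
  baseline: "your usual",
  demographic: "your peers",
  global: "the wider population",
};

function headline(metricLabel, entry) {
  const phrase = STATE_COPY[entry.state];
  const reference = LENS_REFERENCE[entry.type];
  if (!phrase || !reference) return null; // unknown state or lens
  const line = `${metricLabel} is ${phrase} ${reference}`;
  // Lead with state; percentile is a secondary detail and is null for baseline.
  return entry.percentile == null
    ? line
    : `${line} (percentile ${entry.percentile})`;
}

console.log(headline("Readiness", { type: "baseline", state: "low", percentile: null }));
// Readiness is lower than your usual
```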

Polarity: Reading State Together with `isHigherBetter`

high isn’t always good. low isn’t always bad. Each comparison includes isHigherBetter, which captures whether higher values are favorable for that metric.

`isHigherBetter`	Examples	`high` / `very_high` means…	`low` / `very_low` means…
`true`	sleep score, sleep duration, steps, HRV (SDNN, RMSSD), VO2 max	favorable	unfavorable
`false`	resting heart rate	unfavorable	favorable

This matters most for the cardiovascular biomarkers. A user with a very_low resting heart rate compared to global isn’t in trouble — they’re likely highly fit. A user with a very_low HRV compared to baseline is the concerning case. Same state label, opposite meaning.

In code: branch on the combination of state and isHigherBetter to decide whether a comparison warrants celebration, a check-in, or no action at all.

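// Returns true when the deviation is favorable, false when it is unfavorable,
// and null when the state is "average" (nothing to flag either way).
function isFavorable(comparisonEntry, isHigherBetter) {
  if (comparisonEntry.state === "average") return null;
  // "high" and "very_high" sit above the reference; "low" and "very_low" sit below.
  const isHigh = comparisonEntry.state === "high" || comparisonEntry.state === "very_high";
  return isHigh === isHigherBetter;
}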

That same function works across all three lenses — state and isHigherBetter are all you need.
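
The celebration, check-in, or no-action decision can be layered on the same state and isHigherBetter combination. A minimal sketch; the action names and the helper nextAction are made up for illustration, and it treats any unrecognized state as "no action" rather than as low:

```javascript
// Hypothetical mapping from a lens entry to a product action.
// The action names ("celebrate", "check_in", "none") are invented for this sketch.
function nextAction(entry, isHigherBetter) {
  const isHigh = entry.state === "high" || entry.state === "very_high";
  const isLow = entry.state === "low" || entry.state === "very_low";
  if (!isHigh && !isLow) return "none"; // "average" or an unrecognized state
  const favorable = isHigh === isHigherBetter;
  return favorable ? "celebrate" : "check_in";
}

// Resting heart rate: lower is better, so very_low is good news.
console.log(nextAction({ state: "very_low" }, false)); // "celebrate"
// HRV: higher is better, so very_low deserves a check-in.
console.log(nextAction({ state: "very_low" }, true)); // "check_in"
```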

The Output

Every comparison is scoped to a single profile and a single metric. The response packages all three lenses into one data array, so a single API call gives you all the context you need for that metric:

{
  "type": "comparison",
  "category": "score",
  "name": "readiness",
  "value": 0.75,
  "unit": "index",
  "isHigherBetter": true,
  "periodicity": "daily",
  "startDateTime": "2026-04-29T00:00:00+10:00",
  "endDateTime": "2026-04-29T23:59:59+10:00",
  "data": [
    {
      "type": "baseline",
      "value": 0.84,
      "percentile": null,
      "difference": -0.09,
      "percentageDifference": -0.107,
      "state": "low",
      "properties": { "windowDays": 30 }
    },
    {
      "type": "demographic",
      "value": 0.71,
      "percentile": 58,
      "difference": 0.04,
      "percentageDifference": 0.056,
      "state": "average",
      "properties": { "ageMin": 30, "ageMax": 44, "gender": "female" }
    },
    {
      "type": "global",
      "value": 0.78,
      "percentile": 42,
      "difference": -0.03,
      "percentageDifference": -0.038,
      "state": "average"
    }
  ],
  "createdAtUtc": "2026-04-29T09:30:00Z"
}

The response above illustrates why the three lenses are useful together. The user’s readiness today (0.75) is low against their own 30-day baseline of 0.84 (they’re slipping vs themselves), average against demographic peers (58th percentile, typical for women aged 30–44), and average against the global population (42nd percentile). The most actionable signal here is baseline: the demographic and global lenses are essentially saying “still in the normal range,” but the user’s own pattern shows a meaningful dip.

Three details worth noticing in the output:

value at the top is the user’s value. value inside each data entry is the reference group’s average. Don’t confuse them.
percentile is null for baseline, present for demographic and global.
properties carries the metadata for transparency — the demographic cohort or the baseline window. Surface this in UI when relevant (“vs your last 30 days” / “vs women aged 30–44”).
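
In client code, indexing the data array by lens type keeps the two meanings of value apart. A short sketch using a trimmed copy of the sample response above; the helper name lensesByType is hypothetical:

```javascript
// Index the lens entries of one comparison response by their `type`,
// so callers can look up "baseline", "demographic", or "global" directly.
function lensesByType(comparison) {
  const lenses = {};
  for (const entry of comparison.data ?? []) {
    lenses[entry.type] = entry;
  }
  return lenses;
}

// Trimmed copy of the sample response.
const sample = {
  name: "readiness",
  value: 0.75, // the user's value
  isHigherBetter: true,
  data: [
    { type: "baseline", value: 0.84, percentile: null, state: "low", properties: { windowDays: 30 } },
    { type: "demographic", value: 0.71, percentile: 58, state: "average", properties: { ageMin: 30, ageMax: 44, gender: "female" } },
    { type: "global", value: 0.78, percentile: 42, state: "average" },
  ],
};

const { baseline, demographic } = lensesByType(sample);
console.log(sample.value, baseline.value); // 0.75 0.84 (user value, then reference average)
console.log(baseline.percentile, demographic.percentile); // null 58
```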

What’s Eligible

Comparisons today cover two categories:

Scores (5) — activity, sleep, readiness, wellbeing, mental_wellbeing. The same five scores trends covers, with the same daily availability.
Biomarkers (6) — steps, sleep_duration, heart_rate_resting, heart_rate_variability_sdnn, heart_rate_variability_rmssd, vo2_max. The cardiovascular biomarkers are particularly suited to comparison because they vary meaningfully across age and fitness levels — context is genuinely informative.

Note: comparisons cover scores and biomarkers, not factors. Trends covers factors; comparisons don’t (today). If you need cohort context for a specific factor, you’ll need to compute it yourself or wait for the metric set to expand.

The supported list will continue to grow. Refer to the Comparisons docs page for the live list and the latest schema details.

When to Use Which Lens

The lenses don’t compete — they layer. But for a given product surface, one of them is usually the right primary signal.

Use case	Best lens	Why
Daily personalized card (“how am I doing today”)	Baseline	Personal, actionable, no judgment
Drift detection / re-engagement triggers	Baseline	Catches the user slipping vs themselves before they’re alarming on cohort metrics
“Is this normal for my age?” UI	Demographic	Anchored in honest cohort data with `properties` for transparency
Health education and context	Demographic	Cohort framing is more informative than global for medical-style questions
Novelty moments, share cards, marketing copy	Global	Naturally conversational; high engagement when used sparingly
Sensitive metrics (mental_wellbeing)	Baseline only	Avoid demographic and global comparisons here — ranking can land badly

A good default for most apps: lead with baseline, offer demographic on tap, hold global for moments where the curiosity payoff justifies the comparison framing.
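
Those defaults can be captured in a small lookup. A sketch under stated assumptions: the surface names and the helper pickLens are invented here, and the only rules encoded are the ones in the table above.

```javascript
// Hypothetical lens picker following the defaults in the table above.
// The surface names are invented labels for this sketch.
const PRIMARY_LENS = {
  daily_card: "baseline",
  drift_detection: "baseline",
  is_this_normal: "demographic",
  share_card: "global",
};

function pickLens(comparison, surface) {
  // Sensitive metrics stay on baseline regardless of the surface.
  const type =
    comparison.name === "mental_wellbeing"
      ? "baseline"
      : PRIMARY_LENS[surface] ?? "baseline";
  const entry = (comparison.data ?? []).find((e) => e.type === type);
  return entry ?? null; // null when the preferred lens was not returned
}

const readiness = {
  name: "readiness",
  data: [{ type: "baseline", state: "low" }, { type: "global", state: "average" }],
};
console.log(pickLens(readiness, "share_card").type); // "global"
console.log(pickLens(readiness, "is_this_normal")); // null
```

Returning null instead of silently falling back to another lens keeps the caller in charge of what to show when the preferred lens is missing.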

Daily Cadence

Comparisons are computed daily. That’s both a strength and a limit.

The strength: you get fresh context every morning, on the same cadence the user is opening the app to check their numbers. Daily is the natural cadence for “how am I doing today” UX.

The limit: a single day’s value can be noisy. The user who slept 5 hours last night because of a flight will look “very_low” against baseline — that’s true today, but it’s not a pattern. Don’t fire production logic on a single day’s very_low without a guardrail. Either require persistence (the same state for several days running), or pair with a trend (daily comparison + weekly trend reduces false positives substantially).
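
A persistence guardrail can be as small as checking the most recent few daily states. A minimal sketch; the helper name hasPersisted and the default of three days are assumptions, so tune the window to your own tolerance for false positives:

```javascript
// Hypothetical guardrail: only act when the same state has held for
// `minDays` consecutive daily comparisons (oldest first, most recent last).
function hasPersisted(dailyStates, targetState, minDays = 3) {
  if (minDays < 1 || dailyStates.length < minDays) return false;
  return dailyStates.slice(-minDays).every((s) => s === targetState);
}

// One bad night after a flight: not a pattern yet.
console.log(hasPersisted(["average", "very_low"], "very_low")); // false
// Three very_low days in a row: worth acting on.
console.log(hasPersisted(["low", "very_low", "very_low", "very_low"], "very_low")); // true
```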

This is part of why trends and comparisons complement each other rather than overlap. A trend filters noise to give you direction; a comparison gives you context for today. The combination — “decreasing trend AND very_low baseline” — is far more reliable than either alone.
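
The combined rule might be sketched like this. The shape of a trend result is not described in this article, so trendDirection is a stand-in for whatever direction label your trend source provides, and the string "decreasing" and the helper name isReliableDip are assumptions:

```javascript
// Hypothetical combination rule for a higher-is-better metric:
// act only when the trend is heading down AND today is very_low vs baseline.
// `trendDirection` and its "decreasing" value are placeholders for this sketch.
function isReliableDip(trendDirection, baselineEntry) {
  if (!baselineEntry) return false; // no baseline lens returned yet
  return trendDirection === "decreasing" && baselineEntry.state === "very_low";
}

console.log(isReliableDip("decreasing", { type: "baseline", state: "very_low" })); // true
console.log(isReliableDip("stable", { type: "baseline", state: "very_low" })); // false
```

For a metric where isHigherBetter is false, the mirror-image rule (a rising trend with a very_high baseline state) would apply instead.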

Missing Data

Comparisons depend on the user having data and on Sahha having reference data for the lens.

No baseline yet. A new user without 30 days of history won’t get a baseline comparison. A baseline entry with value: null can also occur; handle both cases as “still building your typical pattern” rather than “no data.”
Demographic gaps. Users without demographic information on the profile won’t get a demographic comparison. The lens simply isn’t returned for that user — don’t assume an empty data array means no signal; check which lenses are present.
Sparse days. A day with no contributing data (no biomarker reading, no score) won’t generate a comparison for that metric on that day.

In product logic, treat each lens as independently optional. Render the ones that are present; gracefully suppress the ones that aren’t.

Production tip: Use the absence of a baseline lens as a signal in itself — it usually means the user is new and still building their personal baseline. A “you’ll start to see this in a few weeks” message in the UI sets expectations honestly and reduces the perception that the app is “missing data.”
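
Treating each lens as independently optional might look like the sketch below. The row shape, the message wording, and the helper name renderLenses are examples only, not part of any API:

```javascript
// Build UI rows for whichever lenses are present and suppress the rest.
// Row shape and message wording are examples for this sketch.
function renderLenses(comparison) {
  const entries = comparison?.data ?? [];
  const rows = [];

  const baseline = entries.find((e) => e.type === "baseline");
  if (!baseline || baseline.value == null) {
    // Missing or null baseline: the user is probably still new.
    rows.push({ type: "baseline", message: "Still building your typical pattern" });
  } else {
    rows.push({ type: "baseline", state: baseline.state });
  }

  // Demographic and global are shown only when they were returned.
  for (const type of ["demographic", "global"]) {
    const entry = entries.find((e) => e.type === type);
    if (entry) rows.push({ type, state: entry.state, percentile: entry.percentile });
  }
  return rows;
}

const newUser = { data: [{ type: "global", value: 0.78, percentile: 42, state: "average" }] };
console.log(renderLenses(newUser).map((row) => row.type)); // [ 'baseline', 'global' ]
```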

Limits Today

Cadence: daily — comparisons are computed once per day, on the same schedule.
Baseline window: fixed at 30 days. Custom windows aren’t currently supported.
Lens set: three (baseline, demographic, global). Goal-based and cohort-based lenses are not on the immediate roadmap.
No customization: demographic cohort definitions and the state thresholds are managed by Sahha.
