Understanding biomarkers: the building blocks of health data

A customer-friendly guide to Sahha Biomarkers—what they are, how they’re produced, what they cover, and how to use them to build product features, personalization, and reporting.

Biomarkers are standardized, deduplicated, and aggregated health metrics derived from raw data coming from HealthKit, Health Connect, and supported wearables. They’re designed to be the easiest “building blocks” for product features—dashboards, personalization, reporting, and engagement—without needing to manage raw samples yourself.

Key Takeaways

What biomarkers are: clean, consistent metrics (with units, aggregation method, and time window) you can store and use immediately.
Why they exist: to remove the heavy lifting—deduplicating overlapping sources and normalizing raw data into a consistent format.
How to use them: dashboards, weekly summaries, personalization rules, CRM/CDP enrichment, segmentation, and analytics.
When they’re most useful: when you want product-ready values (vs raw sample streams).
How you receive them: via API (on-demand) or Webhooks (push, real-time/interval-based).

Metric Spec

Item	Value
Product	Biomarkers
Data inputs	HealthKit, Health Connect, and wearables
Output format	Consistent JSON schema with `value`, `unit`, `aggregation`, `periodicity`, and a time window
Delivery	API + Webhooks
Best used for	Product UX, personalization, engagement automation, analytics and reporting
Raw alternative	Data Logs (webhook-only raw samples)

What Biomarkers Are (and what they aren’t)

Biomarkers are best thought of as product-ready metrics, not raw sensor streams.

Biomarkers are processed outputs: aggregated totals/averages/point-in-time values.
Biomarkers are deduplicated across overlapping sources (phone + watch + wearable).
Biomarkers have consistent units and definitions across sources.

If you need raw samples (timestamped, device/app provenance, per-record metadata), you want Data Logs, not biomarkers.

How Biomarkers Work

Sahha’s biomarker pipeline is intentionally simple:

Collect — raw samples from multiple sources
Deduplicate — remove overlapping records
Aggregate — daily totals, averages, or point-in-time values
Deliver — via API (on demand) or webhooks (real time or batched at a configurable interval)

This means you can build around stable “daily objects” rather than streaming, merging, and cleaning raw events.

What Biomarkers Cover

Biomarkers span common “building block” categories. The exact list is large and grows over time—use the Data Dictionary for the full inventory.

Typical categories include:

Activity (e.g., steps, active duration, active hours, floors climbed, energy burned)
Sleep (e.g., sleep duration, sleep latency, sleep efficiency, sleep debt, sleep regularity)
Vitals (e.g., resting heart rate, HRV, VO₂ max, depending on device/source coverage)
Body (e.g., weight, height, BMI, body fat % where supported)
Engagement (platform-level signals that can support personalization)
Reproductive (documented as coming soon in product docs)

Biomarker Schema (What you receive)

Every biomarker uses a consistent shape—making storage and processing straightforward.

Core fields:

id — idempotent identifier (updates replace the previous entry with the same id)
type — biomarker type (e.g., steps, sleep_duration)
category — activity, sleep, vitals, body, engagement
value — string value (parse using valueType)
valueType — long, double, string, or datetime
unit — e.g., count, minute, bpm, percentage, kcal
aggregation — total, average, minimum, maximum, none
periodicity — daily, weekly, monthly, none (and some may update intraday as documented)
startDateTime / endDateTime — the measurement window (ISO 8601)
createdAtUtc — when the entry was created

Example:

{
  "id": "b7c8d9e0-f1a2-3456-bcde-f78901234567",
  "type": "steps",
  "category": "activity",
  "value": "8432",
  "valueType": "long",
  "unit": "count",
  "aggregation": "total",
  "periodicity": "daily",
  "startDateTime": "2024-09-03T00:00:00+05:00",
  "endDateTime": "2024-09-03T23:59:59+05:00",
  "createdAtUtc": "2024-09-04T05:30:00Z"
}
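Because value always arrives as a string, a small helper can convert it once at ingestion. The following is a minimal Python sketch, not an official client: the function name parse_biomarker_value is hypothetical, and it assumes datetime values are ISO 8601 strings.

```python
from datetime import datetime


def parse_biomarker_value(biomarker: dict):
    """Convert the string `value` to a Python type based on `valueType`."""
    raw = biomarker["value"]
    value_type = biomarker["valueType"]
    if value_type == "long":
        return int(raw)
    if value_type == "double":
        return float(raw)
    if value_type == "datetime":
        # Assumption: ISO 8601 text. A trailing "Z" is rewritten so that
        # Python versions before 3.11 can parse it as well.
        return datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if value_type == "string":
        return raw
    raise ValueError(f"Unsupported valueType: {value_type!r}")


steps = {"type": "steps", "value": "8432", "valueType": "long"}
print(parse_biomarker_value(steps))  # 8432
```

Raising on an unknown valueType is a deliberate choice here: it surfaces new types early instead of silently storing them as text.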

Why Biomarkers Are Useful

1) They’re the easiest path to product UX

Biomarkers are ideal for:

daily dashboards (sleep duration, steps, HRV)
“last 7 days” charts
weekly summaries and progress views

2) They simplify multi-device reality

Many users have multiple data sources. Biomarkers are designed to deliver one clean value per window, rather than making you decide which device “wins” each day.

3) They’re automation-friendly

Biomarkers are stable and easy to reason about in rules:

If sleep_duration drops below baseline → trigger a recovery prompt
If active_hours trends down → suggest “movement snacks”
If resting_heart_rate rises for several days → soften intensity messaging

(If you want deeper decision signals, pair biomarkers with Insights—trends and comparisons.)
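A rule such as "sleep_duration drops below baseline" can be sketched in a few lines. This is an illustrative example only: the 7-day baseline, the 85% threshold, and the 3-day streak are assumptions chosen for the sketch, and the input is assumed to be a gap-free list of daily values in chronological order.

```python
from statistics import mean


def below_baseline_streak(daily_values, baseline_days=7, drop_ratio=0.85, min_days=3):
    """Return True if the last `min_days` values are all below
    `drop_ratio` times the mean of the `baseline_days` values before them."""
    if len(daily_values) < baseline_days + min_days:
        return False  # not enough history to judge
    recent = daily_values[-min_days:]
    history = daily_values[-(baseline_days + min_days):-min_days]
    threshold = mean(history) * drop_ratio
    return all(v < threshold for v in recent)


# Sleep duration in minutes: seven typical nights, then three short ones.
sleep_minutes = [450, 440, 460, 455, 445, 450, 450, 360, 350, 370]
if below_baseline_streak(sleep_minutes):
    print("trigger recovery prompt")
```

Requiring several consecutive days before acting keeps a single bad night from triggering a prompt.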

4) They’re designed for storage + querying

Most biomarker types are perfect for a time-series table keyed by:

externalId
type
startDateTime / periodicity
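One way to lay this out is sketched below with Python's built-in sqlite3 module. The table and column names are assumptions for illustration. The biomarker id serves as the primary key so that updates overwrite the earlier row, and the lookup index follows the key fields listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE biomarkers (
        id              TEXT PRIMARY KEY,   -- biomarker id, used as idempotency key
        external_id     TEXT NOT NULL,      -- your user identifier
        type            TEXT NOT NULL,
        periodicity     TEXT NOT NULL,
        start_date_time TEXT NOT NULL,      -- ISO 8601, offset preserved
        value           TEXT NOT NULL,
        value_type      TEXT NOT NULL
    )
""")
conn.execute(
    "CREATE INDEX idx_biomarker_lookup "
    "ON biomarkers (external_id, type, periodicity, start_date_time)"
)


def upsert_biomarker(conn, external_id, b):
    """Insert a biomarker, or replace the stored value if the id already exists."""
    conn.execute(
        """
        INSERT INTO biomarkers
            (id, external_id, type, periodicity, start_date_time, value, value_type)
        VALUES (?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            value = excluded.value,
            value_type = excluded.value_type
        """,
        (b["id"], external_id, b["type"], b["periodicity"],
         b["startDateTime"], b["value"], b["valueType"]),
    )


row = {"id": "b7c8d9e0-f1a2-3456-bcde-f78901234567", "type": "steps",
       "periodicity": "daily", "startDateTime": "2024-09-03T00:00:00+05:00",
       "value": "8432", "valueType": "long"}
upsert_biomarker(conn, "user-123", row)
upsert_biomarker(conn, "user-123", {**row, "value": "9000"})  # later update, same id
print(conn.execute("SELECT id, value FROM biomarkers").fetchall())  # one row, latest value
```

The ON CONFLICT clause needs SQLite 3.24 or newer. Most other databases offer an equivalent upsert statement.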

When to Use Biomarkers (and when not to)

Use Biomarkers when you want…

daily/weekly totals and averages for UX
consistent units and schema
simplified ingestion and storage
to avoid building your own cleaning + dedup pipeline

Use Data Logs when you need…

raw samples and full provenance (device/app, recording method)
custom analytics at sub-daily resolution
research/clinical auditability

Data Logs are webhook-only and higher-volume by design.

How to Use Biomarkers in Your Product

Pattern 1: Dashboards and “My Stats”

Fetch the last 7–30 days of biomarkers for charts
Highlight the latest daily values
Provide simple explanations (“vs your baseline” or “week-over-week”)
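Charts usually need one slot per day, including days with no data. The sketch below builds such a series from daily biomarkers. It assumes daily periodicity and integer (long) values, and the function name daily_series is hypothetical.

```python
from datetime import date, datetime, timedelta


def daily_series(biomarkers, end_day, days=7):
    """Return [(date, value or None)] for `days` days ending on `end_day`."""
    by_day = {}
    for b in biomarkers:
        # Use the calendar date of the window start, keeping its own offset.
        day = datetime.fromisoformat(b["startDateTime"]).date()
        by_day[day] = int(b["value"])
    start = end_day - timedelta(days=days - 1)
    series = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        series.append((day, by_day.get(day)))  # None marks a day with no data
    return series


rows = [
    {"startDateTime": "2024-09-01T00:00:00+05:00", "value": "7000"},
    {"startDateTime": "2024-09-03T00:00:00+05:00", "value": "8432"},
]
for day, value in daily_series(rows, date(2024, 9, 3), days=3):
    print(day, value if value is not None else "Not available")
```

Keeping missing days as None lets the UI decide whether to draw a gap or show a placeholder, rather than plotting a misleading zero.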

Pattern 2: Personalization and engagement

Map biomarker types into feature flags and content routing
Use “baseline framing” (user vs their usual) to avoid shame-based comparisons
Add guardrails (e.g., act only after 3 days or a weekly trend)

Pattern 3: CRM/CDP enrichment

Store a small set of “core biomarkers” as user attributes:
- sleep: duration, regularity, debt
- activity: steps, active hours, active duration
- vitals: resting HR, HRV (if available)

Then drive:

lifecycle campaigns
onboarding paths
reactivation flows
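Flattening the latest biomarkers into user attributes can look like the sketch below. The attribute naming scheme (a bm_ prefix) and the selection of core types are assumptions for illustration. Adapt both to whatever your CRM or CDP expects.

```python
# Illustrative selection of "core" types; pick the ones your product uses.
CORE_TYPES = {"sleep_duration", "steps", "resting_heart_rate"}


def to_crm_attributes(latest_biomarkers, core_types=CORE_TYPES):
    """Map the latest biomarker per type to flat key/value user attributes."""
    attrs = {}
    for b in latest_biomarkers:
        if b["type"] not in core_types:
            continue  # skip anything outside the chosen core set
        numeric = b["valueType"] in ("long", "double")
        attrs[f"bm_{b['type']}"] = float(b["value"]) if numeric else b["value"]
        attrs[f"bm_{b['type']}_unit"] = b["unit"]
    return attrs


latest = [
    {"type": "steps", "value": "8432", "valueType": "long", "unit": "count"},
    {"type": "floors_climbed", "value": "4", "valueType": "long", "unit": "count"},
]
print(to_crm_attributes(latest))
```

Sending the unit alongside each value helps campaign authors avoid guessing whether a duration is in minutes or hours.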

Pattern 4: Reporting and analytics

cohort analysis by biomarker trends
retention vs movement/sleep patterns
A/B test outcomes tied to objective behavior signals

Delivery: API vs Webhooks

API (on-demand)

Use the API when you need to fetch data on demand, for example for:

profile pages
coach dashboards
backfills or batch jobs (via account token and externalId workflows)

Webhooks (push updates)

Use webhooks when you want your DB to stay current automatically.

Important delivery detail:

Scores and biomarkers support a configurable webhook interval that acts like a deduplication window (batching updates and sending only the final value within the interval).
Data Logs are always real-time (no interval batching), because each log is a new raw sample.
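To make the interval behavior concrete, the toy simulation below keeps only the final value per biomarker id within each window. It illustrates the concept only and is not a description of how the platform implements batching. Fixed, aligned windows and the tuple layout are assumptions.

```python
def final_values_per_interval(updates, interval_seconds):
    """updates: (timestamp_seconds, biomarker_id, value) tuples in time order.
    Return one (window_index, biomarker_id, value) per id per window,
    keeping the last value seen in that window."""
    latest = {}
    for ts, biomarker_id, value in updates:
        window = ts // interval_seconds
        latest[(window, biomarker_id)] = value  # later updates overwrite earlier ones
    return [(w, i, v) for (w, i), v in sorted(latest.items())]


updates = [
    (10, "steps-day-1", "4000"),
    (200, "steps-day-1", "4500"),   # same 300 s window as the first update
    (400, "steps-day-1", "5200"),   # next window
]
print(final_values_per_interval(updates, 300))
# [(0, 'steps-day-1', '4500'), (1, 'steps-day-1', '5200')]
```

Three intraday updates become two deliveries, which is why a longer interval means fewer webhook calls at the cost of freshness.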

Implementation Suggestions for Your Product

Store biomarkers with upserts
- Use id as an idempotency key (updates replace the previous entry with the same id).
Parse using valueType
- value is a string—always parse based on valueType to avoid type bugs.
Index by time window
- startDateTime/endDateTime define the window; use them for daily rollups and charts.
Start small
- Pick 8–15 biomarker types that directly power your product UX, then expand.
Design for missing coverage
- Some biomarkers require a wearable or consistent wear time. Handle null or missing values gracefully (hide tiles, show “Not available”, or fall back to other metrics).
Avoid overreacting to one day
- Use 7–14 day baselines or Insights (Trends/Comparisons) before triggering heavier interventions.

Common Pitfalls

Treating estimates as precision: energy burned and some intensity durations vary by device—use “estimated” language.
Overloading users with metrics: most products perform better with a small set of “hero” biomarkers + context.
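Not handling time zones: startDateTime/endDateTime are ISO 8601 timestamps that carry a UTC offset (e.g., +05:00). Keep that offset when assigning a value to a calendar day, because converting to UTC first can shift the value to the wrong date.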
Using raw logs for UX: if your goal is a daily chart, biomarkers are almost always the right tool.
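The time zone pitfall is easy to reproduce. The short Python sketch below uses the startDateTime from the example payload earlier in this guide. The helper names are hypothetical.

```python
from datetime import date, datetime, timezone


def local_day(iso_timestamp: str) -> date:
    """Calendar date in the timestamp's own offset (what the user experienced)."""
    return datetime.fromisoformat(iso_timestamp).date()


def utc_day(iso_timestamp: str) -> date:
    """Calendar date after converting to UTC (can land on a different day)."""
    return datetime.fromisoformat(iso_timestamp).astimezone(timezone.utc).date()


start = "2024-09-03T00:00:00+05:00"
print(local_day(start))  # 2024-09-03
print(utc_day(start))    # 2024-09-02, because midnight at +05:00 is 19:00 UTC the day before
```

For daily charts, bucketing by the local date keeps each value on the day the user actually lived it.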

FAQ

Do biomarkers require a wearable?

Some do, some don’t. Many core activity and sleep metrics can be derived from phone + OS health platforms, while certain vitals and sleep efficiency/latency metrics are more wearable-dependent.

How often do biomarkers update?

Biomarkers update based on their periodicity settings (daily/weekly/monthly), and some can update intraday depending on the metric and data availability.

Can I display biomarkers to end users directly?

Yes. Biomarkers are designed to be user-facing. If you want a faster UI path, Sahha Widgets can render common data displays with minimal build effort.

How do biomarkers relate to Scores and Insights?

Biomarkers: the building blocks (steps, sleep duration, HRV, etc.)
Scores: higher-level outcomes that combine multiple signals (with factors)
Insights: analytics on top (trends and comparisons for “what’s changing” and “how am I doing?”)

Notes

This guide is educational and intended for product building. It is not medical advice and should not be used to diagnose health conditions.

References

Biomarkers (product overview, pipeline, schema, list)
https://docs.sahha.ai/docs/products/biomarkers
Data Dictionary (full list of available outputs)
https://docs.sahha.ai/docs/get-started/data-dictionary
Webhooks (delivery model, interval behavior, event types)
https://docs.sahha.ai/docs/connect/webhooks
Event Reference (payload schemas, including BiomarkerCreatedIntegrationEvent)
https://docs.sahha.ai/docs/connect/webhooks/events
SDK Biomarkers (getBiomarkers examples)
https://docs.sahha.ai/docs/connect/sdk/biomarkers
Data Logs (raw sample alternative)
https://docs.sahha.ai/docs/products/logs
Widgets (pre-built UI components)
https://docs.sahha.ai/docs/products/widgets