When to Use Data Logs vs Biomarkers

A builder’s guide to choosing between raw Data Logs and processed Biomarkers in Sahha—what each is for, how they differ, and decision patterns that keep your integration simple and scalable.

If you’re building product features, dashboards, engagement journeys, or personalization, you usually want Biomarkers (clean, deduplicated, standardized metrics). If you’re building research pipelines, clinical analytics, provenance audits, or your own derived models, you may need Data Logs (raw, timestamped samples with full device/app provenance).

Key Takeaways

Choose Biomarkers when you want ready-to-use metrics that are deduplicated, standardized, and easy to store and query.
Choose Data Logs only when you specifically need raw samples (high granularity, provenance, custom analytics).
Data Logs are webhook-only and can generate high volumes of data, so plan for ingestion, storage, filtering, and backpressure.
Biomarkers support both API and webhooks, making them better for most app/product use cases.

Quick Decision Guide

Use Biomarkers if you want:

Daily totals/averages you can show in UX (e.g., steps, active duration, sleep duration)
Consistent units and aggregation across sources
A simpler system (smaller payloads, fewer records)
Derived metrics like sleep debt or activity regularity (when available)

Use Data Logs if you need:

Raw samples exactly as recorded (timestamped, source-attributed)
Full provenance per sample (device/app/source, recording method)
Custom modeling or analytics at sub-daily resolution
Research/clinical pipelines where auditability matters

Comparison Table

Dimension	Biomarkers	Data Logs
What you get	Clean, standardized metrics	Raw, unfiltered samples
Granularity	Aggregated (daily/weekly/intraday depending on type)	Sample-level with start/end timestamps
Multi-source handling	Deduplicated and normalized	You handle duplicates/merging if you use multiple sources
Delivery	API + webhooks	Webhooks only
Best for	Product UX, engagement, personalization, analytics dashboards	Research, clinical audits, custom models, deep time-series analysis
Operational load	Low-to-medium	Medium-to-high (ingest + store + process + filter)
Common pitfall	Overfitting product logic to single-day changes	Using logs when you really wanted a daily metric

What Are Biomarkers?

Biomarkers are processed outputs that turn raw health data into metrics you can use immediately.

At a high level, Sahha:

Collects raw samples from multiple sources
Deduplicates overlapping records
Aggregates into totals/averages/point-in-time values
Delivers via API or webhooks

Biomarkers are designed to be:

Deduplicated (avoid double-counting across apps/devices)
Standardized (consistent units and formats)
Comprehensive (many metrics across sleep, activity, body, vitals)
Real-time (updates quickly after new data arrives)

Biomarker payload shape (example)

{
  "id": "b7c8d9e0-f1a2-3456-bcde-f78901234567",
  "type": "steps",
  "category": "activity",
  "value": "8432",
  "valueType": "long",
  "unit": "count",
  "aggregation": "total",
  "periodicity": "daily",
  "startDateTime": "2024-09-03T00:00:00+05:00",
  "endDateTime": "2024-09-03T23:59:59+05:00",
  "createdAtUtc": "2024-09-04T05:30:00Z"
}
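Note that "value" arrives as a string, with "valueType" describing how to read it. A minimal Python sketch of turning the payload above into typed values follows. The mapping is an assumption: only "long" appears in the example, the "double" entry is hypothetical, and unknown types are left as strings.

```python
from datetime import datetime

# Assumed mapping from "valueType" to a Python converter. Only "long" appears
# in the example payload; "double" is a hypothetical addition.
VALUE_PARSERS = {"long": int, "double": float}


def parse_biomarker(payload):
    """Return a copy of the payload with a typed value and parsed timestamps."""
    parser = VALUE_PARSERS.get(payload.get("valueType"), str)
    raw = payload.get("value")
    return {
        **payload,
        # Keep missing values as None rather than inventing a number.
        "value": parser(raw) if raw is not None else None,
        "startDateTime": datetime.fromisoformat(payload["startDateTime"]),
        "endDateTime": datetime.fromisoformat(payload["endDateTime"]),
    }


parsed = parse_biomarker({
    "type": "steps",
    "value": "8432",
    "valueType": "long",
    "unit": "count",
    "aggregation": "total",
    "periodicity": "daily",
    "startDateTime": "2024-09-03T00:00:00+05:00",
    "endDateTime": "2024-09-03T23:59:59+05:00",
})
print(parsed["value"] + 1)  # arithmetic works because value is now an int
```

The timestamps keep their +05:00 offset after parsing, which matters if you group by the user's local day rather than by UTC.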

What Are Data Logs?

Data Logs are raw health samples, streamed as they are recorded. They are best for use cases where you need:

high-resolution time series
per-sample provenance (device/app/source metadata)
custom analytics beyond Sahha’s standard outputs

Data Logs are:

Raw & unfiltered
Timestamped
Source-attributed
Webhook delivered

Data Log payload shape (example)

Top-level envelope:

[
  {
    "logType": "activity",
    "dataType": "steps",
    "externalId": "ext-789",
    "receivedAtUtc": "2023-06-26T12:34:56+00:00",
    "dataLogs": [
      {
        "id": "123e4567-e89b-12d3-a456-426614174003",
        "parentId": null,
        "value": 10000,
        "unit": "count",
        "source": "iPhone X",
        "recordingMethod": "RECORDING_METHOD_AUTOMATICALLY_RECORDED",
        "deviceType": "iPhone13,2",
        "startDateTime": "2023-06-25T00:00:00+00:00",
        "endDateTime": "2023-06-25T23:59:59+00:00"
      }
    ]
  }
]
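Because each envelope wraps a list of samples, most storage layers will want one row per sample with the envelope fields copied down. A rough Python sketch of that flattening, using only the field names from the example above (the function name is hypothetical):

```python
def flatten_data_logs(envelopes):
    """Turn a list of envelopes into one flat dict per sample."""
    rows = []
    for envelope in envelopes:
        for sample in envelope.get("dataLogs", []):
            rows.append({
                # Envelope-level context, repeated on every row.
                "externalId": envelope.get("externalId"),
                "logType": envelope.get("logType"),
                "dataType": envelope.get("dataType"),
                "receivedAtUtc": envelope.get("receivedAtUtc"),
                # Sample-level fields, including provenance.
                **sample,
            })
    return rows


rows = flatten_data_logs([
    {
        "logType": "activity",
        "dataType": "steps",
        "externalId": "ext-789",
        "receivedAtUtc": "2023-06-26T12:34:56+00:00",
        "dataLogs": [
            {
                "id": "123e4567-e89b-12d3-a456-426614174003",
                "parentId": None,
                "value": 10000,
                "unit": "count",
                "source": "iPhone X",
                "recordingMethod": "RECORDING_METHOD_AUTOMATICALLY_RECORDED",
                "deviceType": "iPhone13,2",
                "startDateTime": "2023-06-25T00:00:00+00:00",
                "endDateTime": "2023-06-25T23:59:59+00:00",
            }
        ],
    }
])
```

Each row keeps "source", "recordingMethod", and "deviceType" intact, so provenance survives the flattening.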

When Biomarkers Are the Right Choice

1) You’re shipping user-facing UX

Biomarkers are the best default for:

dashboards (“your sleep duration”, “your steps”)
weekly summaries
progress charts
“driver” explanation UI for scores and journeys

2) You need a metric you can trust across devices

If a user has multiple sources (phone + watch + wearable), biomarkers are designed to merge sources without double counting.

3) You want simple storage and fast queries

A typical biomarker system stores:

one record per day (or periodicity window) per biomarker type
easy retrieval for profile pages and analytics
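One way to get "one record per window per type" is to make that combination the primary key and upsert on every update. The sketch below uses Python's built-in sqlite3 module. The table layout and the profile_id column are assumptions for illustration, not a schema Sahha prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE biomarker (
        profile_id      TEXT NOT NULL,  -- hypothetical: your own user key
        type            TEXT NOT NULL,
        periodicity     TEXT NOT NULL,
        start_date_time TEXT NOT NULL,
        value           TEXT,
        unit            TEXT,
        PRIMARY KEY (profile_id, type, periodicity, start_date_time)
    )
""")


def upsert_biomarker(conn, profile_id, biomarker):
    """Insert the biomarker, replacing any earlier row for the same window."""
    conn.execute(
        "INSERT OR REPLACE INTO biomarker VALUES (?, ?, ?, ?, ?, ?)",
        (
            profile_id,
            biomarker["type"],
            biomarker["periodicity"],
            biomarker["startDateTime"],
            biomarker["value"],
            biomarker.get("unit"),
        ),
    )


day = {"type": "steps", "periodicity": "daily", "unit": "count",
       "startDateTime": "2024-09-03T00:00:00+05:00"}
upsert_biomarker(conn, "user-1", {**day, "value": "8000"})
upsert_biomarker(conn, "user-1", {**day, "value": "8432"})  # later update, same day

stored = conn.execute("SELECT value FROM biomarker").fetchall()
```

After both calls the table holds a single row with the later value, so repeated updates for the same day never create duplicates.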

4) You want derived metrics

Some biomarkers represent derived constructs (e.g., consistency/regularity and debt-style metrics), which are far more work to build from scratch using samples.

When Data Logs Are the Right Choice

1) You need raw samples for custom models

Examples:

building your own sleep staging analysis
creating custom clinical risk indicators
detecting micro-patterns that don’t show up in daily aggregates

2) You need full provenance and auditability

Data Logs include device/app metadata and timestamps per sample, which matters for:

research protocols
clinical studies
validation and auditing workflows

3) You need sub-daily resolution

If your product needs “what happened at 2:17pm” or within-hour changes, you generally need samples (Data Logs) rather than daily totals.
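As an illustration of sub-daily analysis, the Python sketch below rolls samples up into hourly totals. It assumes the sample fields shown in the Data Log example and makes one simplification: a sample is attributed entirely to the hour in which it starts, even if it spans an hour boundary. The sample values are invented for the demonstration.

```python
from collections import defaultdict
from datetime import datetime


def hourly_totals(samples):
    """Sum sample values per hour, keyed by the ISO timestamp of the hour start."""
    totals = defaultdict(float)
    for sample in samples:
        start = datetime.fromisoformat(sample["startDateTime"])
        bucket = start.replace(minute=0, second=0, microsecond=0)
        totals[bucket.isoformat()] += sample["value"]
    return dict(totals)


totals = hourly_totals([
    {"value": 120, "startDateTime": "2023-06-25T14:05:00+00:00"},
    {"value": 80, "startDateTime": "2023-06-25T14:17:00+00:00"},
    {"value": 50, "startDateTime": "2023-06-25T15:01:00+00:00"},
])
# totals == {"2023-06-25T14:00:00+00:00": 200.0,
#            "2023-06-25T15:00:00+00:00": 50.0}
```

A daily total of 250 would hide the fact that most of the activity happened in the 2pm hour; the hourly view keeps it visible.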

4) You’re prepared for high volume + ingestion engineering

Data Logs can produce a high volume of samples, so plan for:

webhook ingestion reliability
queuing/backpressure
time-series storage
deduplication/merging (if you consume multiple sources)
filtering by logType/dataType
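Two items from this list, filtering and backpressure, can be sketched with Python's standard queue module. Everything here is hypothetical scaffolding: the handler name, the status codes, and the tiny queue bound are illustrative, and whether a sender retries after a 503 is an assumption you should verify against the Webhooks documentation.

```python
import queue

WANTED = {("activity", "steps")}        # (logType, dataType) pairs you actually use
INGEST_QUEUE = queue.Queue(maxsize=2)   # tiny bound, for demonstration only


def handle_webhook(envelopes):
    """Return an HTTP-style status code for a hypothetical webhook endpoint."""
    relevant = [
        e for e in envelopes
        if (e.get("logType"), e.get("dataType")) in WANTED
    ]
    if not relevant:
        return 200  # acknowledged, nothing worth storing
    try:
        INGEST_QUEUE.put_nowait(relevant)
    except queue.Full:
        return 503  # signal overload instead of silently dropping data
    return 202


def drain(process):
    """Process everything currently queued; return how many batches were handled."""
    handled = 0
    while True:
        try:
            batch = INGEST_QUEUE.get_nowait()
        except queue.Empty:
            return handled
        process(batch)
        INGEST_QUEUE.task_done()
        handled += 1


steps = [{"logType": "activity", "dataType": "steps", "dataLogs": []}]
other = [{"logType": "sleep", "dataType": "sleep_stage", "dataLogs": []}]  # hypothetical type

statuses = [handle_webhook(steps), handle_webhook(other),
            handle_webhook(steps), handle_webhook(steps)]
processed = []
drained = drain(processed.append)
```

The handler does no slow work itself: it filters, enqueues, and returns. In a real deployment the queue would more likely be a durable broker, and drain would run in a separate worker.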

Practical Builder Patterns

Pattern A: “Product UX + personalization”

Use Biomarkers (+ Scores/Insights when needed):

Store daily biomarkers in your DB
Use API queries for profile screens
Use webhooks to keep your DB current
Trigger journeys from biomarker changes or trends

Best for: consumer apps, retention systems, coaching UX, in-app dashboards.

Pattern B: “Research / advanced analytics”

Use Data Logs, optionally alongside Biomarkers:

Ingest raw samples into a time-series store (or data lake)
Keep provenance fields intact
Build your own transforms (e.g., rolling windows, custom features)
Use biomarkers as a “sanity check” and a reference layer

Best for: clinical partners, labs, internal data science, validation pipelines.

Pattern C: “Real-time checks without a full log pipeline”

If you need the latest samples in-app without standing up full raw log ingestion, consider Samples (SDK) for realtime device samples.

This isn’t a replacement for Data Logs, but can be simpler for lightweight “current state” UI.

Common Mistakes (and how to avoid them)

Mistake 1: Using Data Logs for metrics you want to show in UX

If your actual goal is “daily steps” or “weekly sleep duration,” choose biomarkers first. Logs are rarely worth the operational load for standard product KPIs.

Mistake 2: Not filtering Data Logs

Subscribe only to the log types and data types you truly need; otherwise you'll ingest far more data than your product will use.

Mistake 3: Treating log samples as deduplicated truth

Logs are raw samples. If you combine multiple sources, you may need your own merging/dedup logic.
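To make the risk concrete, here is one simple merging strategy in Python: rank sources by priority and drop any lower-priority sample that overlaps in time with a kept sample from a different source. This is a swapped-in illustration, not Sahha's deduplication method, and the "Apple Watch" source name and sample values are invented.

```python
from datetime import datetime


def _interval(sample):
    return (datetime.fromisoformat(sample["startDateTime"]),
            datetime.fromisoformat(sample["endDateTime"]))


def dedupe_by_priority(samples, source_priority):
    """Keep higher-priority samples; drop overlapping ones from other sources."""
    rank = {name: i for i, name in enumerate(source_priority)}
    # Unknown sources sort last. sorted() is stable, so input order is
    # preserved within a source.
    ordered = sorted(samples, key=lambda s: rank.get(s["source"], len(rank)))
    kept = []
    for sample in ordered:
        start, end = _interval(sample)
        clash = any(
            k["source"] != sample["source"]
            and _interval(k)[0] < end
            and start < _interval(k)[1]
            for k in kept
        )
        if not clash:
            kept.append(sample)
    return kept


samples = [
    {"source": "iPhone X", "value": 480,
     "startDateTime": "2023-06-25T09:05:00+00:00",
     "endDateTime": "2023-06-25T09:15:00+00:00"},
    {"source": "Apple Watch", "value": 500,
     "startDateTime": "2023-06-25T09:00:00+00:00",
     "endDateTime": "2023-06-25T09:10:00+00:00"},
    {"source": "iPhone X", "value": 200,
     "startDateTime": "2023-06-25T10:00:00+00:00",
     "endDateTime": "2023-06-25T10:10:00+00:00"},
]
kept = dedupe_by_priority(samples, ["Apple Watch", "iPhone X"])
naive_total = sum(s["value"] for s in samples)   # 1180, double-counts the 9am walk
merged_total = sum(s["value"] for s in kept)     # 700
```

Dropping a whole overlapping sample is deliberately crude; it discards the non-overlapping minutes too. Splitting intervals proportionally is more accurate but also more work, which is part of why Biomarkers are the easier default.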

Mistake 4: Overreacting to single-day biomarker changes

For product decisions, prefer:

a 7–14 day baseline
trend detection (Insights)
simple hysteresis rules (“only act after 3 days”)
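The baseline and "only act after 3 days" rules can be combined in a few lines of Python. The thresholds below (a 14-day baseline, a 3-day streak, a 20% drop) are illustrative assumptions, and the sketch expects one value per day with no gaps.

```python
from statistics import mean


def should_act(daily_values, baseline_days=14, streak_days=3, drop_ratio=0.8):
    """True only if each of the last `streak_days` values is below
    `drop_ratio` times the mean of the preceding `baseline_days` values."""
    if len(daily_values) < baseline_days + streak_days:
        return False  # not enough history to judge; do nothing
    baseline_window = daily_values[-(baseline_days + streak_days):-streak_days]
    baseline = mean(baseline_window)
    recent = daily_values[-streak_days:]
    return all(value < drop_ratio * baseline for value in recent)


one_bad_day = [8000] * 16 + [3000]
sustained_drop = [8000] * 14 + [3000, 3500, 4000]

print(should_act(one_bad_day))     # False: a single low day is ignored
print(should_act(sustained_drop))  # True: three consecutive low days
```

Keeping the streak days out of the baseline window prevents the drop itself from dragging the baseline down and masking the change.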

Implementation Suggestions for Your Products

Start with Biomarkers as your default
- They solve the majority of product use cases with far less engineering.
Add Data Logs only for specific advanced needs
- Write down the exact question you can’t answer with biomarkers first.
Use webhooks for freshness, API for on-demand
- Webhooks keep your system up to date, while the API supports user-driven views.
Store “latest + history”
- Keep the latest biomarker per type for fast UX, plus historical windows for trends.
Be explicit about nulls and coverage
- If a value is missing, degrade gracefully and avoid asserting precision.
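The last two suggestions, "latest + history" and explicit null handling, fit naturally in one small structure. The Python sketch below is an in-memory illustration only. The class and method names are hypothetical, and "sleep_duration" is used as an example of a type with no data.

```python
from datetime import datetime


def _start(biomarker):
    return datetime.fromisoformat(biomarker["startDateTime"])


class BiomarkerStore:
    """Keeps the latest biomarker per type plus a per-window history."""

    def __init__(self):
        self.latest = {}   # type -> most recent biomarker
        self.history = {}  # type -> {startDateTime: biomarker}

    def add(self, biomarker):
        kind = biomarker["type"]
        self.history.setdefault(kind, {})[biomarker["startDateTime"]] = biomarker
        current = self.latest.get(kind)
        # Late-arriving older windows update history but not "latest".
        if current is None or _start(biomarker) >= _start(current):
            self.latest[kind] = biomarker

    def describe_latest(self, kind):
        biomarker = self.latest.get(kind)
        if biomarker is None or biomarker.get("value") is None:
            return "No data yet"  # degrade gracefully instead of showing 0
        return f"{biomarker['value']} {biomarker['unit']}"


store = BiomarkerStore()
store.add({"type": "steps", "value": "9100", "unit": "count",
           "startDateTime": "2024-09-04T00:00:00+05:00"})
store.add({"type": "steps", "value": "8432", "unit": "count",
           "startDateTime": "2024-09-03T00:00:00+05:00"})  # older day arrives late

print(store.describe_latest("steps"))           # 9100 count
print(store.describe_latest("sleep_duration"))  # No data yet
```

Returning "No data yet" rather than a zero keeps missing coverage from being read as a real measurement.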

FAQ

Can I use both Biomarkers and Data Logs?

Yes. Many builders use biomarkers for product UX and logs for research or advanced analytics.

Are Data Logs better because they’re “more accurate”?

Not necessarily. Logs are raw. Biomarkers apply cleaning, normalization, and deduplication, which is usually what product builders want.

Can I get Data Logs via the API?

Data Logs are designed for webhook streaming. If your use case is “fetch on demand,” use Biomarkers (or consider Samples in the SDK for realtime sample fetch).

Do Biomarkers include provenance like device type and recording method?

Biomarkers are optimized for standardized outputs. If you need per-sample provenance and device/app attribution, that’s a Data Logs use case.

Related Resources

Sahha Data Dictionary (all available data points)
Using Webhooks (best practices for delivery)
Using Biomarkers for engagement and personalization
Insights (trend detection on top of biomarkers)

References

Data Logs (product overview and schema)
https://docs.sahha.ai/docs/products/logs
Biomarkers (product overview and pipeline)
https://docs.sahha.ai/docs/products/biomarkers
Samples (SDK realtime sample fetch)
https://docs.sahha.ai/docs/connect/sdk/samples
Webhooks (delivery and events)
https://docs.sahha.ai/docs/connect/webhooks