Circadia — Methodology
A description of how Circadia captures and analyzes free-running sleep data, written for circadian-rhythm researchers evaluating whether the app's outputs can be cited or used in secondary analysis.
For the patient-facing version of this content, see
HOW_IT_WORKS.md. For the patient-clinician handoff,
see the Doctor's Report PDF exported from the app.
Last revised: 2026-05-25 (post-particle-filter deployment).
1. What Circadia measures
Circadia is a self-tracking app for sighted Non-24-Hour Sleep-Wake Disorder (N24SWD) and adjacent free-running phenotypes. Each user logs sleep onsets and wakes (and optionally sleepless gaps). The app derives:
- Daily drift: the shift in sleep onset from one cycle to the next, in hours.
- τ (tau): the estimated period of the user's observed sleep-wake rhythm (a behavioral period), in hours. Computed as 24 + mean_drift. Under the limitations described in §13 (no DLMO, self-report, behaviorally derived), this approximates, but is not a direct measurement of, the intrinsic circadian period reported in lab forced-desynchrony protocols (Czeisler 1999; Duffy 2011).
- σ_obs: observed variability of the daily drift.
- σ_τ: standard error on τ.
- |drift| and |drift all|: unsigned drift magnitudes (cohort-side diagnostic, see §10).
- Cumulative debt and Process S: homeostatic measures, included for completeness; not the primary contribution.
The primary research contribution is this behavioral τ estimate and its uncertainty, derived from continuous at-home logging over multi-week windows. It is intended as an ambulatory approximation of the intrinsic period measured under controlled lab conditions (Czeisler 1999; Duffy 2011), subject to the limitations that self-reported onset timing implies. See §13 for the full caveat list.
2. Drift between two sessions
For two consecutive clean sleep onsets onset_prev and onset_curr:
gap_h = (onset_curr − onset_prev) in hours
d_clock = (hour_of_day(onset_curr) − hour_of_day(onset_prev))
drift_h = d_clock wrapped to (−12, +12]
drift_h is the per-cycle shift in onset relative to a 24-hour day. A
positive drift indicates phase delay; negative indicates phase advance.
The modular wrap to (−12, +12] is the bounded representation. For users with true per-cycle drift exceeding 12 h, this representation aliases: a real +15 h shift records as −9 h. The wraparound-unfold pass in the Smart estimator (§3, step 6) detects and corrects this when the signature is strong enough.
A pair is flagged "post-sleepless" if gap_h exceeds a configurable
threshold (default 30 h; per-user adjustable). Post-sleepless pairs are
excluded from the Smart and Clean drift averages because the algorithm
cannot distinguish a true large drift from a normal-length cycle plus a
skipped sleep. They are still surfaced in the per-row log (with an
alternate-direction interpretation displayed when |drift| > 8 h) and
contribute to the Raw drift estimator and to the cohort-side
|drift all| diagnostic.
The reference function is calcDrift() in
src/lib/utils.js.
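The wrap can be sketched in a few lines. This is an illustrative reconstruction from the definitions above, not the source of calcDrift(); the helper names wrapDrift, hourOfDay, and driftBetween are hypothetical, and UTC clock hours are assumed for simplicity (an app would more likely use the user's local time).

```javascript
// Hypothetical helpers for illustration; the real logic lives in calcDrift().
const MS_PER_HOUR = 3600 * 1000;

// Wrap any hour difference into the half-open interval (-12, +12].
function wrapDrift(hours) {
  const r = ((hours % 24) + 24) % 24; // r is in [0, 24)
  return r > 12 ? r - 24 : r;         // (12, 24) maps to (-12, 0); 12 stays +12
}

// Fractional hour of day, in UTC (assumption made for this sketch).
function hourOfDay(ms) {
  return (((ms / MS_PER_HOUR) % 24) + 24) % 24;
}

function driftBetween(onsetPrevMs, onsetCurrMs, postSleeplessThresholdH = 30) {
  const gapH = (onsetCurrMs - onsetPrevMs) / MS_PER_HOUR;
  const dClock = hourOfDay(onsetCurrMs) - hourOfDay(onsetPrevMs);
  return {
    gapH,
    driftH: wrapDrift(dClock),
    postSleepless: gapH > postSleeplessThresholdH,
  };
}

// Example: onset at 23:00, next onset at 00:30 two calendar days later (25.5 h gap).
const prev = Date.UTC(2026, 4, 1, 23, 0);
const curr = Date.UTC(2026, 4, 3, 0, 30);
console.log(driftBetween(prev, curr));
console.log(wrapDrift(15)); // the aliasing case discussed above
```

In this example the pair yields a +1.5 h delay with a 25.5 h gap, and wrapDrift(15) returns −9, matching the aliasing case described above.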
3. τ estimator (Smart)
The default τ estimator is an EWMA-weighted regression with a fading
prior, plus modular-wrap unfolding and bidirectional-drift detection.
It is implemented as estimatePeriod() in
src/lib/utils.js.
Inputs
- All sleep entries where neither the xDrift flag (user-excluded) nor the fragmented flag (auto-flagged short session) is set.
- A configurable window: either a rolling windowDays cutoff or an explicit [rangeStart, rangeEnd] range.
- The user's own postSleeplessThresholdH (cloud-synced setting; see §9).
Step 1 — collect transition pairs
Adjacent entries in the filtered, time-sorted list are paired. Pairs are
kept if 0 < gap_h ≤ max(postSleeplessThresholdH, 1.3 × τ_estimate).
The current τ estimate is used to relax the gap threshold for users
whose true τ is far from 24, preventing systematic loss of valid cycles.
Each surviving pair contributes drift_h with a weight
w_i = exp(−ln(2) × (t_now − t_onset_i) / halfLife)
with halfLife = 28 days. Recent data is weighted more heavily than
old data; the half-life is comparable to the timescale on which
self-reported τ has been observed to shift in N24 patients (Emens 2013).
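As a rough illustration of the recency weighting, the sketch below evaluates w_i for a given onset age. The function name ewmaWeight and the use of millisecond timestamps are assumptions, not taken from estimatePeriod().

```javascript
// Illustrative sketch of the EWMA weight; names are hypothetical.
const HALF_LIFE_DAYS = 28;
const MS_PER_DAY = 24 * 3600 * 1000;

function ewmaWeight(onsetMs, nowMs, halfLifeDays = HALF_LIFE_DAYS) {
  const ageDays = (nowMs - onsetMs) / MS_PER_DAY;
  return Math.exp((-Math.LN2 * ageDays) / halfLifeDays);
}

// Weight halves every 28 days of age.
const now = 100 * MS_PER_DAY;
for (const ageDays of [0, 28, 56]) {
  console.log(ageDays, ewmaWeight(now - ageDays * MS_PER_DAY, now));
}
```

A pair that is one half-life old counts half as much as a pair logged today, and a pair two half-lives old counts a quarter as much.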
Step 2 — prior
A weak prior centers the estimator on the published N24 population
mean. The prior contributes priorPseudoCount = 3 pseudo-observations
at drift = priorTau − 24 = 0.7 h with variance priorSigma² = 1.0
(Hayakawa 2005 N=57; Kitamura 2013 N=28).
The prior weight fades linearly with sample size:
effective_prior = max(0, priorPseudoCount − N_pairs / 3)
By ~9 clean pairs the prior is gone entirely. This protects cold-start users from wildly miscalibrated estimates without dragging established users toward the population mean if their individual τ differs.
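The fade schedule is easy to check numerically. A small sketch, with effectivePrior as a hypothetical name:

```javascript
// Linear fade of the prior pseudo-count with the number of clean pairs.
function effectivePrior(nPairs, priorPseudoCount = 3) {
  return Math.max(0, priorPseudoCount - nPairs / 3);
}

for (const n of [0, 3, 6, 9, 12]) {
  console.log(n, effectivePrior(n));
}
```

With the default pseudo-count of 3, the prior loses one pseudo-observation for every three clean pairs and reaches zero at nine pairs.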
Step 3 — weighted mean and variance
W = Σ w_i + effective_prior
mean_drift = (Σ w_i · drift_i + prior_contribution) / W
σ_obs² = (Σ w_i · (drift_i − mean_drift)² + prior_var) / W
An adaptive σ floor prevents pathologically small variance estimates in low-jitter datasets:
σ_floor = max(1.0, 0.5 × √(max(τ − 24, 0.5)))
σ_obs = max(σ_obs, σ_floor)
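A minimal sketch of Step 3 follows. The formulas above do not spell out prior_contribution or prior_var, so this sketch assumes prior_contribution = effective_prior × (priorTau − 24) and prior_var = effective_prior × priorSigma², and it evaluates the floor at τ = 24 + mean_drift. Those three choices, and the name weightedDriftStats, are assumptions rather than the documented implementation.

```javascript
// Illustrative only: weightedDriftStats is a hypothetical name.
function weightedDriftStats(drifts, weights, effPrior = 0, priorDriftH = 0.7, priorSigma2 = 1.0) {
  let sumW = 0;
  let sumWD = 0;
  for (let i = 0; i < drifts.length; i++) {
    sumW += weights[i];
    sumWD += weights[i] * drifts[i];
  }
  const W = sumW + effPrior;
  if (W <= 0) return null; // no data and no prior

  // ASSUMPTION: prior_contribution = effPrior * (priorTau - 24)
  const meanDrift = (sumWD + effPrior * priorDriftH) / W;

  let sumSq = 0;
  for (let i = 0; i < drifts.length; i++) {
    sumSq += weights[i] * (drifts[i] - meanDrift) ** 2;
  }
  // ASSUMPTION: prior_var = effPrior * priorSigma^2
  const rawSigma = Math.sqrt((sumSq + effPrior * priorSigma2) / W);

  // Adaptive floor; tau - 24 equals meanDrift under the assumption above.
  const sigmaFloor = Math.max(1.0, 0.5 * Math.sqrt(Math.max(meanDrift, 0.5)));
  return { W, meanDrift, rawSigma, sigmaFloor, sigmaObs: Math.max(rawSigma, sigmaFloor) };
}

console.log(weightedDriftStats([1.0, 1.4, 0.6, 1.2], [1, 0.9, 0.8, 0.7], 2));
```

Note that the floor only rises above 1.0 h once τ − 24 exceeds 4 h, since 0.5 × √4 = 1.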
Step 4 — τ and its uncertainty
τ = 24 + mean_drift
N_eff = (Σ w_i)² / Σ w_i² # Kish effective sample size
σ_τ = σ_obs / √N_eff
N_eff is computed with the prior pseudo-count included.
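The Kish formula and the resulting standard error can be sketched as below. The function names are hypothetical. How the prior pseudo-count enters N_eff is not specified above; one possible reading, used in the usage line, is to append the effective prior as an extra weight.

```javascript
// Kish effective sample size: (sum of weights)^2 / (sum of squared weights).
function kishEffectiveN(weights) {
  let s = 0;
  let s2 = 0;
  for (const w of weights) {
    s += w;
    s2 += w * w;
  }
  return s2 > 0 ? (s * s) / s2 : 0;
}

function tauWithUncertainty(meanDrift, sigmaObs, weights) {
  const nEff = kishEffectiveN(weights);
  return {
    tau: 24 + meanDrift,
    nEff,
    sigmaTau: nEff > 0 ? sigmaObs / Math.sqrt(nEff) : Infinity,
  };
}

// ASSUMPTION: the prior pseudo-count (here 1.5) is treated as one more weight.
const pairWeights = [1, 0.9, 0.8, 0.7];
console.log(tauWithUncertainty(0.8, 1.2, [...pairWeights, 1.5]));
```

With equal weights N_eff equals the number of pairs; as weights become more unequal, N_eff shrinks and σ_τ widens.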
Step 5 — two-pass refinement
The whole procedure is run twice. The first pass uses priorTau to set
the gap-keep threshold; the second pass uses the τ estimate from pass 1.
This is a stability fix for users with very long or short τ where the
default 30-hour threshold would systematically drop valid cycles.
Step 6 — wraparound unfolding
Modular drift is bounded to (−12, +12]. Users whose true per-cycle drift exceeds 12 h cannot be represented in this range without sign aliasing. A third pass detects the aliasing signature and unfolds it.
The unfolding pass fires only when both of the following hold after pass 2:
mean(|drift_i|) > 5 h
mean(|drift_i|) > 2 × |mean(drift_i)|
The first condition selects unusually high-magnitude users. The second detects sign cancellation under the mean — the signature of modular wraparound (real consistent forward shifts recording as alternating ±values).
When triggered, the algorithm:
- Picks a bootstrap seed direction from where the large-magnitude pairs cluster (positive vs negative count among |drift_i| > 6 h).
- Unwraps each pair's drift to whichever of {d, d+24, d−24} is closest to seed_direction × median(|drift_i|).
- Re-computes mean and variance on the unwrapped pairs.
- Accepts only if σ_obs(unwrapped) < 0.7 × σ_obs(original), i.e., the unwrap materially tightens fit. If σ does not tighten, the heuristic is rejected and the original mean is preserved.
The σ-tightening guard prevents false-positive unfolding on genuinely
chaotic data. The function returns flags wrapDetected and
unwrapApplied plus tauOriginal and tauUnwrapped for diagnostic
surfaces (see §10).
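The trigger conditions and the unwrap can be sketched as follows. This is a simplified, unweighted version (no EWMA weights, no prior, no σ floor), so it is a reading of the steps above rather than the estimatePeriod() code. The name tryUnwrap and the tie-break toward the positive direction are assumptions.

```javascript
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
const median = (xs) => {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
};
const stdDev = (xs) => {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
};

function tryUnwrap(drifts) {
  const absDrifts = drifts.map(Math.abs);
  const meanAbs = mean(absDrifts);
  // Both trigger conditions: high magnitude and sign cancellation under the mean.
  const wrapDetected = meanAbs > 5 && meanAbs > 2 * Math.abs(mean(drifts));
  if (!wrapDetected) return { wrapDetected, unwrapApplied: false, drifts };

  // Seed direction from the sign majority among large-magnitude pairs.
  const large = drifts.filter((d) => Math.abs(d) > 6);
  const positives = large.filter((d) => d > 0).length;
  const seed = positives >= large.length - positives ? 1 : -1; // tie goes to +1 (assumption)
  const target = seed * median(absDrifts);

  // Pick whichever of {d, d+24, d-24} lies closest to the target.
  const unwrapped = drifts.map((d) =>
    [d, d + 24, d - 24].reduce((best, c) =>
      Math.abs(c - target) < Math.abs(best - target) ? c : best
    )
  );

  // Sigma-tightening guard: accept only if spread shrinks below 70% of the original.
  const unwrapApplied = stdDev(unwrapped) < 0.7 * stdDev(drifts);
  return { wrapDetected, unwrapApplied, drifts: unwrapApplied ? unwrapped : drifts };
}

// Recorded drifts for a user whose true shifts are about +11 to +13 h per cycle.
console.log(tryUnwrap([11, -11, 11.5, 10.5, -11.5, 11]));
```

In the example, the two negative entries are unfolded to +13 h and +12.5 h, and the spread drops far enough for the guard to accept the result.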
Step 7 — bidirectional-drift detection
When sign reversals across consecutive cycles exceed 40% and at least 4 pairs are available, the estimator returns:
bidirectional = true
driftMedian = median(drift_i)
tauMedian = 24 + driftMedian
The median is robust to the cancellation that the weighted mean suffers when forward and backward shifts roughly balance. Consumers (predict tab, banners, doctor's report) display the median path when this flag is set.
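A sketch of the detection rule is shown below. It assumes the reversal rate is the number of sign flips divided by the number of consecutive comparisons, and that zero drifts never count as a flip; neither detail is stated above. The function name detectBidirectional is hypothetical.

```javascript
const median = (xs) => {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
};

function detectBidirectional(drifts, minReversalRate = 0.4, minPairs = 4) {
  if (drifts.length < minPairs) return { bidirectional: false, reversalRate: null };
  let reversals = 0;
  for (let i = 1; i < drifts.length; i++) {
    if (drifts[i] * drifts[i - 1] < 0) reversals++; // opposite signs
  }
  const reversalRate = reversals / (drifts.length - 1);
  if (reversalRate <= minReversalRate) return { bidirectional: false, reversalRate };
  const driftMedian = median(drifts);
  return { bidirectional: true, reversalRate, driftMedian, tauMedian: 24 + driftMedian };
}

console.log(detectBidirectional([2, -1, 3, -2, 1]));
console.log(detectBidirectional([1, 1.2, 0.8, 1.1]));
```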
4. Alternative drift estimators (for comparison)
Two simpler estimators are also exposed in the UI:
- Clean: unweighted arithmetic mean of drift_h over pairs where gap_h ≤ postSleeplessThreshold. Equivalent to the EWMA estimator with halfLife → ∞, no prior, no σ floor, no unfolding, no bidirectional fallback.
- Raw: unweighted mean over ALL transitions, including post-sleepless ones. Post-sleepless transitions are computed as gap_h mod 24 (forced positive). Included for transparency; biases τ upward when users have many skipped sleeps.
When evaluating cohort data, the Smart estimator is the recommended reference. The Clean estimator is appropriate for short, stable windows. Raw should not be used directly for inference but is useful as a comparator — when Raw and Smart diverge by > 3 h on the same user, that user is likely a low-cycle-rate user (most transitions filtered as post-sleepless) and Smart silently undersells their real shift magnitude. See §10 for the cohort-side diagnostic.
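For comparison, the two simpler estimators can be sketched in one function. The pair shape ({ gapH, driftH }) and the name cleanAndRawDrift are assumptions made for this example.

```javascript
// pairs: [{ gapH, driftH }], where driftH is already wrapped to (-12, +12].
function cleanAndRawDrift(pairs, postSleeplessThresholdH = 30) {
  const mean = (xs) => (xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : null);
  // Clean: drop post-sleepless pairs entirely.
  const clean = pairs
    .filter((p) => p.gapH <= postSleeplessThresholdH)
    .map((p) => p.driftH);
  // Raw: keep every pair; post-sleepless pairs use gap_h mod 24, forced positive.
  const raw = pairs.map((p) =>
    p.gapH > postSleeplessThresholdH ? ((p.gapH % 24) + 24) % 24 : p.driftH
  );
  return { driftClean: mean(clean), driftRaw: mean(raw) };
}

console.log(
  cleanAndRawDrift([
    { gapH: 25, driftH: 1 },
    { gapH: 26, driftH: 2 },
    { gapH: 40, driftH: -8 }, // post-sleepless: counted as +16 h in Raw
  ])
);
```

A single 40 h gap is enough to pull Raw well above Clean here, which illustrates the upward bias noted for the Raw estimator.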
5. Adaptive forecast (Particle Filter V2)
The Smart estimator above returns a single best-fit τ and its
uncertainty. The Adaptive prediction engine is a separate
sequential-Monte-Carlo model that maintains a full state distribution
for each user. Deployed May 25, 2026. Reference implementation:
particleFilterPredict() in
src/lib/model-lab.js.
Architecture
The user's current sleep state is represented as a swarm of 64 particles, each carrying a complete hypothesis:
| Field | Meaning |
|---|---|
| anchorH | current phase (hours-since-epoch) |
| velocityH | current per-cycle drift (h/cycle, soft-clamped) |
| accelerationH | second-order drift change |
| durationH | particle's typical sleep duration estimate |
| pressure | homeostatic sleep pressure (0–1) |
| lastWakeH | last observed wake (for awake-duration math) |
| uncertaintyH | this particle's self-reported confidence |
| weight | probability mass relative to the swarm |
Update loop
For each historical entry in chronological order:
- Predict. Every particle produces a per-row prediction (onset, duration, pressure). Predictions include Gaussian process noise (processNoiseH ≈ 1.2), per-particle velocity jitter, and a pressure-driven onset adjustment.
- Score five hypotheses per particle. A softmax over per-hypothesis loss functions assigns weights to:
  | Hypothesis | Triggered by |
  |---|---|
  | main | normal cycle, residual small, pressure mid-range |
  | nap | short sleep, low evidence for phase |
  | recovery | high pressure, longer-than-predicted duration |
  | shift | residual indicates genuine phase change |
  | skip | gap from prior sleep suggests skipped cycle |
  Softmax temperature (hypothesisTemp ≈ 0.44) controls how sharply the model commits: a low temperature behaves like a hard classifier; a high temperature blends many possibilities softly.
- Apply likelihood. Particle weight is multiplied by the probability of the observed onset and duration under each particle's prediction (Gaussian likelihood with observationSigmaH ≈ 2.2).
- Normalize weights across the swarm.
- Update each particle's state from the observation. The effective learning rate is a hypothesis-weighted blend (mainLR · w_main + recoveryLR · w_recovery + …), so confidently recognized main sleep nights nudge state gently while recognized phase shifts let the model move faster.
- Resample when the effective particle count (Kish formula on weights) drops below resampleThreshold ≈ 0.55 × particles.length. Higher-weight particles get cloned with small jitter; lower-weight particles are pruned. This preserves diversity without letting the swarm collapse to a single answer prematurely.
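Three of the steps above (hypothesis softmax, likelihood reweighting, and the resample trigger) can be sketched independently of the rest of the filter. This is not the particleFilterPredict() source: the helper names and the predictedOnsetH field are hypothetical, and the likelihood here scores onset only, whereas the production model also scores duration.

```javascript
// Softmax over negated losses: lower loss means higher weight; temp sets sharpness.
function softmaxFromLosses(losses, temp = 0.44) {
  const scores = losses.map((l) => -l / temp);
  const maxScore = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((s) => Math.exp(s - maxScore));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Gaussian density of an onset residual, in hours.
function gaussianLikelihood(residualH, sigmaH = 2.2) {
  return Math.exp(-0.5 * (residualH / sigmaH) ** 2) / (sigmaH * Math.sqrt(2 * Math.PI));
}

// Multiply each weight by the likelihood of the observation, then normalize.
function reweight(particles, observedOnsetH) {
  const raw = particles.map(
    (p) => p.weight * gaussianLikelihood(observedOnsetH - p.predictedOnsetH)
  );
  const total = raw.reduce((a, b) => a + b, 0);
  return particles.map((p, i) => ({
    ...p,
    weight: total > 0 ? raw[i] / total : 1 / particles.length,
  }));
}

// Kish effective particle count; for normalized weights this is 1 / sum(w^2).
function effectiveCount(particles) {
  return 1 / particles.reduce((acc, p) => acc + p.weight ** 2, 0);
}

function needsResample(particles, thresholdFraction = 0.55) {
  return effectiveCount(particles) < thresholdFraction * particles.length;
}

// Five hypothesis losses in the order main, nap, recovery, shift, skip.
console.log(softmaxFromLosses([0.2, 1.5, 1.1, 0.9, 2.0]));

const swarm = [
  { predictedOnsetH: 0, weight: 0.5 },
  { predictedOnsetH: 10, weight: 0.5 },
];
const updated = reweight(swarm, 0);
console.log(updated, needsResample(updated));
```

In the two-particle example nearly all weight moves to the particle that predicted the observed onset, so the effective count falls toward 1 and the resample trigger fires.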
Change-point detection
When the residual for an entry exceeds changePointResidualThresholdH
(default 4.5 h), the algorithm temporarily elevates the velocity
learning rate from velocityLearningRate ≈ 0.08 to
changePointVelocityLearningRate ≈ 0.22 and widens uncertainty by
changePointUncertaintyBoostH ≈ 1.1. This is the mechanism behind
the disruption-recovery improvement (see §6).
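A sketch of the switch is shown below. It assumes the threshold is compared against the absolute residual and that the uncertainty boost is additive in hours; the text above does not state either detail, and changePointAdjust is a hypothetical name.

```javascript
const CHANGE_POINT_DEFAULTS = {
  changePointResidualThresholdH: 4.5,
  velocityLearningRate: 0.08,
  changePointVelocityLearningRate: 0.22,
  changePointUncertaintyBoostH: 1.1,
};

function changePointAdjust(residualH, uncertaintyH, params = CHANGE_POINT_DEFAULTS) {
  // ASSUMPTION: the magnitude of the residual is what matters, not its sign.
  const isChangePoint = Math.abs(residualH) > params.changePointResidualThresholdH;
  return {
    isChangePoint,
    velocityLearningRate: isChangePoint
      ? params.changePointVelocityLearningRate
      : params.velocityLearningRate,
    // ASSUMPTION: the boost is added to the current uncertainty, in hours.
    uncertaintyH: isChangePoint
      ? uncertaintyH + params.changePointUncertaintyBoostH
      : uncertaintyH,
  };
}

console.log(changePointAdjust(5.0, 2.0));
console.log(changePointAdjust(-3.0, 2.0));
```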
Context inputs
Self-reported covariates feed directly into per-particle prediction via small additive deltas to predicted pressure, onset, duration, and uncertainty. Tuned production weights:
| Covariate | Pressure boost | Notes |
|---|---|---|
| stress | +0.05 | |
| illness | +0.07 | |
| medication | +0.04 | |
| social | −0.02 | |
| screensBefore | +0.03 | |
| blackout | −0.03 | |
| mood (1–5) | −0.015 × scaled | |
| cognition (1–5) | −0.015 × scaled | |
| customTag (per active tag) | +0.02 | Bounded by a per-user prior with contextTagPriorCount ≈ 12, so the model learns which of the user's own tags actually carry predictive signal |
Light timing (lightOutdoor) and tag-presence indicators also feed
into separate onset / duration / velocity channels with smaller weights.
The full parameter list is 40 named numbers, exposed in source at
PARTICLE_FILTER_ALPHA_PARAMS. All are
inspectable in the public Model Lab (#model-lab route — non-admins
can browse the catalog and re-score any candidate model against their
own log).
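To make the additive structure of the pressure-boost column concrete, here is a small sketch. Several details are assumptions: the 1–5 ratings are mapped linearly to −1..+1 (the table says only "scaled"), the per-user tag prior is ignored, and the result is clamped to the 0–1 pressure range. The names pressureDelta and applyPressure are hypothetical.

```javascript
const PRESSURE_BOOST = {
  stress: 0.05,
  illness: 0.07,
  medication: 0.04,
  social: -0.02,
  screensBefore: 0.03,
  blackout: -0.03,
};
const SCALED_WEIGHT = -0.015; // applies to mood and cognition
const TAG_BOOST = 0.02;       // per active custom tag

// ASSUMPTION: a 1..5 rating maps linearly to -1..+1.
const scaleRating = (r) => (r - 3) / 2;

function pressureDelta(entry) {
  let delta = 0;
  for (const [key, boost] of Object.entries(PRESSURE_BOOST)) {
    if (entry[key] === true) delta += boost;
  }
  for (const key of ["mood", "cognition"]) {
    if (typeof entry[key] === "number") delta += SCALED_WEIGHT * scaleRating(entry[key]);
  }
  // Simplification: the per-user tag prior (contextTagPriorCount) is not modeled here.
  delta += TAG_BOOST * (entry.customTagIds ? entry.customTagIds.length : 0);
  return delta;
}

function applyPressure(basePressure, entry) {
  return Math.min(1, Math.max(0, basePressure + pressureDelta(entry)));
}

console.log(applyPressure(0.5, { stress: true, blackout: true, mood: 4 }));
```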
6. Tuning and validation
Training corpus
The production parameter set was fit via offline hyperparameter
optimization against voluntarily-shared sleep histories from Circadia
alpha users as captured in the tuning corpus on 2026-05-17
(n ≈ 18 active sharers at that snapshot, ranging from a few weeks to
~13 months of logged sleeps) plus two longer-form externally shared
datasets. These training-corpus figures are deliberately frozen to
that snapshot — they describe what the model was actually fit on, not
the current cohort (see §12 for current state). Run artifacts at
research/runs/soft-hypothesis-deep-2026-05-17T00-37-13Z.json (prior
soft-hypothesis tuning) and research/runs/particle-* (current).
Model registry: research/model_registry.json.
Held-out validation
More than 1,600 sleep records, spread across multiple users, are reserved for held-out scoring. The model never sees these during fitting.
No-time-travel rule
At each prediction step the model has access only to data preceding the predicted entry. For multi-day forecast tests, the model is frozen at a split point and asked what it would have predicted over the next 7 or 14 days without learning from those future sleeps.
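The rule amounts to a walk-forward evaluation. The sketch below shows the general pattern with a toy linear model standing in for the real predictor; walkForward, frozenForecast, fitLinear, and predictLinear are all hypothetical names, and the frozen horizon is counted in entries here rather than in days.

```javascript
// One-step-ahead scoring: the model for entry i is fit on entries 0..i-1 only.
function walkForward(entries, fitFn, predictFn, minHistory = 1) {
  const results = [];
  for (let i = minHistory; i < entries.length; i++) {
    const history = entries.slice(0, i); // strictly earlier entries
    const model = fitFn(history);
    results.push({ index: i, predicted: predictFn(model, 1), actual: entries[i] });
  }
  return results;
}

// Multi-step scoring: fit once at the split point, then predict without refitting.
function frozenForecast(entries, splitIndex, horizon, fitFn, predictFn) {
  const model = fitFn(entries.slice(0, splitIndex));
  const results = [];
  for (let step = 1; step <= horizon && splitIndex + step - 1 < entries.length; step++) {
    results.push({
      step,
      predicted: predictFn(model, step),
      actual: entries[splitIndex + step - 1],
    });
  }
  return results;
}

// Toy stand-in model: entries are onset times in hours; extrapolate the mean step.
const fitLinear = (h) => ({
  last: h[h.length - 1],
  slope: h.length > 1 ? (h[h.length - 1] - h[0]) / (h.length - 1) : 0,
});
const predictLinear = (m, n) => m.last + n * m.slope;

const onsets = [0, 25, 50, 75, 100];
console.log(walkForward(onsets, fitLinear, predictLinear, 2));
console.log(frozenForecast(onsets, 2, 3, fitLinear, predictLinear));
```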
Disruption-slice testing
Average scores over a whole month can hide failure modes users
actually feel — a forecast that becomes useless right after one
skipped night. We separately score predictions on the rows
immediately following a disruption (defined as a residual >
disruptionThresholdH ≈ 4 from the prior fit).
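Selecting the post-disruption slice can be sketched as a filter over chronologically ordered rows. The row shape ({ residualH }), the use of the absolute residual, and the function names are assumptions for this example.

```javascript
// Keep only rows whose immediately preceding row was a disruption.
function disruptionSlice(rows, disruptionThresholdH = 4) {
  return rows.filter(
    (_, i) => i > 0 && Math.abs(rows[i - 1].residualH) > disruptionThresholdH
  );
}

function meanAbsResidual(rows) {
  return rows.length
    ? rows.reduce((acc, r) => acc + Math.abs(r.residualH), 0) / rows.length
    : null;
}

const scored = [0.5, 6, 3, 1, -5, 2].map((residualH) => ({ residualH }));
console.log("all rows:", meanAbsResidual(scored));
console.log("post-disruption rows:", meanAbsResidual(disruptionSlice(scored)));
```

Reporting both numbers side by side is what keeps a good monthly average from masking poor behavior right after a skipped night.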
Reported improvements vs prior adaptive (soft-hypothesis)
- 2.7× recovery within 2 h of a sudden shift: 36.7% vs 13.3%
- Next-sleep error after disrupted sleep: 18.6 h → 14.5 h (≈ 22% reduction)
- One long-history dataset (≈347 entries): frozen 7-day forecast error 9.36 → 4.09 (≈ 56% reduction)
Honest qualifiers
Results are mixed elsewhere. The model beats older baselines on disruption response and on long-history datasets; it can trail the previous adaptive model on calm, very steady patterns where the simpler model has nothing to fix. The shape of the training corpus matters, because a few long histories shape the model disproportionately; users whose sleep is unlike anything in the pool may need more of their own data before fits converge.
7. Exclusion rules
A session is excluded from drift math (but still counted toward sleep totals) under three conditions:
- Fragmentation: a session starts within fragmentationThreshold hours (default 6 h) of the previous wake. Treated as a continuation of the prior episode, not a new cycle. Threshold is per-user configurable; polyphasic / ME-CFS / split-sleep users typically lower to 3–5 h, while clean monophasic N24 users can raise to 8 h.
- Nap auto-flag: a session shorter than napThreshold hours (default 4 h) is auto-flagged as a crash nap. User can override per-entry; manual choice wins.
- Post-sleepless: gap_h from the prior onset exceeds the user's postSleeplessThresholdH (default 30 h). Excluded from Smart and Clean averages; included in Raw (modular wrap) and in the |drift all| cohort diagnostic.
Manual exclusions via the per-entry xDrift toggle override both
fragmentation and nap auto-flag in either direction. xDriftManual
is preserved separately so threshold changes after the fact don't
clobber the user's explicit choice.
The reference function is markFragmented() in utils.js.
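The first two rules and the manual override can be sketched as below. This is not the markFragmented() source: the session shape (onsetH and wakeH in hours since an arbitrary epoch), the treatment of xDriftManual as an optional boolean, and the name flagSessions are assumptions.

```javascript
// sessions: [{ onsetH, wakeH, xDriftManual? }], sorted by onset.
function flagSessions(sessions, { fragThresholdH = 6, napThresholdH = 4 } = {}) {
  return sessions.map((s, i) => {
    const prev = i > 0 ? sessions[i - 1] : null;
    // Fragmentation: started too soon after the previous wake.
    const fragmented = prev !== null && s.onsetH - prev.wakeH < fragThresholdH;
    // Nap auto-flag: session shorter than the nap threshold.
    const nap = s.wakeH - s.onsetH < napThresholdH;
    const auto = fragmented || nap;
    // ASSUMPTION: an explicit manual choice wins in either direction.
    const xDrift = typeof s.xDriftManual === "boolean" ? s.xDriftManual : auto;
    return { ...s, fragmented, nap, xDrift };
  });
}

console.log(
  flagSessions([
    { onsetH: 0, wakeH: 8 },
    { onsetH: 10, wakeH: 12 },                      // 2 h after wake, 2 h long
    { onsetH: 30, wakeH: 38 },
    { onsetH: 54, wakeH: 56, xDriftManual: false }, // short, but user kept it
  ])
);
```

Keeping the manual choice in its own field means the automatic flags can be recomputed after a threshold change without losing what the user decided.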
8. Confidence and forecasting
Forward prediction at n cycles uses a random-walk variance model:
σ_prediction(n) = σ_obs × √n
This is the standard model for accumulated jitter in a free-running oscillator and is what the Predict tab displays. It is not a calibrated frequentist interval; it is presented to users as a guide-rail, with documented caveats that real-world predictions degrade beyond ~7 cycles due to compounding tau drift and zeitgeber perturbations.
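A small sketch of how the band widens with forecast distance. The projection of the onset itself as lastOnset + n × τ is an assumption added for illustration, and the function names are hypothetical.

```javascript
// Random-walk growth of forecast uncertainty.
function predictionSigma(sigmaObs, nCycles) {
  return sigmaObs * Math.sqrt(nCycles);
}

// ASSUMPTION: the n-th future onset is projected as lastOnsetH + n * tau.
function forecastOnsets(lastOnsetH, tau, sigmaObs, cycles) {
  return Array.from({ length: cycles }, (_, i) => {
    const n = i + 1;
    return { n, onsetH: lastOnsetH + n * tau, sigmaH: predictionSigma(sigmaObs, n) };
  });
}

console.log(forecastOnsets(0, 24.8, 1.2, 7));
```

Because the band grows with √n, it doubles by the fourth cycle and has roughly tripled by the ninth, which is consistent with treating forecasts past about a week as rough guidance only.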
The default forecast reports the probability mass that falls inside the user's one-cycle σ_obs tolerance. The adaptive predictor uses the same display contract, but compares its learned residual band against that default one-cycle tolerance rather than against its own sigma. This keeps the percentage comparable across modes. If both numerator and denominator came from the adaptive model's own sigma, the ratio would be fixed and every user would see the same confidence-decay sequence.
The Analysis tab also surfaces a Phase Position scatter (Position
or Residual mode) that lets users compare predicted vs actual onset for
each historical entry. In Adaptive mode the per-entry predictions are
the particle-filter's weighted-mean predictions made before observing
each row (out.fitted on the particle-filter output) — i.e. genuine
forward-in-time fits, not retrospective.
9. Co-variates captured per session
For research purposes, each sleep entry can carry:
Core covariates
| Field | Type | Meaning |
|---|---|---|
| q | 1–5 | Self-rated sleep quality |
| wakeType | natural / forced | Whether the user woke spontaneously |
| stress | bool | Self-reported stress affecting this session |
| illness | bool | Self-reported illness |
| medication | bool | Self-reported medication change/use |
| social | bool | Social obligation affected timing |
| mood | 1–5 | Post-wake mood |
| cognition | 1–5 | Post-wake "brain fog → sharp" rating |
| lightOutdoor | comma-separated subset of {morning, midday, evening, none} | Bright outdoor light timing (multi-select May 2026; legacy single-string rows parse to a 1-element set) |
| screensBefore | bool | Screen exposure in the 2 h before onset |
| blackout | bool | Full darkness during sleep |
| customTagIds | string[] | References into the user's custom-tag table |
| adHocTags | string[] | Embedded one-off tags (max 10, capped 32 chars each) |
Zeitgeber bundle (May 2026)
Stored as JSONB on the zeitgebers column of circadia_sleep_entries.
All fields optional; missing means "not tracked" (not "false"):
| Field | Type | Meaning |
|---|---|---|
| morningLight1h | bool | Bright outdoor or 10,000-lux exposure within 1 h of waking |
| firstFood2h | bool | First food within 2 h of waking |
| workout | none / morning / afternoon / evening | Exercise timing bucket |
| workoutTime | HH:MM | Optional precise workout time |
| caffeine | bool | Any caffeine intake on this day |
| caffeineTime | HH:MM | Optional time of last caffeine |
| lastFood3h | bool | Last food at least 3 h before sleep onset |
| melatonin | bool | Took melatonin on this day |
| melatoninTime | HH:MM | Time melatonin was taken |
| alcohol | bool | Alcohol on this day |
| alcoholTime | HH:MM | Time of last drink |
All co-variates are optional, user-reported, and intended as exploratory signals — not ground-truth zeitgeber measurements. Per-user hide controls let users opt out of any zeitgeber they don't track; hidden fields don't appear in either the log form or the Analysis correlation panels.
Per-session derived covariates (computed, not user-reported)
| Field | Meaning |
|---|---|
| postSleepless | Whether the gap to the prior onset exceeded user's post-sleepless threshold |
| fragmented | Whether this session started within the user's fragmentation threshold of the prior wake |
| driftAmbiguous | Per-row marker that modular drift on a clean transition lands beyond ±8 h (likely wrap artifact; surfaced but not used by the Smart estimator) |
Sleepless periods
Sleepless periods (intentionally skipped sleeps, sometimes lasting
30–48 h in free-running patients) are logged in a separate
circadia_wake_periods table to preserve the actual onset/wake timeline.
Drift math treats them as gaps; sleep-debt math counts them as ordinary
sustained wakefulness.
Per-user settings (cloud-synced)
| Setting | Default | Range |
|---|---|---|
| postSleeplessThresholdH | 30 | 18 – 72 |
| fragThresholdH | 6 | 1 – 24 |
| napThresholdH | 4 | 1 – 8 |
| ambiguousThresholdH | 8 | 4 – 14 |
Settings sync across devices via the user_settings JSONB column on
circadia_user_profiles. The Smart estimator and all derived stats
honor the user's own thresholds, not a global default.
10. Cohort-side diagnostics
For users analyzing the shared cohort, several additional aggregates
are computed in AdminPanel.computeUserStats:
Per-user, recomputed at view time
| Stat | Meaning |
|---|---|
| tauH, sigmaTau, sigmaObs | From the Smart estimator |
| driftMean (Smart), driftClean, driftRaw | The three estimator outputs |
| driftMagnitude | Unsigned mean of per-cycle drift magnitude (abs(drift_i)) over clean transitions only |
| driftMagnitudeAll | Unsigned mean across raw drifts (includes post-sleepless wraps). For users whose pattern is dominated by long awake stretches, this is closer to lived per-cycle shift than driftMagnitude |
| lowCycleRate | Boolean flag: Smart and Raw drift diverge by > 3 h. Indicates most transitions are filtered as post-sleepless and Smart silently undersells. Surfaced with ⚠ in the cohort table |
| unwrapApplied, wrapDetected | From the Smart wraparound-unfold pass (§3 step 6) |
| tauOriginal, tauUnwrapped | τ before vs after unfolding, when applied |
| postSleeplessCount | Number of pairs filtered as post-sleepless |
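The magnitude statistics and the low-cycle-rate flag in the table above reduce to a few lines. This is a sketch with a hypothetical function name and input shape, not the AdminPanel.computeUserStats source.

```javascript
function driftDiagnostics({ cleanDrifts, allDrifts, driftSmart, driftRaw }, divergenceH = 3) {
  const meanAbs = (xs) =>
    xs.length ? xs.reduce((acc, d) => acc + Math.abs(d), 0) / xs.length : null;
  return {
    driftMagnitude: meanAbs(cleanDrifts),   // clean transitions only
    driftMagnitudeAll: meanAbs(allDrifts),  // includes post-sleepless wraps
    lowCycleRate: Math.abs(driftSmart - driftRaw) > divergenceH,
  };
}

console.log(
  driftDiagnostics({
    cleanDrifts: [1, -1, 2],
    allDrifts: [1, -1, 2, 16],
    driftSmart: 0.7,
    driftRaw: 4.5,
  })
);
```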
Cohort-level views
- τ distribution — histogram (0.25 h bins), mean, median, range
- Profile aggregates — self-id, treatments, entrainment status counts
- Median |drift| (unsigned shift magnitude across cohort)
- Median σ_obs (cycle jitter across cohort)
- % with any covariate flag — coverage indicator
- Sleepless gaps — count + mean / max / total hours across cohort
Cohort-vs-individual toggle
The cohort table can be computed two ways:
- Generic defaults (default view) — every user's Smart estimator re-run with the same thresholds (30 h post-sleepless, 6 h fragmentation). Useful for apples-to-apples comparisons.
- Per-user settings — each user's stats computed with the thresholds they chose. Useful for "what the user actually sees."
Drill-down
Admin can load any sharing user's anonymized dataset into the normal Log / Chart / Predict / Clock / Calendar / Analysis views (read-only; the admin's own data is untouched). This is the recommended way to inspect individual users.
11. Data structure and sharing
A user who has opted in via "share my data" appears in the
admin/research view under an anonymousId (UUID-style). Their
identity-linked user_id is never surfaced to admin or research consumers.
Per-session co-variates and onset/wake timestamps are exposed in full,
but unlinked from any account-level identifier.
Free-text fields not exposed by the shared API:
- Sleep notes
- Wake-period notes
- Profile free-text fields (region, comorbidities_other, treatments_other)
Custom and ad-hoc tag names are a separate opt-in (per-user "share my tag content"). Users sharing data can keep tag content private.
The opt-in is reversible. Revoked sharing deletes the anonymous-share linkage immediately. The underlying sleep data remains under the user's account and is not auto-deleted.
There are three consent items in Circadia, each grantable and withdrawable independently:
- Simple sharing — for the Circadia developer and internal product improvement only. Tag-tuning, model fitting, bug investigation.
- Research-level sharing — same anonymized data as simple, plus pre-consent for future academic research collaborations under a data-use agreement (DUA), plus a structured research profile (age bucket, sex at birth, country, comorbidities, treatments).
- Publication consent — a separate, stricter additive gate that permits a user's anonymized data to be included in publicly-deposited or publication-bound datasets. Publication consent is never implied by simple or research consent; it must be granted explicitly per item. Without it, a user's data is shareable under DUA but not publishable.
Important: Research-level pre-consent does not authorize ad-hoc data transfer. As of May 2026 no academic research collaborations have been initiated. The maintainer will reach out to research-tier sharers individually before any specific collaboration begins.
The DUA / research export (the JSON bundles produced by
scripts/export-circadia-shared.mjs) draws from the simple ∪ research
union and is appropriate to share with a named researcher under DUA.
It is not a publication snapshot — public deposits require the
publication tier and a frozen, dated snapshot pipeline that does not yet
exist. Do not treat simple-tier OR research-tier data as available for
public deposit. See docs/circadia-data-dictionary.md for the full
consent-tier table and re-identification caveats.
The reference endpoints are:
- GET /api/circadia/admin/cohort/overview returns cohort-level aggregates.
- GET /api/circadia/admin/cohort/shared-entries?anonymousId=… returns the per-user anonymized sleep log, including covariates, zeitgebers, custom tag IDs, and (when shared) tag names.
12. Current cohort state (as of 2026-05-27)
- 51 signups, 48 email-verified. (Excludes one operational admin account used for testing imports, marked by the auth_users.exclude_from_cohort_stats flag; any data logged there is test data, not user behavior.)
- 23 users sharing data (18 with research-tier consent, 5 with simple-tier only). 15 of the 23 sharing users have additionally granted publication consent. 18 sharing users have logged at least one sleep entry (5 are zero-entry sharers); 16 of those are τ-ready (≥10 clean transition pairs).
- Largest individual logs in the shared cohort, counting distinct sleep onsets rather than raw rows. (Re-importing the same Fitbit / Sleep As Android export file into Circadia can create byte-identical duplicate rows; we report the distinct count plus the duplicate-import count where it applies.)
  - 2,447 sleeps across 3,042 days (~8.3 years), the longest log in the cohort by a wide margin.
  - 1,296 sleeps across 1,582 days (~4.3 years).
  - 989 sleeps across 854 days (~2.3 years).
  - 849 sleeps across 1,834 days (621 duplicate-import rows excluded; user appears to have imported the same Fitbit file 6× in one session).
  - Several other users in the 100–400-entry range; a long tail of newer users with weeks to a few months of data.
- These long-history datasets are the strongest single contribution to the τ-estimation work.
- Cohort-wide entry totals: 7,145 raw / 6,507 distinct (638 duplicate imports, concentrated in a small number of users).
- Observation spans: 3 to 3,042 days per user (median 71 days).
- Sleepless gaps logged: 61 events cohort-wide (mean 28.9 h, median 26.6 h, max 62.7 h), distinguishable from typical awake stretches (per-user median typically ~16 h).
- Cohort skews adult sighted N24SWD and DSWPD; some ME/CFS overlap; some self-described irregular-sleep-wake / polyphasic patterns.
- Geographic distribution: US-majority but not US-exclusive. The research profile collects country plus an optional free-text region field; these fields are not exported in the shared dataset (see re-identification caveat in the data dictionary).
These numbers move daily — they describe the cohort on the date stamped in the section header. The training corpus described in §6 is a frozen earlier snapshot (2026-05-17), deliberately not updated here. Email if you want a current snapshot for a specific analysis.
13. Known limitations
- Self-report bias. Onsets and wakes are user-entered; some users log via memory after the fact. There is no actigraphy or PSG ground-truth.
- No DLMO. Melatonin onset is not measured; τ is inferred from onset timing alone.
- Cycle counting in long gaps. When gap_h exceeds the post-sleepless threshold, the algorithm cannot infer how many 24-hour cycles elapsed. These pairs are dropped from Smart/Clean drift math rather than imputed. Raw and |drift all| diagnostics include them with modular-wrap math.
- Sleep-debt model is rough. The 14-day cumulative debt is a linear shortfall vs target; it does not implement the allostatic slow variable of McCauley 2009. Process S uses Borbély 1982 parameters with no individual calibration.
- Co-variates are correlational only. The dataset does not support causal inference about, e.g., evening screens shifting tau, because exposure is self-reported and unblinded.
- Particle filter is fit on a small corpus. Long histories shape the model disproportionately. Patterns dissimilar to anything in the training pool may take longer for fits to converge. See §6.
- Tag-correlation panels use n ≥ 3 per group as their reporting floor. These are exploratory; they should not be treated as statistically calibrated.
14. Key references
Estimator priors and α
- Borbély AA. 1982. A two process model of sleep regulation. Hum Neurobiol 1(3):195–204.
- Czeisler CA et al. 1999. Stability, precision, and near-24-hour period of the human circadian pacemaker. Science 284(5423):2177–81.
- Daan S, Beersma DGM, Borbély AA. 1984. Timing of human sleep: recovery process gated by a circadian pacemaker. Am J Physiol 246(2 Pt 2):R161–83.
- Duffy JF et al. 2011. Sex difference in the near-24-hour intrinsic period of the human circadian timing system. PNAS 108 Suppl 3:15602–8.
- Emens JS et al. 2013. Circadian misalignment in major depressive disorder. Psychiatry Research 207(1–2):37–43.
- Hayakawa T et al. 2005. Clinical analyses of sighted patients with non-24-hour sleep-wake syndrome: a study of 57 consecutively diagnosed cases. Sleep 28(8):945–52.
- Kitamura S et al. 2013. Validity of the Japanese version of the Munich ChronoType Questionnaire. Chronobiology International 30(7):918–25.
Homeostatic / debt
- McCauley P et al. 2009. A new mathematical model for the homeostatic effects of sleep loss on neurobehavioral performance. J Theor Biol 256(2):227–39.
- van Dongen HPA et al. 2003. The cumulative cost of additional wakefulness. Sleep 26(2):117–126.
Zeitgeber correlation backing (used by Analysis tab panels)
- Khalsa SB et al. 2003. A phase response curve to single bright light pulses in human subjects. J Physiol 549(Pt 3):945–52.
- Damiola F et al. 2000. Restricted feeding uncouples circadian oscillators in peripheral tissues from the central pacemaker in the suprachiasmatic nucleus. Genes Dev 14(23):2950–61.
- Stokkan KA et al. 2001. Entrainment of the circadian clock in the liver by feeding. Science 291(5503):490–3.
- Burke TM et al. 2015. Effects of caffeine on the human circadian clock in vivo and in vitro. Sci Transl Med 7(305):305ra146.
- Chang AM et al. 2015. Evening use of light-emitting eReaders negatively affects sleep, circadian timing, and next-morning alertness. PNAS 112(4):1232–7.
- Mason IC et al. 2022. Light exposure during sleep impairs cardiometabolic function. PNAS 119(12):e2113290119.
- Youngstedt SD et al. 2019. Human circadian phase-response curves for exercise. J Physiol 597(8):2253–68.
- Ebrahim IO et al. 2013. Alcohol and sleep I: effects on normal sleep. Alcohol Clin Exp Res 37(4):539–49.
Modeling family
- Kalman RE. 1960. A new approach to linear filtering and prediction problems. J Basic Eng 82(1):35–45. (Lineage for the state-space update.)
- Gordon NJ et al. 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc F 140(2):107–13. (Particle-filter founding paper.)
- Doucet A et al. 2001. Sequential Monte Carlo Methods in Practice. Springer. (General reference for the particle-filter approach used in Adaptive V2.)
15. Contact
Research-level anonymized data may be made available to researchers under a written data-use agreement and the product's current privacy terms. Simple-tier data is developer-only.
Contact: Dayah Dover, dayahdover@gmail.com
Please reach out to discuss before any data flows. Research-tier consent only pre-approves the act of sharing in principle; specific collaborations require a separate conversation.
If you publish using Circadia-derived data, please cite as:
Dover D. Circadia: free-running sleep tracking for N24SWD. Open alpha, 2026. https://circadia.owlandkestrel.com
A formal DOI deposit on Zenodo is planned.