May 17, 2026 · Jon Meriwether · Circadia

What Happens When You Share Sleep Data With Circadia

Shared sleep data does not disappear into a mysterious AI pile. It helps us replay history, grade forecasts, tune the recipe, and keep only changes that make Circadia more honest and useful.

TL;DR

Shared data enters a permission-based improvement loop: replay your history in order, predict, grade the misses, adjust the forecasting "recipe," validate on unseen data, and keep only changes that genuinely improve predictions.
The forecast is a recipe with adjustable knobs (parameters); "training" just means tuning those knobs to predict sleep that hasn't happened yet — not fitting the answers in hindsight.
Every prediction gets a "wrongness score" (loss function); a better forecast means less wrongness on data it wasn't allowed to peek at.
Testing follows "no time travel" (walk-forward validation): the model predicts only from what it knew at that moment, gets graded, then learns what actually happened. Bigger tests freeze it and ask for 7/14/30 days forward.
Good forecasts are calibrated — "wide enough to trust, narrow enough to help," and appropriately unsure when nights are messy.
Shared data improves the general engine (not just your personal forecast), helps catch failures for specific subgroups, and once even helped find a display bug. It's anonymous in the research tools. Sharing is optional — the app works fully without it.

"We use shared data to improve sleep predictions" is one of those sentences that sounds reassuring until you think about it for more than three seconds.

What data? Improve how? Who looks at it? Is this one of those giant AI systems that eats everyone's private life and emits a slightly shinier product?

For Circadia, the answer is smaller, more boring, and more careful - in a good way.

When you choose to share sleep data with Circadia, it does not disappear into a mysterious AI pile. It enters a permission-based improvement loop: replay history, make predictions, grade the misses, adjust the sleep-forecasting recipe, test against future data, compare against the forecast people are currently seeing in the app, and only keep changes that make our predictions better.

This post is about what that means.

Circadia is trying to answer a deceptively simple question

The basic user question is simple:

Given what my sleep has been doing lately, when am I likely to sleep next - and when am I likely to wake up naturally?

That sounds like it should be easy. For many people, it often is. If you usually sleep around midnight and wake up around 8 AM, tomorrow probably looks a lot like yesterday.

But Circadia is especially useful for people whose sleep is not that tidy: delayed sleep, non-24-hour rhythms, irregular schedules, naps, skipped nights, recovery sleep, chaotic weeks, travel, illness, medication changes, and all the other ways human sleep refuses to behave like a calendar event.

A four-hour sleep in the afternoon might be a nap. It might be recovery sleep. It might be the beginning of a real phase shift. It might be a weird one-off. The answer matters, because each explanation implies a different forecast for the next sleep.

If a forecast has to immediately classify that sleep event - deciding right away that "this was definitely a nap" or "this was definitely the new main sleep" - it becomes inflexible. That inflexibility is punishing: if it settles on the wrong explanation too early, it messes up future predictions.

A better forecast should be able to say something more honest: this looks mostly like a nap, but there is some evidence it might be a phase shift. Let's not settle on a single explanation until we've seen more evidence.

That is the heart of our current forecasting work. Not magic. Not a giant black box. Just a simple forecasting system that knows when to say, "I'm not sure yet."

The forecast is a recipe with adjustable knobs

Circadia starts with a basic set of rules about how human sleep can move over time. In machine learning, this set of rules is called a model.

You can think of these rules like a recipe. It has adjustable assumptions:

How quickly does this person's sleep schedule seem to drift?
How much should a short nap reduce your need for sleep later?
How quickly should we trust a sudden change in your schedule?
How wide should the forecast window be after a skipped night?
How unsure should we be about when you'll wake up versus when you'll fall asleep?

You can think of these assumptions as adjustable knobs. The technical word for these knobs is parameters.

Training just means tuning those parameters so the recipe makes better predictions. Not better in hindsight. Not better because it got to see the answers first. Better at predicting sleep that had not happened yet.
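For the technically curious, here is a minimal sketch of what "a recipe with knobs" can look like in code. Everything in it is an assumption made for illustration: the knob names, the default values, and the toy prediction rule are hypothetical and are not Circadia's actual model.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ForecastKnobs:
    """Hypothetical parameters; names and defaults are illustrative only."""
    drift_minutes_per_day: float = 20.0        # how fast the schedule seems to drift
    nap_pressure_relief: float = 0.5           # how much a nap reduces later sleep need (0..1)
    change_trust_rate: float = 0.3             # how quickly to believe a sudden shift (0..1)
    skipped_night_widening_hours: float = 2.0  # extra window width after a skipped night

def predict_next_onset(last_onset_hour: float, knobs: ForecastKnobs) -> float:
    """Toy rule: next onset = last onset plus daily drift, wrapped to a 24 h clock."""
    return (last_onset_hour + knobs.drift_minutes_per_day / 60.0) % 24.0

default_knobs = ForecastKnobs()
# "Turning a knob" means making a copy with one value changed.
faster_drift = replace(default_knobs, drift_minutes_per_day=60.0)

print(predict_next_onset(23.5, default_knobs))  # a little before midnight
print(predict_next_onset(23.5, faster_drift))   # 0.5, i.e. 12:30 AM
```

The recipe (the function) stays the same; only the knob values change. Training is the search for knob values that make the same recipe predict better.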

Every forecast gets graded

To improve a forecast, we need a way to tell whether one version is better than another. So every prediction gets graded.

If the forecast predicts sleep at 2 AM and you actually fall asleep at 5 AM, that prediction gets a worse grade than one that guessed 4:45 AM. If it predicts a sleep window that never happens, that's bad too. If it completely misses a sleep period, that's also bad. If it gets bedtime roughly right but wake time badly wrong, that really matters, because natural wake time is one of the main reasons Circadia exists.

We just call this the forecast's wrongness score. In data science, this grading rule is called a loss function.

A better forecast means less wrongness. Training is the process of turning the knobs so this wrongness score goes down, and a change only counts if the score also goes down on data the forecast wasn't allowed to peek at.
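As a rough illustration, a wrongness score could be sketched like this. The specific choices here (measuring error in clock hours, weighting wake time 1.5 times as heavily as bedtime, and a flat 6-hour penalty for a missed or phantom sleep) are assumptions picked for the example, not the grading rule Circadia actually uses.

```python
from typing import Optional, Tuple

Sleep = Tuple[float, float]  # (onset_hour, wake_hour) on a 24 h clock

def clock_error_hours(predicted_hour: float, actual_hour: float) -> float:
    """Shortest distance between two clock times (23:00 vs 01:00 is 2 h, not 22 h)."""
    diff = abs(predicted_hour - actual_hour) % 24.0
    return min(diff, 24.0 - diff)

def wrongness(predicted: Optional[Sleep], actual: Optional[Sleep],
              wake_weight: float = 1.5, miss_penalty: float = 6.0) -> float:
    """Toy loss for one sleep period. Lower is better."""
    if predicted is None and actual is None:
        return 0.0           # nothing predicted, nothing happened
    if predicted is None or actual is None:
        return miss_penalty  # a missed sleep, or a predicted sleep that never happened
    onset_error = clock_error_hours(predicted[0], actual[0])
    wake_error = clock_error_hours(predicted[1], actual[1])
    return onset_error + wake_weight * wake_error  # wake errors count for more

actual = (5.0, 13.0)                      # fell asleep at 5 AM, woke at 1 PM
print(wrongness((2.0, 10.0), actual))     # the 2 AM guess: large score
print(wrongness((4.75, 12.75), actual))   # the 4:45 AM guess: small score
print(wrongness(None, actual))            # missed the sleep entirely
```

The point of the sketch is only that every kind of miss described above (late, early, phantom, missed, wrong wake time) turns into a single number that can be compared across forecast versions.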

No time travel

It is easy to make a model look smart if it gets to see the answers first. That is not forecasting. That is hindsight.

When we test a model, we replay a sleep history in order. At each step, the model has to predict from what it would have known at that moment. Only after the prediction is graded does it get to learn what actually happened.

Our rule for this is simple: no time travel. The technical name for this kind of testing is walk-forward validation.

For example:

The forecast sees Monday and Tuesday.
It tries to predict Wednesday.
We compare the prediction with what actually happened on Wednesday.
Only then does the forecast get to learn from Wednesday's data.
Then it tries to predict Thursday.

This matters because we don't want a system that can redraw the past beautifully. We want one that can look at Tuesday night and make a useful prediction about Wednesday.
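The replay above can be sketched in a few lines. This is a generic walk-forward loop under simple assumptions, not our production code: sleep onsets are given as hours that keep counting past midnight (25.0 means 1 AM), and `same_drift_as_yesterday` is a hypothetical stand-in for a real forecast.

```python
from typing import Callable, List, Sequence

def walk_forward_errors(history: Sequence[float],
                        predict: Callable[[Sequence[float]], float],
                        min_history: int = 2) -> List[float]:
    """Replay history in order; each day is predicted before it is revealed."""
    errors = []
    for day in range(min_history, len(history)):
        past = history[:day]                        # only what was known before this day
        guess = predict(past)                       # the prediction comes first
        errors.append(abs(guess - history[day]))    # then it is graded
        # The day's real value becomes visible only on the next pass, via history[:day + 1].
    return errors

def same_drift_as_yesterday(past: Sequence[float]) -> float:
    """Hypothetical toy forecast: repeat the most recent day-to-day shift."""
    return past[-1] + (past[-1] - past[-2])

onsets = [23.0, 23.5, 24.0, 24.5, 26.0]  # Monday through Friday
print(walk_forward_errors(onsets, same_drift_as_yesterday))  # [0.0, 0.0, 1.0]
```

The slice `history[:day]` is the whole rule: the forecast function is never handed anything from the day it is being graded on.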

For bigger tests, we go further. We freeze the forecast at a point in time and ask what it would predict over the next 7, 14, or 30 days without learning from those future days first. That is closer to what the app actually has to do for you.
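A frozen test can be sketched the same way. In this example the forecast is fitted once at a cutoff day and then asked about 7, 14, and 30 days ahead with no updates. The constant-drift model and the made-up history (drift speeds up right after the cutoff) are assumptions chosen to show how errors grow with the horizon.

```python
from typing import Callable, Sequence

Forecast = Callable[[int], float]  # days_ahead -> predicted onset (hours, counting past midnight)

def fit_constant_drift(past: Sequence[float]) -> Forecast:
    """Hypothetical model: assume the average daily drift seen so far simply continues."""
    drift = (past[-1] - past[0]) / (len(past) - 1)
    last = past[-1]
    return lambda days_ahead: last + drift * days_ahead

def frozen_horizon_error(history: Sequence[float], cutoff: int, horizon_days: int,
                         fit: Callable[[Sequence[float]], Forecast] = fit_constant_drift) -> float:
    """Freeze the forecast at `cutoff`, then grade it on the following days."""
    forecast = fit(history[:cutoff])                 # learns from the past only, once
    future = history[cutoff:cutoff + horizon_days]
    if len(future) < horizon_days:
        raise ValueError("not enough future data for this horizon")
    errors = [abs(forecast(k + 1) - actual) for k, actual in enumerate(future)]
    return sum(errors) / len(errors)                 # mean absolute error in hours

# Toy history: 0.5 h/day drift for 30 days, then 0.75 h/day for the next 30.
history = [0.5 * d for d in range(30)] + [14.5 + 0.75 * (k + 1) for k in range(30)]
for horizon in (7, 14, 30):
    print(horizon, frozen_horizon_error(history, cutoff=30, horizon_days=horizon))
```

Because the frozen forecast never learns that the drift changed, its average error grows as the horizon gets longer, which is exactly the kind of weakness this test is meant to expose.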

Wide enough to trust, narrow enough to help

Circadia does not get extra credit for pretending to be precise.

A forecast that says "you'll probably sleep between 1:00 and 2:00" feels nice. But if your actual sleep lands at 5:30, that precision was fake.

A forecast that says "tonight is messy; sleep is likely somewhere between 1:00 and 6:00" is less tidy, but it may be much more useful.

That window is the model's uncertainty band. The goal is not the narrowest possible window. The goal is the narrowest window that is still honest.

A good forecast should know when to be confident and when not to be. If your sleep has been steady, the window should usually be narrower. If you just skipped a night, took a long nap, or suddenly shifted, the window should widen instead of bluffing.

A good forecast doesn't just ask, "Was the predicted time close?" It also asks, "Was the forecast appropriately unsure when things were messy?" The technical word for this honesty is calibration.

A forecast that is confidently wrong should be penalized more than one that admits the night is confusing. But a forecast that always draws huge windows isn't useful either.

So the standard is: wide enough to be trustworthy, narrow enough to help.
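One standard way to put a number on "wide enough to trust, narrow enough to help" is the interval score (sometimes called the Winkler score): the window's width, plus a penalty for every hour the real value lands outside it. The sketch below uses it purely as an illustration; whether Circadia grades windows this way is not something this post states, and the 80% window level (`alpha=0.2`) is an assumption.

```python
from typing import List, Sequence, Tuple

Window = Tuple[float, float]  # (earliest_hour, latest_hour)

def interval_score(low: float, high: float, actual: float, alpha: float = 0.2) -> float:
    """Interval score for a central (1 - alpha) window. Lower is better.

    Score = window width + (2 / alpha) * hours by which the actual value missed the window.
    """
    miss = max(low - actual, 0.0) + max(actual - high, 0.0)
    return (high - low) + (2.0 / alpha) * miss

def coverage_and_width(windows: Sequence[Window], actuals: Sequence[float]) -> Tuple[float, float]:
    """Fraction of actual values that fell inside their window, and the mean window width."""
    hits = sum(low <= actual <= high for (low, high), actual in zip(windows, actuals))
    mean_width = sum(high - low for low, high in windows) / len(windows)
    return hits / len(windows), mean_width

actual_onset = 5.5  # sleep actually landed at 5:30
print(interval_score(1.0, 2.0, actual_onset))   # confident but wrong: heavily penalized
print(interval_score(1.0, 6.0, actual_onset))   # honest about a messy night: best of the three
print(interval_score(0.0, 12.0, actual_onset))  # always-huge window: safe, but pays for its width
```

The falsely precise window scores worst, the sprawling window scores in the middle, and the honest window wins, which is the ordering described above.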

A tiny example of training

Imagine a user usually sleeps once per day. Then one afternoon they log a four-hour sleep. The next main sleep arrives much later than usual.

What was that four-hour sleep?

Maybe it was a nap. Maybe it was recovery sleep. Maybe it was the beginning of a phase shift. As we talked about earlier, an inflexible forecast has to pick one explanation too early. If it guesses wrong, the next prediction can jump in the wrong direction.

A more flexible forecast keeps multiple explanations alive. It might say, in effect: this looks 70% like a nap and 30% like a true phase shift. Then it makes a prediction that blends those possibilities.
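Here is a small sketch of what "keeping multiple explanations alive" can mean numerically. The two hypotheses, their predicted onsets, the 1.5-hour spread, and the use of a simple Bayesian update are all assumptions for illustration, not a description of Circadia's internals.

```python
import math
from typing import Dict, Tuple

# name -> (weight, predicted next onset in hours, counting past midnight: 28.0 means 4 AM)
Hypotheses = Dict[str, Tuple[float, float]]

def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def blended_forecast(hypotheses: Hypotheses) -> float:
    """Weighted average of what each explanation predicts."""
    return sum(weight * onset for weight, onset in hypotheses.values())

def updated_weights(hypotheses: Hypotheses, observed_onset: float, sd: float = 1.5) -> Dict[str, float]:
    """Bayes' rule: reweight each explanation by how well it predicted what happened."""
    scored = {name: weight * gaussian_pdf(observed_onset, onset, sd)
              for name, (weight, onset) in hypotheses.items()}
    total = sum(scored.values())
    return {name: score / total for name, score in scored.items()}

hypotheses = {"nap": (0.7, 24.0), "phase_shift": (0.3, 28.0)}
print(blended_forecast(hypotheses))        # between midnight and 4 AM, closer to midnight
print(updated_weights(hypotheses, 27.5))   # after a 3:30 AM onset, "phase_shift" dominates
```

Neither explanation is thrown away up front. The next night of real data does the deciding, and a wrong early guess costs weight rather than derailing the forecast. A single blended time can sit between two plausible outcomes, which is one reason a window is often more honest than a point.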

When trying to improve the forecast, we might adjust one knob: for example, how strongly short daytime sleeps should reduce your need for sleep later. If the forecast keeps predicting sleep too early after naps, we try increasing that effect. If it starts delaying sleep too much, we turn it back down.

But we do not keep a change just because it helped one example. We replay the new rules across many consented histories, always moving forward in time, and then test them on data they were not tuned on. A tweak only matters if it survives that process.
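The "survives that process" rule can be sketched as a tiny tune-then-validate step. The single `nap_effect` knob, the toy prediction rule (each hour of nap delays the next sleep by `nap_effect` hours), and the made-up cases are assumptions for the example only.

```python
from typing import List, Sequence, Tuple

Case = Tuple[float, float, float]  # (usual_onset_hour, nap_hours, actual_onset_hour)

def mean_error(nap_effect: float, cases: Sequence[Case]) -> float:
    """Mean absolute error of the toy rule: predicted = usual onset + nap_effect * nap hours."""
    return sum(abs(usual + nap_effect * nap - actual) for usual, nap, actual in cases) / len(cases)

def tune_and_validate(current: float, candidates: Sequence[float],
                      tuning_cases: Sequence[Case], held_out_cases: Sequence[Case]) -> float:
    """Pick the best knob value on tuning data; keep it only if it also wins on held-out data."""
    best = min(candidates, key=lambda value: mean_error(value, tuning_cases))
    survives = mean_error(best, held_out_cases) < mean_error(current, held_out_cases)
    return best if survives else current

tuning: List[Case] = [(24.0, 4.0, 26.0), (23.0, 2.0, 24.0)]
agrees: List[Case] = [(24.5, 3.0, 26.0)]     # unseen data that shows the same nap effect
disagrees: List[Case] = [(24.5, 3.0, 24.5)]  # unseen data where naps changed nothing

knob_values = [0.0, 0.25, 0.5, 0.75]
print(tune_and_validate(0.0, knob_values, tuning, agrees))     # 0.5: the change is kept
print(tune_and_validate(0.0, knob_values, tuning, disagrees))  # 0.0: the change is rejected
```

The same tweak is accepted in one case and thrown out in the other, and the only difference is how it performed on data it was never tuned on.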

What shared data is for

Your own forecast can adapt to your own recent sleep history. That is personal to your experience of the app.

Shared data is different.

When you opt into in-app data sharing, you help us improve the general forecasting engine: the default rules, model comparisons, validation tests, and safeguards against making things worse.

Shared data helps us answer questions like:

Does this new forecasting recipe predict future sleep better than the forecast people are currently seeing in the app?
Does it still work on people it was not tuned for?
Does it improve wake predictions, not just bedtime predictions?
Does it handle naps and skipped nights without falling apart?
Does it draw uncertainty windows that are honest, not just convenient?
Does it fail badly for any subgroup of users, even if the average score looks better?

That last question is important. We do not want to hide new mistakes behind a high overall score.

A forecast can get better on average while getting worse for a specific type of sleep. That is why we check our math in multiple ways: bedtime accuracy, wake accuracy, missed sleeps, fake sleeps, and how well it handles data it hasn't seen before.
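To make the subgroup check concrete, here is a sketch of how an average can improve while one group quietly gets worse. The subgroup labels and error numbers are invented for the example, and the helper function is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, float, float]  # (subgroup, error_with_current_forecast, error_with_new_forecast)

def subgroup_regressions(records: Iterable[Record],
                         tolerance: float = 0.0) -> Dict[str, Tuple[float, float]]:
    """Return {subgroup: (old_mean_error, new_mean_error)} for every subgroup that got worse."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # old error sum, new error sum, count
    for group, old_error, new_error in records:
        totals[group][0] += old_error
        totals[group][1] += new_error
        totals[group][2] += 1
    return {group: (old / n, new / n)
            for group, (old, new, n) in totals.items()
            if new / n > old / n + tolerance}

# Invented numbers: eight "regular" test cases improve, two "non24" cases get worse.
records = [("regular", 1.0, 0.5)] * 8 + [("non24", 1.0, 1.5)] * 2

old_mean = sum(old for _, old, _ in records) / len(records)
new_mean = sum(new for _, _, new in records) / len(records)
print(old_mean, new_mean)             # the overall average improves
print(subgroup_regressions(records))  # but "non24" is flagged as a regression
```

Judged on the average alone, this change would ship. Broken out by subgroup, it gets flagged for a closer look first.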

Shared data also helps us debug

Sometimes shared data helps in a more ordinary way: troubleshooting.

A user may send us a screenshot and say, "This forecast looks wrong" or "my chart looks strange." With an anonymized, read-only shared dataset, we can load the same kind of sleep history into our research tools, compare the live app forecast against new forecasting recipes, and see whether the problem is the forecasting system, the chart, or the way the app is displaying future predictions.

This matters because a forecast can be wrong in more than one place. The math can be wrong. The visualization can be wrong. The live app and the research tools can be using the same forecast function in subtly different contexts.

That actually happened recently. A chart appeared to show a strange missing week in the forecast. The raster view made the gap obvious, and model comparison helped us identify the real issue: it was a display bug in our research tools, not the algorithm deciding to skip a week of sleep.

That is the kind of failure we want to find before it becomes invisible.

The model gets better because people let us see where it fails.

Anonymous, sometimes inconveniently so

In-app shared data is anonymous by design. More precisely: shared datasets used in our research and admin tools are not attached to your name in those tools.

This is sometimes inconvenient. If someone sends us a screenshot for troubleshooting, we cannot simply search the shared-data pool for their name. Our research tools do not show names. That can make support slower, but it is the right inconvenience to have.

Support should not require turning every shared sleep log into an identity record.

There are cases where people voluntarily share richer datasets with us through other channels, or where we know someone because they are active in feedback, testing, email, Discord, or support conversations. That is separate from anonymous in-app sharing, and we keep that distinction explicit.

The people behind the better forecasts

Some of Circadia's biggest improvements have come from volunteers.

A few people have shared unusually long sleep histories - in some cases, years' worth of data. We are deeply grateful for that.

Long histories are precious because they show us things short test datasets never could: slow circadian drift, chaotic weeks, naps that may or may not be naps, skipped nights, recovery sleep, sudden changes, and all the ways real sleep breaks tidy assumptions.

Those contributions directly help improve predictions across the app.

Not because we pour everyone's logs into a mysterious AI pile, but because real sleep histories let us test the model honestly.

Would this version have predicted the next sleep better? Would it have handled the weird week? Would it have admitted uncertainty instead of drawing a confident-but-wrong window?

Circadia is also a very small indie company. Right now, we have two developers (1 owl + 1 kestrel). A lot of the app has been shaped by people we actually talk to: people who send bug reports, screenshots, feature requests, confusing forecasts, thoughtful complaints, and volunteered data.

Some people help by explaining what felt wrong. Some help by testing new features. Some help by opting into anonymous data sharing. Some help by sharing longer histories through other channels.

It all matters.

Circadia should still work if you never share a thing.

But when you do choose to share, you give us another real-world case to test against. Another chance to catch a bad assumption. Another chance to tune an uncertainty window. Another chance to find a failure before it reaches more users.

Sharing your data helps us help you - and it helps us help the next person whose sleep refuses to fit neatly into a calendar.

That is what "using shared data to improve Circadia" means.

Not magic. Not mystery.

A careful loop: predict, compare, adjust, validate, and only keep changes that make predictions better.

FAQ

How does Circadia know a new forecast is actually better? Walk-forward validation, or "no time travel." It predicts only from past data, is graded against what really happened, and a change is kept only if it survives testing on data it was never tuned on.
What is "no time travel"? Replaying history in order so the model can never peek at the answer before predicting. It is the opposite of making a model look smart by letting it see the future.
What does Circadia do with data I share? Is it an "AI pile"? No. Shared data feeds the forecast's machine-learning loop (predict, compare against what actually happened, adjust the model's parameters, validate on data it wasn't allowed to peek at) to improve the general engine. It's classical machine learning (tuning a forecasting model, not a chatbot), not a faceless "AI pile," and shared datasets are anonymous in the research tools.
Is sharing required? No. The app works fully if you never share anything; sharing just gives the team another real case to test against.
Is my shared data anonymous? Yes. Shared datasets aren't attached to your name in the research/admin tools (sometimes inconveniently so for support).