
Why I'm Asking You to Tap One More Button

I went down a rabbit hole this week about how to share your data with actual researchers without doing something stupid. Here's what I learned, what I built because of it, and a small ask at the end.

Something I've been thinking about for a while: collecting sleep data is the easy part. Designing how to share it (with researchers, in a paper, in a public dataset) so it stays genuinely useful AND nobody gets accidentally identified down the line is the part with teeth. None of that sharing has happened yet, and I want the rails fully built before any of it does.

So I spent a good chunk of this week building those rails. I came out the other side with a system I'm proud of, a new respect for everyone who's ever written a privacy policy, and one new opt-in I'd love to tell you about. Here's how it all fits together.

The thing I got wrong in my head

I'd been treating "your data" as one thing. It's not. It turns out there are two completely different versions of your data, and almost every mistake comes from confusing them.

There's your live data — the stuff in the app right now, the log you add to every morning. That's yours. You can edit it, export it, or nuke it entirely, and when you delete it, it's gone. That's how it should be, and that's how Circadia already works.

Then there's published data — a frozen snapshot that a paper or a public dataset is built on. And this one works differently in a way that surprised me: once it's published, you can't pull your data back out of that specific version. Not because I'm hoarding it — because that's how science stays honest. If a published finding rests on data that can quietly vanish later, nobody can ever check the work. The whole point of publishing data is that someone else can reproduce what you did. A dataset that evaporates on request isn't reproducible; it's just a rumor with a graph.

So the fix wasn't "pick one." It was building a wall between the two. Your live data stays fully yours, deletable forever. Published snapshots are separate, frozen, dated, and — this is the key part — you only end up in one if you specifically said yes to that. Which brings me to the button.
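If you're the kind of person who wants to actually see the wall, here's a minimal Python sketch of the idea. It's a hypothetical illustration, not Circadia's actual code: `LiveStore`, `Snapshot`, `take_snapshot`, and the shape of the log entries are all made up for the example. A snapshot copies only opted-in users' logs into a dated, read-only object, and deleting someone's live data only ever touches the live store.

```python
from dataclasses import dataclass, field
from datetime import date
from types import MappingProxyType


@dataclass
class LiveStore:
    """Hypothetical live data: editable, exportable, deletable."""
    logs: dict[str, list[float]] = field(default_factory=dict)  # user_id -> daily sleep onsets (hours), illustrative
    research_opt_in: set[str] = field(default_factory=set)      # users who explicitly said yes

    def delete_user(self, user_id: str) -> None:
        # Deletion only touches the live store, so it can't fail because
        # the user's data also appears in some published snapshot.
        self.logs.pop(user_id, None)
        self.research_opt_in.discard(user_id)


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical frozen, dated copy that a paper or dataset would cite."""
    taken_on: date
    data: MappingProxyType  # read-only mapping of user_id -> tuple of entries


def take_snapshot(store: LiveStore, on: date) -> Snapshot:
    # Copy (not reference) only the users who opted in, as immutable tuples.
    copied = {
        uid: tuple(entries)
        for uid, entries in store.logs.items()
        if uid in store.research_opt_in
    }
    return Snapshot(taken_on=on, data=MappingProxyType(copied))


# Example: "a" opted in, "b" did not.
store = LiveStore(logs={"a": [23.5, 0.5], "b": [1.0]}, research_opt_in={"a"})
snap = take_snapshot(store, date.today())
store.delete_user("a")

print("a" in store.logs)  # False: live data is gone
print(snap.data["a"])     # (23.5, 0.5): the published version is unchanged
print("b" in snap.data)   # False: never opted in, never included
```

The design choice that matters is that the snapshot holds its own copy instead of pointing back at the live log, so a deletion can never quietly fail, and can never quietly rewrite a published dataset either.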

"I can see why other apps ask for forty permissions now"

I used to roll my eyes at apps that make you tap through six consent screens. Three days into building this, I sent Jon a text that was just: "I owe every privacy policy an apology."

Because here's what I ran into. I want to do this granularly — let you say yes to "help me improve the app" without also saying yes to "put my data in a public scientific dataset forever." Those are really different things and bundling them into one "I agree ✓" would be gross. But the honest version means more than one switch, and more than one switch means more explaining, and more explaining means… I am now one of those apps. Except I promise the switches are real, they do what they say, and there are two, not forty.

And then there's GDPR, which is — I'm just going to say it — a lot. It's a genuinely well-intentioned piece of law, and it is also a labyrinth. There's a specific research exemption that lets datasets like this exist at all, but it comes with a stack of conditions, and "is what I'm doing compliant?" turns into a real question pretty fast. Days of reading. Two separate "oh" moments where I had to redesign something I thought was already fine. I'm not going to pretend I emerged a lawyer, but I emerged with way more respect for the people who write this stuff for a living, and a system that takes the rules seriously instead of going "eh, we're small, it'll be fine."

I drew the line at two real choices, because four would've been me showing off how careful I am at the expense of you actually understanding what you're agreeing to. Careful isn't the same as complicated. I had to keep reminding myself of that.

The genuinely hard problem: you are not anonymous enough, and neither am I

This is the part I can't fully engineer my way out of, so I'm just going to be straight about it.

Anonymizing data normally works by hiding you in a crowd. Strip the name, blur the details, and you're one of ten thousand 30-somethings — good luck picking you out. That works great when there's a crowd.

N24 does not have a crowd.

The condition is rare enough that some of you are unique on details I can't strip without throwing away the entire reason the data is valuable. If someone has an eight-year continuous log, there might be — what, a handful of people on Earth who've ever recorded that? "Anonymous person, eight years of data, this exact drift pattern" can still be one specific human. I can blur your age into a bucket and drop your city, but I cannot blur away the thing that makes your data scientifically precious, because that thing is also the thing that makes you identifiable. They're the same thing.

So I stopped pretending I could make everything safely public. Instead:

  • The public dataset only gets data that can genuinely be anonymized: mostly big-picture summaries, plus the less-distinctive individual logs with the identifying edges sanded off. The truly unique long histories don't go in the public pile, because I can't make them safe there.
  • The detailed stuff only ever goes to a specific researcher who signs an actual agreement — a real document where they promise not to try to re-identify anyone and not to pass it around. The protection there isn't "you're hidden in a crowd," it's "this person is legally on the hook." That's how rare-disease registries have always done it, turns out. I thought I was inventing a workaround; I was just discovering the standard.
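The crowd idea has a standard name, k-anonymity: each record should be indistinguishable from at least k - 1 others on the details that survive anonymization. Below is a minimal Python sketch of how the two-channel split could be decided, assuming hypothetical fields (`age_bucket`, `years_logged`) and a made-up threshold of `K = 5`; these aren't Circadia's real fields or its real threshold. Anything whose crowd is too small goes to the signed-agreement channel instead of the public pile.

```python
from collections import Counter
from dataclasses import dataclass

K = 5  # hypothetical minimum crowd size for the public pile


@dataclass(frozen=True)
class Record:
    user_id: str       # only here to make the example readable; a public release would drop it
    age_bucket: str    # e.g. "30-39" after blurring the exact age
    years_logged: int  # length of the continuous log, in whole years


def quasi_identifiers(r: Record) -> tuple[str, int]:
    # The details that survive anonymization and could still single someone out.
    return (r.age_bucket, r.years_logged)


def route(records: list[Record], k: int = K) -> tuple[list[Record], list[Record]]:
    """Split records into (public, agreement_only) by crowd size."""
    crowd_size = Counter(quasi_identifiers(r) for r in records)
    public, agreement_only = [], []
    for r in records:
        if crowd_size[quasi_identifiers(r)] >= k:
            public.append(r)          # hidden among at least k people with the same details
        else:
            agreement_only.append(r)  # too distinctive: signed-agreement researchers only
    return public, agreement_only


# Example: six people in their 30s with one-year logs, one person with an eight-year log.
records = [Record(f"u{i}", "30-39", 1) for i in range(6)]
records.append(Record("u_long", "30-39", 8))

public, agreement_only = route(records)
print(len(public), [r.user_id for r in agreement_only])  # 6 ['u_long']
```

Note what it doesn't try to do: it never sands the eight-year log down until it fits into a crowd, because that would destroy what makes it valuable. It just keeps it out of the public pile.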

(For what it's worth: I personally do not care if people know which logs are mine. I'd put my name on it. But that's my call to make about my data, and the whole point is that it stays everyone else's call about theirs — especially the quiet folks who just log every day and never post anywhere. The protections aren't for me. They're for them.)

What I actually built

  • Two real sharing choices, not a maze: help improve the app, or share for research. Plain language, and each does exactly what it says.
  • A "published means published" line in the research option — so if you opt in, you know up front that a public snapshot can't be un-published, even though you can stop sharing and delete your live data anytime.
  • A wall between live data and frozen snapshots, so deleting your log always works and never quietly fails because "it's in a dataset somewhere."
  • A two-channel system — anonymized summaries for the public, detailed data only to researchers under a signed agreement — because rare conditions can't be crowd-anonymized and pretending otherwise would be a lie.
  • A promise to tell you first. If your research-shared data is ever going to a specific researcher or study, I'll let the whole research-sharing group know before it happens — in the notifications inbox, in the app — so you can opt out before any snapshot is taken.
  • A promise to tell you when it lands, too. If a paper or a public dataset goes out and your data is in it, I want you to hear about it — because honestly, as a science nerd, I'd be so hyped to know my data was in an actual published study, and I assume some of you would be too. Realistically this depends on me keeping tabs on what comes out and on researchers telling me when they publish (I'll be writing that into the agreements going forward). If something ever slips past me, please yell at me — I'd rather be told than miss it.

Why I'm doing all this now

Because — and this still makes me a little dizzy — Circadia might genuinely produce the largest dataset on sighted N24 that exists. The biggest one in the literature is from twenty-plus years ago and had 57 people. I had 49 sign up in my first eleven days. There is a real chance this turns into something researchers cite, and I am not going to fumble your trust on the way there.

I'd rather over-build the consent and privacy machinery now, while it's a few dozen of us who mostly talk to each other, than bolt it on later when it matters more and I have less room to get it right.

You may have noticed me grinding pretty hard the last couple of weeks — new features, blog posts, fixes, more blog posts, the occasional 11pm "oh actually we should also…" There's a reason. The native iOS app is about to land in the App Store, and once it does, the population of Circadia is going to look very different — more people, fewer of them folks I already know by name, and a lot less margin for "oh I'll just fix that next week." Everything I can lock in before the app is out — the privacy posture, the consent model, the way the math handles weird weeks, the explanations of what we're doing and why — is one less thing I have to try to retrofit while a hundred strangers are signing up at the same time. So if it looks like I'm cramming, I am. Cheerfully. On purpose.

The ask

If you previously opted into research-level sharing, thank you — genuinely. But your original yes was to something slightly narrower than where this is heading, and I'm not going to stretch an old consent to cover a new thing. That's exactly the move I'd be mad at another app for making.

So I'm asking again, cleanly: if you're comfortable with your anonymized data possibly being part of a published scientific dataset — the kind that can't be withdrawn once it's public, with all the protections above — there's a new opt-in waiting for you in Settings → Data sharing. It's one tap. Saying no changes nothing about your current sharing; you keep everything exactly as it is. Saying yes means you might, down the line, be part of the first serious dataset this condition has had in two decades.
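For the curious, here's one way to make "don't stretch an old yes" mechanical rather than just a promise: record exactly which uses each consent covered, and check every new use against that list. It's a minimal Python sketch; the consent names and uses below are hypothetical, not Circadia's actual settings.

```python
from dataclasses import dataclass
from enum import Enum


class Use(Enum):
    IMPROVE_APP = "improve_app"
    SHARE_WITH_RESEARCHER = "share_with_researcher"  # under a signed agreement
    PUBLIC_DATASET = "public_dataset"                # frozen, can't be withdrawn


@dataclass(frozen=True)
class Consent:
    name: str
    covers: frozenset[Use]  # the uses this specific "yes" was actually about


# Hypothetical consents: two separate switches, plus a new version of the research one.
IMPROVE_APP_CONSENT = Consent("improve_app", frozenset({Use.IMPROVE_APP}))
RESEARCH_V1 = Consent("research_v1", frozenset({Use.SHARE_WITH_RESEARCHER}))
RESEARCH_V2 = Consent("research_v2", frozenset({Use.SHARE_WITH_RESEARCHER, Use.PUBLIC_DATASET}))


def allowed(user_consents: set[Consent], use: Use) -> bool:
    # Allowed only if some consent the user actually gave names this use explicitly.
    return any(use in c.covers for c in user_consents)


user = {IMPROVE_APP_CONSENT, RESEARCH_V1}
print(allowed(user, Use.SHARE_WITH_RESEARCHER))  # True
print(allowed(user, Use.PUBLIC_DATASET))         # False: the old yes doesn't stretch

user.add(RESEARCH_V2)                            # the new one-tap opt-in
print(allowed(user, Use.PUBLIC_DATASET))         # True
```

Because `allowed` only looks for an explicit match, an old research yes simply can't answer for a public dataset, and saying no to the new opt-in leaves the existing sharing exactly as it was.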

No pressure, no nagging, no dark patterns. Just one honest button, and you decide.

Quick side-confession on the "no nagging" part: I am extremely aware that I have been making a lot of banners lately. Like, an embarrassing number of banners. So I also built a little notifications inbox in the app, so I can stop slapping a new strip across the top every time something happens and instead just… put it in the inbox, where you can read it when you feel like it (or not). Same goes for things like "a paper just came out using shared data" — that'll show up there. Less yelling-from-the-top-of-the-app, more letting you choose when to look.

As always — keep telling me what's broken, and keep telling me what you wish it did.

— Dayah

FAQ

What's the difference between my live data and "published" data?
Live data is in the app and fully under your control (edit/export/delete). Published data is a frozen, dated snapshot in a paper/dataset that can't be un-published, so a finding can always be checked.
Can I delete my data or withdraw from a published dataset?
You can delete your live data anytime and stop sharing. A snapshot already published can't be withdrawn from that version — which is why opting in is explicit.
How is my data protected if N24 is so rare?
Truly unique long histories aren't put in the public pile (they can't be safely crowd-anonymized); they go only to a specific researcher under a signed no-re-identification agreement.
What are the two sharing choices?
"Help improve the app" and "share for research" — separate, plain-language opt-ins, not a bundled "I agree."
Why ask again if I already opted into research sharing?
Because the new published-dataset use is slightly broader than the original consent, and stretching an old yes to cover a new thing would be exactly the move to avoid.
