Correlation Is Not Health. Why the ZOE “gut health” score is not what you think.
Right now ZOE is everywhere telling people they can finally “measure their gut health.” They have a big shiny Nature paper (https://www.nature.com/articles/s41586-025-09854-7#Sec23) behind them, which makes it sound untouchable.
Here is the uncomfortable truth.
The paper does not define a healthy microbiome.
It takes a very specific kind of person, a very specific set of blood tests, runs a big correlation exercise, then slaps the word “health” on top of it and turns that into a product.
Let me walk through what actually happened, in plain language, and where the science falls apart.
What the Nature paper actually did
Strip away the hype and this is the basic recipe.
• ZOE had stool samples and health data from about thirty-five thousand adults in the US and UK
• Almost all of them are Western, relatively affluent, health app users, often ZOE customers
• They measured hundreds of gut microbes using DNA sequencing
• They also measured things like body mass index, blood lipids, blood sugar, inflammation markers and diet quality scores
Then they did this:
For each microbe, they asked:
“When this bug is a bit more abundant, do people tend to have slightly better or worse numbers for these markers?”
They did that for about thirty-seven markers in total.
Things like BMI, triglycerides, cholesterol, blood sugar after a meal, and several diet indices.
For each marker they ranked the microbes from “most associated with good values” to “most associated with bad values”
They converted those ranks into scores between zero and one, then averaged everything together into one final number for each microbe
Microbes with low scores became “favourable”
Microbes with high scores became “unfavourable”
They took the 50 lowest and 50 highest and turned them into the famous lists.
That is the ZOE microbiome health ranking in simple English.
A giant correlation table, compressed into a single label.
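To make that recipe concrete, here is a minimal Python sketch of the rank-and-average step. It is an illustration under stated assumptions, not ZOE's code: the microbe names, marker names and association values are all invented, and the real pipeline has more stages (covariate adjustment, averaging across categories and cohorts) that are left out here.

```python
from statistics import mean

def rank_to_unit(values):
    """Map values to rank-based scores in [0, 1]; the smallest value gets 0.
    Ties are broken by input order, which is a simplification."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0.0] * len(values)
    for position, idx in enumerate(order):
        scores[idx] = position / (len(values) - 1) if len(values) > 1 else 0.0
    return scores

def microbe_scores(assoc):
    """assoc[marker][microbe] = association with *bad* marker values
    (higher = worse). Returns each microbe's average rank score."""
    microbes = sorted(next(iter(assoc.values())))
    per_marker = [rank_to_unit([row[m] for m in microbes]) for row in assoc.values()]
    return {m: mean(col[i] for col in per_marker) for i, m in enumerate(microbes)}

# Invented association values for three hypothetical microbes.
assoc = {
    "bmi":               {"bug_a": -0.20, "bug_b": 0.05,  "bug_c": 0.15},
    "triglycerides":     {"bug_a": -0.10, "bug_b": 0.12,  "bug_c": 0.02},
    "post_meal_glucose": {"bug_a": 0.01,  "bug_b": -0.03, "bug_c": 0.08},
}

scores = microbe_scores(assoc)
for bug in sorted(scores, key=scores.get):
    print(bug, round(scores[bug], 3))  # low score = "favourable", high = "unfavourable"
```

Notice what the final number throws away. bug_b is the best microbe for one marker and the worst for another, yet it lands in the middle of the ladder as if it were simply average.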
They did not:
• Follow people over time to see who actually developed disease
• Prove that changing the abundance of any single microbe changes risk
• Show that this ranking works the same way outside ZOE-style users in the US and UK
So the first key point is this.
This is not a map of “health” in general. It is a map of which microbes track with better or worse numbers in one kind of Western population.
That matters for what comes next.
Problem 1
“Health” is being quietly redefined
In the paper, “cardiometabolic health” is not a diagnosis. It is a bundle of surrogate markers.
• Body mass index
• Waist circumference and blood pressure
• Blood lipids
• Blood sugar before and after meals
• A few composite risk scores
• Diet quality scores built from food questionnaires
All of these were measured at a single point in time.
None of this is:
“Did you actually have a heart attack?”
“Did you develop diabetes?”
“Did you die earlier?”
You can absolutely use these markers for research. The problem starts when you jump from
“Microbes that correlate with better surrogate markers in our ZOE users”
to
“These microbes define a healthy microbiome”
That is a massive leap. The inputs do not justify that output.
A healthy microbiome for a 25-year-old former athlete with slightly high LDL is not necessarily the same as a healthy microbiome for a 70-year-old with arthritis and a completely different life history. Yet the ZOE score quietly presents one set of patterns as the standard for everyone.
So that is the first strike.
They quietly swapped “health in general” for “a composite of short-term blood markers in ZOE users” and hoped no one would notice the difference.
Problem 2
The ranking method is arbitrary and fragile.
Now let’s talk about the way they built the ranking, because this is where it gets really shaky.
Their method can be summarized like this.
For each microbe and each health marker they calculate a correlation, controlling only for age, sex and BMI
For each marker, they rank microbes from best to worst
They convert those ranks into a score from zero to one
They average across all markers
They average across health categories
They average across cohorts
Then they slice out the 50 “best” and 50 “worst”
On the surface that sounds harmless. Under the hood it is messy.
a) The markers are highly redundant
Many of the markers move together.
• Different cholesterol measures track each other
• Fasting and post meal triglycerides track each other
• Several diet scores are built from similar food habits
With the ZOE paper, the reviewers themselves pushed the authors to show that many of these markers are strongly correlated.
That means when you average ranks across all of them, some physiological dimensions get counted over and over, and others barely show up. A bug that tracks with “the lipid cluster” gets many votes. A bug that tracks with something more independent gets one or two.
They never correct for that. They simply accept the double counting and move on.
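Here is a small Python sketch of how that double counting can flip a ranking. All numbers are invented, and the marker grouping is hypothetical. Averaging within clusters first is only one possible way to balance the votes, shown here to expose the mechanism, not as a reconstruction of anything in the paper.

```python
from statistics import mean

# Hypothetical grouping: four lipid markers that move together, one independent marker.
clusters = {
    "lipids": ["ldl", "total_chol", "fasting_tg", "postmeal_tg"],
    "inflammation": ["inflammation_marker"],
}

# Invented per-marker rank scores (0 = favourable end, 1 = unfavourable end).
rank_scores = {
    "bug_x": {"ldl": 0.1, "total_chol": 0.1, "fasting_tg": 0.1,
              "postmeal_tg": 0.1, "inflammation_marker": 0.9},
    "bug_y": {"ldl": 0.6, "total_chol": 0.6, "fasting_tg": 0.6,
              "postmeal_tg": 0.6, "inflammation_marker": 0.1},
}

def flat_average(marker_scores):
    """Every marker gets one vote, so a redundant cluster votes many times."""
    return mean(marker_scores.values())

def cluster_average(marker_scores, clusters):
    """Average within each cluster first, so each cluster gets one vote."""
    return mean(mean(marker_scores[m] for m in members)
                for members in clusters.values())

flat = {bug: flat_average(s) for bug, s in rank_scores.items()}
balanced = {bug: cluster_average(s, clusters) for bug, s in rank_scores.items()}

print("flat:    ", {b: round(v, 2) for b, v in flat.items()})
print("balanced:", {b: round(v, 2) for b, v in balanced.items()})
```

With the flat average, bug_x looks clearly better than bug_y because the lipid cluster votes four times. Give each cluster one vote and the order reverses. Same microbes, same data, different "health" verdict.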
b) The score changes across countries
When they build separate rankings for US and UK participants, the agreement is only modest for health and much weaker for diet.
Translation.
The exact position of many microbes on this “health” ladder moves around once you change the country, even though both countries are Western, industrialized and relatively similar.
If the ranking is wobbly between Boston and London, what do you think happens when you try to apply it to Nairobi or Mumbai?
c) The top 50 cutoff is cosmetic
There is no magical gap between the 50th and 51st microbe. The final scores lie on a smooth gradient.
The decision to pick 50 “good” and 50 “bad” is a marketing choice, not a discovery. They could have picked 30, 75 or 100 and the science would not care.
Worse, the microbes around that cut point can easily move in and out if you slightly tweak the recipe. Many of them have nearly identical scores.
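A toy Python sketch shows how little it takes. The scores below are invented, and a cutoff of 3 stands in for the paper's 50. The point is only that when scores near the cutoff are almost identical, a tiny nudge changes who makes the list.

```python
def top_k(scores, k):
    """Return the k microbes with the lowest (most 'favourable') scores."""
    return set(sorted(scores, key=scores.get)[:k])

# Invented final scores on a smooth gradient; bug_3, bug_4 and bug_5 nearly tie.
scores = {
    "bug_1": 0.300, "bug_2": 0.310, "bug_3": 0.320,
    "bug_4": 0.321, "bug_5": 0.322, "bug_6": 0.400,
}

# Nudge one score by 0.0025, the kind of shift a small recipe tweak could cause.
tweaked = dict(scores, bug_3=0.3225)

before = top_k(scores, 3)
after = top_k(tweaked, 3)
print("dropped out:", before - after)
print("moved in:   ", after - before)
```

bug_3 falls off the "good" list and bug_4 takes its place, even though nothing about either microbe meaningfully changed.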
So when you see headlines like
“These 15 microbes define a healthy gut”
what you are really looking at is
“These are the handful of microbes that happened to sit in one section of a long, smooth ranking created by a chain of arbitrary choices in one company’s dataset”
Problem 3
The score barely adds anything beyond cheap clinical data
Here is the part nobody in this space likes to say out loud.
When you control for the obvious stuff, microbiome scores usually add very modest predictive power.
In their own paper, ZOE shows:
• Machine learning models using the microbiome can discriminate high versus low risk groups with AUCs usually in the mid 0.6 range.
• Correlations between microbiome based predictions and actual lab values are in the 0.3 to 0.4 range.
Those are not terrible. They are also not special.
Doctors already have cheap predictors.
• Age
• Sex
• BMI
• Blood pressure
• Basic lipid panels
• Simple diet scores or even a quick clinical history
A microbiome score has to significantly improve on those to justify turning it into a health test. ZOE never really demonstrates that.
They do not show a clean comparison like, “Here is the performance of standard clinical predictors alone” versus “Here is what happens if we add our microbiome health score on top”.
Without that, you cannot claim that you have a powerful new health measure. At best you have a noisy echo of BMI and diet, wrapped inside expensive sequencing and nice branding.
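The comparison being asked for is not complicated. Below is a minimal Python sketch using made-up toy numbers for eight imaginary people. The risk values, the labels and the equal-weight way of combining the two scores are all assumptions for illustration. A real analysis would fit proper models and evaluate them on held-out data.

```python
def auc(scores, labels):
    """Probability that a random positive case scores higher than a random
    negative case, with ties counting as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 1 = high risk group, 0 = low risk group (all values invented).
labels          = [0,   0,   0,   0,   1,   1,   1,   1]
clinical_risk   = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]  # age, sex, BMI, lipids...
microbiome_risk = [0.2, 0.5, 0.3, 0.6, 0.4, 0.7, 0.5, 0.8]  # hypothetical gut score

# Simplest possible combination: equal-weight average of the two scores.
combined = [(c + m) / 2 for c, m in zip(clinical_risk, microbiome_risk)]

auc_clinical = auc(clinical_risk, labels)
auc_microbiome = auc(microbiome_risk, labels)
auc_combined = auc(combined, labels)

print("clinical only:  ", auc_clinical)
print("microbiome only:", auc_microbiome)
print("combined:       ", auc_combined)
print("added value:    ", auc_combined - auc_clinical)
```

In this toy example the microbiome score looks respectable on its own, yet adding it to the clinical score changes nothing. That last line, the added value, is the number a health test has to justify.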
Put bluntly.
You are paying a company a lot of money for a score that mostly tells you that people who weigh less, have better blood markers and eat more plants also have a different set of microbes. We knew that already.
Why this matters for the public
If this were only an internal toy for researchers, it would be mildly annoying and that is it. The problem is where it goes next.
Nature puts its logo on the paper. ZOE turns around and uses that paper as proof that they can measure gut health. The nuance disappears.
Most people will never read the methods. They will see:
• Nature
• Big dataset
• “Health ranking”
• Slick marketing
and conclude that we finally know what a healthy microbiome looks like.
That is not true.
We have a better sense of which microbes track with certain blood markers in affluent Western app users. That is useful as one piece of the scientific puzzle. It is not a universal health yardstick.
The risk is very real.
• People outside that demographic get told their microbiome is “unhealthy” because it does not match a profile built on someone else’s life, diet and environment
• Clinicians and wellness influencers start treating those “good” and “bad” lists as gospel
• Companies design diets and supplements to chase movement in a score that is only loosely connected to real outcomes
This is how you end up with entire industries built on sand.
So what would a serious “gut health” metric need?
If we were being honest about what it would take to define a true microbiome health score, the bar would look very different.
At minimum you would need:
• Long term follow up
Not just blood work today, but who actually develops disease, who stays well and who dies early
• Diverse populations
Not just ZOE customers in the UK and US, but people across different incomes, ethnicities, diets and environments
• Transparent confounder control
Medication use, smoking, physical activity, socioeconomic status, coexisting conditions, all in the model, not just age, sex and BMI
• Clear incremental value
Explicit tests of how much a microbiome based score improves prediction beyond basic clinical measures.
• Honest language
Calling it what it is.
“A score that captures how similar your microbiome is to lower risk individuals in this specific population”
Not “Your gut health score”.
This respects reality.
The bottom line
The ZOE Nature paper is not fake science. There is real work in there. Big cohort, serious sequencing, proper statistics. The problem is not that the dataset exists.
The problem is what they and others are claiming on top of it.
• They treat a bundle of cross-sectional surrogate markers in ZOE users as “health”
• They push a fragile, arbitrary ranking procedure as if it were a stable definition of a healthy microbiome
• They imply, in marketing and press, that this score can tell you how healthy your gut is in some general sense
That is not supported by their own data.
If you want to change your diet, improve your blood markers and feel better, you do not actually need an expensive gut health score that mostly rephrases “eat more plants and manage your weight.”
And if the field wants to be taken seriously by people who are tired of microbiome hype, we have to start calling this out clearly.
Correlation is not health.
A branded score is not a universal standard.
And no company gets to decide what a “healthy microbiome” looks like for the entire planet based on its customer base.


Strongly agreed. I have great concern when a company does not make all of the data it uses freely available for others to examine in detail. When I see "Mean" in a microbiome article, it is a tell that the authors do not really understand the statistics of the microbiome.
Often we see Mean(A) < Mean(B) with Median(A) >> Median(B), and for skewed data like this the median is the better statistical measure. For a simple counterexample see:
https://blog.microbiomeprescription.com/2025/12/07/odds-ratio-snapshot-depression/
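The mean versus median point can also be seen with a few made-up numbers. This is a minimal Python sketch with invented abundance values, not data from any study. Group B mimics a typical microbiome pattern where most people have none of a microbe and one person has a lot.

```python
from statistics import mean, median

# Invented abundances of one microbe in two groups of five people.
group_a = [5, 5, 5, 5, 5]      # everyone carries a modest amount
group_b = [0, 0, 0, 0, 100]    # four people have none, one is an outlier

print("mean   A vs B:", mean(group_a), mean(group_b))      # 5 vs 20
print("median A vs B:", median(group_a), median(group_b))  # 5 vs 0
```

By the mean, group B has four times as much of the microbe. By the median, the typical person in group B has none at all. A single outlier produces the entire "difference".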