The Global Microbiome Conservancy Cohort: Gorgeous Data, Partial World
Every few years the field drops a “big” microbiome paper that promises to finally answer the question of how lifestyle, industrialization, and genetics shape the human gut.
This Global Microbiome Conservancy (GMbC) preprint (https://www.biorxiv.org/content/10.1101/2025.10.20.683358v1.full.pdf) is one of those papers. It is ambitious. It is technically impressive. It is also a perfect example of how our field keeps reaching for “global” and “inclusive” while quietly side-stepping the people who are actually sick, food insecure, and structurally screwed.
In other words, it is a great resource for microbiome ecology. It is not the last word on global health or equity, no matter how nicely the abstract is written.
Let us unpack that.
What the authors claim they built
The GMbC team assembled a cohort of 1,015 “healthy” adults from 12 countries and 35 locations, with a mean age of 36.7 years and a mean BMI of 22.8. All were recruited without self reported infection or chronic disease. Sex is roughly balanced. They report 73 self-declared ethnicities, and they genotyped 913 of the participants, assigning them to 17 genetic admixture groups.
From each person they collected:
Shotgun metagenomes from stool at about 25.5 million reads per sample with roughly 85 percent mapping rate.
Fecal biomarkers such as IgA, IgM, calprotectin, and chromogranin, all measured centrally using standardized ELISA protocols.
IgA coating profiles of microbes through IgA based cell sorting and sequencing.
Human genotyping on an Illumina array, followed by PCA and admixture modeling to define genetic clusters.
Lifestyle and diet metadata including industrialization status, urbanization, subsistence strategy, basic environmental exposures, and food frequency data.
They then did what everyone does when they drown in metadata. They pushed lifestyle, diet, and genetics into principal components and gave those new capitalized names.
Lifestyle PCs, where PC1_Lifestyle is driven by industrialization status, electricity access, subsistence mode, and household floor type, and PC2_Lifestyle picks up population density and tobacco use.
Diet PCs, where PC1_Diet loads on dairy and industrial goods such as ice cream, and PC2_Diet loads on local staples such as macabo, guava, and wild mammals.
PC1_Lifestyle is used throughout the paper as a continuous proxy for “industrialized lifestyle.”
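To make that construction concrete, here is a minimal sketch of how a handful of correlated infrastructure variables collapse onto a single axis. Everything in it is hypothetical: the six toy localities, the four 0/1 features, and the helper names are illustrative assumptions, not the authors' data or pipeline. It uses plain power iteration to stay dependency free, where a real analysis would reach for a statistics library.

```python
import math

# Hypothetical toy metadata (NOT GMbC data): one row per locality,
# columns are 0/1 flags loosely inspired by the variables named above.
FEATURES = ["industrialized", "electricity", "wage_work", "finished_floor"]
ROWS = [
    (0, 0, 0, 0),
    (0, 0, 0, 1),
    (0, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 1, 1, 1),
    (1, 1, 1, 1),
]


def standardize(rows):
    """Center each column and scale it to unit variance."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def correlation_matrix(z):
    """Correlation matrix of rows that are already standardized."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def first_component(matrix, iterations=200):
    """Leading eigenvector and eigenvalue via power iteration."""
    p = len(matrix)
    v = [1 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return v, eigenvalue


z = standardize(ROWS)
loadings, eigenvalue = first_component(correlation_matrix(z))
explained = eigenvalue / len(FEATURES)  # share of total standardized variance
scores = [sum(l * x for l, x in zip(loadings, row)) for row in z]

for name, loading in zip(FEATURES, loadings):
    print(f"{name:>15}: loading {loading:.2f}")
print(f"PC1 explains {explained:.0%} of the variance in this toy table")
```

In this toy table every feature loads in the same direction, so one number per locality stands in for four different facts about daily life. That is convenient for modeling, and it is also exactly why a single axis cannot tell you which of those facts is doing the work.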
Their central claim is simple. Industrialization is associated with lower diversity, more homogeneous microbiomes, less stable ecological networks, and altered IgA and stress marker profiles, even after adjusting for genetics, diet, and geography. They also show that the famous Gut Microbiome Health Index (GMHI) and other disease predictors trained in Western cohorts do not generalize well across this global dataset.
On the surface, it reads like a win for “global, multimodal, integrative” microbiome science.
And parts of it are. But we need to talk about who is actually in this cohort and what that means.
Where the paper genuinely excels
Let us start with the good news, because there is a lot of it.
1. The lab work is outstanding
The wet lab and data generation are exactly what you want in a modern cohort.
Standardized stool collection and rapid processing on site, followed by cryopreservation and central storage at MIT.
Deep metagenomic sequencing with good mapping rates, plus a custom species level genome bin set combining isolates and MAGs, which massively improves taxonomic and functional resolution.
IgA, IgM, calprotectin, and chromogranin measured using commercial ELISAs with shared supernatants across assays to reduce technical variation.
This is not a slapped together “16S in three freezers” situation. The biological data are high quality.
2. They actually mean “global” in a geographic sense
The cohort covers 12 countries and 35 localities. Pairwise geographic distances between participants range from almost zero to 14,920 kilometers. Population densities range from 0.4 to 15,000 inhabitants per square kilometer.
Within 11 of the 12 countries, they recruited multiple populations to capture fine scale variation instead of treating “one village” as “the country.”
So they did not just grab a few Americans, one Italian farming village, and one African pastoralist group and declare victory. There is real geographic spread and regional structure.
3. They treat key terms with unusual care
The authors do something I wish more microbiome papers did. They acknowledge that labels such as “hunter gatherer,” “traditional,” and “industrialized” are loaded, historically contingent, and often wrong. They point out that these categories can embed colonial biases and erase the dynamism of the communities they work with.
They explicitly define industrialization, urbanization, and subsistence strategies in the Methods, along with their limitations, and say they want to support more critical and inclusive frameworks for thinking about global microbiome change.
That move matters. They know the language is political and they at least try to be transparent about it.
4. They hit the “Western models do not generalize” point hard
When they push GMHI across the GMbC cohort, the results are exactly what many of us have suspected.
People who are clinically healthy in non industrialized or semi industrialized contexts often look “dysbiotic” or “unhealthy” according to GMHI. Meanwhile, the index tracks US inflammatory bowel disease patients pretty well.
Translation. We built “health scores” in rich, industrialized disease cohorts and then quietly pretended they applied to everyone. They do not.
This paper nails that point with data. That is a genuine contribution and a useful warning for anyone trying to commercialize microbiome diagnostics globally.
So far, so good. Now let us step into the messier side.
Who is actually in this cohort?
This is where the marketing language and the underlying reality start to diverge.
1. Healthy adults, not the people who are sick
Every participant was recruited without self reported signs of infection or chronic disease. Mean BMI is within the normal range, with a minimum of about 12.9 and a maximum of 44.7.
There are no children. No infants. No adolescents. No frail elders in nursing homes. No multi morbid adults juggling diabetes, COPD, depression, and food insecurity.
If you are asking “how does industrialization reshape the baseline microbiome of relatively healthy adults,” this is a reasonable design.
If you are asking “how do structural racism, diet, and environment produce the chronic disease burden we see in modern cities,” this cohort is almost irrelevant.
2. Convenience sampling in disguise
The recruitment strategy is explicit. They designed it to enable “comparisons across multiple geographic scales” while “accounting for differences in lifestyle, diet, and host genetics.”
That sounds nice, but underneath it is exactly what it looks like. This is a convenience sample built around places where the GMbC team has collaborators, ethics approvals, and enough trust to ask people for stool and saliva.
There is no population sampling frame. No census linkage. No attempt to represent each country’s socioeconomic distribution. They pick communities that are practical to work with and that collectively span specific lifestyle categories.
Again, that is fine for an ecological resource. It is not a global health surveillance system.
3. The “industrialized” end of the gradient is not who you think
Industrialization status is defined using the Human Development Index (HDI). Populations in localities with HDI below the national median are labeled “non industrialized.” Those above are “industrialized.”
Urbanization is defined using a simple population density threshold. Above one thousand inhabitants per square kilometer is urban. Below that is rural.
Subsistence strategy is sorted into eight broad categories, including Foraging, Farming, Pastoralism, Fishing, Foraging and Industrialism, and Industrialism.
Put those choices together and think about what “industrialized, urban, industrialism subsistence” really means in practice.
It means you live in a locality with HDI above the national median. You are in a relatively dense area. Your primary subsistence mode is wage income in an industrial or service economy.
That describes a wide range of people. It includes wealthy professionals in Helsinki. It includes middle class office workers in Bangkok. It includes some low income but relatively stable urban populations.
It does not distinguish:
Wealthy residents in safe, green neighborhoods with farmers markets and medical care.
Working poor families in food deserts who live between a gas station, a dialysis center, and a fast food strip.
People in informal settlements with overcrowding, poor sanitation, and unstable income.
From the perspective of the microbiome, those are radically different environments. In the GMbC framework, they are all just “high PC1_Lifestyle.”
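A small sketch shows how that collapse happens. The two rules below follow the definitions as described in this post (locality HDI against the national median, and a density threshold of one thousand inhabitants per square kilometer). The three participants, their numbers, the handling of ties, and the food_insecure field are hypothetical illustrations, not anything taken from the GMbC data.

```python
from dataclasses import dataclass

URBAN_DENSITY_THRESHOLD = 1000  # inhabitants per square kilometer


@dataclass(frozen=True)
class Participant:
    label: str
    locality_hdi: float
    national_median_hdi: float
    density_per_km2: float
    food_insecure: bool  # hypothetical field: the post notes this is NOT measured


def classify(p):
    """Return (industrialization, urbanization) labels for one participant.

    Assumption: a locality exactly at the median or threshold falls on the
    lower side; the post does not say how ties are handled.
    """
    industrialization = (
        "industrialized"
        if p.locality_hdi > p.national_median_hdi
        else "non industrialized"
    )
    urbanization = "urban" if p.density_per_km2 > URBAN_DENSITY_THRESHOLD else "rural"
    return industrialization, urbanization


# Three made-up people living in very different circumstances.
PEOPLE = [
    Participant("professional, leafy neighborhood", 0.93, 0.90, 3000, False),
    Participant("working poor, food desert", 0.91, 0.90, 4500, True),
    Participant("informal settlement resident", 0.72, 0.70, 12000, True),
]

for person in PEOPLE:
    print(f"{person.label:>34}: {classify(person)}")
```

All three come out as industrialized and urban. The one field that separates them in this sketch is the one the framework never records.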
There is no measure of:
Household income or wealth.
Education.
Food insecurity.
Neighborhood density of grocery stores versus dollar stores.
Exposure to policing, violence, or chronic stress.
So the “industrialized” side of this gradient is not the urban poor of Detroit, Rio, or Johannesburg. It is the subset of people in industrialized contexts who are reachable by academic researchers and who self identify as healthy.
That is a very different thing.
Ethnicity, ancestry, and the missing structural piece
The authors are admirably honest about the colonial baggage of terms such as “hunter gatherer.” They note that labels for subsistence and lifestyle can distort reality and that some terms have been used to freeze communities in time and erase recent history.
They also:
Collect self declared ethnicity.
Acknowledge that these labels have meaning for communities.
Recognize that no terminology is neutral.
Then they punt.
For the main analyses, they treat “Genetics” as a set of PCs from the genotype data, and they avoid using ethnic group labels as analytic units. They justify this as a way to sidestep arbitrary and colonial categories.
There is some wisdom in that. The last thing we need is another paper claiming “the X people have more Prevotella” without context.
The problem is that ancestry is not just genetics. Ethnicity is not just a cluster in PCA space.
Ethnic and racial identity maps onto:
Segregated neighborhoods.
Different schooling and job opportunities.
Differential policing and incarceration.
Discrimination in housing, banking, and health care.
Cultural food practices that are sometimes resilient and sometimes forcibly disrupted.
Those social realities shape diet, stress, pathogen exposure, and medication use, which then shape the microbiome. When you collapse everything into “Genetics PCs” and “Lifestyle PCs,” you erase that social structure.
So we get a cleaner statistical model and better p values. What we do not get is any serious insight into how being, for example, a racialized minority in an industrialized city changes your microbiome compared to your genetic cousins in a different structural position.
It is all “Genetics versus Lifestyle versus Diet” and almost no attention to the systems that create those distributions in the first place.
What about low resource populations and food deserts?
This is the part that really matters for Better Microbiome Thinking. We talk a lot about food deserts, structural poverty, and global inequities in gut health. Does GMbC help here, or does it mostly skate over the surface?
What they do capture
To their credit, they worked with communities that span:
Rural and semi rural subsistence lifestyles including foraging, farming, pastoralism, and fishing.
Mixed subsistence where people combine traditional practices with industrial work.
Low infrastructure settings with limited access to electricity and improved flooring. These features load heavily on PC1_Lifestyle.
So yes, they have true low resource rural populations in the dataset. They are not pretending that “global” means “Boston plus one village.”
What they completely miss
The cohort does not have the variables you need to talk about food deserts and within city inequity in any serious way.
There is no:
Direct measure of food insecurity.
Standardized instrument for hunger, such as the Household Food Insecurity Access Scale.
Household income.
Education.
Job type or work schedule.
Neighborhood food environment.
Exposure to violence or chronic psychosocial stress.
Diet is captured through food frequency questionnaires (FFQs) that mix a common template and site specific items. Those get harmonized and then collapsed into PCs. PC1_Diet and PC2_Diet tell you about dominant food categories, not about what choices were available or what people could afford.
HDI is a crude blend of life expectancy, education, and income at national or sub national level. It does not tell you if someone lives two bus rides from the nearest fresh vegetable.
Population density just says “lots of people per square kilometer” or “not many.”
You can have:
A dense, wealthy downtown with gyms and salad bars.
A dense, poor neighborhood where the only vegetables are in a can on a dusty shelf.
Both end up as “urban, high HDI” in this framework.
So yes, the GMbC cohort lets you say something about rural subsistence versus urban industrial living. It does not let you separate “rich urban” from “poor urban” in any meaningful way. It does not let you test how food insecurity within industrial contexts shapes the microbiome.
If you want to make claims about food deserts or structural poverty from these data, you are basically projecting your own assumptions on top of a nice PCA plot.
The Lifestyle PC problem
The heart of the paper is PC1_Lifestyle. That axis is used as the continuous measure of industrialization and lifestyle throughout the analyses.
PC1_Lifestyle combines:
Industrialization status based on HDI.
Access to electricity.
Subsistence strategy.
Household floor type.
It also correlates with latitude, longitude, and genetic PCs.
The authors acknowledge that correlation and then proceed as if lifestyle, diet, genetics, and geography can be neatly separated with variance inflation factors and linear models.
This is where the conceptual scaffolding starts to wobble.
Industrialization is not an independent knob you can turn separate from geography and ancestry. It is the product of centuries of colonial history, resource extraction, trade routes, and policy. The fact that PC1_Lifestyle is collinear with geography and genetic PCs is not a statistical nuisance. It is the whole story.
By forcing those into separate terms (“Lifestyle,” “Genetics,” “Geography”) and then “adjusting” for confounding, the models give the illusion that we can cleanly attribute a fraction of microbiome variation to each. That might be mathematically convenient. It is not how history works.
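It helps to see what a variance inflation factor does and does not tell you. With exactly two predictors, the VIF reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sketch below uses made-up scores for a lifestyle axis and a geography axis; the numbers and function names are assumptions for illustration, not values from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(r):
    """VIF for a model with exactly two predictors correlated at r.

    Raises ZeroDivisionError when the predictors are perfectly collinear.
    """
    return 1.0 / (1.0 - r ** 2)


# Hypothetical locality-level scores (NOT GMbC values).
lifestyle_axis = [-2.0, -1.5, -1.0, 0.0, 0.5, 1.5, 2.0, 2.5]
geography_axis = [-1.0, -2.0, -0.5, -0.5, 1.0, 0.5, 2.5, 1.5]

r = pearson_r(lifestyle_axis, geography_axis)
print(f"toy data: r = {r:.2f}, shared variance = {r ** 2:.0%}, "
      f"VIF = {vif_two_predictors(r):.2f}")

# A fixed reference point: predictors correlated at 0.8.
print(f"r = 0.80: shared variance = 64%, VIF = {vif_two_predictors(0.8):.2f}")
```

At r = 0.8 the VIF is about 2.8, which sits under the rules of thumb of 5 or 10 that people commonly cite, yet the two axes share 64 percent of their variance. A passing VIF says the coefficients can be estimated. It does not say the underlying constructs are separable.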
So when the paper says that industrialization shapes the “metaorganism” independently of host confounders and that these effects “manifest as shifts in microbiome diversity, composition, and stability,” take that independence with a generous pinch of salt.
What they really show is that PC1_Lifestyle, which encodes a bundle of correlated historical and infrastructural features, is strongly associated with microbiome variation and immune markers even after running a specific set of adjustment models.
Useful, yes. Causal, no.
The missing disease context
Another big limitation is the complete absence of clinical endpoints.
Every participant is “healthy” by self report. There is no deep phenotyping for:
Diabetes or insulin resistance.
Hypertension.
Cardiovascular disease.
Autoimmune disease.
Depression, anxiety, or trauma history.
They do have fecal biomarkers such as calprotectin and chromogranin, and they show that industrialization links to higher levels of these markers and to IgA changes.
That is interesting biology. It does not tell you whether those subtle shifts translate into actual disease risk.
When they cite previous work where microbiomes from different lifestyle contexts triggered different transcriptional responses in colon organoids, they are trying to bridge that gap. That is fine as a piece of the puzzle. It is not proof that industrialization “causes” specific disease outcomes through microbiome induced immune tuning.
In other words. This cohort is built to map patterns and generate hypotheses. It is not built to tell you which aspect of industrial living actually makes you sick.
A Necessary Detour: Calprotectin Is a Terrible North Star for “Gut Health”
Before we let this paper’s biomarker story skate by unchallenged, we need to talk about calprotectin. Because calprotectin is treated in the GMbC paper (and in far too many microbiome studies) as if it were a clean, stable, universally interpretable indicator of “gut inflammation.” It is not. It is one of the messiest, most context-dependent, most overinterpreted biomarkers in gastroenterology.
And that matters, because the GMbC authors fold calprotectin into their “industrialization shifts the metaorganism” narrative as if it were a reliable physiological anchor. It is not an anchor. It is a weather vane.
The healthy range is absurdly wide
Healthy adults can have calprotectin levels anywhere from below 10 µg/g to over 200 µg/g depending on:
age
sex
recent exercise
dietary fiber
NSAID or PPI use
sample handling
assay manufacturer
and even time of day.
Healthy individuals can exceed thresholds often used to diagnose an IBD flare. That has been documented repeatedly:
Rudzki et al., 2020 report fecal calprotectin (FC) levels in healthy adults ranging from <10 to >200 µg/g, with no clinical disease present.
Konikoff & Denson, 2006 show extensive overlap between high “normal” FC and mild IBD.
Sampietro et al., 2007 demonstrate that FC in healthy adults routinely exceeds thresholds used for “moderate inflammation.”
There is simply no such thing as a single “healthy” cutoff.
IBD versus “healthy” has huge biological overlap
Even in the diseases where calprotectin is supposed to shine, the distributions overlap like bad Venn diagrams.
Supporting examples:
Otten et al., 2008 found that adults with quiescent IBD often have FC levels within the “healthy adult” range.
Mosli et al., 2015 (meta-analysis) show wide dispersion of FC values in IBD patients and poor discriminative power in borderline ranges.
The GMbC results interpret higher calprotectin in industrialized populations as suggestive of altered host physiology and increased inflammation. But with the overlaps this wide, you could interpret anything in any direction.
The cutoff problem is a disaster
Different labs use:
50 µg/g
100 µg/g
150 µg/g
250 µg/g
as cutoffs for “abnormal,” “borderline,” or “possible inflammation.”
There is no universal consensus. Even the European Crohn’s and Colitis Organisation (ECCO) acknowledges that:
“Cutoffs for clinical utility vary widely and should not be interpreted in isolation.” (ECCO Guidelines, 2017)
More specific literature:
D’Incà et al., 2008 show cutoff accuracy is assay-dependent and varies dramatically across populations.
Van Rheenen et al., 2010 (BMJ) emphasize that FC is highly sensitive but poorly specific, and that false positives are extremely common.
In short. You cannot treat calprotectin as a clean physiological readout across 12 countries and 35 localities unless your goal is to measure noise with great confidence.
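The cutoff problem is easy to see with a toy example. The ten values below are hypothetical FC measurements for adults with no clinical disease, chosen only to span the healthy range quoted above. They are not data from GMbC or from any cited study, and treating "above the cutoff" as abnormal is an assumed convention.

```python
# Hypothetical FC values in µg/g for ten adults with no clinical disease.
HEALTHY_FC = [8, 22, 35, 48, 61, 75, 90, 120, 160, 210]

# The four cutoffs listed in the text.
CUTOFFS = [50, 100, 150, 250]


def flagged_counts(values, cutoffs):
    """Count how many values exceed each cutoff."""
    return {cutoff: sum(v > cutoff for v in values) for cutoff in cutoffs}


counts = flagged_counts(HEALTHY_FC, CUTOFFS)
for cutoff, n in counts.items():
    print(f"cutoff {cutoff:>3} µg/g: {n}/{len(HEALTHY_FC)} flagged as abnormal")
```

On these made-up values the same ten healthy people go from 6 of 10 flagged at a cutoff of 50 to 0 of 10 flagged at 250. Nothing about their biology changed. Only the line did.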
Calprotectin is too sensitive to be meaningful across lifestyle contexts
Industrialized lifestyles correlate with:
higher NSAID use
higher stress
more access to medical care
more asymptomatic GI infections
more processed food intake
more exercise among certain groups
all of which can shift measured FC independently of underlying immune tone.
This means that GMbC’s observation of “higher calprotectin in industrialized populations” may say nothing about underlying immune tone. It may simply reflect higher NSAID use or subtle sample handling differences.
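Here is a minimal simulation of that confounding story. It assumes, purely for illustration, that NSAID use is more common in the industrialized group and that NSAIDs raise FC, while industrialization itself has no direct effect at all. The prevalences, effect size, and units are invented, and a Gaussian is a crude stand-in for real FC distributions.

```python
import random
from statistics import mean


def simulate(n_per_group=5000, seed=0):
    """Return (industrialized, nsaid, fc) tuples under a pure-confounding model."""
    rng = random.Random(seed)
    records = []
    # Assumed NSAID prevalence: 10% non industrialized, 40% industrialized.
    for industrialized, p_nsaid in ((False, 0.10), (True, 0.40)):
        for _ in range(n_per_group):
            nsaid = rng.random() < p_nsaid
            # FC depends on NSAID use only, never on industrialization.
            fc = rng.gauss(40.0, 10.0) + (30.0 if nsaid else 0.0)
            records.append((industrialized, nsaid, fc))
    return records


def group_gap(records, nsaid=None):
    """Mean FC, industrialized minus non industrialized.

    If nsaid is True or False, restrict to that stratum first.
    """
    rows = [r for r in records if nsaid is None or r[1] == nsaid]
    industrialized = mean(fc for ind, _, fc in rows if ind)
    non_industrialized = mean(fc for ind, _, fc in rows if not ind)
    return industrialized - non_industrialized


records = simulate()
naive_gap = group_gap(records)
gap_no_nsaid = group_gap(records, nsaid=False)
gap_nsaid = group_gap(records, nsaid=True)

print(f"naive gap:             {naive_gap:.1f}")
print(f"gap among non users:   {gap_no_nsaid:.1f}")
print(f"gap among NSAID users: {gap_nsaid:.1f}")
```

The naive comparison shows a clear gap between groups, and the gap largely disappears once you compare like with like on NSAID use. A cohort that does not record medication use cannot run the second comparison, so it cannot rule this explanation out.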
What this means for the GMbC interpretation
The paper uses calprotectin to support the claim that industrialization is associated with elevated gut inflammatory tone. The logic is tempting. The biology is messy. With a biomarker that varies wildly even in healthy populations, overlaps extensively with mild disease, and has no stable international cutoff, the ability to draw cross-population conclusions is severely limited.
For a dataset as beautifully constructed as GMbC’s, relying on calprotectin as a physiological anchor is like building a marble palace and then placing a wobbling IKEA table in the center.
It does not break the whole paper.
But it seriously weakens any attempt to use calprotectin as evidence for lifestyle driven immune dysregulation.
So what is this paper really good for?
If you strip the hype and focus on what the GMbC cohort is designed to do, you get something like this.
It is very good for:
Showing how microbial diversity, composition, and network structure vary across broad lifestyle gradients anchored in HDI, rural versus urban living, and subsistence mode.
Demonstrating that industrialized adults, on average, have lower diversity and more homogeneous gut microbiomes compared to adults in non industrialized or mixed subsistence settings.
Exploring how IgA coating patterns and fecal markers of inflammation and stress shift across those same gradients.
Stress testing the portability of microbiome based “health indices” and disease classifiers that were trained in Western clinical cohorts.
It is not good for:
Understanding how structural racism, poverty, and food deserts within industrialized countries shape the microbiome.
Disentangling the effects of income, education, housing conditions, and occupational exposures from the broader “industrialization” signal.
Making strong causal claims about how industrialization produces disease through microbiome changes in real human populations.
If you keep those boundaries in mind, the paper is a valuable resource. If you ignore them, you end up repeating the same mistake this field makes over and over again, which is to treat a highly structured convenience sample as if it were a lens into global public health.
How this fits into Better Microbiome Thinking
From a “Better Microbiome Thinking” standpoint, the GMbC cohort is a case study in both progress and blind spots.
Progress
The wet lab quality and multi omics integration are excellent, which gives us a richer picture of the “metaorganism” than we had before.
The authors are open about terminology, colonial histories, and the risk of stereotyping. They actually try to wrestle with language instead of pretending it is neutral.
The portability analysis of GMHI and other disease models is a needed corrective to Western centric biomarker enthusiasm.
This is all movement in the right direction.
Blind spots
“Global” here is about geography, not about structural position. The cohort systematically misses the people who are most affected by industrialization linked disease in rich countries. That includes those living in food deserts, under chronic stress, and with limited health care access.
Lifestyle is treated as a tidy variable that can be separated from the history that created it. That might keep the models happy but it dulls our understanding of how power, policy, and inequality shape biology.
Ethnic labels are acknowledged, then discarded in favor of genetic PCs. That avoids one level of bias and introduces another, where social identity and structural discrimination evaporate into “Genetics versus Lifestyle.”
This is where Better Microbiome Thinking has to be blunt.
If we want to move beyond pretty diversity plots and into real public health, we cannot keep pretending that HDI and a population density threshold are adequate proxies for lived experience.
We need cohorts that:
Oversample the structurally marginalized rather than the conveniently reachable.
Include explicit measures of income, education, food insecurity, neighborhood environment, and structural discrimination.
Combine microbiome data with clinical outcomes and longitudinal follow up.
Involve communities as partners in question setting, not just as stool providers.
GMbC is a strong foundation for ecological questions about industrialization and the gut. It is not the final word and it does not solve our equity problem.
Take home
If you are a scientist.
Use GMbC to explore large scale ecological patterns and to challenge the external validity of your favorite Western trained models. Respect the lab work. Be ruthlessly honest about what the sampling frame can and cannot tell you.
If you are a clinician or policymaker.
Do not wave this paper around as evidence that “industrialization causes X disease through microbiome Y” or that “we now have globally representative data.” You do not. Not yet.
If you are building tools or therapeutics.
Treat GMbC as a stress test for portability, not as a complete map of the human gut. You still need cohorts that are anchored in the real social and economic gradients where your patients live.
And if you are serious about Better Microbiome Thinking.
This paper is both a milestone and a mirror. It shows what we can achieve technically when we coordinate across continents. It also reflects our habit of flattening people into tidy axes and PCs that erase the most uncomfortable parts of the story.
We can do better than that. We will have to, if we want microbiome science that actually matters for the people who are not sitting in a university clinic filling out a “healthy volunteer” form.
Key references for readers who want to dig deeper
Konikoff & Denson, 2006. “Role of fecal calprotectin as a biomarker of intestinal inflammation.” Inflamm Bowel Dis.
D’Incà et al., 2008. “Calprotectin and clinical activity in IBD.”
Van Rheenen et al., 2010. “Faecal calprotectin for screening of patients with suspected IBD.” BMJ.
Mosli et al., 2015. Meta-analysis of FC accuracy for endoscopic inflammation.
Rudzki et al., 2020. FC variability in healthy adults.
ECCO Clinical Guidelines, 2017.

