Why I’m Building This

I’ve seen how AI can accelerate biomedical research—when the data is there.

The microbiome field has produced promising leads for years. But turning those leads into clinical impact is still slow. One of the main blockers: the lack of high-quality, standardized datasets that are usable at scale.

Public data is abundant yet underused, because it’s fragmented, inconsistently processed, and poorly annotated. Metadata is messy or missing. Multicentric cohorts are difficult to compare. Reproducibility is a major issue.

Biotechs and academic groups often have to build internal infrastructure just to make the data usable. That’s a massive waste of time and expertise.

I’m starting with something simple but foundational: a clean, structured metadata database of all publicly available human gut microbiome shotgun metagenomics studies.

This is the first building block. The goal is to make multicentric data a strength—not a hassle.

What comes next will be driven by real-world needs: standardized processing pipelines, visualization and analysis tools, support for private data integration, and training of microbiome-specific AI models.

If you’re working on microbiome data and facing these challenges, I’d love to hear from you.
Let’s build the tools you wish you had!

Browse curated data →

Want early access? Join the waitlist.

I'm Kathryn Schutte, a data scientist working at the intersection of AI and medical research. I spent the last 6 years advancing research in medical oncology at Owkin. Now I want to bring all those wonderful tools to the field of microbiome research.