GeMMA reconciles the metabolic pathways encoded in a microbial community's genomes with the pathways observed in the metabolome — and tells you, at pathway, taxon, and reaction resolution, where they disagree. Built for paired microbiome–metabolome studies in human health, agricultural, and environmental contexts.
Capacity — fraction of a pathway's reactions present in the community model, per sample. Activity — fraction of that pathway's metabolites detected in the metabolome, per sample. Every pathway in every sample gets one of four labels: active, silent (capacity without activity), exogenous (activity without capacity), or absent.
Silent pathways are the interesting ones. They mark places the community could be doing something but isn't — regulation, substrate limitation, or a pathway the measurement panel missed.
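The four-way labelling can be sketched in a few lines. This is a minimal illustration, not GeMMA's implementation: the function names and the 0.5 cut-offs that turn a fraction into "has capacity" or "has activity" are assumptions made for the example.

```python
def fraction_present(required, observed):
    """Fraction of the required items that appear in the observed set."""
    required = set(required)
    if not required:
        return 0.0
    return len(required & set(observed)) / len(required)


def concordance_label(capacity, activity, cap_min=0.5, act_min=0.5):
    """Map a (capacity, activity) pair to one of the four labels.

    cap_min and act_min are hypothetical thresholds chosen for illustration.
    """
    has_capacity = capacity >= cap_min
    has_activity = activity >= act_min
    if has_capacity and has_activity:
        return "active"
    if has_capacity:
        return "silent"      # capacity without activity
    if has_activity:
        return "exogenous"   # activity without capacity
    return "absent"


# One pathway in one sample: 3 of 4 reactions modelled, 1 of 4 metabolites detected.
capacity = fraction_present({"R1", "R2", "R3", "R4"}, {"R1", "R2", "R3"})
activity = fraction_present({"m1", "m2", "m3", "m4"}, {"m1"})
label = concordance_label(capacity, activity)
```

With these toy inputs the pathway has most of its reactions in the model but few of its metabolites in the metabolome, so it lands in the silent class.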
The minimally connected reaction–metabolite subgraph linking every measured metabolite through the community's reaction network, with edges weighted toward reactions that actively carry flux. Transport, exchange, demand, and currency metabolites (ATP, NAD, H₂O, CoA, …) are excluded from the underlying graph: they're biochemical plumbing that would turn every network view into a hairball.
The result is a dense but readable backbone of the metabolism that actually reaches the measured compounds.
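To make the idea concrete, here is a small sketch of a Steiner-tree style backbone on a toy reaction–metabolite graph. It uses a simple shortest-path growing heuristic (attach the nearest unconnected measured metabolite, repeat), which is a stand-in chosen for brevity and not necessarily the solver GeMMA uses. The graph, the weights (lower weight meaning a more active reaction), and all names are hypothetical.

```python
import heapq


def add_edge(graph, a, b, weight):
    """Add an undirected weighted edge to a dict-of-dicts graph."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def nearest_terminal_path(graph, tree_nodes, targets):
    """Multi-source Dijkstra from the current tree to the closest target."""
    dist = {n: 0.0 for n in tree_nodes}
    prev = {}
    heap = [(0.0, n) for n in sorted(tree_nodes)]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node in targets:
            path = [node]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]  # runs from a tree node to the target
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return None


def steiner_backbone(graph, terminals):
    """Greedy approximation: grow a tree until every reachable terminal is attached."""
    terminals = sorted(set(terminals))
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:])
    while remaining:
        path = nearest_terminal_path(graph, tree_nodes, remaining)
        if path is None:
            break  # the remaining terminals are disconnected from the tree
        tree_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining -= set(path)
    return tree_nodes, tree_edges


# Toy network: two routes from glucose to fructose-6-phosphate.
g = {}
add_edge(g, "glc", "R_hex", 1.0)
add_edge(g, "R_hex", "g6p", 1.0)
add_edge(g, "g6p", "R_pgi", 1.0)
add_edge(g, "R_pgi", "f6p", 1.0)
add_edge(g, "glc", "R_slow", 5.0)   # low-activity bypass, penalised by weight
add_edge(g, "R_slow", "f6p", 5.0)

nodes, edges = steiner_backbone(g, ["glc", "f6p"])
```

In the toy network the backbone follows the low-weight route through `R_hex` and `R_pgi` and leaves the penalised bypass `R_slow` out, which is the behaviour the edge weighting is meant to produce.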
Per-taxon scoring that contrasts composition change with metabolic-activity change between two sample groups. Each taxon lands in one of four quadrants: driver (enriched + more active), passenger (enriched but not more active), protective (depleted but still more active per-capita — keystone loss), depleted (less of both).
Protective-quadrant taxa are the ones standard differential-abundance methods miss entirely — the single clearest source of keystone-candidate hypotheses in microbiome studies.
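The quadrant assignment reduces to two signs per taxon. The sketch below assumes each taxon already has a signed composition effect and a signed per-capita activity effect between the two groups (for example log fold changes) and uses zero as the cut point. Both the inputs and the function name are illustrative assumptions, and significance filtering is left out.

```python
def taxon_quadrant(abundance_change, activity_change):
    """Assign a taxon to a quadrant from two signed effect sizes.

    abundance_change: > 0 means enriched in the case group (assumed convention).
    activity_change:  > 0 means more metabolically active per capita.
    """
    enriched = abundance_change > 0
    more_active = activity_change > 0
    if enriched and more_active:
        return "driver"
    if enriched:
        return "passenger"
    if more_active:
        return "protective"
    return "depleted"


# Hypothetical effect sizes for four taxa, named only by their expected quadrant.
effects = {
    "taxon_A": (1.2, 0.8),
    "taxon_B": (0.9, -0.1),
    "taxon_C": (-1.1, 0.6),
    "taxon_D": (-0.7, -0.4),
}
quadrants = {name: taxon_quadrant(*pair) for name, pair in effects.items()}
```

A differential-abundance test alone would report `taxon_C` and `taxon_D` identically as depleted; the second axis is what separates them.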
Every analysis is repeated at each rank from strain through phylum. Scores aggregate correctly up the taxonomy, so a signal that looks weak at genus can reveal itself as a coherent family-level shift — and vice versa.
GTDB and NCBI name-shard ambiguity (Clostridium / Clostridium P / Clostridium Q) is collapsed at the parent genus before scoring, so the table doesn't split a single biological signal across three rows.
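Collapsing name shards can be illustrated with a small normalisation step. This sketch assumes shards appear as the genus name followed by a one- or two-letter uppercase suffix, separated by a space or underscore, and that abundances are summed into the parent genus. The pattern, the function names, and the summing rule are assumptions for the example rather than GeMMA's documented behaviour.

```python
import re
from collections import defaultdict

# Matches "Clostridium P", "Clostridium_Q", "Clostridium_AB"; not "Escherichia coli".
_SHARD = re.compile(r"^([A-Z][a-z]+)[ _][A-Z]{1,2}$")


def parent_genus(name):
    """Strip a shard suffix, returning the parent genus name."""
    name = name.strip()
    match = _SHARD.match(name)
    return match.group(1) if match else name


def collapse_shards(abundance):
    """Sum per-genus abundances after mapping every shard to its parent."""
    collapsed = defaultdict(float)
    for name, value in abundance.items():
        collapsed[parent_genus(name)] += value
    return dict(collapsed)


sample = {"Clostridium": 10, "Clostridium P": 4, "Clostridium_Q": 1, "Bacteroides": 7}
collapsed = collapse_shards(sample)
```

After collapsing, the three Clostridium rows become a single entry, so downstream scoring sees one signal instead of three partial ones.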
42 post-gastrectomy patients vs 54 healthy controls. Total gastrectomy removes the stomach's acid barrier, exposing the colonic community to substrates that don't normally reach it. The phenotype should be a community-wide metabolic activation, not a few-taxon bloom. GeMMA surfaces exactly that — and names the taxa doing it.
A mix of short-chain-fatty-acid producers and mucin specialists — consistent with the substrate-exposure story.
Numbers above were reproduced end-to-end on platform commit 5683ce3 with default UI parameters (genus rank, punch permutation FDR ≤ 0.2). Validation record in docs/case_studies/erawijantari_validation.md; figures embedded in the case-study guide are the ones produced by this run.
The dataset is a built-in demo on the Forge platform. Running the same analysis on your own paired microbiome–metabolome cohort takes less time than rendering a single Manhattan plot and is fully reproducible (RNG seed, pinned VMH vocabulary, container checksums captured on every run).
Pitched for the sceptical bioinformatician. GeMMA isn't a replacement for your differential-abundance or pathway-prediction toolchain; it answers a question the others don't ask. Where another tool does something GeMMA doesn't, we say so.
| Capability | GeMMA | MaAsLin2 | HUMAnN3 | MIMOSA2 | MelonnPan |
|---|---|---|---|---|---|
| Compositional differential abundance (feature ↔ metadata) | ✓ via CLR+Wilcoxon | ✓ GLM / mixed models (broader) | — | — | — |
| Pathway-level abundance from shotgun reads | community-weighted reactions | — | ✓ MetaCyc, read-derived | — | — |
| Joint microbiome ↔ metabolome framing | ✓ mechanistic (GSMM) | — | — | ✓ CMP-regression framing | ✓ elastic-net prediction |
| Identifies keystone-candidate taxa composition misses | ✓ protective quadrant | — | — | via residuals | — |
| Concordance between capacity and observation | ✓ silent / active / exogenous / absent | — | — | partial | — |
| Mechanistic subnetwork (not just pathway list) | ✓ Steiner tree over measured metabolites | — | — | — | — |
| Requires paired metabolomics | yes (core framing) | no | no | yes | training cohort only |
MaAsLin2 and HUMAnN3 answer composition and functional-prediction questions without using your metabolomics; MIMOSA2 and MelonnPan model the microbiome ↔ metabolome coupling statistically. GeMMA is the only one that uses a genome-scale metabolic reconstruction of the actual community — and therefore the only one where a silent pathway is mechanistically distinguishable from an unmapped one.
Upload your data, configure the run, read the results. Suited to teams with their own bioinformatics capacity who want the tool under their control.
Request Forge access →
We run the analysis end-to-end against your cohort and deliver interpretation, figures, and methods text ready for your manuscript or internal decision. Typically 4–8 weeks from data receipt to deliverable.
Schedule a scoping call →
For PIs who want GeMMA in a paper. Co-authorship, methods writing, figure preparation, reviewer-response support.
Discuss a collaboration →
Every GeMMA run captures an RNG seed, is anchored to a pinned VMH metabolite vocabulary (SHA-verified on load), and records its parameters on the analysis record. Benchmarks across 14 curated microbiome–metabolome studies and a sensitivity sweep on the core parameters ship as management commands, intended as a reviewer's first stop.
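A checksum-on-load guard of the kind described above might look like the following. It is a generic sketch using Python's standard `hashlib`: the choice of SHA-256, the function names, and the file layout are assumptions, since the page only says the vocabulary is SHA-verified.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_pinned(path, expected_sha256):
    """Return the file's bytes only if its digest matches the pinned value."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
    with open(path, "rb") as handle:
        return handle.read()
```

Failing loudly on a mismatch means a silently updated vocabulary file cannot change results between runs without someone noticing.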
Preprint in preparation. For the method's underlying work and related essays see insilijo.github.io.
Code: github.com/insilijo/GeMMA
Platform: github.com/insilijo/forge