What can metagenomics do for you?

Microbiome data is leading to innovative solutions in diverse industries, from human and animal health to agriculture and the built environment. Next-generation sequencing has given researchers new insights into the microbial world at high resolution, meaning they can precisely identify many of the bacteria and other microorganisms present. These technologies have also enabled higher throughput than ever before. Foundational technologies, such as amplification and sequencing of phylogenetic markers like the 16S rRNA gene, have become standard tools for understanding how microbial communities are structured and how they respond to changes in their environment.

However, amplicon sequencing has limitations in the type and resolution of the information it provides. This is where metagenomics, the direct recovery of total genomic information from the environment, can make a difference. Amplicon sequencing readily provides information at roughly the genus level; it can identify microbial species and strains only with care and under specific circumstances. Metagenomics can reliably provide resolution down to the strain level (Figure 1). It also provides information about function: what the microorganisms’ genes equip them to do.

Figure 1. Species-level classification of Staphylococcus in skin samples, recovered from amplicon sequencing and from metagenomics. Metagenomics was able to resolve the taxonomy to the species level and show that different skin sites select for different Staphylococcus species. Data from individual HV07 in Oh et al. 2014 (doi: 10.1038/nature13786).

Functional information is useful for understanding the mechanisms underlying changes in the microbial community, reconstructing the metabolism of the community as an entity, and discovering new genes and pathways (Figure 2). Functional information also helps reveal which groups provide which functions and how much redundancy exists for each function, which can affect the community’s resilience (its ability to bounce back after perturbations).

Figure 2. (A) Changes in the abundances of key carbohydrate-active enzymes in soil ten years after forest harvesting. Differences were found in enzymes involved in degrading plant carbohydrates such as cellulose and hemicellulose. Modified from Cardenas et al. 2015 (doi: 10.1038/ismej.2015.57). (B) Metabolic reconstruction of aerobic n-alkane degradation by a partially recovered genome from an oil reservoir metagenome. Expression levels are shown as blue bar plots. Modified from Liu et al. 2018 (doi: 10.1186/s40168-017-0392-1).

A second advantage of metagenomics is that it recovers data from all microbial community members, so the information is not limited to bacteria (as it is with 16S rRNA gene sequencing) but also includes fungi, viruses, and other groups. For example, Oh et al. 2014 (Figure 3) used metagenomics to map the abundance of bacterial and fungal species and viral groups across different skin sites, identify functional gene differences across sites, and recover 67 partial genomes (bacterial, viral, and eukaryotic). When samples have low diversity (e.g., enrichment cultures), metagenomics can recover high-quality draft genome sequences of community members. The genome of Kuenenia stuttgartiensis, one of the first characterized anaerobic ammonium oxidizers, was obtained from a metagenome of a bioreactor sample (doi: 10.1038/nature04647) without the need for cultivation.

Figure 3. (A) Average multi-kingdom relative abundances for 15 healthy adults, stratified by skin characteristics. (B) Detailed phylum-level composition for two of those individuals. Data from Oh et al. 2014 (doi: 10.1038/nature13786).

Metagenomics also has its own limitations. Because the whole community is sequenced, analysis can be challenging when samples contain too much host DNA or have very low biomass. In the first case, most of the data will come from the host and be of little interest, since the host is not the target. In the second case, only a small part of the community will be reflected in the data, leading to a biased picture of the microbiome. Finally, what a metagenome can be used for depends on sequencing depth (Figure 4). Higher sequencing coverage allows data to be recovered from more community members, short reads to be assembled into longer contigs, and those contigs to be used to reconstruct genes, pathways, and genomes.

Figure 4. Effect of sequencing effort on the range of possible analyses of metagenomes.
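To make the link between sequencing depth and recoverable community members more concrete, the Python sketch below estimates how deeply one member's genome is covered. It is an illustrative approximation, not a Microbiome Insights pipeline. It assumes reads are drawn in proportion to each member's share of the sequenced DNA and fall uniformly across its genome, and it uses the classic Lander-Waterman expectation (1 - e^(-c)) for the fraction of a genome hit by at least one read. The function names and run parameters (10 million 150 bp reads, a 5 Mb genome) are hypothetical.

```python
import math


def expected_coverage(total_reads, read_length_bp, dna_fraction, genome_size_bp):
    """Mean sequencing depth expected for one community member.

    Assumption: the member contributes `dna_fraction` of all sequenced DNA,
    and its reads land uniformly across its genome.
    """
    reads_from_member = total_reads * dna_fraction
    return reads_from_member * read_length_bp / genome_size_bp


def fraction_of_genome_covered(coverage):
    """Lander-Waterman expectation: share of bases hit by at least one read."""
    return 1.0 - math.exp(-coverage)


# Hypothetical run: 10 million 150 bp reads, a member making up 1% of the DNA
# with a 5 Mb genome, and a rarer member making up 0.1%.
for fraction in (0.01, 0.001):
    depth = expected_coverage(10_000_000, 150, fraction, 5_000_000)
    print(f"{fraction:.1%} of DNA -> {depth:.1f}x depth, "
          f"{fraction_of_genome_covered(depth):.0%} of genome covered")
```

Under these assumptions, a member making up 1% of the DNA reaches about 3x depth, while one at 0.1% reaches only about 0.3x, generally too little for assembly. This is why rarer members tend to drop out of genome-level analyses at lower sequencing effort.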

Beyond these challenges, the public databases used for data comparison have their own constraints. These databases contain sequence information as well as additional data such as the organism the sequences came from, the location and date of sampling, functional annotation, and links to related publications. They link sequence information with taxonomy and function and represent the historic efforts of researchers worldwide (and consequently their biases). They are limited in two ways: first, most genes in any genome, even in well-studied groups, lack biochemical characterization; second, the databases are biased towards human-related and pathogenic groups. Poorly represented groups include the archaea, fungi, viruses, and small eukaryotes; poorly represented environments include soils. Rather than a roadblock, however, this may be a challenge that leads us to a better understanding of the microbial world.

“Both the cost and complexity barriers to metagenomic and metatranscriptomic sequencing have been greatly reduced, meaning these shotgun approaches are now practical ways to very precisely profile the human microbiome and other microbial communities,” says Curtis Huttenhower, Microbiome Insights Scientific Advisory Board member and Associate Professor of computational biology and bioinformatics at the Harvard T.H. Chan School of Public Health (Boston). “Metagenomics can now easily provide strain tracking and functional information that is difficult to obtain using amplicon sequencing, and these can further be integrated with metatranscriptomics, metabolomics, or other culture-independent molecular data to understand microbial community bioactivity.”

Microbiome Insights provides a full suite of services, including both amplicon sequencing and metagenomics. We can help you answer the question: where can metagenomics take you?


From the human genome to the human microbiome: Toward clinical applications

Genetics versus Genomics

In 1990 the Human Genome Project, a collaborative effort to map the whole human genome, was launched. A five-year plan set out the initial framework for the effort, including reliable testing methods, validated protocols, and milestones along the way. This marked a departure from earlier work in genetics, the study of individual genes, which typically aimed to identify a particular gene that may be instrumental in a phenotypic outcome. Much of the work in that field had been exploratory, with a growing body of evidence linking certain genetic variations, such as single nucleotide polymorphisms (SNPs), to disease states.

In mid-2000 the Human Genome Project announced a working draft of the human genome sequence, which was published the following year. While the results were interesting, the data were a far cry from clinical applicability. What the project did do was spur interest in developing technologies for cheaper and faster sequencing that could build on these initial findings.

Between 2004 and 2014, numerous companies competed to produce faster and better sequencing technologies, such as the Roche 454 and Illumina sequencing systems. These technologies proved advantageous in many ways; for example, successive iterations of them helped advance the microbiological sciences.

Awareness of the Microbiome

While whole-genome studies were mushrooming during this decade, the study of microbes was still largely based on culture-dependent techniques, and there was very little information about, or interest in, the communities of microbes residing in the body. Basic microbiology was built on identifying single pathogenic microbes that were instrumental in disease states, while non-pathogenic microbes were believed to lie dormant. However, some areas of research did focus on how microbes might influence the host, or vice versa.

It was becoming widely accepted that gut microbes played a part in localized gut-related diseases such as Crohn’s disease, but it was less well understood how commensal bacteria shifted in abundance and what created the conditions favoring those shifts. This curiosity began to blossom, largely because the technological advances brought about by the Human Genome Project allowed these growing questions (and concerns) to be addressed quickly and affordably. In 2007, the Human Microbiome Project was launched.

“The recent emergence of faster and cost-effective sequencing technologies promises to provide an unprecedented amount of information about these microbial communities, which will bolster the development and refinement of analytical tools and strategies.”

– NIAID Director, Anthony S. Fauci

Microbial Snapshots

Once it was established that the microbiome was of interest, and of importance to the host, researchers developed new methods for studying it by taking advantage of the high-throughput sequencing technologies that came to market during the genomics boom. First amplicon sequencing and later shotgun metagenomic methods became the gold standard in microbiome research. As data from different ecosystems were compared, however, scientists came to recognize two things. First, microbiomes are specific to their locations and diverse in nature, differing markedly from one body site to another. This was a paradigm shift: many had not anticipated this level of diversity among commensal and pathogenic bacteria, and it contrasted sharply with the host genome, which carries the same genetic information in every cell regardless of its location in the body. Second, microbiomes are constantly shifting, so samples must be stabilized upon collection to preserve the ‘snapshot’ at time zero. If a sample is not handled with care, factors such as temperature and moisture can quickly change its microbial profile. This opened the door to a wide variety of collection devices and stabilization buffers with specific media that help maintain these profiles while remaining compatible with downstream laboratory procedures.

Clinical Applications

We already see clinical and diagnostic applications of microbiome findings. Although these applications still need scientific validation, microbiome research appears to be on a trajectory similar to that of genomics research in terms of diagnostic applications, publications, and consumer-friendly offerings. Each ‘omics’ layer (e.g., proteomics, metabolomics, genomics, microbiomics) is informative on its own, but combining several of them to define how systems in the body function is likely to prove more fruitful in the long run. Understanding the complex nature of these systems and how they interact will show how changes in one system can affect others. This multi-omics approach is the basis for personalized medicine and can also be applied in other domains such as plants, animals, and environmental ecosystems.

At Microbiome Insights, we are working with researchers to elucidate the synergistic effects of multiple ‘omics’ at work. Using this approach, we are focusing on the skin microbiome, the gut-brain axis, pharmacology, and other areas of science that bring together the genome and microbiome for a better understanding of human health.