Taxonomic and functional profiling of microbiomes using MG7

Metagenomics has brought new challenges to bioinformatics. Cloud computing addresses the problem of massive data analysis by providing scalable, real-time, on-demand computing for metagenomics data. From 16S rRNA analysis to functional profiling of complex metagenomic samples obtained by shotgun sequencing, we offer fast analysis of massive data sets.

Era7 Bioinformatics is especially committed to providing services for microbiome analysis. We adapt our metagenomics data analysis service to the requirements of each project and design solutions for specific goals, such as detecting genes with specific functionalities or enzymatic activities, comparing communities, identifying microbiome differences before and after a treatment, studying the abundance of microbial species under varying environmental conditions, or analyzing metabolic pathways.

What is MG7?

MG7 is a complete metagenomics data analysis tool developed by Era7 Bioinformatics, designed to provide taxonomic assignments for large sets of sequences. The MG7 analysis pipelines are continuously updated with the newest approaches to metagenomics analysis.

Our Reference database DB7

We have built our reference database DB7 of 16S and 18S sequences from the complete RNAcentral release 5. RNAcentral is a general database for all types of non-coding RNA, maintained by the RNAcentral Consortium: http://rnacentral.org/expert-database

DB7 database is based on RNAcentral

 

RNAcentral includes the 16S and 18S sequences from the most important databases for metagenomics data analysis:

  • Silva
  • GreenGenes
  • RDP
  • ENA (all non-coding RNA included in ENA)
  • RefSeq (all non-coding RNA included in RefSeq)

We have manually curated the database and have designed systematic curation approaches that allow us to curate future RNAcentral releases rapidly.

Exhaustive taxonomic assignment for each read

We compare each read against all the sequences in our DB7 database (see above). The taxonomic assignment for each read is based on an exhaustive BLASTN of each read against our DB7 database of 16S and 18S sequences.
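As a rough illustration of what one such BLASTN run could look like, the sketch below builds a BLAST+ command line that requests tabular output. It is not the MG7 implementation: the file names, the database name `db7`, the identity cutoff and the helper `build_blastn_command` are assumptions made for this example. Note that in BLAST+ tabular output the percent identity column is called `pident`, while `-perc_identity` is the command-line filter.

```python
import shlex

def build_blastn_command(reads_fasta, db_name, out_tsv, min_identity=97.0):
    """Return the argument list for one BLASTN run with tabular output."""
    # Columns requested from BLAST+: read id, reference id, percent identity,
    # query coverage per subject, and bitscore.
    columns = "6 qseqid sseqid pident qcovs bitscore"
    return [
        "blastn",
        "-query", reads_fasta,
        "-db", db_name,
        "-perc_identity", str(min_identity),  # illustrative cutoff, not MG7's
        "-outfmt", columns,
        "-out", out_tsv,
    ]

# Hypothetical file and database names, used only for illustration.
command = build_blastn_command("reads.fasta", "db7", "hits.tsv")
print(shlex.join(command))
```

The list can be passed to `subprocess.run` on a machine where BLAST+ and a formatted database are available.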

We perform a specific taxonomic assignment for each read, avoiding a previous step of binning and clustering. Some assignment methods compare the sequences only against a small rRNA database, or reduce computational cost by clustering or binning the sequences first and then assigning only the representative sequence of each cluster. Assignment based on the direct similarity of each read, one by one, against a sufficiently wide database is a very exhaustive method [Segata-2013] [Morgan-2012].

We use two different taxonomic assignment approaches: Best Blast Hit (BBH) and Lowest Common Ancestor (LCA).

Best Blast Hit (BBH) approach

The taxonomic assignment is based on the Best BLAST Hit obtained in the BLASTN of each read against the DB7 database.

Each read is assigned to the taxon corresponding to its Best Blast Hit. Only hits above a similarity threshold (perc_identity) and with an aligned region above a threshold percentage of the query length (qcovs) are considered. After that filtering, we keep for each read only the hits that reach the maximum percentage of identity obtained for that read and, among them, we select the hit with the highest bitscore as the BBH.

The parameters can be adapted for each project depending on the length of the reads, on the error rate of the sequencing technology, and even on the rareness of the organisms that are expected in a sample.
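The BBH selection described above can be sketched as follows. This is a minimal illustration, not the MG7 code: the function name, the tuple layout of a hit and the default thresholds (97% identity, 90% query coverage, both applied as "at or above") are assumptions chosen for the example.

```python
from collections import defaultdict

def best_blast_hits(hits, min_identity=97.0, min_qcovs=90.0):
    """Map each read id to the subject id of its Best Blast Hit.

    hits: iterable of (read_id, subject_id, pident, qcovs, bitscore).
    Reads with no hit passing both thresholds are left out of the result.
    """
    # Step 1: keep only hits that pass the identity and coverage thresholds.
    by_read = defaultdict(list)
    for read_id, subject_id, pident, qcovs, bitscore in hits:
        if pident >= min_identity and qcovs >= min_qcovs:
            by_read[read_id].append((subject_id, pident, bitscore))

    best = {}
    for read_id, candidates in by_read.items():
        # Step 2: keep the hits that reach the maximum identity for this read.
        top_identity = max(pident for _, pident, _ in candidates)
        top = [c for c in candidates if c[1] == top_identity]
        # Step 3: among those, the highest bitscore wins.
        best[read_id] = max(top, key=lambda c: c[2])[0]
    return best

example_hits = [
    ("read1", "refA", 99.0, 100.0, 450.0),
    ("read1", "refB", 99.0, 98.0, 460.0),
    ("read1", "refC", 98.0, 100.0, 500.0),  # higher bitscore, lower identity
    ("read2", "refD", 90.0, 100.0, 300.0),  # below the identity threshold
]
print(best_blast_hits(example_hits))  # {'read1': 'refB'}
```

In the example, refC has the highest bitscore overall but is discarded because identity is compared first; read2 ends up with no assignment.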

Lowest Common Ancestor (LCA) approach

In this case the taxonomic assignment is based on the Lowest Common Ancestor paradigm. We follow the same filtering protocol as for the BBH assignment, selecting only the hits above a similarity threshold and above a threshold of aligned query length (qcovs). We then keep for each read only the hits that reach the maximum percentage of identity obtained for that read. Next, we retrieve their taxonomic assignments and search the taxonomy tree for the lowest node that includes all of them, which is their Lowest Common Ancestor taxon. Reads for which no sequence in the database is similar enough are classified as reads with no hits.

The LCA approach has been adopted by advanced metagenomics analysis tools such as the latest version of MEGAN [Huson-2013]. We have adopted an assignment algorithm very similar to the one used in MEGAN.

Our LCA algorithm for taxonomic assignment

The goal of the algorithm is to assign each read to a node of the taxonomy tree. For each read we perform the following steps:

  • Select all hits with qcovs (percentage of the query sequence aligned to the subject sequence) over the defined threshold
  • Among them, select the hits with the maximum perc_identity for that read
  • Calculate the Lowest Common Ancestor (LCA), sensu stricto, of the taxonomic assignments of the selected hits

Lowest Common Ancestor algorithm of assignment used in MG7
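
The three steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the MG7 or MEGAN code: the taxonomy is represented as a hypothetical child-to-parent dictionary, the toy tree skips intermediate ranks, and the function names and the 90% coverage threshold are invented for the example.

```python
def lineage(taxon, parent):
    """Return the path from the root of the tree down to `taxon`."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]

def lowest_common_ancestor(taxa, parent):
    """Return the deepest node whose subtree contains every taxon in `taxa`."""
    if not taxa:
        return None  # nothing to assign: a read with no hits
    paths = [lineage(t, parent) for t in taxa]
    lca = None
    # Walk down from the root while all lineages agree.
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca

def assign_by_lca(hits, parent, min_qcovs=90.0):
    """hits: list of (taxon, pident, qcovs) tuples for a single read."""
    # Step 1: keep hits with enough query coverage.
    covered = [(taxon, pident) for taxon, pident, qcovs in hits if qcovs >= min_qcovs]
    if not covered:
        return None
    # Step 2: keep the hits with the maximum percent identity.
    top_identity = max(pident for _, pident in covered)
    taxa = {taxon for taxon, pident in covered if pident == top_identity}
    # Step 3: LCA of the taxa of the selected hits.
    return lowest_common_ancestor(taxa, parent)

# Toy taxonomy (child -> parent); intermediate ranks are omitted for brevity.
parent = {
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Gammaproteobacteria",
    "Pseudomonas": "Gammaproteobacteria",
    "Gammaproteobacteria": "Bacteria",
}
read_hits = [
    ("Escherichia", 99.0, 100.0),
    ("Salmonella", 99.0, 95.0),
    ("Pseudomonas", 97.0, 100.0),  # lower identity, dropped in step 2
    ("Pseudomonas", 99.5, 50.0),   # low coverage, dropped in step 1
]
print(assign_by_lca(read_hits, parent))  # Enterobacteriaceae
```

When the best hits of a read agree on a single taxon, the LCA is that taxon itself; the more they disagree, the higher in the tree the read is placed.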

MG7 steps in the process of analysis of the reads

The reads are analyzed in the cloud through a complex process that uses advanced parallelization methods and takes advantage of the capabilities offered by Amazon Web Services (AWS).

In this process each read is assigned to a taxon based on its sequence similarity to the sequences in the DB7 database. MG7, developed by Era7, runs massive BLASTN tasks to achieve this (see the MG7 preprint at http://biorxiv.org/content/early/2015/09/28/027714).

 

Steps of the taxonomic assignment process in MG7 microbiome analysis tool developed by Era7

Microbiome Applications

Traditional microbial genome sequencing relies on cultivated clonal cultures, but the new era of genomics faces a new challenge: metagenomic analysis.

Genomic analysis of the microbial communities contained in an environmental sample is one of the applications of Next Generation Sequencing data, both in 16S or 18S metagenomics and in shotgun metagenomics.

The number of publications about metagenomics is growing exponentially.

Microbiome application areas (© Era7 Bioinformatics):

  • Microbiome in human health
  • Microbiome and diseases
  • Microbiome and infections
  • Microbiome in drug discovery
  • Microbiome in agrifood
  • Microbiome and agriculture
  • Microbiome and environment
  • Microbiome in veterinary

 

Interactive visualizations

We provide complete reports with charts and interactive visualizations of the results.

Contact us by e-mail:

  • info@era7.com
