Taxonomic and functional profiling of microbiomes

Metagenomics has brought new challenges to bioinformatics. We address them with cloud computing, which lets us analyze massive metagenomics datasets with scalable, real-time, on-demand computing.

We offer quick, high-quality analysis of massive data, from taxonomic profiling of 16S rRNA samples to functional profiling of complex metagenomics samples sequenced with shotgun approaches.

We provide a complete solution that covers every step of your microbiome project:

  • The sample processing
  • The sequencing
  • The bioinformatics analysis
  • The interpretation, with rich reports that include charts and visualizations

Era7 Bioinformatics is especially committed to providing services for microbiome analysis. We adapt our microbiome data analysis service to the requirements of each project and design solutions for specific goals, such as detecting genes with specific functionalities or enzymatic activities, comparing communities, measuring microbiome differences before and after a treatment, studying the abundance of microbial species under varying environmental conditions, or analyzing metabolic pathways.

What sequencing technologies?

We provide sequencing and bioinformatics analysis of microbiomes using Illumina and PacBio sequencing technologies.

Our microbiome bioinformatics analysis methods come in two versions, one specifically designed for Illumina sequences and another for PacBio sequences.

Our databases of reference sequences

We have built our reference database, DB7, of 16S and 18S sequences from the complete RNAcentral. RNAcentral is a general database of all types of non-coding RNA, maintained by the RNAcentral Consortium. We have manually curated the 16S database and designed systematic curation approaches that allow us to curate subsequent RNAcentral releases rapidly.

See the description of our 16SDB7 database of 16S reference sequences here:

Exhaustive taxonomic assignment for each read

We compare each read against all the sequences in our DB7 database (see above). The taxonomic assignment for each read is based on an exhaustive BLASTN of each read against our DB7 database of 16S and 18S sequences.
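As a rough illustration of the exhaustive comparison described above, the following Python sketch builds a BLAST+ blastn command line that reports, for every hit, the identity, query coverage and bitscore. It is not the MG7 implementation: the helper names, the file names, the database name "db7_16S" and the default values are hypothetical, and only standard blastn options are used.

```python
import subprocess
from typing import List

# Columns requested from the BLAST+ tabular format (outfmt 6).
OUTFMT_FIELDS = ["qseqid", "sseqid", "pident", "qcovs", "bitscore"]


def build_blastn_command(reads_fasta: str, db_path: str, out_tsv: str,
                         max_hits: int = 500, threads: int = 4) -> List[str]:
    """Return the argument list for one blastn run (hypothetical helper)."""
    return [
        "blastn",
        "-query", reads_fasta,            # FASTA file with the reads
        "-db", db_path,                   # BLAST database of reference sequences
        "-out", out_tsv,                  # tab-separated output file
        "-outfmt", "6 " + " ".join(OUTFMT_FIELDS),
        "-max_target_seqs", str(max_hits),
        "-num_threads", str(threads),
    ]


def run_blastn(cmd: List[str]) -> None:
    """Run the command; requires BLAST+ to be installed and on the PATH."""
    subprocess.run(cmd, check=True)


# Hypothetical file and database names, for illustration only.
cmd = build_blastn_command("reads.fasta", "db7_16S", "hits.tsv")
print(" ".join(cmd))
```

Each output row then carries the per-hit values that a later filtering step can use.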

We make a specific taxonomic assignment for each read, without a previous binning or clustering step. Some assignment methods compare the sequences only against a small rRNA database, or reduce the computational cost by first clustering or binning the sequences and then making the assignment only for the representative sequence of each cluster. Assignment based on the direct similarity of each read, one by one, against a sufficiently wide database is a very exhaustive method [Segata-2013] [Morgan-2012].

We use two different taxonomic assignment approaches: Best Blast Hit (BBH) and Lowest Common Ancestor (LCA).

Best Blast Hit (BBH) approach

The taxonomic assignment is based on the Best BLAST Hit obtained in the BLASTN of each read against the DB7 database.

Each read is assigned to the taxon corresponding to its Best Blast Hit. Only hits above a similarity threshold (perc_identity) and with an aligned region above a threshold percentage of the query length (qcovs) are considered. After that filtering, we keep for each read only the hits that reach the maximum percentage of identity obtained for that read and, among them, we select the hit with the highest bitscore as the BBH.

The parameters can be adapted for each project depending on the length of the reads, on the error rate of the sequencing technology, and even on the rareness of the organisms that are expected in a sample.
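The BBH selection rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MG7 code: the Hit record, the function name, the example values and the inclusive (>=) comparison against the thresholds are all hypothetical choices.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    read_id: str
    subject_id: str
    pident: float    # percentage of identity of the alignment
    qcovs: float     # percentage of the query covered by the alignment
    bitscore: float


def best_blast_hit(hits: List[Hit], min_pident: float,
                   min_qcovs: float) -> Optional[Hit]:
    """Pick the BBH for one read, or None if no hit passes the filters."""
    # 1. Keep only hits reaching both thresholds (inclusive: an assumption).
    passing = [h for h in hits
               if h.pident >= min_pident and h.qcovs >= min_qcovs]
    if not passing:
        return None
    # 2. Keep only the hits with the maximum identity for this read.
    top_identity = max(h.pident for h in passing)
    top_hits = [h for h in passing if h.pident == top_identity]
    # 3. Among them, the hit with the highest bitscore is the BBH.
    return max(top_hits, key=lambda h: h.bitscore)


# Made-up hits for a single read.
hits = [
    Hit("read1", "refA", 99.0, 100.0, 450.0),
    Hit("read1", "refB", 99.0, 98.0, 440.0),
    Hit("read1", "refC", 99.5, 60.0, 300.0),   # fails the coverage filter
    Hit("read1", "refD", 97.0, 100.0, 460.0),  # passes, but lower identity
]
bbh = best_blast_hit(hits, min_pident=97.0, min_qcovs=80.0)
print(bbh.subject_id)  # refA
```

Note that refD has the highest bitscore overall but is not chosen, because identity is maximized before bitscore is compared.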

Lowest Common Ancestor (LCA) approach

In this case the taxonomic assignment is based on the Lowest Common Ancestor paradigm. We follow the same filtering protocol as for the BBH assignment, selecting only the hits above a similarity threshold and above a query coverage threshold (qcovs). For each read we keep only the hits that reach the maximum percentage of identity obtained for that read. We then obtain their taxonomic assignments and search the taxonomy tree for the lowest node that includes all of them, which is their Lowest Common Ancestor taxon. Some reads may have no sufficiently similar sequences in the database; these are classified as reads with no hits.

The LCA approach has been adopted by advanced metagenomics analysis tools such as the latest version of MEGAN [Huson-2013]. We have adopted an assignment algorithm very similar to the one used in MEGAN.

Our LCA algorithm for taxonomic assignment

The goal of the algorithm is to assign each read to a node of the taxonomy tree. For each read, we perform the following steps:

  • Select all hits with qcovs (percentage of the query sequence aligned to the subject sequence) over the defined threshold
  • Among them, select the hits with the maximum perc_identity for the read
  • Calculate the Lowest Common Ancestor (LCA), sensu stricto, of the taxonomic assignments of the selected hits

Lowest Common Ancestor Algorithm of assignment used in MG7
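The LCA step can be sketched as follows in Python. This is an illustrative sketch under stated assumptions, not the MG7 implementation: the taxonomy is a toy child-to-parent dictionary with simplified ranks, and the function names are hypothetical. It assumes the hits of a read have already been filtered by qcovs and maximum perc_identity, so the input is just the list of taxa of the selected hits.

```python
from typing import Dict, List, Optional

# Toy taxonomy as a child -> parent map (ranks simplified for illustration).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Enterobacteriaceae": "Bacteria",
    "Escherichia": "Enterobacteriaceae",
    "Escherichia coli": "Escherichia",
    "Salmonella": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
}


def lineage(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from the root down to the given taxon."""
    path = [taxon]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    path.reverse()
    return path


def lowest_common_ancestor(taxa: List[str],
                           parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the deepest node shared by all lineages, or None if no taxa."""
    if not taxa:
        return None  # corresponds to a read with no hits
    lineages = [lineage(t, parent) for t in taxa]
    lca = None
    # Walk the lineages level by level from the root while they all agree.
    for level in zip(*lineages):
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca


print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], PARENT))
# Enterobacteriaceae
print(lowest_common_ancestor(["Escherichia coli", "Escherichia"], PARENT))
# Escherichia
```

When all selected hits point to the same taxon, the LCA is that taxon itself, so the result matches the BBH assignment; when they disagree, the read is placed at a higher, more conservative node.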

MG7 steps in the process of analysis of the reads

The reads are analyzed in the cloud through a complex, highly parallelized process that takes advantage of the possibilities offered by Amazon Web Services (AWS).

In this process, each read is assigned to a taxon based on its sequence similarity to the sequences in the DB7 database. To achieve this, massive BLASTN tasks are run with MG7, developed by Era7. (See MG7 preprint in


Steps of the taxonomic assignment process in MG7 microbiome analysis tool developed by Era7


