The Comparative Genomics Service CG7

CG7 is not just a standard comparative genomics service for analyzing SNPs; it is a new way of comparing genomes from different perspectives.


1. In silico MLST


2. Analysis of SNPs in the core genome

  • Read mapping to a reference genome and SNP calling
  • Analysis of the functional effect of the detected SNPs
  • Phylogenetic analysis

3. Whole pair-wise genome comparison with the Differences program

  • Detection of SNPs, insertions and deletions of any length across the whole genome
  • SNPs, deletions and insertions are shown in their genomic context, so the genes they affect can be explored

4. Orthologous table



One service for different scale projects

We have designed CG7 to fit a wide range of projects, from large-scale epidemiological studies analyzing thousands of genomes to highly specific projects focused on functional differences among a small set of strains of interest.

It provides insights into the genomic differences at very different levels:

  • The smallest scale, the most specific level: one genome

    We provide a full characterization of each genome individually. This includes a deep functional annotation with BG7 [Pareja-Tobes-2012], an in silico MLST and a comparison with one of the reference genomes available in databases to detect SNPs (see below).

  • Comparison of two genomes

    This level provides the client with biologically meaningful information on the genome differences between two strains. The strains are compared pairwise at the whole-genome level, detecting any kind of difference between them, from SNPs and short indels to large genomic rearrangements. The functional implications of the differences are also analyzed, giving the client information on the functions altered by the observed changes.

  • Large scale comparative genomics projects

    CG7 is well suited to projects including hundreds or thousands of related genomes. In addition to the classical SNP detection for the core genes, CG7 offers the exhaustive pairwise genome comparison with the Differences program and a genome comparison at the community level. The analysis of the genes found in the genome community, also known as the community pangenome, offers a bigger picture as well as insights into how strains are related from an evolutionary point of view. This is particularly useful for epidemiological purposes.


One service from different approaches


  • De novo assembled genomes comparative analysis

    The genomes under analysis are compared without using any external information that could bias the results.

  • Reference-guided comparative analysis

    The sequenced reads are aligned to an external reference genome, which guides the detection of differences and minimizes the impact of misassemblies or technology errors.

Both approaches have their pros and cons, which is why we run both and then integrate the results conservatively. Findings obtained with both approaches are assigned a higher confidence level.
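
The conservative integration described above can be pictured as a set intersection. The following is only a minimal sketch, not the CG7 implementation: the `Variant` record and the `integrate_conservatively` function are hypothetical names, and it assumes both approaches report variants in the same reference coordinates.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant record; coordinates are assumed to be comparable."""
    position: int  # 1-based coordinate on the reference
    ref: str       # reference allele
    alt: str       # alternative allele


def integrate_conservatively(de_novo, reference_guided):
    """Split variant calls by how many approaches support them.

    Returns (supported_by_both, supported_by_one), each sorted by position.
    """
    de_novo = set(de_novo)
    reference_guided = set(reference_guided)
    supported_by_both = de_novo & reference_guided  # higher confidence
    supported_by_one = de_novo ^ reference_guided   # found by a single approach
    return sorted(supported_by_both), sorted(supported_by_one)
```

Calls in the first list would be reported with higher confidence, while calls in the second list could be kept for manual review rather than discarded.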





1. In silico MLST

This is an in silico typing performed directly on the genome sequences. It is based on the sequence types (ST) defined for each species in the corresponding MLST database.
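
Once the allele at each locus of the scheme has been identified in a genome, assigning the ST is essentially a table lookup. The sketch below illustrates that step only; the function name, the locus names and the allele numbers are made up for illustration and do not come from any real MLST database.

```python
def assign_sequence_type(allele_profile, scheme_loci, st_table):
    """Return the ST matching an allele profile, or None if there is no match.

    allele_profile: dict mapping locus name -> allele number found in the genome
    scheme_loci:    ordered list of the loci that define the scheme
    st_table:       dict mapping a tuple of allele numbers -> ST identifier
    """
    key = tuple(allele_profile.get(locus) for locus in scheme_loci)
    if None in key:
        # At least one locus was not typed, so no ST can be assigned.
        return None
    return st_table.get(key)


# Toy scheme with three hypothetical loci and two hypothetical STs.
loci = ["locusA", "locusB", "locusC"]
toy_st_table = {(1, 4, 2): "ST-1", (1, 5, 2): "ST-2"}
profile = {"locusA": 1, "locusB": 5, "locusC": 2}
print(assign_sequence_type(profile, loci, toy_st_table))
```

A profile that matches no known combination returns `None`, which in practice would correspond to a potentially novel sequence type.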

2. Analysis of SNPs in the core genome

The search for SNVs (Single Nucleotide Variants) or SNPs (Single Nucleotide Polymorphisms) focuses on the conserved genome, also known as the core genome, avoiding repetitive regions, mobile elements and phage regions. The analysis strategy is similar to the one carried out in the reference with PubMed ID 24066741.

Focusing on the core genome and avoiding sequences likely to be subject to horizontal gene transfer or recombination allows us to infer the evolutionary distance between the strains and to build phylogenetic trees that can be useful for epidemiological or evolutionary purposes.

Mapping and SNP calling

The reads of each genome are mapped to a reference genome, and SNV detection is then performed by analyzing the alignment locally. SNP calling is done across all the mapped core genome sites.

Effects of the detected SNPs

The variants are filtered and their effects are evaluated, reporting the location of each SNP with respect to the annotated genes of the reference genome.
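
Locating a SNP with respect to the annotated genes amounts to an interval lookup. The sketch below shows only that step, under simplifying assumptions: the `Gene` record and `locate_snp` function are hypothetical, coordinates are 1-based and inclusive, and strand and codon-level effects are ignored.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    """Hypothetical annotation record for a gene on the reference genome."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def locate_snp(position: int, genes) -> Optional[str]:
    """Return the name of the gene containing the SNP, or None if intergenic."""
    for gene in genes:
        if gene.start <= position <= gene.end:
            return gene.name
    return None


# Toy annotation with made-up gene names and coordinates.
annotation = [Gene("geneA", 100, 400), Gene("geneB", 650, 900)]
for pos in (250, 500, 650):
    print(pos, locate_snp(pos, annotation) or "intergenic")
```

For a real bacterial annotation with thousands of genes, a sorted list with binary search or an interval tree would replace the linear scan.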

Phylogenetic tree

A phylogenetic tree of the strains under study is generated based on the SNPs detected in the core genome.
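
The text does not state which tree-building method is used. As one possible illustration, distance-based methods start from the number of core SNP sites at which each pair of strains differs. The sketch below computes that pairwise matrix; the function name and the strain names are hypothetical, and each strain is assumed to be represented by its bases at the same ordered list of core SNP sites.

```python
from itertools import combinations


def snp_distances(core_snps):
    """Count differing core SNP sites for every pair of strains.

    core_snps: dict mapping strain name -> string of bases, one per core SNP site
    Returns a dict mapping (strain_a, strain_b) -> number of differing sites.
    """
    distances = {}
    for a, b in combinations(sorted(core_snps), 2):
        seq_a, seq_b = core_snps[a], core_snps[b]
        if len(seq_a) != len(seq_b):
            raise ValueError("all strains must cover the same core SNP sites")
        distances[(a, b)] = sum(x != y for x, y in zip(seq_a, seq_b))
    return distances


# Toy data: three hypothetical strains typed at four core SNP sites.
toy = {"strain1": "ACGT", "strain2": "ACGA", "strain3": "TCGA"}
print(snp_distances(toy))
```

Such a matrix could then be passed to a tree-building method such as neighbor joining; character-based methods like maximum likelihood would work on the SNP alignment itself.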

3. Whole pair-wise genome comparison with the Differences program

Detection of insertions and deletions of any length across the whole genome

The Differences program compares two genomes at the whole-genome level. It is especially well suited to detecting substitutions, as well as insertions and deletions of any length, in any region of the genome (not only in the core genome).
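
The internals of the Differences program are not described here. Purely as an illustration of the kind of output involved, the sketch below walks a gapped pairwise alignment and reports substitutions, insertions and deletions of any length relative to the first genome. The function name and the output format are assumptions made for this example.

```python
def list_differences(aligned_a, aligned_b):
    """List differences of genome B relative to genome A from a gapped alignment.

    Both inputs are aligned strings of equal length using '-' for gaps.
    Positions are 1-based coordinates in genome A; an insertion is reported
    at the position of the last A base before it (0 if before the first base).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    diffs = []
    pos_a = 0  # number of genome A bases consumed so far
    i, n = 0, len(aligned_a)
    while i < n:
        a, b = aligned_a[i], aligned_b[i]
        if a == "-" and b != "-":
            j = i  # extend over the whole run of inserted bases
            while j < n and aligned_a[j] == "-" and aligned_b[j] != "-":
                j += 1
            diffs.append(("insertion", pos_a, aligned_b[i:j]))
            i = j
        elif b == "-" and a != "-":
            j = i  # extend over the whole run of deleted bases
            while j < n and aligned_b[j] == "-" and aligned_a[j] != "-":
                j += 1
            diffs.append(("deletion", pos_a + 1, aligned_a[i:j]))
            pos_a += j - i
            i = j
        else:
            if a != "-":  # columns where both are gaps are skipped
                pos_a += 1
                if a != b:
                    diffs.append(("substitution", pos_a, a + ">" + b))
            i += 1
    return diffs


print(list_differences("ACGT-ACGTAC", "ACCTGAC--AC"))
```

Adjacent gap columns are merged into a single event, so an indel of any length is reported once rather than base by base.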

Differences in the genomic context

The differences between the two compared genomes are also provided in the genomic context of the BG7 annotation [Pareja-Tobes-2012] of each genome. This allows a better evaluation of their possible implications for phenotypic changes or for epidemiological identification. We use the Mauve tool to align the two genomes and then integrate the detected differences with the functional annotations obtained with BG7 [Pareja-Tobes-2012]. This makes it possible to analyze differences in gene sequences as well as in intergenic non-coding regions.

4. Orthologous table

First, we build a "pangenome set of proteins" representing all the proteins encoded by the genes of any of the genomes in the set to be compared. Second, we detect all the orthologous proteins in each genome and build the orthologous table.

A rich functional annotation for each protein of the "pangenome set of proteins" is provided.
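
The second step can be pictured as filling a table with one row per pangenome protein and one column per genome. The sketch below assumes that ortholog detection has already been done and only assembles the table; the function names, protein identifiers and genome names are hypothetical.

```python
def build_ortholog_table(orthologs_by_genome):
    """Assemble an orthologous table from per-genome ortholog assignments.

    orthologs_by_genome: dict mapping genome name ->
        dict mapping pangenome protein id -> id of its ortholog in that genome
    Returns (genomes, table) where table maps each pangenome protein id to a
    list of ortholog ids (or None when absent), in the order of `genomes`.
    """
    genomes = sorted(orthologs_by_genome)
    pangenome = sorted({p for hits in orthologs_by_genome.values() for p in hits})
    table = {
        protein: [orthologs_by_genome[g].get(protein) for g in genomes]
        for protein in pangenome
    }
    return genomes, table


def core_proteins(table):
    """Return the pangenome proteins that have an ortholog in every genome."""
    return [p for p, row in table.items() if all(cell is not None for cell in row)]


# Toy input with made-up identifiers.
toy_hits = {
    "genome1": {"P1": "g1_0001", "P2": "g1_0002"},
    "genome2": {"P1": "g2_0107", "P3": "g2_0450"},
}
genomes, table = build_ortholog_table(toy_hits)
print(genomes)
print(table)
print(core_proteins(table))
```

Rows with an ortholog in every genome correspond to the core genome, while the remaining rows describe the accessory part of the pangenome.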
