One service for projects of different scales
We designed CG7 to fit a wide range of projects: from large-scale epidemiological projects that analyze thousands of genomes to highly specific projects focused on functional differences among a small set of strains of interest.
It provides insight into genomic differences at very different levels:
The smallest scale, the most specific level: one genome
We provide a full characterization of each genome individually. This includes a deep functional annotation with BG7 [Pareja-Tobes-2012], an in silico MLST, and a comparison with one of the reference genomes available in databases to detect SNPs (see below).
Comparison of two genomes
This level provides the client with biologically meaningful information on the genomic differences between two strains. The strains are compared pairwise at the whole-genome level, detecting any kind of difference between them, from SNPs and short indels to large genomic rearrangements. The functional implications of the differences are also analyzed, giving the client valuable information on the functions altered by the observed changes.
Large scale comparative genomics projects
CG7 is well suited to projects that include hundreds or thousands of related genomes. In addition to the classical SNP detection in the core genes, CG7 offers exhaustive pairwise genome comparison with the Differences program and genome comparison at the community level. The analysis of the genes found in the genome community, also known as the community pangenome, offers a bigger picture as well as insight into how strains are related from an evolutionary point of view. This is particularly useful for epidemiological purposes.
2. Analysis of SNPs in the core genome
The search for SNVs (Single Nucleotide Variants) or SNPs (Single Nucleotide Polymorphisms) focuses on the conserved genome, also known as the core genome, and avoids repetitive regions, mobile elements and phage regions. The analysis strategy is similar to the one carried out in the reference with PubMed ID: 24066741.
Focusing on the core genome, and avoiding sequences likely to be subject to horizontal gene transfer or recombination, allows us to infer the evolutionary distance between the strains and to build phylogenetic trees that may be of interest for epidemiological or evolutionary purposes.
Mapping and SNP calling
The reads of each genome are mapped to a reference genome, and SNV detection is then performed by analyzing the alignment locally. SNP calling is done across all mapped core genome sites.
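As an illustration of restricting SNP calling to core genome sites, here is a minimal Python sketch. It assumes the mapping step has already produced, for each strain, a consensus sequence in reference coordinates, with "N" marking uncovered or ambiguous positions. The function names, the input layout and the list of masked regions (repeats, mobile elements, phage regions) are hypothetical and are not part of CG7.

```python
from typing import Dict, List, Tuple


def core_positions(reference: str,
                   consensus: Dict[str, str],
                   masked: List[Tuple[int, int]]) -> List[int]:
    """Return 0-based positions called in every strain and outside masked regions.

    `masked` holds half-open (start, end) intervals to exclude.
    """
    masked_pos = set()
    for start, end in masked:
        masked_pos.update(range(start, end))
    valid = set("ACGT")
    return [
        i for i in range(len(reference))
        if i not in masked_pos
        and reference[i] in valid
        and all(seq[i] in valid for seq in consensus.values())
    ]


def call_core_snps(reference: str,
                   consensus: Dict[str, str],
                   masked: List[Tuple[int, int]]) -> Dict[int, Dict[str, str]]:
    """Return {position: {strain: base}} for core sites where any strain differs."""
    snps = {}
    for i in core_positions(reference, consensus, masked):
        alleles = {name: seq[i] for name, seq in consensus.items()}
        if any(base != reference[i] for base in alleles.values()):
            snps[i] = alleles
    return snps


# Toy data (hypothetical): three strains, one masked region at the 3' end.
reference = "ACGTACGTAC"
consensus = {
    "s1": "ACGTACGTAC",
    "s2": "ACTTACGNAC",  # SNP at position 2, no call at position 7
    "s3": "ACGTAGGTAC",  # SNP at position 5
}
masked = [(8, 10)]

core = core_positions(reference, consensus, masked)
snps = call_core_snps(reference, consensus, masked)
```

In this toy example position 7 is dropped from the core because one strain has no call there, and positions 8 and 9 are dropped because they fall in a masked region.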
Effects of the detected SNPs
The variants are filtered and their effects are evaluated, providing the location of each SNP with respect to the annotated genes of the reference genome.
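Locating a SNP with respect to annotated genes can be sketched as a simple interval lookup. The example below is only an illustration: the `Gene` record, the 0-based half-open coordinates and the forward-strand-only simplification are assumptions, not a description of how CG7 stores its annotation.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    """Hypothetical annotation record (forward strand only, for simplicity)."""
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def locate_snp(position: int, genes: List[Gene]) -> Tuple[str, Optional[int]]:
    """Return (gene name, offset within the gene), or ("intergenic", None)."""
    for gene in genes:
        if gene.start <= position < gene.end:
            return gene.name, position - gene.start
    return "intergenic", None


# Toy annotation (hypothetical gene names and coordinates).
genes = [Gene("geneA", 0, 6), Gene("geneB", 10, 19)]

in_gene_a = locate_snp(2, genes)
between_genes = locate_snp(8, genes)
in_gene_b = locate_snp(12, genes)
```

The offset within the gene is the starting point for further evaluation, for example working out which codon is affected.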
A phylogenetic tree of the strains under study is generated based on the SNPs detected in the core genome.
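As a rough sketch of how core genome SNPs can feed a phylogeny, the following Python example computes pairwise SNP distances between strains. It assumes each strain is represented by the string of its alleles at the detected core SNP sites, all of the same length. The source does not state which tree-building method is used, so the example stops at the distance matrix, which distance-based tree methods take as input.

```python
from itertools import combinations
from typing import Dict


def snp_distance_matrix(alleles: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Count differing core SNP sites for every pair of strains."""
    names = sorted(alleles)
    dist = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = sum(1 for x, y in zip(alleles[a], alleles[b]) if x != y)
        dist[a][b] = dist[b][a] = d
    return dist


# Toy data (hypothetical): alleles of three strains at three core SNP sites.
alleles = {"s1": "GCA", "s2": "TCA", "s3": "GGT"}
distances = snp_distance_matrix(alleles)
```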
3. Whole-genome pairwise comparison with the Differences program
Detection of insertions and deletions of any length across the whole genome
The Differences program compares two genomes at the whole-genome level. It is especially well suited to detecting substitutions, as well as insertions or deletions of any length, in any region of the genome (not only in the core genome).
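To make the three kinds of differences concrete, here is a minimal Python sketch that extracts substitutions, insertions and deletions from a pairwise gapped alignment. It is not the Differences program itself, whose internals are not described here. The function name and the convention that events are reported relative to genome A are assumptions made for illustration.

```python
from typing import List, Tuple


def find_differences(aligned_a: str, aligned_b: str) -> List[Tuple[str, int, str]]:
    """List (kind, position in genome A, detail) for each difference.

    Both inputs are rows of the same pairwise alignment, with "-" for gaps.
    An insertion is sequence present in B but absent from A; a deletion is
    sequence present in A but absent from B. Gap runs of any length are
    reported as a single event.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    diffs = []
    pos_a = 0  # ungapped, 0-based coordinate in genome A
    i, n = 0, len(aligned_a)
    while i < n:
        a, b = aligned_a[i], aligned_b[i]
        if a == "-" and b == "-":
            i += 1  # column with no sequence on either side, skip it
        elif a == "-":
            j = i
            while j < n and aligned_a[j] == "-" and aligned_b[j] != "-":
                j += 1
            diffs.append(("insertion", pos_a, aligned_b[i:j]))
            i = j
        elif b == "-":
            j = i
            while j < n and aligned_b[j] == "-" and aligned_a[j] != "-":
                j += 1
            diffs.append(("deletion", pos_a, aligned_a[i:j]))
            pos_a += j - i
            i = j
        else:
            if a != b:
                diffs.append(("substitution", pos_a, a + ">" + b))
            pos_a += 1
            i += 1
    return diffs


# Toy alignment (hypothetical sequences).
differences = find_differences("ACGT--ACGTA", "ACCTGGAC--A")
```

Because gap runs are collapsed into single events, the same logic covers a one-base indel and an insertion of several kilobases.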
Differences in the genomic context
The differences between the two compared genomes are also reported in the genomic context of the BG7 annotation [Pareja-Tobes-2012] for each genome. This allows a better evaluation of their possible implications for phenotypic changes or epidemiological identification. We use the Mauve tool to align the two genomes and then integrate the detected differences with the functional annotations obtained with BG7 [Pareja-Tobes-2012]. This makes it possible to analyze differences in gene sequences as well as in intergenic non-coding regions.
4. Orthologous table
First, we build a “pangenome set of proteins” representing all the proteins encoded by the genes of any genome in the set to be compared. Second, we detect all the orthologous proteins in each genome and build the orthologous table.
A rich functional annotation for each protein of the "pangenome set of proteins" is provided.
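The structure of such a table can be sketched in a few lines of Python. The example assumes ortholog detection has already been done and that, for each genome, we know which proteins of the pangenome set were found in it. The function names, protein identifiers and the presence/absence layout are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Set


def build_ortholog_table(genome_proteins: Dict[str, Set[str]]) -> Dict[str, Dict[str, bool]]:
    """Return {protein: {genome: present?}} over the whole pangenome set."""
    genomes = sorted(genome_proteins)
    pangenome = sorted(set().union(*genome_proteins.values()))
    return {
        protein: {g: protein in genome_proteins[g] for g in genomes}
        for protein in pangenome
    }


def core_proteins(table: Dict[str, Dict[str, bool]]) -> List[str]:
    """Proteins with an ortholog detected in every genome."""
    return [p for p, row in table.items() if all(row.values())]


# Toy data (hypothetical identifiers): proteins detected in three genomes.
genome_proteins = {
    "g1": {"P1", "P2"},
    "g2": {"P1", "P3"},
    "g3": {"P1", "P2", "P3"},
}

table = build_ortholog_table(genome_proteins)
core = core_proteins(table)
```

Rows that are true for every genome correspond to the core set, while the remaining rows show which proteins are specific to a subset of strains.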