CG7: COMPARATIVE GENOMICS PIPELINE for bacteria
CG7 is not just a standard comparative genomics service:
it is designed to detect all the differences between genomes
- Read mapping to a reference genome and SNP calling
- Effects of the detected SNPs
- Phylogenetic tree
- Detection of insertions and deletions of any length across the whole genome
- SNPs, deletions and insertions shown in their genomic context, with functional annotations
One service for different scale projects
We have designed CG7 to fit a wide range of project types, from large-scale epidemiological projects analyzing thousands of genomes to highly specific projects focused on functional differences among a small set of strains of interest.
It provides insight into the genomic differences at several levels:
We provide a full characterization of each genome individually. This includes a deep functional annotation with BG7 [Pareja-Tobes-2012], an in silico MLST and a comparison with a reference genome available in databases to detect SNPs (see below).
The pairwise comparison gives the client biologically meaningful information on the genome differences between two strains. The strains are compared pairwise at the whole-genome level, detecting any kind of difference between them, from SNPs and short indels to large genomic rearrangements. The functional implications of the differences are also analyzed, showing the client which functions are altered in the genomes by the observed changes.
CG7 is well suited to projects including hundreds or thousands of related genomes. In addition to the classical SNP detection in the core genes, CG7 offers an exhaustive pairwise genome comparison with the Differences program and a genome comparison at the community level. The analysis of the genes found in the genome community, also known as the community pangenome, offers a bigger picture as well as insight into how the strains are related from an evolutionary point of view. This is particularly useful for epidemiological purposes.
The genomes under analysis are compared without using any external information that could bias the results.
The sequenced reads are aligned to the reference genome. An external reference is used to guide the detection of differences, minimizing the impact of misassemblies or technology errors.
Both approaches have their pros and cons, which is why we use both and then integrate the results conservatively. Findings supported by both approaches are given a higher confidence level.
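As an illustration only, the conservative integration of the two approaches can be sketched as a set comparison. This is not the CG7 implementation: the `Variant` type, the `integrate` function and the example coordinates are hypothetical, and a variant is assumed to be identified by its reference position and alleles.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A difference relative to the reference (hypothetical representation)."""
    position: int  # assumed 1-based coordinate on the reference genome
    ref: str
    alt: str


def integrate(reference_free_calls, reference_based_calls):
    """Separate calls supported by both approaches from calls seen by only one."""
    free = set(reference_free_calls)
    based = set(reference_based_calls)
    return {
        "both": sorted(free & based),              # higher confidence
        "reference_free_only": sorted(free - based),
        "reference_based_only": sorted(based - free),
    }


# Made-up example calls, for illustration only.
reference_free_calls = [Variant(120, "A", "G"), Variant(450, "C", "T")]
reference_based_calls = [Variant(120, "A", "G"), Variant(900, "G", "A")]

result = integrate(reference_free_calls, reference_based_calls)
print(result["both"])
```

Keeping the single-approach calls in separate lists, rather than discarding them, leaves room to review them manually.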
1. In silico MLST
This step performs in silico typing from the genome sequences. It is based on the sequence types (STs) defined for each species in the corresponding MLST database.
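The final lookup step of in silico MLST can be sketched as follows, assuming the allele number at each locus has already been determined from the genome sequence. The locus names, the toy three-locus scheme and the `assign_sequence_type` helper are hypothetical; a real scheme and its ST table would come from the MLST database for the species.

```python
def assign_sequence_type(allele_profile, st_table, loci):
    """Return the ST matching an allele profile, or None if it is not defined."""
    key = tuple(allele_profile.get(locus) for locus in loci)
    return st_table.get(key)


# Toy scheme with placeholder locus names and made-up profiles.
loci = ("locusA", "locusB", "locusC")
st_table = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
}

profile = {"locusA": 1, "locusB": 2, "locusC": 1}
st = assign_sequence_type(profile, st_table, loci)

# A profile absent from the table has no defined ST in this toy scheme.
unknown = assign_sequence_type({"locusA": 9, "locusB": 9, "locusC": 9}, st_table, loci)
print(st, unknown)
```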
2. Analysis of SNPs in the core genome
The search for SNVs (single nucleotide variants) or SNPs (single nucleotide polymorphisms) focuses on the conserved genome, also known as the core genome, avoiding repetitive regions, mobile elements and phage regions. The analysis strategy is similar to the one carried out in the reference with PubMed ID: 24066741
Focusing on the core genome, and avoiding sequences likely to be subject to horizontal gene transfer or recombination, allows us to infer the evolutionary distance between the strains and build phylogenetic trees that can be useful for epidemiological or evolutionary purposes.
Mapping and SNP calling
The reads of each genome are mapped to a reference genome, and SNV detection is then performed by analyzing the alignment locally. SNP calling is done across all the mapped core genome sites.
Effects of the detected SNPs
The variants are filtered and their effects evaluated, reporting the location of each SNP with respect to the annotated genes of the reference genome.
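Locating a SNP with respect to annotated genes can be sketched as an interval lookup. This is a simplified illustration, not the CG7 method: the `Gene` type, the gene names and the coordinates are hypothetical, coordinates are assumed to be 1-based and inclusive, and strand and codon-level effects are left out.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    """An annotated gene on the reference (hypothetical representation)."""
    name: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def locate_snp(position: int, genes) -> Optional[str]:
    """Return the name of the gene containing the position, or None."""
    for gene in genes:
        if gene.start <= position <= gene.end:
            return gene.name
    return None


# Made-up annotation and SNP positions, for illustration only.
genes = [Gene("geneA", 100, 400), Gene("geneB", 600, 900)]
locations = {
    pos: locate_snp(pos, genes) or "intergenic"
    for pos in (150, 500, 900)
}
print(locations)
```

A linear scan is enough for a sketch; with many genes and SNPs, a sorted list with binary search or an interval tree would be a more reasonable choice.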
Phylogenetic tree
A phylogenetic tree of the strains under study is generated based on the SNPs detected in the core genome.
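A common starting point for a SNP-based tree is a matrix of pairwise SNP distances between strains. The sketch below computes such a matrix. The source does not state which tree-building method CG7 uses, so this is only an illustration. The strain names and alleles are made up, and each strain is assumed to be represented by its alleles at the same ordered list of core genome SNP sites.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count the core SNP sites at which two strains carry different alleles."""
    if len(a) != len(b):
        raise ValueError("strains must be compared over the same SNP sites")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(alleles):
    """Return a symmetric dict mapping (strain, strain) to the SNP distance."""
    names = sorted(alleles)
    dist = {(name, name): 0 for name in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alleles[a], alleles[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Made-up alleles at four core SNP sites.
core_snps = {"strain1": "ACGT", "strain2": "ACGA", "strain3": "TCGA"}
dist = distance_matrix(core_snps)
print(dist[("strain1", "strain3")])
```

Such a matrix could then be passed to a distance-based tree method such as neighbor joining.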
3. Whole-genome pairwise comparison with the Differences program
Detection of insertions and deletions of any length across the whole genome
The Differences program compares two genomes at the whole-genome level. It is especially well suited to detecting substitutions, and insertions or deletions of any length in any region of the genome (not only in the core genome).
Differences in the genomic context
The differences between the two compared genomes are also provided in the genomic context of the BG7 annotation [Pareja-Tobes-2012] for each genome. This allows a better evaluation of their possible implications for phenotypic changes or epidemiological identification. We use the Mauve tool to align the two genomes and then integrate the detected differences with the functional annotations obtained with BG7 [Pareja-Tobes-2012]. This makes it possible to analyze differences in gene sequences as well as in intergenic non-coding regions.
4. Orthologous table: The community pangenome
First, we build a “pangenome set of proteins” representing all the proteins encoded by all the genes from all the genomes in the set to be compared. Second, we detect all the orthologous proteins in each genome and build the orthologous table.
A rich functional annotation is provided for each protein representing a “pangenome protein”.
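The orthologous table can be pictured as a presence/absence matrix with one row per pangenome protein and one column per genome. The sketch below assumes ortholog detection has already been done and only builds the table. The function names, protein identifiers and genome names are hypothetical and do not describe the CG7 implementation.

```python
def build_orthologous_table(ortholog_hits, genomes):
    """Map each pangenome protein to a per-genome presence/absence row."""
    return {
        protein: {genome: genome in hits for genome in genomes}
        for protein, hits in sorted(ortholog_hits.items())
    }


def core_proteins(table):
    """Return the proteins with an ortholog in every genome of the set."""
    return [protein for protein, row in table.items() if all(row.values())]


# Made-up input: for each pangenome protein, the genomes where an ortholog was found.
genomes = ["genomeA", "genomeB", "genomeC"]
ortholog_hits = {
    "protein_1": {"genomeA", "genomeB", "genomeC"},
    "protein_2": {"genomeA"},
    "protein_3": {"genomeB", "genomeC"},
}

table = build_orthologous_table(ortholog_hits, genomes)
core = core_proteins(table)
print(core)
```

Rows that are present in every genome correspond to the core of the community, while the remaining rows show which strains share which accessory proteins.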
Ask about our prices! Contact us: [email protected]