Complete de novo genome characterization of isolates from outbreaks by means of PacBio and Illumina sequencing technologies
The aim of this study was to evaluate the benefits of using NGS technologies and a de novo assembly approach for the genome characterization of outbreak isolates. Six isolates from an outbreak of carbapenemase-producing Klebsiella pneumoniae ST11 OXA-48 were sequenced with Illumina, and one of them (F64) was also sequenced with PacBio to provide an internal reference genome for the outbreak.
The same DNA from Klebsiella isolate F64 was sequenced with both PacBio and Illumina. The PacBio reads were assembled with the HGAP pipeline, and the Illumina reads were assembled independently with SPAdes. Both assemblies were then compared and evaluated with QUAST.
The number of mismatches per 100,000 bp between the two assemblies was 1.91.
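QUAST normalizes the raw mismatch count to 100,000 aligned bases, so that assemblies of different sizes can be compared on the same scale. The sketch below illustrates that normalization, assuming a gapped pairwise alignment is already available; the function names and toy inputs are hypothetical and are not part of QUAST.

```python
from typing import Tuple


def count_mismatches(aligned_a: str, aligned_b: str) -> Tuple[int, int]:
    """Return (mismatches, aligned_bases) for two rows of a pairwise alignment.

    Columns with a gap ('-') in either row are skipped, so indels are not
    counted as mismatches (QUAST reports indels as a separate metric).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have equal length")
    mismatches = aligned = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        aligned += 1
        if a != b:
            mismatches += 1
    return mismatches, aligned


def mismatches_per_100kbp(mismatches: int, aligned_bases: int) -> float:
    """Scale a raw mismatch count to mismatches per 100,000 aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return mismatches * 100_000 / aligned_bases


# Toy alignment: 5 aligned columns (the gap column is skipped), 1 mismatch.
mm, aligned = count_mismatches("ACGT-A", "ACCTTA")
print(mm, aligned)                            # 1 5
print(mismatches_per_100kbp(100, 5_000_000))  # 2.0
```

For scale, 1.91 mismatches per 100 kbp corresponds to roughly one mismatch every 52 kbp (100,000 / 1.91 ≈ 52,356).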
- PacBio yields high-quality, closed genomes that can serve as an internal reference
- NGS is the new gold standard in studies of transmission dynamics and strain relatedness
- Comparative genomics analysis allows the complete characterization of a set of isolates from an outbreak
Metapasta: a scalable tool for microbial community profiling
Presented at: "Exploring Human Host-Microbiome Interactions in Health and Disease" 29 June - 1 July 2015 Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
What is Metapasta?
Metapasta is a cloud-based tool for microbial community profiling. It's designed to answer questions like:
- Which species are present in the microbial sample?
- How many different species are present in the sample?
- How many species from a given genus are present in the sample?
The main steps of the analysis are:
1. Merging paired-end reads with FLASH.
2. Mapping the reads against the 16S database with BLAST (or LAST).
3. Assigning each read to a taxon or labeling it as unassigned.
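The read-merging step works by finding where the two mates of a pair overlap and joining them into one longer read. The sketch below illustrates that idea under simplifying assumptions: exact base comparison without quality scores, no handling of mates that read through each other, and illustrative parameter names and defaults that are not FLASH's actual command-line options or algorithm.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_ratio: float = 0.25) -> Optional[str]:
    """Merge a read pair whose 3' ends overlap, or return None.

    read2 is reverse-complemented onto read1's strand. Every overlap length from
    min_overlap up to the shorter read is scored by its mismatch ratio; the
    lowest ratio wins, ties going to the longer overlap. In the overlap, read1's
    bases are kept (a quality-aware tool would pick the higher-quality base).
    """
    rc2 = reverse_complement(read2)
    best = None  # (mismatch_ratio, -overlap_length); smaller is better
    for olen in range(min_overlap, min(len(read1), len(rc2)) + 1):
        mismatches = sum(a != b for a, b in zip(read1[-olen:], rc2[:olen]))
        score = (mismatches / olen, -olen)
        if score[0] <= max_mismatch_ratio and (best is None or score < best):
            best = score
    if best is None:
        return None
    return read1 + rc2[-best[1]:]


# Hypothetical 30 bp fragment read from both ends with 20 bp mates (10 bp overlap).
fragment = "ACGTTGCAAGGCTTACCGATGACTCAGGTA"
r1, r2 = fragment[:20], reverse_complement(fragment[10:])
print(merge_pair(r1, r2) == fragment)  # True
```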
Why cloud computing?
Mapping NGS reads against the 16S database is a computationally expensive task. For example, even on a fast computer with an SSD and plenty of RAM, mapping a single read against the database with BLAST takes more than 0.25 seconds.
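Because the mapping cost grows linearly with the number of reads, splitting the reads across many machines is what makes the cloud attractive here. The back-of-envelope estimate below assumes a hypothetical sample of one million reads and perfect parallel speed-up; since 0.25 s per read is a lower bound, real run times would be longer.

```python
import math


def mapping_hours(n_reads: int, seconds_per_read: float = 0.25,
                  workers: int = 1) -> float:
    """Wall-clock hours to map n_reads if reads are split evenly across workers.

    Assumes perfect parallel speed-up: no start-up, data-transfer or
    load-imbalance overhead, and the same per-read cost on every worker.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    reads_per_worker = math.ceil(n_reads / workers)
    return reads_per_worker * seconds_per_read / 3600


# Hypothetical run of 1,000,000 reads at 0.25 s per read
print(round(mapping_hours(1_000_000), 1))              # 69.4 hours on one machine
print(round(mapping_hours(1_000_000, workers=50), 2))  # 1.39 hours on 50 workers
```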
Metapasta: a Fast Horizontally Scalable Tool based on Cloud Computing and Graph Databases for Microbial Diversity Community Profiling
Our 16S reference database is a curated subset of sequences from the NCBI NT database, selected by similarity to the sequences in the RDP database. Curation steps removed sequences with poor taxonomic assignments.
Metapasta is an open-source, fast and horizontally scalable tool for community diversity profiling based on the analysis of 16S metagenomics data. It generates direct and cumulative frequencies, in absolute and percentage values, for all identified taxa, using the Lowest Common Ancestor (LCA) paradigm for taxonomic assignment. Metapasta is implemented in Scala and runs on cloud computing infrastructure (Amazon Web Services). The graph database platform Bio4j (www.bio4j.com) is used to retrieve taxonomy data.
To distribute and coordinate computational tasks, Metapasta uses Nispero (http://ohnosequences.com/nispero).
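The Lowest Common Ancestor assignment and the direct and cumulative frequencies described above can be illustrated on a toy taxonomy. This is a minimal sketch, assuming each read arrives with the list of taxa of its best hits (how hits are selected is not specified here); the taxonomy dictionary and all function names are hypothetical stand-ins for the Bio4j taxonomy queries that Metapasta uses.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical toy taxonomy (child -> parent); Metapasta retrieves taxonomy from Bio4j.
PARENT: Dict[str, Optional[str]] = {
    "Bacteria": None,
    "Enterobacteriaceae": "Bacteria",
    "Klebsiella": "Enterobacteriaceae",
    "Escherichia": "Enterobacteriaceae",
    "Klebsiella pneumoniae": "Klebsiella",
    "Klebsiella oxytoca": "Klebsiella",
    "Escherichia coli": "Escherichia",
}


def lineage(taxon: str) -> List[str]:
    """Path from a taxon up to the root, starting with the taxon itself."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca(hits: Iterable[str]) -> Optional[str]:
    """Lowest common ancestor of a read's best-hit taxa; None if the read had no hits."""
    paths = [lineage(t) for t in hits]
    if not paths:
        return None
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up the first lineage, the first shared node is the deepest one.
    return next(node for node in paths[0] if node in shared)


def frequencies(read_hits: List[List[str]]):
    """Direct counts (reads assigned to exactly that taxon), cumulative counts
    (that taxon plus all its descendants) and cumulative percentages."""
    direct = Counter(lca(hits) or "unassigned" for hits in read_hits)
    cumulative: Counter = Counter()
    for taxon, n in direct.items():
        # 'unassigned' has no lineage; every other taxon adds n to all its ancestors.
        for node in ([taxon] if taxon == "unassigned" else lineage(taxon)):
            cumulative[node] += n
    total = len(read_hits)
    percent = {t: 100 * n / total for t, n in cumulative.items()}
    return direct, cumulative, percent


reads = [["Klebsiella pneumoniae"],
         ["Klebsiella pneumoniae", "Klebsiella oxytoca"],
         ["Klebsiella pneumoniae", "Escherichia coli"],
         []]
direct, cumulative, percent = frequencies(reads)
print(direct["Klebsiella"], cumulative["Klebsiella"])  # 1 2
print(percent["Enterobacteriaceae"])                   # 75.0
```

A read whose best hits span two genera is assigned only to their shared family, which is the conservative behavior that makes LCA-based assignment robust to ambiguous 16S matches.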
- Metapasta allows the massive analysis of 16S metagenomics data in an efficient and scalable manner
- Metapasta analyzes alpha and beta diversity
- It can easily be customized to different experimental designs
- Future work includes adapting Metapasta to the analysis of shotgun metagenomics data
Metapasta is an open-source tool released under the AGPLv3 license and is available at: https://github.com/ohnosequences/metapasta/
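As an illustration of the alpha and beta diversity analyses listed above, the sketch below computes two widely used indices: the Shannon index within one sample and the Bray-Curtis dissimilarity between two samples. The poster does not say which indices Metapasta reports, so this choice is an assumption; the inputs are per-taxon read counts at a single taxonomic rank.

```python
import math
from typing import Dict


def shannon(counts: Dict[str, int]) -> float:
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts (alpha diversity)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two samples (beta diversity): 0 identical, 1 disjoint."""
    denominator = sum(a.values()) + sum(b.values())
    if denominator == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b))
    return 1 - 2 * shared / denominator


# Hypothetical genus-level counts for two samples
print(round(shannon({"Klebsiella": 5, "Escherichia": 5}), 4))   # 0.6931
print(round(bray_curtis({"x": 6, "y": 4}, {"x": 4, "y": 6}), 2))  # 0.2
```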
Sequencing, de novo assembly, annotation and comparative genomics for six carbapenemase producing ST11 Klebsiella pneumoniae genomes
The six isolates in this study were obtained from an outbreak of OXA-48 carbapenemase-producing Klebsiella pneumoniae ST11 strains with a multidrug-resistance profile.
One of the ST11 genomes, F64, was sequenced with PacBio technology, which yielded a finished genome (one chromosome and three plasmids) from a single sequencing experiment.
One of the F64 plasmids (contig 2) is very similar to the E71T plasmid and carries the antibiotic resistance genes blaOXA-48 and blaCTX-M-15.
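A finished circular replicon is commonly obtained by detecting that the two ends of an assembled contig overlap, because the assembly ran past the molecule's start point, and trimming the redundant copy. The minimal sketch below assumes an error-free exact overlap; real long-read data needs error-tolerant alignment, and this is not the procedure the authors report using.

```python
from typing import Optional


def trim_circular_overlap(contig: str, min_overlap: int = 20) -> Optional[str]:
    """Trim the duplicated end of a contig assembled around a circular molecule.

    If the contig's last k bases exactly equal its first k bases (k >= min_overlap),
    the longest such k is removed from the end and the trimmed contig returned;
    otherwise None is returned (the contig is not evidently circular).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[-k:] == contig[:k]:
            return contig[:-k]
    return None


# Hypothetical 30 bp "replicon" assembled with its first 12 bp repeated at the end
replicon = "ACGTTGCAAGGCTTACCGATGACTCAGGTA"
print(trim_circular_overlap(replicon + replicon[:12], min_overlap=10) == replicon)  # True
```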
The F64 genome was also sequenced with Illumina, and the two assemblies were aligned (see the first pair-wise MAUVE alignment in the figure). The F64 PacBio assembly is larger than the F64 Velvet Illumina assembly. Most of the regions absent from the Illumina F64 assembly (shown as white regions within the colored blocks) correspond to copies of transposases and RNA operons that were probably collapsed in that assembly.
Five more ST11 genomes were sequenced with Illumina and assembled de novo with Velvet. The figure shows their pair-wise MAUVE alignments to the F64 PacBio assembly.
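The size and contiguity difference between two assemblies of the same genome can be summarized with standard metrics such as those QUAST reports. The minimal sketch below computes contig count, total length, largest contig and N50; the contig lengths in the example are invented rather than the F64 values, and unlike QUAST (which by default ignores contigs shorter than 500 bp) it applies no minimum-length filter.

```python
from typing import Dict, List


def assembly_stats(contig_lengths: List[int]) -> Dict[str, int]:
    """Contig count, total length, largest contig and N50 for one assembly.

    N50 is the length L such that contigs of length >= L contain at least half
    of the total assembled bases. No minimum contig length filter is applied.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    n50, running = 0, 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {"contigs": len(lengths), "total_length": total,
            "largest": lengths[0] if lengths else 0, "N50": n50}


print(assembly_stats([400, 300, 200, 100]))
# {'contigs': 4, 'total_length': 1000, 'largest': 400, 'N50': 300}
```

Running it on the PacBio and the Illumina contig lengths side by side makes the collapsed-repeat deficit visible as a smaller total length and, usually, a much lower N50.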
Whole genome sequencing (WGS) is the new gold standard in studies of transmission dynamics and strain relatedness (David MZ, Daum RS. Clin Infect Dis. 2014)
WGS provides reliable, comparable, re-analyzable, genome-wide information, and PacBio sequencing makes it possible to obtain finished genomes.
Our new comparative genomics pipelines detect differences that classical methods miss. They are helping clinicians and microbiologists improve their understanding of how antibiotic resistance is acquired.
The six genomes, including their assemblies and BG7 annotations, will be made publicly available in the coming months.
Comprehensive studies of the evolutionary relationships among the six genomes, their intra-clonal diversity and the types of changes involved are in progress (manuscript in preparation).
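Because all six isolates share the same MLST sequence type (ST11), telling them apart requires genome-wide comparison. One common approach, shown here as a sketch rather than the authors' pipeline, counts pairwise SNP differences across a core-genome alignment that is assumed to have been computed beforehand; the isolate names and sequences in the example are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise SNP counts from a core-genome alignment (isolate -> aligned sequence).

    Positions with a gap or an ambiguous base (anything other than A, C, G, T)
    in either isolate are ignored.
    """
    if len({len(s) for s in alignment.values()}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    valid = set("ACGT")
    dist = {}
    for a, b in combinations(sorted(alignment), 2):
        dist[(a, b)] = sum(
            1 for x, y in zip(alignment[a].upper(), alignment[b].upper())
            if x in valid and y in valid and x != y
        )
    return dist


aln = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "ACNTTCGA"}
print(snp_distances(aln))
# {('iso1', 'iso2'): 1, ('iso1', 'iso3'): 2, ('iso2', 'iso3'): 1}
```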
ECCMID 2014 - Sequencing, assembly and comparative genomics of six genomes of Enterococcus faecium ST117, an emerging multiresistant clone responsible for an increase in bacteremia and fecal carriage in Spain
Since 2009, the abrupt emergence of an AmpR ST117 Enterococcus faecium (Efm) clone has been associated with a dramatic increase in the rates of bacteremia and fecal carriage of Efm at several Spanish hospitals. This clone belongs to the human-adapted ST78 Efm lineage. We analyze genome-level variation in phenotypically diverse Efm ST117 isolates from patients seen in the Madrid area (2009-2012) and describe the first complete ST117 genome.
Assembling the PacBio reads with RS_HGAP_Assembly.2 [Chin-2013] yielded a finished E1 genome with one chromosome and five plasmids (one megaplasmid, one medium-sized plasmid and three small plasmids).
The PacBio assembly fully resolved the plasmids, including the small ones. The existence and size of all five plasmids were tested experimentally.
We also sequenced the E1 genome independently with Illumina and assembled it with Velvet. The figure shows the pair-wise alignment of the PacBio and Illumina assemblies. The PacBio assembly is substantially larger. The additional sequence in the PacBio assembly (the white regions in the MAUVE blocks of the PacBio E1 genome in the figure) consists mainly of transposases, other mobile elements and RNA operon copies, which were probably collapsed in the Illumina Velvet assembly. PacBio sequencing is especially useful for resolving every copy of each gene.
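When a short-read assembler collapses repeats, the collapsed contig typically has a read depth that is a multiple of the single-copy depth, which gives a rough estimate of how many copies were merged. The minimal sketch below assumes per-contig mean depths have already been computed and that the median contig depth reflects single-copy sequence; the contig names and depths are invented, and multicopy plasmids will also show elevated depth.

```python
from statistics import median
from typing import Dict


def collapsed_copies(contig_depth: Dict[str, float], min_copies: int = 2) -> Dict[str, int]:
    """Estimate copy number per contig as depth / median depth, rounded.

    Returns only contigs estimated at min_copies or more, i.e. candidate
    collapsed repeats (or multicopy plasmids, which also have elevated depth).
    """
    baseline = median(contig_depth.values())
    estimates = {name: round(depth / baseline) for name, depth in contig_depth.items()}
    return {name: n for name, n in estimates.items() if n >= min_copies}


depths = {"ctg1": 50.0, "ctg2": 52.0, "ctg3": 48.0, "ctg_tnp": 155.0, "ctg_rrn": 248.0}
print(collapsed_copies(depths))  # {'ctg_tnp': 3, 'ctg_rrn': 5}
```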
Whole genome sequencing (WGS) with PacBio produced the first finished E. faecium ST117 genome.
Using PacBio, we were able to resolve multi-copy elements such as mobile genetic elements (MGEs: transposons, insertion sequences). These repeated MGEs frequently carry virulence and antibiotic resistance genes, which makes resolving and analyzing them important. Plasmids are difficult to assemble because they carry a high number of MGEs.
PacBio is therefore an especially useful technology for working with bacterial plasmids.
Our new comparative genomics pipelines allow the detection of differences that are not detected by classical methods.
The six E. faecium genomes and their BG7 annotations will be made publicly available in the coming months.
Studies to gain insight into the dynamic evolution of antibiotic resistance (AbR) and the pathogenicity of this clone in the hospital setting are in progress (manuscript in preparation).