Using PacBio reads to get closed bacterial genomes
Sequencing bacterial genomes exclusively with PacBio can give very good assemblies. PacBio's long reads address the main problems of short reads: the assembly of regions repeated along the genome (sequences present in several copies) and non-homogeneous genome coverage (regions with low coverage). In addition, because base-calling errors are randomly distributed, the final assembly can be corrected by consensus. This matters because when errors always fall in the same regions, higher coverage cannot reduce the error rate.
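The difference between random and systematic errors can be illustrated with a small calculation. The sketch below assumes a deliberately simplified model that is not taken from any specific consensus tool: each read miscalls a given position independently with probability p, and the consensus is counted as wrong whenever at least half of the covering reads are wrong (a pessimistic majority vote). The function names and the per-read error rate of 0.15 are illustrative placeholders.

```python
from math import comb


def consensus_error_random(p: float, coverage: int) -> float:
    """Upper bound on the consensus error at one position when read errors
    are independent: probability that at least half of the reads are wrong."""
    k_min = (coverage + 1) // 2  # smallest number of wrong reads that is >= half
    return sum(
        comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(k_min, coverage + 1)
    )


def consensus_error_systematic(p: float, coverage: int) -> float:
    """If an error affects every read at the same position (fully correlated),
    extra coverage does not help: the consensus error stays at p."""
    return p


if __name__ == "__main__":
    p = 0.15  # assumed per-read error rate, for illustration only
    for cov in (1, 10, 30):
        print(cov, consensus_error_random(p, cov), consensus_error_systematic(p, cov))
```

Under these assumptions the random-error bound shrinks quickly as coverage grows, while the systematic-error case stays at p regardless of depth.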
Because PacBio reads are long and cover the genome homogeneously, relatively few reads are needed to reach sufficient coverage of a bacterial genome. These two features, together with the random distribution of errors, make it possible to apply new methods and algorithms that produce a corrected consensus sequence, giving a definitive assembly with a very low error rate.
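As a rough illustration of why few long reads are enough, average coverage can be approximated as (number of reads × mean read length) / genome size. The sketch below rearranges that relation. The genome size, read lengths and target coverage are assumed example values, not figures from our projects, and `reads_needed` is a hypothetical helper name.

```python
import math


def reads_needed(genome_size_bp: int, mean_read_length_bp: int, target_coverage: float) -> int:
    """Number of reads required to reach an average target coverage,
    using coverage ~= reads * mean_read_length / genome_size."""
    return math.ceil(target_coverage * genome_size_bp / mean_read_length_bp)


if __name__ == "__main__":
    genome = 5_000_000  # assumed bacterial genome size in bp
    print("long reads (10 kb):  ", reads_needed(genome, 10_000, 50))
    print("short reads (150 bp):", reads_needed(genome, 150, 50))
```

The calculation only covers average depth. Homogeneous coverage matters separately, because no extra depth is needed to rescue under-covered regions.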
Our experience with PacBio
At Era7 Bioinformatics we have experience analyzing PacBio data for clients. We have also led research projects using PacBio technology to obtain completely assembled and annotated bacterial genomes. In our NEXTMICRO project we sequenced with PacBio and de novo assembled 2 genomes from bacteria involved in outbreaks in hospital environments. One was a multiresistant Klebsiella pneumoniae from the ICU of a hospital, and the other was a challenging strain of Enterococcus faecium that is a problem in hospitals in several European countries. The 2 posters about these genomes presented at ECCMID 2014 are included below.
We used QUAST to evaluate two assemblies of the same multiresistant Klebsiella pneumoniae F64 isolate: one sequenced with Illumina and assembled with SPAdes, the other sequenced with PacBio and assembled with HGAP. The poster presented at ASM2015 (included below) shows the results.
The following figure shows the number of contigs obtained in the 2 sequencing and assembly experiments (Illumina and PacBio) using the same DNA sample (the lengths drawn for the Illumina contigs are not exactly proportional to the actual contig lengths obtained):
Bacterial genome annotation with BG7
BG7 is our method designed specifically for annotating bacterial genomes from NGS data. In recent years we have successfully annotated many bacterial genomes, including previously unknown genomes with no closely related sequenced genome. See here for more information about how it works.
The BG7 method was published in PLOS ONE: http://dx.plos.org/10.1371/journal.pone.0049239
BG7 is a PacBio SMRT COMPATIBLE ANALYSIS PRODUCT
BG7, our bacterial genome annotation method, is recognized as a PacBio SMRT COMPATIBLE ANALYSIS PRODUCT.
Comparative genomics for PacBio genomes
We offer a comprehensive comparative genomics pipeline. You can run the complete pipeline or select only the modules most relevant to your research.
The set of genomes for comparative analysis can include your new PacBio genomes and any other complete genome. In bacterial genomics, the practically closed genomes that PacBio provides are fundamental: comparisons are more reliable, and it becomes possible to detect changes in genes present in several copies (RNA operons, transposases) or in repetitive regions.
You can find below a schematic figure showing the different modules of the comparative genomics pipeline for bacteria.
ASM2015 Poster: Complete de novo genome characterization of isolates from outbreaks by means of PacBio and Illumina sequencing technologies
The aim of this study was to test the benefits of using NGS technologies and a de novo assembly approach for the genome characterization of isolates from an outbreak. Six isolates from an outbreak of carbapenemase-producing Klebsiella pneumoniae ST11 OXA-48 were sequenced with Illumina, and one of them (F64) was also sequenced with PacBio to provide an internal genome reference for the outbreak.
The same DNA from the Klebsiella F64 genome was sequenced with PacBio and with Illumina. The PacBio reads were assembled with the HGAP pipeline and, independently, the Illumina reads were assembled with SPAdes. Both assemblies were compared and evaluated with QUAST.
The number of mismatches was 1.91 per 100,000 bp.
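To make the unit concrete, the sketch below shows how a mismatch rate per 100,000 bp relates to raw counts. It assumes the rate is normalized by the number of aligned bases, which is our understanding of how QUAST reports this metric. The function names and the genome length are hypothetical, and only the 1.91 figure comes from the poster.

```python
def mismatches_per_100kbp(num_mismatches: int, aligned_bases: int) -> float:
    """Mismatch rate normalized to 100,000 aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return num_mismatches / aligned_bases * 100_000


def expected_mismatches(rate_per_100kbp: float, aligned_bases: int) -> float:
    """Inverse relation: mismatches implied by a given rate over a given length."""
    return rate_per_100kbp * aligned_bases / 100_000


if __name__ == "__main__":
    assumed_length = 5_000_000  # hypothetical aligned length in bp
    print(expected_mismatches(1.91, assumed_length))
```

At that assumed length, a rate of 1.91 per 100,000 bp corresponds to roughly a hundred mismatching positions across the whole genome.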
- PacBio yields a high-quality closed genome that can serve as a reliable internal reference
- NGS is the new gold standard in studies of transmission dynamics and strain relatedness
- Comparative genomics analysis allows the complete characterization of a set of isolates from an outbreak