statistical analysis of NGS data
Next Generation Sequencing data poses an exciting challenge for statistical data analysis. Why?
- error model: not only is it substantially different from that of the classical Sanger technology, it is also specific to
each sequencing platform (Roche 454, Illumina Solexa, ABI SOLiD, ...)
- data size: short read lengths lead to very large sample sizes, which in turn require considerable computing power.
and why would statistical analysis be useful? here are some examples:
- data description
- read length distributions, quality scores, ... or nucleotide composition along an assembled sequence,
which could lead you to plasmid detection.
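as a concrete illustration of this kind of data description, here is a minimal sketch in plain Python that parses standard four-line FASTQ records and tabulates the read length distribution and per-read GC content (the function and parameter names are ours, not part of any particular toolkit):

```python
from collections import Counter

def read_fastq(path):
    """Yield (header, sequence, quality) tuples from a four-line FASTQ file."""
    with open(path) as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:
                break
            seq = fh.readline().rstrip()
            fh.readline()                      # '+' separator line, ignored
            qual = fh.readline().rstrip()
            yield header, seq, qual

def length_distribution(records):
    """Map each observed read length to the number of reads of that length."""
    return Counter(len(seq) for _, seq, _ in records)

def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(seq.count(b) for b in 'GC') / max(len(seq), 1)
```

a sliding-window version of `gc_content` along an assembled sequence is what would flag a candidate plasmid region, since plasmids often differ in base composition from the host genome.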
- refine base-calling
- detecting and filtering base-calling errors, or applying alternative base-calling methods, can make
the difference when working with, for example, SNPs.
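for instance, the simplest such filter drops reads whose mean quality is too low; a sketch (assuming the common Phred+33 quality encoding and an illustrative threshold, both of which would need adjusting to the actual platform):

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def filter_by_quality(records, min_mean_q=20):
    """Keep only (header, sequence, quality) records whose mean Phred
    quality meets the threshold."""
    return [r for r in records if mean_quality(r[2]) >= min_mean_q]
```

mean-quality filtering is deliberately crude; per-base trimming or platform-specific error models would be the next refinement when calling SNPs.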
- clustering and classification
- with respect to parameters such as sequence composition, motifs, or higher-order Markov dependencies; this is of fundamental importance for viral quasispecies or metagenomics studies.
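by way of illustration, sequences can be embedded in a k-mer composition space and then fed to any standard clustering method; the sketch below (plain Python, dinucleotide frequencies by default, names of our own choosing) computes such profiles and a Euclidean distance between them:

```python
from collections import Counter
from itertools import product

def kmer_profile(seq, k=2):
    """Relative frequency of every k-mer over the alphabet ACGT,
    in a fixed order, so profiles of different sequences are comparable."""
    kmers = [''.join(p) for p in product('ACGT', repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)
    return [counts[km] / total for km in kmers]

def profile_distance(p, q):
    """Euclidean distance between two k-mer profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```

with these profiles as feature vectors, k-means, hierarchical clustering, or a classifier trained on reference genomes can separate reads by organism of origin, which is exactly the binning problem in metagenomics.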
- your data vs a probabilistic model
- testing your data against a technology-specific probabilistic model can reveal sequence
composition bias or an unexpected number of identical reads, and thus act as a quality control; furthermore, this kind of analysis can add value for sequencing providers when implemented and deployed as a workflow.
If you need information (and a quote) for a specific project, contact us.
Also, more details are available in our statistical data analysis brochure.