We contributed to another exciting scientific result using Bio4j and MG7, now published in Nature Scientific Reports
Pablo Pareja-Tobes, part of Era7’s team, contributed to the scientific paper “Important biological information uncovered in previously unaligned reads from chromatin immunoprecipitation experiments (ChIP-Seq)”, recently published in Nature Scientific Reports.
Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) provides genome-wide maps of transcription factor binding sites (TFBSs). However, on average about 40% of the short reads generated fail to align to the corresponding genome. A substantial share of the unaligned reads from animal and plant samples corresponds to sequences of bacterial and metazoan origin. Irrespective of the source, 30%–40% of the unaligned reads were in fact alignable, and additional TFBSs could be identified from these previously unaligned ChIP-Seq reads.

The unaligned reads were assigned to their respective taxa using an integrated version of two Era7 projects: Bio4j and Metagenomics7 (MG7), an open source system for massive analysis of sequences from metagenomics samples. MG7 performs taxonomic classification of short reads by linking each read and its BLAST hits to NCBI's taxonomy tree through the GI-to-taxon index.
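To make the read-to-taxon step more concrete, here is a minimal sketch. It assumes each read's BLAST hits have already been reduced to a list of GI numbers, and it assigns each read to the lowest common ancestor (LCA) of its hits' taxa. The LCA rule, the toy taxonomy, the GI numbers and the function names are all hypothetical illustrations. This is not how MG7 is actually implemented.

```python
from collections import Counter

# Hypothetical toy taxonomy: taxid -> parent taxid (the root points to itself),
# loosely mimicking the parent links of NCBI's taxonomy tree.
PARENT = {
    1: 1,    # root
    2: 1,    # "Bacteria"
    10: 1,   # "Eukaryota"
    20: 10,  # "Metazoa"
    21: 10,  # "Plants"
    30: 2,   # "Bacterium A"
    31: 2,   # "Bacterium B"
    40: 20,  # "Animal X"
}

# Hypothetical GI-to-taxon index: GI number of a database sequence -> taxid.
GI_TO_TAXID = {1001: 30, 1002: 31, 1003: 40, 1004: 21}

# Hypothetical BLAST results: read id -> GI numbers of its significant hits.
HITS = {
    "read1": [1001],
    "read2": [1001, 1002],
    "read3": [1003, 1004],
    "read4": [9999],        # GI missing from the index
    "read5": [1001, 1003],
}


def lineage(taxid):
    """Return the path from taxid up to the root, leaf first."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path


def lca(taxids):
    """Return the deepest taxon shared by the lineages of all given taxids."""
    lineages = [lineage(t) for t in taxids]
    common = set(lineages[0]).intersection(*map(set, lineages[1:]))
    # The first lineage is ordered leaf -> root, so the first shared node is the deepest.
    for node in lineages[0]:
        if node in common:
            return node


def classify(hits):
    """Assign each read to the LCA of its hits' taxa, or None if no hit maps to a taxon."""
    assignments = {}
    for read, gis in hits.items():
        taxids = [GI_TO_TAXID[g] for g in gis if g in GI_TO_TAXID]
        assignments[read] = lca(taxids) if taxids else None
    return assignments


if __name__ == "__main__":
    result = classify(HITS)
    print(result)
    # Number of reads assigned to each taxon (unassigned reads are skipped).
    print(Counter(t for t in result.values() if t is not None))
```

An LCA rule is a conservative choice: a read whose hits fall in different branches (for example, one bacterial and one animal hit) is placed at their shared ancestor instead of being forced into either branch. A simpler alternative is to keep only each read's best BLAST hit.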