Taxonomy Driven Gene Prediction

Posted on October 2, 2017 by Arne Kutzner

Introduction

Gene prediction is an important technique in modern genetics for inferring the structure and extent of genes from existing genome data, i.e. DNA-level information. It is vital for postulating gene orthologs when no laboratory-sequenced gene information (e.g. mRNA) is available. The orthologs of a given gene G can be used to infer significant information about G, e.g. anomalies in the gene's taxonomic behavior or insights into the gene's evolutionary development. Such inference can be built on gene decomposition (exons/introns), followed by alignments and structural post-analysis inspections.
However, the quality of information obtainable from a set of gene orthologs depends on the quality of the orthologs themselves, i.e. on the reliability and precision of the prediction process that delivers them. A popular approach to obtaining predictions is to train a Hidden Markov Model (HMM) or Hidden Semi-Markov Model (HSMM) on gene knowledge of a given species. For example, Gnomon from NCBI relies on structural properties of human genes for training. As a negative side effect of this strategy, the quality and precision of predictions deteriorate seriously with increasing taxonomic distance from the species used for HMM training. These negative effects can be eliminated by additionally integrating taxonomy (a phylogenetic tree) into the prediction process. More precisely: the prediction scheme varies across taxonomic areas, and information kept at the inner nodes of the taxonomic tree is integrated into the prediction process.
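
The idea of keeping information at inner nodes of the taxonomic tree can be illustrated with a minimal Python sketch. It is not the project's implementation: the class TaxonNode, the function nearest_params, the parameter name mean_intron_length and all numeric values are hypothetical placeholders chosen for illustration. The sketch only shows one plausible lookup rule, namely that a target species uses the parameter set of its closest ancestor that carries one, instead of always falling back to a single training species.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class TaxonNode:
    """A node of a (hypothetical) taxonomic tree."""
    name: str
    parent: Optional["TaxonNode"] = None
    # Prediction parameters stored at this node, if any (placeholder content).
    params: Optional[Dict[str, float]] = None


def nearest_params(node: TaxonNode) -> Tuple[str, Dict[str, float]]:
    """Walk from a node towards the root and return the name and parameters
    of the closest node (the node itself included) that stores parameters."""
    current: Optional[TaxonNode] = node
    while current is not None:
        if current.params is not None:
            return current.name, current.params
        current = current.parent
    raise LookupError(f"no parameters found on the path from {node.name} to the root")


# Illustrative tree; the numbers are invented placeholders, not measured data.
vertebrata = TaxonNode("Vertebrata", params={"mean_intron_length": 3000.0})
mammalia = TaxonNode("Mammalia", parent=vertebrata, params={"mean_intron_length": 5000.0})
actinopterygii = TaxonNode("Actinopterygii", parent=vertebrata, params={"mean_intron_length": 1500.0})
human = TaxonNode("Homo sapiens", parent=mammalia, params={"mean_intron_length": 5500.0})
mouse = TaxonNode("Mus musculus", parent=mammalia)
zebrafish = TaxonNode("Danio rerio", parent=actinopterygii)

if __name__ == "__main__":
    for species in (human, mouse, zebrafish):
        source, params = nearest_params(species)
        print(f"{species.name}: parameters taken from {source} -> {params}")
```

In this toy tree, the zebrafish node has no parameters of its own and therefore resolves to the Actinopterygii node rather than to the human-specific set, which is the kind of taxonomy-dependent variation of the prediction scheme described above.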

Ongoing Research

Students interested in the above topic should contact Prof. Kutzner for more information about this research project and about opportunities to participate in the research work.

Posted in Computational Genomics, Research