Research Article, J Appl Bioinforma Comput Biol Vol: 6 Issue: 1
Genetic and Proteomic Sequence Analysis for the Theoretical Prediction of O-Glycosylation Sites in Proteins
1Department of Physics, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu 627 012, India
2School of Advanced Sciences, VIT University, Vellore, Tamilnadu 632 014, India
Corresponding author: Kasinadar Veluraja
Senior Professor, School of Advanced Sciences, VIT University, Vellore, Tamilnadu 632 014, India
Tel: +91 9486133130
E-mail: veluraja.k@vit.ac.in
Received: March 07, 2017 Accepted: April 11, 2017 Published: April 19, 2017
Citation: Jasmine A, Veluraja K (2017) Genetic and Proteomic Sequence Analysis for the Theoretical Prediction of O-Glycosylation Sites in Proteins. J Appl Bioinforma Comput Biol 6:1. doi: 10.4172/2329-9533.1000132
Abstract
O-glycosylation is one of the most essential and ubiquitous post-translational modifications of proteins, in which oligosaccharides are conjugated to the polypeptide backbone at specific sites. Proteins involved in this modification are synthesized as encoded by their genetic information, and the function of the glycoprotein is determined by the O-glycosylated sites and the conjugated glycans. The identification of O-glycosylation sites has advanced rapidly in recent years through both experimental and theoretical studies. In the present study, we adopted a method to predict O-glycosylation sites by integrating genetic and proteomic sequence information. Data for the analysis were taken from the O-GlycBase v6.0, EMBL-EBI and UniProtKB/Swiss-Prot databases. The prediction was carried out on the collected datasets using a jack-knife procedure. As no consensus motif has been identified for O-glycosylation, the preferences of amino acids and codons within a window of -3 to +3 positions around the glycosylated sites were computed. The analysis reveals a preference for specific codons and amino acids around the glycosylated sites. A prediction program was developed to identify glycosylation sites based on these codon and amino acid preferences. The sensitivity, specificity and accuracy of our prediction method are 91%, 78% and 86%, respectively. To assess our prediction, a comparative study was carried out against some of the publicly available online predictors; the comparison indicated high predictive performance for our method. In addition to amino acids, the preference for certain codons around the glycosylated sites might play a role in the glycosylability of glycoproteins, and the methodology could be extended to study other such modifications in proteins to gain better insights.
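The windowed preference analysis and the reported performance measures can be illustrated with a minimal Python sketch. This is not the authors' program: the function names (`split_codons`, `position_counts`, `metrics`), the toy protein and coding sequence, and the site positions are all hypothetical and chosen only for illustration. The abstract does not say how the preferences are turned into a prediction score, so the sketch stops at counting which amino acid or codon occurs at each offset from -3 to +3, and at the standard definitions of sensitivity, specificity and accuracy.

```python
from collections import Counter

WINDOW = 3  # offsets -3 .. +3 around each site, as stated in the abstract


def split_codons(cds):
    """Split a coding sequence into consecutive 3-nucleotide codons."""
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def position_counts(units, sites, window=WINDOW):
    """Count the unit (amino acid or codon) seen at each offset around sites.

    units: protein string or list of codons, indexed by residue position
    sites: 0-based residue indices of the glycosylated Ser/Thr
    """
    counts = {off: Counter() for off in range(-window, window + 1)}
    for site in sites:
        for off in range(-window, window + 1):
            i = site + off
            if 0 <= i < len(units):  # skip offsets that run off either end
                counts[off][units[i]] += 1
    return counts


def metrics(tp, tn, fp, fn):
    """Standard sensitivity, specificity and accuracy from a confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Toy data (invented for illustration, not taken from O-GlycBase)
protein = "MAPTSPTK"
cds = "ATGGCTCCAACTTCTCCAACAAAA"  # encodes the toy protein above
sites = [3, 6]                     # the two Thr residues, 0-based

aa_pref = position_counts(protein, sites)
codon_pref = position_counts(split_codons(cds), sites)
```

In this toy case both sites have proline at position -1, encoded by the same codon, so `aa_pref[-1]` and `codon_pref[-1]` each record a count of 2 for `P` and `CCA` respectively. A real analysis would compare such counts against background frequencies from non-glycosylated Ser/Thr sites before drawing any conclusion about preference.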