Prediction of mRNA polyadenylation sites in the human genome and mathematical modeling of alternative polyadenylation
Department of Mathematical Sciences
Doctor of Philosophy
Miura, Robert M.
Byrne, Bruce C.
Dhar, Sunil Kumar
Golowasch, Jorge P.
SAGE data analysis
Support vector machine
Messenger RNA (mRNA) polyadenylation plays many important roles in eukaryotic cells, including transcription termination, mRNA stability and transport, and mRNA translation. A large number of human and mouse genes have multiple polyadenylation sites (referred to as poly(A) sites) that lead to variable transcripts, some of which are translated into protein products with different functions. However, the details of when and where polyadenylation occurs, and of how pre-mRNA switches from one poly(A) site to another, are still unknown. This kind of 3'-end processing can be regulated by the cell environment, cell cycle stage, and tissue type.
It is generally accepted that the cleavage of pre-mRNA is determined by the nucleotide sequence around the poly(A) site, so it should be possible to predict poly(A) sites accurately from the pre-mRNA sequence. Several statistical models have been used for the supervised prediction of poly(A) sites, including linear discriminant analysis, quadratic discriminant analysis, and the support vector machine (SVM). Among these, SVM was chosen as the classification algorithm for the prediction of poly(A) sites in this work. A program called polya_svm has been developed in Perl. The true-positive rate and accuracy obtained with this method are better than those obtained with other commonly used algorithms.
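As a rough illustration of sequence-based site scoring, the sketch below turns the sequence around a candidate site into k-mer counts and evaluates a linear SVM decision function f(x) = w . x + b. It is not the thesis program: the k-mer features, the window size, the function names, and the weights are all assumptions made for this example, and a real classifier would learn the weights from labeled poly(A) sites.

```python
from itertools import product

K = 2  # k-mer length; an illustrative choice, not taken from the thesis
KMERS = ["".join(p) for p in product("ACGU", repeat=K)]  # fixed feature order


def kmer_counts(seq, k=K):
    """Count overlapping k-mers in an RNA sequence, in KMERS order."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers with ambiguous bases such as N
            counts[kmer] += 1
    return [counts[kmer] for kmer in KMERS]


def site_features(seq, site, flank=20):
    """Concatenate k-mer counts from the windows upstream and downstream
    of a candidate cleavage position (hypothetical feature design)."""
    upstream = seq[max(0, site - flank):site]
    downstream = seq[site:site + flank]
    return kmer_counts(upstream) + kmer_counts(downstream)


def svm_score(features, weights, bias):
    """Linear SVM decision value; a positive value means 'poly(A) site'."""
    return sum(w * x for w, x in zip(weights, features)) + bias


if __name__ == "__main__":
    seq = "GCUAAUAAAGCUUGCAUCUGACAGUGUGUUUGGCC"  # made-up sequence
    x = site_features(seq, site=20, flank=15)
    w = [0.0] * len(x)  # placeholder weights; real ones come from training
    print(svm_score(x, w, bias=0.0))
```

Scanning every position of a transcript with `site_features` and keeping those with a positive score would give a set of predicted sites under these assumptions.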
Alongside the microarray technique, serial analysis of gene expression (SAGE) is a powerful technology for measuring mRNA expression levels. Our study is the first to investigate the regulation of transcripts from the same gene by analyzing SAGE data. After filtering noisy data from the database and calculating the correlation between transcripts from the same UniGene cluster, several genes are found to have multiple transcripts with opposite expression patterns. These genes may be of particular interest to biologists and are worth verifying experimentally.
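The correlation step can be sketched as follows. This is a minimal example under stated assumptions: the noise filter (a minimum total tag count), the use of Pearson correlation, the threshold, and the data layout are all hypothetical choices for illustration, since the abstract does not specify them.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors.
    Returns None if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def opposite_pairs(clusters, min_total=10, threshold=-0.8):
    """Find transcript pairs within a cluster whose counts across SAGE
    libraries are strongly negatively correlated.

    clusters maps a cluster id to {tag: [count per library]}.
    Tags with fewer than min_total counts overall are dropped as noise
    (an assumed filter, not the one used in the thesis).
    """
    hits = []
    for cluster_id, tags in clusters.items():
        kept = {t: c for t, c in tags.items() if sum(c) >= min_total}
        for (t1, c1), (t2, c2) in combinations(sorted(kept.items()), 2):
            r = pearson(c1, c2)
            if r is not None and r <= threshold:
                hits.append((cluster_id, t1, t2, r))
    return hits
```

In practice the counts would usually be normalized by library size before correlating, which this sketch omits.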
Alternative polyadenylation has recently been found to be very common in human and mouse genes. The selection of poly(A) sites is believed to depend on biological factors such as developmental stage, cell conditions, and the availability and abundance of certain protein factors. However, it is not clear how these factors affect alternative polyadenylation. Here, mathematical modeling is applied to understand the dynamic selection of poly(A) sites. Cleavage stimulation factor (CstF) is an important protein complex required for efficient cleavage, containing subunits of 77, 64, and 50 kDa (CstF-77, CstF-64, CstF-50). The human cstf-77 gene has been found to have several transcripts arising from alternative polyadenylation, and the expression levels of these transcripts display some auto-regulation. A mathematical model with a time delay is constructed to simulate the dynamic expression levels of the cstf-77 gene, and the model is compared with experimental data. This kind of model can also be extended to other polyadenylation factors that have similar alternative polyadenylation patterns.
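To give a sense of what a time-delay model of auto-regulation looks like, the sketch below integrates a generic delayed negative-feedback equation, dx/dt = beta / (1 + (x(t - tau) / K)^n) - gamma * x(t), with a fixed-step Euler scheme. This is not the model from the thesis, whose equations are not given in the abstract: the functional form, the assumption that the feedback is negative, and every parameter value are hypothetical.

```python
def simulate_delayed_feedback(beta=2.0, gamma=1.0, k_half=1.0, hill=2.0,
                              tau=0.5, x0=0.0, t_end=50.0, dt=0.01):
    """Euler integration of a delayed negative auto-regulation model.

    x(t) stands for the level of a gene product that represses its own
    production after a delay tau. All parameters are illustrative.
    Returns the trajectory sampled every dt from t = 0 to t = t_end.
    """
    lag = int(round(tau / dt))      # delay expressed in time steps
    steps = int(round(t_end / dt))
    xs = [x0] * (lag + 1)           # constant history on [-tau, 0]
    for _ in range(steps):
        delayed = xs[-1 - lag]      # x(t - tau)
        current = xs[-1]            # x(t)
        production = beta / (1.0 + (delayed / k_half) ** hill)
        xs.append(current + dt * (production - gamma * current))
    return xs[lag:]                 # drop the pre-history before t = 0


if __name__ == "__main__":
    trajectory = simulate_delayed_feedback()
    print(len(trajectory), trajectory[-1])
```

Fitting such a model to measured transcript levels would mean adjusting the rate constants and the delay until the simulated trajectory matches the data.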
njit-etd2007-044 (153 pages ~ 11,745 KB pdf)
Created February 13, 2008