Phenotype prediction and feature selection in genome-wide association studies
Department of Computer Science
Master of Science
Roshan, Usman W.
Wang, Jason T. L.
Genome wide association studies
Single nucleotide polymorphisms
Genome-wide association studies (GWAS) search for correlations between single nucleotide polymorphisms (SNPs) in a subject's genome and an observed phenotype. GWAS can be used to build models that predict phenotype from genotype, and to help identify specific genes affecting the biological mechanism underlying the phenotype.
In this investigation, phenotype prediction models are constructed from GWAS training data and evaluated on test data. Three methods are used to rank SNPs by their correlation with the phenotype: the univariate Wald test, a multivariate technique based on a support vector machine (SVM), and a hybrid method in which a subset of the top-ranked SNPs from the Wald test is used to train the SVM. Both case-control studies and quantitative phenotypes are examined. For each ranking method and data set, a series of least squares linear regression models is generated from nested subsets of the top-ranked SNPs. The accuracy of these models is determined on a test data set, and prediction performance is plotted against the number of top-ranked SNPs considered.
The SVM and hybrid methods are found to be consistently superior to the Wald test in ranking predictive SNPs. The hybrid method also allows the trade-off between higher accuracy and fewer SNPs to be tuned as desired.
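The procedure summarized in the abstract (rank SNPs, then fit least squares models on nested subsets of the top-ranked SNPs and score them on test data) can be illustrated with a minimal sketch. This is not the thesis code. Everything below is an assumption made for illustration: the synthetic genotype data, the function names, the Wald statistic taken from a univariate linear regression slope, and a plain subgradient-descent linear SVM standing in for whatever SVM solver the thesis used. Only a case-control style phenotype coded as +1/-1 is covered.

```python
import numpy as np

def wald_rank(X, y):
    """Rank SNPs by the univariate Wald statistic (slope / std. error)^2."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sxx = (Xc ** 2).sum(axis=0)
    sxx = np.where(sxx == 0, np.nan, sxx)        # monomorphic SNPs get no score
    beta = (Xc * yc[:, None]).sum(axis=0) / sxx  # per-SNP regression slope
    resid_ss = (yc ** 2).sum() - beta ** 2 * sxx
    stat = beta ** 2 / (resid_ss / (n - 2) / sxx)
    return np.argsort(-np.nan_to_num(stat, nan=0.0))

def svm_rank(X, y, lam=0.01, lr=0.1, epochs=500):
    """Rank SNPs by |w| of a linear SVM fit by subgradient descent (assumed solver)."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - X.mean(axis=0)) / sd                # standardize so weights are comparable
    n, p = Z.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        viol = y * (Z @ w + b) < 1               # samples violating the margin
        grad_w = lam * w - (Z[viol] * y[viol, None]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return np.argsort(-np.abs(w))

def hybrid_rank(X, y, keep):
    """Keep the top `keep` Wald SNPs, then re-rank them with the SVM."""
    top = wald_rank(X, y)[:keep]
    return top[svm_rank(X[:, top], y)]

def nested_accuracy(order, Xtr, ytr, Xte, yte, sizes):
    """Fit least squares on the top-k SNPs for each k and score on test data."""
    curve = []
    for k in sizes:
        cols = order[:k]
        A = np.column_stack([np.ones(len(Xtr)), Xtr[:, cols]])
        coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
        B = np.column_stack([np.ones(len(Xte)), Xte[:, cols]])
        pred = np.where(B @ coef >= 0, 1, -1)    # threshold the regression output
        curve.append((k, float((pred == yte).mean())))
    return curve

# Synthetic data (hypothetical): 300 subjects, 50 SNPs coded 0/1/2, 3 causal SNPs.
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(300, 50)).astype(float)
signal = X[:, :3] @ np.array([1.5, -1.5, 1.0]) + rng.normal(0, 0.5, 300)
y = np.where(signal > np.median(signal), 1, -1)
Xtr, ytr, Xte, yte = X[:200], y[:200], X[200:], y[200:]

wald_order = wald_rank(Xtr, ytr)
svm_order = svm_rank(Xtr, ytr)
hybrid_order = hybrid_rank(Xtr, ytr, keep=20)

sizes = [1, 2, 5, 10, 20]
results = {
    name: nested_accuracy(order, Xtr, ytr, Xte, yte, sizes)
    for name, order in [("wald", wald_order), ("svm", svm_order),
                        ("hybrid", hybrid_order)]
}
for name, curve in results.items():
    print(name, curve)
```

Each printed list pairs a subset size with test accuracy, which is the data behind a performance-versus-number-of-SNPs plot. A quantitative phenotype would need a regression error measure in place of thresholded accuracy, and a regression-style SVM for the ranking step.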
njit-etd2012-074 (62 pages ~ 481 KB pdf)
Created September 7, 2012