NJIT eTD: The New Jersey Institute of Technology's electronic Theses & Dissertations
Title:
A comparative study of sequence analysis tools in computational biology
Author:
Chuang, Wei-Jen
Document Type:
Thesis
Department:
Department of Computer and Information Science
Degree:
Master of Science
Major:
Computer Science
Advisory Committee:
Wang, Jason T. L.
Calvin, James M.
Kurfess, Franz J.
Thesis Date:
1999, January
Keywords:
Computer algorithms.
Molecular biology--Data processing.
Availability:
Unrestricted
Abstract:

A biomolecular object, such as a deoxyribonucleic acid (DNA), a ribonucleic acid (RNA) or a protein molecule, is made up of a long chain of subunits. A protein is represented as a sequence made from 20 different amino acids, each represented as a letter. There are a vast number of ways in which similar structural domains can be generated in proteins by different amino acid sequences. By contrast, the structure of DNA, made up of only four different nucleotide building blocks that occur in two pairs, is relatively simple, regular, and predictable.

Biomolecular sequence alignment, or string search, is one of the most important and challenging tasks in many areas of science and information processing. It involves identifying one-to-one correspondences between subunits of different sequences. The efficiency of an algorithm or tool depends on several important factors, including scoring systems, alignment statistics, database redundancy, and sequence repetitiveness.
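
As an illustration of how a scoring system drives the search for one-to-one correspondences, the following is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch technique). It is not taken from the thesis: the function name `global_align` and the scores (match +1, mismatch -1, gap -2) are assumptions chosen only for the example, and practical tools typically use substitution matrices and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment sketch; returns (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not values from the thesis.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] against a gap
                score[i][j - 1] + gap,      # b[j-1] against a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


result = global_align("ACGT", "AGT")
print(result)
```

Each column of the two returned strings is one correspondence between subunits, with `-` marking a gap. Changing the scoring parameters can change which alignment is reported as optimal, which is why the choice of scoring system matters.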

Sequence "motifs" are derived from multiple alignments and can be used to examine individual sequences or an entire database for subtle patterns. With motifs, it is sometimes possible to detect distant relationships that may not be demonstrable based on comparisons of primary sequences alone.

A more comprehensive approach to efficient string search is to build a small, representative set of motifs and use it as a screening database, with automatic masking of matching query subsequences. This technology is still under development, but recent studies indicate that a representative set of only 1,000 to 3,000 sequences may suffice and that such a database can be searched in seconds.
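
The screening idea can be sketched in a few lines. The example below is a hypothetical illustration, not the method evaluated in the thesis: the motif names and patterns in `TOY_MOTIFS` are invented for the example, motifs are modeled as plain regular expressions, and `screen_and_mask` is an assumed helper name.

```python
import re

# Hypothetical toy motifs, invented for illustration only.
TOY_MOTIFS = {
    "toy_cys_pair": re.compile(r"C..C"),
    "toy_nxs": re.compile(r"N[^P][ST]"),
}


def screen_and_mask(query, motifs=TOY_MOTIFS, mask_char="X"):
    """Scan a query against a small motif set and mask every matching region.

    Returns (hits, masked_query), where each hit is (motif_name, start, end)
    with end exclusive. Only non-overlapping matches per motif are reported,
    because re.finditer does not return overlapping matches.
    """
    hits = []
    masked = list(query)
    for name, pattern in motifs.items():
        for m in pattern.finditer(query):
            hits.append((name, m.start(), m.end()))
            for k in range(m.start(), m.end()):
                masked[k] = mask_char
    return hits, "".join(masked)


hits, masked = screen_and_mask("MKCAACGHNGSL")
print(hits)
print(masked)
```

Masking the matched regions leaves only the unexplained parts of the query for any slower, more exhaustive search that follows.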

Complete Thesis:
njit-etd1999-051 (111 pages ~ 10,533 KB pdf)
Created December 3, 2007