NJIT eTD: The New Jersey Institute of Technology's electronic Theses & Dissertations
Title:
Knowledge-based document retrieval with application to TEXPROS
Author:
Sheng, Fang
Document Type:
Dissertation
Department:
Department of Computer and Information Science
Degree:
Doctor of Philosophy
Major:
Computer and Information Science
Advisory Committee:
Thomas, Gary L.
Ng, Peter A.
Hung, Daochuan
Rana, Ajaz
Curtis, Ronald S.
Thesis Date:
2001, May
Keywords:
Document retrieval
TEXPROS
Predicate-based query language
Search Algorithms
Availability:
Unrestricted
Abstract:

Document retrieval in an information system is most often accomplished through keyword search, which typically relies on indexing. The major drawback of this technique is its lack of effectiveness and accuracy. A typical keyword search over the Internet commonly identifies hundreds or even thousands of records as potentially relevant, yet often only a few of them match the user's interests.

This dissertation presents a knowledge-based document retrieval architecture with application to TEXPROS. The architecture is based on a dual document model that consists of a document type hierarchy and a folder organization. Using the knowledge collected during document filing, the search space can be narrowed down significantly. Combining classical text-based retrieval methods with knowledge-based retrieval can greatly improve both search efficiency and effectiveness.

With the proposed predicate-based query language, users can specify more precisely both the search criteria and their knowledge about the documents to be retrieved. To help users formulate a query, a guided search is presented as part of an intelligent user interface. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user in order to retrieve the documents that satisfy the user's particular interests.

A knowledge-based query processing and search engine is presented as the core component of this architecture. Algorithms are developed for the search engine to retrieve the documents that match the query effectively and efficiently. A cache is introduced to speed up the process of query refinement. A theoretical proof and a performance analysis are presented to demonstrate the efficiency and effectiveness of this knowledge-based document retrieval approach.
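
Illustrative sketch (not taken from the dissertation): the two-stage idea described in the abstract, narrowing the search space with filing knowledge before applying text matching and caching the narrowed set for query refinement, might look roughly like the following. The names Document, Retriever, doc_type, folder, and cache_hits are hypothetical, and a flat equality test stands in for the document type hierarchy and the predicate-based query language, whose actual design is given only in the full thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: int
    doc_type: str  # hypothetical: a node in the document type hierarchy
    folder: str    # hypothetical: folder assigned when the document was filed
    text: str


class Retriever:
    def __init__(self, documents):
        self.documents = list(documents)
        # Cache of narrowed candidate sets, keyed by the knowledge-based part
        # of the query, so that a refined query can reuse earlier work.
        self._candidates_cache = {}
        self.cache_hits = 0

    def _narrow(self, doc_type, folder):
        key = (doc_type, folder)
        if key in self._candidates_cache:
            self.cache_hits += 1
            return self._candidates_cache[key]
        # Stage 1: use filing knowledge to shrink the search space.
        candidates = [d for d in self.documents
                      if d.doc_type == doc_type and d.folder == folder]
        self._candidates_cache[key] = candidates
        return candidates

    def search(self, doc_type, folder, keywords):
        candidates = self._narrow(doc_type, folder)
        wanted = [k.lower() for k in keywords]
        # Stage 2: classical text matching, applied only to the candidates.
        return [d.doc_id for d in candidates
                if all(k in d.text.lower() for k in wanted)]


docs = [
    Document(1, "memo", "admissions", "PhD qualifying exam schedule"),
    Document(2, "memo", "admissions", "Graduate admission exam deadline"),
    Document(3, "letter", "admissions", "Qualifying exam results"),
    Document(4, "memo", "budget", "Qualifying purchases for the exam room"),
]

retriever = Retriever(docs)
first = retriever.search("memo", "admissions", ["exam"])
refined = retriever.search("memo", "admissions", ["exam", "schedule"])
```

In this toy example the refined query adds one keyword and reuses the cached candidate set instead of rescanning all documents, which is one plausible reading of how a cache could speed up query refinement.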

Complete Thesis:
njit-etd2001-083 (106 pages ~ 4,690 KB pdf)
Created June 25, 2002