A more efficient document retrieval method for TEXPROS
Department of Computer and Information Science
Doctor of Philosophy
Computer and Information Science
Thomas, Gary L.
Ng, Peter A.
Curtis, Ronald S.
Document processing is a critical element of office automation. Through document classification, extraction and filing, documents are automatically placed into a knowledge base according to certain rules. Document retrieval is a process to get a document back according to a user's requirements and to show the results to the user. Hence, a good user-interface and an efficient retrieval algorithm become core parts of document retrieval.
Unlike previous browsers that have been proposed for this purpose, this dissertation develops a new browser that has a user interface with more tools, and one that has a more efficient retrieval algorithm that can deal with a wide variety of retrieval situations.
In this dissertation, from the view of an interface, the new browser provides more functions such as "zoom in" and "zoom out", (i.e. automatic scaling of the portion of a graph that is of interest to a user), and help. These functions give users an easier way to view a large graph in one window and provide users with help during the retrieval process.
The new browser also provides an algorithm that makes retrieval more efficient by using a reusable base. The Reusable Base is used to hold information that is most related to the user previous desires and the information stored in the Reusable Base is more easily used to form the OP-Net than that in the System Catalog. Hence, it eliminates the need to go to the System Catalog to find the results. This speeds up the retrieval significantly -at least two times faster than without the Reusable Base.
Further, the new browser provides information about the folder organization and the document type hierarchy that is in addition to the OP-Net. If users know the type of documents they want, or which folder they are interested in, they can go to the particular document type or the particular folder directly.
njit-etd2001-059 (145 pages ~ 5,755 KB pdf)
Please complete this Feedback Form to inform us about your experience using this website. It will assist us in better serving your information needs in the future. Thank You!
Created July 15, 2002