A METHODOLOGY FOR DATABASE AND DOCUMENT SELECTION
DOI: https://doi.org/10.47839/ijc.10.2.745
Keywords: Metasearch Engine, Distributed query processing, Document selection.
Abstract
As the amount of information on the Web and the number of its users grow rapidly, Web users face information overload, and providing them with precisely the information they need has become a critical issue in Web-based information retrieval and Web applications. In this work, we aim to improve the performance of Web information retrieval and Web presentation by developing and employing Web data mining paradigms. Every search engine has a corresponding database that defines the set of documents the search engine can search. Generally, an index of all documents in the database is created and stored in the search engine. Text data on the Internet is naturally partitioned into several databases. Efficient retrieval of the desired data can be achieved if we can accurately predict the usefulness of each database, because with such information we need to retrieve only potentially useful documents from useful databases. For a given query q, the usefulness of a text database is defined as the number of documents in the database that are sufficiently relevant to q. In this paper, we propose new approaches for database selection and document selection, and we implement these algorithms using the .NET Framework. Our experimental results indicate that these methods can yield substantial improvements over existing techniques.
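The usefulness measure defined in the abstract (the number of documents in a database that are sufficiently relevant to a query q) can be made concrete with a small sketch. It rests on assumptions that are not taken from the paper: documents are compared with the query by cosine similarity over raw term-frequency vectors, "sufficiently relevant" means a similarity strictly above a fixed threshold, and the selector can read every document directly. A real metasearch engine cannot scan remote collections and would instead estimate usefulness from compact database representatives. All function names and the sample data are hypothetical, and this is not the authors' .NET implementation.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumption: lower-cased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def usefulness(database, query, threshold):
    """Exact usefulness: count of documents whose similarity to the query exceeds threshold."""
    q = tf_vector(query)
    return sum(1 for doc in database if cosine(q, tf_vector(doc)) > threshold)


def select_databases(databases, query, threshold):
    """Database selection: names of databases with nonzero usefulness, most useful first."""
    scores = {name: usefulness(docs, query, threshold) for name, docs in databases.items()}
    useful = [name for name, score in scores.items() if score > 0]
    return sorted(useful, key=lambda name: scores[name], reverse=True)


def select_documents(databases, query, threshold):
    """Document selection: potentially useful documents, taken only from selected databases."""
    q = tf_vector(query)
    hits = []
    for name in select_databases(databases, query, threshold):
        for doc in databases[name]:
            sim = cosine(q, tf_vector(doc))
            if sim > threshold:
                hits.append((sim, name, doc))
    # Merge results from all selected databases into one list, best match first.
    return sorted(hits, reverse=True)


# Hypothetical toy collections used only to illustrate the calls above.
databases = {
    "news": ["stock market falls", "election results announced", "market rally continues"],
    "sports": ["football match report", "tennis final results"],
    "science": ["distributed query processing", "text database selection for metasearch"],
}

print(select_databases(databases, "market results", 0.3))  # ['news', 'sports']
```

Because this version scans every document, the exact count is practical only as a reference point for judging how well an estimation method predicts usefulness, not as a selection method for remote search engines.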
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.