Heng-Yi Chen, Hao-Ren Ke


With the proliferation of Web 2.0, social tags are widely used in various applications. Online bookstores (such as Amazon) and online bibliographic community websites (such as LibraryThing) have quickly accumulated a large amount of user-generated information. INEX (INitiative for the Evaluation of XML retrieval) has used the Amazon/LibraryThing corpus for its Social Book Search Track since 2011. The purpose of the INEX Social Book Search Track is to develop novel algorithms that leverage professional metadata and user-generated metadata to retrieve books effectively. This paper uses the INEX 2011 Social Book Search Track test data set to conduct book search experiments and evaluate the retrieval results. Three indices are created: one based on professional metadata, one on user-generated metadata, and one on both. The experimental results show that searching via user-generated metadata outperforms searching via professional metadata.


Book Search; Information Retrieval; Metadata; Social Tag



