Search scripts mining from wisdom of the crowds
Journal
IEEE International Conference on Systems, Man and Cybernetics
Pages
878-883
ISBN
9781457706523
Date Issued
2011
Author(s)
Wang C.-J.
Abstract
This paper mines sequences of actions, called search scripts, from query logs that record the search experiences of a large number of users. Search scripts can be applied to predict users' search needs, improve retrieval effectiveness, recommend advertisements, and so on. Information quality, topic diversity, query ambiguity, and URL relevancy are the major challenges in search script mining. In this paper, we calculate the relevance of URLs, adopt the Open Directory Project (ODP) categories to disambiguate queries and URLs, explore various features and clustering algorithms for intent clustering, and identify critical actions from each intent cluster to form a search script. Experiments show that the model based on a complete-link hierarchical clustering algorithm with the features of query terms, relevant URLs, and disambiguated ODP categories performs best. Search scripts are generated from this best model. When only search scripts containing a single intent are considered correct, the accuracy of the action identification algorithm is 0.4650. If search scripts containing a major intent are also counted as correct, the accuracy increases to 0.7315. © 2011 IEEE.
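As a rough illustration of the intent clustering step named in the abstract, the sketch below runs complete-link agglomerative clustering over small feature sets. It is not the authors' implementation: the Jaccard distance, the stopping threshold, the function names, and the toy records (with `q:`, `url:`, and `odp:` prefixes standing in for query terms, relevant URLs, and ODP categories) are all assumptions made for the example.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def complete_link(c1, c2, items):
    """Complete-link distance: the largest pairwise distance between members."""
    return max(jaccard_distance(items[i], items[j]) for i in c1 for j in c2)


def cluster(items, threshold):
    """Merge the closest pair of clusters until no pair is within threshold."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-link distance.
        (x, y), dist = min(
            (((a, b), complete_link(clusters[a], clusters[b], items))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical toy records: each set mixes query terms, a clicked URL,
# and an ODP-style category label (all invented for illustration).
records = [
    {"q:jaguar", "q:price", "url:cars.example", "odp:Recreation/Autos"},
    {"q:jaguar", "q:dealer", "url:cars.example", "odp:Recreation/Autos"},
    {"q:jaguar", "q:habitat", "url:zoo.example", "odp:Science/Biology"},
    {"q:jaguar", "q:diet", "url:zoo.example", "odp:Science/Biology"},
]
intent_clusters = cluster(records, threshold=0.6)
print(intent_clusters)
```

In this toy setting the ambiguous query term alone does not pull the car and animal records together, because the URL and category features keep their complete-link distance above the assumed threshold.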
Subjects
mining web logs
search script
web search enhancement
Type
conference paper