A Study on Web-based Relatedness Measure and Its Applications on Community Chain Detection and Query Suggestion
Date Issued
2009
Date
2009
Author(s)
Lin, Ming-Shun
Abstract
In statistical natural language processing, the resources used to compute statistics are indispensable. Many kinds of corpora have been made available, and many language models have been experimented with. One major issue behind corpus-based approaches is whether the corpora adopted reflect up-to-date usage. Languages are alive: new terms and phrases enter daily use, and capturing these new usages is an important research topic. This thesis defines a novel web-based relatedness measure and explores snippets from various web domains as corpora. A mutual dependency score between two objects is calculated from the content information and frequency information of the two objects. The relatedness score of the two objects is then obtained by projecting the dependency score through a transfer function. Four transfer functions are considered, based on the Poisson, log-concave, power-concave, and Gompertz functions. They are evaluated on three well-known benchmark datasets: WordSimilarity-353, Miller-Charles, and Rubenstein-Goodenough. Named entities are common foci of searchers. We apply the dependency score to evaluate association at the named-entity level using three strategies: direct association, association matrix, and scalar association matrix. Modeling and naming general entity-entity relationships is a challenging part of constructing social networks. Given a seed denoting a person name, we use the Google search engine, an NER (Named Entity Recognizer) parser, and the web-based relatedness measure to construct an evolving social network. For each entity pair in the network, we apply a Markov chain random process to extract potential categories defined in the ODP (Open Directory Project). Moreover, to label their relationships, we combine the tf×idf scores of noun phrases extracted from snippets with the rank scores of the categories.
Unlike traditional query suggestion, which extracts suggestions from query logs, we extract suggestion terms from snippets. We apply our relatedness measures to query suggestion, and the suggestions extracted with the proposed measures show high agreement in relatedness.
Subjects
Relatedness Measure
Community Chain Detection
Query Suggestion
Category Labeling
Relationships Labeling
Evolving Social Network
Type
thesis
File(s)
Name
ntu-98-D91922022-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):5d238ed129baa7e1c671d4b5347051ef
