Design of Personal Preference Inference from Questionnaire Data with Exemplary Application
Date Issued
2015
Date
2015
Author(s)
Sung, Ming-Chieh
Abstract
In a rapidly developing digital society, computer identification and inference of personal preference are more important than ever for predicting market trends and tailoring services to customers. Questionnaires are often used as a direct approach to assessing personal preference. Current methods of questionnaire analysis, however, can only derive preferences stated directly in the questionnaires. Predicting a person's preferential answer to a new question requires a methodology that profiles the person and performs inference over a knowledge base of existing questionnaire data. This thesis designs a semantic-based methodology, the “Questionnaire data-based Personal preference Inference Engine” (QPIE), which predicts the preferential answer to a new question by analyzing the relationships between the new question and the existing questions. Such relationships include the semantic meaning of each question and the associated answer. QPIE innovatively integrates existing methods and the corresponding public-domain tools into an implemented system, and successfully addresses the following four challenges arising in personal preference inference: i) constructing a knowledge base of questionnaires, including the personal preference profile derived from answers and the meaning of questions; ii) representing the meaning of questions and answers numerically for further computer processing; iii) inferring semantic relationships between existing questions and new questions; and iv) predicting the preferential answer. The design of QPIE consists of the following four parts, one for each challenge:
(1) Semantic abstraction of single-sentence questions. It is challenging to extract, by computer processing, proper keywords that represent the meaning of a question. QPIE first exploits the grammatical structure of a sentence to facilitate abstraction, adopting a probabilistic natural language parser, the Stanford Dependency Parser, to derive the dependency parse tree of each question.
Based on the parsing result, a Syntax-based Keyword Extraction Algorithm (SKEA) identifies keywords that represent the meaning of each single-sentence question.
(2) Numerical representation of a single-sentence question in “semantic” space. QPIE then applies word2vec to encode each keyword of a question as a numeric vector based on its semantics. Word2vec is a class of neural-network models that assigns each word a set of numerical coordinates in a semantic space learned from an unlabeled corpus. Word vectors serve as the foundation of semantic similarity calculation. Treating a sentence as a concatenation of its syntax-based keywords, QPIE encodes the semantics of a single-sentence question into a vector by concatenating the vectors of those keywords.
(3) Semantic inference among questions. Once semantic-based vectors of questions are available, QPIE classifies the questions according to their respective answers, one class per preferential answer choice. To infer the preferential answer to a new question, QPIE adopts a support vector machine (SVM) as a probabilistic classifier that uses the semantic-based vectors to calculate the similarity of the new question to the existing questions in each class and the preference probability of choosing that class's answer.
(4) Preferential answer prediction for new questions based on real questionnaire data. A reference implementation turns the QPIE methodology into a system built from existing tools, namely the Stanford Dependency Parser, word2vec, and LIBSVM, together with the newly designed SKEA. System integration is realized by sharing a data folder between MATLAB® and VirtualBox®. The training and testing data sets consist of 44 single-sentence questions selected from the Taiwan Communication Survey, each with the same four possible choices: {never, seldom, sometimes, often}.
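The encoding idea in part (2) can be illustrated with a minimal sketch. Everything named here is hypothetical: the three-dimensional `TOY_VECTORS` stand in for real word2vec vectors, and `encode_question` and `cosine_similarity` are illustrative helpers. The abstract does not say how questions with different keyword counts are aligned or which similarity measure is used, so the zero-padding and the cosine similarity below are assumptions, not the thesis's method.

```python
import math

# Hypothetical 3-dimensional word vectors (assumption for illustration);
# real word2vec vectors are learned from a corpus and are much longer.
TOY_VECTORS = {
    "watch": [0.9, 0.1, 0.0],
    "read": [0.8, 0.2, 0.1],
    "television": [0.1, 0.9, 0.2],
    "newspaper": [0.2, 0.8, 0.3],
}
DIM = 3
MAX_KEYWORDS = 2  # assumed fixed number of keyword slots per question


def encode_question(keywords):
    """Concatenate keyword vectors into one fixed-length question vector."""
    vector = []
    for word in keywords[:MAX_KEYWORDS]:
        # Unknown words map to a zero vector (assumption).
        vector.extend(TOY_VECTORS.get(word, [0.0] * DIM))
    # Zero-pad so every question vector has the same length (assumption).
    vector.extend([0.0] * (DIM * MAX_KEYWORDS - len(vector)))
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


q1 = encode_question(["watch", "television"])
q2 = encode_question(["read", "newspaper"])
print(len(q1), cosine_similarity(q1, q2))
```

Because the vectors are concatenated slot by slot, keyword order matters: a question's first keyword is compared only with another question's first keyword.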
Experiment 1 shows that a higher preference probability can be related to higher semantic similarity between training and testing questions. In Experiment 2, QPIE achieves a personal average accuracy of 66.65% over 1,313 people in predicting the answers to 14 testing questions, statistically significantly outperforming the random guess approach. The contribution of this thesis is an innovative design of a semantic-based methodology, QPIE, that enriches questionnaire analysis with a personal preference inference capacity: it predicts personal preferential answers to new questions according to the semantic relationships among questions. Based on the design, an integrated system is developed that can be evaluated by prediction accuracy. In addition, the experiments show that the preference probability for a new question reflects its semantic similarity to existing questions, and further analysis of preference probabilities offers insight into personal patterns. Specifically, the contributions include: (1) abstracting the meaning of single-sentence, multiple-choice questions; (2) representing each question numerically for computer processing based on externally trained word vectors; (3) semantic inference from the numerical representation of questions by adopting an SVM model; (4) a reference implementation of QPIE as a system; (5) verification that the preference probability reflects semantic similarity; (6) a personal average accuracy of 66.65%, significantly higher than random guess (47.66%).
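The abstract does not define "personal average accuracy". One plausible reading, taken here as an assumption, is to compute each respondent's accuracy over the testing questions and then average those values across respondents. The sketch below follows that reading; the function name `personal_average_accuracy` and the two respondents' data are made up for illustration and are not taken from the thesis.

```python
def personal_average_accuracy(predictions, answers):
    """Average, over respondents, of each respondent's share of correct predictions."""
    per_person = []
    for person, predicted in predictions.items():
        actual = answers[person]
        correct = sum(p == a for p, a in zip(predicted, actual))
        per_person.append(correct / len(actual))
    return sum(per_person) / len(per_person)


# Hypothetical respondents and answers (not data from the thesis).
predictions = {
    "r1": ["often", "seldom", "never", "often"],
    "r2": ["never", "never", "sometimes", "often"],
}
answers = {
    "r1": ["often", "seldom", "never", "seldom"],
    "r2": ["never", "seldom", "often", "often"],
}

accuracy = personal_average_accuracy(predictions, answers)
print(f"{accuracy:.2%}")
```

Averaging per person first gives every respondent equal weight, which differs from pooling all predictions only when respondents answered different numbers of questions.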
Subjects
personal preference
inference
questionnaire
dependency parser
syntax-based keyword
semantics
classification
SVM
Type
thesis
File(s)
Name
ntu-104-R02921016-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):3c14a35eec6d0e930bd7e036acce649a
