BIOMEDICAL NAMED ENTITY RECOGNITION, SEMANTIC ROLE LABELING AND THEIR APPLICATION TO QUESTION ANSWERING
Date Issued
2006
Date
2006
Author(s)
Tsai, Richard Tzong-Han
DOI
en-US
Abstract
Processing biomedical literature automatically would be invaluable for both the design and interpretation of large-scale experiments. To this end, many information extraction (IE) systems using natural language processing (NLP) techniques have been developed for use in the biomedical field. In this dissertation, we study two main tasks, named entity recognition (NER) and semantic role labeling (SRL), and their application to biomedical question answering (QA).
In the first task, adding conjunction features is necessary, but including all conjunction feature groups in an NER model is infeasible because memory is limited and some groups are ineffective. We therefore employ sequential forward search to select the most effective feature groups. In addition, variation in biomedical terms, mostly in their numerical parts, causes data sparseness and generates many redundant features; we apply numerical normalization to deal with this problem. Finally, the assignment of NE tags does not depend only on the closest neighbors but may also depend on words beyond the context window. We use automatically generated global patterns to capture such structures and modify the output of the conditional random field (CRF) tagger. Applying these three techniques sequentially yields an F-score of 72.98%, which is 3.28% better than the baseline system and also outperforms the state-of-the-art systems.
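The first two techniques above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not specify the normalization scheme or the selection criterion, so the digit-collapsing rule in `normalize_numbers`, the stopping rule in `sequential_forward_search`, and the feature-group names and gains in the toy scorer are all assumptions made for illustration.

```python
import re
from typing import Callable, FrozenSet, List, Sequence, Tuple


def normalize_numbers(token: str) -> str:
    """Collapse every run of digits to the single digit '1'.

    Assumed scheme: it makes tokens that differ only in their
    numerical parts share the same surface form.
    """
    return re.sub(r"\d+", "1", token)


def sequential_forward_search(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedily add the feature group that most improves `score`.

    `score` is a hypothetical callback that would train and evaluate
    a model using the given feature groups. The search stops when no
    remaining group improves the current best score.
    """
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(candidates)
    while remaining:
        # Try each remaining group on top of the current selection.
        trials = [(score(frozenset(selected + [g])), g) for g in remaining]
        top_score, top_group = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no candidate helps any more
        selected.append(top_group)
        remaining.remove(top_group)
        best = top_score
    return selected, best


# Toy scorer with made-up gains per (hypothetical) conjunction feature group.
TOY_GAIN = {"w-1/w0": 2.0, "w0/w+1": 1.0, "pos-1/pos0": -0.5}


def toy_score(groups: FrozenSet[str]) -> float:
    return sum(TOY_GAIN[g] for g in groups)


chosen, final_score = sequential_forward_search(list(TOY_GAIN), toy_score)
```

With the toy scorer, the search keeps the two groups with positive gain and rejects the third. In a real setting `score` would be the expensive step, since each call implies training a tagger, which is why a greedy search is attractive compared with trying every subset.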
In the second task, we construct a biomedical semantic role labeling (SRL) system that can be used to facilitate relation extraction. This task is divided into three steps. First, we construct a proposition bank on top of the popular biomedical GENIA treebank following the PropBank annotation scheme. We annotate only the predicate-argument structures (PASs) of thirty frequently used biomedical predicates and their corresponding arguments. Second, we use our proposition bank to train a biomedical SRL system, which uses a maximum entropy (ME) model. Third, we automatically generate argument-type templates that can be used to improve the classification of biomedical argument roles. Our experimental results show that a newswire SRL system that achieves an F-score of 86.29% in the newswire domain maintains an F-score of 64.64% when ported to the biomedical domain. Training on our annotated corpus, BioProp, improves the F-score by 22.9%. After template features are added, the F-score for adjunct arguments, such as temporal and locational arguments, improves significantly, by 1.57%.
Finally, we present a biomedical question answering (QA) system that applies the NER and SRL systems. There is a pressing need for biologists to efficiently retrieve biological information related to their research. A QA system enables biologists to ask questions conveniently in natural language and to retrieve specific answers from a large number of documents. We introduce our Biomedical Question Answering system (BeQA), which is designed to answer questions related to molecular events. By using the SRL system to label the semantic arguments of questions and answers, as well as to help map questions to answers, we have improved both the Top-1 accuracy and the Top-5 MRR. In addition, we employ Google as our page retrieval module to find passages that contain answers. The best result of BeQA is a Top-1 accuracy of 51.9% and a Top-5 MRR of 57.7%. In future work, we will not only improve the NER, SRL and biomedical QA systems, but also apply them to build a relation extraction system for protein-protein and gene-disease relations.
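For readers unfamiliar with the two evaluation measures, the sketch below shows how Top-1 accuracy and Top-5 mean reciprocal rank (MRR) are commonly computed. The abstract does not define them, so this assumes the usual definitions: a question scores 1/r, where r is the rank of its first correct answer among the top five candidates, and 0 if none of the five is correct. The function names and the example data are hypothetical.

```python
from typing import List, Sequence, Set


def top1_accuracy(ranked: Sequence[Sequence[str]], gold: Sequence[Set[str]]) -> float:
    """Fraction of questions whose first-ranked answer is correct."""
    hits = sum(
        1 for answers, correct in zip(ranked, gold)
        if answers and answers[0] in correct
    )
    return hits / len(gold)


def top_k_mrr(ranked: Sequence[Sequence[str]], gold: Sequence[Set[str]], k: int = 5) -> float:
    """Mean reciprocal rank of the first correct answer within the top k."""
    total = 0.0
    for answers, correct in zip(ranked, gold):
        for rank, answer in enumerate(answers[:k], start=1):
            if answer in correct:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(gold)


# Made-up rankings for three questions (not data from the dissertation).
ranked_answers: List[List[str]] = [["a", "b"], ["x", "y", "z"], ["p", "q"]]
gold_answers: List[Set[str]] = [{"a"}, {"z"}, {"r"}]

accuracy = top1_accuracy(ranked_answers, gold_answers)
mrr = top_k_mrr(ranked_answers, gold_answers, k=5)
```

In this toy example only the first question is answered correctly at rank 1, the second at rank 3, and the third not at all, so the accuracy is 1/3 and the MRR is (1 + 1/3 + 0) / 3.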
Subjects
Biomedical literature mining
natural language processing
named entity recognition
semantic role labeling
question answering
relation extraction
information extraction
Type
thesis
File(s)
Name
ntu-95-D90922013-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):49b0d02b38827d26081a4142ed0c68f5
