Situation and behavior understanding by trope detection on films

Chang C.-H;Su H.-T;Hsu J.-H;Wang Y.-S;Chang Y.-C;Liu Z.Y;Chang Y.-L;Cheng W.-F;Wang K.-J;Hsu W.H.

Title:	Situation and behavior understanding by trope detection on films
Authors:	Chang C.-H; Su H.-T; Hsu J.-H; Wang Y.-S; Chang Y.-C; Liu Z.Y; Chang Y.-L; Cheng W.-F; Wang K.-J; WINSTON HSU
Keywords:	Benchmarking; Deep learning; Embeddings; Motion pictures; Natural language processing systems; Semantics; World Wide Web; Behavior understanding; Cause and effects; Human evaluation; Human performance; Natural language processing; Prediction systems; Prediction tasks; Relational network; Learning systems
Issue date:	2021
Pages:	3188-3198
Source publication:	The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021
Abstract:	Deep cognitive skills of the kind humans possess are crucial for developing real-world applications that process diverse and abundant user-generated input. While recent progress in deep learning and natural language processing has enabled learning systems to reach human performance on some benchmarks requiring shallow semantics, such abilities remain challenging even for modern contextual embedding models, as pointed out by many recent studies [9, 10, 22, 24, 32]. Existing machine comprehension datasets assume sentence-level input, lack causal or motivational inferences, or can be answered by exploiting question-answer bias. Here, we present a challenging novel task, trope detection on films, in an effort to create situation and behavior understanding for machines. Tropes are frequently used storytelling devices for creative works. Compared to existing movie tag prediction tasks, tropes are more sophisticated: they vary widely, from a moral concept to a series of circumstances, and are embedded with motivations and cause-and-effect relations. We introduce a new dataset, Tropes in Movie Synopses (TiMoS), with 5623 movie synopses and 95 different tropes collected from a Wikipedia-style database, TVTropes. We present a multi-stream comprehension network (MulCom) leveraging multi-level attention over words, sentences, and role relations. Experimental results demonstrate that modern models, including BERT contextual embeddings, movie tag prediction systems, and relational networks, reach at most 37% of human performance (23.97/64.87) in terms of F1 score. Our MulCom outperforms all modern baselines by 1.5 to 5.0 F1 score and 1.5 to 3.0 mean average precision (mAP) score. We also provide a detailed analysis and human evaluation to pave the way for future research. © 2021 ACM.
URI:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85107985371&doi=10.1145%2f3442381.3449806&partnerID=40&md5=5dd3fbc066734197575c08c924f55892 https://scholars.lib.ntu.edu.tw/handle/123456789/581469
DOI:	10.1145/3442381.3449806
SDG/Keywords:	Dataset; Natural language processing; Trope detection
Appears in:	Department of Computer Science and Information Engineering
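
The 37% figure quoted in the abstract is the ratio of the best baseline F1 (23.97) to the human F1 (64.87). The short sketch below reproduces that arithmetic and shows how an F1 score is derived from raw counts. The function name `f1_score` and the variable names are illustrative assumptions, not taken from the paper, and the record does not say which F1 averaging scheme the authors used.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall, from raw counts.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract.
best_baseline_f1 = 23.97
human_f1 = 64.87

# Fraction of human performance reached by the best baseline.
ratio = best_baseline_f1 / human_f1
print(f"Best baseline reaches {ratio:.0%} of human F1")
```

The counts passed to `f1_score` would come from comparing predicted tropes with annotated tropes for each synopsis; how those counts are aggregated across the 95 tropes is left open here.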
