News video story segmentation using fusion of multi-level multi-modal features in TRECVID 2003

Iyengar, G.; Lin, C. Y.; Chang, S. F.; Huang, C. W.; Kennedy, L.; HSU WINSTON

Title:	News video story segmentation using fusion of multi-level multi-modal features in TRECVID 2003
Authors:	HSU WINSTON; Kennedy, L.; Huang, C. W.; Chang, S. F.; Lin, C. Y.; Iyengar, G.
Publication date:	28-Sep-2004
Volume:	3
Pages (start-end):	III645-III648
Source publication:	IEEE International Conference on Acoustics, Speech and Signal Processing
Abstract:	In this paper, we present our new results in news video story segmentation and classification in the context of the TRECVID 2003 video retrieval benchmarking event. We applied and extended the Maximum Entropy statistical model to effectively fuse diverse features from multiple levels and modalities, including visual, audio, and text. We have included various features such as motion, face, music/speech types, prosody, and high-level text segmentation information. The statistical fusion model is used to automatically discover relevant features contributing to the detection of story boundaries. One novel aspect of our method is the use of a feature wrapper to address different types of features: asynchronous, discrete, continuous, and delta. We also developed several novel features related to prosody. Using the large news video set from the TRECVID 2003 benchmark, we demonstrate satisfactory performance (F1 measure up to 0.76) and, more importantly, observe an interesting opportunity for further improvement.
URI:	https://scholars.lib.ntu.edu.tw/handle/123456789/413059
ISSN:	15206149
DOI:	https://api.elsevier.com/content/abstract/scopus_id/4544242800
Appears in collections:	Department of Computer Science and Information Engineering
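
The abstract reports performance as an F1 measure, the harmonic mean of precision and recall over detected story boundaries. The following is a minimal sketch of how such a score could be computed, not the benchmark's official scoring procedure. The function name `f1_measure`, the greedy nearest-boundary matching, and the default `tolerance` of 5 seconds are all assumptions made for illustration and are not taken from this record.

```python
def f1_measure(detected, reference, tolerance=5.0):
    """Return (precision, recall, f1) for story-boundary times in seconds.

    Assumption: a detected boundary counts as correct when it lies within
    `tolerance` seconds of a reference boundary that has not been matched yet.
    """
    unmatched = sorted(reference)
    hits = 0
    for t in sorted(detected):
        # Pick the closest still-unmatched reference boundary within tolerance.
        candidates = [r for r in unmatched if abs(r - t) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda r: abs(r - t))
            unmatched.remove(best)  # each reference boundary is matched once
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical boundary times, invented purely for illustration.
    detected = [10.0, 62.0, 130.0, 200.0]
    reference = [12.0, 60.0, 125.0, 180.0, 240.0]
    print(f1_measure(detected, reference))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why it is a common single-number summary for boundary detection.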
