Abstract
The rapid growth of multimedia content necessitates powerful technologies to filter, classify, index and retrieve video documents more efficiently. However, the essential bottleneck of image and video analysis is the semantic gap: low-level features extracted by computers often fail to coincide with the high-level concepts interpreted by humans. In this paper, we present a generic scheme for detecting video semantic concepts based on machine learning over multiple visual features. Various global and local low-level visual features are systematically investigated, and kernel-based learning methods equip the concept detection system to exploit the potential of these features. We then combine the different features and sub-systems through both classifier-level and kernel-level fusion, which contributes to a more robust system. The proposed system is tested on the TRECVID dataset. The resulting Mean Average Precision (MAP) score is much better than the benchmark performance, which shows that our concept detection engine learns a generic model and performs well on both object-type and scene-type concepts.
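The abstract names two fusion strategies (kernel-level and classifier-level) and evaluation by Mean Average Precision without implementation details. The following is a minimal, hypothetical sketch of such a pipeline, not the authors' code: it assumes scikit-learn SVMs with precomputed kernels, a chi-square kernel over illustrative histogram features (e.g. color histograms and bag-of-words vectors), and caller-supplied fusion weights.

# Hypothetical sketch of kernel-level vs. classifier-level fusion and MAP scoring.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import average_precision_score

def chi2_kernel(X, Y, gamma=1.0):
    """Exponential chi-square kernel, a common choice for histogram features."""
    d = np.array([[np.sum((x - y) ** 2 / (x + y + 1e-10)) for y in Y] for x in X])
    return np.exp(-gamma * d)

def kernel_level_fusion(train_feats, test_feats, y_train, weights):
    """Fuse per-feature kernels into one weighted kernel, then train a single SVM."""
    K_train = sum(w * chi2_kernel(Xtr, Xtr) for w, Xtr in zip(weights, train_feats))
    K_test = sum(w * chi2_kernel(Xte, Xtr)
                 for w, (Xtr, Xte) in zip(weights, zip(train_feats, test_feats)))
    clf = SVC(kernel='precomputed', probability=True).fit(K_train, y_train)
    return clf.predict_proba(K_test)[:, 1]

def classifier_level_fusion(train_feats, test_feats, y_train, weights):
    """Train one SVM per feature and average their concept scores (late fusion)."""
    scores = []
    for Xtr, Xte in zip(train_feats, test_feats):
        clf = SVC(kernel='precomputed', probability=True).fit(chi2_kernel(Xtr, Xtr), y_train)
        scores.append(clf.predict_proba(chi2_kernel(Xte, Xtr))[:, 1])
    return np.average(scores, axis=0, weights=weights)

def mean_average_precision(labels_per_concept, scores_per_concept):
    """Per-concept average precision; TRECVID MAP is the mean over all concepts."""
    return np.mean([average_precision_score(y, s)
                    for y, s in zip(labels_per_concept, scores_per_concept)])

Kernel-level fusion trains one classifier on a combined similarity matrix, while classifier-level fusion averages the outputs of independently trained classifiers; which weights to use per concept is a design choice the sketch leaves to the caller.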
Funding
This paper was supported by the collaborative Research Project SEV under Grant No. 01100474 between Beijing University of Posts and Telecommunications and France Telecom R&D Beijing; the National Natural Science Foundation of China under Grant No. 90920001; and the Graduate Innovation Fund of SICE, BUPT, 2011.
About the Authors
Dong Yuan, associate professor at Beijing University of Posts and Telecommunications, China, and a senior research consultant in multimedia indexing research at France Telecom Research & Development Beijing. He received his Ph.D. degree from Shanghai Jiao Tong University and worked as a postdoctoral research staff member at the Engineering Department, Cambridge University, UK. He is now working on the European speech recognition project CORtEX. His current research interests include semantic video indexing, video copy detection, and multimedia content search. Email: yuandong@bupt.edu.cn

Zhang Jiwei, postgraduate student at the Pattern Recognition Lab, Beijing University of Posts and Telecommunications, China. His current research interests include visual concept detection and sports categorization. Email: buptjiwei@gmail.com

Zhao Nan, postgraduate student at the Pattern Recognition Lab, Beijing University of Posts and Telecommunications, China. Her current research interests include visual concept detection and sports categorization. Email: zhao.nan07@gmail.com

Chang Xiaofu, researcher in Multimedia Analysis and Retrieval, France Telecom Research & Development Beijing, China. His research interests include image/video search, object recognition, and data mining. Email: xiaofu.chang@orange.com

Liu Wei, researcher in Multimedia Analysis and Retrieval, France Telecom Research & Development Beijing, China. His current research interests include video and image copy detection, and face detection. Email: wei.liu@orange.com