Abstract: This paper proposes a semantic-unit-based event detection scheme for soccer videos. The scheme is a three-layer framework. At the lowest layer, low-level features including color, texture, edge, shape, and motion are extracted. High-level semantic events are defined at the highest layer. To connect low-level features with high-level semantics, semantic units are defined at the intermediate layer. A semantic unit is a sequence of consecutive frames that share the same cue, where the cue is deduced from low-level features. Based on the semantic units, a Bayesian network infers the probabilities of events. Experiments on shooting and card event detection in soccer videos show that the proposed method achieves encouraging performance.
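The definition of a semantic unit above (consecutive frames sharing the same cue) can be illustrated with a minimal sketch. It assumes that a per-frame cue label has already been deduced from low-level features; the cue names (`far_view`, `close_up`, `replay`) and the `SemanticUnit` type are hypothetical and not taken from the paper, and the Bayesian network stage is not shown.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass
class SemanticUnit:
    cue: str    # cue shared by every frame in the unit (hypothetical label)
    start: int  # index of the first frame, inclusive
    end: int    # index of the last frame, inclusive


def segment_semantic_units(frame_cues: Sequence[str]) -> List[SemanticUnit]:
    """Group consecutive frames with the same cue into semantic units."""
    units: List[SemanticUnit] = []
    index = 0
    for cue, run in groupby(frame_cues):
        length = sum(1 for _ in run)  # number of frames in this run
        units.append(SemanticUnit(cue, index, index + length - 1))
        index += length
    return units


# Hypothetical per-frame cues for a six-frame clip.
cues = ["far_view", "far_view", "close_up", "close_up", "close_up", "replay"]
units = segment_semantic_units(cues)
```

In a pipeline like the one described, the resulting units (rather than individual frames) would serve as the observations passed to the event-level reasoning step.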
Abstract: 1. Introduction. Humans have the ability (or competence) to think logically, and this is an undeniable fact. However, what this ability consists in is a difficult question. It might be said that logical ability consists in the knowledge of a set of logic rules. But what are those logic rules? For centuries logicians have devel-
Funding: National Natural Science Foundation of China (Grant Nos. 62005049 and 62072110); Natural Science Foundation of Fujian Province (Grant No. 2020J01451).
Abstract: Accurate segmentation of camouflage objects in aerial imagery is vital for improving the efficiency of UAV-based reconnaissance and rescue missions. However, camouflage object segmentation is increasingly challenging due to advances in both camouflage materials and biological mimicry. Although multispectral-RGB based technology shows promise, conventional dual-aperture multispectral-RGB imaging systems are constrained by imprecise and time-consuming registration and fusion across modalities, which limits their performance. Here, we propose the Reconstructed Multispectral-RGB Fusion Network (RMRF-Net), which reconstructs RGB images into multispectral ones, enabling efficient multimodal segmentation using only an RGB camera. Specifically, RMRF-Net employs a divergent-similarity feature correction strategy to minimize reconstruction errors and includes an efficient boundary-aware decoder to enhance object contours. Notably, we establish the first real-world aerial multispectral-RGB dataset for semantic segmentation of camouflage objects, covering 11 object categories. Experimental results demonstrate that RMRF-Net outperforms existing methods, achieving 17.38 FPS on the NVIDIA Jetson AGX Orin with only a 0.96% drop in mIoU compared to the RTX 3090, showing its practical applicability in multimodal remote sensing.
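Abstract: Deep learning algorithms are well suited to extracting key features for fake news detection. However, existing deep-learning-based multimodal fake news detection methods still have shortcomings; for example, features extracted from the input image and text are not fused sufficiently. To address this problem, this paper proposes MCEFSE (Multimodal Context based Early Fusion and Semantic Enhancement), a fake news detection model based on multimodal context fusion and semantic enhancement. First, the pretrained language model BERT encodes the sentences. Meanwhile, with Swin Transformer as the main framework, text features are introduced during early visual feature encoding to strengthen semantic interaction. In addition, InceptionNetV3 is used as an image pattern analyzer. Finally, the textual semantic, visual semantic, and image pattern features are refined and fused to obtain the final multimodal feature representation. The results show that MCEFSE achieves accuracies of 0.921 and 0.932 on the Weibo and Weibo-21 datasets, respectively, validating the effectiveness of the method.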