Funding: Partially supported by the National Natural Science Foundation of China (No. 61703207), the Jiangsu Provincial Natural Science Foundation of China (No. BK20170801), the Aeronautical Science Foundation of China (No. 2017ZC52017), the Jiangsu Provincial Six Talent Peaks (No. 2015-XXRJ-005), and the Jiangsu Province Qing Lan Project.
Abstract: Accurate navigation information obtained by integrating multiple sensors is key to the safe operation of land vehicles in global navigation satellite system (GNSS)-denied environments. However, current multi-sensor fusion methods are based on a stovepipe architecture, in which a custom fusion strategy is optimized for specific sensors. To develop adaptable navigation that allows rapid integration of any combination of sensors and yields robust, high-precision navigation solutions in GNSS-denied environments, we propose a generic plug-and-play fusion strategy to estimate land vehicle states. The proposed strategy handles different sensors in a plug-and-play manner because sensors are abstracted and represented by generic models, which allows rapid reconfiguration whenever a sensor signal is added or lost during operation. Relative estimates are fused with absolute sensor measurements using an improved factor graph, which includes the sensors' error parameters in the non-linear optimization so that the sensors are calibrated online. We evaluate the approach using a land vehicle equipped with a global positioning system (GPS) receiver, an inertial measurement unit (IMU), a camera, a wireless sensor, and an odometer. GPS is not integrated into the system but is treated as ground truth. Results are compared with the most common filtering-based fusion algorithm. They show that our strategy processes low-quality input sources in a robust, plug-and-play manner and outperforms the filtering-based method in GNSS-denied environments.
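The plug-and-play idea in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 1-D vehicle, linear measurement models, and plain weighted least squares in place of the paper's non-linear factor-graph optimization. The names `Factor`, `PlugAndPlayGraph`, `plug`, `unplug`, and all numeric values are hypothetical. Each sensor contributes generic factors, and an odometer bias is carried as an extra state so that it is estimated along with the positions.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    """Generic linear measurement: sum(coeffs[i] * state[i]) = value (std. dev. sigma)."""
    coeffs: dict
    value: float
    sigma: float


def gauss_solve(H, g):
    """Solve H x = g by Gaussian elimination with partial pivoting."""
    n = len(g)
    M = [row[:] + [g[i]] for i, row in enumerate(H)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("state not observable with the current sensor set")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            k = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= k * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


class PlugAndPlayGraph:
    """Hypothetical container: sensors are just named lists of generic factors."""

    def __init__(self, n_states):
        self.n = n_states
        self.factors = {}

    def plug(self, sensor, factors):
        self.factors[sensor] = list(factors)

    def unplug(self, sensor):
        self.factors.pop(sensor, None)

    def solve(self):
        # Accumulate the weighted normal equations (A^T W A) x = A^T W z.
        H = [[0.0] * self.n for _ in range(self.n)]
        g = [0.0] * self.n
        for sensor_factors in self.factors.values():
            for f in sensor_factors:
                w = 1.0 / (f.sigma ** 2)
                for i, ci in f.coeffs.items():
                    g[i] += w * ci * f.value
                    for j, cj in f.coeffs.items():
                        H[i][j] += w * ci * cj
        return gauss_solve(H, g)


# States: three positions plus one odometer bias (made-up, noise-free data).
X0, X1, X2, B = 0, 1, 2, 3
graph = PlugAndPlayGraph(n_states=4)
graph.plug("prior", [Factor({X0: 1.0}, 0.0, 0.01)])
# Relative sensor with an unknown bias: x[k+1] - x[k] + b = measured step.
graph.plug("odometer", [Factor({X0: -1.0, X1: 1.0, B: 1.0}, 1.1, 0.05),
                        Factor({X1: -1.0, X2: 1.0, B: 1.0}, 1.1, 0.05)])
# Relative sensor without bias: x[k+1] - x[k] = measured step.
graph.plug("camera", [Factor({X0: -1.0, X1: 1.0}, 1.0, 0.1),
                      Factor({X1: -1.0, X2: 1.0}, 1.0, 0.1)])
# Absolute sensor: x[k] = measured position.
graph.plug("wireless", [Factor({X0: 1.0}, 0.0, 0.5),
                        Factor({X2: 1.0}, 2.0, 0.5)])

full = graph.solve()        # all sensors available
graph.unplug("wireless")    # the absolute sensor drops out during operation
degraded = graph.solve()    # same solver, no reconfiguration code needed
```

Because the solver only sees generic factors, adding or losing a sensor changes the data in the graph and not the estimation code, which is the property the abstract describes. In this noise-free toy case both solves recover positions 0, 1, 2 and a bias of 0.1, up to floating-point round-off.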
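Abstract: To address the false and missed detections caused by curved, dense text in commodity-packaging text detection, a text detection framework based on link relationship prediction (text detection network based on relational prediction, RPTNet) is proposed, consisting of two sub-networks. In the text component detection network, downsampling uses a dual-branch structure that runs a convolutional neural network and self-attention in parallel to extract local and global features, and a dilated feature enhancement module (DFM) is added to reduce the information lost from deep feature maps during dimensionality reduction. Upsampling combines a feature pyramid with a multi-level attention fusion module (MAFM) to fuse multi-level features and strengthen the latent connections among text features, and a text detector then detects text components from the upsampled feature maps. In the link relationship prediction network, a relational reasoning framework based on a graph convolutional network predicts the deep similarity between text components, and a bidirectional long short-term memory network aggregates the text components into text instances. To validate the detection performance of RPTNet, a text detection dataset composed of commodity packaging images (text detection dataset composed of commodity packaging, CPTD1500) was constructed. Experimental results show that RPTNet not only achieves excellent performance on the public text datasets CTW-1500 and Total-Text, but also reaches a recall of 85.4% and an F-measure of 87.5% on CPTD1500, both better than current mainstream algorithms.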
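The abstract reports recall and F-measure on CPTD1500 but not precision. As a small worked check, and assuming the F-measure is the usual F1 score (the harmonic mean of precision and recall, which the abstract does not state explicitly), the implied precision can be back-computed. The function name `implied_precision` is hypothetical, and the result is only approximate because the reported figures are rounded.

```python
def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, given R and F1 as fractions in (0, 1]."""
    denom = 2.0 * recall - f1
    if denom <= 0.0:
        raise ValueError("no valid precision exists for this recall/F1 pair")
    return f1 * recall / denom


# Figures reported in the abstract for CPTD1500.
recall = 0.854
f1 = 0.875

precision = implied_precision(recall, f1)
# Round-trip check: recompute F1 from the implied precision.
f1_check = 2.0 * precision * recall / (precision + recall)
```

Under that assumption the implied precision is roughly 89.7%, so the reported F-measure would reflect precision that is somewhat higher than recall.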
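Abstract: Recognizing non-driving behaviors is an important means of improving driving safety. Current fusion-based recognition methods that combine skeleton sequences and images suffer from high computational cost and difficult feature fusion. To address these problems, this paper proposes a driver behavior recognition model that fuses multi-scale skeleton graphs with local visual context (skeleton-image based behavior recognition network, SIBBR-Net). SIBBR-Net extracts motion and appearance features with a graph convolutional network built on multi-scale graphs and a convolutional neural network built on local vision and an attention mechanism, striking a good balance between representational capacity and computational cost. A bidirectional feature-guided learning strategy based on hand motion, an adaptive feature fusion module, and an auxiliary loss on the static feature space allow the motion and appearance features to guide each other's updates and to be fused adaptively. In tests on the Drive&Act dataset, SIBBR-Net achieves average accuracies of 61.78% and 80.42% under dynamic-label and static-label conditions, respectively, with a computational cost of 25.92 GFLOPs, 76.96% lower than that of the best-performing method.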
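Abstract: In recent years, with the growing number of surveillance cameras and the rapid development of the Internet, the volume of surveillance and online video has increased steadily. Automatic behavior conflict detection in video is important for reducing the risk of private information being leaked through manual review, for maintaining public security, and for cleaning up the online environment. To fully extract behavior conflict features from video and obtain a model with good generalization and detection performance, I3D (inflated 3D convolutional network) and VGGish are used to extract multimodal features on XD-Violence, and a behavior conflict detection model based on Transformer and graph convolution networks (TG-BCDM) is proposed. The model comprises a Transformer encoder module and a graph convolution module, so it captures long-range dependencies in video effectively while attending to both the global and the local information in the video features. Experiments show that the model outperforms eight existing methods.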