
Human pose transfer model combining convolution and multi-head attention (Cited 3 times)
Abstract: For a given reference image of a person, the goal of Human Pose Transfer (HPT) is to generate an image of that person in an arbitrary pose. Many existing methods still fall short in capturing the details of a person's appearance and in inferring invisible regions; for complex pose transformations in particular, they struggle to generate a clear and realistic appearance. To address these problems, a novel HPT model integrating convolution and multi-head attention was proposed. First, a Convolution-Multi-Head Attention (Conv-MHA) block was constructed by fusing convolution with the multi-head attention mechanism and used to extract rich contextual features. Second, the HPT network was built from Conv-MHA blocks to improve the learning ability of the proposed model. Finally, self-reconstruction of the reference image was introduced as an auxiliary task to exploit the model's capability more fully. The Conv-MHA-based HPT model was validated on the DeepFashion and Market-1501 datasets; on the DeepFashion test set it outperforms the state-of-the-art HPT model DPTN (Dual-task Pose Transformer Network) in terms of Structural SIMilarity (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and Fréchet Inception Distance (FID). Experimental results show that the Conv-MHA block, which integrates convolution and the multi-head attention mechanism, improves the representation ability of the model, captures the details of a person's appearance more effectively, and increases the accuracy of person image generation.
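The abstract describes the Conv-MHA block only at a high level: a convolutional branch for local appearance detail fused with multi-head self-attention for global context. A minimal NumPy sketch of one plausible fusion follows; all function names, shapes, and the residual layout are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, num_heads):
    # x: (N, C) sequence of N tokens; wq/wk/wv: (C, C) projection matrices.
    n, c = x.shape
    d = c // num_heads
    # Project and split channels into heads: (num_heads, N, d).
    q = (x @ wq).reshape(n, num_heads, d).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, num_heads, d).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, num_heads, d).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (num_heads, N, N).
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)
    # Merge heads back into (N, C).
    return (scores @ v).transpose(1, 0, 2).reshape(n, c)

def conv3x3_depthwise(x, kernel):
    # x: (H, W, C) feature map; kernel: (3, 3, C) depthwise filter.
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    # Sliding 3x3 windows over the spatial axes: (H, W, C, 3, 3).
    win = np.lib.stride_tricks.sliding_window_view(pad, (3, 3), axis=(0, 1))
    return np.einsum('hwcij,ijc->hwc', win, kernel)

def conv_mha_block(x, kernel, wq, wk, wv, num_heads=4):
    # Fuse a local conv branch with a global attention branch residually.
    h, w, c = x.shape
    local = conv3x3_depthwise(x, kernel)
    tokens = x.reshape(h * w, c)  # flatten spatial grid into a token sequence
    global_ctx = multi_head_attention(tokens, wq, wk, wv, num_heads)
    return x + local + global_ctx.reshape(h, w, c)
```

The sketch keeps input and output shapes identical so blocks can be stacked, mirroring how the paper reportedly builds the full HPT network out of Conv-MHA blocks.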
Authors: YANG Hong; ZHANG He; JIN Shaoning (Information Science and Technology College, Dalian Maritime University, Dalian, Liaoning 116026, China)
Source: Journal of Computer Applications (CSCD, Peking University Core Journal), 2023, Issue 11, pp. 3403-3410 (8 pages)
Keywords: Human Pose Transfer (HPT); image generation; generative adversarial network; multi-head attention; convolution
About the authors: Corresponding author YANG Hong (born 1977), female, from Huludao, Liaoning; associate professor, Ph.D.; research interests: data mining, behavior recognition; e-mail yanghong@dlmu.edu.cn. ZHANG He (born 1998), male, from Linyi, Shandong; master's student; research interests: image generation, deep generative models. JIN Shaoning (born 1996), female, from Jingning, Gansu; master's student; research interests: gait recognition, artificial intelligence.