Abstract
As a fundamental task in computer vision, Single Image Super-Resolution (SISR) has long been an intensively studied research topic. Recent research has shown that the success of Transformers stems not only from their Self-Attention (SA) mechanism but also from their macro-level framework and advanced components. Spatial pooling, shifting, Multi-Layer Perceptrons (MLP), the Fourier transform, and constant matrices all encode spatial information in a way similar to SA, and can replace SA while achieving comparable results. Based on these findings, this work combines the superior macro architecture of Transformers with efficient spatial information encoding techniques, improving on the computationally expensive SA mechanism to raise SISR performance. Specifically, the paper revisits the design of spatial convolution to encode spatial features more efficiently, and expresses features through dynamic modulation using convolutional modulation techniques. The proposed Efficient Spatial Information Encoding (ESIE) layer uses large-kernel convolution and the Hadamard product to imitate the matrix multiplication between query and key, as well as the recalibration of value representations, in SA. As a result, the ESIE layer captures long-range dependencies and exhibits self-adaptive behavior similar to SA, while requiring only linear computational complexity. In addition, to address the sub-optimal handling of spatial information by the vanilla Feed-Forward Network (FFN), the paper introduces spatial awareness, locality, and dynamic adaptivity into the proposed Efficient Channel Information Encoding (ECIE) layer, which increases feature diversity and regulates the information flow between layers. Experimental results show that the proposed Efficient Spatial-Channel Information Encoding Network (ESCIEN) outperforms existing models both quantitatively and qualitatively. Code and trained models will be made available if the paper is accepted.
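The abstract describes ESIE and ECIE only at a high level, so the following NumPy sketch is illustrative rather than the authors' implementation. It assumes a Conv2Former-style convolutional modulation for ESIE (a large-kernel depthwise convolution of a 1x1 projection acts as the attention map A, and the output is a 1x1 projection of A ⊙ V), and an FFN with a 3x3 depthwise convolution between its two 1x1 layers for ECIE. The function names (`esie`, `ecie`, `esie_macs`, `sa_macs`), kernel sizes, expansion ratio, omitted normalization, and the rough cost formulas are all hypothetical choices made for this example.

```python
import numpy as np


def pointwise(x, w):
    """1x1 convolution (channel mixing). x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def depthwise_conv(x, kernels):
    """Per-channel 'same' 2-D cross-correlation with zero padding.

    x: (C, H, W) array, kernels: (C, k, k) with odd k.
    """
    _, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H x W
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # shift the padded map and weight it by one kernel tap per channel
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def gelu(x):
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def esie(x, w_a, k_a, w_v, w_o):
    """Hypothetical ESIE-style convolutional modulation.

    A (large-kernel depthwise conv of a 1x1 projection) stands in for the
    query-key attention map; A * V (Hadamard product) stands in for the
    recalibrated values. Cost grows linearly with H * W.
    """
    a = depthwise_conv(pointwise(x, w_a), k_a)  # spatial 'attention' weights
    v = pointwise(x, w_v)                        # value branch
    return pointwise(a * v, w_o)                 # modulate, then project back


def ecie(x, w_in, k_dw, w_out):
    """Hypothetical ECIE-style FFN: a 3x3 depthwise conv between the two
    1x1 layers adds spatial awareness and locality to the channel MLP."""
    h = pointwise(x, w_in)                # expand channels
    h = gelu(depthwise_conv(h, k_dw))     # local spatial mixing + nonlinearity
    return pointwise(h, w_out)            # project back to C channels


def esie_macs(c, h, w, k):
    """Rough multiply-adds: three 1x1 convs, one k x k depthwise conv, one product."""
    return h * w * (3 * c * c + c * k * k + c)


def sa_macs(c, h, w):
    """Rough multiply-adds of global SA: 4 projections plus QK^T and A V."""
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


rng = np.random.default_rng(0)
C, H, W, K, E = 8, 16, 16, 7, 2  # channels, height, width, kernel size, FFN expansion
x = rng.standard_normal((C, H, W))

# Transformer-style macro block: token mixer + channel mixer, each with a residual
# (normalization layers are omitted for brevity).
y = x + esie(x,
             0.1 * rng.standard_normal((C, C)),
             0.1 * rng.standard_normal((C, K, K)),
             0.1 * rng.standard_normal((C, C)),
             0.1 * rng.standard_normal((C, C)))
z = y + ecie(y,
             0.1 * rng.standard_normal((E * C, C)),
             0.1 * rng.standard_normal((E * C, 3, 3)),
             0.1 * rng.standard_normal((C, E * C)))
print(z.shape)  # (8, 16, 16)

# Doubling H and W multiplies the ESIE cost by 4 but the SA cost by more than 4.
print(esie_macs(C, 2 * H, 2 * W, K) / esie_macs(C, H, W, K))  # 4.0
print(sa_macs(C, 2 * H, 2 * W) / sa_macs(C, H, W))
```

The last two lines illustrate the complexity claim under these assumptions: the ESIE cost is proportional to the number of pixels, whereas the QK^T and A·V terms of global SA grow with the square of the number of pixels.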
Authors
MO Kaizhi, TENG Qizhi, REN Chao (College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China)
Source
Intelligent Computer and Applications, 2025, No. 8, pp. 1-9 (9 pages)
Funding
National Natural Science Foundation of China (62271336, 62171304).
Keywords
image super-resolution
spatial information encoding
convolutional modulation technique
large-kernel convolution
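About the authors
MO Kaizhi (1998-), male, master's student; research interest: image processing. REN Chao (1988-), male, Ph.D., associate research fellow; research interests: image processing, computer vision, artificial intelligence, multimedia communication and information systems. Corresponding author: TENG Qizhi (1961-), female, Ph.D., professor; research interests: multidimensional digital signal processing, pattern recognition, computer applications. Email: qzteng@scu.edu.cn.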