Abstract
Object-centric learning methods aim to parse and model scenes compositionally and to extract representations of the objects in them. Early object-centric methods typically model scenes with simple pixel-mixing decoders, but they often perform poorly on complex synthetic datasets and on real-world datasets. More recent methods instead use structurally more complex decoders, such as autoregressive Transformers and diffusion models, to extract object representations and model scenes more effectively. Although these newer methods outperform the earlier ones, their non-compositional modeling runs counter to human intuition, and they cannot generate the image of an individual object from its representation. To address this problem, this paper proposes the object-centric diffusion (OCD) model. OCD uses a modified diffusion model as its decoder and generates the appearance and the mask of each object separately while reconstructing the scene, which achieves compositional modeling of images while maintaining model performance. Extensive experiments show that OCD performs well on image segmentation and generation tasks across several datasets, including two synthetic and two real-world datasets, demonstrating its generality and effectiveness.
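The compositional idea described in the abstract, in which each object contributes its own appearance and mask, can be illustrated with a minimal sketch. This is not the OCD decoder, which is a diffusion model whose details are not given in this record; it only shows the generic mask-weighted mixing step used by pixel-mixing decoders. The function name `compose_scene`, the array shapes, and the softmax normalization of the masks are all assumptions made for illustration.

```python
import numpy as np


def compose_scene(appearances, mask_logits):
    """Mix K per-object images into a single scene (illustrative only).

    appearances: array of shape (K, H, W, 3), one RGB image per object.
    mask_logits: array of shape (K, H, W), unnormalized mask scores.
    Returns (image, masks): image has shape (H, W, 3); masks has shape
    (K, H, W) and sums to 1 over the object axis at every pixel.
    """
    # Softmax over the object axis (assumed normalization), computed
    # in a numerically stable way by subtracting the per-pixel maximum.
    shifted = mask_logits - mask_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    masks = weights / weights.sum(axis=0, keepdims=True)

    # Each pixel is a convex combination of the per-object appearances.
    image = (masks[..., None] * appearances).sum(axis=0)
    return image, masks


# Hypothetical toy input: 3 objects on a 4x4 canvas.
rng = np.random.default_rng(0)
appearances = rng.random((3, 4, 4, 3))
mask_logits = rng.normal(size=(3, 4, 4))
image, masks = compose_scene(appearances, mask_logits)
```

Because the masks sum to one at every pixel, each object's contribution to the scene, `masks[k][..., None] * appearances[k]`, can be inspected on its own, which is the property that makes a model of this kind compositional.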
Authors
SHEN Zhi-Meng (沈知萌)
HUANG Yin-Xuan (黄尹璇)
School of Computer Science, Fudan University, Shanghai 200433, China; Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China
Source
Computer Systems & Applications (《计算机系统应用》)
2025, Issue 8, pp. 80-92 (13 pages)
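Funding
Science and Technology Commission of Shanghai Municipality project (22511105000)
Shanghai Functional Platform for Research, Development, and Translation of Brain-Inspired Chips and On-Chip Intelligent Systems (17DZ2260900)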
Keywords
object-centric learning (OCL)
unsupervised learning
compositional scene modeling
diffusion model
generative model
Author information
Corresponding author: SHEN Zhi-Meng (沈知萌), E-mail: zmshen22@m.fudan.edu.cn