
Illuminant estimation via deep residual learning
Abstract

Objective: Color constancy refers to the human ability to recognize an object as having a consistent color under varying illuminants, and it has become an important prerequisite for high-level tasks such as recognition, segmentation, and 3D vision. In the computer vision community, the goal of computational color constancy is to remove illuminant color casts and obtain accurate color representations of images. Illuminant estimation is therefore an important means of achieving computational color constancy, but it is a difficult, underdetermined problem because the observed image colors are influenced by unknown factors such as scene illuminants and object reflectances. Illuminant estimation methods fall into two classes: statistics-based (or static) methods and learning-based methods. Statistics-based methods estimate the illuminant from statistical properties of the image (e.g., reflectance distributions). Learning-based methods learn a model from training images and then use the model to estimate the illuminant. Convolutional neural networks (CNNs) are powerful illuminant estimators, and many competitive results have been obtained with CNN-based methods. Existing methods, however, often suffer large estimation errors caused by ambiguous colors in local scene regions. In this study we propose a CNN-based illuminant estimation algorithm that uses deep residual learning to improve network accuracy and a patch-selection network to overcome the color-ambiguity problem of local patches.
Method: We uniformly sample local patches from the image, estimate the local illuminant of each patch individually, and produce a global illuminant estimate for the entire image by combining the local estimates. We use a 64×64 patch size, which guarantees the estimation accuracy of the local illuminant while providing sufficient training inputs without data augmentation. The approach comprises two residual networks: an illuminant estimation net (IEN) and a patch selection net (PSN). The IEN estimates the local illuminant of each image patch. To improve its accuracy, we deepen the feature extraction hierarchy by adding network depth and use the residual structure to preserve gradient propagation and ease the training of the deep network. The IEN consists of many stacked 3×3 and 1×1 convolutional layers, batch normalization layers, and rectified linear unit (ReLU) layers, followed by one global average pooling layer and one fully connected layer. We optimize the IEN with a Euclidean loss and stochastic gradient descent (SGD). The PSN shares a similar architecture with the IEN, except that the PSN appends a Softmax layer that serves as the classifier at the end of the network. The PSN classifies image patches according to their illuminant estimation errors and is optimized with a cross-entropy loss and SGD. Based on the PSN results, patches with large estimation errors are removed from the image, which improves the performance of the global illuminant estimation. In addition, we preprocess the input image with a log-chrominance transform that converts the three-channel RGB image into a two-channel log-chrominance image; this reduces the influence of image luminance and improves computational efficiency by cutting the amount of data by one third. Result: We implement the proposed IEN and PSN on the Caffe library.
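The log-chrominance preprocessing described above can be sketched as follows. This is a minimal NumPy version; the abstract does not give the exact channel definitions, so the u/v mapping below follows the common log-chroma convention (normalizing by the green channel) and is an assumption, not the authors' exact formula:

```python
import numpy as np

def rgb_to_log_chrominance(img, eps=1e-6):
    """Convert an HxWx3 RGB image into an HxWx2 log-chrominance image.

    Dividing by the green channel cancels overall luminance, so the two
    remaining channels depend mostly on chromaticity, and the data volume
    drops by one third, as the abstract notes.
    """
    img = img.astype(np.float64) + eps   # avoid log(0) on dark pixels
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    u = np.log(g / r)                    # assumed channel definition
    v = np.log(g / b)                    # assumed channel definition
    return np.stack([u, v], axis=-1)
```

For a perfectly gray image the ratios g/r and g/b are 1, so both log-chrominance channels are zero, which illustrates why the transform suppresses brightness while keeping color information.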
To evaluate the performance of our approach, we use two standard single-illuminant datasets: the NUS-8 dataset and the reprocessed ColorChecker dataset. Both include indoor and outdoor images, and a Macbeth ColorChecker is placed in each image to compute the ground-truth illuminant. The NUS-8 dataset contains 1736 images captured by 8 different cameras, and the reprocessed ColorChecker dataset consists of 568 images from 2 cameras. Following the configurations of previous studies, we report the mean, the median, the tri-mean, and the means of the lowest 25% and the highest 25% of the angular errors; for the reprocessed ColorChecker dataset, we additionally report the 95th percentile. We divide the NUS-8 dataset into eight camera-specific subsets, apply three-fold cross-validation to each subset individually, and report the geometric mean of the metrics over all eight subsets. We apply three-fold cross-validation directly to the reprocessed ColorChecker dataset. Experimental results show that the proposed approach is competitive with state-of-the-art methods: on the NUS-8 dataset, the proposed IEN achieves the best results among all compared methods, and the PSN further increases the precision of the IEN results; on the reprocessed ColorChecker dataset, our results are comparable with those of other advanced methods. We also conduct ablation studies on the model components. Comparing the proposed IEN with several shallower CNNs shows that deep residual learning effectively improves illuminant estimation accuracy. Moreover, compared with estimation on the original image, log-chrominance preprocessing reduces the illuminant estimation error by 10% to 15%, and the PSN further decreases the global illuminant estimation error by about 5% compared with using the IEN alone.
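The angular error and the summary statistics reported above (mean, median, tri-mean, best/worst 25%) are standard in color-constancy evaluation and can be computed as in this sketch (the `summarize` helper and its field names are illustrative, not the authors' code):

```python
import numpy as np

def angular_error_deg(est, gt):
    """Angle in degrees between estimated and ground-truth illuminant vectors."""
    est = np.asarray(est, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    cos = np.dot(est, gt) / (np.linalg.norm(est) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding

def summarize(errors):
    """Standard color-constancy statistics over a set of angular errors."""
    e = np.sort(np.asarray(errors, dtype=np.float64))
    n = len(e)
    q1, q2, q3 = np.percentile(e, [25, 50, 75])
    return {
        "mean": e.mean(),
        "median": q2,
        "trimean": (q1 + 2 * q2 + q3) / 4.0,   # Tukey's tri-mean
        "best25": e[: max(1, n // 4)].mean(),   # mean of lowest 25%
        "worst25": e[-max(1, n // 4):].mean(),  # mean of highest 25%
    }
```

Because the angular error compares only the directions of the RGB illuminant vectors, it is invariant to overall brightness, which is why it is the accepted metric for illuminant estimation.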
Finally, we evaluate the running time of our method on a PC with an Intel i5 2.7 GHz CPU, 16 GB of memory, and an NVIDIA GeForce GTX 1080 Ti GPU. Our code takes less than 1.4 s to process a 2K image (a typical resolution of 2048×1080 pixels). Conclusion: Experiments on the two single-illuminant datasets show that the proposed approach, comprising log-chrominance preprocessing, a deep residual network structure, and patch selection for global illuminant estimation, is reasonable and effective. The approach achieves high precision and robustness and can be widely applied in image processing and computer vision systems that require color calibration.
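Putting the pieces together, the patch-based global estimation summarized in the conclusion (uniform patch sampling, per-patch estimation, patch selection, aggregation) could look like the sketch below. Here `ien` and `psn` stand in for the trained networks; their call interfaces, the mean aggregation, and the fallback path are assumptions for illustration, not the authors' API:

```python
import numpy as np

def estimate_global_illuminant(image, ien, psn, patch=64):
    """Patch-based global illuminant estimation (illustrative sketch).

    image : HxWxC array (e.g., the 2-channel log-chrominance image)
    ien   : callable, patch -> 3-vector local illuminant (assumed interface)
    psn   : callable, patch -> bool, True if the patch is reliable (assumed)
    """
    h, w = image.shape[:2]
    grid = [(y, x) for y in range(0, h - patch + 1, patch)
                   for x in range(0, w - patch + 1, patch)]  # uniform grid
    estimates = [ien(image[y:y + patch, x:x + patch])
                 for (y, x) in grid
                 if psn(image[y:y + patch, x:x + patch])]    # drop ambiguous patches
    if not estimates:                                        # fallback: keep all
        estimates = [ien(image[y:y + patch, x:x + patch]) for (y, x) in grid]
    v = np.mean(estimates, axis=0)                           # combine local estimates
    return v / np.linalg.norm(v)                             # direction only
```

Normalizing the final vector reflects the fact that only the illuminant's chromaticity direction matters for color correction, consistent with the angular-error metric used in the evaluation.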
Authors: Cui Shuai (崔帅), Zhang Jun (张骏), Gao Jun (高隽), School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
Source: Journal of Image and Graphics (《中国图象图形学报》), 2019, No. 12, pp. 2111-2125 (15 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61876057, 61403116)
Keywords: visual optics; color constancy; illuminant estimation; deep residual learning; log-chrominance
About the authors: Cui Shuai, born in 1986, male, Ph.D. candidate; research interests: artificial intelligence and robotics. E-mail: baalme@163.com. Corresponding author: Zhang Jun, female, associate researcher; research interests: computer vision, image processing, and machine learning. E-mail: zhangjun@hfut.edu.cn. Gao Jun, male, professor and doctoral supervisor; research interests: image processing, pattern recognition, neural network theory and applications, optoelectronic information processing, and intelligent information processing. E-mail: gaojun@hfut.edu.cn.