Semantic segmentation is a crucial step for document understanding. In this paper, an NVIDIA Jetson Nano-based platform is applied to implement semantic segmentation for teaching artificial intelligence concepts and programming. To extract semantic structures from document images, we present an end-to-end dilated convolution network architecture. Dilated convolutions have well-known advantages for extracting multi-scale context information without losing spatial resolution. Our model uses dilated convolutions with a residual network to represent image features and predict pixel labels. The convolution part works as a feature extractor to obtain multidimensional and hierarchical image features. The subsequent deconvolution produces a full-resolution segmentation prediction. The predicted probability at each pixel determines its predefined semantic class label. To understand segmentation granularity, we compare performance at three different levels. From fine-grained to coarse class levels, the proposed dilated convolution network architecture is evaluated on three document datasets. The experimental results show that both semantic data distribution imbalance and network depth are important factors that influence document semantic segmentation performance. The research is aimed at offering an educational resource for teaching artificial intelligence concepts and techniques.
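The claim that dilated convolutions enlarge context without losing resolution, and the per-pixel labelling step, can be illustrated with a minimal pure-Python sketch. It is a one-dimensional toy, not the network from the paper: the function names, the zero padding, and the choice of the highest-probability class as the label are assumptions made for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded 1D dilated convolution; output length equals input length."""
    k = len(kernel)
    span = (k - 1) * dilation          # receptive field minus one
    pad = span // 2
    padded = [0.0] * pad + list(signal) + [0.0] * (span - pad)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def label_pixels(prob_maps):
    """prob_maps[c][y][x] holds class probabilities; pick the most probable class
    per pixel (assumed decision rule)."""
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    return [
        [max(range(len(prob_maps)), key=lambda c: prob_maps[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


impulse = [0, 0, 0, 1, 0, 0, 0]
response = dilated_conv1d(impulse, [1, 1, 1], dilation=2)
print(len(response) == len(impulse))      # resolution is preserved
print(receptive_field(3, [1, 2, 4]))      # context grows with the dilation rates
print(label_pixels([[[0.9, 0.2]], [[0.1, 0.8]]]))
```

Stacking three 3-tap layers with dilation rates 1, 2 and 4 covers 15 input positions while every layer keeps the input length, which is the property the abstract relies on.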
Document image segmentation is very useful for printing, faxing and data processing. An algorithm is developed for segmenting and classifying document images. The feature used for classification is based on the histogram distribution pattern of the different image classes. The key attribute of the algorithm is its use of a wavelet correlation image to enhance the raw image's pattern, which improves classification accuracy. In this paper, a document image is divided into four types: background, photo, text and graph. First, the document image background is easily distinguished by an existing conventional method. Second, the three remaining image types are distinguished by their typical histograms; to make the histogram features clearer, the HH wavelet subimage at each resolution is added to the raw image at that resolution. Finally, photo, text and graph are separated according to how well their features fit the Laplacian distribution by 2 and L . Simulations show that classification accuracy is significantly improved. Comparison with related work shows that our algorithm provides both lower classification error rates and better visual results.
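Owing to geometric deformation of the document paper, interference from the shooting scene, and perspective distortion caused by non-ideal shooting angles, optical character recognition (OCR) on document images captured by mobile devices is greatly challenged. To address the preprocessing of folded and warped document images, two autoencoder-based network structures are designed to achieve adaptive image rectification and improve text recognition accuracy. First, two kinds of residual block are proposed, a dilated residual block and an asymmetric convolution residual block; the residual blocks are then combined with an autoencoder to design an asymmetric dilated autoencoder network. In addition, spatial pyramid pooling is used in place of the fully connected layer, and asymmetric convolution residual blocks are used for feature extraction, to design a second network, the spatial pyramid autoencoder network. Experimental results show that, compared with the distorted images, images rectified by the asymmetric dilated autoencoder network improve OCR accuracy, OCR recall and text similarity by 26.3%, 20.4% and 12.3%, respectively, while images rectified by the spatial pyramid autoencoder network improve accuracy, recall and text similarity by 27.7%, 22.0% and 15.5%, respectively. Compared with other rectification networks such as RectiNet, the two networks can adaptively rectify multiple types of distorted document images, and the rectified images perform better in text recognition. The two proposed rectification networks effectively improve the accuracy, recall and text similarity of image text recognition, and show clear advantages over existing rectification networks in robustness and generalization.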
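Replacing a fully connected layer with spatial pyramid pooling works because the pooled vector has a fixed length regardless of the input size. A minimal pure-Python sketch of that property follows; the pyramid levels, the use of max pooling, and the function name are assumptions for illustration, not the configuration of the networks described above.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map over an n x n grid for each pyramid level and
    concatenate the results; output length is sum(n * n for n in levels)."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceiling for the end keep every bin non-empty.
            r0, r1 = (i * height) // n, -(-((i + 1) * height) // n)
            for j in range(n):
                c0, c1 = (j * width) // n, -(-((j + 1) * width) // n)
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(r0, r1)
                    for c in range(c0, c1)
                ))
    return pooled


small = [[r * 4 + c for c in range(4)] for r in range(4)]      # 4 x 4 map
large = [[(r * 7 + c) % 11 for c in range(7)] for r in range(5)]  # 5 x 7 map
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Both feature maps yield a vector of the same length, so a later layer with a fixed input size can follow without resizing the image first.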
Funding: Project (61806107) supported by the National Natural Science Foundation of China; Project supported by the Shandong Key Laboratory of Wisdom Mine Information Technology, China; Project supported by the Opening Project of State Key Laboratory of Digital Publishing Technology, China.