Object-based audio coding is the main technique of audio scene coding. It can effectively reconstruct the trajectory of each object and provides sufficient flexibility for personalized audio scene reconstruction, so it has attracted increasing attention. However, existing object-based techniques suffer from poor sound quality because the object parameters have low frequency-domain resolution. To achieve high-quality audio object coding, we propose a new coding framework that introduces the non-negative matrix factorization (NMF) method. We extract object parameters at high resolution to improve sound quality, and apply NMF to parameter coding to reduce the high bitrate caused by the high resolution. Experimental results show that the proposed framework improves coding quality by 25%, providing a more flexible and higher-quality way to encode audio scenes.
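The abstract above does not say how NMF is applied to the object parameters. As a rough illustration only, the sketch below assumes the parameters form a non-negative matrix V (frequency bands by frames) and approximates it as W times H using the standard Lee-Seung multiplicative updates for the Frobenius objective. The function names, the matrix layout, and the choice of update rule are assumptions, not the authors' method.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr)) ** 0.5


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so the multiplicative updates can move.
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H
```

Under these assumptions, the bitrate saving would come from transmitting the rank*(n + m) entries of W and H instead of the n*m entries of V, which is only a saving when the rank is small relative to both dimensions.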
Lattice vector quantization (LVQ) has been used in real-time speech and audio coding systems. Compared with conventional vector quantization, LVQ has two main advantages: it has a simple and fast encoding process, and it significantly reduces the amount of memory required. LVQ is therefore suitable for low-complexity speech and audio coding. In this paper, we describe the basic concepts of LVQ and its advantages over conventional vector quantization. We also describe some LVQ techniques that have been used in the speech and audio coding standards of international standards developing organizations (SDOs).
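To make the two advantages concrete, the sketch below quantizes a vector to the nearest point of the D_n lattice (integer vectors whose coordinates sum to an even number) using the well-known rounding procedure of Conway and Sloane. The choice of D_n is an assumption made for illustration; the abstract does not name a specific lattice. No codebook is stored or searched: the codeword is computed directly from the input.

```python
import math


def nearest_zn(x):
    """Nearest point of the integer lattice Z^n: round each coordinate."""
    return [math.floor(v + 0.5) for v in x]


def nearest_dn(x):
    """Nearest point of D_n, the integer vectors with an even coordinate sum."""
    y = nearest_zn(x)
    if sum(y) % 2 == 0:
        return y
    # Odd sum: take the coordinate with the largest rounding error
    # and round it the other way, which flips the parity of the sum.
    k = max(range(len(x)), key=lambda i: abs(x[i] - y[i]))
    y[k] += 1 if x[k] > y[k] else -1
    return y
```

For an n-dimensional input, this costs one rounding pass plus at most one correction, whereas an unstructured codebook of size N needs N distance computations and storage for N vectors.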
Audio Video Coding Standard (AVS) is a second-generation source coding standard and the first audio and video coding standard in China with independent intellectual property rights. Its performance has reached the level of international standards, and its coding efficiency is 2 to 3 times greater than that of MPEG-2. The technical solution is simpler and can greatly save channel resources. After more than ten years of development, AVS has achieved great success. The latest version of the AVS audio coding standard is under development and mainly targets the increasing demand for low-bitrate, high-quality audio services. The paper reviews the history and recent development of the AVS audio coding standard in terms of basic features, key techniques, and performance. Finally, the future development of the AVS audio coding standard is discussed.
A new three-dimensional (3D) audio coding approach is presented to improve the spatial perceptual quality of 3D audio. Unlike other audio coding approaches, it also quantizes the distance side information, using a non-uniform perceptual quantization based on the spatial perception features of the human auditory system, named the concentric spheres spatial quantization (CSSQ) method. Comparison results show that extracting and coding the distance side information improves the distance perceptual quality of 3D audio by 5.7% to 8.8% compared with directional audio coding, and that the bit rate of the proposed method is 8.07% lower than that of spatial squeeze surround audio coding.
A Bark-band residual noise model that incorporates the human hearing mechanism is proposed to efficiently complement the sinusoidal model in parametric audio coding. The time-varying spectrum of the residual noise is recovered from Bark-scale piecewise-constant magnitude estimates together with random phases. In the proposed noise model, Bark-band information is obtained with the short-time FFT, and windowed overlap-add is used to remove boundary discontinuities. SVQ is also incorporated into the parameter quantization process to meet the demand for low-bit-rate coding. Simulation results and informal listening tests show that combining the sinusoidal model with the Bark-band noise model yields better synthesized audio quality than the original sinusoidal-modeling audio codec.
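The core idea of the noise model, one magnitude per Bark band plus random phases, can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions: the Zwicker-Terhardt approximation of the Bark scale, an RMS magnitude estimate per band, and all function names are choices made here, not details given in the abstract. The short-time FFT, overlap-add, and SVQ stages are omitted.

```python
import cmath
import math
import random


def hz_to_bark(f):
    """Zwicker-Terhardt approximation of the Bark scale (assumed here)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def bark_band_of_bins(n_bins, sample_rate):
    """Bark band index for each bin of a one-sided spectrum (bin 0 = DC, last bin = Nyquist)."""
    nyquist = sample_rate / 2.0
    return [int(hz_to_bark(k * nyquist / (n_bins - 1))) for k in range(n_bins)]


def band_magnitudes(mags, bands):
    """One RMS magnitude per band: the piecewise-constant spectral envelope."""
    energy, count = {}, {}
    for m, b in zip(mags, bands):
        energy[b] = energy.get(b, 0.0) + m * m
        count[b] = count.get(b, 0) + 1
    return {b: math.sqrt(energy[b] / count[b]) for b in energy}


def synthesize_noise_spectrum(envelope, bands, rng):
    """Complex bins carrying the band magnitude and a uniformly random phase."""
    return [cmath.rect(envelope[b], rng.uniform(-math.pi, math.pi)) for b in bands]
```

With the RMS estimate, the synthesized frame has the same energy in every Bark band as the original residual, while only one value per band (about two dozen per frame at typical sample rates) needs to be coded.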
Non-blind audio bandwidth extension is a standard technique in contemporary audio codecs for efficiently coding audio signals at low bitrates. In most existing methods, the high-frequency signal is generated by duplicating the corresponding low frequencies and applying some high-frequency parameters. However, the perceptual quality of coding degrades significantly when the correlation between high and low frequencies becomes weak. In this paper, we quantitatively analyse the correlation by computing mutual information. The analysis shows that the correlation exists not only with the low-frequency signal of the current frame but also with that of the context-dependent frames. To improve the perceptual quality of coding, we propose a novel method of coarse high-frequency spectrum generation to improve on the conventional replication method. In the proposed method, the coarse high-frequency spectra are generated by a nonlinear mapping model using a deep recurrent neural network. Experiments confirm that the proposed method performs better than the reference methods.
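The abstract does not state which mutual information estimator was used. As one possible way to measure the dependence between paired low-frequency and high-frequency features, the sketch below uses a simple equal-width histogram estimate; the estimator, the bin count, and the function name are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=8):
    """Histogram estimate of I(X;Y) in bits from paired samples xs[i], ys[i]."""
    def quantize(vals):
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / bins or 1.0  # constant input: put everything in bin 0
        return [min(int((v - lo) / width), bins - 1) for v in vals]

    qx, qy = quantize(xs), quantize(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(qx, qy)), Counter(qx), Counter(qy)
    # sum over occupied cells of p(a,b) * log2( p(a,b) / (p(a) p(b)) )
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())
```

In an analysis like the one described, xs could hold a low-frequency feature from the current or a neighbouring frame and ys a high-frequency feature from the current frame; a value near zero would indicate that replication from that source carries little information about the high band.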
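To provide a better-suited coding scheme for low-latency shallow-compression scenarios and to reduce hardware implementation cost, this paper proposes an intra prediction mode optimization and a fast rate-distortion optimization algorithm based on the shallow-compression algorithm of the Audio Video coding Standard (AVS). By reducing the number of prediction loops required for intra prediction in the original algorithm and breaking the data dependencies between blocks, the algorithm overcomes the original scheme's unsuitability for pipelined, parallel hardware processing and improves coding efficiency and stability. It thereby preserves the video quality of the algorithm while making the new hardware implementation better match practical application needs. Experimental results show that the optimized algorithm effectively improves coding performance in practical low-latency shallow-compression scenarios.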
Funding (object-based audio coding framework with NMF): supported by the National High Technology Research and Development Program of China (863 Program) (No. 2015AA016306) and the National Natural Science Foundation of China (No. 61231015, No. 61671335).
Funding (3D audio coding with CSSQ): supported by the National High Technology Research and Development Program of China (863 Program, No. 2015AA016306) and the National Natural Science Foundation of China (No. 61662010, 61231015, 61471271, 61761044, 61762005).
Funding (non-blind audio bandwidth extension): supported by the National Natural Science Foundation of China under Grant No. 61762005, 61231015, 61671335, 61702472, 61701194, 61761044, 61471271; the National High Technology Research and Development Program of China (863 Program) under Grant No. 2015AA016306; the Hubei Province Technological Innovation Major Project under Grant No. 2016AAA015; the Science Project of Education Department of Jiangxi Province under No. GJJ150585; and the Opening Project of Collaborative Innovation Center for Economics Crime Investigation and Prevention Technology, Jiangxi Province, under Grant No. JXJZXTCX-025.