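Abstract: With the help of discrete neural audio codecs, large language models (LLMs) have come to be widely regarded as a promising approach to zero-shot text-to-speech (TTS) synthesis. However, sampling-based decoding strategies, while bringing rich diversity to speech generation, also introduce robustness problems such as typos, omissions, and repetitions. To address these problems, we propose VALL-E R, a robust and efficient zero-shot TTS system built on VALL-E. Specifically, we introduce a phoneme monotonic alignment strategy that constrains acoustic tokens to match their associated phonemes strictly, strengthening the mapping between the phoneme and acoustic sequences and thereby ensuring more precise alignment. In addition, we adopt a codec-merging approach that downsamples the discrete codes in the shallow quantization layer, reducing decoding computation while preserving high-quality speech output. Benefiting from these strategies, VALL-E R achieves a marked improvement in phoneme controllability and demonstrates strong robustness, with a word error rate approaching that of ground-truth speech. The system also requires fewer autoregressive inference steps, cutting inference time by more than 60% and substantially improving inference efficiency.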
Funding: This project was supported by the National Natural Science Foundation of China (60132030, 60572147).
Abstract: End-to-end delay is one of the most important characteristics of Internet end-to-end packet dynamics, and it can be applied to quality of service (QoS) management, service level agreement (SLA) management, congestion control algorithm development, and so on. Analysis of delay series measured on different links reveals both nonstationarity and nonlinearity, and shows that different types of links exhibit different degrees of self-similarity. By constructing an appropriate network architecture and neural functions, functional networks can be used to model the nonlinear Internet end-to-end delay time series. Furthermore, with an adaptive parameter learning algorithm, the nonstationarity can also be modeled well. Numerical results show that the proposed functional network architecture and adaptive algorithm can precisely characterize Internet end-to-end delay dynamics.