Prosodic structure generation is the key component in improving the intelligibility and naturalness of synthetic speech for a text-to-speech (TTS) system. This paper investigates the problem of automatic segmentation ...Prosodic structure generation is the key component in improving the intelligibility and naturalness of synthetic speech for a text-to-speech (TTS) system. This paper investigates the problem of automatic segmentation of prosodic word and prosodic phrase,which are two fundamental layers in the hierarchical prosodic structure of Mandarin,and presents a two-stage prosodic structure generation strategy. Conditional random fields (CRF) models are built for both prosodic word and prosodic phrase prediction at the front end with diflerent feature selections. Besides,a transformation-based error-driven learning (TBL) modification module is introduced in the back end to amend the initial prediction. Experiment results show that the approach combining CRF and TBL achieves an F-score of 94.66%.展开更多
文摘针对重叠联盟的合作博弈框架(OCF games)中重叠联盟结构生成(OCSG)求解困难的问题,提出了一种基于贪心方法的有效算法。首先使用了一种带有联盟数量k约束的OCF博弈(kOCF games)模型来限制OCSG问题的规模;然后引入了一种相似度量来表示任意两个联盟结构之间的相似程度,并基于相似度量定义了单调性的性质,这意味着某一联盟结构与最优联盟结构的相似度越高,该联盟的单调性的值就越大;最后对于具有单调性质的kOCF博弈,采用了逐一插入玩家编号以逼近最优联盟结构的方法设计了联盟约束贪心(CCG)算法来求解给定的OCSG问题,并在理论上证明了CCG算法的复杂度是O(n2k+1)。通过实验分析和验证了不同参数和联盟值分布对所提算法性能的影响,并把该算法与Zick等提出的算法(ZICK Y,CHALKIADAKIS G,ELKIND E,et al.Cooperative games with overlapping coalitions:charting the tractability frontier.Artificial Intelligence,2019,271:74-97)在约束条件等方面进行了对比,得出了当联盟最大数量k被常数约束时所提算法的搜索次数随agent的个数基本呈线性增长的结果。可见CCG算法是固定参数k可解的,而且拥有更好的适用性。
基金Supported by National Natural Science Foundation of China(90920001)the Key Project of the Ministry of Education of China(108012)Joint-research Project between France Telecom R&DBeijing and Beijing University of Posts and Telecommunications(SEV01100474)
文摘Prosodic structure generation is the key component in improving the intelligibility and naturalness of synthetic speech for a text-to-speech (TTS) system. This paper investigates the problem of automatic segmentation of prosodic word and prosodic phrase,which are two fundamental layers in the hierarchical prosodic structure of Mandarin,and presents a two-stage prosodic structure generation strategy. Conditional random fields (CRF) models are built for both prosodic word and prosodic phrase prediction at the front end with diflerent feature selections. Besides,a transformation-based error-driven learning (TBL) modification module is introduced in the back end to amend the initial prediction. Experiment results show that the approach combining CRF and TBL achieves an F-score of 94.66%.