Research on the Relationship Between the Text of "Laozi" and the Classics of the Pre-Qin and Han Dynasties Based on Digital Technologies
Cited by: 4
Abstract [Purpose/Significance] Understanding Laozi's thought is essential to understanding early Chinese culture. This study applies digital humanities methods to empirical research on the "Laozi". Using large-scale computation that combines quantitative statistics with qualitative analysis, it addresses long-standing and hard-to-settle questions in Laozi studies, such as the sources and the influence of the text, and uncovers features of textual organization and intertextual relationships that are difficult to detect through reading experience alone. [Method/Process] Character frequencies were counted on the Heshanggong version of the "Laozi". Similarity analysis and an analysis of citations among the classics were then carried out. Finally, a BERT model was trained on ancient Chinese, and the character embeddings it generates were used to compute the similarity between sentences of the classics, with the correlation study conducted on classics that predate the "Laozi". [Result/Conclusion] With TF-IDF text vectorization, the "Huainanzi" was found to be the work most similar to the "Laozi" among later works. The BERT model, trained with self-supervised learning, reached an accuracy of 52.11% on the cloze task and 98.45% on next-sentence prediction, and the similarity results indicate that the "Mozi" is closely related to the "Laozi". This finding prompts a fresh look at the relationship between the arguments and ideas of the "Laozi" and the "Mozi".
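The TF-IDF comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes character-level tokens (a common choice for Classical Chinese, which has no word delimiters), a smoothed IDF formula, and cosine similarity as the comparison measure. The function names and the three one-line snippets are illustrative only, so the resulting scores say nothing about the paper's findings.

```python
import math
from collections import Counter


def char_tokens(text):
    """Split a string into single characters, dropping punctuation and whitespace."""
    return [ch for ch in text if ch.isalnum()]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: character -> weight) per document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so that characters
    occurring in every document still receive a non-zero weight.
    """
    tokenized = {name: char_tokens(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each character once per document

    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        vectors[name] = {
            ch: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[ch])) + 1.0)
            for ch, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(ch, 0.0) for ch, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy input: one short line per work, NOT the corpus used in the paper.
docs = {
    "Laozi": "道可道,非常道。名可名,非常名。",
    "Huainanzi": "夫道者,覆天载地。",
    "Mozi": "兼相爱,交相利。",
}

vectors = tfidf_vectors(docs)
for other in ("Huainanzi", "Mozi"):
    score = cosine(vectors["Laozi"], vectors[other])
    print(f"Laozi vs {other}: {score:.3f}")
```

On a real corpus each work would be a full text rather than one line, and the same `cosine` function could in principle be applied to dense sentence vectors derived from BERT character embeddings. How the paper pools those embeddings into sentence vectors is not stated in the abstract.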
Authors: Gao Ruiqing (高瑞卿); Dong Qiwen (董启文); Fang Da (方达); Wang Hongzhi (王弘治); Fang Yong (方勇) (School of Data Science and Engineering, East China Normal University, Shanghai 200062; Department of Chinese Language and Literature, East China Normal University, Shanghai 200062; School of Humanities, Shanghai Normal University, Shanghai 200234)
Source: Journal of Intelligence (《情报杂志》), indexed in CSSCI and the Peking University Core Journals list, 2021, No. 10, pp. 99-107 (9 pages)
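Funding: This work is one of the research outputs of the Major Project of the National Social Science Fund of China "General History of Chinese Zhuzi Studies" (No. 19ZDA244); of the National Social Science Fund of China project "Dictionary of Phonetic and Semantic Glosses in the Jingdian Shiwen" (No. 19FYYB008); and of the East China Normal University "Flower of Happiness" Pilot Fund major research program "Research on the Sources and Meaning of Laozi's Thought from a Big Data Perspective" (No. 44300-19312-542500/005).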
Keywords: BERT; digital humanities; similarity; relationship mining; Pre-Qin; Laozi
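About the authors: Gao Ruiqing, female, born 1997, master's degree, research interests: natural language processing and text mining; Dong Qiwen, male, born 1977, Ph.D., professor, research interests: applied data science technologies, including network informatics, machine learning and computational advertising; Fang Da, male, born 1987, Ph.D., assistant research fellow, research interest: Zhuzi studies; corresponding author: Wang Hongzhi, male, born 1977, Ph.D., associate professor, research interest: history of the Chinese language; Fang Yong, male, born 1956, Ph.D., professor, research interest: Zhuzi studies.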
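  • Related literature: references (25); second-level references (257); co-citing documents (340); co-cited documents (149); citing documents (4); second-level citing documents (11)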
