Abstract
Innovation in intangible cultural heritage (ICH) products fails to keep pace with current popular trends. To address this, an ICH product innovation method based on web crawlers and the TF-IDF algorithm is proposed. Taking Baidu Baike and Tmall as sources, web crawler technology is used to crawl online entries on ICH product innovation hotspots and to construct a rough corpus set. The TF-IDF algorithm is then used to search the corpus precisely: word span is introduced into the traditional TF-IDF algorithm, and the n terms with the highest weights are selected as keywords for innovative ICH product design, yielding results that meet the needs of that design work. Test results show that the ICH innovation keywords extracted by this method agree more closely with manually extracted keywords, with accuracy above 90% in every case. ICH product innovation based on web crawlers and the TF-IDF algorithm therefore has good prospects for wider application.
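The abstract does not give the weighting formula, so the following is only a minimal sketch of how a span-weighted TF-IDF ranking might look. It assumes that word span is the fraction of a document covered between a term's first and last occurrence, that the final weight is the product TF × IDF × span, and that IDF is smoothed as log((1 + N) / (1 + df)) + 1. The function name `span_tfidf_keywords`, its parameters, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def span_tfidf_keywords(docs, target, n=3):
    """Return the n highest-weighted terms of docs[target] as (term, weight) pairs.

    Weight = TF * IDF * span. The span factor is an ASSUMED definition:
    (last position - first position + 1) / document length.
    """
    words = docs[target]
    total = len(words)
    if total == 0:
        return []

    # Term frequency: raw counts, normalised by document length below.
    tf = Counter(words)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    # First and last position of each term in the target document.
    first, last = {}, {}
    for pos, word in enumerate(words):
        first.setdefault(word, pos)
        last[word] = pos

    weights = {}
    for word, count in tf.items():
        # Smoothed IDF (an assumption; the paper's exact variant is not stated).
        idf = math.log((1 + len(docs)) / (1 + df[word])) + 1.0
        span = (last[word] - first[word] + 1) / total
        weights[word] = (count / total) * idf * span

    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Toy, already-tokenised corpus standing in for crawled entries (hypothetical data).
docs = [
    ["paper", "cutting", "pattern", "gift", "paper", "cutting", "lamp", "paper"],
    ["embroidery", "pattern", "gift", "scarf"],
    ["lacquer", "gift", "box", "pattern"],
]
for term, weight in span_tfidf_keywords(docs, target=0, n=3):
    print(term, round(weight, 4))
```

In this toy run, terms that recur throughout the first document and are rare elsewhere ("paper", "cutting") rank above terms shared by every document ("pattern", "gift"), which is the behaviour the span factor is meant to reinforce. Real crawled Chinese text would first need word segmentation and stop-word filtering, which this sketch omits.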
Authors
WANG Jing (王菁)
YANG Xiaoxiang (杨晓翔)
WANG Jing; YANG Xiaoxiang (College of Art and Media, Chuzhou City Vocational College, Chuzhou, Anhui 239000, China; College of Art and Design, Yunnan University, Kunming 650000, China)
Source
Journal of Jiamusi University (Natural Science Edition) (《佳木斯大学学报(自然科学版)》)
2025, No. 8, pp. 52-54, 74 (4 pages)
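Funding
Key Scientific Research Project of Anhui Provincial Higher Education Institutions, 2023 (2023AH052836)
Key Scientific Research Project of Anhui Provincial Higher Education Institutions, 2024 (2024AH052925)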
Keywords
web crawler
TF-IDF algorithm
corpus
word frequency
intangible cultural heritage innovation
product
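Author biographies
WANG Jing (b. 1991), female, from Chuzhou, Anhui; lecturer, master's degree; research interest: digital graphic and text information design. Corresponding author: YANG Xiaoxiang (b. 1967), male, from Kunming, Yunnan; professor, master's degree; research interests: intangible cultural heritage regional culture and spatial design, and the development of ethnic residential environments.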