To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) must use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, FS generally causes information loss, which in turn degrades the overall performance of TC algorithms. Exploiting the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the feature dimension without any information loss, which improves both the efficiency and the performance of the algorithm. Experiments show that the new algorithm achieves much higher performance and efficiency than several classical TC algorithms.
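The abstract does not spell out how LFS works, so the following is only a minimal sketch of the sparsity property it builds on, not the authors' algorithm. It assumes a hypothetical linear scorer and stores a document as a mapping from feature index to value, so scoring visits only the features the document actually contains. The names `sparse_score`, `doc`, and `weights` are illustrative assumptions.

```python
from typing import Dict


def sparse_score(doc: Dict[int, float], weights: Dict[int, float]) -> float:
    """Dot product that visits only the non-zero features of the document.

    Illustrative assumption: a linear scorer over a sparse text vector.
    This is NOT the LFS method from the paper.
    """
    return sum(value * weights.get(idx, 0.0) for idx, value in doc.items())


# Hypothetical document in a very large vocabulary: only 3 features are non-zero.
doc = {12: 2.0, 4051: 1.0, 99873: 3.0}
# Hypothetical class weights; feature 7 never occurs in the document.
weights = {12: 0.5, 4051: -0.25, 7: 4.0}

score = sparse_score(doc, weights)
print(score)
```

The cost of the call depends on the number of non-zero entries in `doc`, not on the size of the vocabulary, and no feature is discarded up front.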
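Grounded in the national strategy of strengthening the country through enterprise science and technology, and aiming to promote cross-disciplinary engineering projects and deepen enterprises' cross-domain technology, this paper proposes a strategy for matching enterprise technology distributions based on a dual BERT (bidirectional encoder representations from transformers) text classification model. First, based on deep learning, four BERT models are built and, combined with engineering discipline labels, pre-trained on 70,000 patent texts to identify enterprise attributes; a label tension matrix is constructed and a weighted cosine similarity function is computed to implement a technology cooperation matching module that screens potential partners. Second, based on time-series analysis, technological co-opetition between cooperating enterprises is tracked and the range of the degree of cooperation is determined; from both "static" and "dynamic" perspectives, a quantitative strategy for cross-domain technology cooperation is proposed, addressing the lack of systematic and dynamic treatment of this problem in existing research. Finally, a case study of high-growth biomedical engineering enterprises confirms the reliability of the proposed method.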
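The abstract names a weighted cosine similarity but gives no formula. The sketch below uses one common definition, in which a non-negative weight scales each dimension in both the dot product and the norms. Whether the paper uses this exact form, and how its weights relate to the label tension matrix, is not stated; the function name, the example vectors, and the weights are assumptions for illustration.

```python
import math
from typing import Sequence


def weighted_cosine(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Cosine similarity with a non-negative weight per dimension.

    Assumed form: sum(w*a*b) / (sqrt(sum(w*a*a)) * sqrt(sum(w*b*b))).
    Returns 0.0 when either weighted norm is zero.
    """
    if not (len(a) == len(b) == len(w)):
        raise ValueError("a, b and w must have the same length")
    dot = sum(wi * ai * bi for ai, bi, wi in zip(a, b, w))
    norm_a = math.sqrt(sum(wi * ai * ai for ai, wi in zip(a, w)))
    norm_b = math.sqrt(sum(wi * bi * bi for bi, wi in zip(b, w)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical label-distribution vectors for two enterprises.
firm_a = [1.0, 1.0, 0.0]
firm_b = [1.0, 0.0, 0.0]

# Equal weights reduce to the ordinary cosine similarity.
print(weighted_cosine(firm_a, firm_b, [1.0, 1.0, 1.0]))
# Zero weight on the second label makes the two firms look identical.
print(weighted_cosine(firm_a, firm_b, [1.0, 0.0, 1.0]))
```

The second call shows the effect of the weights: dimensions with zero weight are ignored entirely, so the choice of weights decides which labels drive the match.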
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60473002, 60603094; the Beijing Natural Science Foundation of China under Grant No. 4051004.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 69873011, 69935010, 60103014; the National High Technology Development 863 Program of China under Grant No. 863-306-ZD02-02-4; th