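Abstract
This paper designs a quality analysis framework for user-generated text. First, a set of topic-features is built for each goods category through topic analysis. Second, the strong association between topic-features and goods categories is used to construct the formal context for formal concept analysis; the category-topic concept lattice is then reduced to produce a topic-feature lattice, from which five quality features are derived and a quality evaluation model is built. Finally, experiments on real review data show that the new method achieves higher prediction accuracy.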
In this paper, we design a topic-features lattices analysis (TFLA) framework based on objective quality dimensions. First, we apply latent Dirichlet allocation (LDA) to extract latent topics as the topic-features of each goods category. Second, we construct a formal context from the strong relationship between goods categories and topic-features, and use formal concept analysis (FCA) to obtain the generalization and instantiation relationships among the topic-features. We then combine domain knowledge with these relationships to define five objective quality features, and use machine learning methods to build quality evaluation models on them. Experimental results on real comment data sets show that the predictions of the new quality models agree with the manually assigned quality labels in most cases. In the best case, the model achieves a mean absolute error (MAE) of 0.7 and an F-measure of 0.5, which is significantly better than the conventional quality prediction model based on support vector machine (SVM) classification.
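The FCA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LDA step is omitted, the categories and topic-features in `CONTEXT` are hypothetical, and the helper names are invented for illustration. It builds a small formal context (goods categories as objects, topic-features as attributes) and enumerates its formal concepts by closing every subset of objects.

```python
from itertools import combinations

# Hypothetical formal context: goods category -> set of topic-features.
# In the paper's pipeline these topic-features would come from LDA.
CONTEXT = {
    "phone": {"battery", "screen", "price"},
    "laptop": {"battery", "screen", "keyboard", "price"},
    "book": {"price", "plot"},
}


def common_attributes(objects, context):
    """Derivation operator: attributes shared by every object in `objects`."""
    shared = set().union(*context.values())  # start from all attributes
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def common_objects(attributes, context):
    """Derivation operator: objects that have every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing each subset of objects.

    Brute force, exponential in the number of objects; fine for a toy context.
    """
    concepts = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((extent, intent))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[1])):
        print(sorted(extent), sorted(intent))
```

A concept whose intent is a subset of another concept's intent is the more general of the two, so ordering concepts by intent inclusion gives the kind of generalization and instantiation relationship among topic-features that the abstract refers to.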
Authors
钟将
张淑芳
郭卫丽
李雪
ZHONG Jiang; ZHANG Shu-fang; GUO Wei-li; LI Xue (Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing University, Chongqing 400030, China; College of Computer Science, Chongqing University, Chongqing 400030, China; Chongqing College of Electronic Engineering, Chongqing 401331, China; School of Information Technology and Electrical Engineering, University of Queensland, Brisbane 4072, Australia)
Source
《电子学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2018, No. 9, pp. 2201-2206 (6 pages)
Acta Electronica Sinica
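Funding
National High Technology Research and Development Program of China (863 Program) (No. 2015AA015308)
National Key Research and Development Program of China (No. 2017YFB1402401)
Chongqing Science and Technology Innovation Special Project for Social Undertakings and Livelihood Security (No. cstc2017shmsA20013)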
Keywords
user comments
quality evaluation
data quality
topic features
topic-feature lattice
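Author biographies
ZHONG Jiang, male, born in 1974 in Jiangjin, Chongqing. Ph.D., professor. His main research interests are data mining and its applications, and network information security. E-mail: zhongjiang@cqu.edu.cn
ZHANG Shu-fang, female, born in 1972 in Chengcheng, Shaanxi. Ph.D. candidate, associate professor. Her main research interests are big data mining and simulation computing. E-mail: roseymcn2000@foxmail.com
GUO Wei-li, female, born in 1990 in Xingtang, Hebei. Master's degree. Her main research interests are data mining and high-performance computing. E-mail: 870188993@qq.com
LI Xue, male, born in 1955 in Shapingba, Chongqing. Ph.D., professor. His main research interests are data mining and big data. E-mail: xueli@itee.uq.edu.au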