Abstract

With the rapid development and spread of Internet technology, many online platforms now let users post product reviews. These reviews directly express customers' sentiment toward a product's features and performance, so text mining of product reviews is of considerable value. However, online review data are voluminous, mostly semi-structured or unstructured, and contain many useless comments. Two key research questions are therefore how to collect a review corpus quickly and which analysis method to apply. In this paper, a tobacco review corpus is first collected with a web crawler written in Python. The corpus is then preprocessed by converting between traditional and simplified Chinese characters, replacing typos, and removing useless comments. After the reviews are preliminarily divided into positive and negative sentiment, consumers' sentiment scores for tobacco are computed from a sentiment dictionary, a degree-adverb dictionary, and a negation-word dictionary. The results show that domestic sentiment scores for this product are relatively high, and that provinces along the Yangtze River score slightly higher than other regions.
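The abstract names the three dictionaries but not how they are combined into a score. The Python sketch below shows one common lexicon-based rule, offered as an assumption rather than the paper's exact formula. It expects each review to be already segmented into words (segmentation is out of scope here), and the tiny dictionaries `SENTIMENT`, `DEGREE`, and `NEGATION` and the function `score_tokens` are hypothetical placeholders, not the author's data or code.

```python
from typing import Dict, Sequence, Set

# Hypothetical toy lexicons for illustration only; the paper's actual
# sentiment, degree-adverb, and negation dictionaries are not given here.
SENTIMENT: Dict[str, float] = {"好": 1.0, "香": 1.0, "醇": 1.0, "差": -1.0, "呛": -1.0}
DEGREE: Dict[str, float] = {"有点": 0.5, "很": 1.5, "非常": 2.0}
NEGATION: Set[str] = {"不", "没", "没有"}


def score_tokens(tokens: Sequence[str]) -> float:
    """Score one review that has already been split into words.

    Assumed rule: degree adverbs and negation words appearing after the
    previous sentiment word modify the next sentiment word. Each degree
    adverb multiplies its weight; each negation word flips its sign.
    Words found in no dictionary are ignored.
    """
    total = 0.0
    weight = 1.0  # pending modifier for the next sentiment word
    for tok in tokens:
        if tok in NEGATION:
            weight = -weight
        elif tok in DEGREE:
            weight *= DEGREE[tok]
        elif tok in SENTIMENT:
            total += weight * SENTIMENT[tok]
            weight = 1.0  # modifiers do not carry past a sentiment word
    return total


if __name__ == "__main__":
    print(score_tokens(["很", "香"]))                            # 1.5
    print(score_tokens(["不", "好"]))                            # -1.0
    print(score_tokens(["不", "很", "好", "但", "有点", "呛"]))  # -2.0
```

Because the modifiers are simply multiplied, this rule scores 不很好 ("not very good") the same as 很不好 ("very bad"); some lexicon-based schemes weight these two word orders differently.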
    
    
Author
Chunguang Jia (贾春光), School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming, Yunnan

Source
Advances in Social Sciences (《社会科学前沿》), 2018, No. 12, pp. 1962-1973 (12 pages)

Keywords
Web Crawler
Text Mining
Sentiment Analysis
Tobacco