Funding: Supported by Project 71202155 of the National Science Funds for Distinguished Young Scientists of China.
Abstract: With its rapid development, the Internet has become the main arena for brand building. In this setting, the image of a brand is consistent with consumers' perception of it. Therefore, this study applies search-engine text mining to explore the categories of brand archetypes, based on Brand Personality Theory, from the perspective of the Internet. The results show that 12 brand archetypes (caregiver, sage, hero, innocent, dominator, creator, vitality, explorer, stylish woman, lover, cooperator, and vogue gentleman) have a high degree of explanatory power. A follow-up case study verifies the reasonableness and effectiveness of the classification standard.
Funding: Supported by the National Natural Science Foundation of China under Grants No. 60863011, No. 61175068, No. 61100205, and No. 60873001; the Fundamental Research Funds for the Central Universities under Grant No. 2009RC0212; the National Innovation Fund for Technology Based Firms under Grant No. 11C26215305905; and the Open Fund of Software Engineering Key Laboratory of Yunnan Province under Grant No. 2011SE14.
Abstract: For complex questions in Chinese question answering systems, we propose an answer extraction method that combines discourse structure features. The method uses the relevance between questions and answers to learn to rank the answers. First, it analyzes the question to generate a query string and submits that string to search engines to retrieve relevant documents. Second, it segments the retrieved documents and identifies the most relevant candidate answers; in addition, it uses the rhetorical relations of Rhetorical Structure Theory to determine the inherent relationships between paragraphs or sentences and to generate the candidate answer paragraphs or sentences. Third, we construct the answer ranking model: we extract five feature groups and adopt the Ranking Support Vector Machine (SVM) algorithm to train the ranking model. Finally, the method re-ranks the answers with the trained model and finds the optimal answers. Experiments show that the proposed method, combined with discourse structure features, effectively improves answer extraction accuracy and the quality of non-factoid answers. The Mean Reciprocal Rank (MRR) of the answer extraction reaches 69.53%.
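The MRR figure quoted above can be made concrete with a small sketch. This is the standard definition of Mean Reciprocal Rank, not code from the paper; the function name `mean_reciprocal_rank` and the boolean input layout are assumptions chosen for illustration.

```python
from typing import Sequence


def mean_reciprocal_rank(ranked_relevance: Sequence[Sequence[bool]]) -> float:
    """Standard MRR over a set of questions.

    ranked_relevance[i][j] is True when the j-th ranked answer returned
    for question i is correct (hypothetical input layout).
    """
    if not ranked_relevance:
        return 0.0
    total = 0.0
    for flags in ranked_relevance:
        # Only the first correct answer contributes: 1 / its rank.
        for rank, is_correct in enumerate(flags, start=1):
            if is_correct:
                total += 1.0 / rank
                break
        # A question with no correct answer contributes 0.
    return total / len(ranked_relevance)


if __name__ == "__main__":
    # Three questions: correct answer at rank 1, at rank 2, and missing.
    example = [[True, False], [False, True], [False, False]]
    print(mean_reciprocal_rank(example))
```

Under this definition, a value of 69.53% means that the average of the reciprocal ranks of the first correct answers across the test questions is about 0.695.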
Funding: Supported by the National Science Foundation of China (No. 61170145, 61373081); the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20113704110001); the Technology and Development Project of Shandong (No. 2013GGX10125); and the Taishan Scholar Project of Shandong, China.
Abstract: In the global information era, people acquire more and more information from the Internet, but the quality of search results is severely degraded by the presence of web spam. Web spam is one of the serious problems facing search engines, and many methods have been proposed for spam detection. We exploit the content features of non-spam pages in contrast to those of spam pages. The content features of non-spam pages exhibit many statistical regularities, whereas those of spam pages exhibit very few, because spam pages are generated randomly in order to increase their page rank. In this paper, we summarize the regularity distributions of the content features of non-spam pages, and propose probability formulae for the entropy and for independent n-grams, respectively. Furthermore, we put forward formulae for computing the correlation among multiple features. Among these, the notable content features may be used as auxiliary information for spam detection.
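The abstract does not reproduce the paper's formulae, so the following is only a minimal sketch of one entropy-style content feature. It assumes the standard Shannon entropy over word n-gram frequencies; the function name `ngram_entropy` and the whitespace tokenization are illustrative choices, not the authors' method.

```python
import math
from collections import Counter


def ngram_entropy(text: str, n: int = 1) -> float:
    """Shannon entropy (in bits) of the word n-gram distribution of a page.

    Assumption: tokens are lowercase whitespace-separated words, and the
    probability of each n-gram is its relative frequency on the page.
    """
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    repetitive = "cheap pills cheap pills cheap pills cheap pills"
    varied = "the quality of search results depends on many content features"
    # Heavily repeated wording yields a lower entropy than varied wording.
    print(ngram_entropy(repetitive), ngram_entropy(varied))
```

A feature of this kind could be computed per page and compared against the distribution observed on known non-spam pages; how the paper combines it with other features is not stated in the abstract.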