Abstract: Purpose: We present an analytical, open source and flexible natural language processing and text mining method for topic evolution, emerging topic detection and research trend forecasting for all kinds of data-tagged text. Design/methodology/approach: We make full use of the functions provided by the open source VOSviewer and Microsoft Office, including a thesaurus for data clean-up and a LOOKUP function for comparative analysis. Findings: Application and verification in the domain of perovskite solar cell research show that the method is effective. Research limitations: A certain amount of manual data processing and background knowledge of the specific research domain are required for better, more illustrative analysis results. Adequate time for analysis is also necessary. Practical implications: We aim to set up an easy, useful and flexible interdisciplinary text analysis procedure for researchers, especially those without solid computer programming skills or without easy access to complex software. This procedure can also serve as a useful example for teaching information literacy. Originality/value: This text analysis approach has not been reported before.
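The thesaurus clean-up and LOOKUP comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' spreadsheet procedure: the thesaurus entries, the keyword lists and the function names `normalize` and `emerging_terms` are hypothetical, and a dictionary membership test stands in for the spreadsheet LOOKUP function.

```python
from collections import Counter

# Hypothetical thesaurus: variant label -> preferred label
# (plays the role of a thesaurus used for data clean-up).
THESAURUS = {
    "perovskite solar cells": "perovskite solar cell",
    "pscs": "perovskite solar cell",
    "power conversion efficiencies": "power conversion efficiency",
}

def normalize(keywords):
    """Lower-case each keyword and replace variants by their preferred label."""
    cleaned = []
    for kw in keywords:
        key = kw.strip().lower()
        cleaned.append(THESAURUS.get(key, key))
    return cleaned

def emerging_terms(earlier, later):
    """Return counts of terms in `later` that have no match in `earlier`.

    The membership test stands in for a spreadsheet LOOKUP that finds
    no entry for a term absent from the earlier period.
    """
    earlier_counts = Counter(normalize(earlier))
    later_counts = Counter(normalize(later))
    return {t: n for t, n in later_counts.items() if t not in earlier_counts}

# Made-up keyword lists for two periods, for illustration only.
period_1 = ["Perovskite solar cells", "hysteresis"]
period_2 = ["PSCs", "hysteresis", "tandem cell", "tandem cell"]
print(emerging_terms(period_1, period_2))
```

Normalizing before comparing matters here: without the thesaurus step, "PSCs" would be reported as a new term even though it is only a spelling variant of a term already present in the earlier period.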
Abstract: Purpose: This paper introduces an analysis framework for tracking the evolution of research topics at the selected topics level, covering a research topic’s evolution trend, evolution path and content changes over time. Design/methodology/approach: After the topics were recovered by the author-topic model, we first built a keyword-topic co-occurrence network to track the dynamics of topic trends. Then a single-mode network was constructed, with each node representing a topic and each edge indicating the relationship between two topics. It was used to illustrate the evolution path and content changes of research topics. A case study was conducted on digital library research in China to verify the effectiveness of the analysis framework. Findings: The experimental results show that this analysis framework can be used to track the evolution of research topics at a micro level, and that social network analysis can help in understanding research topics’ evolution paths and content changes over time. Research limitations: The analysis framework will produce limited results when applied to unstructured data such as social media data. In addition, the effectiveness of the framework needs to be verified with more research topics in information science and in more scientific fields. Practical implications: This analysis framework can help scholars and researchers map the evolution of research topics and gain insights into how a field’s topics have evolved over time. Originality/value: The analysis framework used in this study can help reveal more micro-level evolution details. The index defined in this paper to measure topic association strength reflects both similarity and dissimilarity between topics, which helps in better understanding research topics’ evolution paths and content changes.
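The two networks described above can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not define its topic association index, so plain Jaccard overlap of keyword sets is used as a stand-in edge weight, and the topic labels, keywords and function names are hypothetical.

```python
from itertools import combinations

def keyword_topic_edges(topic_keywords):
    """Two-mode edges (keyword, topic) of a keyword-topic network."""
    return sorted((kw, topic)
                  for topic, kws in topic_keywords.items()
                  for kw in kws)

def topic_network(topic_keywords):
    """Single-mode network: topics as nodes, keyword overlap as edge weight.

    Jaccard overlap is an assumed stand-in for the paper's own index.
    """
    edges = {}
    for a, b in combinations(sorted(topic_keywords), 2):
        ka, kb = set(topic_keywords[a]), set(topic_keywords[b])
        union = ka | kb
        weight = len(ka & kb) / len(union) if union else 0.0
        if weight > 0:
            edges[(a, b)] = weight
    return edges

# Made-up topics and keywords, for illustration only.
topics = {
    "T1": ["digital library", "metadata", "interoperability"],
    "T2": ["digital library", "metadata", "user study"],
    "T3": ["copyright", "open access"],
}
print(keyword_topic_edges(topics))
print(topic_network(topics))
```

Building one such network per time window and comparing edge weights between adjacent windows is one way to trace how a topic splits, merges or drifts.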
Funding: Acknowledgments. This work is supported by grants from the National 973 project (No. 2013CB29606), the Natural Science Foundation of China (No. 61202244), and the research fund of ShangQiu Normal College (No. 2013GGJS013). The NIPS corpus is supported by SourceForge. We thank the anonymous reviewers for their helpful comments.
Abstract: The problem of "rich topics get richer" (RTGR) is common in topic models and leads to a wrong topic distribution if the allocation process is not corrected. In the standard LDA (Latent Dirichlet Allocation) model, every word in every document carries the same statistical weight. In fact, words have different impacts on different topics. Guided by this idea, we extend ILDA (Infinite LDA) by taking into account the biasing role of words in dividing topics, and we propose a self-adaptive topic model aimed specifically at the RTGR problem. The proposed model addresses three issues: (1) the number of topics changes with the document collection, which suits dynamic data; (2) words have discriminating attributes with respect to the topic distribution; (3) a self-adaptive method is used to perform automatic re-sampling. To verify the model, we designed a topic evolution analysis system that provides the following functions: topic classification in each cycle, topic correlation between adjacent cycles, and strength calculation of the subtopics in order. Experiments on both the NIPS corpus and our self-built news collections showed that the system met these requirements and that the results were feasible.
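The "rich topics get richer" effect can be illustrated with a Chinese restaurant process, the kind of prior commonly used in nonparametric (infinite) mixture models. The sketch below is an assumption-laden illustration, not the authors' self-adaptive model: the function name `crp_topic_sizes`, the concentration parameter `alpha` and the seed are hypothetical choices, and no word-level bias or re-sampling is included.

```python
import random

def crp_topic_sizes(n_words, alpha, seed=0):
    """Assign n_words tokens to topics under a Chinese restaurant process.

    A token joins an existing topic with probability proportional to that
    topic's current size, or opens a new topic with probability
    proportional to alpha (alpha must be positive).
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_words):
        # Existing topic sizes, plus one extra slot for "open a new topic".
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return sizes

sizes = crp_topic_sizes(n_words=1000, alpha=1.0)
print(len(sizes), sorted(sizes, reverse=True)[:5])
```

Because the probability of joining a topic grows with its size, early topics tend to absorb most later tokens. This is the imbalance that the abstract's model sets out to counteract.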
Funding: Supported by the Chinese Academy of Sciences literature information capability construction project of 2020, “Construction of strategic information research and consultation system in science and technology field” (Grant No. E290001).
Abstract: Purpose: This article aims to describe the global research profile and the development trends of single cell research from the perspective of bibliometric analysis and semantic mining. Design/methodology/approach: The literature on single cell research published between 2009 and 2019 was extracted from Clarivate Analytics’ Web of Science Core Collection. Firstly, bibliometric analyses were performed with Thomson Data Analyzer (TDA). Secondly, topic identification and evolution trend analysis of single cell research were conducted with the LDA topic model. Thirdly, following the post-discretized method used for topic evolution analysis, the topics were also distributed across countries to detect their spatial distribution. Findings: Publications on single cell research show a significantly increasing tendency over the last decade. The topics of the single cell research field can be divided into three categories: single cell research methods, mechanisms of biological processes, and clinical applications of single cell technologies. The different trends of these categories indicate that technological innovation drives the development of applied research. The continuous and rapid growth of topic strength in the field of cancer diagnosis and treatment indicates that this research topic has received extensive attention in recent years. The topic distributions of some countries are relatively balanced, while in other countries several topics show significant superiority. Research limitations: The data analyzed in this study include only publications indexed in the Web of Science Core Collection. Practical implications: This study provides insights into research progress in the single cell field and identifies the topics of greatest concern, which reflect potential opportunities and challenges. The national topic distribution analysis based on the post-discretized analysis method extends topic analysis from the time dimension to the space dimension. Originality/value: This paper combines bibliometric analysis and the LDA model to analyze the evolution trends of the single cell research field. The method of extending post-discretized analysis from the time dimension to the space dimension is distinctive and insightful.
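The post-discretized analysis mentioned above can be sketched as follows, assuming it works as the term is commonly understood: one topic model is fitted on the whole corpus, and the documents are only afterwards grouped by year (or, as in this abstract, by country) to average their topic proportions. The function name `topic_strength` and all numbers are hypothetical; the document-topic proportions would normally come from a fitted LDA model.

```python
from collections import defaultdict

def topic_strength(doc_topics, labels):
    """Average document-topic proportions within each label group.

    doc_topics: one list of topic proportions per document
    labels:     one label per document, e.g. publication year or country
    """
    if len(doc_topics) != len(labels):
        raise ValueError("doc_topics and labels must have the same length")
    sums = {}
    counts = defaultdict(int)
    for theta, label in zip(doc_topics, labels):
        if label not in sums:
            sums[label] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[label][k] += p
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

# Made-up proportions for four documents and two topics.
doc_topics = [[0.75, 0.25], [0.25, 0.75], [0.5, 0.5], [0.0, 1.0]]
years = [2009, 2009, 2019, 2019]
countries = ["A", "B", "A", "B"]
print(topic_strength(doc_topics, years))      # time dimension
print(topic_strength(doc_topics, countries))  # space dimension
```

The same function serves both dimensions because only the grouping label changes, which is the sense in which the analysis is extended from time to space.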
Abstract: Purpose: This study aims to reveal the landscape and trends of graphene research in the world by using data from Chemical Abstracts Service (CAS). Design/methodology/approach: Index data from CAS were retrieved on 78,756 papers and 23,057 patents on graphene from 1985 to March 2016, and scientometric methods were used to analyze the growth and distribution of R&D output, topic distribution and evolution, and the distribution and evolution of substance properties and roles. Findings: In recent years R&D in graphene has kept growing rapidly. China, South Korea and the United States are the largest producers of research, but China is relatively weak in patent applications in other countries. Research topics in graphene are continuously expanding from mechanical, material, and electrical properties to a diverse range of application areas such as batteries, capacitors, semiconductors, and sensor devices. The roles of emerging substances are increasing in Preparation and Biological Study. More techniques have been included to improve the preparation processes and applications of graphene in various fields. Research limitations: Only data from CAS are used, and some R&D activities reported solely through other channels may be missed. More detailed analyses are also needed to reveal the impact of research on development or vice versa, the development dynamics among the players, and the impact of emerging terms or substance roles on research and technology development. Practical implications: This study will provide a valuable reference for scientists and developers, R&D managers, R&D policy makers, and industrial and business investors seeking to understand the landscape and trends of graphene research. Its methodologies can be applied to other fields or to data from other similar sources. Originality/value: The integrative use of CAS indexing data on papers and patents and the systematic exploration of the distribution trends in output, topics, and substance roles are distinctive and insightful.