Abstract: Two lines of research on eye movements in reading are summarized. The first examines how adult readers identify compound words during reading. The second examines how a specific reading goal influences the way long expository texts are read. Both lines of research were conducted using Finnish as the source language. With respect to the first research question, it is demonstrated that compound words are recognized either holistically or via their components, depending on the length of the compound word. Readers begin to process whatever information is readily available in foveal vision (i.e., either the whole-word form or the initial component). The second line of research demonstrates that (1) a specific reading goal can exert an early effect on readers’ eye fixation patterns, (2) time-course analyses based on eye movement patterns can reveal interesting individual differences, and (3) working memory capacity is linked to how efficiently readers strategically allocate attention, encode information into long-term memory, and retrieve it from long-term memory. It is concluded that eye tracking is an excellent research tool for tapping into the workings of the human mind during the comprehension of written texts.
Funding: Project (KC18071) supported by the Application Foundation Research Program of Xuzhou, China; Projects (2017YFC0804401, 2017YFC0804409) supported by the National Key R&D Program of China.
Abstract: The sharp increase in the amount of Chinese text data on the Internet has significantly prolonged the time needed to classify these data. To solve this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel in-memory computing platform for big data. The algorithm parallelizes the entire training and prediction process of the naive Bayes classifier, mainly by adopting the resilient distributed dataset (RDD) programming model. For comparison, a PNBA based on Hadoop is also implemented. The test results show that, in the same computing environment and on the same text sets, the Spark PNBA is clearly superior to the Hadoop PNBA on key indicators such as speedup ratio and scalability. Therefore, Spark-based parallel algorithms can better meet the requirements of large-scale Chinese text data mining.
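The following is a minimal sketch of why naive Bayes training parallelizes well, not the paper's implementation. It emulates a map step and a reduce-by-key step in plain Python rather than calling Spark; the function names, the multinomial model, and the Laplace smoothing are all assumptions made for illustration, and documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter, defaultdict


def map_doc(label, tokens):
    """Map step: emit one ((label, word), 1) pair per token of a document."""
    return [((label, word), 1) for word in tokens]


def reduce_by_key(pairs):
    """Reduce step: sum the values of each key (plain-Python stand-in)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def train(docs):
    """Train on a list of (label, tokens) pairs and return the count tables."""
    word_counts = reduce_by_key(
        pair for label, tokens in docs for pair in map_doc(label, tokens)
    )
    words_per_class = defaultdict(int)
    for (label, _word), count in word_counts.items():
        words_per_class[label] += count
    return {
        "word_counts": word_counts,
        "words_per_class": dict(words_per_class),
        "class_counts": Counter(label for label, _tokens in docs),
        "vocab_size": len({word for (_label, word) in word_counts}),
        "n_docs": len(docs),
    }


def predict(model, tokens):
    """Return the label with the highest smoothed log-posterior (assumed Laplace)."""
    best_label, best_score = None, -math.inf
    for label, n_label in model["class_counts"].items():
        score = math.log(n_label / model["n_docs"])
        total = model["words_per_class"].get(label, 0)
        for word in tokens:
            count = model["word_counts"].get((label, word), 0)
            score += math.log((count + 1) / (total + model["vocab_size"]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because each document's pairs are produced independently and the per-key sums are associative, both steps can be distributed across workers, which is the property a platform such as Spark or Hadoop exploits.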
Funding: Projects (61573380, 61303185) supported by the National Natural Science Foundation of China; Project (13BTQ052) supported by the National Social Science Foundation of China; Project (2016M592450) supported by the China Postdoctoral Science Foundation; Project (2016JJ4119) supported by the Hunan Provincial Natural Science Foundation of China.
Abstract: With the rise and spread of micro-blogs, sentiment classification of short texts has become a research hotspot, and a number of methods have been developed over the past decade. However, because Chinese and English differ in syntax, semantics and pragmatics, sentiment classification methods that are effective for English Twitter may fail on Chinese micro-blogs. In addition, the colloquialism and conciseness of short Chinese texts introduce additional challenges for sentiment classification. In this work, a novel two-stage hybrid learning model is proposed for sentiment classification of Chinese micro-blogs. In the first stage, emotional scores are calculated over the whole dataset using an improved Chinese-oriented sentiment dictionary classification method, and data with extremely high or low scores are labeled directly. In the second stage, the remaining data are labeled using an integrated classification method based on the sentiment dictionary, a support vector machine (SVM) and k-nearest neighbor (KNN). An improved feature selection method is adopted to enhance the discriminative power of the selected features. The two-stage hybrid framework makes the proposed method effective for sentiment classification of Chinese micro-blogs. Experiments on the COAE2014 (Chinese Opinion Analysis Evaluation 2014) dataset show that the proposed method outperforms other schemes.
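The two-stage flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: the toy lexicon, the thresholds `high` and `low`, and the majority vote used to combine the stage-two classifiers are all hypothetical, since the abstract does not say how the dictionary, SVM and KNN outputs are integrated. The SVM and KNN are represented by arbitrary callables so that the sketch stays self-contained.

```python
from collections import Counter

# Hypothetical toy lexicon; the paper uses an improved Chinese-oriented dictionary.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "awful": -2.0, "hate": -2.0,
}


def dictionary_score(tokens, lexicon=LEXICON):
    """Sum the lexicon weights of the tokens; unknown tokens contribute 0."""
    return sum(lexicon.get(token, 0.0) for token in tokens)


def dictionary_vote(tokens):
    """Dictionary classifier used as one of the stage-two voters."""
    return "pos" if dictionary_score(tokens) >= 0 else "neg"


def two_stage_label(tokens, voters, high=3.0, low=-3.0):
    """Return (label, stage) for one pre-segmented text.

    Stage 1 labels texts whose dictionary score is extreme.
    Stage 2 takes a majority vote over `voters` (assumed combination rule);
    pass an odd number of voters so that a two-class vote cannot tie.
    """
    score = dictionary_score(tokens)
    if score >= high:
        return "pos", 1
    if score <= low:
        return "neg", 1
    votes = Counter(voter(tokens) for voter in voters)
    return votes.most_common(1)[0][0], 2
```

In practice the second and third voters would wrap trained SVM and KNN models; here any function that maps a token list to `"pos"` or `"neg"` can be passed in.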
Abstract: To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss and therefore has a considerable negative effect on the overall performance of TC algorithms. On the basis of the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the dimensionality of the features without any information loss, which can greatly improve both the efficiency and the performance of the algorithm. The experiments show that the new algorithm simultaneously achieves much higher performance and efficiency than some other classical TC algorithms.
Abstract: In this article I discuss data from a series of experiments in which readers’ eye movements were recorded as they processed sentences in which each word disappeared or was masked 60 ms after fixation onset. We used this paradigm to investigate whether we could induce a gap effect during reading, and how visual and linguistic factors affected eye movements under these conditions. The data showed that no gap effect occurred in our experiment. Overall reading times were the same under normal and disappearing presentation conditions. However, readers did adopt a strategy of making fewer but longer fixations when the text disappeared than when it did not. Additionally, clear frequency effects occurred regardless of whether the text was presented normally or disappeared. This finding indicates that while the visual uptake of information is important, cognitive processes associated with the lexical identification of words are a primary influence on when readers move their eyes during reading. The findings are taken to support the E-Z Reader model of eye movement control.
Abstract: Cohesion and coherence are two of the most important components in discourse analysis. This thesis investigates some cohesive devices and means of achieving coherence. At the same time, it gives an account of how a text is identified as a text. It also discusses the relationship between cohesion and coherence.
Abstract: To avoid the risk of saltwater intrusion caused by pumping large amounts of groundwater, this study used a small-flux, group-drilling pumping-pouring test, simulated the drawdown and recovery of the water depth with the Feflow software, and obtained a group of best-fit hydro-geological parameters. Compared with that of the method for calculating parameter with
Abstract: With the advancement of content-based retrieval technology, the semantics of the text information contained in images has attracted the attention of many researchers. An algorithm that automatically locates the textual regions in an input image facilitates the retrieval task, because an optical character recognizer can then be applied only to those regions of the image that contain text. In this paper a new wavelet-based text location method is described, which can be used to locate textual regions in complex images and video frames. Experimental results show that the textual regions in an image can be located effectively and quickly.
Abstract: No other text written in the mid-nineteenth century has held up to this day as well as the Communist Manifesto of 1848. Even today, entire paragraphs of the text correspond to contemporary reality even better than they did in 1848. Starting from premises that were hardly visible in their era, Marx and Engels drew conclusions that the unfolding of 170 years of history has fully confirmed. In this article I will give further enlightening examples.
Abstract: In mechanical part design, process planning and assembly design, we often have to calculate and analyse a dimension chain. Traditionally, a dimension chain is established and calculated manually. With the wide application of computers in mechanical design and manufacture, people began to use a computer to acquire and calculate a dimension chain automatically. In reported work, a dimension chain can be established and calculated automatically. However, the dimension text values of the dimensions composing the chain, together with the upper and lower values of their tolerances, are entered into the computer manually, which is inefficient and error-prone. To overcome these difficulties, it is very important to acquire the noted dimensions automatically, then analyse and calculate the dimension chain, and finally show the results. At present, the AutoCAD software of Autodesk is widely used in mechanical design. For automatically acquiring noted dimensions and analysing and calculating a dimension chain in an AutoCAD design drawing, this paper introduces a workable scheme for automatic dimension acquisition and dimension chain calculation in AutoCAD using ObjectARX, a development tool for AutoCAD. In this paper a dimension chain is expressed by three matrices: the dimension text value matrix, the tolerance upper value matrix and the tolerance lower value matrix. The developed program can be used to calculate both an assembly dimension chain and a process dimension chain. When the program is running in AutoCAD, the noted dimensions composing a dimension chain are selected in turn with the mouse; the computer then calculates the dimension chain and the results are shown in a dialog box. A running example is given in this paper.
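As a rough illustration of the three-matrix representation, the sketch below computes the closing link of a linear dimension chain with the worst-case (extreme value) method. The abstract does not state which calculation method the program uses, so the method is an assumption, as are the function name `closing_link` and the `sign` vector marking increasing (+1) and decreasing (-1) links. Plain Python lists stand in for the matrices, and nothing here uses ObjectARX or AutoCAD.

```python
def closing_link(nominal, upper, lower, sign):
    """Worst-case closing link of a linear dimension chain.

    nominal: dimension text values of the component links
    upper:   upper deviations of the component tolerances
    lower:   lower deviations of the component tolerances
    sign:    +1 for an increasing link, -1 for a decreasing link (assumed input)

    Returns (nominal, upper deviation, lower deviation) of the closing link.
    """
    if not (len(nominal) == len(upper) == len(lower) == len(sign)):
        raise ValueError("all four sequences must have the same length")

    # Nominal size: increasing links add, decreasing links subtract.
    closing_nominal = sum(s * n for s, n in zip(sign, nominal))

    # Largest closing link: increasing links at their upper deviation,
    # decreasing links at their lower deviation.
    closing_upper = sum(u if s > 0 else -l for s, u, l in zip(sign, upper, lower))

    # Smallest closing link: increasing links at their lower deviation,
    # decreasing links at their upper deviation.
    closing_lower = sum(l if s > 0 else -u for s, u, l in zip(sign, upper, lower))

    return closing_nominal, closing_upper, closing_lower
```

For a hypothetical chain with one increasing link of 50 (+0.1/0) and two decreasing links of 20 (0/-0.05) and 29.5 (+0.02/-0.02), the call would be `closing_link([50.0, 20.0, 29.5], [0.1, 0.0, 0.02], [0.0, -0.05, -0.02], [1, -1, -1])`.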