Abstract
This paper analyzes Sogou search engine query logs from March 2007. Part-of-speech (POS) tagging is applied to the queries to obtain the distribution characteristics of high-frequency POS tagging results. Users rely mainly on nouns in their queries, with verbs playing a secondary role; other parts of speech rarely appear among the high-frequency POS tagging results. Function words, typified by "的", also appear relatively seldom in these results. Web search queries differ syntactically from natural language to some extent, yet the two also share common features. Users mainly use nouns for concept-oriented retrieval, and keywords remain the primary means of searching. The high-frequency POS tagging results partially follow Zipf's law.
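The kind of analysis described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the toy queries, the tag set (n = noun, v = verb), and the function names `pos_pattern`, `pattern_frequencies`, and `zipf_slope` are all assumptions, and the queries are hand-tagged rather than produced by any particular tagger. It counts how often each POS pattern occurs and estimates the slope of log(frequency) against log(rank); a slope near -1 would be consistent with Zipf's law.

```python
from collections import Counter
import math

# Hypothetical, hand-tagged toy queries (not data from the paper).
# Each query is a list of (word, POS tag) pairs; n = noun, v = verb.
TAGGED_QUERIES = [
    [("music", "n")],
    [("weather", "n")],
    [("map", "n")],
    [("movie", "n"), ("download", "v")],
    [("game", "n"), ("download", "v")],
    [("train", "n"), ("timetable", "n")],
    [("download", "v"), ("software", "n")],
]


def pos_pattern(query):
    """Reduce a tagged query to its POS pattern, e.g. 'n+v'."""
    return "+".join(tag for _, tag in query)


def pattern_frequencies(queries):
    """Return (pattern, count) pairs sorted by descending count."""
    return Counter(pos_pattern(q) for q in queries).most_common()


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) versus log(rank).

    `freqs` must be sorted in descending order and contain at least
    two positive values. An exact Zipf distribution gives -1.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    ranked = pattern_frequencies(TAGGED_QUERIES)
    for pattern, count in ranked:
        print(pattern, count)
    print("log-log slope:", zipf_slope([count for _, count in ranked]))
```

With real logs, the tagged queries would come from a word segmenter and POS tagger, and the slope would be worth estimating only over a far larger set of patterns than this toy sample.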
Source
《现代图书情报技术》
CSSCI
PKU Core (Peking University core journal list)
2009, No. 4, pp. 50-56 (7 pages)
New Technology of Library and Information Service
Keywords
Log mining
Part-of-speech tagging
Language behavior
POS distribution
Query syntax
About the author
E-mail: pqu@pku.edu.cn