The effects of indexing strategy-query term combination on retrieval effectiveness in a Swedish full text database

Abstract

This thesis deals with Swedish full text retrieval and the problem of morphological variation of query terms in the document database. The study is an information retrieval experiment with a test collection. Because no Swedish test collection was available, one was constructed. It consists of a document database containing 161,336 news articles, and 52 topics with four-graded (0, 1, 2, 3) relevance assessments.

The effects of indexing strategy-query term combination on retrieval effectiveness were studied. Three of the five tested combinations involved indexing strategies that used conflation, in the form of normalization. Two of these three combinations also used indexing strategies that employed compound splitting. Normalization and compound splitting were performed by SWETWOL, a morphological analyzer for the Swedish language. A fourth combination attempted to group related terms by right hand truncation of query terms; a search expert performed the truncation. The four combinations were compared to each other and to a baseline combination, in which no attempt was made to counteract the problem of morphological variation of query terms in the document database.
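As an illustration of right hand truncation, the following minimal sketch expands a truncated query term against an index vocabulary by prefix matching. The function name `expand_truncated`, the `*` truncation marker, and the small Swedish vocabulary are assumptions made for this example; they are not taken from the thesis.

```python
def expand_truncated(term: str, vocabulary: set[str]) -> set[str]:
    """Return the index terms matched by a query term.

    A trailing '*' marks right hand truncation (hypothetical syntax):
    every index term that starts with the stem is matched.
    A term without '*' must match an index term exactly.
    """
    if term.endswith("*"):
        stem = term[:-1]
        return {word for word in vocabulary if word.startswith(stem)}
    return {term} & vocabulary


# Hypothetical index vocabulary of Swedish word forms (illustrative only).
vocabulary = {"bil", "bilar", "bilen", "bilarna", "bild", "buss"}

# Baseline-style exact match: only the literal query form is found.
exact = expand_truncated("bil", vocabulary)

# Truncated match: inflected forms are found, but so is the unrelated "bild".
truncated = expand_truncated("bil*", vocabulary)
```

The unrelated match on `bild` shows why the choice of truncation point matters, which is consistent with the thesis leaving truncation to a search expert.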

Two situations were examined in the evaluation: the binary relevance situation and the multiple degree relevance situation. In the binary relevance situation, where the three positive relevance degrees (1, 2, 3) were merged into one and precision was used as the evaluation measure, the four alternative combinations outperformed the baseline. The best performing combination was the one that used truncation. This combination performed better than or equal to a median precision value for 41 of the 52 topics. One reason for the relatively good performance of the truncation combination was the capacity of its queries to retrieve different parts of speech.
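The binary relevance situation can be sketched as follows: graded assessments are collapsed to relevant or non-relevant, and precision is computed over a ranked result list. The abstract does not state the cutoff or how precision was aggregated, so the cutoff `k`, the function names, and the treatment of unjudged documents as non-relevant are assumptions of this sketch.

```python
def to_binary(grade: int) -> int:
    """Merge the positive relevance degrees (1, 2, 3) into one 'relevant' class."""
    return 1 if grade > 0 else 0


def precision_at_k(ranked_doc_ids, grades, k):
    """Fraction of the top-k retrieved documents that are relevant.

    Documents without an assessment are counted as non-relevant
    (an assumption of this sketch).
    """
    top = ranked_doc_ids[:k]
    if not top:
        return 0.0
    relevant = sum(to_binary(grades.get(doc_id, 0)) for doc_id in top)
    return relevant / len(top)


# Hypothetical assessments and ranking for one topic.
grades = {"d1": 3, "d2": 0, "d3": 1, "d4": 2}
ranking = ["d1", "d2", "d3", "d5", "d4"]

p2 = precision_at_k(ranking, grades, 2)  # d1 relevant, d2 not
p5 = precision_at_k(ranking, grades, 5)  # d1, d3, d4 relevant; d2, d5 not
```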

In the multiple degree relevance situation, where the three positive relevance degrees were retained, retrieval effectiveness was taken to be the accumulated gain the user receives by examining the retrieval result up to given positions. The evaluation measure used was nDCG (normalized cumulated gain with discount). This measure credits retrieval methods that (1) rank highly relevant documents higher than less relevant ones, and (2) rank relevant documents (of any degree) high. With respect to (2), nDCG involves a discount component: the relevance score of a relevant document (of any degree) is discounted, and the discount grows the further down the ranked list of retrieved documents the document appears.
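The abstract gives no formula for nDCG. The sketch below uses one common formulation of discounted cumulated gain, in which the gain at a rank is divided by the logarithm of that rank and the result is normalized by the same quantity computed for an ideal ordering. The logarithm base, the use of the relevance degrees themselves as gain values, and the function names are assumptions, not details from the thesis.

```python
import math


def dcg(gains, base=2.0):
    """Discounted cumulated gain at every rank of a list of gain values.

    Ranks below `base` are not discounted; from rank `base` onward the
    gain is divided by log_base(rank), so the discount grows down the list.
    """
    total, cumulated = 0.0, []
    for rank, gain in enumerate(gains, start=1):
        discount = math.log(rank, base) if rank >= base else 1.0
        total += gain / discount
        cumulated.append(total)
    return cumulated


def ndcg(gains, all_relevant_gains, base=2.0):
    """Normalize the DCG vector by the DCG of the ideal ordering.

    `all_relevant_gains` holds the gain of every relevant document known
    for the topic; the ideal list is those gains sorted in descending
    order, padded with zeros to the length of the result list.
    """
    ideal = sorted(all_relevant_gains, reverse=True)[:len(gains)]
    ideal += [0] * (len(gains) - len(ideal))
    actual_dcg = dcg(gains, base)
    ideal_dcg = dcg(ideal, base)
    return [a / i if i > 0 else 0.0 for a, i in zip(actual_dcg, ideal_dcg)]


# Hypothetical result list: relevance degrees of the documents at ranks 1-4.
scores = ndcg([3, 0, 2, 1], all_relevant_gains=[3, 2, 1])

# The ideal ordering scores 1.0 at every rank.
ideal_scores = ndcg([3, 2, 1, 0], all_relevant_gains=[3, 2, 1])
```

In this formulation a smaller `base` gives a sharper discount, and replacing the raw degrees with other gain values changes how much weight highly relevant documents receive.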

In the multiple degree relevance situation, the five combinations were evaluated under four different user scenarios, where each scenario simulated a certain user type. Again, the four alternative combinations outperformed the baseline under each user scenario, and the truncation combination had the best performance under each user scenario. This outcome agreed with the performance result in the binary relevance situation. However, there were also differences between the two relevance situations. For 25 percent of the topics and with regard to one of the four user scenarios, the set of best performing combinations in the binary relevance situation was disjoint from the set of best performing combinations in the multiple degree relevance situation. The user scenario in question was such that almost all importance was placed on highly relevant documents, and the discount was sharp.
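The comparison of best performing sets can be sketched per topic: collect the combinations that tie for the top score under each relevance situation, then test whether the two sets share any member. The combination names and all scores below are hypothetical placeholders, not results from the thesis.

```python
def best_combinations(scores, tol=1e-9):
    """Return the set of combinations whose score ties for the maximum."""
    top = max(scores.values())
    return {name for name, score in scores.items() if top - score <= tol}


# Hypothetical scores for one topic under the two relevance situations.
binary_scores = {
    "baseline": 0.20,
    "normalization": 0.40,
    "normalization_split_a": 0.40,
    "normalization_split_b": 0.35,
    "truncation": 0.30,
}
graded_scores = {
    "baseline": 0.25,
    "normalization": 0.45,
    "normalization_split_a": 0.40,
    "normalization_split_b": 0.42,
    "truncation": 0.55,
}

best_binary = best_combinations(binary_scores)
best_graded = best_combinations(graded_scores)

# True when no combination is best in both situations for this topic.
disjoint = best_binary.isdisjoint(best_graded)
```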

The main conclusion of the thesis is that normalization and right hand truncation (performed by a search expert) enhanced retrieval effectiveness in comparison to the baseline, irrespective of which of the two relevance situations is considered. Further, the three indexing strategy-query term combinations based on normalization were almost as good as the combination that involved truncation. This holds for both relevance situations.

Bibliographic record

  • Author

    Ahlgren, Per

  • Year 2015
  • Total pages 163
  • Original format PDF
  • Website name Online academic archive database
  • Section name All files
