
MT-Based Query Translation CLIR Meets Frequent Case Generation

Abstract (2. Language): 
This paper presents evaluation results for Cross-Language Information Retrieval (CLIR) with three target languages, Finnish, German, and Swedish, using English as the source language. Our CLIR approach is based on machine translation of topics and on the Frequent Case Generation (FCG) method, which manages query term variation in the translated topics and supports retrieval from inflected indexes. Retrieval results for more standard approaches to query term variation, such as stemming and lemmatization of translated topics, are also shown. The results show that combining machine translation of queries with FCG can, at best, be very promising. The best Machine Translation (MT) programs seem to translate standard laboratory-type Information Retrieval (IR) topics quite well, at least in terms of query performance. In a few cases the translated queries perform as well as or slightly better than the monolingual baseline, and in many cases the differences from the monolingual baseline are small.
142-166
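
The FCG step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_FORMS` table, the function names, and the cut-off `k` are hypothetical, and the `#combine`/`#syn` query syntax is only assumed here as one plausible way to group the generated forms as synonyms. The idea shown is that each translated keyword is expanded into a small set of its most frequent inflected forms, so that the query can be run against an index of unnormalized, inflected word forms.

```python
from typing import Dict, List

# Hypothetical toy data: inflected forms of each keyword, assumed to be
# ordered from most to least frequent. A real system would take the case
# ranking from corpus statistics and produce the forms with a
# morphological generator.
TOY_FORMS: Dict[str, List[str]] = {
    "talo": ["talo", "talon", "taloa", "talossa", "talosta", "taloon"],
    "auto": ["auto", "auton", "autoa", "autossa", "autosta", "autoon"],
}


def frequent_forms(term: str, k: int) -> List[str]:
    """Return the first k listed forms of term; unknown terms pass through unchanged."""
    return TOY_FORMS.get(term, [term])[:k]


def build_query(terms: List[str], k: int = 3) -> str:
    """Group each keyword's forms as synonyms and combine the groups into one query."""
    groups = ["#syn(" + " ".join(frequent_forms(t, k)) + ")" for t in terms]
    return "#combine(" + " ".join(groups) + ")"


if __name__ == "__main__":
    # "xyz" stands in for a keyword the toy table does not cover.
    print(build_query(["talo", "auto", "xyz"], k=3))
```

Raising `k` trades a longer query for better coverage of the keyword's inflected occurrences, which is the balance a frequency-based cut-off is meant to manage.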

