Neural Ranking Models for Information Retrieval
Keywords:
Information retrieval, deep learning, semantic similarity
Objectives:
The objective of this thesis is to apply shallow or deep neural networks to information retrieval. The text retrieval task consists of ranking text documents in response to a query. This task comes with several challenges: (1) the query-document vocabulary mismatch problem and semantic understanding (a good IR system should consider the terms “hot” and “warm” related, as well as “dog” and “puppy”, but must also recognize that a user who submits the query “hot dog” is not looking for a “warm puppy”); (2) interpreting words based on context (given a query about the Prime Minister of the UK, context determines whether it refers to John Major or Theresa May); (3) finding relevant documents that may also contain large sections of irrelevant text; and so on. In this thesis the student will explore the use of shallow and deep neural networks to overcome these challenges. Most existing shallow neural methods for IR focus on inexact matching using term embeddings, either comparing the query with the document directly in the embedding space or using embeddings for query expansion. An alternative is to incorporate word embeddings directly within deep neural network models, developing an end-to-end neural network architecture for information retrieval.
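As an illustration of the first family of shallow methods, the sketch below ranks documents by the cosine similarity between averaged term embeddings of the query and of each document. The embedding values and vocabulary here are invented for illustration; a real system would plug in pre-trained vectors such as word2vec or GloVe. Note how this bag-of-embeddings representation exposes exactly the limitation described above: “hot dog” and “warm puppy” end up close in the embedding space.

```python
import numpy as np

# Toy term embeddings with invented values; a real system would load
# pre-trained vectors (e.g. word2vec or GloVe) instead.
EMBEDDINGS = {
    "hot":   np.array([0.9, 0.1, 0.0]),
    "warm":  np.array([0.8, 0.2, 0.0]),
    "dog":   np.array([0.0, 0.9, 0.3]),
    "puppy": np.array([0.1, 0.8, 0.4]),
    "car":   np.array([0.0, 0.1, 0.9]),
}

def text_vector(text):
    """Represent a text as the average of its term embeddings (OOV terms skipped)."""
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    """Cosine similarity, returning 0 for zero vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def rank(query, documents):
    """Rank documents by cosine similarity to the query in embedding space."""
    q = text_vector(query)
    return sorted(documents, key=lambda d: cosine(q, text_vector(d)), reverse=True)

print(rank("hot dog", ["warm puppy", "car"]))  # "warm puppy" ranks above "car"
```

Averaging term vectors discards word order entirely, which is why the end-to-end deep models discussed below learn richer query-document interaction patterns instead.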
Task:
- Learning vector representations of language from raw text that can bridge the gap between query and document vocabulary.
- Applying shallow neural IR methods that employ pre-trained neural term embeddings.
- Applying deep neural networks to learn the IR task end-to-end.
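One concrete way to employ pre-trained term embeddings in a shallow IR pipeline, in the spirit of the query-expansion references below, is to expand the query with its nearest neighbours in the embedding space before handing it to a standard term-matching engine. The following is a minimal sketch using a tiny invented vocabulary in place of real pre-trained vectors:

```python
import numpy as np

# Tiny invented vocabulary standing in for pre-trained embeddings.
VOCAB = {
    "prime":    np.array([0.70, 0.30, 0.10]),
    "minister": np.array([0.60, 0.40, 0.10]),
    "premier":  np.array([0.65, 0.35, 0.12]),
    "banana":   np.array([0.00, 0.10, 0.90]),
}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand_query(query_terms, k=1):
    """Append the k nearest vocabulary terms (by cosine) for each query term."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in VOCAB:
            continue
        neighbors = sorted(
            (t for t in VOCAB if t not in query_terms),
            key=lambda t: cosine(VOCAB[term], VOCAB[t]),
            reverse=True,
        )
        for n in neighbors[:k]:
            if n not in expanded:
                expanded.append(n)
    return expanded

print(expand_query(["prime", "minister"]))  # adds the related term "premier"
```

The expanded term list can then be issued to any conventional retrieval engine, so vocabulary mismatch is reduced without modifying the ranker itself.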
References:
F. Diaz, B. Mitra, N. Craswell. Query Expansion with Locally-Trained Word Embeddings. Proc. ACL. 2016.
J. Guo, Y. Fan, Q. Ai, W.B. Croft. A Deep Relevance Matching Model for Ad-hoc Retrieval. 25th ACM International Conference on Information and Knowledge Management. 2016.
C.D. Manning, P. Raghavan, H. Schütze. Introduction to Information Retrieval. Cambridge University Press. 2008.
B. Mitra, N. Craswell. Neural Models for Information Retrieval. https://arxiv.org/abs/1705.01509. 2017.
D. Roy, D. Paul, M. Mitra, U. Garain. Using Word Embeddings for Automatic Query Expansion. Neu-IR '16 SIGIR Workshop on Neural Information Retrieval. 2016.
J. Rygl, J. Pomikálek, R. Řehůřek, M. Růžička, V. Novotný, P. Sojka. Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines. Proceedings of the 2nd Workshop on Representation Learning for NLP, Association for Computational Linguistics. 2017.
C. Xiong, Z. Dai, J. Callan, Z. Liu, R. Power. End-to-End Neural Ad-hoc Ranking with Kernel Pooling. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2017.
Team:
Arantxa Otegi, Eneko Agirre
Profile:
Computer scientist
Contact:
arantza.otegi[abildua|at]ehu.eus
Date:
2017