CSE FACULTY SEMINARS


Title of Talk: A Brief History of Question Answering


Dr. Soumen Chakrabarti, Professor, IIT Bombay

Short Bio:
Soumen Chakrabarti is a Professor of Computer Science at IIT Bombay. He received his PhD from the University of California, Berkeley, and worked on Clever Web search and Focused Crawling at IBM Almaden Research Center. He has also worked at Carnegie Mellon University and Google. He works on linking unstructured text to knowledge bases and exploiting these links for better search and ranking. Other interests include link formation and influence propagation in social networks, and personalized proximity search in graphs. He has published extensively in WWW, SIGKDD, EMNLP, ACL, IJCAI, AAAI, SIGIR, VLDB, ICDE and other conferences. His work on keyword search in databases received the 10-year influential paper award at ICDE 2012. He is also the author of one of the earliest books on Web search and mining.
https://www.cse.iitb.ac.in/~soumen/main/bio.html

Date: May 07, 2020; Thursday
Time: 4:00 PM IST

Abstract:

Web search has come a long way from matching query words with document words. It is now mediated by knowledge graphs (KGs) such as Freebase, which have hundreds of millions of entities belonging to tens of thousands of types, connected by billions of relations. It is also essential to annotate token spans in the Web corpus with canonical types (e.g., ‘scientist’) and entities (e.g., ‘m.0jcx’, Freebase’s unique ID for Albert Einstein). Armed with suitable indexes and ranking functions, we can now search for “scientists who played the violin”, but only if the search engine can reliably infer that ‘scientists’ is the target type, ‘violin’ is a grounded entity, and ‘played’ is the connecting relation.

We will trace advances in QA systems since 2004, from embellishments to information retrieval (IR) systems, to machine-learned graphical models that infer diverse roles of query words and match them accordingly to the KG and corpus, to hybrid systems that combine deep and traditional learning components, to recent homogeneous BERT-based architectures. We will also trace the parallel development of benchmarks, from “expert search” and “entity search” in the IR community, to single-relation queries such as “where was Obama born”, to multi-clause queries such as “how did Obama’s father die” or “Hollywood actor whose spouse is a lawyer”. Another parallel line of research, into KG embeddings, greatly assisted the evolution of QA systems and will be reviewed as needed.