Event Details

MEET THE FACULTY: Doug Downey

4:00 p.m.
November 12, 2008
Ford ITW Auditorium – Room 1-350


Doug Downey
"Autonomous Web-scale Information Extraction"
Abstract: Search engines are extremely useful tools for answering questions. However, a significant number of questions users might pose -- for example, "which nanotechnology companies are hiring on the West Coast?" -- cannot be addressed using existing search engines, because the answers do not lie on a single page. To answer these kinds of queries, users must extract and synthesize information from multiple documents. Currently, this is a tedious and error-prone manual process. In this talk, I will describe my research aimed at automating the extraction of this information from the Web. I begin by presenting a model of the redundancy inherent in the Web, and show that the model can be used to identify correct extractions autonomously, without the manually labeled examples typically assumed in previous information extraction research. However, the model has limited efficacy for the "long tail" of infrequently mentioned facts; my second investigation shows how unsupervised language models can be leveraged in concert with redundancy to overcome this limitation. Lastly, I describe recent work that generalizes this extraction strategy to a broader machine learning setting, and I demonstrate experimentally that the framework is effective beyond information extraction.
Bio: Doug Downey joined Northwestern University in the fall of 2008. He obtained his PhD from the University of Washington, where he was advised by Oren Etzioni and supported by an NSF Fellowship and a Microsoft Research Graduate Fellowship. His research interests are in the areas of natural language processing, machine learning, and artificial intelligence. At UW, he was part of the KnowItAll project, which built a system that uses the Web to autonomously extract large knowledge bases. Doug's primary research results concern probabilistic models of the redundancy inherent in large corpora, along with associated techniques that allow systems like KnowItAll to extract data autonomously at high precision.
Northwestern University
Robert R. McCormick School of Engineering and Applied Science
Electrical Engineering and Computer Science Department