Tuesday, July 29, 2014, 12:00pm - 1:00pm
Jordan Boyd-Graber, Assistant Professor (starting Fall '14), CS Department, University of Colorado Boulder
"Big Data Analysis with Topic Models: Human Interaction, Streaming Computation, and Social Science Applications"
Abstract: A common information need is to understand large, unstructured datasets: millions of e-mails during e-discovery, a decade's worth of scientific correspondence, or a day’s tweets. In the last decade, topic models have become a common tool for navigating such datasets. This talk investigates the foundational research that makes successful tools for these data-exploration tasks possible: how to know when you have an effective model of the dataset; how to correct bad models; how to scale to large datasets; and how to detect framing and spin using these techniques. After introducing topic models, I argue that traditional measures of topic model quality, borrowed from machine learning, are inconsistent with how topic models are actually used. In response, I describe interactive topic modeling, a technique that lets users impart their insights and preferences to models in a principled, interactive way. I will then address the computational and statistical limits of existing approaches and show how streaming topic models with an "infinite vocabulary" can be applied to real-world online datasets. Finally, I’ll discuss ongoing collaborations with political scientists that use these techniques to detect spin and framing in political and online interactions.
Bio: Jordan Boyd-Graber is an assistant professor at the University of Maryland's College of Information Studies and Institute for Advanced Computer Studies who will be moving to the University of Colorado in August. He received his PhD from Princeton University, where he wrote his thesis, "Linguistic Extensions of Topic Models," under David Blei. Jordan's research focuses on applying machine learning and Bayesian probabilistic models to problems that help us better understand social interaction or the human cognitive process. This research often leads him to use tools such as large-scale inference for probabilistic methods, natural language processing, multilingual corpus understanding, and human computation.
He has received an honorable mention for the NIPS Best Student Paper award and a Computing Innovation Fellowship, and he was awarded the Jorgensen Scholarship as an undergraduate at the California Institute of Technology.
Jordan is originally from Iowa and received a BS in computer science from Caltech in 2004. In his spare time, Jordan enjoys competing in and writing questions for trivia competitions.
Hosted by: EECS Prof. Doug Downey