New England Database Society, sponsored by Netezza Corporation
NEDS
Statistical Data-Analysis in an RDBMS Almost for Free
Christopher Ré
University of Wisconsin-Madison
Friday, May 4, 2012, 4:00 pm
HP/Vertica Computer Science Lounge (Volen 104),
Brandeis University
(preceded by a wine and cheese reception at 3:00 pm, and followed by dinner at 6:00 pm)
Abstract:
The main question driving my
research is: how does one deploy statistical data-analysis tools to
enhance data-driven systems? Our goal is to find abstractions
that one needs to deploy and maintain such systems. In this talk, I
describe my group's attack on this question by building a diverse set
of statistics-based, data-driven applications: a system whose goal is
to read the Web and answer complex questions, a muon detector built in
collaboration with a neutrino telescope called IceCube, and
social-science applications involving rich content (OCR and speech
data). Even in this diverse set, we have found common abstractions that
we are exploiting to build systems.
In the technical portion of the talk, I discuss one such abstraction
that we found while attempting to answer the question: how can we bring
sophisticated data-analysis tools to data that lives in an RDBMS? My
technical message is that the algorithmic problems underlying many
statistical data analysis techniques can be solved with a classical
algorithm called incremental gradient descent that is no more difficult
to compute than a SQL AVG. To demonstrate our point, we have
implemented this method on top of a handful of commercial and
open-source databases. Our approach is often faster than
special-purpose tools and avoids a messy export-reimport cycle.
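The comparison with a SQL AVG can be made concrete with a user-defined aggregate: both computations keep a small state, update it once per row, and emit a result at the end. The sketch below is illustrative only and is not the implementation described in the talk. It assumes SQLite through Python's standard `sqlite3` module (whose `create_aggregate` registers a class with `step` and `finalize` methods) and a one-parameter least-squares model y ≈ w·x chosen for brevity; the names `my_avg`, `igd_epoch`, `points`, `make_db`, and `fit`, as well as the step size and epoch count, are hypothetical.

```python
import sqlite3


class AvgAgg:
    """AVG as an aggregate: the state is (running sum, row count)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def step(self, value):
        # Called once per row: fold the value into the state.
        self.total += value
        self.count += 1

    def finalize(self):
        # Called once at the end: turn the state into the answer.
        return self.total / self.count if self.count else None


class IgdEpochAgg:
    """One epoch of incremental gradient descent for y ~ w * x (hypothetical).

    Same shape as AvgAgg, but the state is the model weight w, and each row
    applies one gradient step on the squared loss 0.5 * (w * x - y) ** 2.
    """

    def __init__(self):
        self.w = None

    def step(self, x, y, w0, alpha):
        if self.w is None:
            # First row of this epoch: start from the previous epoch's model.
            self.w = w0
        err = self.w * x - y
        # The gradient of 0.5 * err**2 with respect to w is err * x.
        self.w -= alpha * err * x

    def finalize(self):
        return self.w


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("my_avg", 1, AvgAgg)
    conn.create_aggregate("igd_epoch", 4, IgdEpochAgg)
    conn.execute("CREATE TABLE points (x REAL, y REAL)")
    # Noise-free toy data on the line y = 3x (assumed for illustration).
    conn.executemany(
        "INSERT INTO points VALUES (?, ?)",
        [(i / 10, 3.0 * i / 10) for i in range(1, 11)],
    )
    return conn


def fit(conn, epochs=20, alpha=0.5):
    w = 0.0
    for _ in range(epochs):
        # One SQL query is one pass over the table, just like computing AVG.
        (w,) = conn.execute(
            "SELECT igd_epoch(x, y, ?, ?) FROM points", (w, alpha)
        ).fetchone()
    return w


if __name__ == "__main__":
    conn = make_db()
    print(conn.execute("SELECT my_avg(y), AVG(y) FROM points").fetchone())
    print(fit(conn))
```

Each `SELECT` performs one epoch, and the driver loop feeds the model from one epoch into the next, so the data never leaves the database. Because this toy data is noise-free, every per-row update shrinks the distance to the true slope by a factor that does not depend on scan order, so the fitted w approaches 3.0 however SQLite happens to order the rows.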
Papers, software, virtual machines containing installations of our
software with data, and links to applications that are discussed in
this talk are available from http://www.cs.wisc.edu/hazy.
Christopher (Chris) Ré is an
assistant professor in the department of Computer Sciences at the
University of Wisconsin-Madison. The goal of his work is to enable
users and developers to build applications that more deeply understand
and exploit data. Chris received his PhD from
the University of Washington, Seattle, under the supervision of Dan
Suciu. For his PhD work in the area of probabilistic data management,
Chris received the SIGMOD 2010 Jim Gray Dissertation Award.
Chris's papers have received four best-paper or best-of-conference
citations (best paper at PODS 2012; best-of-conference at PODS 2010,
twice; and one at ICDE 2009). Chris received an NSF CAREER Award in 2011 and
was recently granted his first patent.
Maintained by Olga Papaemmanouil (olga AT cs.brandeis.edu)