New England Database Society (NEDS)
Friday, September 11, 2009
Sponsored by Sun Microsystems

Some Tools and Techniques for Managing Uncertain Data
Peter Haas
IBM Almaden
Friday, September 11, 2009, 4:00 pm
Volen 101, Brandeis University
(preceded by a wine and cheese reception at 3:00 pm, and followed by dinner at 6:00 pm)
Abstract:
There is an increasing need for tools that facilitate enterprise risk
assessment and decision-making in the face of uncertain data. The problem of data
uncertainty is becoming acute, due to data integration, automated
information extraction, data anonymization for privacy protection, and
the growing importance of RFID and sensor data. In this talk we give an
overview of some of our recent work on uncertainty management. We first
describe the MCDB relational database system, which uses a Monte Carlo
approach to query uncertain data. This system can handle complicated
real-world queries and data, and has an extensible and flexible
uncertainty model, encapsulated via user-defined "value generation"
(VG) functions. Importantly, MCDB also allows sophisticated,
data-intensive stochastic modeling and prediction to be performed close
to the data, opening up new possibilities for data analytics. We then
describe MC3, a system that extends the MCDB approach to a Hadoop-based
cluster-computing platform, enabling robust, scalable, and highly
parallel execution of MCDB-style Monte Carlo computations on commodity
hardware. MC3, which operates on JSON data, also relaxes MCDB's
requirement of a strict relational schema. Finally, we briefly describe
a couple of additional projects related to managing uncertain data in
the context of OLAP and of information extraction from text.
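
As a rough illustration of the Monte Carlo idea described in the abstract, the sketch below answers a query over uncertain data by repeatedly generating possible database instances and running the query on each one. It is a naive toy under stated assumptions, not MCDB's implementation or API: the names `normal_vg`, `monte_carlo_query`, and `total_revenue`, the row layout, and the choice of a normal distribution for the uncertain price are all hypothetical.

```python
import random
import statistics


def normal_vg(mean, stddev, rng):
    """Hypothetical "value generation" function: draw one plausible value
    for an uncertain attribute (here assumed to be normally distributed)."""
    return rng.gauss(mean, stddev)


def monte_carlo_query(rows, query, n_worlds=1000, seed=0):
    """Run `query` on `n_worlds` randomly generated instances of the data
    and return the list of query results, one per instance."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_worlds):
        # Replace each uncertain price with one sampled value.
        world = [
            {
                "item": r["item"],
                "qty": r["qty"],
                "price": normal_vg(r["price_mean"], r["price_sd"], rng),
            }
            for r in rows
        ]
        results.append(query(world))
    return results


def total_revenue(world):
    """Example aggregate query: SUM(qty * price)."""
    return sum(r["qty"] * r["price"] for r in world)


# Illustrative data only: quantities are known, prices are uncertain.
rows = [
    {"item": "widget", "qty": 10, "price_mean": 4.0, "price_sd": 0.5},
    {"item": "gadget", "qty": 3, "price_mean": 20.0, "price_sd": 2.0},
]

samples = monte_carlo_query(rows, total_revenue)
cuts = statistics.quantiles(samples, n=20)  # 19 cut points at 5% steps
print("estimated mean revenue:", statistics.mean(samples))
print("approx. 5th and 95th percentiles:", cuts[0], cuts[-1])
```

The output of such a computation is a sample from the distribution of the query result, from which means, quantiles, and other risk measures can be estimated.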
Maintained by Olga Papaemmanouil
olga AT cs.brandeis.edu