
New England Database Society

Friday, September 11, 2009

sponsored by Sun Microsystems



   Some Tools and Techniques for Managing Uncertain Data

Peter Haas
  IBM Almaden

Friday, September 11, 2009, 4PM
Volen 101, Brandeis University

(preceded by a wine and cheese reception at 3:00 pm, and followed by dinner at 6:00 pm)


There is an increasing need for tools that facilitate enterprise risk assessment and decision-making in the face of uncertain data. The problem of data uncertainty is becoming acute due to data integration, automated information extraction, data anonymization for privacy protection, and the growing importance of RFID and sensor data. In this talk we give an overview of some of our recent work on uncertainty management. We first describe the MCDB relational database system, which uses a Monte Carlo approach to query uncertain data. This system can handle complicated real-world queries and data, and it has an extensible and flexible uncertainty model, encapsulated via user-defined "value generation" (VG) functions. Importantly, MCDB also allows sophisticated, data-intensive stochastic modeling and prediction to be performed close to the data, opening up new possibilities for data analytics. We then describe MC3, a system that extends the MCDB approach to a Hadoop-based cluster-computing platform, enabling robust, scalable, and highly parallel execution of MCDB-style Monte Carlo computations on commodity hardware. MC3, which operates on JSON data, also relaxes MCDB's requirement of a strict relational schema. Finally, we briefly describe a couple of additional projects related to managing uncertain data in the context of OLAP and of information extraction from text.
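To make the Monte Carlo idea concrete, here is a minimal Python sketch of querying uncertain data through a VG-style function. It is an illustration under stated assumptions, not MCDB's actual interface: the table, the field names (mean_demand, sd), and the helpers normal_vg, total_demand_query and monte_carlo_query are all hypothetical. Each repetition calls the VG function once per row to instantiate one possible database, runs an ordinary deterministic query (here a SUM) on that instance, and collects the results. The spread of the collected answers then approximates the distribution of the query result.

```python
import random
import statistics

# Hypothetical uncertain table: each row stores parameters of an
# uncertain attribute rather than a single fixed value.
customers = [
    {"id": 1, "mean_demand": 100.0, "sd": 10.0},
    {"id": 2, "mean_demand": 250.0, "sd": 40.0},
    {"id": 3, "mean_demand": 80.0, "sd": 5.0},
]

def normal_vg(row, rng):
    """Hypothetical VG function: draws one realization of the uncertain
    demand value from a normal distribution parameterized by the row."""
    return rng.gauss(row["mean_demand"], row["sd"])

def total_demand_query(instance):
    """An ordinary deterministic query (SUM) over one generated instance."""
    return sum(instance)

def monte_carlo_query(table, vg, query, n_reps=1000, seed=42):
    """Naive Monte Carlo evaluation: generate n_reps possible databases
    and run the deterministic query once on each of them."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        # Instantiate one possible database by calling the VG function per row.
        instance = [vg(row, rng) for row in table]
        # Evaluate the ordinary query on that instance.
        results.append(query(instance))
    return results

samples = monte_carlo_query(customers, normal_vg, total_demand_query)
mean = statistics.mean(samples)
sd = statistics.stdev(samples)
print(f"estimated E[total demand] = {mean:.1f}, estimated sd = {sd:.1f}")
```

For this toy input, the exact mean of the total is 430 and the exact standard deviation is sqrt(10^2 + 40^2 + 5^2), about 41.5, so the printed estimates should land close to those values. The loop re-executes the query once per generated instance. That keeps the sketch simple, but a real system would evaluate such queries much more efficiently.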

Maintained by Olga Papaemmanouil olga AT