New England Database Society

Friday, February 22, 2008

sponsored by Sun Microsystems


Making Database Systems Usable

Peter Haas
IBM Almaden

Friday, February 22, 2008, 4:00 pm
Volen 101, Brandeis University

(preceded by a wine and cheese reception at 3:00 pm, and followed by dinner at 6:00 pm)


Data synopses are an essential ingredient of methods for fast approximate analytical processing, interactive data exploration, auditing, and automated metadata discovery. We consider the problem of maintaining a warehouse of synopses that "shadows" a full-scale data warehouse. Incoming data is decomposed into partitions, and a synopsis is created for each partition. As the data partitions are rolled in and out of the full-scale warehouse, the corresponding synopses are rolled in and out of the synopsis warehouse. Synopses are combined as needed to yield synopses of the corresponding combination of partitions. This approach is both flexible and efficient, since it allows partitions to be processed in parallel. We discuss some recent work aimed at supporting a warehouse of synopses. Our focus is on two types of synopses: uniform random samples and synopses for estimating the number of distinct data values in a partition. Our algorithms correct, improve, and extend techniques such as classical reservoir and Bernoulli sampling, the "concise" and "sample counting" schemes of Gibbons and Matias, and various probabilistic-counting methods.
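To make the per-partition idea concrete, here is a minimal Python sketch of one kind of synopsis mentioned above, a uniform random sample. It uses textbook reservoir sampling to build a sample of each partition, then combines two such samples into a uniform sample of the union of the partitions. This is an illustration only and not the speaker's algorithms: the function names reservoir_sample and merge_samples, the parameter k, and the merge procedure are all assumptions chosen for the example.

```python
import random


def reservoir_sample(stream, k, rng):
    """Textbook reservoir sampling: one pass over a partition.

    Returns (sample, n): a uniform sample of min(k, n) items and the
    partition size n, which is needed later to merge samples.
    """
    sample = []
    n = 0
    for item in stream:
        n += 1
        if len(sample) < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randrange(n)         # uniform integer in [0, n)
            if j < k:
                sample[j] = item         # keep item with probability k/n
    return sample, n


def merge_samples(s1, n1, s2, n2, k, rng):
    """Combine samples of two disjoint partitions (hypothetical helper).

    Assumes s1 and s2 are uniform samples of sizes min(k', n1) and
    min(k', n2) for some k' >= k. Returns a uniform sample of size
    min(k, n1 + n2) of the union of the two partitions.
    """
    k = min(k, n1 + n2)
    # Decide how many items come from partition 1 by simulating k draws
    # without replacement from an urn with n1 + n2 balls
    # (a hypergeometric draw).
    r1, r2 = n1, n2
    take1 = 0
    for _ in range(k):
        if rng.randrange(r1 + r2) < r1:
            take1 += 1
            r1 -= 1
        else:
            r2 -= 1
    take2 = k - take1
    # A uniform sub-sample of a uniform sample is itself uniform.
    return rng.sample(s1, take1) + rng.sample(s2, take2)


rng = random.Random(2008)
s1, n1 = reservoir_sample(range(0, 1000), 10, rng)      # partition 1
s2, n2 = reservoir_sample(range(1000, 1500), 10, rng)   # partition 2
merged = merge_samples(s1, n1, s2, n2, 10, rng)
```

Because each partition is sampled independently, the two reservoir_sample calls could run in parallel, and a partition's sample can be dropped from the synopsis warehouse when the partition itself is rolled out.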