New England Database Society
Sponsored by Sun Microsystems
Dependence and Truth
Friday, February 19, 2010, 4:00 pm
Volen 101, Brandeis University
(preceded by a wine and cheese reception at 3:00 pm, and followed by dinner at 6:00 pm)
The Web has made a huge amount of useful information available, but it
has also made it easier to spread false information and rumors across
multiple sources, making it hard to distinguish between what is true
and what is not. Since it is important to permit the expression
of dissenting and conflicting opinions, it would be a fallacy to try to
ensure that the Web provides only consistent information.
However, to help in separating the wheat from the chaff, it is
essential to be able to determine dependence between sources.
Given the huge number of data sources and the vast volume of
conflicting data available on the Web, doing so in a scalable manner is
challenging.
We present a novel approach that considers dependence between data
sources in truth discovery. We start from a static world where we
have a snapshot of data from various data sources. We apply
Bayesian analysis to decide dependence between sources and design an
algorithm that iteratively detects dependence and discovers truth from
conflicting information. We then consider a dynamic world where
the true values can evolve over time and sources can update data to
capture such changes. We develop a Hidden Markov Model that
decides whether a source is a copier of another source and identifies
the specific moments at which it copies. Experimental results on
both real-world and synthetic data show high accuracy and scalability
of our techniques.
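As an illustration of the kind of Bayesian reasoning the abstract describes, the following Python sketch computes a posterior probability that two sources are dependent. It uses three counts: items on which the sources share a true value, share a false value, or differ. It is a simplified model under stated assumptions, not necessarily the one presented in the talk. The function name copy_posterior, the shared accuracy, the number of false values per item, the copy rate, and the prior are all hypothetical choices made for illustration, and the true values are assumed to be known.

```python
import math


def copy_posterior(k_true, k_false, k_diff,
                   accuracy=0.8, n_false=10, copy_rate=0.8, prior=0.5):
    """Posterior probability that two sources are dependent (one copies the other).

    Illustrative assumptions (not taken from the talk):
      - both sources have the same accuracy, strictly between 0 and 1
      - each item has n_false equally likely wrong values
      - a copier copies each item independently with probability copy_rate
        (strictly between 0 and 1), otherwise it provides its own value
    """
    a, n, c = accuracy, n_false, copy_rate

    # Per-item probabilities if the two sources are independent.
    p_true = a * a                    # both provide the true value
    p_false = (1 - a) ** 2 / n        # both provide the same false value
    p_diff = 1 - p_true - p_false     # they provide different values

    # Per-item probabilities if one source copies the other.
    q_true = c * a + (1 - c) * p_true
    q_false = c * (1 - a) + (1 - c) * p_false
    q_diff = (1 - c) * p_diff

    # Log-likelihood ratio of the observations (dependent vs. independent).
    log_ratio = (k_true * math.log(q_true / p_true)
                 + k_false * math.log(q_false / p_false)
                 + k_diff * math.log(q_diff / p_diff))

    # Bayes' rule in log-odds form, with a numerically stable sigmoid.
    log_odds = math.log(prior / (1 - prior)) + log_ratio
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1 + z)


# Two sources that share three false values look dependent;
# two sources that only share true values do not.
shares_errors = copy_posterior(k_true=5, k_false=3, k_diff=2)
shares_truth = copy_posterior(k_true=8, k_false=0, k_diff=2)
print(shares_errors > shares_truth)
```

Under these assumed parameters, a shared false value is much stronger evidence of copying than a shared true value, because independent sources rarely make the same mistake.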
This is joint work with Xin Luna Dong and Laure Berti-Equille.
Divesh Srivastava is the head of Database Research at AT&T
Labs Research. His current research interests include data
quality, data streams and data privacy.
Maintained by Olga Papaemmanouil
olga AT cs.brandeis.edu