Tech Report CS-95-03

Approximate Queries and Representations for Large Data Sequences

Hagit Shatkay

March 1995

Abstract:

Many new database application domains, such as the experimental sciences and medicine, are characterized by large sequences as their main form of data. Using an approximate representation can significantly reduce both the storage required and the search space. A good choice of representation can also support a broad new class of approximate queries. These queries are concerned with application-dependent features of the data, as opposed to the actual sampled points. We introduce a new notion of generalized approximate queries and a general divide-and-conquer approach that supports them. This approach uses families of real-valued functions as an approximate representation of the data. We present algorithms for realizing our technique, and the results of an implementation that has been applied to medical cardiology data.

(complete text in pdf or gzipped postscript)