Tech Report CS-93-21

Indexing for Data Models with Constraints and Classes

Paris C. Kanellakis, Sridhar Ramaswamy, Darren E. Vengroff and Jeffrey S. Vitter

May 1993

Abstract:

We examine I/O-efficient data structures that provide indexing support for new data models. The database languages of these models include concepts from constraint programming (e.g., relational tuples are generalized to conjunctions of constraints) and from object-oriented programming (e.g., objects are organized in class hierarchies). Let $n$ be the size of the database, $c$ the number of classes, $B$ the secondary storage page size, and $t$ the size of the output of a query. Indexing by one attribute in the constraint data model (for a fairly general type of constraints) is equivalent to external dynamic interval management, which is a special case of external dynamic two-dimensional range searching. We present a semi-dynamic data structure for this problem that uses optimal worst-case space of $O(n/B)$ pages, has optimal query I/O time $O(\log_B n + t/B)$, and has $O(\log_B n + (\log^2_B n) / B)$ amortized insert I/O time. If the order of the insertions is random, then the expected number of I/O operations needed to perform each insertion is reduced to $O(\log_B n)$. Indexing by one attribute and by class name in an object-oriented model, where objects are organized as a forest hierarchy of classes, is also a special case of external dynamic two-dimensional range searching. Based on this observation, we first identify a simple algorithm with good worst-case performance for the class indexing problem. Using the forest structure of the class hierarchy and techniques from the constraint indexing problem, we then improve its query I/O time from $O(\log_2 c \log_B n + t/B)$ to $O(\log_B n + t/B + \log_2 B)$.
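
The two reductions to two-dimensional range searching mentioned in the abstract can be illustrated with a minimal in-memory sketch. It is not the I/O-efficient structure of the report: every query below is a brute-force linear scan, and the function names, the preorder numbering of classes, and the sample data are assumptions introduced only for illustration. An interval $[x, y]$ is treated as the point $(x, y)$, so the intervals containing a value $q$ are the points with $x \le q \le y$. For class indexing, numbering the forest in preorder gives every class a contiguous range of codes covering itself and its descendants, so a query by class and attribute range becomes a rectangle query over (class code, attribute value).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def interval_to_point(interval: Tuple[float, float]) -> Point:
    """Map the interval [lo, hi] to the point (lo, hi) in the plane."""
    lo, hi = interval
    return (lo, hi)

def stabbing_query(points: List[Point], q: float) -> List[Point]:
    """Intervals containing q: points with x <= q <= y (brute-force scan)."""
    return [(x, y) for (x, y) in points if x <= q <= y]

def intersection_query(points: List[Point], q_lo: float, q_hi: float) -> List[Point]:
    """Intervals meeting [q_lo, q_hi]: points with x <= q_hi and y >= q_lo."""
    return [(x, y) for (x, y) in points if x <= q_hi and y >= q_lo]

def preorder_ranges(children: Dict[str, List[str]],
                    roots: List[str]) -> Dict[str, Tuple[int, int]]:
    """Assign each class the range of preorder codes spanned by its subtree.

    Assumed encoding (hypothetical, for illustration): a class's own code is
    the first element of its range.
    """
    ranges: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(cls: str) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(cls, []):
            visit(child)
        ranges[cls] = (start, counter - 1)

    for root in roots:
        visit(root)
    return ranges

def class_query(objects: List[Tuple[str, float]],
                ranges: Dict[str, Tuple[int, int]],
                cls: str, a_lo: float, a_hi: float) -> List[Tuple[str, float]]:
    """Objects in cls or any descendant whose attribute lies in [a_lo, a_hi].

    This is a rectangle query: class code in the subtree range of cls,
    attribute value in [a_lo, a_hi] (brute-force scan).
    """
    lo, hi = ranges[cls]
    return [(c, a) for (c, a) in objects
            if lo <= ranges[c][0] <= hi and a_lo <= a <= a_hi]

# Hypothetical sample data.
points = [interval_to_point(iv) for iv in [(1, 5), (3, 8), (6, 9), (10, 12)]]
hierarchy = {"Person": ["Student", "Employee"], "Employee": ["Professor"]}
ranges = preorder_ranges(hierarchy, roots=["Person", "Vehicle"])
objects = [("Student", 20), ("Professor", 45), ("Employee", 30),
           ("Vehicle", 30), ("Person", 70)]

contains_4 = stabbing_query(points, 4)            # intervals containing 4
overlapping = intersection_query(points, 5.5, 7)  # intervals meeting [5.5, 7]
employees = class_query(objects, ranges, "Employee", 25, 50)
```

In this sketch, a stabbing query is a two-sided range query whose corner $(q, q)$ lies on the diagonal $x = y$, and a class query is a rectangle whose class side is always the code range of one subtree. Both are restricted forms of general two-dimensional range searching.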
