FITing-Tree: A Data-aware Index Structure


Index structures are one of the most important tools that DBAs use to improve the performance of analytical and transactional workloads. However, with the explosion of data constantly generated in domains such as autonomous vehicles, Internet of Things (IoT) devices, and e-commerce sites, building several indexes can become prohibitively expensive. In fact, a recent study has shown that the indexes created as part of the TPC-C benchmark can account for 55% of the total memory available in a state-of-the-art DBMS. This overhead consumes expensive main memory and limits the space that a database has available to store new data or process existing data. In this paper, we present a novel data-aware index structure called FITing-Tree, which approximates an index using piece-wise linear functions with a bounded error specified at construction time. This error bound acts as a tunable knob that allows a DBA to FIT an index to a given dataset and workload by balancing lookup performance against space consumption. To navigate this tradeoff, we provide a cost model that helps the DBA choose an appropriate error parameter given either (1) a lookup latency requirement (e.g., $500\,\mathrm{ns}$) or (2) a storage budget (e.g., $100\,\mathrm{MB}$). Using a variety of real-world datasets, we show that our index structure provides performance comparable to full index structures while reducing the storage footprint by orders of magnitude.
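
To make the idea of a bounded-error piece-wise linear index concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes sorted, distinct numeric keys and an integer error bound, and it uses a simple greedy segmentation that narrows the range of feasible slopes as points are added. The function names `build_segments` and `lookup` are hypothetical, and a sorted list of segment start keys searched with `bisect` stands in for whatever structure organizes the segments in the actual index.

```python
from bisect import bisect_left, bisect_right


def build_segments(keys, error):
    """Greedily split sorted, distinct keys into linear segments.

    Each segment is (start_key, start_pos, slope). For every key in a
    segment, start_pos + slope * (key - start_key) is within `error`
    positions of the key's true position (up to float rounding).
    """
    segments = []
    n = len(keys)
    i = 0
    while i < n:
        x0, y0 = keys[i], i
        lo, hi = 0.0, float("inf")  # range of slopes still feasible
        j = i + 1
        while j < n:
            dx = keys[j] - x0
            dy = j - y0
            new_lo = max(lo, (dy - error) / dx)
            new_hi = min(hi, (dy + error) / dx)
            if new_lo > new_hi:  # no single slope fits this point too
                break
            lo, hi = new_lo, new_hi
            j += 1
        slope = 0.0 if hi == float("inf") else (lo + hi) / 2
        segments.append((x0, y0, slope))
        i = j  # the point that broke the bound starts the next segment
    return segments


def lookup(keys, segments, key, error):
    """Return the position of `key` in `keys`, or None if it is absent."""
    starts = [seg[0] for seg in segments]  # precompute this in real use
    s = bisect_right(starts, key) - 1      # segment whose range covers key
    if s < 0:
        return None
    x0, y0, slope = segments[s]
    pred = y0 + slope * (key - x0)         # approximate position
    # Search only a window of about 2 * error positions around the guess.
    lo = max(0, int(pred) - error - 1)
    hi = min(len(keys), int(pred) + error + 2)
    pos = bisect_left(keys, key, lo, hi)
    return pos if pos < hi and keys[pos] == key else None


# Example with made-up keys: a larger error tolerates more deviation per segment.
keys = [1, 2, 3, 4, 5, 100, 200, 300, 1000, 1001, 1002, 5000]
segments = build_segments(keys, error=2)
position = lookup(keys, segments, 300, error=2)
```

In this sketch only one `(start_key, start_pos, slope)` triple is stored per segment rather than one entry per key, which is where the space saving comes from. A larger error bound means each lookup scans a wider window, which illustrates the tradeoff between lookup performance and space consumption that the error parameter controls.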

In arXiv