

    Here are some projects that I have been working on during my PhD.

Query Processing on Hybrid CPU-FPGA environments: A Case for Code Generation

Details coming soon...

A Morsel-Driven Query Execution Engine for Heterogeneous Multi-Cores

Currently, we face the next major shift in processor design, driven by the physical limitation known as the "dark silicon" effect. Due to thermal limitations and shrinking transistor sizes, multi-core scaling is coming to an end. A major new direction that hardware vendors are currently investigating involves specialized and energy-efficient hardware accelerators (e.g., ASICs) placed on the same die as the conventional CPU cores.

In this paper, we present a novel query processing engine that targets such heterogeneous processor environments. Based on the SSB benchmarks, as well as other micro-benchmarks, we compare the efficiency of our engine with existing execution strategies that make use of co-processors (e.g., FPGAs, GPUs) and demonstrate speed-ups of up to 2x.
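The core of the morsel-driven idea can be sketched in a few lines: the input is split into small chunks ("morsels") that worker threads pull from a shared queue, so work naturally balances across cores of differing speeds. The names below are illustrative, and a real engine (including the one described here) dispatches morsels across heterogeneous cores and uses much larger morsel sizes.

```python
import queue
import threading

MORSEL_SIZE = 4  # tiny for illustration; real engines use tens of thousands of tuples


def run_morsel_driven(data, operator, num_workers=4):
    """Split `data` into morsels and let workers pull them from a shared queue."""
    morsels = queue.Queue()
    for i in range(0, len(data), MORSEL_SIZE):
        morsels.put(data[i:i + MORSEL_SIZE])

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                morsel = morsels.get_nowait()
            except queue.Empty:
                return  # no work left; faster workers simply grab more morsels
            partial = [operator(x) for x in morsel]
            with lock:
                results.extend(partial)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Usage: a trivial per-tuple "operator" doubling each value
out = run_morsel_driven(list(range(10)), lambda x: x * 2)
assert sorted(out) == [x * 2 for x in range(10)]
```

Because each worker takes the next morsel only when it finishes the previous one, a slow (e.g., energy-efficient) core processes fewer morsels instead of stalling the whole query, which is what makes the scheme attractive on heterogeneous multi-cores.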

Rethinking DBMSs for Modern Heterogeneous Co-Processor Environments

In the last decade, the work centered around specialized co-processors for DBMSs has largely focused on efficient query processing algorithms for individual operators. However, a major limitation of existing co-processor systems is the PCI bottleneck, which severely limits the efficient use of this type of hardware in existing DBMSs.

In recent years, we have seen the emergence of a new class of co-processor systems that include specialized accelerators, implemented as ASICs or FPGAs, which co-reside with the CPU and share the entire cache hierarchy. Here we revisit DBMS architectures in this context, and take an initial step towards the design of a new database system called SiliconDB that targets these new densely integrated heterogeneous co-processor environments.

HashStash: Revisiting Reuse in Main Memory Database Systems

Reusing intermediates to speed up analytical query processing in databases has been studied in prior work. Existing solutions require intermediate results of individual operators to be materialized using materialization operators. However, inserting such materialization operations into a query plan not only incurs additional execution costs but also often eliminates important cache- and register-locality opportunities, resulting in even higher performance penalties.

This paper studies a novel reuse model for intermediates, which caches internal physical data structures materialized during query processing (due to pipeline breakers) and externalizes them so that they become reusable for upcoming operations. We focus on hash tables, the most commonly used internal data structure in main memory databases to perform join and aggregation operations. As queries arrive, our reuse-aware optimizer reasons about the reuse opportunities for hash tables, employing cost models that take into account hash table statistics together with the CPU and data movement costs within the cache hierarchy. Experimental results, based on our prototype implementation, HashStash, demonstrate performance gains of 2x for typical analytical workloads with no additional overhead for materializing intermediates.
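The reuse model above can be illustrated with a toy cache: a hash table built as a pipeline breaker is registered under a "build signature" (input relation, key columns, predicates), and a later query with a matching signature gets the table back instead of rebuilding it. All names here are hypothetical, and HashStash additionally uses cost models and cache-hierarchy statistics to decide which tables are worth keeping.

```python
class HashTableCache:
    """Toy reuse cache for join/aggregation hash tables.

    Illustrative sketch only: the real system reasons about costs and
    eviction; this version keeps every table it has ever built.
    """

    def __init__(self):
        self._tables = {}  # build signature -> built hash table

    def get_or_build(self, signature, rows, key_fn):
        """Return (hash_table, reused_flag) for the given build signature."""
        if signature in self._tables:
            return self._tables[signature], True  # reuse hit: no rebuild cost
        table = {}
        for row in rows:
            table.setdefault(key_fn(row), []).append(row)
        self._tables[signature] = table
        return table, False


cache = HashTableCache()
orders = [(1, "a"), (2, "b"), (1, "c")]

# Query 1 builds the hash table for a join keyed on the first column...
ht1, hit1 = cache.get_or_build(("orders", "col0"), orders, lambda r: r[0])
# ...and Query 2 with the same build side reuses it for free.
ht2, hit2 = cache.get_or_build(("orders", "col0"), orders, lambda r: r[0])
assert not hit1 and hit2 and ht1 is ht2
```

The key point the sketch captures is that no extra materialization operator is needed: the hash table already exists as a by-product of the pipeline breaker, so externalizing it costs nothing beyond keeping it alive.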

An Architecture for Compiling UDF-centric Workflows

Data analytics has recently grown to include increasingly sophisticated techniques, such as machine learning and advanced statistics. Users frequently express these complex analytics tasks as workflows of user-defined functions (UDFs) that specify each algorithmic step. However, given typical hardware configurations and dataset sizes, the core challenge of complex analytics is no longer sheer data volume but rather the computation itself, and the next generation of analytics frameworks must focus on optimizing for this computation bottleneck. While query compilation has gained widespread popularity as a way to tackle the computation bottleneck for traditional SQL workloads, relatively little work addresses UDF-centric workflows in the domain of complex analytics.

In this paper, we describe a novel architecture for automatically compiling workflows of UDFs. We also propose several optimizations that consider properties of the data, UDFs, and hardware together in order to generate different code on a case-by-case basis. To evaluate our approach, we implemented these techniques in TUPLEWARE, a new high-performance distributed analytics system, and our benchmarks show performance improvements of up to three orders of magnitude compared to alternative systems.
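A minimal sketch of the central compilation idea: instead of executing each UDF as its own pass that materializes an intermediate collection, a chain of per-element UDFs is fused into a single function applied in one tight loop. The function names are hypothetical, and the real system generates native code (via LLVM) rather than composing Python closures.

```python
def compile_workflow(udfs):
    """Fuse a chain of per-element UDFs into one function, so the "compiled"
    plan makes a single pass over the data with no intermediate collections.
    Illustrative only: a real compiler emits native code for this loop."""
    def fused(x):
        for udf in udfs:
            x = udf(x)
        return x
    return fused


def run(data, udfs):
    fused = compile_workflow(udfs)
    return [fused(x) for x in data]  # one tight loop over the input


# A tiny two-step analytics workflow expressed as UDFs
normalize = lambda x: x / 10.0
square = lambda x: x * x
assert run([10, 20], [normalize, square]) == [1.0, 4.0]
```

Fusion of this kind is one of the cross-UDF optimizations the architecture enables: because the compiler sees the whole workflow rather than one operator at a time, it can tailor the generated loop to the data, the UDFs, and the hardware together.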