Searchlight is an interactive data exploration system for very large data sets. It is a continuation of the Semantic Windows (SW) project. Users can search for window-like objects that satisfy constraints on both shape and content, and can combine different constraints within a single query. For example, a user might ask for a time interval that is similar to a given query interval (e.g., in terms of its pattern) and that also has properties of its own (e.g., a particular mean or variance).
Users can also search for groups of objects with inter-related properties, e.g., "windows with similar brightness" or "windows located 100 km from each other". We call such queries "higher-order queries".
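To make the idea of a higher-order query concrete, here is a minimal C++17 sketch that combines the two example constraints above on a set of windows that have already been found. It is an illustration under assumptions, not Searchlight's implementation: the `FoundWindow` struct, the `relatedPairs` function, the planar km coordinates and the tolerance parameters are all hypothetical, and the pairwise loop is a naive O(n²) join rather than a CP-based search.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical already-found window: center coordinates (km) and average brightness.
struct FoundWindow {
    double x, y;
    double brightness;
};

// Returns index pairs (i, j), i < j, of windows whose centers are `distKm` apart
// (within `distTol`) and whose brightness differs by at most `brightTol`.
std::vector<std::pair<std::size_t, std::size_t>>
relatedPairs(const std::vector<FoundWindow>& ws, double distKm, double distTol,
             double brightTol) {
    assert(distTol >= 0.0 && brightTol >= 0.0);
    std::vector<std::pair<std::size_t, std::size_t>> out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        for (std::size_t j = i + 1; j < ws.size(); ++j) {
            // Euclidean distance between the two window centers.
            double d = std::hypot(ws[i].x - ws[j].x, ws[i].y - ws[j].y);
            bool distOk = std::fabs(d - distKm) <= distTol;
            bool brightOk = std::fabs(ws[i].brightness - ws[j].brightness) <= brightTol;
            // A pair qualifies only if both inter-window constraints hold.
            if (distOk && brightOk) out.emplace_back(i, j);
        }
    }
    return out;
}
```

The constraint here relates two windows to each other rather than describing a single window, which is what distinguishes a higher-order query from an ordinary window query.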
The system brings Constraint Programming (CP) techniques inside DBMSs. Data exploration is treated as a data-driven, online search problem, in which CP machinery quickly identifies promising solution candidates and the DBMS efficiently validates them. Searchlight supports distributed computation, load balancing for both the CP-based search and the validation, and dynamic reallocation of resources between the two to reduce query execution times.
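The split between cheap candidate identification and exact validation can be illustrated with a small C++17 sketch. This is not Searchlight's code and does not use OR-Tools: as a hypothetical stand-in for CP propagation, it bounds each window's average using a coarse min/max synopsis of fixed-size blocks, prunes windows whose bounds cannot satisfy the constraint, and computes exact averages over the raw data (the role the DBMS plays in Searchlight) only for the remaining candidates. The names `Synopsis`, `buildSynopsis`, `SearchResult` and `searchWindows` are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coarse synopsis: min and max of each fixed-size block of the raw series.
struct Synopsis {
    std::size_t blockSize;
    std::vector<double> blockMin, blockMax;
};

Synopsis buildSynopsis(const std::vector<double>& data, std::size_t blockSize) {
    assert(blockSize > 0);
    Synopsis s{blockSize, {}, {}};
    for (std::size_t b = 0; b < data.size(); b += blockSize) {
        auto first = data.begin() + static_cast<std::ptrdiff_t>(b);
        auto last = data.begin() + static_cast<std::ptrdiff_t>(std::min(data.size(), b + blockSize));
        auto [mn, mx] = std::minmax_element(first, last);
        s.blockMin.push_back(*mn);
        s.blockMax.push_back(*mx);
    }
    return s;
}

struct SearchResult {
    std::vector<std::size_t> starts;  // start positions of windows that satisfy the query
    std::size_t validations = 0;      // candidates that needed an exact check on raw data
};

// Finds windows [start, start + len) whose exact average lies in [lo, hi].
SearchResult searchWindows(const std::vector<double>& data, const Synopsis& syn,
                           std::size_t len, double lo, double hi) {
    assert(len > 0);
    SearchResult res;
    for (std::size_t start = 0; start + len <= data.size(); ++start) {
        // Candidate step: a window's average lies between the min and max
        // of the blocks it overlaps, so the synopsis gives cheap bounds.
        std::size_t firstBlock = start / syn.blockSize;
        std::size_t lastBlock = (start + len - 1) / syn.blockSize;
        double lb = syn.blockMin[firstBlock], ub = syn.blockMax[firstBlock];
        for (std::size_t b = firstBlock + 1; b <= lastBlock; ++b) {
            lb = std::min(lb, syn.blockMin[b]);
            ub = std::max(ub, syn.blockMax[b]);
        }
        if (ub < lo || lb > hi) continue;  // bounds rule this window out: prune it

        // Validation step: compute the exact average from the raw data.
        ++res.validations;
        double sum = 0.0;
        for (std::size_t i = start; i < start + len; ++i) sum += data[i];
        double avg = sum / static_cast<double>(len);
        if (avg >= lo && avg <= hi) res.starts.push_back(start);
    }
    return res;
}
```

For a series that is 0 throughout its first block of four values and 5 throughout its second, a query for length-2 windows with an average in [4, 6] prunes the three windows inside the first block without reading raw data and validates only the remaining four candidates. A real CP solver would prune whole regions of the search space at once instead of enumerating every start position.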
I implemented this framework in C++ by integrating the CP solver from Google OR-Tools into the query executor of SciDB, a popular open-source array DBMS. The implementation extensively reuses SciDB's existing distributed query execution infrastructure. At the same time, it introduces its own layer of search-related capabilities, including dynamic search distribution, resource management and dynamic data distribution.
Semantic Windows (SW) is a novel interactive data exploration approach in which users query for multidimensional "windows" of interest via standard DBMS-style queries enhanced with exploration constructs. Users can specify SWs using (i) shape-based properties, e.g., "identify all 3-by-3 windows", as well as (ii) aggregate content-based properties, e.g., "identify all windows in which the average brightness of stars exceeds 0.8". The SW approach enables interactive processing of many useful exploratory queries that are difficult to express and optimize using standard DBMS techniques.
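The semantics of the example query above (a shape constraint, all 3-by-3 windows, combined with a content constraint, average above 0.8) can be spelled out with a brute-force C++ sketch over a 2D grid. This only illustrates what the query means; it is not how the SW system evaluates queries. The `Window` struct and the `findWindows` function are hypothetical, the grid is assumed to be rectangular, and a summed-area table is used so that each window's sum costs O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: top-left corner of a k-by-k window and its average value.
struct Window {
    std::size_t row;
    std::size_t col;
    double average;
};

// Returns every k-by-k window of a rectangular `grid` whose average exceeds `threshold`.
std::vector<Window> findWindows(const std::vector<std::vector<double>>& grid,
                                std::size_t k, double threshold) {
    assert(k > 0);
    std::vector<Window> result;
    const std::size_t rows = grid.size();
    const std::size_t cols = rows == 0 ? 0 : grid[0].size();
    if (k > rows || k > cols) return result;  // no window of this shape fits

    // Summed-area table: sat[i][j] is the sum of grid[0..i-1][0..j-1].
    std::vector<std::vector<double>> sat(rows + 1, std::vector<double>(cols + 1, 0.0));
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            sat[i + 1][j + 1] = grid[i][j] + sat[i][j + 1] + sat[i + 1][j] - sat[i][j];

    // Shape constraint: enumerate every k-by-k window.
    for (std::size_t r = 0; r + k <= rows; ++r) {
        for (std::size_t c = 0; c + k <= cols; ++c) {
            double sum = sat[r + k][c + k] - sat[r][c + k] - sat[r + k][c] + sat[r][c];
            double avg = sum / static_cast<double>(k * k);
            // Content constraint: keep windows whose average exceeds the threshold.
            if (avg > threshold) result.push_back({r, c, avg});
        }
    }
    return result;
}
```

Exhaustive enumeration like this grows with the number of possible windows, which is exactly why exploratory queries of this kind are hard to optimize with standard DBMS techniques.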
I implemented SW in C++ as a standalone client that works with PostgreSQL via the standard driver. The client was written from scratch, including its caching, networking and distribution layers.
Sedna is a free native XML database that provides a full range of core database services: persistent storage, ACID transactions, security, indices and hot backup. Its flexible XML processing facilities include a W3C XQuery implementation, tight integration of XQuery with full-text search, and a node-level update language.
I participated in the project as a team member, contributing to transaction processing, the logging and recovery systems, the version-based storage engine, XQuery parsing, and static query optimization via query rewriting. I fully implemented the hot-backup subsystem and a recovery testing system based on controlled failures with random crashes. The work was mainly done in C++.
| Type | Title |
|---|---|
| Dissertation topic | Enabling Integrated Search and Exploration over Large Multidimensional Data |
| Thesis | Interactive Data Exploration Using Semantic Windows |
| Thesis | Research and Development of Transaction Processing Methods Based on Snapshots |