Thesis Defense


"Towards Accessible Data Analysis"

Emanuel Zgraggen

Monday, April 2, 2018, at 10:00 A.M.

Room 506 (CIT 5th Floor)

In today's world, data is ubiquitous. Increasingly large and complex datasets are gathered across many domains. Data analysis, the process of making sense of all this data, is exploratory by nature and demands rapid iteration; all but the simplest analysis tasks require humans in the loop to steer the process effectively. Current tools that support this process are built for an elite set of individuals: highly trained analysts or data scientists with strong mathematics and computer science skills. This, however, presents a bottleneck. Qualified data scientists are scarce and expensive, which often makes it infeasible to inform decisions with data. How do we empower data enthusiasts, stakeholders, or subject-matter experts, who are not statisticians or programmers, to tease out insights directly from data? This thesis presents work towards making data analysis more accessible. We invent a set of user experiences with approachable visual metaphors, in which building blocks are directly manipulable and incrementally composable, to support common data analysis tasks at a pace that matches human thought processes.

First, we develop a system for back-of-the-envelope calculations that revolves around handwriting recognition (all data is represented as digital ink) and gestural commands. Second, we introduce a novel pen & touch system for data exploration and analysis based on four core interaction concepts, whose combination and interplay support a wide range of common analytical tasks. The interface allows for incremental, piecewise query specification, in which intermediate visualizations serve both as feedback and as interactive handles for adjusting query parameters. Third, we present a visual query interface for event sequence data. This touch-based interface exposes the full expressive power of regular expressions in an approachable way and interleaves query specification with result visualizations. Fourth, we present the results of an experiment analyzing how progressive visualizations affect exploratory analysis. These results suggest that progressive visualizations are a viable way to achieve scalability in data exploration systems; building on them, we develop a system based entirely on progressive computation that allows users to build complex analytics workflows interactively. Finally, we discuss, and show experimentally, that using visual analysis tools might inflate false discovery rates among user-extracted insights, and we suggest ways of ameliorating this problem.

Host: Andy van Dam