Sublinear Algorithms for Big Data

  • Jasper Lee
Course Home Page:
Location:Hybrid / 85 Watr room 130
Meeting Time:K hr: T, Th 2:30-3:50
Exam Group:12: 11-DEC-2020 Exam Time: 02:00:00 PM
Offered this year?Yes
When Offered?Occasionally


A huge quantity of data is worth little unless we can extract insights from it. Yet, the large quantities mean that classic algorithms (running in linear, quadratic or even more time) can be infeasible in practice. We must instead turn to new algorithmic approaches and paradigms, which allow us to answer valuable questions about our data in runtime that is still feasible even when the data set is Facebook-sized.

Surprisingly, to answer many computational and statistical questions, sometimes there is no need to read/store every piece of data! This course focuses on this exciting "sublinear" algorithmic regime. We will study practical algorithms, making clever use of randomness with strong theoretical guarantees, on the following (tentative and non-exhaustive) list of topics:

- Testing and learning information about structures such as lists (e.g. testing approximate sortedness/monotonicity) and massive graphs (e.g. testing approximate connectedness), using very few queries into the structure
- Learning information about a probability distribution from a small number of independent samples, e.g. deciding whether the distribution of users to your online platform has changed substantially since you introduced a UI change
- Computation on high-volume streams of data, whilst only maintaining a small memory buffer, e.g. approximately estimating the number of unique visitors to a webpage over a given time period without storing them one by one

The key prerequisites are 1) mathematical maturity (we will prove most if not all of the covered results), and familiarity with 2) basic probability (you should be comfortable applying Markov's and Chebyshev's inequalities) and 3) basic analysis of algorithms. See the prerequisites section for details.

Prerequisites: (CS22 or equivalent); (CS145 or APMA1650/1655 or equivalent); (CS157 or CS155). Mathematical maturity is essential: this is a theory course with proofs.

Recommended but not required: CS155, all materials not taught in CS145 or equivalent will be covered in this class.

CRN: 18537