Introduction to Prescriptive Analysis (April 23)

Guest Lecturer Serdar Kadıoğlu

Introduction

Serdar currently works at Fidelity and previously worked at Oracle. At both jobs he has used solvers and other technologies similar to the ones we've been learning in class.

Goal: to use sound foundations within CS to solve problems in the real world outside of CS

Examples:

Two important points are feasibility and optimization. With complex and dynamic business requirements, we must satisfy (hard) constraints, and respect (soft) preferences. These problems work over extremely large search spaces, but we need to find better solutions faster.

Take the product configuration example and imagine there are n options a customer can choose to have or not have. This creates 2^n possible configurations. To a mathematician this may seem easy, but to a theoretician, this is nearly impossible.
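To make the count concrete: each of the n options is independently in or out, so the number of configurations doubles with every option added. The sketch below is only an illustration; the function name all_configurations is made up for this example.

```python
from itertools import product

def all_configurations(n):
    """Every way to switch n yes/no options on or off (hypothetical helper)."""
    return list(product((False, True), repeat=n))

# Small enough to list explicitly: 3 options give 2^3 = 8 configurations.
print(len(all_configurations(3)))  # 8

# The count doubles with each extra option, so listing them stops being practical quickly.
for n in (10, 20, 40):
    print(n, 2 ** n)
```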

Many problems like this fall into the class of NP-hard problems, yet they must be solved every day. Optimization technology is considered by some to be IT's best kept secret. Companies put a lot of time and funding into their optimization technologies.

Declarative Programming

The idea is to declare what the solution should look like and then allow the computer to figure out how to produce it. Compare this to imperative programming, where each step that the computer must perform is explicitly written out. Java is a pretty typical imperative programming language, while SQL would be classified as a declarative programming language, as it does not specify how the data must be fetched.
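As a rough illustration of the contrast, here is the same question answered both ways in Python, using the standard sqlite3 module for the declarative half. The table, the item names, and the weight threshold are all invented for this example.

```python
import sqlite3

# Hypothetical data: (item name, weight).
rows = [("tent", 4), ("stove", 2), ("map", 1)]

# Imperative: spell out every step (loop, test, collect, sort).
light_imperative = []
for name, weight in rows:
    if weight <= 2:
        light_imperative.append(name)
light_imperative.sort()

# Declarative: describe the result wanted; the database decides how to fetch it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, weight INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
query = "SELECT name FROM items WHERE weight <= 2 ORDER BY name"
light_declarative = [row[0] for row in conn.execute(query)]
conn.close()

print(light_imperative, light_declarative)
```

Both lists contain the same items; only the first version says how to find them.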

Declarative programs are typically defined by model and search. They have

Boolean satisfiability (SAT) is a very low-level form of declarative programming. Mathematical Programming (MP) and Local Search Metaheuristics (LS) are higher level, and Constraint Programming (CP) is higher level still.

Solution approaches necessarily contain trade-offs between strategies.

Modeling Example

Let's imagine a hiking trip with the following problem description

You may know this better as the 'Knapsack Problem.'

Let's break down our requirements.

Given: a set of items, each with a certain profit (p) and weight (w), and a knapsack with a certain capacity (cap).

Find: a subset of items that maximizes total profit and respects the knapsack capacity.

In theory there may be instances of the knapsack problem that are nearly impossible to solve, but in the real world we only care about solving the instances of the problem we happen to have.

Let's take an instance: Four items with profits {16, 19, 23, 28} and weights {2, 3, 4, 3}, and a knapsack with capacity 7.

Now let's build the model:

Decision variables: 0/1 binary variables x_i to denote whether item i is selected or not.

Constraints: enforce capacity, Sum w_i x_i <= cap

Objective: maximize profit, max (Sum p_i x_i)

Because there is a finite set of solutions, we can brute force all 2^4 combinations. We enumerate all possibilities, check whether each one is feasible, and then select the feasible solution with the best objective.
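The enumeration can be sketched in a few lines of Python on the instance above. This is a minimal sketch rather than a solver; the function name brute_force_knapsack is chosen here for illustration.

```python
from itertools import product

# The instance from the notes.
PROFITS = [16, 19, 23, 28]
WEIGHTS = [2, 3, 4, 3]
CAP = 7

def brute_force_knapsack(profits, weights, cap):
    """Try every 0/1 assignment, keep the feasible one with the highest profit."""
    best_selection, best_profit = None, -1
    for x in product((0, 1), repeat=len(profits)):
        total_weight = sum(w * xi for w, xi in zip(weights, x))
        if total_weight > cap:  # violates the capacity constraint
            continue
        total_profit = sum(p * xi for p, xi in zip(profits, x))
        if total_profit > best_profit:
            best_selection, best_profit = x, total_profit
    return best_selection, best_profit

best_selection, best_profit = brute_force_knapsack(PROFITS, WEIGHTS, CAP)
print(best_selection, best_profit)  # (0, 0, 1, 1) 51
```

On this instance the best choice is the third and fourth items, with total weight 7 and total profit 51.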

We could also construct a solution by selecting items one by one according to a heuristic (in order, most profit first, least weight first, etc.) and then keep the best of the solutions found.
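A minimal sketch of this constructive approach on the same instance follows. The helper name greedy and the rule of skipping any item that no longer fits are assumptions made for this example; the notes do not specify them.

```python
PROFITS = [16, 19, 23, 28]
WEIGHTS = [2, 3, 4, 3]
CAP = 7

def greedy(order):
    """Add items in the given order, skipping any that would exceed the capacity."""
    chosen, weight = [], 0
    for i in order:
        if weight + WEIGHTS[i] <= CAP:
            chosen.append(i)
            weight += WEIGHTS[i]
    return sorted(chosen), sum(PROFITS[i] for i in chosen)

items = range(len(PROFITS))
orders = {
    "in order": list(items),
    "most profit": sorted(items, key=lambda i: -PROFITS[i]),
    "least weight": sorted(items, key=lambda i: WEIGHTS[i]),
}
results = {name: greedy(order) for name, order in orders.items()}
best_name = max(results, key=lambda name: results[name][1])
print(results)
print(best_name, results[best_name])
```

On this instance "in order" and "least weight" both stop at profit 35, while "most profit" reaches 51, so the choice of heuristic matters.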

Solver features and benefits

Side note from Tim: With Boolean satisfiability it is easy to check an answer. We simply check that the constraints are all satisfied by the variable assignments. With an optimization problem, it is much harder to check that our answer is correct. Checking feasibility (meeting constraints) is easy, but checking optimality can be challenging.
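The asymmetry shows up even on the small knapsack instance from the modeling example. In the sketch below, feasibility is a single sum, while the optimality check (a naive one, written only for illustration) has to compare against every other feasible selection. The function names are hypothetical.

```python
from itertools import product

PROFITS = [16, 19, 23, 28]
WEIGHTS = [2, 3, 4, 3]
CAP = 7

def is_feasible(selection):
    """Cheap: one pass over the items."""
    return sum(w * x for w, x in zip(WEIGHTS, selection)) <= CAP

def profit(selection):
    return sum(p * x for p, x in zip(PROFITS, selection))

def is_optimal(selection):
    """Expensive (naive version): no feasible selection among all 2^n is strictly better."""
    if not is_feasible(selection):
        return False
    return all(
        profit(other) <= profit(selection)
        for other in product((0, 1), repeat=len(PROFITS))
        if is_feasible(other)
    )

print(is_feasible((1, 1, 0, 0)), is_optimal((1, 1, 0, 0)))  # True False
print(is_feasible((0, 0, 1, 1)), is_optimal((0, 0, 1, 1)))  # True True
```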

Automated Test Generation

Parameterized Testing

Let's say we'd like to test an application, toggling OS, browser, CPU, etc. The number of combinations grows exponentially with the number of parameters, so writing a test for each one would be incredibly cumbersome.

Empirical observation: most software failures are due to interactions between a small number of parameters. The takeaway is that most bugs will be exposed by pairwise coverage of parameters. Now we have a covering array problem:

In a system with five parameters and two options each, we can cover every pairwise combination of parameter values in 6 tests, rather than the 32 needed by brute force. For a system with ten such parameters, 7 tests are enough for pairwise coverage, against 1,024 by brute force!
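For the five-parameter case, one possible 6-test suite can be written down and checked directly. The suite below is one example constructed for these notes, not necessarily the one shown in the lecture, and uncovered_pairs is a hypothetical helper name.

```python
from itertools import combinations, product

NUM_PARAMS = 5

# Each row is one test; each column is one two-valued parameter (0 or 1).
TESTS = [
    (0, 0, 0, 0, 0),
    (1, 1, 1, 1, 1),
    (1, 1, 1, 0, 0),
    (1, 0, 0, 1, 1),
    (0, 1, 0, 1, 0),
    (0, 0, 1, 0, 1),
]

def uncovered_pairs(tests, num_params, values=(0, 1)):
    """List every (param_a, param_b, value pair) that no test exercises."""
    missing = []
    for a, b in combinations(range(num_params), 2):
        seen = {(t[a], t[b]) for t in tests}
        for combo in product(values, repeat=2):
            if combo not in seen:
                missing.append((a, b, combo))
    return missing

print(len(TESTS), uncovered_pairs(TESTS, NUM_PARAMS))  # 6 []
```

An empty list means every pair of parameters takes all four value combinations somewhere in the suite.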

Industry Perspective

Different stages of analytics: