The Relevance of Computational Learning Theory
The field of Computational Learning Theory is deeply foundational. Broadly it fits into Machine Learning, but it is well connected to a few old-school philosophical questions, some of which have been explored by Scott Aaronson in Section 7 of his paper "Why Philosophers Should Care About Computational Complexity", as well as by Ronald de Wolf in his thesis, "Philosophical Applications of Computational Learning Theory", and further by Leslie Valiant in his recent book, "Probably Approximately Correct". Here, I'll give a brief overview of why I personally find the subject to be intellectually irresistible, and why it's critical to our overall scientific understanding as artificial intelligence becomes a staple of society.
The story basically goes like this:
- Assumption: Computation is a reasonable model for intelligence.
- Church-Turing thesis: Defines the notion of universal computation and introduces the field of Computability, the study of what is computable, period.
- Computational Complexity: Studies what is computable with respect to various constraints, like time and space (P vs. NP and friends). Complexity gives us a more realistic notion of what is computable, given that we exist in time and in space.
- Computational Learning Theory: The study of what is theoretically learnable by an agent given the constraints of computational complexity. The fundamental question is: what behaviors and concepts are learnable, given some amount of resources, like time, space, and a learning budget?
The Central Idea
The goal is to prove things about formal computational models of intelligence. If our models are accurate, then these proofs offer genuine insight into the nature of intelligent behavior. The model that typically arises in learning theory is the same basic model of supervised machine learning: First, there is a training period, where the learner is taught by being shown labeled examples. This proceeds via the repetition of the following two steps: (1) the teacher presents the learner with an object, and (2) the teacher announces the object's label.
At some point, training stops, and the learner is no longer given supervision. The learner then tests its knowledge by guessing labels on new examples, again in two steps: (1) the learner is presented with a new object, and (2) the learner guesses its label. The goal of learning is for the learner to accurately predict the labels of any object it might receive.
Step (2) of Testing is the critical piece. What we really want is to determine whether or not the learner will guess the label correctly. Specifically, we care not just about how it does on any one object it grabs, but on any object it might grab from nature.
The underlying question is this: if you have a teacher that can only point to stuff and give a single word to describe it, what can be learned (in principle)? Which concepts (formally, an object's class) can be learned such that, after a not-totally-ridiculous amount of training time, we can be confident the learner will guess future labels correctly?
It turns out that in the same way Computability lets us say the Halting Problem is uncomputable, and Complexity lets us say that solving for optimal behavior in Chess requires more time and space than we have available, Learning Theory lets us say that learning certain concepts requires far too much training time, and thus those concepts aren't learnable in the model posed above.
Imagine you have a friend visit from a foreign country. You try to teach them the names of fruits. First, you point to a banana and say "banana". Now, from the friend's perspective, it might not be obvious that the property you're identifying is the type of fruit. Consider that the same label could have been "yellow" (you could have been teaching them colors instead).
Then you point to a second, different banana, and say "banana".
But perhaps both bananas you pointed to were a bit old and brown - you still might be teaching your friend a concept like, "brown", or "rotten".
Eventually, after enough pointing, you could teach your friend the concept "banana" such that, if they're presented with a banana that you never showed them, they can correctly label it as a banana. This inductive inference is the heart of learning: generalizing beyond the specific instances about which you were taught.
The typical model of learning theory says that we have a (slightly boring) fixed teacher: the teacher has a giant (possibly infinite) basket full of objects of a particular type. They're going to pull out an object at random and show it to you, and tell you the label. Then, they'll put it back into the basket.
The question of interest is: for that basket of objects, for the concept they're teaching you about those objects, how many examples do you need to see in order to guarantee that you'll almost always correctly guess the label for future objects pulled from that basket?
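For finite collections of candidate concepts, this question has a clean classical answer: a learner that finds any hypothesis consistent with at least (1/ε)(ln|H| + ln(1/δ)) random examples will, with probability at least 1 − δ, mislabel a future object pulled from the basket with probability at most ε. Here is a small sketch of that sample-complexity bound (the function name is mine):

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Number of random labeled examples sufficient for a consistent
    learner over a finite hypothesis class to achieve, with probability
    at least 1 - delta, error at most epsilon on future draws.
    Implements m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil(
        (math.log(hypothesis_count) + math.log(1 / delta)) / epsilon
    )

# e.g. a million candidate concepts, 5% error, 99% confidence:
print(pac_sample_bound(10**6, 0.05, 0.01))  # → 369
```

Notice the pleasant scaling: the bound grows only logarithmically in the number of candidate concepts, so even a million-hypothesis class needs just a few hundred examples - but it grows like 1/ε, so demanding very high accuracy gets expensive fast.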
There are of course more general models, like reinforcement learners (my research area), which I'll go into another time.
Personally, I'm attracted to the field for two reasons:
- Philosophical and Scientific Understanding: Inquiries central to learning theory and related fields give us insight into the limitations of intelligence, including the problem of induction and Occam's razor.
- Applications: I'm biased on this one, but I think Artificial Intelligence will be immensely transformative to the structure of life on Earth.
To elaborate: some folks consider AI to be a path to Utopia or to Extinction, often with little room in the middle (and in the surprisingly near future). For instance, Nick Bostrom's answer to the 2009 Edge question "What Will Change Everything?" gives a summary of the arguments later presented in his 2014 book, Superintelligence.
My personal take on the future of AI is presented in detail in my research statement (forthcoming). In short: I'm optimistic that AI presents a legitimate path toward ensuring basic rights like healthcare, medicine, nutritious food, clean water, and education for our global community.
A big part of my view is that we must understand the theory underlying artificial intelligence in gory detail, from a principled mathematical and scientific perspective. Think about it like a (really powerful) hammer. If you go to build a birdhouse, you'd like a hammer whose mechanics you understand. You want to hit certain nails and not others, and that's about it. If you had a hammer that occasionally missed by even several centimeters, chosen at random, it would be pretty devastating to your construction of a birdhouse. AI is much more powerful than a hammer, and so the depth of understanding required is far greater. We must understand our tools, especially when they're powerful. Of course, understanding can be achieved through means other than mathematical proofs about computational models - well-conducted empirical investigation yields understanding, too.
Computational Learning Theory provides a rich framework for investigating the mathematics underlying learning, resulting in understanding of physical learning systems insofar as they resemble the abstract models we work with. Using a richer learning model like that offered by reinforcement learning, we can extend this understanding to agents that have to make multiple decisions in sequence to solve more complex tasks. More on that another time.
Using a framework like learning theory, we can home in on the basic principles of learning. Coupled with rigorous and reproducible empirical evaluation, we can isolate which assumptions are realistic and enable physically instantiated systems to learn effectively. We can make intelligence as well understood as a hammer, so when we apply it to build complicated (metaphorical) birdhouses, we only hit the (metaphorical) nails we want to hit. For a more technical survey, see The Computational Complexity of Machine Learning by Michael Kearns.