The cerebral cortex (or neocortex) in an adult human can be visualized as a crumpled sheet about the size of a large dinner napkin (see Figure 1). The neocortex has a highly regular structure, at least when compared with other, older parts of the brain. It is as though nature stumbled on the design for a general-purpose modular memory and then mass produced it up to some limit imposed by weight and energy usage, considerations familiar to anyone designing mobile computing devices.
Figure 1: A cross section of the primate brain showing the folded cortical sheet
The basic repeated structure, identified by Vernon Mountcastle, is the cortical column (see Figure 2), a configuration of neurons and associated supporting cells totaling on the order of 10^{4} cells. Sensory information is mapped onto the neocortex from receptive fields at the periphery of the body; importantly, these mappings preserve spatial and locality relationships in the sensory data. The retinotopic mappings discovered by Hubel and Wiesel are perhaps the best known, but there are analogous maps for other sensory modalities. Similar structure-preserving mappings connect different areas of the cortex so that the spatial relationships characterizing the world we perceive are represented at multiple levels of resolution and abstraction.
Figure 2: A depiction of a cortical column. The red cells are pyramidal neurons.
The cortex functions as a very robust associative memory and computing device. It integrates data from different senses, different times and different resolutions, allowing us to identify correlations across time and across sensory modalities. Not only does the cortex enable us to robustly recognize patterns against noisy backgrounds, but we can do so even if we have never encountered a given pattern at a particular scale, orientation or brightness, or in the midst of partially occluding distractions. The cortex continuously predicts what we are likely to perceive next and warns us if events run counter to expectations.
The primary technology goals associated with this project involve automated control, prediction and planning, with an emphasis on applications in mobile robotics. Recent research in cognitive and computational neuroscience is yielding insight not only into the form and function of the human brain but also into how to build systems that are well suited to the applications we are most interested in. David Mumford in the Applied Math Department at Brown University has been working on cortical modeling for more than a decade, and his 2003 paper with Tai Sing Lee, one of his graduate students now on the faculty at CMU, provides an elegant mathematical model of the visual cortex and a starting point for our work. Their approach borrows from Ulf Grenander's Pattern Theory, which describes physical phenomena in terms of generative statistical models.^{1} Grenander's pattern theory emphasizes the compositional properties of naturally occurring patterns and the computational benefits of hierarchical representations.^{2} The Lee and Mumford model is the starting point for our research in building computational models of the neocortex [1].
Lee and Mumford cast their computational theory in terms of graphical models, which provide compact descriptions for joint probability distributions ranging over large numbers of random variables.^{3} Figure 3 shows the first four regions of the visual cortex -- V1, V2, V4 and the inferotemporal cortex (IT) -- illustrated, on the left, as a stack of increasingly abstract visual features and their associated processing units, and, on the right, as they are arranged on the cortical sheet.
Figure 3: The first four processing regions of the visual cortex
Figure 4 shows a fragment of the graphical model proposed by Lee and Mumford in which the random variables representing the cortical regions (the x_{i}'s) are shown as boxes and the arrows represent dependencies between variables. The activity in the ith region is influenced by bottom-up feed-forward data x_{i-1} and top-down probabilistic priors P(x_{i}|x_{i+1}) representing the feedback from region i + 1.
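The interplay of bottom-up evidence and top-down priors can be illustrated with a small numerical sketch. This is not the Lee-Mumford model itself; the two hypothesis states and all of the probabilities below are hypothetical, chosen only to show how a region's belief combines the two influences via Bayes' rule:

```python
# Minimal sketch: a region i combines a top-down prior (feedback from
# region i+1) with a bottom-up likelihood (feed-forward data from i-1).
# States and numbers are illustrative, not taken from the model.

def posterior(prior, likelihood):
    """Combine a top-down prior P(x_i) with a bottom-up likelihood
    P(x_{i-1} | x_i) via Bayes' rule, then normalize."""
    unnorm = {s: prior[s] * likelihood[s] for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Top-down: region i+1 expects an "edge" feature in this receptive field.
prior = {"edge": 0.7, "blank": 0.3}
# Bottom-up: likelihood of the observed input under each hypothesis.
likelihood = {"edge": 0.4, "blank": 0.1}

belief = posterior(prior, likelihood)  # posterior belief at region i
```

Even this toy version exhibits the qualitative behavior described above: strong top-down expectations can sustain a hypothesis under weak evidence, and surprising evidence shifts the belief away from what was predicted.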
Figure 4: A portion of a graphical model of the visual cortex
While there are many inference algorithms for graphical models, none appears equal to the task of learning parameters and absorbing evidence on the scale of the primate visual cortex. In [2] we present a class of generative models called pyramid-graph Bayes networks (see Figure 5), well suited to modeling perceptual processes, together with an algorithm for learning their parameters that promises to scale to very large models. The models are hierarchical, composed of multiple levels, and allow input only at the lowest level, the base of the hierarchy. Connections within a level are generally local and may or may not be directed. Connections between levels are directed and generally do not span multiple levels. The learning algorithm falls within the general family of expectation-maximization algorithms.^{4} Parameter estimation proceeds level-by-level, starting with components in the lowest level and moving up the hierarchy.
Figure 5: A pyramid-graph Bayes network |
Once the parameters for the components in a given level have been learned, they are permanently fixed and never revisited for the purposes of learning. These parameters do, however, play an important role in learning the parameters for higher-level components by helping to generate the samples used in subsequent parameter estimation. Within levels, learning is decomposed into many local subproblems, suggesting a straightforward parallel implementation. The inference required for learning is carried out by local message passing, and the arrangement of connections within the underlying networks is designed to facilitate this method of inference. Learning is unsupervised but can be easily adapted to accommodate labeled data.
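The level-by-level schedule can be sketched in a few lines. The "levels" below are deliberately trivial (a single Bernoulli parameter each, fit by a closed-form maximum-likelihood step rather than full EM), so everything in this sketch is a hypothetical stand-in for the actual pyramid-graph components; what it does show faithfully is the schedule itself: fit a level, freeze its parameters, then use the frozen level to generate the samples that train the level above:

```python
import random
random.seed(0)

# Toy stand-in for level-by-level parameter estimation. Each "level" is a
# single Bernoulli parameter; real pyramid-graph components are far richer.

def fit_level(samples):
    """Estimate this level's Bernoulli parameter from binary samples
    (a closed-form stand-in for the EM-based fitting step)."""
    return sum(samples) / len(samples)

def generate(theta, n):
    """Use the now-frozen parameter to generate samples for the next level."""
    return [1 if random.random() < theta else 0 for _ in range(n)]

data = [1, 1, 0, 1, 0, 1, 1, 1]  # level 0: observed binary input
params = []
samples = data
for level in range(3):
    theta = fit_level(samples)       # learn this level's parameter...
    params.append(theta)             # ...then permanently fix it
    samples = generate(theta, 1000)  # ...and drive the next level's training
```

Because each level is fit against samples rather than against the levels above it, the per-level subproblems are independent of one another within a level, which is what makes the parallel decomposition straightforward.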
A subsequent paper [3] describes refinements on the basic algorithm, with an emphasis on those elements that enable the algorithm to be implemented in parallel code running on commonly available computing hardware. The primate cortex consists of approximately 10^{11} sparsely connected neurons. Even if we assign only one variable to model each columnar structure (admittedly a rather crude approximation), we would end up with a network consisting of more than 10^{7} variables. Our immediate goal is to implement a version of the model on a 100-processor compute cluster. By assigning 100-1000 variables per processor we expect to be able to produce a working model of 10^{4}-10^{5} columnar structures with an inference cycle on the order of a few milliseconds; that's only one thousandth to one hundredth of a primate cortex, but enough to perform some interesting experiments.
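The scale estimates above can be checked with simple arithmetic:

```python
# Back-of-envelope check of the scaling figures quoted above.
neurons = 10**11            # neurons in the primate cortex
cells_per_column = 10**4    # cells per cortical column (Mountcastle)
columns = neurons // cells_per_column   # variables at one per column

processors = 100
model_lo = processors * 100    # 100 variables per processor
model_hi = processors * 1000   # 1000 variables per processor

fraction_lo = model_lo / columns   # fraction of the cortex modeled, low end
fraction_hi = model_hi / columns   # fraction of the cortex modeled, high end
```

The low and high ends come out to 1/1000 and 1/100 of the roughly 10^{7} columnar structures in the cortex, matching the figures in the text.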
[1] Thomas Dean. A computational model of the cerebral cortex. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), pages 938-943, Cambridge, Massachusetts, 2005. MIT Press. (PDF)
[2] Thomas Dean. Hierarchical expectation refinement for learning generative perception models. Technical Report CS-05-13, Brown University Department of Computer Science, 2005. (PDF)
[3] Thomas Dean. Scalable inference in hierarchical generative models. In Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics, 2006. (PDF)
[4] Thomas Dean. Learning and Inference in Hierarchical Generative Models of the Neocortex: Serial and Parallel Implementations in Matlab. (CODE)
[5] Thomas Dean. Hierarchical Bayesian Models of the Primate Visual Cortex. Keynote address at the Ninth International Symposium on Artificial Intelligence and Mathematics, January 2006. (PDF)
^{1} A (good) generative statistical model for a class of naturally occurring patterns enables one to generate instances of the class in accord with the distribution governing the natural phenomenon. For example, a good generative statistical model of snowflakes would generate snowflake patterns in the same statistical distribution found in nature.
^{2} Generative models are often constructed hierarchically in terms of the composition of component models. For example, you might describe an automobile in terms of its engine, chassis, drive train, etc., an engine in terms of its pistons, valves, distributor, carburetor, etc., a carburetor in terms of its choke, throttle, manifold, etc., and so on.
^{3} A graphical model consists of a graph whose vertices correspond to random variables {x_{1},x_{2},...,x_{n}} and whose edges indicate dependencies between pairs of variables. This graph together with a set of conditional probability density functions that quantify the relationships among the variables specify a joint distribution, P(x_{1},x_{2},...,x_{n}), over the random variables.
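A concrete instance of this definition, with three binary variables arranged in a chain x_{1} -> x_{2} -> x_{3} and arbitrary illustrative conditional probabilities:

```python
from itertools import product

# Three binary random variables in a chain x1 -> x2 -> x3.
# The conditional probability tables below are arbitrary illustrations.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_x3_given_x2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}

def joint(x1, x2, x3):
    """P(x1, x2, x3) = P(x1) P(x2|x1) P(x3|x2), as the chain's edges dictate."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

# The factored form still defines a proper joint distribution over
# all 2^3 assignments.
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))
```

The compactness claim is visible even at this scale: the chain needs 1 + 2 + 2 = 5 free parameters, where an unstructured joint over three binary variables would need 2^{3} - 1 = 7, and the gap grows exponentially with the number of variables.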
^{4} Statistical models typically include both observed variables, whose values are assigned according to supplied evidence, and hidden (or latent) variables, which we cannot observe directly but whose values we wish to infer from the supplied evidence. In our models of the neocortex, variables in the input level are observed and all others are hidden. Expectation-maximization algorithms are used to estimate the parameters of statistical models that include hidden variables.
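A minimal worked instance of expectation-maximization, unrelated to the cortex models themselves: a mixture of two Bernoulli components in which the component label is the hidden variable and the binary outcomes are observed. All data and starting values are hypothetical:

```python
# Toy EM for a two-component Bernoulli mixture. The hidden variable is
# which component generated each observation; the observations are 0/1.

def em_bernoulli_mixture(data, theta, iters=50):
    """data: list of 0/1 observations; theta: initial (p1, p2, mixing weight)."""
    p1, p2, w = theta
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point.
        resp = []
        for x in data:
            a = w * (p1 if x else 1 - p1)
            b = (1 - w) * (p2 if x else 1 - p2)
            resp.append(a / (a + b))
        # M-step: re-estimate parameters from the expected assignments.
        n1 = sum(resp)
        p1 = sum(r * x for r, x in zip(resp, data)) / n1
        p2 = sum((1 - r) * x for r, x in zip(resp, data)) / (len(data) - n1)
        w = n1 / len(data)
    return p1, p2, w

p1, p2, w = em_bernoulli_mixture([1, 0, 1, 1, 0, 1, 0, 1], (0.8, 0.3, 0.5))
```

One useful property to note: after every M-step the mixture's marginal mean, w*p1 + (1-w)*p2, exactly matches the empirical mean of the data, which is one way to sanity-check an EM implementation.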