I’ve really enjoyed meeting with all of you and interacting with our invited speakers over the last ten weeks. At times it was exhausting keeping up with my two jobs, but I always felt invigorated following our class discussions and it was fun watching you get excited about the ideas we were exploring. We won’t be meeting formally in the final week of classes; however, I’m happy to arrange to meet with any of you to talk about your projects. I’ve got Monday and Wednesday from 3–6pm blocked off for “office hours.” I thought about having one final class where we could just sit around and talk about what we learned, what we didn’t and what we might do differently the next time the course is offered, but I’m not good about summing things up in neat little speeches or dealing with the awkwardness of last lectures. Instead, I invite you to watch one more video — Richard Feynman’s The Pleasure of Finding Things Out, and I encourage you to send me your reflections on the video, suggestions for how to improve the course, or anything else that’s on your mind.
The Feynman lecture doesn’t have anything to do with computational neuroscience per se, but it is all about doing science and I found it inspiring, educational and generally relevant to the study of neuroscience in ways that I believe you can now better appreciate. If you’re in a hurry, you might want to fast forward through the early parts of the video. The first twenty minutes or so are full of anecdotes from his early life in New York, his undergraduate career at MIT, his work on the Manhattan Project at Los Alamos and his time at Cornell before ending up at Caltech. Feynman talks about how his father — who sold military uniforms for a living — taught him about the natural world by telling him stories about how things work. He also reminisces about how he tried to teach his children and how what worked for his first child, Carl, didn’t work for his adopted daughter Michelle, and how this experience teaching his kids by telling them stories influenced his own teaching style. It wasn’t a style common among physicists at the time, but rather a discursive, conversational approach born of an appreciation of the fact that students think and learn differently.
Again, if you don’t want to watch the whole video — it’s about fifty minutes from start to finish — I recommend watching the segment that starts 27 minutes into the video with an image of Feynman sitting at a table in his home and the caption: “Since 1950 Feynman has been Professor of Theoretical Physics at the California Institute of Technology.” I particularly like his chess-game analogy for how science progresses and how the fundamental principles or “laws” of particle physics evolve as you achieve a deeper understanding of nature. He talks about his frustration over being stuck working out the consequences of a theory. The problem was that they had a good candidate theory but not an efficient algorithm for computing the consequences of the theory so that they could compare the theory’s predictions with experimental results. He describes the effort of building a complex theory in terms of holding it in your mind like a fragile house of cards and attempting to add a new card corresponding to an observation of the target phenomenon or a consequence of the theory.1 It is for this reason, he claims, that it is essential you have uninterrupted periods of time in which to concentrate. To protect his precious time, he admits that he actively encourages the myth that he is irresponsible in order to get out of any administrative work. He realizes this is selfish, but he is not ashamed of pursuing his research.
Feynman describes how a theory — he uses the example of particles called hadrons, which are composed of several different types of quarks — can get more complicated for a while until a general principle is discovered that serves to simplify the theory — if not the calculations needed to make predictions from the theory. His description reminded me of a Radiolab podcast on limits, including the limits of physical endurance, the limits of human memory and, what interested me the most, the limits of science. Hosts Robert Krulwich and Jad Abumrad interviewed Steven Strogatz, a mathematician known for his research on random graphs , and asked him about the limits of science. Strogatz works on the mathematical analysis of networks: social networks, computer networks, gene networks and neural networks are just a few examples of networks he’s written about. He was concerned that perhaps some phenomena that we are investigating may be too complicated to reduce to a simple set of rules and thus, in a practical inferential sense, they may be beyond human understanding. To illustrate his point he introduced two computer scientists, Hod Lipson, a professor at Cornell, and his graduate student Michael Schmidt, who developed a program called Eureqa, which attempts to infer sets of equations from raw data that can be used to predict the behavior of complex systems . Lipson describes how their program was able to come up with the equations of motion for a double pendulum that exhibits chaotic behaviour. The resulting explanation in this case was rather simple even though the behaviour was quite complex.
But then they started a collaboration with a biologist, Gurol Suel, who had compiled data concerning cellular processes, including the availability of nutrients, proteins being transcribed, genes turning on and off, etc. He asked Lipson and Schmidt to see what Eureqa could do with the data. Strogatz relates that Eureqa came up with a set of equations that not only explained the data but was also able to predict the cellular dynamics for cases not covered by the data. The problem is that Suel couldn’t or wouldn’t publish his results because he felt that he didn’t really understand the equations — that is to say he didn’t have any insights besides the equations themselves. Strogatz suggests that this sort of outcome may become more and more common especially as we delve deeper into the messy dynamics of biological systems that evolved through natural selection. Too bad we can’t ask Richard Feynman what he thinks about theories generated by machines or collaborations of humans and machines. Feynman had a deep appreciation for computers as tools for doing science from his time at Los Alamos and subsequent collaborations with Stephen Wolfram at Caltech. But he was always looking for the simplest, most elegant theory that explained the data, and he obviously loved to teach and was famous for rendering complex theories accessible. I would be interested to hear what you think about the prospects for a human-comprehensible theory of the brain, particularly in light of what we learned this quarter from Markus Meister concerning the many varieties of retinal ganglion cells  and from Stephen Smith about the incredible diversity of mammalian synapses . My reading of the history of science suggests that complex phenomena often appear bewildering prior to the development of the necessary abstractions and mathematical tools and I believe that the same will be true of our attempts to understand the brain.
And just as an engineer wouldn’t use quantum electrodynamics to reason about a transistor in an electrical circuit, so too our theories of the brain will involve multiple levels of abstraction to explain behavior in a manner that a human mind can grasp in tractable, bite-size pieces if not all at once.
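Eureqa itself uses a sophisticated evolutionary search, but the basic flavor of inferring a compact equation from raw data can be sketched with ordinary least squares over a library of candidate terms. Everything below is a toy illustration, not Eureqa’s method: the data, the term library, and the 0.1 pruning threshold are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "system": noisy observations of y = 3x^2 - 2x.
x = rng.uniform(-2.0, 2.0, size=200)
y = 3.0 * x**2 - 2.0 * x + 0.01 * rng.standard_normal(200)

# Candidate terms the recovered "theory" may be built from.
library = np.stack([np.ones_like(x), x, x**2, x**3], axis=1)
names = ["1", "x", "x^2", "x^3"]

# Least-squares fit, then drop terms with tiny coefficients so the
# recovered equation stays simple.
coef, *_ = np.linalg.lstsq(library, y, rcond=None)
model = {n: c for n, c in zip(names, coef) if abs(c) > 0.1}
print(model)  # roughly {'x': -2.0, 'x^2': 3.0}
```

Real symbolic-regression systems also search over the structure of the candidate expressions rather than fixing a term library in advance, which is what makes them capable of surprising their users.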
Here is some meta advice for programming and, in particular, for those of you who are using the GPU cluster. This is the sort of thing you might find in the bathroom stalls at Google — or over the urinals if you’re male. Every urinal and every toilet stall has a plastic holder which holds an installment of one of the various one-page bathroom journals; “Testing on the Toilet” and “Learning on the Loo” are two of the most popular. Visitors often ask “Can’t these engineers ever get away from their work?” and the answer is that the toilets are a teaching opportunity too good to be passed up — there are also installments on meditation, brainstorming, eating right, etc.
Even though CNS shelters you from a broad class of GPU coding errors by identifying them at compile time, code running on a GPU is notoriously difficult to debug. For this reason, it is vitally important to write unit tests for every step in the computation and in particular any computation that occurs on the GPU. Engineers at Google are expected to do this and their submitted code listings will be rejected by their peers until their tests have sufficient coverage. I’ve found that academic programmers on the whole are not “into” testing, though the best of them adopt it as a form of programming hygiene which they’ve discovered makes them more productive.
In the case of coding in CNS, you should build your models one layer at a time and one new cell type at a time. Write unit tests that check to make sure the values in cells are what you expect. Don’t take anything for granted. Your unit tests — all of them — should run every time you add to the code and rebuild your models. Make sure you initialize layers and filters — the default is that cells are initialized to zero but this may not be what you want. You should have started by exploring the demos for CNS and for any package you are planning to use. This will enable you to see how Jim organizes his models and the scripts to exercise them. The CNS documentation is pretty good and there’s an email forum you can subscribe to that has some useful posts.
If you do this conscientiously, you will avoid the dilemma of the programmer faced with a bug “somewhere” in a substantial block of code but reluctant to start all over again, which more often than not proves necessary. The best way to teach an engineer to incorporate unit tests into his or her code is to have them experience the frustration of having their code exhibit aberrant, inexplicable behaviour due to other engineers failing to incorporate appropriate tests in code that the first engineer’s code depends on.
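To make the advice concrete: CNS itself is MATLAB-based, so the following Python sketch is only an analogue of the workflow, with a hypothetical `conv_layer` standing in for one layer of a model. The point is the shape of the tests: check a case you can compute by hand, and check a degenerate case where the answer is obvious.

```python
import numpy as np

# Hypothetical stand-in for one layer of a model: a 2-D convolution
# followed by rectification. In a real CNS model this step would run
# on the GPU, which is exactly why it needs a CPU-side check.
def conv_layer(image, kernel):
    h, w = kernel.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
    return np.maximum(out, 0.0)

def test_conv_layer_identity():
    # A 1x1 identity kernel should reproduce the rectified input.
    img = np.array([[1.0, -2.0], [3.0, 4.0]])
    out = conv_layer(img, np.array([[1.0]]))
    assert np.allclose(out, np.maximum(img, 0.0))

def test_conv_layer_known_value():
    # Check one hand-computed value rather than trusting the code:
    # a 2x2 box filter over an all-ones image sums to 4 everywhere.
    out = conv_layer(np.ones((3, 3)), np.ones((2, 2)))
    assert np.allclose(out, 4.0 * np.ones((2, 2)))

test_conv_layer_identity()
test_conv_layer_known_value()
print("all layer tests passed")
```

Run every such test each time you add a layer or cell type; a failing hand-computed case localizes the bug to the step you just added instead of leaving it “somewhere” in the model.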
So take your time and, if you do get stuck with a nasty bug, don’t keep beating yourself up about it — take a break or better yet get some sleep. You won’t want to do this when you’re in the throes of trying to fix a nasty bug, and so make a pact with yourself beforehand to take the rational course of action when you get stuck. If you read enough about neuroscience, you will come to realize some of the limitations of your own wetware — it’s one of the side benefits of studying neuroscience. Since you’re smart, you’ll figure out ways to make your life better that take into account what we know about the brain — these lessons are likely to appear in toilets at Google with the heading “Optimize Your Life.” Best of luck coding.
P.S. I’ve used several of the CNS packages and even helped to develop one of them, but I haven’t used either the convolutional network package (CNPKG) or the Hodgkin-Huxley package (HHPKG). That said I’m happy to provide general advice about using CNS.
P.P.S. Your project writeup should be no shorter than a couple of pages — single-spaced 11 point type — and no longer than a conference paper — eight pages. You should also send me a tar ball containing your code and a description of any “materials” that you used — training data, model parameters, software libraries, etc.
Dileep George, formerly Chief Technology Officer at Numenta and now at his new startup Vicarious Systems, will be joining us on Wednesday. The course calendar for Wednesday includes a pointer to the DiCarlo paper , which we’ve already read but you might want to review in preparation for asking about Vicarious’ technology, and also a technical paper  on cortical circuits by Dileep and Jeff Hawkins. There is also a TED-style talk from the 2011 Singularity Summit featuring Dileep and his partner at Vicarious, Scott Brown.
Here are my transcribed and annotated notes from our class discussion with Bruno Olshausen, including feedback from Bruno on the first draft of these notes. We talked about a lot of topics and not surprisingly the points I wrote down relate to comments that I’m particularly interested in. If there were topics that Bruno touched upon that I missed but you’re interested in, please ask and I’ll do my best to address them. I welcome any insights you’ve puzzled out or papers that you think might be interesting to the class:
The book “Animal Eyes” by Michael Land and Dan Nilsson  as a starting point for a possible seminar course that builds on Land’s exposition of the diversity of biological visual sensory apparatus. Bruno said “Best (science) book I’ve ever read.” Here is the wiki for the seminar that Bruno led last term at Berkeley.
George Sperling — duration of visible persistence is inversely related to stimulus luminance — see references on the neural basis of iconic memory. Bruno said “Sperling did the study showing that the best way to compress video of ASL (american sign language) was to low pass filter and down-sample the video frames to 24 × 16 pixels and play at 15 frames per second .”
Using motion to separate figure from ground — work out of Jitendra Malik’s lab including papers by Pablo Arbelaez, Thomas Brox and Lubomir Bourdev’s poselet work [17, 54, 80]. Bruno mentioned specifically work by Patrik Sundberg  (PDF) who now works at Google.
Karen DeValois on reducing the bit rate when transmitting video of sign language over telephone lines — see also the discussion of image resolution in the Torralba et al  Tiny Images Project. The sign language work was by George Sperling . Karen DeValois showed that when you spatially lowpass-filter the frames in a movie, the content looks much sharper when you actually play it as a video sequence than when you look at each frame individually .
Work by Vijay Balasubramanian and his colleagues on the emergence of collective behaviour in large ensembles of neurons ; Bruno added “He has done work characterizing the information bandwidth of optic nerve, I’m not sure of the exact reference though. It was from about four years ago or so. He has also done some interesting work characterizing statistics of on- and off- contrasts in natural images and relating this to on- and off-cell responses in retina.”
Roland Baddeley — on random filters and kurtosis as a measure or indicator of sparsity. Bruno added that Baddeley “showed that any localized, zero mean filter can give kurtotic response histograms due to variations in contrast in natural scenes .”
Compartmentalized action potentials in the dendritic arbor — local computations carried out independent of what’s going on in the soma — see dendro-dendritic circuits in the olfactory bulb and motor cortex . The best work on this is by Bartlett Mel . Bruno added “that dendro-dendritic interactions take place in amacrine cells in retina and many other places in the nervous system, and especially in insect glomeruli. I don’t have a specific reference for this.”
Manifolds and sparsity — low-dimensional embedding at least locally — see Sam Roweis’ page on locally linear embedding — this is essentially what sparsity induces — as you move around on the surface the non-zero elements change and hence the basis for the local subspace changes — see related work by Hyvärinen et al and Karklin and Lewicki [59, 60, 66]. Bruno said “No reference for this really, it’s just my own idea, and how I think of Aapo’s bubbles work. It is also how I motivate what we are doing with Charles Cadieu’s model .”
Random filters yield an approximately orthogonal basis — at least in the case of convolutional models small filters tend to result in optimal stimuli that capture local image statistics but not so for larger filters . Bruno added “I can’t recall exactly what we were discussing here, but it’s related to Baddeley’s work above, i.e., the filter has to be localized in order for random filters to give kurtotic outputs. Also if you just look at the Fourier transform of a zero mean, random filter you will see peaks in the power spectrum that favor certain orientations. These get accentuated when you do the divisive normalization I believe.”
Finally, Bruno added “Oh, and here’s something  we didn’t discuss but which I learned about in the past couple days, and which may be interesting to your group since you guys are interested in faces (URL). The work won (only) the 2nd place prize in the best illusion contest at VSS this year. It is related to Mike Webster’s face adaptation illusions, but here the effect is much more profound (and freaky).”
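One of Bruno’s points above, Baddeley’s observation that any localized, zero-mean filter produces kurtotic response histograms when contrast varies across scenes, is easy to check numerically. The sketch below stands in for natural images with Gaussian noise patches whose contrast is drawn from a lognormal distribution; the particular distributions and sizes are arbitrary choices made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for natural-image patches: Gaussian noise whose contrast
# (standard deviation) varies from patch to patch, as contrast does
# across natural scenes.
n_patches, patch_len = 20000, 16
contrasts = rng.lognormal(0.0, 0.5, size=n_patches)
patches = rng.standard_normal((n_patches, patch_len)) * contrasts[:, None]

# A random, localized, zero-mean filter.
filt = rng.standard_normal(patch_len)
filt -= filt.mean()

responses = patches @ filt

# Excess kurtosis: 0 for a Gaussian, positive for heavy-tailed
# (sparse-looking) response histograms.
z = (responses - responses.mean()) / responses.std()
excess_kurtosis = np.mean(z**4) - 3.0
print(f"excess kurtosis = {excess_kurtosis:.2f}")
assert excess_kurtosis > 1.0
```

The heavy tails come entirely from the contrast variation: hold contrast fixed and the responses are Gaussian, with excess kurtosis near zero, no matter which random filter you draw.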
I recently stumbled upon the work of Keith Stanovich on rational thinking and intelligence tests. Stanovich is credited with coining the term dysrationalia to characterize the human tendency to think — solve problems and make decisions — relying primarily on (what some would characterize as primitive) cognitive machinery that is quick and facile but not necessarily rational. Curious, since his research addresses the notion of executive function [29, 73, 75, 74], which I encountered back in the Fall preparing for CS379C, I took a look at several of Stanovich’s publications [110, 112, 111], and here are some excerpts that I found interesting enough to write down, starting with a popular piece that he wrote for Scientific American:
From: Rational and Irrational Thought: The Thinking that IQ Tests Miss. Scientific American. Volume 34. November/December 2009 (PDF).
IQ tests do not measure dysrationalia. But as I show in my new book, What Intelligence Tests Miss: The Psychology of Rational Thought, there are ways to measure it and ways to correct it. Decades of research in cognitive psychology have suggested two causes of dysrationalia. One is a processing problem, the other a content problem. Much is known about both of them.
The processing problem comes about because we tend to be cognitive misers. When approaching a problem, we can choose from any of several cognitive mechanisms. Some mechanisms have great computational power, letting us solve many problems with great accuracy, but they are slow, require much concentration and can interfere with other cognitive tasks. Others are comparatively low in computational power, but they are fast, require little concentration and do not interfere with other ongoing cognition. Humans are cognitive misers because our basic tendency is to default to the processing mechanisms that require less computational effort, even if they are less accurate.
The second source of dysrationalia is a content problem. We need to acquire specific knowledge to think and act rationally. Harvard cognitive scientist David Perkins coined the term mindware to refer to the rules, data, procedures, strategies and other cognitive tools (knowledge of probability, logic and scientific inference) that must be retrieved from memory to think rationally. The absence of this knowledge creates a mindware gap — again, something that is not tested on typical intelligence tests. One aspect of mindware is probabilistic thinking, which can be measured.
The modern period of intelligence research was inaugurated by Charles Spearman in a famous paper published in 1904 in the American Journal of Psychology. Spearman found that performance on one cognitive task tends to correlate with performance on other cognitive tasks. He termed this correlation the positive manifold, the belief that all cognitive skills will show substantial correlations with one another. This belief has dominated the field ever since. Yet as research in my lab and elsewhere has shown, rational thinking can be surprisingly dissociated from intelligence. Individuals with high IQs are no less likely to be cognitive misers than those with lower IQs. In a Levesque problem [introduced earlier in the article], for instance (the “Jack is looking at Anne, who is looking at George” problem discussed earlier), high IQ is no guarantee against the tendency to take the easy way out. No matter what their IQ, most people need to be told that fully disjunctive reasoning will be necessary to solve the puzzle, or else they won’t bother to use it. Maggie Toplak of York University in Toronto, West and I have shown that high-IQ people are only slightly more likely to spontaneously adopt disjunctive reasoning in situations that do not explicitly demand it.
For the second source of dysrationalia, mindware deficits, we would expect to see some correlation with intelligence, because gaps in mindware often arise from lack of education, and education tends to be reflected in IQ scores. But the knowledge and thinking styles relevant to dysrationalia are often not picked up until rather late in life. It is quite possible for intelligent people to go through school and never be taught the tools of mindware, such as probabilistic thinking, scientific reasoning, and other strategies measured by the XYZ virus puzzle and the four-card selection task described earlier. When rational thinking is correlated with intelligence, the correlation is usually quite modest. Avoidance of cognitive miserliness has a correlation with IQ in the range of 0.20 to 0.30 (on the scale of correlation coefficients that runs from 0 to 1.0). Sufficient mindware has a similar modest correlation, in the range of 0.25 to 0.35. These correlations allow for substantial discrepancies between intelligence and rationality.
Summarizing the article, he writes:
Traditional IQ tests miss some of the most important aspects of real-world intelligence. It is possible to test high in IQ yet to suffer from the logical-thought defect known as dysrationalia.
One cause of dysrationalia is that people tend to be cognitive misers, meaning that they take the easy way out when trying to solve problems, often leading to solutions that are illogical and wrong.
Another cause of dysrationalia is the mindware gap, which occurs when people lack the specific knowledge, rules and strategies needed to think rationally.
Tests do exist that can measure dysrationalia, and they should be given more often to pick up the deficiencies that IQ tests miss.
From: “What Intelligence Tests Miss: The Psychology of Rational Thought” — Yale University Press, New Haven, CT, 2009.
In short there is considerable agreement that President Bush’s thinking has several problematic aspects: lack of intellectual engagement, cognitive inflexibility, need for closure, belief perseverance, confirmation bias, overconfidence, and insensitivity to inconsistency. These are all cognitive characteristics that have been studied by psychologists and that can be measured with at least some precision. However, they are all examples of thinking styles that are not tapped by IQ tests. — Page 2
For example many largely noncognitive domains such as socioemotional abilities, motivation, empathy and interpersonal skills are almost entirely unassessed by tests of cognitive ability. However these standard critiques of intelligence tests often contain the unstated assumption that although intelligence tests miss certain key noncognitive areas, they encompass most of what is important cognitively. — Page 5
Adaptive behavioral acts, judicious decision making, efficient behavioral regulation, sensible goal prioritization, reflectivity, the proper calibration of evidence — all of the characteristics that are lacking when we call an action foolish, dumb, or stupid — are precisely the characteristics that cognitive scientists study when they study rational thought. — Page 15
The defining feature of Type 1 processing is its autonomy. Type 1 processes are termed autonomous because: 1) their execution is rapid, 2) their execution is mandatory when the triggering stimuli are encountered, 3) they do not put a heavy load on central processing capacity (that is, they do not require conscious attention), 4) they are not dependent on input from high-level control systems, and 5) they can operate in parallel without interfering with each other or with Type 2 processing. Type 1 processing would include behavioral regulation by the emotions; the encapsulated modules for solving specific adaptive problems that have been posited by evolutionary psychologists; processes of implicit learning; and the automatic firing of overlearned associations. Type 1 processing, because of its computational ease, is a common processing default. Type 1 processes are sometimes termed the adaptive unconscious in order to emphasize that Type 1 processes accomplish a host of useful things — face recognition, proprioception, language ambiguity resolution, depth perception, etc. — all of which are beyond our awareness. Heuristic processing is a term often used for Type 1 processing — processing that is fast, automatic, and computationally inexpensive, and that does not engage in extensive analysis of all the possibilities. — Page 22
Note that one of the key roles for Type 2 processing in rational thought involves turning off Type 1 (autonomous) processing when appropriate.
Such processes — face recognition, syntactic processing, detection of gaze direction, kin recognition — are all parts of the machinery of the brain. They are also described as being part of human intelligence. Yet none of these processes are ever tapped on intelligence tests. What is going on here? Is there not a contradiction? In fact, there is not a contradiction at all if we understand that intelligence tests assess only those aspects of cognitive function on which people tend to show large differences. [...] Intelligence tests are a bit like the personal ads in the newspaper — they are about the things that distinguish people, not what makes them similar. That is why the personals contain entries like “enjoy listening to Miles Davis” but not “enjoy drinking when I’m thirsty.” — Page 27
To be rational, a person must have well-calibrated beliefs and must act appropriately on those beliefs to achieve goals — both properties of the reflective mind. The person must, of course, have the algorithmic-level machinery that enables him or her to carry out the actions and to process the environment in a way that enables the correct beliefs to be fixed and the correct actions to be taken. Thus, individual differences in rational thought and action can arise because of individual differences in intelligence (the algorithmic mind) or because of individual differences in thinking dispositions (the reflective mind). — Page 33
Nonetheless, we have consistently found that, even after statistically controlling for intelligence, individual differences on our index of argument-driven processing can be predicted by a variety of thinking dispositions, including: measures of dogmatism and absolutism; categorical thinking; flexible thinking; belief identification; counterfactual thinking; superstitious thinking; and actively open-minded thinking. It is likewise with other aspects of rational thinking. For example, researchers have studied situations where people display a particular type of irrational judgements — they are overly influenced by vivid but unrepresentative personal and testimonial evidence and are under-influenced by more representative and statistical evidence. We have studied a variety of situations in my own laboratory and have consistently found that dispositions toward actively open-minded thinking are consistently associated with reliance on the statistical evidence rather than the testimonial evidence. — Page 36
Note that to be effective the disposition to use statistical evidence requires an algorithmic and perhaps autonomous (Type 1) level of processing beyond many individuals’ cognitive capacity and so the former may be necessary but not sufficient for a given level of performance — see Tversky and Kahneman .
In an important study, Angela Duckworth and Martin Seligman found that the grade point averages of a group of eighth graders were predicted by a measure of self-discipline (that is, indicators of response regulation and inhibition at the reflective level) after the variance due to intelligence was partialled out. A longitudinal analysis showed that self-discipline was a better predictor of the changes in grade point average across the school year than was intelligence. — Page 37
A while after the Nishimoto et al  paper was published, two of my colleagues at Google sent me links to on-line news and university PR pages that touted the paper as performing some sort of mind-reading — for instance, the article on KurzweilAI entitled “How to make movies of what the brain sees” summarizing the paper in somewhat sensational terms is available here. I responded with the following note displaying some reservations and skepticism regarding the hype surrounding the paper:
I am looking forward to reading the paper. Jack Gallant’s lab has provided some very interesting — if often sensational — results over the years. I’m not too surprised that they are able to find locations in cortex where retinal images are reproduced with enough fidelity that we can recognize the correspondence — that is, the human visual system is capable of detecting signals suggestive of the original video. Back in 1999, another lab at Berkeley demonstrated that you could show cats images, record from cells in the lateral geniculate nucleus (LGN) and then play the recordings back to produce similar fuzzy reproductions of the image .
Having since read the paper and discussed it with Greg Corrado and Jon Shlens, my general impressions have only been slightly amended. For your benefit, I’ve summarized some of my conclusions below along with some links and references you might find useful or interesting:
The LGN is one of dozens of nuclei in the thalamus, a structure that historically has been ignored, written off as just a relay. The Stanley et al work didn’t really help matters and, to make things worse, it reinforced the antiquated notion of a “Cartesian Theatre” of the mind, complete with homunculus viewer, a picture descended from Descartes that has been roundly rejected by modern neuroscience. Today we have a much more nuanced understanding of the LGN and the thalamus in general, but it still remains largely a mystery.
I’m assuming that the new work from Gallant’s lab records from cortex, but in this case I expect the images were reproduced in the early visual cortex — the striate cortex (V1) or one of the adjoining areas which are also retinotopically mapped. That we can use fMRI to “read brains” as it were is not new. Back in 2008 there were two papers in major journals — one of them from Gallant’s lab and the other from Tom Mitchell and his team at CMU — that showed you can train a classifier to read from the fMRI data, and that you could even train on one brain and read from another though with less accuracy.2 The two papers [68, 82] were much touted by their respective institutions and written up in the New York Times and talked about on NPR Science Friday.
What is most revealing about these papers is that the useful signals were not found in regions further along the visual pathways but very early in visual processing, or, more generally in the case of the Mitchell et al paper, in areas of cortex closer to the periphery — where incoming axons are mapped topographically to reflect their spatial relationships in our peripheral sensory apparatus. In hindsight this is not surprising since the primary sensory cortex and V1 in particular tend to be mapped out similarly across species: cats, macaque monkeys and other primates. All the useful cross-individual signal was located in primary sensory cortex, since there is a great deal of variation in the “higher” or more abstract association areas across individuals.
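In caricature, the decoding in those 2008 papers is ordinary supervised learning on voxel activation vectors. The sketch below uses synthetic “voxel” data (two stimulus classes with different mean activation patterns plus noise) and a nearest-mean decoder; none of the dimensions, noise levels, or structure resemble real fMRI data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "voxels": two stimulus classes evoke slightly different
# mean activation patterns, corrupted by noise on every trial.
n_voxels, n_trials = 50, 200
pattern_a = rng.standard_normal(n_voxels)
pattern_b = rng.standard_normal(n_voxels)
labels = rng.integers(0, 2, size=n_trials)
X = (np.where(labels[:, None] == 0, pattern_a, pattern_b)
     + 3.0 * rng.standard_normal((n_trials, n_voxels)))

# Nearest-mean decoder: estimate each class's mean pattern on half
# the trials, classify the held-out half by distance to the means.
train, test = np.arange(0, 100), np.arange(100, 200)
mu = [X[train][labels[train] == k].mean(axis=0) for k in (0, 1)]
pred = np.argmin([np.linalg.norm(X[test] - m, axis=1) for m in mu],
                 axis=0)
acc = np.mean(pred == labels[test])
print(f"decoding accuracy: {acc:.2f}")  # well above chance (0.5)
```

Even this crude decoder beats chance comfortably when the class patterns are reliable, which is roughly why the early, topographically mapped sensory areas, with their stable patterns, carried most of the usable signal.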
Nishimoto et al  describe a method for learning a model which enables them to recover video imagery from BOLD (blood oxygen level-dependent) signals derived from fMRI data of subjects watching videos. BOLD signal processing has become an important technique for cognitive neuroscientists 3 but has limitations in resolving power; most current fMRI techniques are limited by a temporal resolution on the order of a second or two and a spatial resolution of 1–2 millimeters. Nishimoto et al compensate for the relatively slow-changing BOLD signals by learning a model that exploits not only data originating in the ventral visual stream, including the ventral V2 and ventral V3 areas — receiving input from striate (V1) cortex and projecting to the inferior temporal cortex, but also from the dorsal stream, including the dorsal V2 and dorsal V3 areas — which also receive input from V1 but project to the posterior parietal cortex.
The traditional what-versus-where division into ventral and dorsal pathways is controversial and gradually giving way to a more integrated model, but there appears to be some truth to the idea that the dorsal pathway bears much of the burden for encoding motion. While it seems likely that motion is most fully integrated in the middle temporal (MT) and medial superior temporal (MST) areas, a significant fraction of neurons earlier in the visual stream are responsive to motion.
Nishimoto et al develop a predictive model that makes use of the motion-energy models developed by Adelson and Bergen4 and Watson and Ahumada5 — both published in 1985 in the same issue of the Journal of the Optical Society of America. In particular, they pass luminance data derived from the video stimuli “through a bank of 6,555 spatiotemporal Gabor filters differing in position, orientation, direction, spatial and temporal frequency”, calculate the motion energy by squaring and summing the outputs of quadrature pairs of Gabor filters, and then pass the results “through a set of hemodynamic filters fit separately to each voxel” of the fMRI data. This looks very much like the Adelson and Bergen model, and the supplemental materials provide more detail. The novel idea in this paper involves using this motion-energy model to predict the BOLD signals. Additionally, they report that “[p]rojecting the optimal speed of the voxels onto a flattened map of the cortical surface revealed a significant positive correlation between eccentricity and optimal speed: relatively more peripheral voxels were tuned for relatively higher speeds.”
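To see how a quadrature pair yields a phase-invariant motion-energy signal, here is a minimal sketch in the spirit of the Adelson and Bergen model: one spatial dimension, a single space-time Gabor pair. The filter parameters are illustrative, not the 6,555-filter bank the paper uses.

```python
import numpy as np

def motion_energy(stimulus, fx, ft, sigma_x=2.0, sigma_t=2.0):
    """Phase-invariant motion energy for a 1D space-time stimulus:
    square and sum the responses of a quadrature (cosine/sine) pair of
    spatiotemporal Gabor filters tuned to frequency (fx, ft)."""
    T, X = stimulus.shape
    t = np.arange(T) - T // 2
    x = np.arange(X) - X // 2
    tt, xx = np.meshgrid(t, x, indexing="ij")
    envelope = np.exp(-xx**2 / (2 * sigma_x**2) - tt**2 / (2 * sigma_t**2))
    phase = 2 * np.pi * (fx * xx + ft * tt)
    r_even = np.sum(envelope * np.cos(phase) * stimulus)  # cosine-phase filter
    r_odd = np.sum(envelope * np.sin(phase) * stimulus)   # sine-phase filter
    return r_even**2 + r_odd**2

# Drifting gratings moving in the filter's preferred and opposite directions.
T, X = 32, 32
tt, xx = np.meshgrid(np.arange(T) - T // 2, np.arange(X) - X // 2, indexing="ij")
pref = np.cos(2 * np.pi * (0.1 * xx + 0.1 * tt))
null = np.cos(2 * np.pi * (0.1 * xx - 0.1 * tt))
e_pref, e_null = motion_energy(pref, 0.1, 0.1), motion_energy(null, 0.1, 0.1)
```

Because the two filters differ only in phase, squaring and summing their responses makes the output insensitive to the exact position of the grating while remaining strongly direction-selective.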
It is interesting that they were able to achieve the level of fidelity evident in the reconstructions. That said, this paper leaves most of the important questions unanswered and should be seen primarily as a tantalizing suggestion that future non-invasive technologies for recording from the brain may be able to reveal much more than previously thought possible. It also attests to the value of the motion-energy models and suggests that we might avoid the difficulties of learning space-time filters by using a variant of the Adelson and Bergen model.
Ben Poole mentioned that “there’s been a bit of work on average models but I don’t think they work out that well due to large between-subject variability. There is some neat work on ‘hyper-alignment’ where they leverage other subjects’ data for prediction, but they’re still estimating parameters for each subject” — see here. We also talked briefly about methods that combine semantic and low-level visual features by recording from the relevant areas of cortex, and I mentioned the work of Naselaris et al — also out of Gallant’s lab — which combines fMRI signals from early and anterior visual areas and also explores the use of different priors in a Bayesian decoder that attempts to reconstruct complex natural images.
Near the end of class, I mentioned Greg Egan’s Zendegi, which Steven Smith recommended and which introduces the idea of “side-loading,” defined as “the process of training a neural network to mimic a particular organic brain, based on a rich set of non-intrusive scans of the brain in action.” This might turn out to be an example of science fiction inspiring real science, as in the case of Arthur C. Clarke’s early prediction of geostationary satellites. You might also be interested in the virtual brain project as an alternative approach to modeling the whole brain, but with a more aggressive timeline, in large part because it relies on existing off-the-shelf technologies. The project has recently received a large infusion of funding, which makes it all the more attractive.
I mentioned that the idea of programs that write other programs is quite common. So common that one of the earliest computer languages (Lisp) had this capability as a key characteristic. In the first chapter of an introductory text I gave a very simple example:
ls | sed 's/\.html$//' | awk '{print "mv " $1 ".html " $1 ".htm"}' | sh

It is simple but idiomatic: the pipeline writes a program — a sequence of mv commands renaming .html files to .htm — and hands it to the shell to execute. Many tools used by software engineers involve meta programming. Perhaps the most commonly used class of meta programs is compilers, which take as input a program in one language — such as Java or C++ — and generate another program in a second language — usually assembly language — for a particular target machine. There is also a class of auto-configuration tools that analyze a computer to determine the characteristics of its hardware and installed software and then generate another program — the Makefile — which is used to build an executable for a particular application.
Meta programming has a long history in computer science and artificial intelligence. John Conway’s Game of Life is in fact a program for generating other programs through simulated natural selection. John von Neumann was interested in automata capable of reproducing themselves. Several scientists, including Marvin Minsky and Douglas Hofstadter, have characterized intelligence in terms of meta programming, with the deeper the recursion — for instance, meta meta meta ... programming, referring to the ability to write programs that write programs that write programs ... — the more capable and potentially intelligent the programmer.
At first blush, it may seem complicated to build such programs but it is really quite natural for programmers; once you have a tool that implements one meta program, it is easy enough to use that tool as part of another meta program and in this way the programmer extends the recursion one level deeper.
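A toy illustration of extending the recursion one level (in Python here rather than Lisp; the function names are made up for the example): a program that writes the source text of another program and compiles it on the fly, which is then itself used by a program one level up.

```python
def make_power_fn(n):
    """A meta program: write the source code of a new function as a
    string, compile it with exec, and return the resulting function."""
    name = f"power_{n}"
    src = f"def {name}(x):\n    return x ** {n}\n"
    namespace = {}
    exec(src, namespace)  # 'run' the generated program text
    return namespace[name]

def make_power_table(ns):
    """One level deeper: a meta meta program that uses the generator
    above to produce a whole family of generated functions."""
    return {n: make_power_fn(n) for n in ns}

cube = make_power_fn(3)
table = make_power_table([1, 2, 3])
```

In Lisp the generated program would be a list rather than a string, which is what makes this style of programming feel so natural in that language.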
What have we learned from the large-scale brain simulation work of Markram, Izhikevich and Modha? In our discussion with Greg Corrado, we focused on how we might evaluate such models, and, for the most part, we failed to come up with useful criteria that we could apply directly to existing models.
From a practical standpoint, we couldn’t think of an application that would demonstrate the advantages of such a model or that we thought might be able to compete with state-of-the-art alternative technologies. In particular, we didn’t see how such models could be used to explain, diagnose or otherwise provide insight into brain disorders, and meaningful side-by-side comparisons of simulated and actual neurons seemed limited by current recording technologies and the logistics of creating a close correspondence between the simulated and biological tissue sample. From a theoretical standpoint, the lack of relevant mathematics limited what we could infer about the observed behavior of large-scale simulations.
Before we dismiss this work out of hand, I think it worth a little more time to reflect on the potential value of large-scale neural simulation work, and, in particular, the sort of models that we have discussed in class.
Simulation as a means of exploring physical systems has a long history. A poor simulation can yield conclusions that have little or no relevance to the modeled system. Even the most accurate simulations leave out details and introduce artifacts. That said, simulations of the weather, protein folding, geology, cosmology, epidemics, and the like have yielded testable hypotheses that have proved enormously useful. In the case of studying the brain, we are not able to record simultaneously from large numbers of neurons or perform multiple trials starting from the same initial conditions.
We talked about the need for better mathematical foundations and the work of Bill Bialek, Nancy Kopell and other physicists and mathematicians who are interested in understanding the dynamics of large ensembles of neurons. But mathematicians need data, or at least interesting observations, to guide their investigations and the development of new mathematics. The algorithm that Brin and Page developed for computing PageRank relied on simulating users surfing the web; Jon Kleinberg’s work relied on spectral analysis and more sophisticated mathematics but arrived at roughly the same conclusions. Much of the work on random graphs to model networks started with clumsy simulations. Whole-brain simulations could provide a similar stepping stone to developing mathematical models.
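The random-surfer idea behind PageRank is easy to collapse into a few lines of linear algebra. Here is a hedged sketch — the standard power-iteration formulation, not Brin and Page’s actual implementation:

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an adjacency matrix: the stationary
    distribution of a 'random surfer' who follows an outgoing link with
    probability `damping` and jumps to a uniformly random page otherwise."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # row-stochastic transition matrix; dangling pages link to everyone
    transition = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * rank @ transition
    return rank

# Tiny web: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
adj = np.array([[0.0, 1.0, 1.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
rank = pagerank(adj)
```

The point for our discussion is that the clumsy simulation (many surfers clicking at random) and the clean mathematics (the stationary distribution of a Markov chain) agree, and the simulation came first.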
I didn’t trot out Modha’s work as a whipping boy. I actually believe that he has made an important contribution, both in providing another indication of what is possible and in extrapolating from his results to show what might be possible as computing technology relentlessly scales according to Moore’s law. Science often advances significantly when something formerly thought to be impossible or impractical is concretely demonstrated to be feasible. When we observe interesting dynamical-systems behavior in Izhikevich’s or Modha’s simulations, we have to wonder whether we have actually captured an important property of the dynamics of neural systems in our simple models. It may be some time before simulations yield deep insights into biological systems, but I think it is a good thing we are building such models and challenging scientists and mathematicians to critique them.
I noticed that some students in the class don’t know much about artificial intelligence or the test designed by Alan Turing for distinguishing humans from machines. A quick introduction to the subject is available in the Radiolab podcast entitled “Talking to Machines,” which aired last year and has a very different perspective from my book entitled “Talking With Computers.”
In addition to the Turing test, the podcast describes Joseph Weizenbaum’s ELIZA program, which is the basis for the chatbots that have dominated recent Loebner Prize competitions. Weizenbaum died in 2008, but there is an interesting interview with Sherry Turkle, who knew and worked with Weizenbaum when he was at MIT. Turkle’s new book, entitled “Alone Together,” touches on several of the issues raised in her interview.
The podcast also covers humanoid robots vying for realism, such as David Hanson’s human-looking robotic heads with synthetic flesh and dozens of actuators to simulate facial muscles and convey emotional states. Hanson’s robots and others like them are quickly approaching the so-called uncanny valley of realistic animation. Some of the commentary by hosts Robert Krulwich and Jad Abumrad is inane, but that may be just part of their interviewing strategy to get their guests to open up.
Apropos Bob’s comments about the many aspects of vision that DiCarlo’s model ignores, you might want to think about the visual angle spanned by the fovea. It was mentioned in class that your thumb subtends about two degrees when held at arm’s length, and this is a good way of estimating the region of an image in which you have the greatest acuity. If you want a little more detail, check out this page by Bruno Olshausen. And DiCarlo doesn’t begin to address what we are now learning about the varieties of retinal ganglion neurons, and, in particular, the class of RGN with eccentric dendritic arbors that we heard about in Markus Meister’s talk. There are also some subtle issues concerning temporal versus spatial sampling that complicate the picture.
I mentioned in class that deaf students often exhibit learning deficits that have been linked to their early exposure to language — or lack thereof — whether through reading or signing. If you’re interested, check out Harley Hamilton’s  paper on short-term and working memory learning deficits among the deaf. We take for granted the number seven plus or minus two — also the title of one of the most cited papers in psychology — as defining the limits of our working memory, but we are coming to understand that this is not a lower bound and it appears that we have to work to achieve this capacity — work which for many of us happens in the course of learning to speak or sign language.
Dick Lyon’s slides are available here. Dick’s history of scientists contributing to hearing included J.C.R. Licklider who is widely credited with encouraging government investment in the research that led to the internet and many of the advances in computer technology that led to time sharing, WIMP — windows, icons, mouse and pointer — and the personal computer. Hans Weber sent this link to a recent article reporting on the mating sounds of mice which are pitched in the ultrasonic range; you can hear a frequency-adjusted rendition of a wild deer mouse singing in this video.
Since a lot of our class discussions have emphasized the development of new cell imaging and recording technologies, I thought it might be worth mentioning a couple of the technologies that figured prominently in DiCarlo’s analysis of circuits in inferotemporal (IT) cortex. Yamane et al [134, 133] and Tsunoda et al use optical imaging based on intrinsic signals, which is good at visualizing active cortical regions at a spatial resolution of greater than 50 μm. This technique exploits small optical changes associated with metabolic activity. Unfortunately, the signals are somewhat noisy, and thus one typically has to present the target stimuli multiple times, imaging the tissue during each trial and then averaging the results to obtain a more robust signal. As you might guess, temporal resolution suffers in applying this strategy. The alternative is to utilize fast extrinsic probes:
In such experiments the preparation under study is first stained with a suitable voltage-sensitive dye. The dye molecules bind to the external surface of excitable membranes and act as molecular transducers that transform changes in membrane potential into optical signals. The resulting changes in the absorption or the emitted fluorescence occur in microseconds and are linearly correlated with the membrane potential changes of the stained cells. These changes are then monitored with light-measuring devices. By using an array of photo-detectors positioned in the microscope image plane, the electrical activity of many targets can be detected simultaneously [...] The development of suitable voltage-sensitive dyes has been the key to the successful application of optical recording, because different preparations often required dyes with different properties [...]. Optical imaging with voltage-sensitive dyes permits the visualization of cortical activity with a sub-millisecond time resolution and a spatial resolution of 50–100 microns. — see Grinvald et al for more detail (PDF)

Combined with more conventional single-cell recording, newer multi-probe-array technology and micro-stimulation techniques,6 we are beginning to scale up the analysis of circuit-level structures involved in neural coding. The slides discussing work from Manabu Tanifuji’s lab at the RIKEN Brain Science Institute and David Cox’s lab at the Rowland Institute at Harvard are available here — see Slides 46–62.
Stacey asked three questions following class on Monday that I thought others of you might be interested in hearing answers to. My answers (below) are largely self-contained, but I’ve added Stacey’s questions as a footnote7 because they do a good job of framing the issues. Here are my answers:
In answer to (2), something like you mention — using local descriptors based on aggregated gradient information — has been used to represent object parts which are combined to build models of objects as ensembles of parts [43, 41, 42, 40, 39]. You can think of these basic models as the nouns in a scene. These visual nouns are then combined into visual phrases  which are ensembles of part-based models (PDF). Geometry is used to relate parts to one another — the leg is attached to the torso — and objects to one another — the person is sitting on the horse (HTML).
If you’re interested in this idea, then definitely check out Sadeghi et al — not so much for the specific techniques, though their technique has merit, but rather for the discussion of composite (what they call phrasal) features. At first blush, detecting a visual phrase like “a person sitting on a horse” would seem to require recognizing a person and recognizing a horse, and one might even have to consider the cross product of all possible horse detectors and all possible person detectors. However, as the authors point out, the concept of a horse and rider highly constrains how the pose of the rider is related to the pose of the horse. The paper explores just a few of the possible strategies for exploiting the relationship between horse and rider to speed things up and actually improve recognition accuracy over methods that attempt to detect the horse and rider independently.
Regarding (3), only recently has the computer vision community had the computational power at its disposal to exploit temporal coherence in video, and really only large industrial labs with lots of computational resources have the ability to sample anything approaching what a child is exposed to during the first few years of life. We have several projects at Google that are mining YouTube data using unsupervised learning, but there is still a need for some annotated labeling, and the problem is very challenging.
Finally, also with respect to (3), follow the research link on Aude Oliva’s lab page at MIT and scroll down to the section titled “Computational Visual Cognition,” where there is a summary of her work on a whole-image descriptor called the gist, which she developed with Antonio Torralba and later showed could be used to introduce contextual information into object detection and recognition. In particular, see her 2008 paper with Torralba in Trends in Cognitive Sciences (PDF).
In the PASCAL VOC challenge, the classification task is defined as determining whether an object is present in an image without having to specify where, while the detection task is defined as specifying a bounding box containing an instance of the object category. Top-performing entries for the classification task have used so-called bag-of-words models, where in this case the “words” are local visual features like patches of texture. Top-performing entries for the detection task often use parts models of the sort described above, plus the best solution to the classification task to provide context.
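For intuition about what a bag-of-words image representation looks like, here is a minimal sketch. In practice the codebook would be learned by clustering (e.g., k-means) over many training descriptors; here it is hand-specified for illustration, and the data are toy values.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local feature descriptors against a codebook of 'visual
    words' and return a normalized histogram of word counts — a
    whole-image representation that discards spatial layout."""
    # squared distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)  # nearest codeword for each descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Two hand-picked codewords; four local descriptors from a toy 'image'.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[0.1, 0.0], [0.0, 0.2], [0.1, 0.1], [9.9, 10.1]])
hist = bow_histogram(descriptors, codebook)
```

The resulting histogram is what gets fed to a classifier (typically an SVM) in the VOC classification entries; note that, like a bag of words in text retrieval, it says which visual words appear but not where.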
Markus Meister’s lecture at the Allen Institute and the related paper by Kay et al reminded me of the famous paper by Lettvin et al entitled “What the Frog’s Eye Tells the Frog’s Brain.” The Lettvin et al paper is usually required reading for this course, but Meister’s lecture and the publications from his lab cover much of the same territory, separated from the insights of Lettvin and his colleagues by half a century. There are two important points made in each of these two pieces of research: first, that interesting, special-purpose, survival-related processing occurs in the periphery as well as in the brain proper, and, second, that visual processing, the needs of the organism and the environment in which it is situated are inherently ecological in the sense of J. J. Gibson [48, 49].
It is interesting to note that, at least in the retina, the diversity in cell types seems to correspond to a diversity in functional types. The retina is not nearly as simple in its coding of the visual signal as we initially imagined; moreover, the mouse-eye-cortex system resembles the frog-eye-tectum system more than we originally thought. The off-center dendritic arbors in mouse retinal ganglion cells underscore Sebastian Seung’s focus on connectomics — or at least the utility of local connectomes that span a few millimeters — and the functional diversity identified by Meister in retinal cells raises the obvious question regarding the synaptic diversity in the cortex identified by Steven Smith. But notice that it was optogenetics and not slice-scan-and-segment-style connectomics that provided the stunning evidence of the off-center dendritic arbors. It is important to bear in mind that differences in gene expression control not only function in the strict routine-adult-information-processing sense, but also the developmental processes that direct axonal projections to specific postsynaptic locations and the signalling, both during development and in later-stage fine-tuning of connections, that supports retinotopy and other topographic mappings.
In discussing the DiCarlo paper, I mentioned the work of Olshausen and Field [91, 92] and Hyvärinen [57, 58] on sparse coding as a general principle for visual information processing. Horace Barlow — the great-grandson of Charles Darwin — cast visual processing — often referred to as visual coding — in terms of reducing redundancy. He was one of the first neuroscientists to promote studying the statistics of natural scenes and to introduce notions from information theory. Bruno Olshausen and David Field were continuing Barlow’s line of inquiry when they developed their sparse-coding account of visual processing in the early visual pathways. Aapo Hyvärinen has pushed this agenda even further by applying statistical coding theory to learning topographic maps, and his textbook on the subject with Patrik Hoyer is a wonderful introduction to this rich area of research.
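For a concrete sense of what “sparse coding” optimizes, here is a minimal sketch of inferring a sparse code for a fixed dictionary by iterative soft-thresholding (ISTA). In Olshausen and Field’s work the dictionary itself is also learned from natural image patches, which this sketch omits; the parameter values are illustrative.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, iters=200):
    """Minimize 0.5 * ||x - D @ a||^2 + lam * ||a||_1 over codes a by
    iterative soft-thresholding (ISTA). Columns of D are dictionary
    elements; the L1 penalty drives most coefficients to exactly zero."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        z = a - D.T @ (D @ a - x) / L  # gradient step on the squared error
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

# With an orthonormal dictionary the answer is just soft-thresholding of x:
# large coefficients shrink a little, small ones vanish entirely.
D = np.eye(3)
code = sparse_code(np.array([1.0, 0.05, 0.0]), D, lam=0.1)
```

The L1 penalty is what makes the code sparse — most coefficients are exactly zero — which is the property Olshausen and Field argued the early visual system exploits.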
For Monday watch Markus Meister’s video and read the DiCarlo paper available on the class calendar. I want to focus our class discussion on the sort of models that DiCarlo emphasizes in his paper and why these are at least as valuable scientifically as the sort we have looked at in previous classes. Send your questions to Sanjeev (email@example.com) copied to me.
For Wednesday, Dick Lyon will be giving a more conventional lecture which I think you’ll find very interesting. Dick has been working on a new book on hearing and has some really interesting insights into human audition. There will be plenty of opportunities for you to ask questions during the lecture. In addition to knowing about biological hearing, Dick has a lot of experience applying biological principles to build Google-scale tools for analyzing music files, the audio track of YouTube videos, etc. Send your questions to Dan (firstname.lastname@example.org) with a copy to me.
As a little side feature, you might enjoy this Radiolab podcast on music and its relationship to language and emotion. The excerpts featuring science writer Jonah Lehrer and neuroscientist Mark Jude Tramo focus on auditory cortex and how “discordant” tones affect us emotionally.
The Oxford Future of Humanity Institute report entitled “Whole Brain Emulation: A Roadmap” is a little dated, but the document provides a nice overview and is well worth skimming. It categorizes a wide range of models and summarizes the state-of-the-art circa 2007. You can find it here and I think you’ll find it quite readable given what you’ve learned in class so far.
Here’s a pointer to a recent article on ephaptic coupling, in which the extracellular fields generated by ions flowing across the membranes of nearby neurons firing in response to synaptic input feed back to influence other nearby neurons in combination with their ongoing synaptic activity. This paper includes Koch and Markram as co-authors but references other papers that Ed mentioned, including the McCormick work.
The concept of “opto fMRI” and how it might be used to help us understand what BOLD signals actually tell us about synaptic transmission. The application of human opsins for studies involving human subjects in which we want to minimize the chances of immunological rejection. Nancy Kopell’s work on how the number of cell assemblies relates to the information capacity of the networks coordinated by rhythmic oscillations. Friedrich and Laurent on the role of oscillations in the olfactory bulb, and related work by Beshel, Kopell and Kay.
Bob’s question concerning recurrent networks and moving beyond classical Hebbian learning, at which point Ed talked about the role of homeostasis and sleep in managing our finite synaptic capacity and gave an interesting example of how the same network can be activated to fire in a different order in response to similar stimuli. Mechanisms whereby networks composed of neurons with differing cell properties can achieve functional equivalence by modulating ion channel expression and synaptic strength, as in this article by Grashow, Brookings and Marder.
Ed mentioned Anabel — one of his former graduate students — and sharp-wave replay as a possible mechanism for long-term memory consolidation. Greenberg and Werblin’s work on single-neuron computation of the center-surround function. Work by Charles Schroeder and his colleagues on oscillations in V1 and V2 serving an attentional or synchronizing role. Ed suggested that perhaps oscillations can serve as a master controller that reconfigures the system depending on function, even determining how feed-forward bottom-up and feedback top-down influences are mediated.
Yan was intrigued by Ed’s response to my question about new technologies, or scaling old ones, to dramatically speed up the effort to understand the brain at the circuit level. I’ve heard scientists and engineers working on nanotechnology speak about the possibility of using nanobots to either record in vivo and subsequently be harvested from the tissue — even the prospect of continuous recording using some form of noninvasive tomography — or providing the nanobots with some method of communicating with receivers outside the skull. Such musings have always struck me as idle speculation. I did find this article entitled “Why it will take 220 years to monitor every neuron in the brain” and a pointer to the Brain Emulation Roadmap document produced by the Future of Humanity Institute at Oxford. The Roadmap document is a little dated — it was published in 2007 — but interesting to skim.
On a completely different subject, I see my job as creating a rich and exciting learning experience for you. This class is particularly aimed at providing you with direct access to some of the leading scientists in systems neuroscience with the hope that some of you will be so excited by the opportunities in this field that you’ll shape your education at Stanford and fill the ranks of the next generation of leading engineers, scientists and entrepreneurs in this field. Your job is to take advantage of the opportunities to meet with these scientists. A little work digging deeper into the papers, following leads from the recorded lectures, and generating and asking good questions will pay dividends in terms of your retaining what you’ve learned and integrating it usefully with other things that you know.
Ed spends a good portion of his invited talk at Case Western focusing on work in his lab on optogenetics. If you’re not already familiar with this technology, you might want to check out the Wikipedia articles on channelrhodopsins and light-gated ion channels. Think about cases where these methods are an improvement on implanting electrodes for single- and multi-cell recordings, and how the probes that Ed describes, having both light guides and electrical sensing capabilities, might be used. The questions and Ed’s answers at the end of his talk are very interesting; Case Western has a number of good neuroscientists, e.g., Jerry Silver, working in optogenetics and other related technologies that we’ve been discussing in this class.
I listened to a Radiolab podcast on memory and forgetting during my run this morning. Most of what they — Robert Krulwich, Jad Abumrad, Jonah Lehrer, Joseph LeDoux, Elizabeth Loftus, Oliver Sacks — talked about was old news to me, but one of the primary lessons they emphasized bears repeating, namely that memory is a constructive, creative process, and each time that you recall some episode from your life your memory of that episode will be changed, possibly even distorted by the context and emotions in play at the time of recall. The first time this lesson was deeply impressed upon me was in reading Donald Hoffman’s aptly titled “Visual Intelligence: How We Create What We See.” In this book, he describes many of the illusions, pathologies, and everyday examples that characterize a vision system whose foibles we tend to overlook because what — and how — we see determines what we know and remember. I highly recommend this book for anyone interested in human visual perception and visual memory.
Science fiction and science fact are more often than not right on the heels of one another — at least in the hands of the best science-fiction writers. Here are two SF writers who mostly get it right: If you crave a science-fiction account of where Sebastian Seung’s work on the Connectome is taking us, check out Greg Egan’s Zendegi — you can get the Kindle version for one cent on Amazon or read it on your iPhone for the same low price. Apropos our class discussion yesterday, in the first chapter of Accelerando by Charles Stross, we learn early on that researchers in San Diego are uploading lobsters into cyberspace, starting with the stomatogastric ganglion, one neuron at a time, and then a little later we read that the main protagonist, Manfred Macx, has patented the idea of using lobster-derived AI autopilots for spacecraft. Stross is very inventive and this is one of his best books.
For a quick introduction to the real science behind Stross’s interest in lobster neurons, check out this short article on Wikipedia. For a more in-depth discussion, see this Scholarpedia article which provides an overview of the work by Marder that Jon mentioned — Marder, E., Bucher, D. (2001) Central pattern generators and the control of rhythmic movements. Current Biology 11:R986-R996. Check out David McCormick’s lab at Yale for more information on propagation of attenuated signals, and I missed Jon’s reference but there are a lot of papers referring to basket cells sorting / routing based on frequency.
You can find the supplement for Edelman’s “Learning in and from Brain-based Devices” here. This document includes lots of details about the modeled nervous system, including its hippocampal function, learning and synaptic plasticity model, and the inputs and outputs to the robotic hardware. If you’re interested in a project along similar lines, think in terms of a simulated robot unless you already have access to robot hardware and are very familiar with using it — otherwise you’ll very likely spend too much time messing around with finicky hardware that is not directly relevant to your project. If you’re interested in reading the classic papers of Hodgkin and Huxley, you can find PDF versions on this Society for Neuroscience page. The particular paper that Jon mentioned is available here.
Jonathon Shlens who works for me at Google will be joining us for Monday’s discussion. Jon is an accomplished neuroscientist who has worked with E.J. Chichilnisky and Eero Simoncelli among others. His tutorials on information theory, principal components and other topics on the mathematics related to computational neuroscience are renowned for their clarity and insight. He is also a superb software engineer and statistician. You can find pointers to his excellent tutorials and a list of his publications on his Salk Institute home page. He knows a great deal about neural modeling and will help us to critique the Edelman and Izhikevich paper and perhaps revisit some of the claims in Markram’s talk.
In other news, thanks to Rohan’s efforts, a $3,000 gift from Google to Stanford for GPU hardware, and the generosity of Andrew Ng and his students, we will soon have five Linux boxes, each equipped with a powerful GPU, available for your use in Gates B38. If you’re planning to use these machines, please contact Rohan and volunteer to help out. More details to follow soon.
A new fine-scale gene-expression analysis was recently released by the Allen Institute for Brain Science. The analysis covers over 1,000 genes and includes cross-species — human and mouse — genomic comparisons. Wedeen et al provides interesting insights into how the major cortical fiber tracts are arranged geometrically. The authors write in the abstract: “This architecture naturally supports functional spatio-temporal coherence, developmental path-finding, and incremental rewiring with correlated adaptation of structure and function in cerebral plasticity and evolution.”
I was looking for a talk by György Buzsáki and ran across the Allen Institute for Brain Science’s YouTube channel, which offers a wealth of good neuroscience talks. There you can find Buzsáki’s talk entitled “Internally evolving cell assembly sequences in the service of cognition” as well as a talk by Idan Segev on dendritic inhibition — Segev is one of the authors of a paper we read along with Markram’s video. I finally found the reference to Izhikevich and Edelman’s brain-simulation work that I was looking for in Buzsáki’s 2006 book, “Rhythms of the Brain.” The relevant excerpt concerns Izhikevich and Edelman’s account of spontaneous activity and rhythmic patterns in their simulation experiments; I’ve included the excerpt as a footnote.8 You can find a PDF version of the book online if you search for it, but I have no idea whether the document is free of copyright restrictions and so I have not included it here.
The paper by Izhikevich and Edelman  for Monday’s class is linked off the course calendar page as usual. Stacey Svetlichnaya (email@example.com) is the host for Monday’s discussion and so send her your questions with a copy to me. We didn’t get to all of your questions regarding Professor Smith’s work and on Monday we can discuss any of those that were unresolved following our Wednesday discussion with Steven.
One recurring question concerns how we might know whether a complex simulation is doing the “right thing.” As an example of a robotics-based approach to performing such verification, you might check out the “Brain-Based Device for Playing Soccer” project page for a spiking-neuron model that controls a Segway to play soccer. There are additional papers on The Neurosciences Institute publications page that explore devices controlled by biologically-motivated artificial neural circuitry.
Professor Kwabena Boahen joined our class discussion on Wednesday. He is known for his work on neuromorphic circuits for biologically-inspired computing. You might find it interesting to check out Kwabena’s “Googling the Brain on a Chip” video on Stanford’s YouTube channel. For another Stanford professor doing research relevant to the topics of this class, you might also enjoy Jennifer Raymond’s “Building a Circuit-Diagram for the Brain” video which is on the same YouTube channel.
You might have noticed Smith reiterating one of the themes of the first lecture, namely, that there are many different sources of evidence one can bring to bear, and that these sources span several orders of magnitude in terms of the scale at which they operate. I appreciated his suggestion that more careful categorization of synapses would help to resolve disputes — about the response of neurons to stimuli or about results from proteomic assays — that stem from two labs working on different types of neurons, or on the same type of neuron but different types of synapse. The isolation of a single dendrite with its synapses highlighted is quite a dramatic demonstration of just how densely packed the working parts of neurons are — and Smith warns us that the graphics are grossly simplified to make the images comprehensible. Another dramatic visual insight was the imagery around 40 minutes into the video showing Arc gene products, which serve as markers for synaptic plasticity. Listening to Steven’s lecture, it is hard to resist the conclusion that his combination of immunofluorescence for protein tagging and synapse identification and electron microscopy for cell-body segmentation would solve many of the problems Sebastian mentioned in his papers.
As fodder for your summary exercise, check out this Wikipedia article on the Immunofluorescence Protocol and another on the mouse barrel cortex showing the characteristic cell layers in the cortical sheet. Note the use of state-of-the-art machine-learning tools including support vector machines — not so new — and random forests — all the rage and the best-performing model in Smith’s experiments. The important point is that Smith’s lab is using modern ML tools, illustrating one important way that computer science is changing the way we conduct neuroscience. On a different note, here is an fMRI-based approach to mapping functional circuits in cortex which recently received a large infusion of private-sector funding. It is obviously a very different approach from the modeling-individual-neurons efforts that we’ve been discussing in class, but it has the advantage that it is more likely to yield interesting results in the near term — at least it will “fail fast”, since given their goals it should be relatively easy to measure success.
Earlier, for Sebastian’s talk, Sanjeev asked for an elaboration of one of the audience’s questions — somebody was asking whether the connectome information alone would be sufficient. After listening to Steven’s lecture and reading the paper, he asked: “It appears that the proteomic diversity within a synapse is involved in both memory formation and neural disorders. This seems to imply then that the connectome information alone would not be sufficient to ‘read off’ memories or detect connectopathies. Is this correct?” Here is my response:
I don’t think anyone knows. The only support offered by Sebastian is the bird-song example, in which the engram for the song memory is — at least according to Sebastian — highly correlated with a particular graph structure. It is highly speculative to suppose that such a graph structure is either necessary or sufficient for identifying song engrams, much less the particular song of, say, a goldfinch. Once we’ve analyzed a few real connectomes, maybe we’ll discover a rich collection of distinguishable network structures, each of which highly constrains the types of memories it can encode. I’m skeptical that this will enable us to read memories from connectomes, but it is an interesting scientific hypothesis and one worth pursuing.
As for the diversity of synapses, we have a long way to go before we know how much more information labeling the synapses will add to the connectome, at least with respect to our ability to understand the basic computations performed by the brain. Perhaps diversity plays only a supporting role such as making us less vulnerable to viral infections that target specific synaptic proteins. My guess is that the brain makes use of these different signaling pathways for critical computational reasons. But as Steven says at the beginning of his talk, we have to quantify the synaptic diversity before we can consider whether the resulting variation can be collapsed back into a simpler model for explaining neural function.
Project proposals are due by class time on Monday, May 7. Check out the course project page if you haven’t already. If you have any questions about what’s required for a project, please send them along. I welcome first drafts if you want feedback before submitting the final version of your proposal. I’m happy to meet before or after class to discuss as well. Take this seriously: it’s 20% of your grade. Just as coming up with good questions is an important part of doing science, so too is placing your bets on what to work on. Before you start, it is important to have a clear idea of what you want to accomplish, how you’ll know if you’ve succeeded, and a realistic estimate of how much effort is required — and then to update your goals and projected effort as the project unfolds.
The artificial neuron models that you are most likely to have encountered typically involve a simple weighted sum of inputs passed through a threshold or sigmoid function: y_i = σ(∑_j w_{i,j} x_j), where σ(t) = 1 / (1 + e^(−t)). So-called kinetic models such as Hodgkin-Huxley replace these simpler models in more sophisticated simulations such as Markram’s Blue Brain.
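A minimal sketch of this sum-and-squash model in Python (the function names are my own, chosen for illustration):

```python
import math

def sigmoid(t):
    """Logistic squashing function: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def neuron_output(weights, inputs):
    """y_i = sigma(sum_j w_ij * x_j) for a single artificial neuron."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)

# A strongly positive net input drives the output toward 1,
# a strongly negative one toward 0.
print(neuron_output([2.0, -1.0], [1.0, 0.5]))  # sigma(1.5), about 0.82
```

This is all there is to the classic artificial neuron; everything interesting in such networks comes from how the weights are learned and how the units are wired together.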
In a kinetic model, the input to a neuron is described by a current through the cell membrane that occurs when neurotransmitters cause the activation of ion channels in the cell. The model describes changes within the neuron in terms of an electrical circuit in which the biological components are modeled either as a capacitance — the cell membrane — or a conductance — the ion channels. This circuit is modeled as a system of differential equations that is solved to simulate the electrical behavior of the neuron over time. The output of the model is a change in voltage, corresponding to the difference in electrical potential between the inside of the cell and its surroundings, which, under appropriate circumstances, results in a voltage spike that we call an action potential. The basic equations of the model are essentially linear, with additional nonlinear components that act as switches, turning the ion-channel conductances on and off.
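To make the circuit description concrete, here is a sketch of the classic single-compartment Hodgkin-Huxley model integrated with forward Euler. The parameter values are the standard textbook ones (resting potential near −65 mV); this is an illustration of the equations, not anyone’s actual simulation code:

```python
import math

# Classic Hodgkin-Huxley single-compartment model: the membrane is a
# capacitor in parallel with Na+, K+ and leak conductances.
# Units: mV, ms, uA/cm^2, mS/cm^2. Standard textbook parameters.
C, g_Na, g_K, g_L = 1.0, 120.0, 36.0, 0.3
E_Na, E_K, E_L = 50.0, -77.0, -54.4

# Voltage-dependent opening (a_) and closing (b_) rates for the
# gating variables m, h (sodium) and n (potassium).
def a_m(V): return 0.1 * (V + 40) / (1 - math.exp(-(V + 40) / 10))
def b_m(V): return 4.0 * math.exp(-(V + 65) / 18)
def a_h(V): return 0.07 * math.exp(-(V + 65) / 20)
def b_h(V): return 1.0 / (1 + math.exp(-(V + 35) / 10))
def a_n(V): return 0.01 * (V + 55) / (1 - math.exp(-(V + 55) / 10))
def b_n(V): return 0.125 * math.exp(-(V + 65) / 80)

def simulate(I_ext, t_max=50.0, dt=0.01):
    """Forward-Euler integration; returns the membrane-voltage trace."""
    V, m, h, n = -65.0, 0.05, 0.6, 0.32
    trace = []
    for _ in range(int(t_max / dt)):
        # Ionic currents: the nonlinear m^3*h and n^4 gating terms are
        # the "switches" on the otherwise linear circuit equations.
        I_Na = g_Na * m**3 * h * (V - E_Na)
        I_K = g_K * n**4 * (V - E_K)
        I_L = g_L * (V - E_L)
        V += dt * (I_ext - I_Na - I_K - I_L) / C
        m += dt * (a_m(V) * (1 - m) - b_m(V) * m)
        h += dt * (a_h(V) * (1 - h) - b_h(V) * h)
        n += dt * (a_n(V) * (1 - n) - b_n(V) * n)
        trace.append(V)
    return trace

# With enough injected current the model fires action potentials (the
# voltage briefly overshoots 0 mV); with none it sits near rest.
```

Fifty lines suffice for one idealized patch of membrane; the difficulty Markram faces is fitting thousands of such interacting compartments, each with many channel types, to real data.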
More sophisticated models of neurons are able to account for the structure of axons and dendrites. Markram has contributed to the development of a number of such compartmental models and uses them in his cortical-column simulations. Compartmental models describe the dendritic arbor as a branching structure of cylinders and can be used to account for tree topologies with arbitrary branching and cylinder lengths. The response of a neuron to individual neurotransmitters can be modeled as an extension of the classical Hodgkin-Huxley model. Druckmann et al  model AMPA receptors which mediate fast excitatory currents and NMDA receptors which mediate slower currents. For more detail on Hodgkin-Huxley-type models see the tutorial on this EPFL web page.
In class I mentioned the dynamics of ion channels with conformational substates and how the channel has to sequence through the substates in a fixed order as it changes from the “on” state — allowing ions specific to the channel type to pass — to the “off” state — in which it prevents the passage of ions. The saga surrounding this discovery and the key role played by X-ray diffraction analysis spans a fascinating chapter in biophysics and is well worth checking out in detail. Alas, I don’t have a definitive on-line history to recommend, but for a good technical introduction check out the relevant chapters in Bear, Connors and Paradiso  or take a look at this paper as an interesting entry into the literature. For detailed discussion, see Dayan and Abbott  or Kandel, Schwartz and Jessell .
When I suggested to Sebastian that saying most neuroscientists don’t understand mathematical or computational models wasn’t going to win him any friends, and mentioned that the likes of Bruno Olshausen and Mike Lewicki surely understood such models, he countered “you have no idea that computational neuroscience is a vanishingly small part of neuroscience as a whole.” At the annual meeting of the Society for Neuroscience in 2011, there were nearly 35,000 attendees, of whom 25,000 were scientists — see here for the annual meeting attendance statistics. As you might imagine, it is a huge conference with scores of parallel sessions and a good deal of politics that takes place on the side.
I can empathize with Sebastian from my own experience in promoting the use of probability theory in artificial intelligence; today, probability in AI is commonplace, and every modern introductory text has multiple chapters devoted to algorithms and representations that make use of probabilities. But at least my audience in 1985 understood the central role of computation. The audience of peers that Sebastian faces in neuroscience consists primarily of scientists and physicians trained in medicine and the traditional fields of biology. I’m sure that the entrenched interests carefully guard their turf and, in particular, their influence on funding agencies such as NIH. That said, as mentioned in class, there are many working biophysicists — including Henry Markram — who spend their careers developing models like Hodgkin-Huxley and its successors that are indisputably sophisticated mathematical and computational models and, I might add, successful ones, in that they accurately account for biological phenomena, if not the phenomena that are Sebastian’s central concern.
Here’s the note that I sent to Stephen Smith Tuesday in preparation for his visit and class discussion on Wednesday:
Your Scientific Computing and Imaging (SCI) Institute lecture fits in well with the themes of the class and comes at a good point in the sequence of lectures and invited talks. For Monday’s class we watched Henry Markram’s invited talk at ISC last year and then read a couple of his papers on learning compartmental conductance-based models. We had a lively discussion in class about the prospects of his EPFL Blue Brain effort successfully modeling a chunk of cortex — and how one would go about measuring progress. In particular, we were somewhat skeptical of his team’s ability to nail down the parameters of these models, e.g., accounting for the variation and spatial distribution of ion channels on the cell membrane; intra- and extra-cellular diffusion rates within model compartments; refractory periods complicated by hysteresis related to ion-channel conformation changes; etc.
Your discussion concerning the possibility of signatures that encode pre- and post-synaptic proteins addresses some of these issues and could possibly help Markram in building accurate models. In class, we’re looking at a range of large-scale simulation efforts from those that seek to keep as close to the biology as possible like Markram’s to much simpler models like Izhikevich and Edelman’s use of leaky-integrate-and-fire neurons that are computationally more tractable but much further from the biology. Students in the class are curious if we will learn anything interesting about neural computation from these simpler models given that they depend on generic neuron models.
One question that will likely come up on Wednesday can be summarized as follows: “Perhaps the diversity in ion channels and synaptic proteins is just another example of nature hedging its bets against pathogens and random mutations — the greater the diversity of neurotransmitters and synapses the less dangerous a virus that attacks a specific pathway. Why can’t we just learn an ‘averaged’ model and then replicate it a billion times to build a brain?” The one-size-fits-all version of the proposal seems obviously doomed to fail, but there are many variants between the extremes of capturing all the diversity in a human brain and collapsing it into a single, averaged model. I know we’d all love to hear your views on this and related questions.
Following a series of announcements by IBM, DARPA, and IBM research director and DARPA principal investigator Dharmendra Modha, Henry Markram wrote the following letter and sent it to IBM CTO Bernard Meyerson, with copies to many members of the media, including reporters from the UK Daily Mail, Die Zeit, Wired, Discover, Forbes, and IEEE Spectrum [Note: STDP stands for “spike-timing-dependent plasticity”]:

Dear Bernie,
You told me you would string this guy up by the toes the last time Mohda made his stupid statement about simulating the mouse’s brain.
I thought that having gone through Blue Brain so carefully, journalists would be able to recognize that what IBM reported is a scam - no where near a cat-scale brain simulation, but somehow they are totally deceived by these incredible statements.
I am absolutely shocked at this announcement. Not because it is any kind of technical feat, but because of the mass deception of the public.
These are point neurons (missing 99.999% of the brain; no branches; no detailed ion channels; the simplest possible equation you can imagine to simulate a neuron, totally trivial synapses; and using the STDP learning rule I discovered in this way is also is a joke).
All these kinds of simulations are trivial and have been around for decades — simply called artificial neural network (ANN) simulations. We even stooped to doing these kinds of simulations as bench mark tests 4 years ago with 10’s of millions of such points before we bought the Blue Gene/L. If we (or anyone else) wanted to we could easily do this for a billion “points”, but we would certainly not call it a cat-scale simulation. It is really no big deal to simulate a billion points interacting if you have a big enough computer. The only step here is that they have at their disposal a big computer. For a grown up “researcher” to get excited because one can simulate billions of points interacting is ludicrous.
It is not even an innovation in simulation technology. You don’t need any special “C2 simulator”, this is just a hoax and a PR stunt. Most neural network simulators for parallel machines can can do this today. Nest, pNeuron, SPIKE, CSIM, etc, etc. all of them can do this! We could do the same simulation immediately, this very second by just loading up some network of points on such a machine, but it would just be a complete waste of time — and again, I would consider it shameful and unethical to call it a cat simulation.
This is light years away from a cat brain, not even close to an ants brain in complexity. It is highly unethical of Mohda to mislead the public in making people believe they have actually simulated a cat’s brain. Absolutely shocking.
There is no qualified neuroscientist on the planet that would agree that this is even close to a cat’s brain. I see he did not stop making such stupid statements after they claimed they simulated a mouse’s brain.
You should also ask Mohda where he got the notion of “reverse engineering” from, when he does not even know what it means — look the the models — this has nothing to do with reverse engineering. And mouse, rat, cat, primate, human — ask him where he took that from? Simply a PR stunt here to ride on Blue Brain.
That IBM and DARPA would support such deceptive announcements is even more shocking.
That the Bell prize would be awarded for such nonsense is beyond belief. I never realized that such trivial and unethical behavior would actually be rewarded. I would have expected an ethics committee to string this guy up by the toes.
I suppose it is up to me to let the “cat out of the bag” about this outright deception of the public.
Competition is great, but this is a disgrace and extremely harmful to the field. Obviously Mohda would like to claim he simulated the Human brain next — I really hope someone does some scientific and ethical checking up on this guy.
All the best,
For tomorrow, first watch Markram’s invited talk at the 2011 International Supercomputing Conference, then read the Druckmann et al  paper, and, finally, the Khazen et al  paper. The calendar entry for Monday includes links to all three. Send your questions to Stephen Trusheim (firstname.lastname@example.org) and copy me. Then read the following missive, which attempts to provide some perspective on what this class is about and what I expect you to learn during the quarter.
Some students really like to understand the big picture as it helps them to organize what they’ve learned and provides a way of focusing their attention and gauging progress. This course is primarily about what it takes to simulate a brain — where simulation is a proxy for understanding.
For many computer scientists, accurate simulation is the gold standard for deep understanding. Scale enters into it not only in terms of the act of simulation but also and most importantly in the development and verification of the underlying computational model. The model is the theory in our case and science advances following familiar steps: propose a model, collect data, evaluate the model, iterate.
Traditional theory formulation and evidence gathering has to be partially automated if we are to have any chance of understanding the brain. The problem of scalable model formulation motivates the application of machine learning. The problem of scalable evidence gathering motivates the application of computer vision to histology — cell body segmentation in digital micrographs — and robotics to electrophysiology — robot-controlled patch clamping in vivo.
The knowledge required to understand all of the science and technology covered in our discussions is vast and expanding exponentially. What you should take away from this class is an understanding of how the many pieces fit together and the opportunities that exist for you to learn essential skills and knowledge and contribute to this undertaking. For those of you in computer science, it should be apparent that there are many avenues open to you to make fundamental contributions from accelerating basic science to devising new models for public funding and participation.
I really did plan the syllabus. No, really. Well, sort of. I have to say that accommodating the speakers’ schedules did require a little creative reordering of the topics. And I admit that there are as many different opinions on the central themes as there are scientists in the associated disciplines. But there was some method to my madness. Here are the questions that I set out to explore and the readings and scientists I chose to sample from as a way of introducing you to the relevant science, technology and, as you might expect, controversy and diversity of opinion:
What are the biggest challenges? Infer the connectivity? Infer behaviour of neurons? — Movshon’s critique of Seung
Is it enough to read off the connectome and learn models for each type of neuron? — Seung’s case for connectome
How would you know if, say, a simulated model of cat cortex is an accurate model? — Modha’s and Izhikevich’s models
How detailed do the models have to be? How will you know when enough is enough? — Markram’s critique of Modha
How do you determine what features are required to build accurate neural models? — Smith’s diversity of synapses
How do you obtain the data necessary to learn neuron models? How does it scale? — Boyden’s robotic cell recording
Isn’t it possible we could be fooled into thinking we have it all figured out? — Meister’s retinal computation
What is an appropriate level of abstraction for large-scale neural computation? — DiCarlo’s ventral visual pathway model
Is it possible to quantify what we don’t know about, say, biological vision? — Olshausen’s the other 85% of V1
Why not treat the whole brain as a black box and learn an input-output function? — Gallant’s and Mitchell’s mind reading
What is an appropriate objective function to evaluate large-scale simulations? — Turing tests and robot soccer
If you apply yourself diligently to the readings and class discussions, you should come away from this course with more not fewer questions than you started. I’ve asked you to write down questions as they occur to you while reading the papers and watching the videos not because I expect to be able to answer them all, but rather because generating good questions is as much a part of doing good science as answering them.
Get a lab book and write down your questions, your theories, your ideas for experiments, algorithms, financial models, etc. Paper works fine for this, but mobile voice-recorder and note-taker apps are almost as convenient and have the added benefit that you won’t lose your notes if you store them in the cloud.
Keeping a lab book is Practical Science 101 and I’ll bet not half of you have established such a discipline as part of your daily routine. As a starter exercise, do a little research to find out why this idea makes so much sense given our particular cognitive strengths and weaknesses. Then build a better app and make a million dollars.
The papers we read in class are ones from which a determined student with basic high-school science and math should be able to extract the essence and write a paragraph summary. How could this be, given that these are papers aimed at scientists and appearing in the best journals? The answer is clear when you take into account the multi-disciplinary nature of this field. Seung’s paper on learning to segment micrographs is relatively basic in its discussion of computer vision. Markram’s papers employ relatively simple machine-learning techniques. It is to the authors’ advantage to make their work accessible to specialists in these related fields.
A decade ago your job would have required numerous trips to the library and a good deal of time searching for good introductions to obscure technical concepts. Using Wikipedia, Scholarpedia and other on-line sources, you can now quickly compile a dossier on a paper documenting the basic concepts and in no time produce a summary that suits your present needs and grasp of the relevant disciplines. This sort of wading into murky waters is not something that will necessarily occur less frequently as you learn more. There will be patches of water where you see past the surface and even an occasional glimpse of the bottom, but you’ll learn that the clear patches have nothing new to add and you’ll constantly be drawn to explore the murky depths — sorry for the protracted metaphor. Here’s an example of how you might wade into Druckmann et al :
What is a conductance-based model (CBM) and how have models like Hodgkin-Huxley been applied in simulations?
Why might it be the case that a CBM trained on ramp stimuli does not perform well when tested on step stimuli?
Having answered these questions to your satisfaction and assuming some basic statistics and machine learning, you could now probably write a good summary suitable for sharing with the rest of the students in class. You might even have some suggestions for Markram and his co-authors on how to improve their models.
The theme of Monday’s and Wednesday’s videos, papers and discussions was pretty clear, since they all emphasized connectomics in one way or another. Next week we begin with one of the boldest proposals yet for simulating neural circuitry: Henry Markram’s bid for a billion-euro award to simulate a human brain. He hasn’t yet reached his intermediate goal of simulating a single cortical column, and already he is pushing to fund a project orders of magnitude more complex.
Henry makes his case in his invited talk at the International Supercomputing Conference, and the two papers illustrate methods he’s using to build the constituent models. In the 2011 paper, he describes a method for learning neural dynamics. In the 2012 paper, he discusses an approach to accounting for the diversity of genetic pathways controlling the expression of proteins implementing ion channels.
In Wednesday’s video and paper, Steven Smith describes the incredible diversity of mammalian synapses, how this diversity relates to function, and why we — and perhaps Henry Markram — should care. As you read the paper and listen to the video, ask whether the differences in synaptic signalling pathways really matter for large-scale neural simulations or whether they simply average out in some “convenient and unified” fashion — to borrow Steven’s Cajal quotation. If the differences do matter, how hard would it be to build models that reflect this diversity in synapses? What questions might we ask Professor Smith to better understand the challenges facing Henry Markram’s team at EPFL and other efforts that we’ll explore later in the quarter?
Here are some comments that I passed along to Sebastian this morning. The comments are a mix of technical observations and strategies for making the case for connectomics:
I don’t think it is productive to claim that most neuroscientists don’t understand computational models, even if it is true for some version of “most”. The current generation of “computational” neuroscientists including Bruno Olshausen, Jim DiCarlo, Eero Simoncelli, Tai-sing Lee and Mike Lewicki certainly understand computational models. You might complain that their computational theories don’t achieve the right level of abstraction to explain the brain in terms that provide insight into how biology computes or help in diagnosing and treating disorders. They might counter that their models provide insights into what is being computed — the algorithm at some abstract level of inputs and outputs — if not exactly how — the implementation of that algorithm in the neural substrate.
I know from experience that using computer hardware analogies is fraught with opportunities for misunderstanding, but, at least for a computer science audience, I suggest that they think not of the computer on their desktop or even a graphics processor (GPU), but rather architectures like that of Thinking Machines’ Connection Machine with its 65,536 (2^16) one-bit processors linked by a fixed-topology peer-to-peer network and more complicated routing hardware for longer-distance communication. Still nothing comparable to the scale of a brain, but at least it makes clear the importance of the network in performing computation — writing this reminded me of Scott McNealy’s pithy “It’s the network, stupid.” Even among most computer scientists — and I’m revealing my broad-brush biases in this use of “most” — the algorithmic consequences of caches, memory bus width, the number and width of registers and SIMD lanes, etc. are only dimly understood, and it is really only hard-core engineers working “close to the metal” who really get it.
Another issue pertinent to the case you are making concerns how large a block of contiguous tissue to scan and then segment. An analogy to the study of the web graph comes to mind. Early on, physicists and theoretical computer scientists tried and failed to apply the theory of random graphs developed by Erdős and Rényi [35, 34] to the study of the web graph. Then they developed the modern power-law, self-similarity, random-graph mathematics of Broder, Kleinberg, Raghavan, Upfal and others, which provided the basis for generating random graphs with the characteristics of the web graph [47, 72, 15, 16, 69].
These families of random graphs are used to analyze — characterize asymptotics — and test graph algorithms for social networks and the world wide web graph, as well as the spread of disease. The point being that it didn’t take too many full scrapes of the web to confirm that the new classes of random models were able to capture the characteristics of web and social networks well enough to build better algorithms and make predictions about pathological subnetworks. The local branching properties — the in- and out-degree probability distributions — of individual vertices gave rise to global properties of the web graph.
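To illustrate how a purely local attachment rule gives rise to the global heavy-tailed degree distributions these models capture, here is a toy preferential-attachment sketch in the generic Barabási-Albert style (my own construction for illustration, not the specific models cited above):

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph one vertex at a time; each new vertex attaches to m
    existing vertices chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # 'endpoints' holds one entry per edge endpoint, so sampling
    # uniformly from it is sampling proportional to degree.
    endpoints = list(range(m)) * 2  # seed the graph with m linked vertices
    degree = Counter(endpoints)
    for v in range(m, n):
        chosen = set()
        while len(chosen) < m:       # m distinct, degree-biased targets
            chosen.add(rng.choice(endpoints))
        for u in chosen:
            endpoints.extend([u, v])
            degree[u] += 1
            degree[v] += 1
    return degree

deg = preferential_attachment(5000)
# A few hubs accumulate very high degree while most vertices stay near
# m: the heavy-tailed signature of web-like graphs, produced entirely
# by a local rule.
print(max(deg.values()), min(deg.values()))
```

The analogy to connectomics is only suggestive, of course: whether local wiring statistics of neurons similarly determine useful global properties of connectomes is exactly the open question.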
One important caveat: my example concerns algorithms that compute functions of very large graphs, not algorithms that run on a computational substrate that is best characterized as a very large (dynamic) graph. There may be properties of connectomes that cannot be well characterized as local properties of individual vertices or even small cliques of vertices — for example, the neural equivalent of quantum “spooky action at a distance,” in which unobservable forces such as diffuse signalling with blanket-broadcast neurotransmitters or rhythmic synchronization result in behaviors that can’t be accounted for by local connection patterns and the longer-distance connections we can observe using diffusion MRI.9 If anything, my comments regarding the web graph reinforce your case for producing connectomes of larger blocks of tissue and from multiple individuals and diverse organisms: we needed those early scrapes of the web to make substantive progress understanding the static web graph, and subsequent scrapes of the evolving web are driving the study of its dynamics.
After class on Monday afternoon, Konstantin and I discussed the use of principal components as features for face recognition. The most relevant paper I could think of is the Eigenfaces work of Matthew Turk and Sandy Pentland  (PDF). Their primary contribution was to take the mathematical framework developed by Kirby and Sirovich  and use it to implement a robust, real-time face recognition system, a variant of which comes packaged with OpenCV. The basic idea is that you use the eigenvectors, i.e., the principal components computed using PCA or SVD (singular value decomposition), of a diverse set of face images as a set of features so that any given face — not just those in the training data — can be approximated as a linear combination of the features — see the excellent tutorial on PCA written by Jon Shlens, a neuroscientist who works for me at Google (PDF).
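The eigenface computation itself is just a few lines of linear algebra. Here is a sketch using random stand-in data in place of real, aligned face images (the array sizes and names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "face" data: 100 images of 20x20 pixels, flattened to
# vectors. In the Eigenfaces work these would be aligned face images.
X = rng.normal(size=(100, 400))

# Center the data and take the top-k right singular vectors: these
# are the principal components (the "eigenfaces").
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
k = 10
components = Vt[:k]                 # each row is one eigenface

# Any image -- not just a training image -- is approximated as the
# mean face plus a linear combination of eigenfaces; the combination
# coefficients serve as a compact feature vector for recognition.
img = rng.normal(size=400)
coeffs = components @ (img - mean)
reconstruction = mean + coeffs @ components
```

Recognition then reduces to comparing coefficient vectors — a 400-pixel image is summarized by just k numbers.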
There is also interesting follow-on work emphasizing view-dependent models out of Pentland’s lab  and interesting extensions to tracking human figures by my colleague at Brown University, Michael Black . Principal components is another example of a computational model that is not well known in the neuroscience community, unless you studied with someone like Eero Simoncelli, David Heeger or other scientists of similar pedigree. It is not such a stretch to imagine that brains compute principal components and perform some analog of Hilbert transforms to map from one vector space to another. If time permits in this quarter, perhaps I can get Jon Shlens to drop by and discuss the relevance of such mathematical models to neuroscience.
An example of a computational-mathematical principle that is well represented in neuroscience is contrastive (or divisive) normalization, to which both Heeger and Simoncelli have contributed [24, 127]. Another example of a principle that has not received the attention it deserves is rank correlation, which measures the degree of similarity between two rankings. Rank-correlation-based metrics provide the basis for image descriptors that are more robust to small variations in pixel values and that can be implemented with simple comparisons rather than more expensive dot products .
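As a generic illustration (plain Spearman rank correlation, not the specific descriptor in the work cited above): because only the ordering of pixel values matters, the measure is invariant to any monotone change in brightness or contrast.

```python
def ranks(values):
    """Rank of each element (0 = smallest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    n = len(a)
    ra, rb = ranks(a), ranks(b)
    mean = (n - 1) / 2.0
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    # the rank variance is the same for ra and rb: both are
    # permutations of 0..n-1
    var = sum((x - mean) ** 2 for x in ra)
    return cov / var
```

Scaling or shifting a patch’s intensities leaves its ranks, and hence the correlation, untouched; a mean-squared-difference metric would change dramatically.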
This may sound like fodder for a science-fiction story, but I wonder who is thinking about using some advanced form of 3-D printing to simultaneously scan a pickled neural tissue block and print a simulacrum on some sort of computational substrate intended to simulate the neural circuits. [Note: As it turns out, Chapter 15 of  “Save as ...” discusses this topic briefly.] In such a strategy, it wouldn’t be necessary to ascertain with certainty whether one axon has an active synapse on another cell body; rather, proximity would serve as a proxy for a probability of connection. Realistically, it would be necessary to address the problem of heat dissipation in the 3-D computational substrate, but perhaps one could identify capillaries and use their networks to direct cooling. In any case, it sounds like something that Hans Moravec might have come up with [84, 83].
Also, the idea of assigning a probability of connection to each putative synapse is an obvious generalization of trying to infer a binary affinity matrix. I’m sure that researchers have considered this — I love Sebastian’s analogy about coming up with good ideas and finding twenty-dollar bills on the sidewalk — and I started thinking about how one might assess such a probability. Jon Shlens is friends with a researcher working in Winfried Denk’s lab who told the story of how a bunch of the experts in the lab — expert not only in neurophysiology but also in the detailed physics of the electron-microscopy technology used to image brain slices — frustrated with the performance of human annotators, tried segmenting a set of EM images by hand and were chagrined when they discovered significant disagreement among their individual annotations.
Here are some interesting excerpts from Sebastian’s new book  that relate to questions raised in our class discussion:
In “The Great Brain Mapping Debate”, Movshon and Seung discussed the issue of public funding in addition to issues pertaining to the scientific value of connectomics. Neuroscience is an expensive undertaking and public funding is relatively scarce given the many scientists seeking it. In his book, Sebastian cleared up one related point: “The $30M Human Connectome Project launched in 2010 by the National Institutes of Health (NIH) is only about regional connectomes and is not attempting to find neural connectomes.” — page 181 .
A natural question to ask with regard to the C. elegans connectome is just how regular the pattern of connectivity is: does it vary substantially among healthy individuals, and if so, how? While we don’t have whole connectomes for multiple organisms to make the necessary comparisons, Sebastian mentions one project that takes an interesting shortcut: “David Hall and Richard Russell took the shortcut of comparing partial connectomes from the tail ends of worms. They didn’t find a perfect match. If two neurons were connected by many synapses in one worm, in all likelihood they were also linked in another worm. But if two neurons were connected by a single synapse in one worm, there might be no synapse at all between them in another.” — page 203 .
Regarding the question of what good are connectomes, Sebastian offers the following suggestion, which reminded me of the Human Genome Project’s strategy of sequencing the DNA of millions of people to identify the genetic markers for disease and then using simpler, targeted assays as part of routine diagnostic testing: “I argued earlier that microscopy of dead brains, with its high spatial resolution, will be necessary for determining whether a brain disorder is caused by a connectopathy. The method might yield good science, but by itself it will be useless for medical diagnosis. That being said, once a connectopathy has been fully characterized by microscopy of dead brains, it should become easier to use diffusion MRI to diagnose it in living brains. In general, it’s easier to detect something if you know exactly what you are looking for.” — page 220 . “Essentially every neuron (along with its twin on the other side of the body) is its own type. If every neuron ends up requiring its own model, the total information in all models might exceed that in the connectome. So ‘You are your connectome’ would be a terrible approximation for a worm, even though it might be almost perfect for us.” — see page 268 in .
The question of how memories are stored is central to neuroscience and a substantial amount of effort has focused on visual memory, since the relevant stimuli are particularly easy to manipulate in a controlled setting. One long-standing question is whether complex objects are represented as compositions of simpler representations. Such part-whole representations are common in computer vision and neuroscientists are searching for evidence of their encoding in several subregions of the inferotemporal cortex (IT) — see, for example, the paper out of DiCarlo’s lab that Jim Mutch mentioned in class . In his book, Sebastian shows how connectomics might be combined with other technologies such as using optogenetics to “light up” active neurons so they can be seen by light microscopy — as opposed to the electron microscopy used in generating dense connectomes — to help resolve this question: “The first step is to determine the functions of neurons in perception by measuring their spiking in response to various kinds of stimuli, as in the Jennifer Aniston experiment. This is done as described earlier, by staining the neurons so that they blink when active, and observing the neurons through a light microscope. Then researchers image this particular chunk of the brain using an electron microscope to discover how the neurons are connected. Kevin Briggman and Moritz Helmstaedter have accomplished this feat with retinal neurons working with Winfried Denk. Studies of neurons in the primary visual cortex have been performed by Davi Bock, Clay Reid and their collaborators. The approach, as it develops, will make it possible to see whether there are in fact connections between neurons that detect parts and wholes.” — page 198 .
I thought it might be worthwhile to briefly answer a few questions that have cropped up more than once either directly or indirectly in my conversations with students:
Where do proteins get manufactured in a neuron? — Most — but not all — proteins are manufactured in the cell body or soma. Manufacture in the cell body involves several cellular components in addition to the nucleus, including specialized manufactories for protein folding and other post-translational processes.
How do proteins get transported to distant axons? — Proteins called kinesins act like little molecular freight trains powered by ATP to transport large molecules — smaller molecules like glucose move around by diffusion — along molecular rails called microtubules to deliver their cargo to the axon.
How are signals transmitted across the synapses? — Multiple mechanisms, including chemical — involving neurotransmitters that are released into the gap or cleft at the end of the axon terminal of the sending neuron, diffuse across the synaptic cleft, and bind to receptors on dendrites of the receiving neuron — and electrical — utilizing specialized ion channels called gap junctions which span the synaptic cleft and allow for the direct transfer of ions between the sending and receiving cells.
How is information — state — stored in neurons? — Short-term state can be maintained by recurrent activation in an ensemble of neurons that comprise a temporary circuit. Long-term memory is not completely understood but is likely to involve genetic pathways in which certain proteins are expressed by organelles located in the axon terminal, and possibly self-sustaining complexes of proteins or oligomers, which have recently drawn the attention of scientists.
It is too late to be of much use this year, but, for future instantiations of this class, I will create a talk out of these slides (neurons, glia, resting and action potentials) and perhaps also these (synaptic transmission, neurotransmitters) and these (development, neurogenesis, learning, memory) — which I’ve used in the past to provide quick introductions to various topics in neuroscience. I’ve suggested to a couple of students seeking a more in-depth understanding of the underlying biology that they read Chapters 2–7 and 23–25 of , material from which I adapted my 2010 lectures.
In one of Sebastian Seung’s papers discussing Lashley’s doctrine of equipotentiality, Sebastian — citing the experiments of Gerald Schneider and of Mriganka Sur and his collaborators, who rewired hamsters and ferrets respectively — suggests an interesting qualification: “a cortical area indeed has the potential to learn any function, but only if the necessary wiring with other brain regions exists.” Anatomically, it would be impossible for every cortical area to be wired to all the other cortical areas and all the subcortical areas for which some connection exists — the brain would swell beyond the capacity of a normal skull to contain it. In particular, not all areas — visual and auditory cortex for example — have connections to the cerebellum such as are available to prefrontal and motor cortex, and hence one wouldn’t expect these “functionally deprived” areas to be capable of performing computations — implementing algorithms — that require functionality derived from these extra-cortical areas.
How does this inform our discussion of the single-algorithm hypothesis? One might easily imagine an algorithm, e.g., the fast Fourier transform, that accepts a rather general input and provides a rather general output, thus allowing it to work in concert with different auxiliary pre- and post-processing subroutines. Such an algorithm could also be specialized to call different subroutines as an intermediate step in its execution. Algorithmically, this is quite common — think of an “object-oriented” sorting algorithm that takes as one of its inputs a function which accepts two objects and returns “true” if the first object should be ordered before the second and “false” otherwise. If we allow for such flexible algorithms, then it is much easier — for me at least — to imagine a single algorithm that spans all cortical areas. The fact that the subroutine substantially alters the behavior of the general-purpose algorithm does not take away from the fact that it is still a single algorithm.
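The sorting example can be sketched in a few lines of Python: a single general-purpose merge sort whose behavior is specialized entirely by the `before` subroutine it is handed.

```python
def merge_sort(items, before):
    """One general-purpose algorithm; before(a, b) is the subroutine
    that specializes its behavior (True if a should precede b)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], before)
    right = merge_sort(items[mid:], before)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if before(right[j], left[i]):  # take left on ties: stable
            merged.append(right[j]); j += 1
        else:
            merged.append(left[i]); i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Swapping in a different `before` — ascending, descending, by length, by any criterion — changes the result completely, yet it is the same algorithm throughout.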
There is an interesting new paper out of Henry Markram’s lab in which he claims to have analyzed expression data from ion-channel genes in several neocortical neuronal types and used it to develop a model which, given a neuron’s morphology, electrical behaviour and a few key genes that it expresses, is able to predict the expression patterns for the neuronal types with an accuracy of 87% — see Khazen et al . Markram claims somewhat grandly that this opens the door to “a world of predictive biology.” We’ll discuss Markram’s large-scale cortical modeling effort later in the quarter, but this result addresses — note that I didn’t say “resolves” — some issues that other neuroscientists have raised concerning the ability of current knowledge and technology to unravel the complex tangle of neuronal signaling pathways.
Three messages relating to classes next week:
Most of the talks are now assigned hosts. I doubled up on two of the days. May 21 (Gallant and Mitchell) was a natural for sharing since the two approaches — out of Gallant’s lab at Berkeley and Mitchell’s at CMU — are substantially different. There are also two of you assigned to May 2 (Ed Boyden) since he has several very interesting projects going on in his lab — I’ll finalize the papers for Ed in the next week.
Twelve of you replied with your choices for hosting talks and there are fifteen shown on Axess, so I assume that those of you who didn’t reply are either not taking the course for credit, taking it for less than full credit or are planning to talk with me at some point relatively soon. If not, April 25, Stephen Smith, is still available and there are some additional possibilities for doubling up.
For Monday, you’re to watch “The Great Brain Mapping Debate”, which is linked to the calendar of invited talks; send me questions and come prepared to discuss. For Wednesday there are two papers linked from the calendar that you are expected to read before class. Sebastian Seung will be joining us on Wednesday afternoon to discuss the papers and topics related to his debate with Tony Movshon. Those of you thinking about connectome-related projects should check out EyeWire. Send your questions for Wednesday’s class to both me and Sebastian’s “host” Juan Batiz-Benet:
Here are the current host assignments:
April 16, Monday: Anthony Movshon and Sebastian Seung — The Great Brain Mapping Debate —
April 18, Wednesday: Sebastian Seung — Connectomics — Juan Batiz-Benet
April 23, Monday: Henry Markram — Blue Brain — Stephen Trusheim
April 25, Wednesday: Stephen Smith — Proteomics — Kyunghee Kim
April 30, Monday: Eugene Izhikevich — Large-scale Cortical Model — Stacey Svetlichnaya
May 2, Wednesday: Ed Boyden — Robotic Single-Cell Recording — Henryk Blasinski
May 7, Monday: Markus Meister — Retinal Computation — Sanjeev Satheesh
May 9, Wednesday: Dick Lyon — Hearing and the Auditory Cortex — Dan Huang
May 14, Monday: Dharmendra Modha — Reverse Engineering the Brain — Konstantin Bayandin
May 16, Wednesday: David Cox — Reverse Engineering the Visual System — Karl William Cobbe
May 21, Monday: Jack Gallant and Tom Mitchell — Predicting Brain Activity — Nicholas Borg (Gallant), Ben Poole (Mitchell)
May 23, Wednesday: Bruno Olshausen — Seeing and the Visual Cortex — Yan Largman
May 30, Wednesday: Dileep George — Cortical Micro Circuits — Daniel Selsam
Jim mentioned work from DiCarlo’s lab on invariants, temporal coherence, and coding in the inferotemporal cortex (IT). Check out the papers on DiCarlo’s publication page. I have pretty good luck grabbing the paper’s title and pasting it into the Google search box, e.g., from DiCarlo’s publication page I grabbed the title of his 2012 paper in Neuron, “How Does the Brain Solve Visual Object Recognition?” and found a PDF on a course page at UC San Diego. If you can’t find a PDF for a journal paper that you’re interested in, I’m sure you can get the paper using the Stanford Libraries online publication resources.
Here is a quick first attempt to drill down with respect to some of the questions and topics raised in Wednesday’s class. If you have additional questions or requests for related papers or labs working in a given area, please don’t hesitate to ask:
Models with lots of free parameters — from learning theory we know that such models can overfit the data. With a rich enough model and a lot of free parameters you can use it to explain just about anything. Physicists don’t like theories with free parameters — think of Einstein’s regret at adding a cosmological constant to force his theory to agree with the now-discredited assumption of a stationary universe.
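A quick numerical illustration of the point, with synthetic data of my own making: fit ten noisy samples of a straight line with a two-parameter model and with a ten-parameter one; the latter drives the training error to essentially zero while predicting worse at held-out points.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.1, size=10)  # truly linear + noise
x_test = np.linspace(0.05, 0.95, 50)                      # held-out points
y_test = 2.0 * x_test                                     # noise-free ground truth

def fit_and_errors(degree):
    """Least-squares polynomial fit; returns (training MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse
```

The degree-9 polynomial has enough free parameters to thread through every noisy sample, which is exactly why it explains the training data “perfectly” and the held-out data poorly.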
Temporal coherence — exploiting the fact that often the features that are most useful are the ones that are stable or persist over time — Wiskott and Sejnowski [131, 8] call this slow feature analysis and provide a mathematical model building on earlier work by Peter Földiák .
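Here is a small sketch of the linear version of the idea (a simplified illustration, not Wiskott and Sejnowski’s full algorithm): whiten the signals, then take the direction along which the temporal derivative has the least variance.

```python
import numpy as np

def linear_sfa(X):
    """X: (T, n) multivariate signal. Returns a weight matrix whose
    first row extracts the slowest unit-variance linear feature."""
    X = X - X.mean(axis=0)
    # whiten the signal so every unit-length projection has unit variance
    d, E = np.linalg.eigh(np.cov(X.T))
    whiten = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = X @ whiten
    # slow directions = smallest-eigenvalue directions of the
    # covariance of the temporal derivative (eigh sorts ascending)
    _, dE = np.linalg.eigh(np.cov(np.diff(Z, axis=0).T))
    return (whiten @ dE).T  # rows ordered slowest first

# a slow and a fast source, linearly mixed
t = np.linspace(0.0, 4.0 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(37.0 * t)
X = np.column_stack([slow + 0.5 * fast, 0.5 * slow - fast])
recovered = X @ linear_sfa(X)[0]  # slowest extracted feature
```

In this synthetic mixture, the slowest extracted feature recovers the slow source almost exactly, up to sign.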
Neurogrid chips — “simulate one million neurons with two subcellular compartments each, a choice motivated by neurophysiological studies. Nonlinear interactions between projections that terminate in distinct cortical layers have been replicated in a pyramidal-cell model with just two compartments.”
Tradeoff between invariance and selectivity — Jim’s comment about “or” layers — which combine similar features to induce invariance — and “and” layers — which compose features to increase selectivity. In a decision tree used to define a class, “and” nodes are used to make the class more specific while “or” nodes are used to make it more general. The “standard model” assumes alternating layers of “simple” and “complex” cells implementing “and” and “or” functions respectively.
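A toy sketch of the alternation (the function names and sizes are mine, not from any particular model): an “and” layer takes the minimum across its input feature maps, responding only when all the parts are present, while an “or” layer max-pools over neighboring positions, responding identically to slightly shifted stimuli.

```python
import numpy as np

def and_layer(feature_maps):
    """Conjunction: a unit responds only when all its inputs respond
    (elementwise minimum across feature maps) -> more selective."""
    return np.minimum.reduce(feature_maps)

def or_layer(feature_map, pool=2):
    """Disjunction: max-pool over neighboring positions -> invariant
    to small shifts of the stimulus."""
    n = len(feature_map) // pool * pool
    return feature_map[:n].reshape(-1, pool).max(axis=1)
```

Stacking these in alternation trades selectivity for invariance layer by layer, the essence of the simple/complex cell hierarchy.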
Value of diverse opinions — each of our invited speakers brings a different perspective to the problem of understanding the brain. We’re still in the early stages of our attempts to understand the principles underlying neural computation and it is important to have a diversity of viewpoints — even though blind men examining an elephant is the image that comes to mind.
Prion-like proteins implicated in long-term memory formation — this is the work out of Eric Kandel’s lab at Columbia that I mentioned to a couple of you before class on Wednesday. This research included the first description of a prion-like protein which, when switched into its self-perpetuating state — the defining characteristic of prion-like proteins — does not destroy the cell in which the protein resides; that is, the self-perpetuating state is part of the normal function of the cell. According to Kandel, this self-perpetuating behavior is essential to long-term learning. Once a synapse is made, it takes work to maintain the connection. This particular protein translates messenger RNA at the synapse to maintain the machinery for local protein synthesis required to maintain the synapse. The protein was discovered nearly a decade ago in Kandel’s lab; additional evidence can be found in this 2010 article  in Cell and a 2012 news release.
Often, simply knowing that something is possible is enough to spur engineers to build it. An analogous phenomenon occurs when a scientist learns that an organism is capable of some behavior and that knowledge provides a critical piece of the puzzle, yielding a new theory or insight into some biological mechanism. If what you want to learn is the principles governing biological computation, then results from psychophysics and behavioral neuroscience are often just the ticket to constrain your theories and suggest experiments to test them. Jim DiCarlo makes a similar argument in his recent paper in Neuron  (PDF).
The very idea that humans can learn to identify thousands of images from the briefest of glances has changed how we think about vision, and the papers I mentioned in class, e.g., [51, 13, 9, 108], are interesting for just this reason. There are lots of reasons why hashing has not become part of the repertoire of computational neuroscientists, most of them due to biases in the way neuroscience is taught. I believe that hashing provides a useful computational perspective for explaining some of the findings in the aforementioned papers, but you won’t find a lot of support for this in the literature and I doubt that you’ll find any neural circuit models. But that actually makes it more interesting to pursue because you might be able to break new ground.
Locality sensitive hashing (LSH) provides a mechanism for learning invariants, another theme that came up in our discussion with Jim and a possible way of explaining both “interference” and our ability to usefully categorize and group similar scenes.10 I love mechanisms, particularly those couched in the language of mathematics and physics, but I’ve learned to appreciate how useful behavioural experiments can be in making progress on the difficult problem of understanding the brain — in general, the more constraints one can bring to bear, the more likely we are to come up with something useful.
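For a flavor of the mechanism, here is a minimal random-hyperplane LSH sketch (the dimensions and constants are arbitrary choices of mine): each hyperplane contributes one bit of a signature, nearby vectors receive nearly identical signatures, and unrelated vectors disagree on roughly half their bits, so similar inputs land in the same or nearby buckets.

```python
import numpy as np

def lsh_signature(v, planes):
    """One bit per random hyperplane: which side of the plane v falls on.
    Vectors separated by a small angle rarely disagree on any bit."""
    return tuple(int(b) for b in (planes @ v > 0))

rng = np.random.default_rng(0)
planes = rng.normal(size=(32, 50))           # 32-bit signatures for 50-d vectors

x = rng.normal(size=50)
x_jittered = x + 0.01 * rng.normal(size=50)  # a slightly perturbed copy
y = rng.normal(size=50)                      # an unrelated vector
```

The invariance story: any stimulus variation small enough to leave the signature unchanged maps to the same bucket, which is one way to get categorization (and, when distinct items collide, something that looks like interference) out of a single cheap mechanism.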
20% of your grade is for class participation and the most significant way I want you to participate is by “hosting” one of the lectures. By “hosting”, I mean that you will be responsible for carefully reading the associated paper or papers, watching the video, scanning the slides and then vetting questions posed by your fellow students, serving as a moderator during the class discussion, and helping to follow up on more in-depth questions and possible projects. I don’t expect you to know everything about the content of the papers, videos and slides — quite the contrary; this is supposed to be a learning experience, and I’ll work with you to navigate the related work and think about the issues in the most fruitful way. In some cases, I’ll have one of my neuroscience colleagues help out with the necessary background. Both Monday and Wednesday lectures will require hosts even though only the latter will have invited speakers with whom we’ll have the opportunity to directly engage and discuss their work. There are still some slots to fill in the calendar and I’m thinking about one or two additional invited speakers, but in the meantime I want you all to volunteer for two lectures by sending me your first and second choices by tomorrow (Friday) noon. Monday’s lecture is obviously pretty soon, but I’ve already provided some of the necessary background in the class notes. Of course, I expect every one of you to read the papers, watch the videos and come up with good discussion questions for all the lectures. This isn’t meant to require a lot of extra work on your part. Hopefully, the opportunity to work with me in understanding the lecture content will help you learn to navigate this complex, fast-changing and multi-disciplinary field.
In previous incarnations of this course, I started with a series of lectures based on the early chapters of Bear, Connors and Paradiso , which is listed on the course home page. The slides are available in the CS379C 2010 course archives. Bear, Connors and Paradiso is very good as an introductory text and I recommend it highly. You can probably take out a copy from the library, but get the latest edition that they have. The reason I abandoned the introductory lectures is that they were both too much and too little. Even if you’ve already taken a couple of years of biology and neuroscience, you’d still constantly run into technical details in papers that you would find baffling. This time around I’m taking a learn-and-ask-questions-as-you-go attitude; the hope is that by jumping right to the hard problems, emerging technologies and new results you’ll get a broad understanding of the challenges facing the field and will be better prepared and better motivated to fill in the gaps later on.
Neuroscience is already a “big” field and it’s changing and growing by leaps and bounds. At some point during your tenure at Stanford, I recommend taking some introductory neuroscience or working your way through Bear, Connors and Paradiso, but you’ll still only be scratching the surface. Given my experience, you’re better off spending the lion’s share of your undergraduate career acquiring basic knowledge and skills like statistics, computer science (machine learning in particular) and programming that you can apply to a variety of problems including neuroscience. That knowledge and those skills won’t be out of date by the time you graduate as would much of what you learned about the brain. At the same time, soak up all you can about neuroscience — molecular, genetic, cellular, cognitive, evolutionary — and when you graduate consider either getting an advanced degree in neuroscience or working with one of the many companies that are trying to make money from our growing understanding of the brain; I predict that your math, CS and programming experience will be more relevant to those jobs than an undergraduate degree in neuroscience and they’ll be happy to take you on with your portable knowledge and skills and let you learn as you go.
The slides and video for today’s class are now available. There are two videos for class discussion. Each video is about 30 minutes, so the total time is about the same length as the earlier talks. It appears that Jim pulled an all-nighter updating CNS and creating slides and video for today’s class. The slides for the first video are here. This is an updated version of the slide deck that was linked to the calendar earlier. The updated version includes slides on a package that implements temporal models and is useful for processing video. The first video, in which Jim describes the Cortical Neural Simulator (CNS) and its features and applications, is here. The second video, in which he shows how to install and work with CNS, is here. Enjoy.
What is today’s lecture about in three sentences or less? The take-home message, the elevator pitch.
GPUs provide (lots of) SIMD parallelism which can considerably accelerate the “linear filters plus non-linear pooling” class of biological models — see Sebastian Seung’s comments in the Great Brain Mapping Debate.
CNS adds a layer of abstraction that reinforces the metaphors underlying this class of models, making it relatively easy to implement such models and obtain a 10–20-fold speedup over optimized C code.
Metaphors both expand and constrain our thinking. What if we were to substitute hashing for computing dot products, the computation at the core of matrix multiplication and of convolving images with linear filters?
CNS is clearly well-suited to a variety of tasks in computer vision. However, given that the architecture is rather general, it would also seem to be well suited to modeling many other portions of the brain as well. To what extent has this been attempted? More to the point, are there many other problems outside the field of computer vision in which CNS has proven useful? If so, which have had the most success? If not, what are the most significant reasons?
The HMAX package can be used to implement the Riesenhuber and Poggio “standard model” for arbitrary cortical functions. However, as far as I know, it has only been applied to modeling sensory cortex; you might ask Jim about this on Wednesday.
The Hodgkin-Huxley package defines several types of spiking cells using Hodgkin-Huxley dynamics and in principle could be used to model all sorts of neural models including those developed by Gerald Edelman, Eugene Izhikevich and their collaborators at the Neurosciences Institute and Brain Corporation.
CNS achieves its speed by using GPUs for SIMD parallelism. While lots of neural models benefit from this sort of parallelism, it is up to the programmer to exploit locality in memory, and, practically speaking, models are limited to what can fit in the GPU’s fast GDDR memory.
In principle you could scale the computations to run on many GPUs or on a large cluster of machines. This is basically what Evolved Machines does to scale their neural models. Paul Rhodes of Evolved Machines buys a lot of GPUs from Nvidia as does David Cox at Harvard.
What’s a good example of the phenomenon of interference?
The article on interference theory listed in the resources for today’s class is a good starting point to explore the many aspects of interference in psychology and cognitive science. Note that “priming”, the role of “context”, and the use of a “distractor” task are related concepts in psychology.
Do a search using the keywords “interference visual recognition” to find examples and research papers on manifestations of interference involving visual stimuli. Note that interference often involves multiple modalities, as in the case in which a word associated with one sort of visual category is paired with a different or contradictory visual stimulus.
Prior exposure to a degraded signal can delay subsequent recognition when the pristine signal is presented. See for example this paper by Jerome Bruner and Mary Potter which describes how priming with a blurry image can cause delayed recognition of the same image when presented in focus.
On Saturday morning, as I prepared my lecture for Monday, I realized how much technical jargon I take for granted in talking about parallel computing with my graduate students and colleagues at Google and Stanford. I also realized that most of the practical lessons I’ve learned and can reasonably convey to you in the relatively short span of a single lecture can best be summarized in terms of memorable anecdotes. And to make it easier for you to keep track of all the jargon, I’ve created a bulletized listing of relevant terms linked to descriptive web pages and listed in the order I’ll introduce them in my lecture. I’ll also be talking about several programming languages and extensions that I use on a regular basis, and, for illustration purposes, I’ve provided examples of programs written in each language listed here and ordered left-to-right decreasing in their difficulty of mastery and increasing in their simplicity of use and appropriateness for CS379C projects: SSE, CUDA, Eigen, Jacket, and CNS. That’s a complicated way of saying that Jacket and CNS will give you the most bang-for-your-buck in accelerating your Matlab programs for CS379C projects, and, if instead you choose to hack in C++ and utilize a library like OpenCV, then Eigen would be a good choice if you want to accelerate your code.
Lesson: Some folks never learn
Connection Machine — in particular the CM-2 manufactured by Thinking Machines Corporation with 65,536 single-bit processors.
Optical flow — measures the apparent motion of objects and edges in a visual scene — used in robotics to calculate time to collision.
Bülthoff et al  implementation on the CM-2 — state-of-the-art in real-time optical flow — 10 seconds per frame on a CM-5 — a CM-5 cost $1.47M.
Camus and Bülthoff  implementation in C++ — state-of-the-art in real-time optical flow — 10 frames per second on an Intel-powered PC that cost less than $1K.
Lesson: Don’t bet against Intel
Amdahl’s Law — speedup on a machine with multiple-processors is bounded by the sequential part of the algorithm.
Moore’s Law — number of transistors that can be placed on an integrated circuit doubles approximately every two years.
Lee et al  — myth of the 100-fold speedup — reinforcing the lesson about betting against Intel and underscoring the importance of Amdahl’s law.
Lesson: Don’t bet against Amdahl
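Amdahl’s Law is compact enough to carry around as one line of code; here is a sketch with illustrative numbers of my own choosing.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Overall speedup when a fraction p of the work parallelizes
    perfectly across n processors and the rest stays sequential."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_processors)
```

Even with a billion processors, a program that is 90% parallelizable never runs more than 10 times faster; the sequential 10% dominates.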
Cloud computing — clusters of networked computers accessible through the web — commodity computers are the key to scalability.
Network interface controller (NIC) — Gigabit Ethernet is now pretty common and by using optical fiber, fast switching, multiple NICs and specialized hardware like Infiniband you can push that up to 100 Gigabits per second.
Message passing interface (MPI) — standard portable message-passing protocol for parallel programming — MPI provides synchronization and communication primitives for distributed computing. As an alternative for some applications you can get away with using the Remote Procedure Call protocol (RPC), which allows a computer program to cause a subroutine or procedure to execute in another address space. We’re talking about low-level, “close-to-the-metal” programming here — think in terms of having to keep in mind the difference between big- and little-endian addressing.
Lesson: Don’t bet against Intel
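Since the MPI/RPC bullet above ends with a warning about byte order, here is a tiny illustration using Python’s standard struct module (the values are arbitrary; the point is just to make the big-endian versus little-endian distinction concrete):

```python
import struct

# Pack the 32-bit unsigned integer 1 in both byte orders.
big    = struct.pack(">I", 1)   # big-endian:    b'\x00\x00\x00\x01'
little = struct.pack("<I", 1)   # little-endian: b'\x01\x00\x00\x00'

# The same four bytes mean different numbers under different conventions,
# which is why low-level message-passing code must agree on byte order.
print(struct.unpack(">I", big)[0])   # 1
print(struct.unpack("<I", big)[0])   # 16777216
```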
Embarrassingly parallel problem — one for which little or no effort is required to separate the problem into a number of parallel tasks.
MapReduce — a functional programming term now used almost exclusively to refer to a framework used to instantiate a form of bulk synchronous parallel computing, where the “mapper” implements the concurrent-computation step, e.g., compute the term-frequency vector for each document in a large collection of documents, and the “reducer” implements the communication and barrier-synchronization steps, e.g., generate an inverted index to store and efficiently retrieve all the documents — Dean and Ghemawat  — advances in “compute” often underscore weaknesses in “memory” — users of MapReduce quickly come to appreciate the importance of an industrial-strength file system [46, 25] for sharding very large data sets, and Google engineers are expert in trading time for space to reduce latency.
Lesson: Don’t bet against Amdahl
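To make the mapper/reducer division of labor concrete, here is a toy word count in pure Python. The function names and the in-memory “shuffle” stand in for what a real MapReduce framework does across many machines, so treat this as a sketch of the programming model, not of the system:

```python
from collections import defaultdict

def mapper(doc_id, text):
    # Concurrent-computation step: emit (term, 1) pairs per document.
    for term in text.lower().split():
        yield term, 1

def shuffle(pairs):
    # Communication/barrier step: group values by key across all mappers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(term, counts):
    # Combine the grouped values for one key.
    return term, sum(counts)

docs = {1: "the quick brown fox", 2: "the lazy dog"}
pairs = [p for doc_id, text in docs.items() for p in mapper(doc_id, text)]
counts = dict(reducer(t, c) for t, c in shuffle(pairs).items())
print(counts["the"])   # 2
```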
Vector processor — implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors — with the exception of graphics and gaming this sort of computing was primarily the purview of high-performance numerical “number crunching” of the sort carried out for military (weapons-yield models), climatological (weather prediction), geological (oil and gas drilling) and financial (derivative valuation) applications.
Basic Linear Algebra Subprograms (BLAS) — linear algebra and in particular vector and matrix operations — architecture-specific support available in proprietary libraries from AMD and Intel that exploit parallelism using multiple cores and vector processing units such as MMX and AVX on Intel chips and take advantage of other architecture-specific hardware such as cache hierarchies.
Streaming SIMD extensions (SSE) — instruction set designed for Intel processors to support Single Instruction Multiple Data (SIMD) parallelism and subsequently standardized and extended to support additional operators and operands — SSE is critical to building fast matrix operations on commodity hardware and the Eigen C++ template library is a painless and open-source method for taking advantage of SSE and the associated hardware on AMD and Intel chips.
Lesson: Don’t bet against Intel
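Python can’t emit SSE instructions, of course; libraries like Eigen and the vendor BLAS implementations do that for you. But the shape of SIMD computation, operating on several data lanes per instruction, can be sketched. The four-lane grouping below is only an analogy to a 128-bit register holding four 32-bit floats:

```python
def dot_scalar(x, y):
    # One multiply-add per step: what a plain scalar CPU loop does.
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total

def dot_simd4(x, y):
    # Process four "lanes" per step, the way a 128-bit SSE register
    # holds four 32-bit floats; a trailing loop handles the remainder.
    total = 0.0
    n4 = len(x) - len(x) % 4
    for i in range(0, n4, 4):
        total += (x[i] * y[i] + x[i+1] * y[i+1]
                  + x[i+2] * y[i+2] + x[i+3] * y[i+3])
    for i in range(n4, len(x)):
        total += x[i] * y[i]
    return total

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 4.0, 3.0, 2.0, 1.0]
print(dot_scalar(x, y), dot_simd4(x, y))   # 35.0 35.0
```

On real hardware the second form is the one the compiler (or Eigen) maps onto vector registers, which is where the speedup comes from.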
Graphics processing units (GPU) — acceleration for computations involving graphics much of which is highly parallelizable — OpenGL is the most common programming interface — historically the cost of hardware has been driven down by the gaming industry and the low cost of the hardware encouraged hackers to apply the hardware for other purposes including computer vision and scientific computing.
Compute Unified Device Architecture (CUDA) — proprietary parallel programming extensions for C++ developed by Nvidia — the open-source alternative OpenCL has not gained much traction and so there is as yet no cross-platform standard that has been widely adopted — on mobile devices OpenGL is often employed and there is some movement toward LLVM-wrapped compute (Low Level Virtual Machine) that would standardize across a range of GPU and DSP (Digital Signal Processor) hardware.
General-purpose computing on graphics processing units (GP-GPU) — the application of graphics processing hardware to perform computations traditionally performed by conventional central processing units (CPU) — Nvidia has been the most aggressive in pushing this approach, and Intel and AMD see it as a challenge to multi-core and many-core computing in terms of programmability, price, and performance, including power measured in floating-point operations per second per watt — the latest edition of Hennessy and Patterson  includes a section on GP-GPU hardware in the chapter on vector processing.
Lesson: Don’t bet against Intel unless you’re Jen-Hsun Huang
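Tying a couple of the bullets together: counting primes in disjoint ranges is embarrassingly parallel, since the chunks share no state and the only communication is a final sum. Here is a sketch using Python’s standard thread pool; the chunk size and the prime-counting task are my own illustrative choices:

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(lo, hi):
    # Independent subproblem: no communication with other chunks.
    def is_prime(n):
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True
    return sum(1 for n in range(lo, hi) if is_prime(n))

# Split [0, 1000) into four independent chunks; the only "reduce"
# step is a final sum, so the problem is embarrassingly parallel.
chunks = [(i, i + 250) for i in range(0, 1000, 250)]
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(lambda c: count_primes(*c), chunks))
print(total)   # 168 primes below 1000
```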
As the preponderance of computing has moved from the defense department to large organizations to global data centers that we all depend upon, different markets have shaped the design of the hardware that provides our computing cycles. From code breaking and ballistics tables to on-line shopping, different classes of computation have dominated the industry, but I’m going to try to ignore the nuances and sketch out the evolution of the field as a series of adaptations. One constant theme concerns how advances in “memory” and “compute” have driven innovation; a storage technology that gets data to the computing hardware faster creates opportunities for computing technology that processes the data faster and vice versa.
Most early stored-program machines had simple central processing units that serially fetched data and instructions from memory and then executed the instructions one at a time, storing the results back in memory. To speed things up, engineers developed a memory hierarchy in which each level allowed faster loads and stores than the level below, and, in addition to developing faster disks, tapes, tubes and transistors to expedite serial transfers, they widened the pathways between levels in the hierarchy to allow for parallel transfers. As memory performance improved, the central processing units became the bottleneck, and these were made to allow for parallel operation by adding special-purpose hardware to handle expensive scalar operations like multiplication and division and highly-parallel vector operations like dot products. To keep this special-purpose hardware fully occupied, circuitry was added to “prefetch” data and perform the steps in a stored program “out of order” so as not to get held up waiting for registers to be loaded from slower parts of the memory hierarchy or for more expensive computations to finish.
All of this extra circuitry added to the cost of processors, caused them to take up more space on the silicon die, produced more heat that needed to be dissipated and required more power to operate. Modern vector units such as the Tesla and Fermi line of Nvidia GPUs reduce transistor count and power requirements in part by using simpler “in order” processors that are not as flexible in handling flow of control but make up for it by requiring the compiler to keep the data and instruction pipelines full. Nvidia hardware is able to handle thousands of light-weight threads and schedule them in “warps” that are ready to go in terms of all their associated data having been preloaded into fast registers using parallel “coalesced” loads.
For problems that fit this computational model, execution can be very fast, but the same themes apply as in the case of more conventional processors, namely that memory and compute have to work together closely and you can’t avoid Amdahl’s law — the inherently serial part of the computation often dominates the running time. Smarter compilers, out-of-order processors, clever prefetching and decades of additional improvements have not only shaped the hardware, they have also shaped the programmers and the languages and tools they use to develop code. Just as our understanding of neural circuitry is shaped by extant instances of computing hardware — the computation model du jour as it were, so too current computing hardware biases the way we think about programming and favors the status quo.
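The interplay between memory and compute shows up even at the level of a simple loop. In the sketch below, both traversals compute the same sum, but on real hardware the row-major order runs faster because it matches the memory layout that caches and prefetchers are built for; pure Python blunts the effect, so the point here is the access pattern, not the timing:

```python
def sum_row_major(grid):
    # Visit elements in the order they are laid out in memory:
    # the access pattern caches and prefetchers are built for.
    total = 0
    for row in grid:
        for v in row:
            total += v
    return total

def sum_column_major(grid):
    # Stride across rows; on real hardware this defeats the cache
    # and prefetcher even though the arithmetic is identical.
    total = 0
    for j in range(len(grid[0])):
        for i in range(len(grid)):
            total += grid[i][j]
    return total

grid = [[i * 100 + j for j in range(100)] for i in range(100)]
print(sum_row_major(grid) == sum_column_major(grid))   # True
```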
Elsewhere I’ve suggested that hashing — constant expected-time storage and retrieval — might be a good model for explaining how we accomplish seemingly miraculous feats of perceptual recognition and recall. One reason why hashing hasn’t received more serious attention is that it isn’t taught in most engineering schools and even gets short shrift in most computer science departments, though hash tables are frequently used in Java and C++ libraries for building dictionaries and lookup tables. Another reason could be that the most common hash functions involve a modulo operation, which seems distinctly “non-biological”, but, as I pointed out in class, dot products are equally suspicious if we’re going to get picky about what’s biologically plausible.
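For readers who haven’t seen one, here is a minimal chained hash table showing why storage and retrieval take constant expected time, including the modulo step mentioned above; the class and method names are my own:

```python
class HashTable:
    """Minimal chained hash table: constant expected-time store/recall."""

    def __init__(self, n_buckets=64):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # The modulo step: fold an arbitrary key down to one bucket.
        return hash(key) % len(self.buckets)

    def store(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                # overwrite an existing entry
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def recall(self, key):
        # Expected O(1): only one short bucket is ever searched.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

t = HashTable()
t.store("Jennifer Aniston", "face cell")
print(t.recall("Jennifer Aniston"))   # face cell
```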
I was struck in reading Seung’s account of how memories are assembled by his use of computer metaphors — see pages 91–93 in . He suggests that short-term, volatile memory is like (dynamic) RAM, requiring continuous activation (and power) to maintain, thus invoking the reverberating-activity theory of Rafael Lorente de Nó. On the other hand, he opines that long-term memory is more like a hard drive and is consolidated in the form of stable synaptic circuits, invoking a form of neural Darwinism — attributed to Gerald Edelman and Jean-Pierre Changeux among others — to explain how arbitrary collections of neurons might be wired together to form these consolidations. In explaining Stephen Grossberg’s stability-plasticity dilemma, Seung returns to the volatile RAM versus stable hard drive analogy, suggesting that “patterns of spiking neurons are only useful for retaining information over short periods of time” and, because synaptic connections change more slowly than spiking patterns, “they are less suitable to active manipulation of information.”
Intel and Nvidia have been fighting for some time about the best way to support SIMD — Single Instruction Multiple Data — parallelism. For a while, Intel was pushing a new hardware architecture called Larrabee and a supporting toolchain — compilers, debuggers, profilers, run-time environments — based on a language called Ct — the “C” stands for the C programming language and the “t” for throughput as in “high throughput computing.” Larrabee was cancelled and the architecture resurfaced first as “Knights Ferry” and now “Knights Corner” and Intel is trying to attract market share away from Nvidia.
According to a recent article, Intel claims that the “Knights Corner coprocessor will deliver the performance of a GPU without the challenges of having to adopt a new programming model — CUDA, OpenCL, or whatever. And since the MIC architecture is x86-based (essentially simple Pentium cores glued to extra wide vector units), developing Knights Corner applications will not be that different than programming a multicore Xeon CPU”, i.e., you’ll be able to port all your old code for free! MIC refers to Intel’s Many Integrated Core architecture.
Nvidia counters that the problem is that the cores in Nvidia GPUs and Intel hardware are very different. Intel achieves ease of porting by using what are essentially scaled-down x86 cores — highly compatible with existing code but requiring a lot of transistors, and hence there are fewer (on the order of hundreds) such cores on the Intel devices — while Nvidia GPUs use many more (on the order of thousands) simple cores that use only a fraction of the power required by the x86 cores.
In Monday’s lecture, we’ll explore the issues in somewhat more detail, but if you want to delve a little deeper take a look at the Wikipedia pages on in-order and out-of-order processors and think about the importance of caches and wide, high-speed memory buses in high-performance computing.
The first day our class met there was a very interesting public debate held at Columbia University: The Great Brain Mapping Debate starring Sebastian Seung and Anthony Movshon, hosted by Robert Krulwich and Carl Zimmer. Luckily it was posted on YouTube and NPR and I got to watch it last night. It was so good, and so reflective of the issues explored in this course, that I’ve decided to assign it for one of our Monday classes. I’ll provide some highlights in this post.
There was a great discussion of retinal models in which Tony and Sebastian talked about the difference between linear and non-linear mathematical models. Tony provided a nice summary description of computation in the retina and mentioned in passing some of the recent work on the retina which has dramatically changed how we think about this — see Markus Meister’s lectures on retinal computation, e.g., his Heller Lecture and Allen Institute Lecture, and Dick Masland’s work on retinal ganglion cells. Sebastian came back and claimed that neuroscientists have trouble with non-linear models and, citing the degree to which vision has influenced the rest of neuroscience, suggested that the influence of linear systems theory still dominates the mindsets of many neuroscientists. Agreeing with Tony, he admitted that people like Meister and Masland have broken away from the simple “filter” model, and suggested that perhaps the retina is not the most compelling application of connectomics with his pithy comment that the retina is the “invertebrate part of mammalian model” — an insider joke for neuroscientists which I’ll explain presently.
There was also an interesting exchange about so-called Jennifer Aniston cells and a discussion of Hebbian learning and distributed coding — how partial stimuli can evoke a full-blown probe into episodic memory — which reminded me of yet another computational meme, “holographic memory”, which has infected the populace. Sebastian said that a feature detector for Jennifer Aniston is definitely not likely to be a simple linear filter and brought up the example of a songbird’s memory of its various mating and signalling songs. Tony had an interesting comment about cellular switches — yet another non-linearity — as an alternative to making new connections in service of storing memories.
Sebastian brought up David Marr, and I can fill in some of the background in class, but the most telling comment is that Marr’s claim that you can separate algorithms and implementations — which is certainly true and important in the case of some sorts of scientific inquiry — still haunts his home institution, MIT. In this regard, Tony mentioned a new paper out in Nature Neuroscience by Matteo Carandini  which provides what Tony characterizes as a modern interpretation of Marr. Rather than paraphrase Movshon, I’ve included the abstract for Carandini’s paper below:
Neuroscience seeks to understand how neural circuits lead to behavior. However, the gap between circuits and behavior is too wide. An intermediate level is one of neural computations, which occur in individual neurons and populations of neurons. Some computations seem to be canonical: repeated and combined in different ways across the brain. To understand neural computations, we must record from a myriad of neurons in multiple brain regions. Understanding computation guides research in the underlying circuits and provides a language for theories of behavior. — published in Nature Neuroscience
Tony made an interesting comment — alluding to Sebastian’s insider joke for neuroscientists — that invertebrates — like the nematode C. elegans — don’t need spiking neurons, which are essential in larger brains for efficient coding and communication across greater distances.
The debate then turned to the politics and economics of science. Tony offered that “giga” science projects may not be the best use of scarce public funding and suggested that currently neuroscience is a “cottage industry” that revels in hypothesis-driven exploration. Both scientists were polite, but you might have noticed their thinly veiled skepticism when Henry Markram’s Blue Brain project was mentioned. Movshon suggested an analogy to the huge amounts of money that were poured into the Human Genome Project and seemed to imply that the same outcome might have been accomplished with less government direction and more individual initiative. If time permits, we’ll pursue this a bit when we discuss the subject in class.
For more on what I called “immediate recognition” check out the papers of Simon Thorpe and of Helene Intraub. Thorpe’s  is a classic and Intraub’s seminal work is widely quoted and cited, e.g., Intraub  (PDF). Tommy Poggio and Max Riesenhuber and their students — especially Thomas Serre — have developed a “standard model” of immediate recognition in the cortex — see [97, 102, 103] — which was considerably improved by Jim Mutch and David Lowe  when Jim was at UBC before joining Poggio’s lab at MIT. You’ll have an opportunity to ask Jim questions about this work when he joins us for a class discussion next Wednesday. You might also check out some of the project pages from Riesenhuber’s lab at Georgetown.
I sent a note to Tom Serre asking about one of his papers and he reminded me of some relatively recent work out of Aude Oliva’s lab at MIT relating to the information capacity of pictorial memory — see Brady et al  — which builds on earlier work by Standing  and Potter and Faulconer . He also suggested that Biederman et al  might be a better reference than Intraub  with respect to the RSVP (Rapid Serial Visual Presentation) body of work, which also builds on the work of Potter and her students. In tracking down these references, I also stumbled across a more recent paper out of Biederman’s lab — see Subramaniam  — which provides a neural interpretation of the RSVP results in terms of activity in inferior temporal cortex. I asked Aude about her work and she pointed out that the 2008 paper  tests long-term visual memory, but the presentation time was 3 seconds. That paper has associated web resources available here. Greene and Oliva  (PDF) is the best paper I’ve found describing what we know about recall for sub-100ms exposure intervals.
In his new book , Sebastian Seung has a chapter entitled “To Freeze or To Pickle” in which he describes techniques for preserving neural tissue. He discusses the fields of cryogenics and cryobiology and so-called “life-extension” companies and foundations like Alcor. There is an interesting discussion of the advantages of slow freezing versus fast freezing, and of alternatives that cool cells under special conditions that turn liquid water into an exotic state of matter said to be “glassy” or “vitrified”. In this state, water is solid but not crystalline; its molecules remain disorganized, unlike the orderly lattices of molecules found in ice crystals.
Turning to the “pickling” options, he considers Eric Drexler’s proposal to preserve brains by chemical means. The basic idea is not a new one and involves a process called plastination, which has been used for multiple purposes but has long been employed to prepare tissue samples for electron microscopy. The goal is to leave every cellular detail intact right down to the molecular level, and the process typically involves multiple steps.
First, traditional fixatives like formaldehyde are delivered to the cells by circulating them through the blood vessels; the fixatives serve to reinforce the bonds between the molecules that comprise cells. Then the water in the brain is replaced by alcohol, which is subsequently replaced by an epoxy resin that is hardened in an oven. The resulting plastic block is hard enough that it can be cut with a diamond knife — called an ultramicrotome — into extremely thin slices; a typical thickness for imaging by electron microscopy is around 50 nanometers. Seung’s chapter entitled “Seeing is Believing” provides a good summary of the history and science leading to the techniques applied in creating connectomes.
Here’s a starting point  to explore electro-mechanical robots controlled by neural tissue — apparently these devices have come to be called hybrots, for “hybrid robots” — you can find a PDF version here. Use CiteSeer, Google Scholar, Mendeley or one of the other research-publication search engines to search backward from papers cited in this paper and forward to papers that cite it to widen your search and find more recent experiments along the same lines. A good first step is to type “neurally controlled robot” into Google Scholar. One of the large-scale cortical simulation projects we’ll be looking at — Izhikevich and Edelman  — also conducted a “practical experiment” with a soccer-playing robot controlled by simulated neural networks (PDF).
If you’re interested in robotics you might also consider the topic of navigation, spatial reasoning and learning mazes. The hippocampus plays many roles in learning, but it is central to learning about the space we inhabit. The hippocampus is particularly interesting as it is — perhaps has to be — highly plastic to deal with our fluid conception of space. Just the simple problem of following a tunnel or walking down corridors in a building and figuring out that the path you’ve taken has turned back on itself is a problem in topology. And of course the generalization of a retinotopic map is a topographic map, which is likely an indication of our human difficulty with metric spaces.
Regarding class participation, yesterday’s discussion was OK but I felt that many of you didn’t get an opportunity to ask your questions or felt constrained or shy for some reason. I expect we’ll all relax more as we get more familiar with having these discussions, but you should feel free — indeed encouraged — to write down any questions that come up when you’re reading the papers or watching the videos and send them along to me in advance of the class discussion. This will give me a heads up regarding topics that interest you and I can do some background research if needed to be better prepared for class.
This whole “flipped” style of teaching has its advantages and disadvantages and we’ll have to put some effort into making it work for us. I’m trying to make this class a model for creating a learning experience offering Stanford students direct access to extraordinary scientists like Seung, Boyden, Olshausen and the rest of our invited speakers by leveraging the Stanford name to entice them to participate. Your suggestions on how to do this most effectively are welcome.
There was a lot in the reading and lecture for Wednesday. The three hypotheses were meant as a whirlwind tour of computational neuroscience, how theories / hypotheses stand up under the scrutiny of evidence and the healthy skepticism of invested scientists. It’s worth pointing out that the three hypotheses have been around for a long time. Some form of the modular minds hypothesis has been around since the Greeks, Mountcastle wasn’t the first to suggest that cortex is algorithmically homogeneous, and Darwin and Huxley anticipated Sapolsky by more than a hundred years. The paper and lectures attempt to weave together many threads from diverse disciplines to produce a coherent synthesis, and, yes, to answer one student’s question, the hypotheses are complementary and deeply intertwined, especially in our reformulations.
The modularity hypothesis is indeed received wisdom in AI, but it is anything but in neuroscience, where Fodor’s formulation has been stretched to the point that it is unrecognizable. Computer scientists are fond of Fodor-style modularity and neuroscientists don’t understand the difference between algorithms and implementations; in each case, their biases lead them to faulty assumptions about biological computation. The concept of scaling is familiar to both engineers and biologists though realized very differently in artifacts and organisms; we offer a resolution of the different perspectives which relies on genetic modularity to achieve computational scalability of the cortex. We posit that increasing the depth of combinatorial circuits is the most important advantage of scaling the cortex — not the cortical column, and not the sort of deep belief networks suggested by modern proponents of the single-algorithm hypothesis. In the paper, we gave combinatorial circuits short shrift, perhaps assuming more computer science than warranted; it may be that we need to include a short primer on circuits as a computational model.
In giving talks to mixed-discipline audiences, I have a habit of concentrating most on those areas I am least confident in, and concentrating the least on areas that I and my usual audiences — typically computer scientists — are most comfortable with. A fair number of the neuroscientists I’ve had the chance to ask don’t really understand how an algorithm differs from its implementation or why one would care, and some appreciation for complexity theory is even rarer. Indeed, I am finding that even many computer scientists are not familiar with the basic concepts from circuit complexity. Prior to the pioneering work of Stephen Cook and Richard Karp on NP-completeness, much of complexity theory was based on combinatorial circuits as a computational model rather than Turing machines. Thinking about the size (number of gates) and depth (number of layers) of circuits makes it relatively easy to analyze algorithms and problem classes in terms of space-time tradeoffs. Combinatorial circuits are also well suited to modeling biological computation, as von Neumann, McCulloch and Pitts and others were quick to notice.
Parallelism is explicit in the topology of circuits, and the basic units of computation — Boolean logic gates in this case — are closer to neurons than infinite tapes, registers, read-write heads, etc. Even the way in which gates are wired together is vaguely brain-like. Leslie Valiant — who among his many accomplishments made fundamental contributions to circuit complexity — realized this, and he based his neuroidal models on combinatorial circuits. He also recognized the ways in which brains are not like logic circuits: the obvious differences between Boolean circuits and spiking neurons, but also the difficulty that real neural networks have in maintaining state, which, in terms of circuits, corresponds to the intermediate products of computation at the outputs of the gates in a given layer. Valiant spends a good deal of his effort showing how nature might overcome these limitations in order to build combinatorial circuits that implement complex functions [37, 125, 124].
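A combinatorial circuit, its size, and its depth can be captured in a few lines. The sketch below is my own toy encoding (gates listed in topological order in a dict), not Valiant’s neuroidal model:

```python
# Each gate: (operation, input names). Inputs reference earlier gates
# or primary inputs, so the circuit is a DAG evaluated in one pass.
circuit = {
    "g1": ("and", ["x", "y"]),
    "g2": ("or",  ["y", "z"]),
    "g3": ("not", ["g1"]),
    "out": ("and", ["g3", "g2"]),
}

OPS = {"and": lambda a, b: a and b,
       "or":  lambda a, b: a or b,
       "not": lambda a: not a}

def evaluate(circuit, inputs):
    values = dict(inputs)
    for name, (op, args) in circuit.items():   # insertion order = topological
        values[name] = OPS[op](*(values[a] for a in args))
    return values

def depth(circuit, name):
    # Size is just len(circuit); depth is the longest input-to-output path.
    if name not in circuit:
        return 0                               # a primary input
    _, args = circuit[name]
    return 1 + max(depth(circuit, a) for a in args)

v = evaluate(circuit, {"x": True, "y": False, "z": True})
print(len(circuit), depth(circuit, "out"), v["out"])   # 4 3 True
```

Size and depth are exactly the space-time quantities the paragraph above mentions: more gates buy shallower circuits and vice versa.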
The capacity for deeper combinatorial circuits — or at least deeper hierarchical representations — is inherent in the structure of primary sensory cortex, but it’s not clear to me that this capacity is fully utilized in the sensory cortex. Shallow representations suffice and, one could even argue, are essential for many behaviors that require quick response. Primary cortex seems to have mastered trading space for time and getting the most out of the parallelism that the cortical substrate supports.
Language, logic, planning and recursive theory-of-mind reasoning all play out on a very different time scale, measured in seconds rather than tens of milliseconds. In these cases, nature can afford to build deeper combinatorial circuits to carry out the requisite inference, in which conditionals and branching are common. This sort of inference really requires keeping track of the order of events and the unfolding of complicated causally-gated processes. Perhaps cortex was originally designed to support fast, feed-forward inference, with feedback at some lag to handle context and completion. But nature found other uses for these layered reentrant neural structures and, by adding machinery to sustain the intermediate products of computation, it evolved to be able to perform complex combinatorial logic. Moreover, perhaps this capability found its greatest utility in the language and executive control areas in the frontal cortex, where we see the most significant quantitative differences between humans and great apes.
Here are some themes that have occupied computational neuroscientists over the last two decades, have yet to be resolved satisfactorily, engender heated debate, and will likely require new methods for observing the brain to clarify and resolve. These were mentioned in the paper and video and will reappear frequently in the following weeks. The citations represent a diverse sampling of the literature, since in all cases there is no definitive work in computational neuroscience and the applications of these ideas to explain neural function are all highly speculative:
EyeWire provides data and support for individuals interested in contributing to the effort to create a retinal connectome. CNS has a package developed by one of Sebastian Seung’s students for learning to segment cell bodies in 3-D volumes.
I mentioned in class yesterday that I often write out what I’m going to say right before giving a lecture. It is a coping strategy for public speaking and usually serves to calm my pre-class agitation. The written document is hardly ever of any subsequent use, since invariably I completely ignore it and extemporize. The few occasions where I have actually used my prepared script have tended to be disastrous. In any case, I promised I’d include what I wrote prior to class yesterday, and so I have:
My name is Tom Dean. I’m a Consulting Professor at Stanford, a Research Scientist at Google, and before that I was a Professor at Brown University for over twenty years. You can call me “Tom” or “Professor Dean” or just “Hey You”; I prefer “Tom”. I say this despite the fact that I know from my study of psychology and neuroscience that encouraging familiarity undermines respect. We’re all barely evolved apes. I’m probably one of the more eclectic professors you’ll meet at Stanford, or at the very least one of the more varied in terms of background and employment. I’ve also made a living as a sculptor, furniture builder, architect, building contractor, carpenter, machinist, metal fabricator, software engineer, etc.
I’ve also taught for much of the last 25 years and I can’t say I like the “lecture” part. Who hasn’t heard the saying “I must not fear. Fear is the mind-killer. Fear is the little-death that brings total obliteration.” — Frank Herbert in Dune. I know this is true on so many levels. Why do I admit this to you now? Because I’m risk prone and I’m curious how you’ll act. Why might I not want to admit it? Because it’s a bad idea to let a predator know you’re scared, and we are all predators — top of the food chain and all that. There’s another, academically-related reason I mention it now. We are all emotional animals, deeply flawed from the perspective of an ideal rational being. We are also social animals, and we love to share and appreciate it when someone is forthright and not afraid to put their cards on the table.
What do I mean when I say we are emotional animals? I mean that our emotions color every aspect of our lives. In order to survive and direct our attention moment by moment, we employ machinery that assigns a value to every thought that pops into our heads, and that valuation is derived in large part from what we colloquially refer to as our emotions, though it is perhaps more accurate to refer to them as body images. Emotions are only lately gaining scientific respectability, and I am cautious for that reason about admitting that I find them interesting. I mention them now not because I expect we will spend much time talking about them in this class but rather because, along with our social instincts, emotions explain a great deal about human computation. I am manipulating your emotions right now just as you are manipulating mine. I can manipulate you by playing soothing nature sounds in the background or randomly flashing pictures of attractive, honest-faced people on a screen for fifty-millisecond intervals as I’ve been doing since I started this class. Just kidding.
It also works to tell little personal anecdotes — I got this tip from watching Robert Sapolsky’s lectures and like him I dilute the effect by telling my audience what I’m up to. I peak in alertness and mental quickness around 7am — I’m usually up and working by 4am, 5am at the latest. Last night I hardly slept at all. It wasn’t that I was nervous about today though that may have had something to do with it. Rather a squirrel kept me up most of the night. A squirrel got into our attic during the winter and started making little scraping noises above the ceiling in our bedroom at night. And so yesterday I got out my tools and crawled all over the roof and eaves and blocked every possible entrance with scraps of wood, metal and heavy-duty screen. Last night the squirrel obsessively visited each place I blocked up and noisily gnawed at the wood, rattled the metal, scrabbled up and down our drain pipes and generally made a ruckus. The squirrel gave up around 4am when my internal clock signalled it was time to get up and get to work.
Most public speakers know — whether instinctively or having been informed by their “public-speaking coach” — that you can relax after the first few minutes — milliseconds actually — because your audience will already have established whether they like you or not and whether what you’re saying is worth listening to. But you can alter the normal course by taking unorthodox steps, unorthodox at least in terms of the usual protocol for giving talks. Here’s where I take a one-pound bar of Trader Joe’s 72% Dark Chocolate and flat-handed smack it on the table, thereby breaking it into a gazillion pieces so everyone can get their chocolate fix. I call this “chocolate fracking” and I’ve used it in several classes, including the last time I taught this course in 2010. I just changed a lot of things in your bodies. If I was successful, I altered your heart rate, hormones and even your metabolism. I probably induced a minor stress response and in so doing changed the state of your brain and several of your major organs, including your heart, kidneys and stomach — your other brain. Robert Sapolsky is a master at describing the underlying neural, hormonal and metabolic processes and we’ll hear more about one of his theories later this week, though his major area of expertise is not central to our focus of attention.
What is the primary focus of this class? Computational scaling. Building brain-scale simulations of cortical function. Figuring out how to use Moore’s law to scale the investigation of neuroscience in much the same way that scientists and engineers have used it to scale the investigation of genomics. Building practical systems that leverage the principles of biological computation to perform useful inference. I want to train you to think about neuroscience as an engineer would, to help you gain an appreciation of what’s possible now and in the near future, and to inspire you to pursue research and development in these areas of science and technology.
My boss Larry Page is fond of saying that it makes a lot of sense to tackle a really big problem. Why? For one thing, you’ll probably have the space all to yourselves. Well, not entirely; there will be some people who lack the intellectual firepower to compete with you and don’t have the good sense to realize just how hard the problem actually is. And then there will be a very few just as smart as you, but they may not have the resources unless they also work at a place like Google or study at an engineering school like Stanford or MIT. The second reason is that if you manage to get the required resources, you’ll have a good chance of recruiting other very smart, very ambitious people to work with you.
It’s trickier than just picking a really hard problem. It has to be a really hard problem that is ready for cracking. Sometimes the prerequisite tools are not available to take the next step. But sometimes the tools are available and we just haven’t recognized them, or we haven’t taken the time and put in the effort to make them precise enough or fast enough or cheap enough. Newtons and Einsteins might be extremely rare, but good experimentalists like Robert Hooke, Ernest Rutherford and James Watt, while still rare, are your best friends when it comes to pushing the envelope of what can be done with today’s science and technology.
Crick and Watson might not have been able to unravel the mysteries of DNA had it not been for Max von Laue, William Bragg and the invention of x-ray crystallography as a new method of determining the arrangement of atoms within a crystal. In our invited lecture series, we’ll be talking with some of the best of the current crop of scientist-inventor-engineers working at the cutting edge of experimental neuroscience. We’ll learn about new areas of study — connectomics, proteomics, synaptomics, optogenetics — and the associated experimental apparatus — both hardware and software — that is amenable to computational and robotic scaling.
What we won’t be talking about in this course is just as important. Computational neuroscience is a demanding discipline. The best people in the field often have backgrounds in physics or applied mathematics. They have to know the standard engineering fare, including multivariate calculus, linear algebra, differential equations, linear systems theory, dynamical systems and electrical circuits, plus a good deal of advanced math, including lots of probability, statistics, coding theory and nonlinear control theory.
This is all good stuff, but a lot of what academics work on is, well, academic. Bruno Olshausen will give us some very interesting insights into what computational neuroscientists work on and what they don’t, but suffice it to say that there’s a lot of grubbing around under the lamp post. Hodgkin-Huxley is a model of membrane conductance and action-potential propagation based on the squid giant axon. It can be described as a simple electrical circuit, with the membrane lipid bilayer represented as a capacitor and the voltage-gated ion channels represented as conductors. Its dynamics can be realized as a system of differential equations, and it and its variations and refinements have been very successful in predicting the behavior of individual neurons and small collections of neurons.
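To make the circuit analogy concrete, here is a minimal sketch — mine, not from the course materials — of the Hodgkin-Huxley equations integrated with forward Euler, using the standard squid-axon parameter values:

```python
import math

def hh_step(V, m, h, n, I, dt):
    """One forward-Euler step of the Hodgkin-Huxley equations.
    Units: mV, ms, uA/cm^2; standard squid-axon constants. The rate
    functions have removable singularities (e.g. at V = -40 mV) that
    floating-point trajectories will not hit in practice."""
    a_m = 0.1 * (V + 40) / (1 - math.exp(-(V + 40) / 10))
    b_m = 4.0 * math.exp(-(V + 65) / 18)
    a_h = 0.07 * math.exp(-(V + 65) / 20)
    b_h = 1.0 / (1 + math.exp(-(V + 35) / 10))
    a_n = 0.01 * (V + 55) / (1 - math.exp(-(V + 55) / 10))
    b_n = 0.125 * math.exp(-(V + 65) / 80)
    # Membrane currents: the lipid bilayer is the capacitor (C = 1 uF/cm^2),
    # the Na+, K+ and leak channels are the conductors.
    I_Na = 120.0 * m**3 * h * (V - 50.0)
    I_K = 36.0 * n**4 * (V + 77.0)
    I_L = 0.3 * (V + 54.4)
    V += dt * (I - I_Na - I_K - I_L)
    m += dt * (a_m * (1 - m) - b_m * m)
    h += dt * (a_h * (1 - h) - b_h * h)
    n += dt * (a_n * (1 - n) - b_n * n)
    return V, m, h, n

def simulate(I=10.0, T=50.0, dt=0.01):
    """Inject a constant current I and record the membrane potential."""
    V, m, h, n = -65.0, 0.053, 0.596, 0.318  # approximate resting state
    trace = []
    for _ in range(int(T / dt)):
        V, m, h, n = hh_step(V, m, h, n, I, dt)
        trace.append(V)
    return trace

trace = simulate()
```

With a 10 uA/cm² injection the trace spikes repetitively, overshooting 0 mV on each action potential, which is exactly the regime the model was built to predict.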
Again, great stuff: interesting, essential and scientifically useful, but not aggressively pushing the envelope of what we know about the whole brain. And there is good reason to believe that there are other phenomena that manifest themselves only at larger scales and that figure just as prominently in how neurons work in brain-scale ensembles. The leaky-integrate-and-fire neuron, which is based roughly on Hodgkin-Huxley, will figure prominently in some of the larger-scale models that we will investigate, and one question we will have to ask our experts is whether or not it serves as a suitable foundation for such large-scale models. One of the engineers who works for me at Google has worked on one of the largest such projects and will provide us with some interesting insights into the viability of such endeavors.
I also want to challenge those working in machine learning on so-called biologically-inspired computing architectures. Artificial neural networks have seen a huge resurgence of interest among academics in recent years with the advent of new learning methods and cheap computational cycles from graphics processing units (GPUs). We’ll talk about the models as well as their implementation, including some software developed by a superb software engineer and computational neuroscientist working in Tommy Poggio’s lab at MIT.
Much of the machine learning work is based on simple linear systems theory with a dash of nonlinear systems to spice things up. Linear algebra is the modern version of the hydraulic and clockwork metaphors that inspired earlier scientists. Von Neumann machines are not considered good models for brains, but they’re fine for simulating other computational models. And there are other algorithmic and computational metaphors that might spark very different biological models. One of my favorites is constant-time hashing and some of the new wrinkles on this old engineering favorite, such as locality-sensitive hashing.
Why hashing? I don’t want to digress too much, but, having given this example, I’ll provide some of my reasoning. A single neuron is likely much more powerful than a transistor gate and probably less powerful than a modern laptop running Linux. Neuroscientists of the past generation grew comfortable ridiculing the idea of a “grandmother” cell, but the idea is gaining credence again, albeit in a more subtle form. Many existing biologically-inspired models view a neuron as capable of performing a dot product — a component-wise multiplication of a data vector and a filter kernel, summed — followed by an output nonlinearity. There are plenty of reasons to imagine neurons as capable of something like this, but there are some computational reasons for thinking that it is a bad idea if you want to scale your models to brain size.
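The dot-product-plus-nonlinearity unit is easy to state precisely. Here is a minimal sketch of my own, with a logistic sigmoid standing in for whatever output nonlinearity a given model assumes, and a toy edge filter as the kernel:

```python
import math

def unit_response(weights, inputs, bias=0.0):
    """Idealized model neuron: dot product (component-wise multiply, then
    sum) of an input vector and a filter kernel, passed through an output
    nonlinearity, here a logistic sigmoid."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# A crude vertical-edge "filter" over a 2x2 patch flattened to a vector.
kernel = [1.0, -1.0, 1.0, -1.0]
edge_patch = [1.0, 0.0, 1.0, 0.0]   # strong left/right contrast
flat_patch = [0.5, 0.5, 0.5, 0.5]   # uniform: contrasts cancel to zero
```

The unit responds well above 0.5 to the edge patch and sits exactly at 0.5 for the flat one, which is the filter-matching behavior these models attribute to a neuron.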
As for biological plausibility, Aude Oliva makes the following observation. Most human memory models are based on the notion of interference: if two stimuli are too similar, they will interfere and create false alarms — you will think you have seen B when in fact you have seen A. The notion of “similarity” can be calculated as a distance in a feature space. The question for psychology and computational neuroscience is: What is the “feature space” used by human memory?
Humans can visually differentiate about 10,000–100,000 object categories. If each of those categories was realized by a model — for instance, a deformable parts model [43, 41, 42, 40] — implemented as a collection of, say, around 10 filters, then to scan a 100 × 100 pixel image you’d have to perform around 100^2 * 10 * 100,000 = 10^10 dot products, and that’s for just one low-resolution image at a single scale. The issue is not whether cortical circuits could perform these calculations so much as whether natural selection would have settled for such a solution if a metabolically less expensive alternative was available.
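The arithmetic is worth checking in a couple of lines; the round numbers are, as in the text, hypothetical:

```python
# Back-of-the-envelope count of dot products for exhaustive filter scanning:
# one filter placement per pixel, a single scale, hypothetical round numbers.
positions = 100 * 100          # a 100 x 100 pixel image
filters_per_category = 10
categories = 100_000
dot_products = positions * filters_per_category * categories
assert dot_products == 10**10  # ten billion dot products for one image
```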
Humans are very good at recognizing things they’ve seen before, even if only for a small fraction of a second. Subjects who are shown hundreds of images for 50 milliseconds each are not even conscious of what they’ve seen, but by studying physiological cues it is possible to determine when they have unconsciously recognized an image they have seen before. And humans are remarkably good at this task. Google accomplishes a similar feat in recognizing copyrighted material by taking an arbitrary image or thumbnail from a video, hashing it into a table that stores a very large number of document signatures, and announcing a “dupe” detection if it encounters a collision in the hash table.
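The scheme can be sketched as follows. The names and the toy signature function are mine, not Google’s; a production system would use a perceptual fingerprint robust to rescaling and recompression rather than Python’s built-in hash:

```python
# Hypothetical sketch of signature-based duplicate detection: hash each
# item's signature into a table; a collision with a previously stored
# signature is reported as a "dupe".
seen = {}

def signature(pixels):
    # Stand-in fingerprint. A real system would use a perceptual hash that
    # tolerates cropping, rescaling and compression artifacts.
    return hash(tuple(pixels))

def check_and_store(doc_id, pixels):
    sig = signature(pixels)
    if sig in seen and seen[sig] != doc_id:
        return ("dupe", seen[sig])  # collision: content seen before
    seen.setdefault(sig, doc_id)
    return ("new", doc_id)

print(check_and_store("video-1", [3, 1, 4, 1, 5]))  # ('new', 'video-1')
print(check_and_store("video-2", [3, 1, 4, 1, 5]))  # ('dupe', 'video-1')
```

The point of the design is that lookup cost is constant in the number of stored signatures, which is what makes the approach viable at web scale.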
One can easily imagine human “immediate recognition” as being facilitated by some form of neuronal hashing [102, 103]. And what about object detection and reducing or eliminating the need for computing dot products? Here are some other questions for you: What makes hashing work, and does this match what the brain is good at? Suppose you could do hashing on a massive scale; how could this assist in building better object detectors? Also see here.
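One concrete way hashing can be made to respect similarity rather than exact identity is locality-sensitive hashing. Here is a minimal random-hyperplane sketch; the dimensions and bit counts are illustrative choices of mine:

```python
import random

# Random-hyperplane LSH: each bit of the key records the sign of a random
# Gaussian projection, so vectors separated by a small angle tend to agree
# on most bits and thus land in the same (or nearby) hash buckets.
random.seed(0)

def make_hash(dim, n_bits):
    planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    def h(v):
        return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0)
                     for p in planes)
    return h

h = make_hash(dim=4, n_bits=32)
a = [1.0, 0.9, 0.0, 0.1]
b = [1.0, 1.0, 0.1, 0.0]     # nearly parallel to a
c = [-1.0, 0.2, 3.0, -2.0]   # points in a very different direction
agree_ab = sum(x == y for x, y in zip(h(a), h(b)))
agree_ac = sum(x == y for x, y in zip(h(a), h(c)))
```

The similar pair agrees on far more bits than the dissimilar pair, so a lookup keyed on the hash retrieves near neighbors without any exhaustive dot-product scan over stored items.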
Brain sciences in the news in the past week:
For an attempt to create an “artificial brain” by attaching neurons to a nanocellulose scaffolding that has been positively charged to attract the cells and provide a three-dimensional culture in which the neurons can grow and even establish connections, see here.
The Allen Institute for Brain Science has received another large grant from Paul Allen. The article also mentions that Christof Koch joined the Allen Institute from Caltech in 2011 as chief scientific officer. See here for more details.
In a new study researchers at MIT used optogenetics to show that memories “reside in very specific brain cells, and that simply activating a tiny fraction of brain cells can recall an entire memory — explaining, for example, how Marcel Proust could recapitulate his childhood from the aroma of a once-beloved madeleine cookie.”
On Monday, April 2, Robert Krulwich, joined by Radiolab regular Carl Zimmer — an excellent science writer and author of Soul Made Flesh: The Discovery of the Brain and How It Changed The World and Brain Cuttings: Fifteen Journeys Through the Mind — will be hosting a duel between MIT’s Sebastian Seung and Anthony Movshon of New York University. Sebastian has been making the rounds touting his new book on connectomics, and Tony Movshon has participated in Charlie Rose’s Brain Series co-hosted by Eric Kandel [65, 64]. The two will “duke it out” — Robert Krulwich’s wording — on the campus of Columbia University, and the topic concerns the plausibility of a “Jennifer Aniston” neuron. The show will be streamed on Radiolab. Zimmer has discussed so-called grandmother neurons before; here’s what he had to say in a recent blog post.
There is a wide array of beautiful animations of biological phenomena available on the web, including neural processes and visualizations of the developing brain. Some of them are even reasonably accurate given our state of knowledge. Robert Lue at Harvard and his animations of cellular processes are quite remarkable, as this New York Times article on Dr. Lue’s work attests, and you can see for yourself on the Harvard Medical School Cell Biology Visualization web site. You might also want to check out the award-winning “The Inner Life of a Cell” and “Mitochondria” animations. These are extraordinarily detailed and — given the state of our current knowledge — accurate renderings of what goes on in a cell, but even so they tell only a small part of the whole story, important chapters of which we’re still discovering.
Here are some pointers to useful web pages concerning models of spiking neurons: the formal leaky-integrate-and-fire spiking-neuron model, and its characterization as an electrical circuit consisting of a resistor and capacitor in parallel.
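That resistor-capacitor characterization fits in a few lines of code. This is a sketch of my own, with a spike-and-reset rule bolted on and illustrative parameter values in arbitrary units:

```python
def lif_spike_times(I=1.5, T=100.0, dt=0.1, tau=10.0, R=1.0,
                    v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: a resistor and capacitor in
    parallel (membrane time constant tau = R*C) driven by current I,
    with a spike recorded and the potential reset whenever it crosses
    threshold. Returns the list of spike times."""
    v = v_rest
    spikes = []
    for step in range(int(T / dt)):
        v += (dt / tau) * (-(v - v_rest) + R * I)  # leaky integration
        if v >= v_thresh:
            spikes.append(step * dt)
            v = v_reset                             # fire and reset
    return spikes
```

With the default drive (steady-state potential R*I = 1.5, above the threshold of 1.0) the unit fires regularly; drop the current to I = 0.5 and the potential saturates below threshold, so it never spikes, which is the basic input-output behavior the larger-scale models build on.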
Polsky A, Mel B.W., and Schiller J.
Computational subunits in thin dendrites of pyramidal cells.
Nature Neuroscience, 7(6):621–627, 2004.|
E. Adelson and J. Movshon.
Phenomenal coherence of moving visual patterns.
Nature, 300:523–525, 1982.|
Edward H. Adelson and James R. Bergen.
Spatiotemporal energy models for the perception of motion.
Journal of the Optical Society of America A, 2(2):284–299, 1985.|
Yehuda Afek, Noga Alon, Omer Barad, Eran Hornstein, Naama Barkai, and Ziv Bar-Joseph.
A biological solution to a fundamental distributed computing problem.
Science, 331(6014):183–185, 2011.|
Roland Baddeley.
Searching for filters with ‘interesting’ output distributions: an
uninteresting direction to explore?
Network: Computation in Neural Systems, 7(1):409–421, 1996.|
Horace B. Barlow.
Possible principles underlying the transformations of sensory messages.
In W. A. Rosenblith, editor, Sensory Communication, pages
217–234. MIT Press, Cambridge, MA, 1961.|
Mark F. Bear, Barry Connors, and Michael Paradiso.
Neuroscience: Exploring the Brain (Third Edition).
Lippincott Williams & Wilkins, Baltimore, Maryland, 2006.|
Pietro Berkes and Laurenz Wiskott.
Slow feature analysis yields a rich repertoire of complex cell properties.
Journal of Vision, 5(6):579–602, 2005.|
I. Biederman, J. Rabinowitz, A. Glass, and W.E. Stacy.
On the information extracted from a glance at a scene.
Journal of Experimental Psychology, 103(3):597–600, 1974.|
Elie Bienenstock, Stuart Geman, and Daniel Potter.
Compositionality, MDL priors and object recognition.
In M.C. Mozer, M.I. Jordan, and T. Petsche, editors, Advances in
Neural Information Processing Systems 9, pages 838–844. MIT Press,
Cambridge, MA, 1998.|
Michael Black and Allan Jepson.
Eigentracking: Robust matching and tracking of articulated objects
using a view-based representation.
International Journal of Computer Vision, 26(1):63–84, 1998.|
Christoph Börgers, Giovanni Talei Franzesi, Fiona E. N. LeBeau, Edward S.
Boyden, and Nancy J. Kopell.
Minimal size of cell assemblies coordinated by gamma oscillations.
PLoS Computational Biology, 8(2):e1002362, 2012.|
Timothy F. Brady, Talia Konkle, George A. Alvarez, and Aude Oliva.
Visual long-term memory has a massive storage capacity for object details.
Proceedings of the National Academy of Sciences, 105(38):14325–14329, 2008.|
Sergey Brin and Larry Page.
The anatomy of a large-scale hypertextual Web search engine.
In Proceedings of the 7th World Wide Web Conference, 1998.|
A. Broder, R. Krauthgamer, and M. Mitzenmacher.
Improved classification via connectivity information.
In Proceedings of the 11th ACM-SIAM Symposium on Discrete
Algorithms, pages 576–585, 2000.|
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata,
A. Tomkins, and J. Wiener.
Graph structure in the Web: experiments and models.
In Proceedings of the 9th World Wide Web Conference, 2000.|
Thomas Brox and Jitendra Malik.
Object segmentation by long term analysis of point trajectories.
In Proceedings of the 11th European Conference on Computer
Vision, pages 282–295, Berlin, Heidelberg, 2010. Springer-Verlag.|
H. Bülthoff, J. Little, and T Poggio.
A parallel algorithm for real-time computation of optical flow.
Nature, 337(6207):549–553, February 1989.|
György Buzsáki.
Rhythms of the Brain.
Oxford University Press, 2006.|
Charles F. Cadieu and Bruno A. Olshausen.
Learning intermediate-level representations of form and motion from
Neural Computation, 24(4):827–866, 2012.|
Theodore Camus and Heinrich Bülthoff.
Space-time tradeoffs for adaptive real-time tracking.
In Proceedings of the SPIE Conference on Advances in
Intelligent Robotic Systems, pages 268–276. SPIE, 1991.|
Ryan T. Canolty, Karunesh Ganguly, Steven W. Kennerley, Charles F. Cadieu,
Kilian Koepsell, Jonathan D. Wallis, and Jose M. Carmena.
Oscillatory phase coupling coordinates anatomically dispersed
functional cell assemblies.
Proceedings of the National Academy of Sciences, 107(40):17356–17361, 2010.|
Matteo Carandini.
From circuits to behavior: a bridge too far?
Nature Neuroscience, 15(4):507–509, 2012.|
Matteo Carandini, David J. Heeger, and J. Anthony Movshon.
Linearity and normalization in simple cells of the macaque primary visual cortex.
Journal of Neuroscience, 17:8621–8644, 1997.|
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach,
Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber.
BigTable: A distributed storage system for structured data.
ACM Transactions on Computing Systems, 26(2), 2008.|
T. Cheatham, A. Fahmy, D. Stefanescu, and L. Valiant.
Bulk synchronous parallel computing: A paradigm for transportable software.
In Proceedings of the 28th Annual Hawaii Conference on System
Sciences, volume II, pages 268–275. IEEE Computer Society Press, 1995.|
P. Dayan and L. F. Abbott.
Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems.
MIT Press, Cambridge, MA, 2001.|
J. Dean and S. Ghemawat.
MapReduce: Simplified data processing on large clusters.
In Proceedings of the 6th Symposium on Operating Systems Design
and Implementation, pages 137–150, 2004.|
Stanislas Dehaene and Jean-Pierre Changeux.
A hierarchical neuronal network for planning behavior.
Proceedings of the National Academy of Sciences, 94(24):13293–13298, 1997.|
Thomas B. DeMarse, Daniel A. Wagenaar, Axel W. Blau, and Steve M. Potter.
The neurally controlled animat: Biological brains acting with simulated bodies.
Autonomous Robots, 11(3):305–310, 2001.|
James J DiCarlo, Davide Zoccolan, and Nicole C Rust.
How does the brain solve visual object recognition?
Neuron, 73:415–34, 2012.|
Shaul Druckmann, Thomas K. Berger, Felix Schürmann, Sean Hill, Henry
Markram, and Idan Segev.
Effective stimuli for constructing reliable neuron models.
PLoS Computational Biology, 7(8):e1002133, 2011.|
G. Sperling, M. S. Landy, Y. Cohen, and M. Pavel.
Intelligible encoding of ASL image sequences at extremely low information rates.
Computer Vision, Graphics, and Image Processing, 31(1):335–391, 1985.|
P. Erdös and A. Rényi.
On the evolution of random graphs.
Publications of the Mathematical Institute of the Hungarian
Academy of Sciences, 5:17–61, 1960.|
P. Erdös and A. Rényi.
On random graphs I.
Publ. Math. Debrecen, 6:290–297, 1959.|
D.A. Fair, A.L. Cohen, J.D. Power, N.U.F. Dosenbach, and J.A. Church.
Functional brain networks develop from a local to distributed organization.
PLoS Computational Biology, 5(5), 2009.|
Vitaly Feldman and Leslie G. Valiant.
Experience-induced neural circuits that achieve high capacity.
Neural Computation, 21(12):2715–2754, 2009.|
D. J. Felleman and D. C. Van Essen.
Distributed hierarchical processing in primate cerebral cortex.
Cerebral Cortex, 1:1–47, 1991.|
P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan.
Object detection with discriminatively trained part based models.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 32:1627–1645, 2010.|
P. F. Felzenszwalb, R. B. Girshick, and D. McAllester.
Cascade object detection with deformable part models.
In IEEE Conference on Computer Vision and Pattern
Recognition, pages 2241–2248. IEEE, 2010.|
Pedro F. Felzenszwalb and Daniel P. Huttenlocher.
Pictorial structures for object recognition.
International Journal of Computer Vision, 61(1):55–79, 2004.|
Pedro F. Felzenszwalb, David A. McAllester, and Deva Ramanan.
A discriminatively trained, multiscale, deformable part model.
In IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, pages 1–8. IEEE Computer Society, 2008.|
M.A. Fischler and R.A. Elschlager.
The representation and matching of pictorial structures.
IEEE Transactions on Computers, 22(1):67–92, 1973.|
Peter Földiák.
Learning invariance from transformation sequences.
Neural Computation, 3:194–200, 1991.|
Dileep George and Jeff Hawkins.
Towards a mathematical theory of cortical micro-circuits.
PLoS Computational Biology, 5(10), 2009.|
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
The Google file system.
In Proceedings of the 19th ACM Symposium on Operating Systems
Principles, pages 29–43. ACM, 2003.|
David Gibson, Jon M. Kleinberg, and Prabhakar Raghavan.
Inferring Web communities from link topology.
In Proceedings of the 9th ACM Conference on Hypertext and
Hypermedia, pages 225–234, Pittsburgh, Pennsylvania, June 1998.|
James J. Gibson.
Perception of the Visual World.
Houghton Mifflin, Boston, 1950.|
James J. Gibson.
The Ecological Approach to Visual Perception.
Houghton Mifflin, Boston, 1979.|
Aristides Gionis, Piotr Indyk, and Rajeev Motwani.
Similarity search in high dimensions via hashing.
In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez,
Stanley B. Zdonik, and Michael L. Brodie, editors, Proceedings of the
25th International Conference on Very Large Data Bases, pages 518–529.
Morgan Kaufmann, 1999.|
Michelle R. Greene and Aude Oliva.
The Briefest of Glances: The Time Course of Natural Scene Understanding.
Psychological Science, 20(4):464–472, 2009.|
A. Grinvald, D. Shoham, A. Shmuel, D. Glaser, I. Vanzetta, E. Shtoyerman,
H. Slovin, A. Sterkin, C. Wijnbergen, R. Hildesheim, and A. Arieli.
In-vivo optical imaging of cortical architecture and dynamics.
Technical Report BC-AG/99-6, Grodetsky Center for Research of Higher
Brain Functions at the Weizmann Institute of Science, Israel, 2001.|
Memory skills of deaf learners: Implications and applications.
American Annals of the Deaf, 156(4), 2011.|
Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, and Jitendra Malik.
Semantic contours from inverse detectors.
In International Conference on Computer Vision, 2011.|
John L. Hennessy and David A. Patterson.
Computer Architecture: A Quantitative Approach.
5th Edition, Morgan Kaufmann, San Francisco, CA, 2012.|
Donald D. Hoffman.
Visual Intelligence: How We Create What We See.
W. W. Norton, New York, NY, 1998.|
Aapo Hyvärinen and Patrick O. Hoyer.
Emergence of phase and shift invariant features by decomposition of
natural images into independent feature subspaces.
Neural Computation, 12(7):1705–1720, 2000.|
Aapo Hyvärinen and Patrik O. Hoyer.
A two-layer sparse coding model learns simple and complex cell
receptive fields and topography from natural images.
Vision Research, 41(18):2413–2423, 2001.|
Aapo Hyvärinen, Jarmo Hurri, and Patrik O. Hoyer.
Natural Image Statistics: A Probabilistic Approach to Early Computational Vision.
Springer-Verlag, London, 2009.|
Aapo Hyvärinen, Jarmo Hurri, and Jaakko Väyrynen.
Bubbles: a unifying framework for low-level statistical properties of
natural image sequences.
Journal of the Optical Society of America, 20(7):1237–1252, 2003.|
Piotr Indyk and Rajeev Motwani.
Approximate nearest neighbors: Towards removing the curse of dimensionality.
In Proceedings of the 30th Annual ACM Symposium on Theory of
Computing, pages 604–613, 1998.|
Helene Intraub.
Understanding and remembering briefly glimpsed pictures: Implications
for visual scanning and memory.
In Veronika Coltheart, editor, Fleeting Memories: Cognition of
Brief Visual Stimuli, pages 47–70. MIT Press, Cambridge, MA, 1999.|
Eugene M. Izhikevich and Gerald M. Edelman.
Large-scale model of mammalian thalamo-cortical systems.
Proceedings of the National Academy of Sciences, 105(9):3593–3598, 2008.|
E.R. Kandel, J.H. Schwartz, and T.M. Jessell.
Principles of neural science (Fourth Edition).
McGraw-Hill, Health Professions Division, 2000.|
Eric R. Kandel.
In Search of Memory: The Emergence of a New Science of Mind.
W. W. Norton, New York, NY, 2006.|
Yan Karklin and Michael S. Lewicki.
Emergence of complex cell properties by learning to generalize in natural scenes.
Nature, 457:83–86, 2009.|
J.N. Kay, I. De la Huerta, I.J. Kim, Y. Zhang, M. Yamagata, M.W. Chu,
M. Meister, and J.R. Sanes.
Retinal ganglion cells with distinct directional preferences differ
in molecular identity, structure, and central projections.
Journal of Neuroscience, 25(31):7753–7762, 2011.|
K.N. Kay, T. Naselaris, R.J. Prenger, and J.L. Gallant.
Identifying natural images from human brain activity.
Nature, 452:352–355, 2008.|
David Kempe, Jon Kleinberg, and Éva Tardos.
Maximizing the spread of influence through a social network.
In Proceedings of the 9th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pages 137–146, New York, NY, USA, 2003. ACM.|
Georges Khazen, Sean L. Hill, Felix Schürmann, and Henry Markram.
Combinatorial expression rules of ion channel genes in juvenile rat
(rattus norvegicus) neocortical neurons.
PLoS ONE, 7(4):e34786, 2012.|
Jon M. Kleinberg.
Authoritative sources in a hyperlinked environment.
In Proceedings of 9th Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 668–677, 1998.|
J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins.
The Web as a graph: Measurements, models and methods.
In Proceedings of the International Conference on Combinatorics
and Computing, 1999.|
Etienne Koechlin, Gregory Corrado, Pietro Pietrini, and Jordan Grafman.
Dissociating the role of the medial and lateral anterior prefrontal
cortex in human planning.
Proceedings of the National Academy of Sciences, 97(13):7651–7656, 2000.|
Etienne Koechlin and Thomas Jubault.
Broca’s area and the hierarchical organization of human behavior.
Neuron, 50(6):963–974, 2006.|
Etienne Koechlin, Chrystèle Ody, and Frédérique Kouneiher.
The architecture of cognitive control in the human prefrontal cortex.
Science, 302:1181–1185, 2003.|
L. Fei-Fei, R. Fergus, and P. Perona.
Learning generative visual models from few training examples: an
incremental Bayesian approach tested on 101 object categories.
IEEE Transactions on Pattern Analysis & Machine Intelligence, 28(4):594–611, 2006.|
Michael F. Land and Dan-Eric Nilsson.
Animal Eyes.
Oxford University Press, Oxford, UK, 2002.|
Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim,
Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty,
Per Hammarlund, Ronak Singhal, and Pradeep Dubey.
Debunking the 100X GPU versus CPU myth: an evaluation of
throughput computing on CPU and GPU.
SIGARCH Computer Architecture News, 38(3):451–460, 2010.|
J. Y. Lettvin, H. R. Maturana, W. S. McCulloch, and W. H. Pitts.
What the frog’s eye tells the frog’s brain.
Proceedings of the Institute for Radio Engineers, 47(11):1940–1951, 1959.|
Subhransu Maji, Lubomir Bourdev, and Jitendra Malik.
Action recognition from a distributed representation of pose and appearance.
In IEEE International Conference on Computer Vision and
Pattern Recognition, 2011.|
Ravi S. Menon and Seong-Gi Kim.
Spatial and temporal limits in cognitive neuroimaging with fMRI.
Trends in Cognitive Sciences, 3:207–216, 1999.|
T. M. Mitchell, S. V. Shinkareva, A. Carlson, K.M. Chang, V. L. Malave, R. A.
Mason, and M. A. Just.
Predicting human brain activity associated with the meanings of nouns.
Science, 320(5880):1191–1195, 2008.|
Hans P. Moravec.
Mind Children: The Future of Robot and Human Intelligence.
Harvard University Press, Cambridge, MA, 1988.|
Hans P. Moravec.
Robot: Mere Machine to Transcendent Mind.
Oxford University Press, Oxford, 1999.|
Jim Mutch and David G. Lowe.
Multiclass object recognition with sparse, localized features.
In Proceedings of the 2006 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, pages 11–18, Washington, DC, USA,
2006. IEEE Computer Society.|
Jim Mutch and David G. Lowe.
Object class recognition and localization using sparse features with
limited receptive fields.
International Journal on Computer Vision, 80(1):45–57, 2008.|
Thomas Naselaris, Ryan J. Prenger, Kendrick N. Kay, Michael Oliver, and Jack L. Gallant.
Bayesian reconstruction of natural images from human brain activity.
Neuron, 63(6):902–915, 2009.|
M. Newman, D. Watts, and S. Strogatz.
Random graph models of social networks.
Proceedings of the National Academy of Sciences, 99:2566–2572, 2002.|
Andrew Ng and Michael Jordan.
On discriminative versus generative classifiers: A comparison of
logistic regression and naïve Bayes.
In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002.
Shinji Nishimoto, An T. Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu, and
Jack L. Gallant.
Reconstructing visual experiences from brain activity evoked by natural movies.
Current Biology, 21:1641–1646, 2011.|
B. A. Olshausen and D. J. Field.
Natural image statistics and efficient coding.
In Workshop on Information Theory and the Brain, 1995.|
B. A. Olshausen and D. J. Field.
Natural image statistics and efficient coding.
Network: Computation in Neural Systems, 7(2):333–339, 1996.|
Nancy A. O’Rourke, Nicholas C. Weiler, Kristina D. Micheva, and Stephen J. Smith.
Deep molecular diversity of mammalian synapses: Why it matters and
how to measure it.
Nature Reviews Neuroscience, 13:365–379, 2012.|
A. Pentland, B. Moghaddam, and T. Starner.
View-based and modular eigenspaces for face recognition.
In Proceedings of the 1994 IEEE Conference on Computer Vision
and Pattern Recognition. IEEE Computer Society, 1994.|
Panayiota Poirazi, Terrence Brannon, and Bartlett W. Mel.
Pyramidal neuron as two-layer neural network.
Neuron, 37(6):989–999, 2003.|
M. C. Potter and B. A. Faulconer.
Time to understand pictures and words.
Nature, 253(5491):437–438, 1975.|
M. Riesenhuber and T. Poggio.
Hierarchical models of object recognition in cortex.
Nature Neuroscience, 2(11):1019–1025, November 1999.|
Mohammad Amin Sadeghi and Ali Farhadi.
Recognition using visual phrases.
In IEEE Computer Vision and Pattern Recognition, Workshop on
Biologically Consistent Vision, 2011.|
A. Saxe, P. Koh, Z. Chen, M. Bhand, B. Suresh, and A. Y. Ng.
On random weights and unsupervised feature learning.
In International Conference in Machine Learning, 2011.|
Michael Schmidt and Hod Lipson.
Distilling free-form natural laws from experimental data.
Science, 324(5923):81–85, 2009.|
J. Sepulcre, H. Liu, T. Talukdar, I. Martincorena, and B.T.T. Yeo.
The organization of local and distant functional connectivity in the human brain.
PLoS Computational Biology, 6(6), 2010.|
T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman, and T. Poggio.
A theory of object recognition: Computations and circuits in the
feedforward path of the ventral stream in primate visual cortex.
Technical Report AI Memo 2005-036 (CBCL Memo 259), MIT Center
for Biological & Computational Learning, December 2005.|
T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio.
Object recognition with cortex-like mechanisms.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 29(3):411–426, 2007.|
Sebastian Seung.
Connectome: How the Brain’s Wiring Makes Us Who We Are.
Houghton Mifflin Harcourt, Boston, 2012.|
Kausik Si, Yun-Beom Choi, Erica White-Grindley, Amitabha Majumdar, and Eric R. Kandel.
Aplysia CPEB can form prion-like multimers in sensory neurons that
contribute to long-term facilitation.
Cell, 140(3):421–435, 2010.|
Pawan Sinha, Benjamin Balas, Yuri Ostrovsky, and Richard Russell.
Face recognition by humans: Nineteen results all computer vision
researchers should know about.
Proceedings of the IEEE, 94(11):1948–1962, 2006.|
L. Sirovich and M. Kirby.
Low dimensional procedure for the characterization of human faces.
Journal of the Optical Society of America, 4(3):519–524, 1987.|
Lionel Standing.
Learning 10,000 pictures.
Quarterly Journal of Experimental Psychology, 25(2):207–222, 1973.|
Garrett B. Stanley, Fei-Fei Li, and Yang Dan.
Reconstruction of natural scenes from ensemble responses in the
lateral geniculate nucleus.
The Journal of Neuroscience, 19(18):8036–8042, 1999.|
Keith E. Stanovich.
Decision Making and Rationality in the Modern World.
Oxford University Press, Oxford, UK, 2009.
Keith E. Stanovich.
Rational and irrational thought: The thinking that IQ tests miss.
Scientific American, November/December:34–39, 2009.
Keith E. Stanovich.
What Intelligence Tests Miss: The Psychology of Rational Thought.
Yale University Press, New Haven, CT, 2009.
Greg J. Stephens, Leslie C. Osborne, and William Bialek.
Searching for simplicity in the analysis of neurons and behavior.
Proceedings of the National Academy of Sciences, 108(Supplement 3):15565–15571, 2011.
S. Subramaniam, I. Biederman, and S. A. Madigan.
Accurate identification but no priming and chance recognition memory for pictures in RSVP sequences.
Visual Cognition, 7(4):511–535, 2000.
Patrik Sundberg, Thomas Brox, Michael Maire, Pablo Arbelaez, and Jitendra Malik.
Occlusion boundary detection and figure/ground assignment from optical flow.
In IEEE Conference on Computer Vision and Pattern Recognition, 2011.
Tatsuto Takeuchi and Karen K. De Valois.
Sharpening image motion based on the spatio-temporal characteristics of human vision.
In Bernice E. Rogowitz, editor, Proceedings of SPIE Conference on Human Vision and Electronic Imaging, pages 83–94, 2005.
Jason M. Tangen, Sean C. Murphy, and Matthew B. Thompson.
Flashed face distortion effect: Grotesque faces from relative spaces.
Perception, 40(1):628–630, 2011.
S. Thorpe, D. Fize, and C. Marlot.
Speed of processing in the human visual system.
Nature, 381(6582):520–522, 1996.
Gasper Tkacik, Patrick Garrigan, Charles Ratliff, Grega Milcinski, Jennifer M. Klein, Lucia H. Seyfarth, Peter Sterling, David H. Brainard, and Vijay Balasubramanian.
Natural images from the birthplace of the human eye.
PLoS ONE, 6:e20409, 2011.
Antonio Torralba, Rob Fergus, and William Freeman.
Object and scene recognition in tiny images.
Journal of Vision, 7(9):193–193, 2007.
K. Tsunoda, Y. Yamane, M. Nishizaki, and M. Tanifuji.
Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns.
Nature Neuroscience, 4:832–838, 2001.
M. A. Turk and A. P. Pentland.
Face recognition using eigenfaces.
In Proceedings of the 1991 IEEE Conference on Computer Vision and Pattern Recognition, pages 586–591. IEEE Computer Society, 1991.
Amos Tversky and Daniel Kahneman.
Judgment under uncertainty: Heuristics and biases.
Science, 185(4157):1124–1131, 1974.
Leslie G. Valiant.
Memorization and association on a realistic neural model.
Neural Computation, 17(3):527–555, 2005.
Leslie G. Valiant.
Neural computations that support long mixed sequences of knowledge acquisition tasks.
In J. Chen and S. B. Cooper, editors, Lecture Notes in Computer Science: Theory and Applications of Models of Computation. Springer, Berlin, 2009.
Bram-Ernst Verhoef, Rufin Vogels, and Peter Janssen.
Inferotemporal cortex subserves three-dimensional structure categorization.
Neuron, 73:171–182, 2012.
M. J. Wainwright, O. Schwartz, and E. P. Simoncelli.
Natural image statistics and divisive normalization: Modeling nonlinearity and adaptation in cortical neurons.
In R. Rao, B. Olshausen, and M. Lewicki, editors, Probabilistic Models of the Brain: Perception and Neural Function, chapter 10, pages 203–222. MIT Press, February 2002.
Brian A. Wandell, Serge O. Dumoulin, and Alyssa A. Brewer.
Visual field maps in human cortex.
Neuron, 56(2):366–383, 2007.
Andrew B. Watson and Albert J. Ahumada, Jr.
Model of human visual-motion sensing.
Journal of the Optical Society of America A, 2(2):322–341, 1985.
Van J. Wedeen, Douglas L. Rosene, Ruopeng Wang, Guangping Dai, Farzad Mortazavi, Patric Hagmann, Jon H. Kaas, and Wen-Yih I. Tseng.
The geometric structure of the brain fiber pathways.
Science, 335(6076):1628–1634, 2012.
Laurenz Wiskott and Terrence Sejnowski.
Slow feature analysis: Unsupervised learning of invariances.
Neural Computation, 14(4):715–770, 2002.
Jay Yagnik, Dennis Strelow, David A. Ross, and Ruei-sung Lin.
The power of comparative reasoning.
In Proceedings of the International Conference on Computer Vision. IEEE Computer Society, 2011.
Yukako Yamane, Eric T. Carlson, Katherine C. Bowman, Zhihong Wang, and Charles E. Connor.
A neural code for three-dimensional object shape in macaque inferotemporal cortex.
Nature Neuroscience, 11:1352–1360, 2008.
Yukako Yamane, Kazushige Tsunoda, Madoka Matsumoto, Adam N. Phillips, and Manabu Tanifuji.
Representation of the spatial relationship among object parts by neurons in macaque inferotemporal cortex.
Journal of Neurophysiology, 96(6):3147–3156, 2006.
Long Zhu, Yuanhao Chen, Antonio Torralba, William T. Freeman, and Alan L. Yuille.
Part and appearance sharing: Recursive compositional models for multi-view and multi-object detection.
In IEEE Conference on Computer Vision and Pattern Recognition, pages 1919–1926. IEEE, 2010.
Long Zhu, Yuanhao Chen, Alan L. Yuille, and William T. Freeman.
Latent hierarchical structural learning for object detection.
In IEEE Conference on Computer Vision and Pattern Recognition, pages 1062–1069. IEEE, 2010.
1 I was thinking about the Feynman video on Sunday evening as I watched Stevie Wonder perform “Alfie” at the White House in honor of Hal David and Burt Bacharach receiving the Gershwin Prize. “Alfie” always struck me as a bit schmaltzy but Wonder’s version was jazzy and the improvisations in his harmonica solo were inspired. It occurred to me that jazz improvisation is analogous to how we perceive the world — a creative process in which we imagine the world in terms of our past experiences. More generally, remembering is a creative process that results in changes to the memories — patterns of stimulated neural activity — of the recalled events. This sort of effect has been observed in numerous psychology experiments, though admittedly we’re pretty much clueless when it comes to understanding the mechanism.
What occurred to me is that there probably isn’t a qualitative difference between what happens when a classical pianist plays a Chopin piano concerto and when a jazz pianist plays a jazz standard with improvisation. [If you don’t immediately think of Stevie Wonder as a jazz artist or “Alfie” as a jazz standard, then think of John Coltrane playing “Lush Life” by Billy Strayhorn.] Both performances involve creative recall and rendering, and both result in the associated memories being changed. Moreover — and here’s where the connection to Feynman’s theorizing comes in — I wouldn’t be surprised to find out that even the most creative acts, such as writing music and creating mathematics, can be characterized as improvisation built on a largely routinized scaffolding and carried out jointly by the cerebral and cerebellar cortices — like the first time you sent a text message while riding a bike.
2 In , during the training phase, each voxel is modeled as a weighted (linear) combination of Gabor-like filters spanning a range of locations, orientations and spatial frequencies. During the image-identification phase, we select the image whose predicted activity is most similar to the observed activity. In , training is similar with the difference that, instead of using Gabor filters as a basis, we use a set of semantic features, each of which encodes the co-occurrence of a particular word — or a small set of related words — with the target stimuli. In the evaluation phase, we select the word whose predicted activity is most similar to the observed activity.
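The two-phase scheme in this footnote can be sketched in a few lines of Python. This is a toy stand-in, not the published pipeline: the weights and feature vectors are random numbers standing in for fitted Gabor-filter (or semantic-feature) responses, and all dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented dimensions: 50 voxels, 200 features, 10 candidate images.
n_voxels, n_features, n_images = 50, 200, 10

# Training phase: each voxel is modeled as a weighted (linear) combination of
# feature responses; here the fitted weights are simulated rather than learned.
W = rng.normal(size=(n_voxels, n_features))

# Feature vectors for the candidate images (stand-ins for Gabor filter outputs).
features = rng.normal(size=(n_images, n_features))

# Predicted voxel activity for every candidate image.
predicted = features @ W.T                    # shape (n_images, n_voxels)

# Observed activity: the true response to image 3 plus measurement noise.
observed = predicted[3] + 0.1 * rng.normal(size=n_voxels)

# Identification phase: select the image whose predicted activity is most
# similar (here, most correlated) to the observed activity.
corr = [np.corrcoef(p, observed)[0, 1] for p in predicted]
identified = int(np.argmax(corr))
print(identified)  # should recover image 3
```

The word-identification variant is the same computation with semantic features in place of the Gabor basis and candidate words in place of candidate images.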
A motion sequence may be represented as a single pattern in x-y-t space; a velocity of motion corresponds to a three-dimensional orientation in this space. Motion information can be extracted by a system that responds to the oriented spatiotemporal energy. We discuss a class of models for human motion mechanisms in which the first stage consists of linear filters that are oriented in space-time and tuned in spatial frequency. The outputs of quadrature pairs of such filters are squared and summed to give a measure of motion energy. These responses are then fed into an opponent stage. Energy models can be built from elements that are consistent with known physiology and psychophysics, and they permit a qualitative understanding of a variety of motion phenomena.
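The quadrature-pair computation summarized in this abstract fits in a short script. Here is a minimal 1-D (x, t) sketch; the Gabor-like filters and all parameter values are illustrative choices, not the published model.

```python
import numpy as np

# Space-time grid: 64 spatial samples by 64 frames.
nx, nt, fx = 64, 64, 0.1            # grid size; grating spatial frequency
T, X = np.meshgrid(np.arange(nt), np.arange(nx), indexing="ij")

# Stimulus: a grating drifting rightward at 1 pixel per frame.
stim = np.cos(2 * np.pi * fx * (X - T))

def motion_energy(stim, vx):
    """Energy from a quadrature pair of space-time filters tuned to velocity vx."""
    phase = 2 * np.pi * fx * (X - vx * T)
    envelope = np.exp(-((X - nx / 2) ** 2 + (T - nt / 2) ** 2) / (2 * 10.0**2))
    even = envelope * np.cos(phase)       # quadrature pair: 90-degree
    odd = envelope * np.sin(phase)        # phase offset between the two
    # Square and sum the pair's outputs to get phase-invariant motion energy.
    return np.sum(stim * even) ** 2 + np.sum(stim * odd) ** 2

# Opponent stage: rightward-tuned energy minus leftward-tuned energy.
opponent = motion_energy(stim, +1.0) - motion_energy(stim, -1.0)
print(opponent > 0)  # rightward stimulus: positive opponent output
```

For the rightward-drifting grating, the rightward-tuned pair resonates with the stimulus while the leftward-tuned pair does not, so the opponent output is positive.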
We propose a model of how humans sense the velocity of moving images. The model exploits constraints provided by human psychophysics, notably that motion-sensing elements appear tuned for two-dimensional spatial frequency, and by the frequency spectrum of a moving image, namely, that its support lies in the plane in which the temporal frequency equals the dot product of the spatial frequency and the image velocity. The first stage of the model is a set of spatial-frequency-tuned, direction-selective linear sensors. The temporal frequency of the response of each sensor is shown to encode the component of the image velocity in the sensor direction. At the second stage, these components are resolved in order to measure the velocity of image motion at each of a number of spatial locations and spatial frequencies. The model has been applied to several illustrative examples, including apparent motion, coherent gratings, and natural image sequences. The model agrees qualitatively with human perception.
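The spectral constraint invoked in this abstract (the temporal frequency of each component equals the dot product of its spatial frequency and the image velocity) can be checked numerically. A minimal 1-D sketch with illustrative numbers:

```python
import numpy as np

# For a translating pattern, each component's temporal frequency equals the
# dot product of its spatial frequency and the image velocity.
fx, v, nt = 0.125, 2.0, 64      # cycles/pixel, pixels/frame, frames
t = np.arange(nt)
x0 = 5                          # a fixed pixel at which we watch intensity

# Intensity over time at pixel x0 of a 1-D sinusoid translating at velocity v.
signal = np.cos(2 * np.pi * fx * (x0 - v * t))

# The temporal spectrum should peak at ft = fx * v = 0.25 cycles/frame.
spectrum = np.abs(np.fft.rfft(signal))
ft = np.fft.rfftfreq(nt)[np.argmax(spectrum)]
print(ft)  # 0.25
```

This is exactly the quantity the model's first-stage sensors are said to encode: the velocity component along the sensor's preferred direction.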
6 Micro-stimulation “is a powerful tool for establishing causal relationships between physiologically characterized neurons and behavioral performance.” It does have its limitations since “the electrical pulses evoked by micro-stimulation simultaneously excite many neurons in the neighborhood of the electrode tip. Therefore, successful application of micro-stimulation relies upon structural regularities within the cortex, such as a clustering of neurons with comparable stimulus selectivities.” — from Verhoef et al.
It seems like most object recognition models/attempts are committed to context independence/isolation from background, e.g. in the DiCarlo paper, identifying a car in a city scene and in a grassland. I understand that this is most consistent with the feed-forward, “independent assembly line workers” model of our vision system. However, there’s a lot of research on priming in cognitive science (e.g. mere exposure preference, flashing people images of faces with happy vs. angry expressions, cross-modal situations where recall of a certain word is facilitated if the person does a motion consistent with that word, etc.). Efficient priming/inference from context also appears evolutionarily advantageous, e.g. when hunting/gathering, targets probably correlate to the environment. Is human object recognition facilitated by context? Have there been attempts to model the influence of context, e.g. as adjusted priors of some sort, or is this inconsistent with the "assembly line" view/generalization of the object recognition algorithm?
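The “adjusted priors” idea in this question can be made concrete with a two-line Bayes computation. All numbers below are invented for illustration: the same ambiguous appearance likelihoods yield different identifications under different context priors.

```python
# Posterior over object labels: appearance likelihood times a context prior.
def posterior(likelihood, prior):
    unnorm = {obj: likelihood[obj] * prior[obj] for obj in likelihood}
    total = sum(unnorm.values())
    return {obj: p / total for obj, p in unnorm.items()}

likelihood = {"car": 0.5, "cow": 0.5}    # appearance alone is ambiguous
city_prior = {"car": 0.9, "cow": 0.1}    # context: city street
field_prior = {"car": 0.2, "cow": 0.8}   # context: grassland

p_city = posterior(likelihood, city_prior)
p_field = posterior(likelihood, field_prior)
print(max(p_city, key=p_city.get), max(p_field, key=p_field.get))  # car cow
```

Nothing here requires abandoning a feed-forward recognition stage; context simply re-weights its ambiguous outputs.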
With respect to using a “bag of image patches” as features, what about extracting only the dominant edges and treating their relative positions as features (e.g. a prototypical chair has four vertical components intersecting a perpendicular horizontal one, with another vertical set for the back of the chair)? Seems like this higher-level configuration/structural essence/“line drawing” of a chair would be more consistent across poses/scales and more compact in high-dimensional space. Maybe this is equivalent to the aggregation of image patches at an advanced-enough level in the hierarchical cortical representation.
Is anyone pushing for unsupervised learning on training data that is on the same order of magnitude and has the same temporal continuity as what humans see during development? DiCarlo cites “approximately 100 million image translation experiences per year of life”, so the practical constraints are intimidating. Perhaps the crux of the problem is not in our understanding of the learning algorithm but in having such impoverished input data compared to the human brain (about two years of visual experience prior to stable object recognition?). I guess this is hard to quantify specifically since our best metric for pre-verbal infants is “gaze duration.”
The largest computer model of the neocortex yet built was constructed by Eugene Izhikevich and Gerald Edelman. Their three-dimensional model consisted of 100,000 neurons exhibiting some known cortical firing patterns. Each excitatory neuron was randomly connected to 75 local and 25 distant targets. Twenty percent of the neurons were GABAergic and wired locally, mimicking the proportions in the mammalian cortex. Despite such dense anatomical wiring, involving more than 7 million excitatory connections, neurons in the model remained dead silent unless external noise was provided to each neuron. At low levels of input noise, the system sustained oscillatory patterns with spatially uniform activity. High levels of input noise gave rise to asynchronous Poisson patterns of spiking activity that led to organized, sustained patterns. Other computer models do not fare better. In contrast to the brain, most current models of the brain or pieces of the brain do not give rise to true spontaneous patterns without some externally supplied noise. They either are dead silent or generate avalanches of activity involving nearly the whole population. The usual explanation is that the network is not large and complex enough and therefore cannot generate enough noise. However, computer models, including the supersized systems, fail to generate the internal noise necessary for observing some desired patterns.
How large should a system be to generate continuous spontaneous patterns? My answer is that size is not the (only) issue. Even a very small real brain, or a neuronal network with just a few dozen neurons, can solve complex problems that would make man-made computer-controlled robots jealous. All real brains, small and large, possess spontaneous activity because they are complex enough. However, complexity does not simply emerge from increasing the number of constituents. Neuronal systems that consist only of glutamatergic excitatory and GABAergic inhibitory neurons do little more than generate large epileptiform population discharges interrupted by silence. Indeed, this is exactly what an isolated piece of the mammalian cortex does. Fetal cortical tissue transplanted into the anterior chamber of the eye or into a blood-supplying cavity in the cortex generates synchronous discharges of various sizes followed by pauses of various lengths, a behavior not much different from that of sand piles. Isolated cortical slabs and cortical neurons grown as two-dimensional tissue cultures generate similar burst/pause patterns. When isolated from their subcortical inputs, the nearly two million neurons in the rat hippocampus just sit and wait to be part of a giant collective scream. These intermittent patterns are a far cry from the 1/f dynamics of the intact mammalian cortex.
Applying external noise to a network is convenient, but it has some inconvenient consequences. In Izhikevich’s large model, noise intensity had to be increased fivefold to shift the system from avalanches to irregular patterns. At this high level of noise, synchrony occurred only in response to strong external inputs, and the average firing rate of neurons doubled. Most important, 10 percent of all action potentials occurred in response to the externally applied noise rather than to internally generated synaptic activity. This seems like an inefficient system because such a large percentage of spikes is devoted to noise production. In models, this may not be such a big problem. However, energy calculations indicate that the brain cannot afford to waste so many spikes. Doubling the firing rate of neocortical neurons would exhaust their energy resources within minutes. Furthermore, spikes generated by noise will propagate activity and interfere with signal-related computation. So, is noise just a waste or is there something good about it? Find out in Cycle 6.
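The dependence on external noise described in this excerpt is easy to reproduce with a single model neuron. Below is a minimal sketch using the equations of Izhikevich’s simple spiking-neuron model with regular-spiking parameters; the noise level, time step, and duration are illustrative choices, not values from the network model discussed above.

```python
import numpy as np

# Single Izhikevich "simple model" neuron, regular-spiking parameters
# (a=0.02, b=0.2, c=-65, d=8): dead silent without input, fires only
# once an external noise current is supplied.
def simulate(noise_std, steps=1000, dt=1.0, seed=1):
    rng = np.random.default_rng(seed)
    a, b, c, d = 0.02, 0.2, -65.0, 8.0
    v, u = -65.0, b * -65.0           # membrane potential and recovery variable
    spikes = 0
    for _ in range(steps):
        I = noise_std * rng.normal()  # external "noise" current
        v += dt * (0.04 * v * v + 5 * v + 140 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:                 # spike: reset v, bump u
            v, u, spikes = c, u + d, spikes + 1
    return spikes

print(simulate(noise_std=0.0))   # 0: silent without noise
print(simulate(noise_std=15.0))  # positive: noise-driven spiking
```

With zero input the cell relaxes to its stable resting point near −70 mV and never crosses threshold; with noisy drive, occasional fluctuations push it past the unstable point near −50 mV and it spikes.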
The collective behavior of neurons, summed up crudely as the mean field (EEG and MEG) is a blend of rhythms. Neuronal networks in the mammalian cortex generate several distinct oscillatory bands, covering frequencies from < 0.05 hertz to > 500 hertz. These neuronal oscillators are linked to the much slower metabolic oscillators. The mean frequencies of the experimentally observed oscillator categories form a linear progression on a natural logarithmic scale with a constant ratio between neighboring frequencies, leading to the separation of frequency bands. Because the ratios of the mean frequencies of the neighboring cortical oscillators are not integers, adjacent bands cannot linearly phase-lock with each other. Instead, oscillators of different bands couple with shifting phases and give rise to a state of perpetual fluctuation between unstable and transient stable phase synchrony. This metastability is due to the presence of multiple coupled oscillators that perpetually engage and disengage each other. The resulting interference dynamics are a fundamental feature of the global temporal organization of the cerebral cortex. The power density of the EEG or local field potential is inversely proportional to frequency (f) in the mammalian cortex.
This 1/fα power relationship implies that perturbations occurring at slow frequencies can cause a cascade of energy dissipation at higher frequencies, with the consequence that widespread slow oscillations modulate faster local events. The scale freedom, represented by the 1/fα statistics, is a signature of dynamic complexity, and its temporal correlations constrain the brain’s perceptual and cognitive abilities. The 1/fα (pink) neuronal “noise” is a result of oscillatory interactions at several temporal and spatial scales. These properties of neuronal oscillators are the result of the physical architecture of neuronal networks and the limited speed of neuronal communication due to axon conduction and synaptic delays.
Brain evolution opted for a complex wiring pattern in the mammalian cortex. The resulting 1/fα temporal statistics of mean field are the hallmark of the most complex dynamics and imply an inherently labile, self-organized state. Although brain states are highly labile, neuronal avalanches are prevented by oscillatory dynamics. Most oscillations are transient but last long enough to provide stability for holding and comparing information at linear time scales. Scale-free dynamics generate complexity, whereas oscillations allow for temporal predictions. Order in the brain does not emerge from disorder. Instead, transient order emerges from halfway between order and disorder from the territory of complexity. The dynamics in the cerebral cortex constantly alternate between the most complex metastable state and the highly predictable oscillatory state: the dynamic state transitions of the brain are of the complexity-order types. When needed, neuronal networks can shift quickly from a highly complex state to act as predictive coherent units due to the deterministic nature of oscillatory order.
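The 1/f^α statistics running through these excerpts can be experimented with directly. A minimal sketch, with invented parameters: shape white noise in the frequency domain to synthesize “pink” noise, then recover the exponent α from a log-log fit to the power spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 2**14, 1.0                   # signal length; target exponent

# Build a random spectrum with amplitude ~ f^(-alpha/2), so power ~ f^(-alpha).
freqs = np.fft.rfftfreq(n)
white = rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)
white[1:] /= freqs[1:] ** (alpha / 2)
white[0] = 0.0                          # drop the DC component
signal = np.fft.irfft(white)            # time-domain "pink" noise

# Fit log(power) versus log(frequency); the slope estimates -alpha.
power = np.abs(np.fft.rfft(signal)) ** 2
slope, _ = np.polyfit(np.log(freqs[1:]), np.log(power[1:]), 1)
print(round(-slope, 1))  # recovered exponent, approximately 1.0
```

The point of the exercise: in such a signal, power (and hence the capacity to perturb) concentrates at the slow frequencies, which is exactly the sense in which slow oscillations modulate faster local events.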
9 “Is there anything about the brain that is fundamentally incompatible with this framework? One difficulty is that neurons can interact outside the confines of synapses. For example, neurotransmitter molecules might escape from one synapse and diffuse away to be sensed by more distant neurons. This could lead to interactions between neurons not connected by a synapse, or even between neurons that do not actually contact each other.” — see page 269 in .
10 Variants of hashing relevant to our discussion include:
conventional hashing — hash functions map objects into a large table — collisions are undesirable and require additional effort to resolve, e.g., secondary hashes or serial search;
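The conventional-hashing variant above can be illustrated in a few lines. This toy table resolves collisions by chaining: each bucket holds a list that is searched serially, one of the resolution strategies the note mentions.

```python
# Toy hash table: objects map into a small table, and colliding entries
# require extra effort to resolve, here via per-bucket chains.
class ChainedHashTable:
    def __init__(self, size=8):               # small table to force collisions
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                      # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))           # new key (possibly a collision)

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:                   # serial search within the bucket
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
for i in range(20):                           # 20 keys into 8 buckets
    table.put(f"key{i}", i)
print(table.get("key17"))  # 17
```

With 20 keys in 8 buckets, collisions are guaranteed, and every lookup pays the cost of a serial scan of its chain.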
11 In a paper in SciDAC Review, Patterson and his co-authors laid out the challenges and opportunities for many-core in terms of these thirteen Dwarfs — see John Shalf, Jon Bashor, Dave Patterson, Krste Asanovic, Katherine Yelick, Kurt Keutzer and Tim Mattson. The Many-Core Revolution: Will HPC Lead or Follow? SciDAC Review, Fall, 2009.
12 Meister identifies fifty different retinal cell types, and in particular fifteen different types of ganglion cells, each of which provides a different signal to the brain, often to completely different areas. He debunks the myth that the retina is all center-surround cells and outlines his research agenda of working backwards from the visual-processing requirements of the organism to the circuits that facilitate the associated behaviors. In the Heller lecture, he spends some time describing a decoder for retinal activity used by the salamander in catching prey. He begins the lecture by listing three sources of motion, for each of which he claims there are particular retinal circuits designed to assist in processing. Here’s an excerpt from the abstract for his Allen Institute lecture entitled “Neural computations in the retina”:
In order to understand how circuits work, it is critical to recognize and know about different cell types, postulated Dr. Meister. At this point we do not have good tools to understand multiple inputs and the complexity of circuits. Dr. Meister focuses on the retina to identify individual cell types and the computations they perform; he has identified 20 different types of retinal ganglion cells alone. Through measures of the distribution profiles of ganglion cells he has found that these are indeed discrete cell types, not a continuum of similar cells, which the brain recognizes as distinct. Each cell type is spread out over the retina with apparent repulsion properties such that their processes cover all space without overlapping (tiling principle). Thus, the visual input is captured in twenty distinct channels and sent on to more central processing centers. Further, he took the audience from cell types to behavior. He showed that closely related stimuli (e.g., a dark spot from above compared to a dark spot on the ground) can elicit either a very strong response in a cell, accompanied by a fear response in the mouse, or no response at all, but practically nothing in between. This is evidence for the idea that cell types have evolved to meet an animal’s needs, and that discrete cell types respond to specific stimuli and can thus elicit specific behaviors. In the search for cell types, Dr. Meister pointed out that neural systems have hundreds of types of components, which may be more important than the oft-cited statistic that there are 86 billion neurons in the human brain.
13 Today, Monday, April 2, Radiolab host Robert Krulwich, along with science writer Carl Zimmer, will be hosting a duel between MIT’s Sebastian Seung and Anthony Movshon of New York University. The two will “duke it out” — Robert Krulwich’s wording — on the campus of Columbia University, and the topic concerns the plausibility of a “Jennifer Aniston” neuron.