# Introduction to Learning Dynamical Systems

This page is under construction. This is the introductory section for the tutorial on learning dynamical systems. Like all of the sections of the tutorial, this section provides some very basic information and then relies on additional readings and Mathematica notebooks to fill in the details. The rest of this section is organized in terms of some frequently asked questions, cursory answers, and links and references providing more detailed answers.
## Why do we care about dynamical systems?

Dynamical systems are mathematical objects used to model physical phenomena whose state (or instantaneous description) changes over time. These models are used in financial and economic forecasting, environmental modeling, medical diagnosis, industrial equipment diagnosis, and a host of other applications.

For the most part, applications fall into three broad categories. In predictive (also referred to as generative) applications, the objective is to predict future states of the system from observations of its past and present states. In diagnostic applications, the objective is to infer what possible past states of the system might have led to its present state (or to the observations leading up to the present state). In the third category, the objective is neither to predict the future nor to explain the past, but rather to provide a theory for the physical phenomena. These three categories correspond roughly to the need to predict, explain, and understand physical phenomena.

As an example of the latter, a scientist might offer a theory for a particular chemical reaction in terms of a set of differential equations involving temperature, pressure, and amounts of compounds. The scientist's theory might be used to predict the outcome of an experiment or explain the results of a reaction, but from the scientist's point of view the set of equations is the object of primary interest as it provides a particular sort of insight into the physical phenomena.

Predictive and diagnostic reasoning are often described in terms of causes and effects. Prediction is reasoning forward in time from causes to effects. Diagnosis is reasoning backward from effects to causes. In the case of medical diagnosis, the effects correspond to observed symptoms (also called findings) and the causes correspond to diseases.

Not all physical phenomena can be easily predicted or diagnosed. Some phenomena appear to be highly stochastic in the sense that the evolution of the system state appears to be governed by influences similar to those governing the roll of dice or the decay of radioactive material. Other phenomena may be deterministic, but the equations governing their behavior are so complicated or so critically dependent on accurate observations of the state that accurate long-term predictions are practically impossible.

Before defining dynamical systems more precisely, we consider some particular instances of prediction. Consider the problem of predicting the behavior of financial instruments.

``Money is just a type of information, a pattern that, once digitized, becomes subject to persistent programmatic hacking by the mathematically skilled. As the information of money swishes around the planet, it leaves in its wake a history of its flow, and if any of that complex flow can be anticipated, then the hacker who cracks the pattern will become a rich hacker.'' - Cracking Wall Street, Kevin Kelly, Wired, July, 1994.

The above quote appeared in a magazine piece about Doyne Farmer, Norm Packard, and the Prediction Company. The story is nominally about financial forecasting, but the characters are quite interesting in their own right. Back issues of Wired are available on the web if you're interested in browsing. Access requires registration, which is free but takes a couple of minutes to answer some questions.

John Casti also provides an excellent introduction to the problems inherent in predicting weather, the stock market, and other complex phenomena in his book ``Complexification'' (Harper Collins, 1994) and its predecessor ``Searching for Certainty: What Scientists Can Know About the Future'' (Morrow, 1991). In particular, ``Searching for Certainty'' is concerned with problems of ``scientific prediction and explanation of everyday events like the weather, stock market price movements and the outbreak of warfare.'' Casti's books are written for a general audience, but they offer solid, intuitive explanations of topics like deterministic chaos and algorithmic complexity coupled with interesting examples and entertaining anecdotes.

## What is a dynamical system?

The answer to the above question requires that we be precise about the notions of ``system state'' and ``evolution of the system state over time.'' The system state at time t is an instantaneous description of the system which is sufficient to predict the future states of the system without recourse to states prior to t. The evolution of the system state refers to a sequence or continuous trajectory through the space of possible system states. The space of possible system states is called the state space of the dynamical system.
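To make the definitions concrete, here is a minimal sketch (in Python rather than the tutorial's Mathematica) of a discrete-time dynamical system, the logistic map. The state is a single real number, and the system equation determines the entire future trajectory from the current state alone; the parameter value r = 4.0 is a standard choice that produces chaotic behavior.

```python
# The logistic map: state space is the interval [0, 1], and the
# system equation is x_{t+1} = r * x_t * (1 - x_t).

def logistic_step(x, r=4.0):
    """One application of the system equation."""
    return r * x * (1.0 - x)

def trajectory(x0, steps, r=4.0):
    """The evolution of the state: a finite trajectory through state space."""
    states = [x0]
    for _ in range(steps):
        states.append(logistic_step(states[-1], r))
    return states

print(trajectory(0.2, 5))
```

Note that the state at time t is all you need: the function `logistic_step` never consults states prior to t, which is exactly the defining property of system state.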

Our treatment of dynamical systems borrows from the work of Kalman. The suggested reading comes from Chapter 2 of ``Planning and Control'' by Thomas Dean and Michael Wellman, published by Morgan Kaufmann, 1991. Note that when you have finished reading this material, you should be comfortable with such terms as: dynamical system, controller, observer, system equation, system state. If you are already familiar with these terms, then you can probably skip the reading.

We note that Kalman's formulation of a dynamical system makes no assumptions regarding characteristics (e.g., continuous versus discrete, stochastic versus deterministic) of the underlying physical processes. It should be clear that a dynamical system is an abstract mathematical object that allows us to talk about physical processes. Necessarily we impose a perspective and a set of assumptions regarding the physical processes of interest.

## What is a ``model'' of a dynamical system?

Whenever you read the word ``model,'' make sure that you understand the context. In evaluating a learning system, we will assume some model for the dynamical system; this model is unknown to the learning system. We assume that this model accurately describes the behavior of the physical process insofar as we are interested. In fact, this model may be incomplete in important ways. In many practical situations, the best ``model'' may be the physical process itself, i.e., the available data corresponding to observations of the process. Examples of models include systems of differential and difference equations, finite state machines and their stochastic counterparts, and sets of axioms in a temporal logic. Finite sequences of real numbers or sequences of vectors with real or integer components also constitute models, albeit relatively simple ones.
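As a sketch of one of the simpler model classes mentioned above, here is a stochastic finite state machine (a two-state Markov chain). The states and transition probabilities are invented for illustration; the model is nothing more than the transition table itself.

```python
# A stochastic finite state machine: the model is the transition table,
# mapping each state to a probability distribution over next states.

import random

TRANSITIONS = {
    "up":   {"up": 0.6, "down": 0.4},
    "down": {"up": 0.3, "down": 0.7},
}

def sample_next(state, rng=random):
    """Draw the next state from the current state's transition distribution."""
    r = rng.random()
    cumulative = 0.0
    for next_state, p in TRANSITIONS[state].items():
        cumulative += p
        if r < cumulative:
            return next_state
    return next_state  # guard against floating-point shortfall
```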

A learning system attempts to infer a model of the dynamical system. So in learning there is the model of the physical phenomena, or target, and then there is the model inferred by the learning system. In the terminology of computational learning theory, the model of the target is called a ``concept'' and the model inferred by the learning system is called a ``hypothesis.'' In describing learning problems more generally, we will often refer to the space of possible concepts, the ``concept space'', and the space of possible hypotheses, the ``hypothesis space.'' If this terminology is foreign to you, read an introductory text or tutorial on learning theory.

## What are the characteristics of dynamical systems?

This is different from the question ``what are the characteristics of physical processes?'' Our understanding of physical processes is limited by our ability to model them mathematically, and so, as far as we are concerned, the characteristics of dynamical systems are the characteristics of mathematical models, e.g., linear, nonlinear, deterministic, stochastic, discrete, continuous. One of the most important characteristics of dynamical systems concerns the notion of observability.

In some control problems, we are given a model for the dynamical system so we can predict the future of the system if we are able to observe the current state or accurately estimate the current state from a sequence of observations.

In many practical learning problems in which we are trying to infer a model for a dynamical system, we cannot directly observe the system state but only aspects of the state perhaps corrupted by (observation) noise. For example, in financial prediction, the observations might correspond to stock prices and the noise might be due to small errors in recording the time of stock transactions. Presumably, there are other factors besides the instantaneous price of stocks that are important in predicting the behavior of the stock market, e.g., news of a change in the interest rate.
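The partial-observability setting described above can be sketched as follows: the true state evolves by a hidden system equation (the logistic map here), but the learner sees only the state corrupted by Gaussian observation noise. The map parameter and noise level are illustrative assumptions, not part of the original text.

```python
# Partial observability: the learner never sees the hidden state x,
# only a noisy observation of it.

import random

def observe(x, noise_std=0.01, rng=random):
    """An observation: the state plus Gaussian observation noise."""
    return x + rng.gauss(0.0, noise_std)

def generate_observations(x0, steps, r=4.0, noise_std=0.01):
    x, obs = x0, []
    for _ in range(steps):
        x = r * x * (1.0 - x)              # hidden state transition
        obs.append(observe(x, noise_std))  # what the learner actually sees
    return obs
```

Setting `noise_std=0.0` recovers the fully observable case, which makes the role of the noise term easy to check.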

## What does it mean to ``infer a model?''

Inference is search in a space of models (the hypothesis space) guided by some criterion for performance. Performance is measured in terms of the difference between the proffered hypothesis (learned model) and the target concept (actual model or proxy in the form of available data). Techniques depend on the hypothesis space and the performance criterion. At the risk of introducing terms before they are defined, we might search the space of hidden Markov models using gradient search methods looking for the most likely model given the data. Expectation Maximization (EM) and Minimum Description Length (MDL) suggest general performance criteria that we consider in subsequent sections.
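The "search guided by a performance criterion" idea can be sketched with a toy example: recover the logistic-map parameter r from a trajectory by searching a grid of candidate r values, scoring each by squared one-step prediction error (which corresponds to maximizing likelihood under Gaussian observation noise). The true parameter 3.7 and the grid are invented for the illustration.

```python
# Inference as search: hypothesis space = candidate values of r,
# performance criterion = squared one-step prediction error.

def logistic(x, r):
    return r * x * (1.0 - x)

def squared_error(r, data):
    """How badly hypothesis r predicts each next observation."""
    return sum((logistic(x, r) - x_next) ** 2
               for x, x_next in zip(data, data[1:]))

def identify_r(data, candidates):
    """Search: return the candidate minimizing the performance criterion."""
    return min(candidates, key=lambda r: squared_error(r, data))

# Generate data with a "true" r = 3.7, then search for it.
true_r, x = 3.7, 0.2
data = [x]
for _ in range(50):
    x = logistic(x, true_r)
    data.append(x)

candidates = [2.5 + 0.01 * i for i in range(151)]  # grid from 2.50 to 4.00
print(identify_r(data, candidates))
```

Because the one-step error is smooth in r, the grid point nearest the true parameter wins; a gradient method would exploit the same smoothness instead of enumerating candidates.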

Some practitioners decompose the task of inference into two stages. The first stage is a preprocessing stage in which the data is conditioned to remove the effects of ``noise.'' In the second stage, the conditioned data is used to search for a model. In some cases, the sources of noise should be considered as an integral part of the dynamical system and therefore deserving of a more central role in inference. In cases where the sources of noise are well understood, it makes sense to use signal processing techniques to filter the noise and simplify subsequent inference.

There are a variety of inference problems involving dynamical systems. Here we mention a few of the more important problems that we will revisit often in subsequent sections of the tutorial.

System identification refers to the search for a model (typically used for prediction). The model is usually selected from a hypothesis space represented by a parameterized set of functions, e.g., all polynomials of degree 3 or less. Prediction or forecasting requires that, given a sequence of observations, we predict future observations, e.g., the price of Microsoft stock at noon tomorrow. Prediction may involve inferring a model (system identification) but need not. State regulation refers to the problem of inferring a ``control law'' that will generate inputs to a dynamical system and thereby control the behavior of the system according to some desired behavior.
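Here is a sketch of system identification over a parameterized hypothesis space, using the simplest such family: linear maps x_{t+1} = a*x_t + b, fit by closed-form least squares and then used for forecasting. The data-generating parameters (a = 0.5, b = 1.0) are invented for the illustration.

```python
# System identification: search the hypothesis space {x_{t+1} = a*x_t + b}
# via closed-form least squares, then forecast with the fitted model.

def fit_linear(data):
    """Least-squares estimates of a and b for x_{t+1} = a*x_t + b."""
    xs, ys = data[:-1], data[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def forecast(data, steps=1):
    """Prediction using the identified model."""
    a, b = fit_linear(data)
    x = data[-1]
    for _ in range(steps):
        x = a * x + b
    return x

# Data generated by x_{t+1} = 0.5 * x_t + 1 (fixed point at 2).
data = [0.0]
for _ in range(10):
    data.append(0.5 * data[-1] + 1.0)
print(forecast(data))  # forecasts continue to approach the fixed point 2
```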

## What does it mean to infer a ``good'' model?

This is a very difficult question to answer. For the most part, we refrain from offering an answer but exercise some editorial freedom in selecting papers that provide answers in keeping with our views.

We tend toward a Bayesian approach to inference. You will see numerous expositions of Bayesian approaches to learning in the rest of this tutorial, but we will sketch the ideas briefly here. From a Bayesian perspective, the task of learning is to find a model h (hypothesis) that is most likely given the data d, i.e., find h maximizing Pr(h|d). By Bayes rule we have Pr(h|d) = Pr(d|h)Pr(h)/Pr(d). So-called mixture models combine hypotheses using a weighting scheme based on Pr(h|d). In simpler schemes, we choose a single hypothesis, and, since Pr(d) is fixed for all hypotheses, we can ignore Pr(d).
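The Bayesian update above can be sketched on a toy discrete hypothesis space: two candidate coins (the hypotheses) and a sequence of flips (the data). The priors and likelihoods are invented numbers; the point is only that Pr(h|d) is proportional to Pr(d|h)Pr(h), with Pr(d) appearing as the normalizing constant.

```python
# Bayes rule on a discrete hypothesis space:
# Pr(h|d) = Pr(d|h) Pr(h) / Pr(d).

def posterior(priors, likelihoods, data):
    """priors: {h: Pr(h)}; likelihoods: {h: {outcome: Pr}}; data: outcomes."""
    unnormalized = {}
    for h, prior in priors.items():
        p = prior
        for outcome in data:
            p *= likelihoods[h][outcome]   # accumulate Pr(d|h)
        unnormalized[h] = p                # Pr(d|h) Pr(h)
    z = sum(unnormalized.values())         # Pr(d), the normalizer
    return {h: p / z for h, p in unnormalized.items()}

priors = {"fair": 0.5, "biased": 0.5}
likelihoods = {"fair":   {"H": 0.5, "T": 0.5},
               "biased": {"H": 0.9, "T": 0.1}}
print(posterior(priors, likelihoods, "HHHH"))
```

Four heads in a row shift nearly all the posterior mass onto the biased coin, and dividing by z is exactly the "ignore Pr(d)" step made explicit.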

Making explicit our assumptions regarding the (prior) probability of models (Pr(h)) greatly simplifies the comparison of different performance criteria. In some cases, we will find it useful to generalize the Bayesian view to encompass Bayesian decision theory. From a decision-theoretic perspective, the objective is to find a model that minimizes the Bayes risk or, equivalently, maximizes the expected utility. Note that in some cases you may choose to use a model for prediction that is not the most likely if the expected utility of using the most likely model is low or the variance is high in comparison with some other, less likely model. Bayesian decision theory allows us to distinguish good answers from bad outcomes.
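The decision-theoretic point can be sketched with invented numbers: the model with the highest posterior probability need not be the model with the highest expected utility, when acting on the likelier model is disastrous in the less likely world.

```python
# Expected utility can favor a less likely model. All numbers here
# are hypothetical, chosen only to exhibit the phenomenon.

posteriors = {"h1": 0.6, "h2": 0.4}

# utility[h][truth]: payoff of acting on model h when `truth` is the
# actual model; acting on h1 is disastrous if h2 turns out to be true.
utility = {
    "h1": {"h1": 10.0, "h2": -100.0},
    "h2": {"h1": 5.0,  "h2": 8.0},
}

def expected_utility(h):
    return sum(posteriors[truth] * utility[h][truth] for truth in posteriors)

best = max(posteriors, key=expected_utility)
# h1 is the more probable model, yet h2 maximizes expected utility.
```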

We should note that several of the papers covered in this tutorial employ tools that at first blush appear to be quite foreign to the Bayesian perspective. For example, you will encounter tools from statistical mechanics for reasoning about energy and tools from coding and information theory for reasoning about entropy and complexity. There are numerous articles demonstrating that these diverse, sometimes arcane tools from other disciplines are in fact just alternative methods for talking about probabilities. In some cases, the mathematics developed in these other disciplines provides just the right tools for manipulating the probabilities understood as energies, entropies, or whatever. When appropriate we will point you to articles providing the necessary connections and equivalences.

## Is there a more intuitive, perhaps visual method for thinking about dynamical systems?

Several disciplines have worked out graphical languages that capture many of the qualitative aspects of dynamical systems. In the next installment, we explore graphical models in more detail.

## Exercises

See LorenzEquations.m for an example of a continuous-time chaotic dynamical system and LogisticFunction.m for an example of a discrete-time chaotic dynamical system.
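If you don't have the Mathematica notebooks handy, here is a hedged Python sketch of the Lorenz equations with the classic parameter values (sigma = 10, rho = 28, beta = 8/3), integrated with a simple Euler step; a real experiment would use a proper ODE solver.

```python
# The Lorenz equations, a continuous-time chaotic system, advanced by
# a crude Euler step (adequate only for illustration).

def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)

state = (1.0, 1.0, 1.0)
for _ in range(10000):      # integrate 10 time units
    state = lorenz_step(state)
print(state)
```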

Cellular automata are special cases of dynamical systems corresponding to finite state machines. For more on cellular automata see CellularAutomata.m
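As a sketch of a cellular automaton (again in Python rather than the CellularAutomata.m notebook): the state is a row of cells, and the system equation applies an elementary rule, here rule 110, to every cell simultaneously.

```python
# An elementary cellular automaton: each cell's next value is looked up
# from the rule number using its 3-cell neighborhood as a bit pattern.

RULE = 110

def step(cells):
    n = len(cells)
    nxt = []
    for i in range(n):
        # neighborhood of cell i, with wraparound boundaries
        pattern = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        nxt.append((RULE >> pattern) & 1)
    return nxt

row = [0] * 15 + [1] + [0] * 15
for _ in range(5):
    print("".join(".#"[c] for c in row))
    row = step(row)
```

Like the logistic map, the next state depends only on the current state; the difference is that the state space is finite, which is why cellular automata correspond to finite state machines.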

The notebook TimeSeries.m contains examples of time series generated by dynamical systems. This notebook can be obtained online and contains code from ``Applied Mathematica,'' by William T. Shaw and Jason Tigg, published by Addison-Wesley in December 1993. This book shows how Mathematica can be used to solve problems in the applied sciences.