Finite State Dynamical Systems

This page is under construction.
In this section, we consider various formalisms for representing dynamical systems with finite state spaces. See [Moore, 1956] ``Gedanken Experiments on Sequential Machines'' for an introduction to finite automata for modeling discrete time and space dynamical systems.

Deterministic Finite Automata

We begin with a generalized characterization of deterministic finite automata. Later we consider a somewhat restricted version that maps better to results in formal language theory. A deterministic finite automaton (DFA) is a six tuple, M=(Q,B,Y,f,q_0,h), where Q is a finite nonempty set of states, B is a finite nonempty set of inputs, Y is a finite nonempty set of outputs, f is the transition function, f:Q times B -> Q, q_0 is the initial state, and h is the output function, h: Q -> Y.

The problem of inferring the smallest DFA consistent with a set of input/output pairs is NP-complete [Gold, 72] [Gold, 78] [Angluin, 78]. Even finding a DFA polynomially close to the smallest is intractable assuming P!=NP [Pitt & Warmuth, 89]. Kearns and Valiant [1989] show that predicting the outputs of an unknown DFA on inputs chosen from an arbitrary probability distribution is as hard as computing certain apparently hard number-theoretic predicates.

Angluin [Angluin, 1987], building on the work of Gold [1972], provides a polynomial-time algorithm for inferring the smallest DFA given the ability to reset the automaton to the initial state at any time and a source of counterexamples. In Angluin's model, at any point, the learner can hypothesize a DFA and the source of counterexamples will indicate if it is correct and, if it is not, provide a sequence of inputs on which the hypothesized and actual DFAs generate different outputs.

Let A = B^* denote the set of all finite sequences of inputs, and |a| denote the length of the sequence a in A. Let q be the sequence of outputs of length |a|+1 resulting from executing the sequence a starting in q, and qa be the final state following the execution of the sequence a starting in q. An automaton is said to be ``reduced'' if, for all q_1 != q_2 in Q, there exists a in A such that q_1 != q_2. A reduced automaton is used to represent the ``discernable'' structure of a dynamical system; you cannot expect a learner to discern the difference between two states if no sequence of inputs and outputs serves to distinguish them. A ``homing sequence,'' a in A, has the property that, for all q_1,q_2 in Q, q_1 = q_2 implies q_1a = q_2a. Every automaton has a homing sequence; however, the shortest homing sequence may be as long as |Q|^2 [Rivest & Schapire, 1989].

A sequence, a in A, is said to be a ``distinguishing sequence'' if, for all q_1,q_2 in Q, q_1 = q_2 implies q_1 = q_2. Note that every distinguishing sequence is a homing sequence, but not the other way around. A distinguishing sequence is a sequence of inputs such that for any starting state the sequence of outputs associated with the states encountered in executing that sequence uniquely identifies the starting state.

Rivest and Schapire [1989] show how to dispense with the reset in the general case, and how to dispense with both the reset and the source of counterexamples in the case in which a distinguishing sequence is either provided or can be learned in polynomial time [Rivest & Schapire, 1987] [Schapire, 1991].

Yannakakis and Lee [1991] have shown that it is PSPACE-complete to determine whether or not a DFA has a distinguishing sequence. (There exist machines whose shortest distinguishing sequence is exponential in length.) However, they also show that it can be determined in polynomial time whether a DFA has an ``adaptive'' distinguishing sequence, and, if so, find such a sequence of length O(|Q|^2) in polynomial time. An adaptive distinguishing sequence (distinct from a ``preset'' distinguishing sequence as defined above) is not really a sequence at all but rather a decision tree whose branches are correspond to outputs.

Stochastic Finite Automata

In a stochastic finite automaton (SFA), f(q,a) and h(q) are random functions and Pr(f(q,a)|q,a) and Pr(h(q)) define the state transition and output probabilities respectively. The general SFA enables us to model uncertainty in state transitions and outputs.

A hidden Markov model (HMM) [Rabiner, 1989] corresponds to an SFA in which there is only one possible input. Figure 1 depicts the general form for an HMM as a graphical model.

                q     f(q) depends probabilistically on q
                o ----> o
                |       |
                |       |
                v       v 
                o       o
               h(q) depends probabilistically on q 

        Fig. 1: General form for a hidden Markov model 
The HMM training problem is the problem of approximating an arbitrary, unknown source distribution by distributions generated by HMMs.

Abe and Warmuth [1992] show that HMMs are not trainable in time polynomial in the alphabet size unless RP = NP. Further, they show that even if we allow the algorithm to run exponential in the alphabet size, there are no known algorithms that run in time polynomial in the number of states of the target HMM.

A Markov chain [Kemeny & Snell, 1960] can be thought of as an SFA with deterministic output function that serves to uniquely identify states (e.g., for all q in Q, h(q) = q). (You might also imagine the case of SFAs in which outputs are deterministic but ambiguous.)

Markov chains are said to have the the Markov property: the probability of the next state is independent of past states given the immediately prior state.

We can generalize the Markov property and speak of the L-Markov property: the probability of the next state is independent of states more than L states in the past given the last L states. Stochastic processes that satisfy the L-Markov property but not the L-1-Markov property are said to be order-L Markov chains.

In a probabilistic finite automata (PFA), the output in state q is a random function of q, and the state following q is a deterministic function of q and the output in q. Figure 2 depicts the general form for a PFA as a graphical model. Autoregressive (AR) models represent a special case of PFAs.

                q    f(q,h(q)) deterministically depends on q and h(q)
                o ----> o
                |    /  |
                |   /   |
                |  /    |
                | /     |
                v       v 
                o       o
              h(q) probabilistically depends on q

        Fig. 2: General form for a probabilistic finite automaton
Note that the PFA in Figure 2 can be transformed into the general form for a HMM by augmenting the state space so that Q' = Q times Y, the augmented state transitions are probabilistic f': Q' -> Q', and the output function is deterministic h': Q' -> Y. This is a completely observable (nonhidden) Markov process.

An acyclic probabilistic finite automaton (APFA) [Ron et al., 1995] is a PFA in which there are no cycles in the state-transition diagram. Note that it may be useful for short-term prediction to assume acyclicity.

Ron et al. [1994] introduce a restriction on PFAs called probabilistic finite suffix automata (PFSA). PFSA is a PFA in which each state in Q has a unique label corresponding to a sequence (string) of outputs and, moreover, the label for f(q,h(q)) is a suffix of the sequence defined by concatenating the output h(q) to the label for q. The set of strings labeling states is suffix free: a set of strings S is suffix free if for all s in S, s is the only suffix of s in S. An L-PFSA is a PFSA in which each state is labeled by a string of length at most L. Note that an L-PFSA can be modeled by an order-L Markov chain. L-PFSAs can be thought of as Markov chains of variable order. To be clear on what it means to be a PFSA consider the following example.

            t = 1       2       3       4       5       6

                o ----> o ----> o ----> o ----> o ----> o
                |    /  |    /  |    /  |    /  |    /  |
                |   /   |   /   |   /   |   /   |   /   |
                |  /    |  /    |  /    |  /    |  /    |
                | /     | /     | /     | /     | /     |
                v       v       v       v       v       v 
                o       o       o       o       o       o

       h(q_t) = 0       1       1       0       1       0

          q_2 is some (nonempty) suffix of 
                the label of q_1 concatenated with 0
          q_3 is some (nonempty) suffix of 
                the label of q_2 concatenated with 1
          q_4 ...
Let n be an upper bound on |Q|. Ron et al. [1994] show that it is possible to learn a model for an L-PFSA that is epsilon close (using the Kullback-Leibler divergence to measure the distance between the distributions corresponding to the hypothesized model and the target L-PFSA) with probability 1-delta in time polynomial in L, n, |Y|, 1/epsilon, and 1/delta. Ron et al. [1995] show similar results (with better bounds) for learning APFAs.

Equivalent Models

In this section, we consider the relationship between dynamical models considered as recognizers for languages. We begin with the definition for deterministic finite automata given by [Aho, Hopcroft, & Ullman, 1974]. This definition is different from the more general description given earlier.

A DFA is a five tuple (Q,q_0,U,A,f) consisting of a set of states, an initial state, a set of inputs, a set of accepting states, and a state transition function f:Q cross U -> Q. We say that a DFA accepts a string t = u_0,...,u_k in U^* if and only if there exists a sequence of states and inputs q_0,u_0,q_1,...,u_k,q_{k+1} such that f(q_i,u_i) = q_{i+1} for 0 =< i =< k and q_{k+1} is in A.

  Here is the graphical model. 

        q    f(q,u)     
        o ----> o
             /           Here is the state transition diagram.
           /                    q      u     f(q,u)
        u /                     o ------------> o
        o       o
A (deterministic) hidden Markov model (HMM) is a five tuple (Q,q_0,Y,h,f), consisting of a set of states, an initial state, a set of outputs, an output function h:Q -> Y, and a state transition function f:Q -> Q. In the following, we generalize to allow nondeterministic transitions so that f is a relation Q -> 2^Q where 2^Q denotes the power set of Q. We say that an HMM accepts a string t = y_0,...,y_k in Y^* if and only if there exists a sequence of states and outputs q_0,y_0,q_1,...,y_k,q_k,y_{k+1},... such q_{i+1} is in f(q_i) and h(q_i) = y_i for 0 =< i =< k and h(q_i) = null symbol for k+1 =< i =< infinity.
  Here is the graphical model. 

        q      f(q)
        o ----> o        Here is the state transition diagram.
        |       |               
        |       |               q             f(q)
        v       v               o ------------> o
        o       o              h(q)          h(f(q))

How do you model a DFA M = (Q,q_0,U,A,f) using an HMM?

To model accepting states in the HMM, create a set of states Q' = Q union {a} where a is new state.
To model outputs let Y = U union {null} where null is the null symbol.
To convert from inputs of the DFA to outputs of the HMM, let Q'' = Q' cross U, f'((q,u)) = {(q',u') : f(q,u') = q' for some u' in U}, f'((a,u)) = {(a,null)}, and h((q,u)) = u.
The equivalent HMM is M' = (Q'',(q_0,null),Y,h,f'). Prove that M accepts a string t in U^* if and only if M' accepts t.

How do you model an HMM M = (Q,q_0,Y,h,f) using a DFA?

Let f'(q,h(q')) = q' for each q' in f(q).
Let A be the unique largest set of all states in Q such that for h(q) = null and f(q) is contained in A.
The equivalent DFA is M' = (Q,q_0,Y,A,f'). Prove that M accepts a string t in Y^* if and only if M' accepts t.


1. We define a simple HMM to be an HMM in which h(q) = null if and only if f(q) = q. Prove that every HMM interpreted as language recognizer can be modeled by a simple HMM.

2. Prove the pumping lemma for HMMs.

Back to Tutorial