In the following sections we present a survey of intrinsic computation across a wide range of process classes. We think of a class of systems as given by equations of motion, or other specification for a stochastic process, that are parametrized in some way—a pair of control parameters in a one-dimensional map or the energy of a Hamiltonian. The space of parameters, then, is the concrete representation of the space of possible systems, and a class of system is a subset of the set of all possible processes. A point in the parameter space is a particular system whose intrinsic computation we will summarize by a pair of numbers—one a measure of randomness, the other a measure of structure. In several cases, these measures are estimated from sequences generated by the temporal or spatial process.
Ising spin systems
We now investigate the complexity-entropy diagrams of the Ising model in one and two spatial dimensions. Ising models are among the simplest physical models of spatially extended systems. Originally introduced to model magnetic materials, they are now used to model a wide range of cooperative phenomena and order-disorder transitions and, more generally, are viewed as generic models of spatially extended, statistical mechanical systems.
89,90 Like the logistic and tent maps, Ising models are also studied as an intrinsically interesting mathematical topic. As we will see, Ising models provide a useful contrast with the intrinsic computation seen in the interval maps.
Specifically, we consider spin-
1/2 Ising models with nearest (NN) and next-nearest neighbor (NNN) interactions. The Hamiltonian (energy function) for such a system is
where the first (second) sum is understood to run over all NN (NNN) pairs of spins. In one dimension, a spin’s nearest-neighbors will consist of two spins, one to the right and one to the left, whereas in two dimensions a spin will have four nearest neighbors—left, right, up, and down. Each spin
Si is a binary variable,
Si ∊ {−1,+1}. The coupling constant
J1 is a parameter that when positive (negative) makes it energetically favorable for NN spins to (anti-)align. The constant
J2 has the same effect on NNN spins. The parameter
B may be viewed as an external field; its effect is to make it energetically favorable for spins to point up (i.e., have a value of
+1) instead of down. The probability of a configuration is taken to be proportional to its Boltzmann weight; the probability of a spin configuration
C is proportional to
e−βH(C), where
β = 1/T is the inverse temperature.
In equilibrium statistical mechanics, the entropy density is a monotonic increasing function of the temperature. Quite generically, a plot of the entropy
hμ as a function of temperature
T resembles that of the top plot in Fig.
6 shown below. Thus,
hμ may be viewed as a nonlinearly rescaled temperature. One might ask, then, why one might want to plot complexity versus entropy: Is a plot of complexity versus temperature qualitatively the same? Indeed, the two plots would look very similar. However, there are two major benefits of complexity-entropy diagrams for statistical mechanical systems. First, the entropy captures directly the system’s unpredictability, measured in bits per spin. The entropy thus measures the system’s information processing properties. Second, plotting complexity versus entropy and not temperature allows for a direct comparison of the range of information processing properties of statistical mechanical systems with systems for which there is not a well defined temperature, such as the deterministic dynamical systems of the previous section or the cellular automata of the subsequent one.
One-dimensional Ising system
We begin by examining one-dimensional Ising systems. In Refs.
25,74,91 two of the authors developed exact, analytic transfer-matrix methods for calculating
hμ and
E in the thermodynamic
(N→∞) limit. These methods make use of the fact the NNN Ising model is order-2 Markovian. We used these methods to produce Fig.
4, the complexity-entropy diagram for the NNN Ising system with antiferromagnetic coupling constants
J1 and
J2 that tend to anti-align coupled spins. The figure gives a scatter plot of
105 (hμ,E) pairs for system parameters that were sampled randomly from the following ranges:
J1 ∊ [−8,0],
J2 ∊ [−8,0],
T ∊ [0.05,6.05], and
B ∊ [0,3]. For each parameter realization, the excess entropy
E and entropy density
hμ were calculated. Fig.
4 is rather striking—the
(hμ,E) pairs are organized in the shape of a “bat cape.” Why does the plot have this form?
Recall that if a sequence over a binary alphabet is periodic with period p, then E = log2 p and hμ = 0. Thus, the “tips” of the bat cape at hμ = 0 correspond to crystalline (periodic) spin configurations with periods 1, 2, 3, and 4. For example, the (0, 0) point is the period-1 configuration with all spins aligned. These periodic regimes correspond to the system’s different possible ground states. As the entropy density increases, the cape tips widen and eventually join.
Figure
4 demonstrates in graphical form that there is organization in the process space defined by the Hamiltonian of Eq.
(22). Specifically, for antiferromagnetic couplings,
E and
hμ values do not uniformly fill the plane. There are forbidden regions in the complexity-entropy plane. Adding randomness
(hμ) to the periodic ground states does not immediately destroy them. That is, there are low-entropy states that are almost-periodic. The apparent upper linear bound is that of Eq.
(15) for an order-2 Markov chain,
E ⩽ 2−2hμ.
In contrast, in the logistic map’s complexity-entropy diagram (Fig.
2) one does not see anything remotely like the bat cape. This indicates that there are no low-entropy, almost-periodic configurations related to the exactly periodic configurations generated at zero-entropy along the period-doubling route to chaos. Increasing the parameter there does not add randomness to a periodic orbit. Rather, it causes a system bifurcation to a higher-period orbit.
Two-dimensional Ising model
Thus far we have considered only one-dimensional systems, either temporal or spatial. However, the excess entropy can be extended to apply to two-dimensional configurations as well; for details, see Ref.
92. Using methods from there, we calculated the excess entropy and entropy density for the two-dimensional Ising model with nearest- and next-nearest-neighbor interactions. In other words, we calculated the complexity-entropy diagram for the two-dimensional version of the system whose complexity-entropy diagram is shown in Fig.
4. There are several different definitions for the excess entropy in two dimensions, all of which are similar but not identical. In Fig.
4 we used a version that is based on the mutual information and, hence, is denoted
EI.
92
Figure
5 gives a scatter plot of 4500 complexity-entropy pairs. System parameters in Eq.
(22) were sampled randomly from the following ranges:
J1 ∊ [−3,0],
J2 ∊ [−3,0],
T ∊ [0.05,4.05], and
B = 0. For each parameter setting, the excess entropy
EI and entropy density
hμ were estimated numerically; the configurations themselves were generated via a Monte Carlo simulation. For each
(hμ,E) point the simulation was run for 200 000 Monte Carlo updates per site to equilibrate. Configuration data were then taken for 20 000 Monte Carlo updates per site. The lattice size was a square of
48×48 spins. The long equilibration time is necessary because, for some Ising models at low temperature, single-spin flip dynamics of the sort used here have very long transient times.
93,94,95
Note the similarity between Figs.
4,5. For the 2D model, there is also a near-linear upper bound:
E ⩽ 5(1−hμ). In addition, one sees periodic spin configurations, as evidenced by the horizontal bands. An
EI of
1 bit corresponds to a checkerboard of period 2;
EI = 3 corresponds to a checkerboard of period 4; while
EI = 2 corresponds to a “staircase” pattern of period 4. See Ref.
92 for illustrations. The two period-4 configurations are both ground states for the model in the parameter regime in which
∣J2∣<∣J1∣ and
J2<0. At low temperatures, the state into which the system settles is a matter of chance.
Thus, the horizontal streaks in the low-entropy region of Fig.
5 are the different ground states possible for the system. In this regard Fig.
5 is qualitatively similar to Fig.
4—in each there are several possible ground states at
hμ = 0 that persist as the entropy density is increased. However, in the two-dimensional system of Fig.
5 one sees a scatter of other values around the periodic bands. There are even
EI values larger than 3. These
EI values arise when parameters are selected in which the NN and NNN coupling strengths are similar;
J1 ≈ J2. When this is the case, there is no energy cost associated with a horizontal or vertical defect between the two possible ground states. As a result, for low temperatures the system effectively freezes into horizontal or vertical strips consisting of the different ground states. Depending on the number of strips and their relative widths, a number of different
EI values are possible, including values well above 3, indicating a very complex spatial structure.
Despite these differences, the similarities between the complexity-entropy plots for the one- and two-dimensional systems are clearly evident. This is all the more noteworthy since one- and two-dimensional Ising models are regarded as very different sorts of system by those who focus solely on phase transitions. The two-dimensional Ising model has a critical phase transition while the one-dimensional does not. And, more generally, two-dimensional random fields are generally considered very different mathematical entities than one-dimensional sequences. Nevertheless, the two complexity-entropy diagrams show that, away from criticality, the one- and two-dimensional Ising systems’ ranges of intrinsic computation are similar.
Ising model phase transition
As noted above, the two-dimensional Ising model is well known as a canonical model of a system that undergoes a continuous phase transition—a discontinuous change in the system’s properties as a parameter is continuously varied. The 2D NN Ising model with ferromagnetic
(J1>0) bonds and no NNN coupling
(J2 = 0) and zero external field
(B = 0) undergoes a phase transition at
T = Tc ≈ 2.269 when
J1 = 1. At the critical temperature
Tc the free energy is nonanalytic and the magnetic susceptibility and specific heat both diverge. In Fig.
5 we restricted ourselves to antiferromagnetic couplings and thus did not sample in the region of parameter space in which the phase transition occurs.
What happens if we fix
J1 = 1,
J2 = 0, and
B = 0, and vary the temperature? In this case, we see that the complexity, as measured by
E, shows a sharp maximum near the critical temperature
Tc. Figure
6 shows results obtained via a Monte Carlo simulation on a
100×100 lattice. We used a Wolff cluster algorithm and periodic boundary conditions. After
106 Monte Carlo steps (one step is one proposed cluster flip), 25 000 configurations were sampled, with 200 Monte Carlo steps between measurements. This process was repeated for over 200 temperatures between
T = 0 and
T = 6. More temperatures were sampled near the critical region.
In Fig.
6 we first plot entropy density
hμ and excess entropy
E versus temperature. As expected, the excess entropy reaches a maximum at the critical temperature
Tc. At
Tc the correlations in the system decay algebraically, whereas they decay exponentially for all other
Tc values. Hence,
E, which may be viewed as a global measure of correlation, is maximized at
Tc. For the system of Fig.
6,
Tc appears to have an approximate value of 2.42. This is above the exact value for an infinite system, which is
Tc ≈ 2.27. Our estimated value is higher, as one expects for a finite lattice. At the critical temperature,
hμ ≈ 0.57, and
E ≈ 0.413.
Also in Fig.
6 we show the complexity-entropy diagram for the 2D Ising model. This complexity-entropy diagram is a single curve, instead of the scatter plots seen in the previous complexity-entropy diagrams. The reason is that we varied a single parameter, the temperature, and entropy is a single-valued function of the temperature, as can clearly be seen in the first plot in Fig.
6. Hence, there is only one value of
hμ for each temperature, leading to a single curve for the complexity-entropy diagram.
Note that the peak in the complexity-entropy diagram for the 2D Ising model is rather rounded, whereas E plotted versus temperature shows a much sharper peak. The reason for this rounding is that the entropy density hμ changes very rapidly near Tc. The effect is to smooth the E curve when plotted against hμ.
A similar complexity-entropy was produced by Arnold.
75 He also estimated the excess entropy, but did so by considering only one-dimensional sequences of measurements obtained at a single site, while a Monte Carlo simulation generated a sequence of two-dimensional configurations. Thus, those results do not account for the two-dimensional structure but, rather, reflect properties of the dynamics of the particular Monte Carlo updating algorithm used. Nevertheless, the results of Ref.
75 are qualitatively similar to ours.
Erb and Ay
96 have calculated the multi-information for the two-dimensional Ising model as a function of temperature. The multi-information is the difference between the entropy rate and the entropy of a single site:
H(1)−hμ. That is, the multi-information is only the leading term in the sum which defines the excess entropy, Eq.
(11). (Recall that
hμ(1) = H(1).) They find that the multi-information is a continuous function of the temperature and that it reaches a sharp peak at the critical temperature (see Ref.
96, Fig. 4).
Cellular automata
The next process class we consider is cellular automata (CAs) in one and two spatial dimensions. Like spin systems, CAs are common prototypes used to model spatially extended dynamical systems. For reviews see, e.g., Refs.
97,98,99. Unlike the Ising models of the previous section, the CAs that we study here are deterministic. There is no noise or temperature in the system.
The states of the CAs we shall consider consist of one- or two-dimensional configurations
s = …s−1,s0,s1,… of discrete
K-ary local states
si ∊ {0,1,…,K−1}. The configurations change in time according to a global update function
Φ,
starting from an initial configuration
s0. What makes CAs cellular is that configurations evolve according to a local update rule. The value
st+1i of site
i at the next time step is a function
ϕ of the site’s previous value and the values of neighboring sites within some radius
r,
All sites are updated synchronously. The CA update rule
ϕ consists of specifying the output value
st+1 for all possible neighborhood configurations
ηt = sti−r…,sti…,sti+r. Thus, for 1D radius-
r CAs, there are
K2r+1 possible neighborhood configurations and
2K2r+1 possible CA rules. The
r = 1,
K = 2 1D CAs are called elementary cellular automata.
97
In all CA simulations reported we began with an arbitrary random initial configuration s0 and iterated the CA several thousand times to let transient behavior die away. Configuration statistics were then accumulated for an additional period of thousands of time steps, as appropriate. Periodic boundary conditions on the underlying lattice were used.
In Fig.
7 we show the complexity-entropy diagram for 1D,
r = 2,
K = 2 (binary) cellular automata. There are
225 ≈ 4.3×109 such CAs. We cannot examine all 4.3 billion CAs; instead we sample the space these CAs uniformly. For the data of Fig.
7, the lattice has
5×104 sites and a transient time of
5×104 iterations was used. We plot
hμ versus
E for spatial sequences. Plots for the temporal sequences are qualitatively similar. There are several things to observe in these diagrams.
One feature to notice in Fig.
7 is that no sharp peak in the excess entropy appears at some intermediate
hμ value. In contrast, the maximum possible excess entropy falls off moderately rapidly with increasing
hμ. A linear upper bound,
E ⩽ 4(1−hμ), is almost completely respected. Note that, as is the case with the other complexity-entropy diagrams presented here, for all
hμ values except
hμ = 1, there is a range of possible excess entropies.
In the early 1990s there was considerable exploration of the organization of CA rule space. In particular, a series of papers
62,100,101,102 looked at two-dimensional eight-state
(K = 8) cellular automata, with a neighborhood size of 5 sites—the site itself and its nearest neighbor to the north, east, west, and south. These references reported evidence for the existence of a phase transition in the complexity-entropy diagram at a critical entropy level. In contrast, however, here and in the previous sections we find no evidence for such a transition. The reasons that Refs.
62,100,101,102 report a transition, are twofold. First, they used very restricted measures of randomness and complexity: entropy of single isolated sites and mutual information of neighboring pairs of single sites, respectively. These choices have the effect of projecting organization onto their complexity-entropy diagrams. The organization seen is largely a reflection of constraints on the chosen measures, not of intrinsic properties of the CAs. Second, they do not sample the space of CAs uniformly; rather, they parametrize the space of CAs and sample only by sweeping their single parameter. This results in a sample of CA space that is very different from uniform and that is biased toward higher complexity CAs. For a further discussion of complexity-entropy diagrams for cellular automata, including a discussion of Refs.
62,100,101,102, see Ref.
103.
Markov chain processes
In this and the next section, we consider two classes of processes that provide a basis of comparison for the preceding nonlinear dynamics and statistical mechanical systems: those generated by Markov chains and topological ϵ-machines. These classes are complementary to each other in the following sense. Topological ϵ-machines represent structure in terms of which sequences (or configurations) are allowed or not. When we explore the space of topological ϵ-machines, the associated processes differ in which sets of sequences occur and which are forbidden. In contrast, when exploring Markov chains, we fix a set of allowed words—in the present case the full set of binary sequences—and then vary the probability with which subwords occur. These two classes thus represent two different possible origins of organization in intrinsic computation—types that were mixed in the preceding example systems.
In Fig.
8 we plot
E versus
hμ for order-2 (4-state) Markov chains over a binary alphabet. Each element in the stochastic transition matrix
T is chosen uniformly from the unit interval. The elements of the matrix are then normalized row by row so that
∑jTij = 1. We generated
105 such matrices and formed the complexity-entropy diagram shown in Fig.
8. Since these processes are order-2 Markov chains, the bound of Eq.
(15) applies. This bound is the sharp, linear upper limit evident in Fig.
8:
E = 2−2hμ.
It is illustrative to compare the 4-state Markov chains considered here with the 1D NNN Ising models of Sec.
3B1. The order-2 (or 4-state Markov) chains with a binary alphabet are those systems for which the value of a site depends on the previous two sites, but no others. In terms of spin systems, then, this is a spin-
1/2 (i.e., binary) system with nearest- and next-nearest neighbor interactions. The transition matrix for the Markov chain is
4×4 and thus has 16 elements. However, since each row of the transition matrix must be normalized, there are 12 independent parameters for this model class. In contrast, there are only 3 independent parameters for the 1D NNN Ising chain—the parameters
J1,
J2,
B, and the temperature
T. One of the parameters may be viewed as setting an energy scale, so only three are independent.
Thus, the 1D NNN systems are a proper subset of the 4-state Markov chains. Note that their complexity-entropy diagrams are very different, as a quick glance at Figs.
4,8 confirms. The reason for this is that the Ising model, due to its parametrization (via the Hamiltonian of Eq.
(22)), samples the space of processes in a very different way than the Markov chains. This underscores the crucial role played by the choice of model and, so too, the choice in parametrizing a model space. Different parametrizations of the same model class, when sampled uniformly over those parameters, yield complexity-entropy diagrams with different structural properties.
The space of processes: Topological ϵ-machines
The preceding model classes are familiar from dynamical systems theory, statistical mechanics, and stochastic process theory. Each has served an historical purpose in their respective fields—purposes that reflect mathematically, physically, or statistically useful parametrizations of the space of processes. In the preceding sections we explored these classes, asked what sort of processes they could generate, and then calculated complexity-entropy pairs for each process to reveal the range of possible information processing within each class.
Is there a way, though, to directly explore the space of processes, without assuming a particular model class or parametrization? Can each process be taken at face value and tell us how it is structured? More to the point, can we avoid making structural assumptions, as done in the preceding sections?
Affirmative answers to these questions are found in the approach laid out by computational mechanics.
14,19,28 Computational mechanics demonstrates that each process has an optimal, minimal, and unique representation—the
ϵ-machine—that captures the process’s structure. Due to optimality, minimality, and uniqueness, the
ϵ-machine may be viewed as the representation of its associated process. In this sense, this representation is parameter free. To determine an
ϵ-machine for a process one calculates a set of causal states and their transitions. In other words, one does not specify
a priori the number of states or the transition structure between them. Determining the
ϵ-machine makes no such structural assumptions.
19,28
Using the one-to-one relationship between processes and their ϵ-machines, here we invert the preceding logic of going from a process to its ϵ-machine. We explore the space of processes by systematically enumerating ϵ-machines and then calculating their excess entropies E and their entropy rates hμ. This gives a direct view of how intrinsic computation is organized in the space of processes.
As a complement to the Markov chain exploration of how intrinsic computation depends on transition probability variation, here we examine how an
ϵ-machine’s structure (states and their connectivity) affects information processing. We do this by restricting attention to the class of topological
ϵ-machines whose branching transition probabilities are fair (equally probable). An example is shown in Fig.
10 below.
If we regard two
ϵ-machines isomorphic up to variation in transition probabilities as members of a single equivalence class, then each such class of
ϵ-machines contains precisely one topological
ϵ-machine. Symbolic dynamics
104 refers to a related class of representations as topological Markov chains. An essential, and important, difference is that
ϵ-machines always have the smallest number of states.
It turns out that the topological
ϵ-machines with a finite number of states can be systematically enumerated.
105 Here we consider only
ϵ-machines for binary processes:
A = {0,1}. Two
ϵ-machines are isomorphic and generate essentially the same stochastic process, if they are related by a relabeling of states or if their output symbols are exchanged: 0 is mapped to 1 and vice versa. The number of isomorphically distinct topological
ϵ-machines of
n = 1,…,5 states is listed in Table
1.
In Fig.
9 we plot their
(hμ,E) pairs. There one sees that the complexity-entropy diagram exhibits quite a bit of organization, with variations from very low to very high density of
ϵ-machines coexisting with several distinct vertical (iso-entropy) families. To better understand the structure in the complexity-entropy diagram, though, it is helpful to consider bounds on the complexities and entropies of Fig.
9. The minimum complexity,
E = 0, corresponds to machines with only a single state. There are two possibilities for such binary
ϵ-machines. Either they generate all 1s (or 0s) or all sequences occurring with equal probability (at each length). If the latter, then
hμ = 1; if the former,
hμ = 0. These two points, (0, 0) and (1, 0), are denoted with solid circles along the horizontal axis of Fig.
9.
The maximum E in the complexity-entropy diagram is log2 5 ≈ 2.3219. One such ϵ-machine corresponds to the zero-entropy, period-5 processes. And there are four similar processes with periods p = 1,2,3,4 at the points (0,log2 p). These are denoted on the figure by the tokens along the left vertical axis.
There are other period-5 cyclic, partially random processes with maximal complexity, though: those with causal states in a cyclic chain. These have
b = 1,2,3,4 branching transitions between successive states in the chain and positive entropy. These appear as a horizontal line of enlarged square tokens along in the upper portion of the complexity-entropy diagram. Denote the family of
p-cyclic processes with
b branchings as
Fp,b. An
ϵ-machine illustrating
F5,3 is shown in Fig.
10. The excess entropy for this process is
log2 5 ≈ 2.32, and the entropy rate is
3/5.
Since
ϵ-machines for cyclic processes consist of states in a single loop, their excess entropies provide an upper bound among
ϵ-machines that generate
p-cyclic processes with
b branching states, namely,
Clearly,
E(Fp,b)→∞ as
p→∞. Their entropy rates are given by a similarly simple expression,
Note that
hμ(Fp,b)→0 as
p→∞ with fixed
b and
hμ(Fp,b)→1 as
b→p. Together, then, the family
F5,b gives an upper bound to the complexity-entropy diagram.
The processes
Fp,b are representatives of the highest points of the prominent jutting vertical towers of
ϵ-machines so prevalent in Fig.
9. It therefore seems reasonable to expect the
(hμ,E) coordinates for
p-cyclic process languages to possess at least
p−1 vertical towers, distributed evenly at
hμ = b/p,
b = 1,…,p−1, and for these towers to correspond with towers of
m-cyclic process languages whenever
m is a multiple of
p.
These upper bounds are one key difference from earlier classes in which there was a decreasing linear upper bound on complexity as a function of entropy rate, E ⩽ R(1−hμ). That is, in the space of processes, many are not so constrained. The subspace of topological ϵ-machines illustrates that there are many highly entropic, highly structured processes. Some of the more familiar model classes appear to inherit, in their implied parametrization of process space, a bias away from such processes.
It is easy to see that the families Fp,p−1 and Fp,1 provide upper and lower bounds for hμ, respectively, among the process languages that achieve maximal E and for which hμ>0. Indeed, the smallest positive hμ possible is achieved when only a single of the equally probable states has more than one outgoing transition.
More can be said about this picture of the space of intrinsic computation spanned by topological
ϵ-machines.
105 Here, however, our aim is to illustrate how rich the diversity of intrinsic computation can be and to do so independent of conventional model-class parametrizations. These results allow us to probe in a systematic way a subset of processes in which structure dominates.