
# Inhomogeneous Sparseness Leads to Dynamic Instability During Sequence Memory Recall in a Recurrent Neural Network Model

*The Journal of Mathematical Neuroscience* **volume 3**, Article number: 8 (2013)

## Abstract

Theoretical models of associative memory generally assume most of their parameters to be homogeneous across the network. Conversely, biological neural networks exhibit high variability of structural as well as activity parameters. In this paper, we extend the classical clipped learning rule by Willshaw to networks with inhomogeneous sparseness, i.e., the number of active neurons may vary across memory items. We evaluate this learning rule for sequence memory networks with instantaneous feedback inhibition and show that, unsurprisingly, memory capacity degrades with increased variability in sparseness. The loss of capacity, however, is very small for short sequences of less than about 10 associations. Most interestingly, we further show that, due to feedback inhibition, patterns that are too large are much less detrimental for memory capacity than patterns that are too small.

## 1 Introduction

Many brain areas exhibit extensive recurrent connectivity. Over decades, such neuronal feedback has attracted a great deal of theoretical modeling [1–3], and one of the most prominent functions proposed for the recurrent synaptic connections is associative memory. In most such theories, all memory items are treated as equal, particularly in terms of the sparseness with which they are neurally represented, i.e., in terms of how many neurons are active during recall. In this paper, we extend a particular class of such auto-association networks, viz., sequence memory networks, to include variable sparseness, thereby adding one aspect of variability that is to be expected in biological neural networks.

Memory sequences have been shown to occur in the rodent brain during hippocampal sharp-wave ripple events [4, 5]. The major hypothesis of the present paper is that these sequences are stored in the recurrent connections of the hippocampal network, which is supported by findings of fast coordinated excitatory synaptic currents during sharp-wave ripple events in slices [6].

Here, we build on a previous model of memory sequences [7], in which instantaneous feedback inhibition enhances memory capacity. Both our mean field analysis and our simulations show that, in this model, inhomogeneity in pattern sizes reduces memory capacity, but it does so asymmetrically: whereas patterns that are too small significantly compromise the stability of sequence recall, patterns that are too large can be compensated for quite robustly.

## 2 Model

We use the standard formalism of auto-associative networks: a discrete-time dynamical system. The individual time steps may be interpreted as the cycles of the hippocampal ripple oscillations. The states ${x}_{i}(t)$ of all neurons $1\le i\le N$ at time step *t* determine the states at time $t+1$ through a thresholded function,

$$x_i(t+1) = \Theta\Big(\sum_{j=1}^{N} J_{ij}\, x_j(t) - \theta\Big). \tag{1}$$

Here, as in many other approaches, we define *Θ* as the Heaviside function, which is equivalent to restricting the neuron states to binary variables, with ${x}_{i}=1$ if neuron *i* fires and ${x}_{i}=0$ if it is silent. The state of the network at time *t* is thus denoted by $\mathbf{x}(t)\in {\{0,1\}}^{N}$. The other parameters are the firing threshold *θ* and the synaptic weights ${J}_{ij}$.
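The update rule (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `heaviside` and `update` are our own, and we read "exceeds the threshold" as the strict inequality $h_i > \theta$ (for integer inputs this choice only shifts *θ* by one).

```python
from typing import List

def heaviside(z: float) -> int:
    """Step function Theta; 'exceeds the threshold' is read as z > 0."""
    return 1 if z > 0 else 0

def update(J: List[List[int]], x: List[int], theta: float) -> List[int]:
    """Synchronous update x_i(t+1) = Theta(sum_j J_ij x_j(t) - theta)."""
    n = len(x)
    return [heaviside(sum(J[i][j] * x[j] for j in range(n)) - theta)
            for i in range(n)]

# Toy 3-neuron chain 0 -> 1 -> 2; J[i][j] is the synapse from neuron j onto neuron i.
J = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
x0 = [1, 0, 0]
x1 = update(J, x0, theta=0.5)   # [0, 1, 0]
x2 = update(J, x1, theta=0.5)   # [0, 0, 1]
```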

In the framework of the dynamical system of (1), associative memory is considered to be the (approximate) recall of a network state $\mathbf{x}(t+n)={\xi}_{t+n}$ at time $t+n$ after the network has been initialized with some appropriate cue $\mathbf{x}(t)={\xi}_{t}$ at time *t*. Recall can occur either as convergence to a dynamical attractor ($n\to \infty$) [2] or as a one-step association ($n=1$) [8, 9]. The specific choice of the synaptic matrix ${J}_{ij}$ determines which patterns can be recalled, or *are stored* in the network.

In this paper, we will deal with sequences of one-step associations with binary synapses [8, 10, 11] and instantaneous feedback inhibition [7, 9, 12]. Here, memory sequences are described as sequences of random activity patterns *ξ* that are binary vectors of dimension *N*, $\xi \in {\{0,1\}}^{N}$. A memory sequence of length *Q* is an ordered occurrence of activity patterns ${\xi}_{1}\to {\xi}_{2}\to \cdots \to {\xi}_{Q}$. The number ${M}_{t}$ of active neurons in each pattern ${\xi}_{t}$ is called pattern size, and will in general be different for each pattern.

As proposed in [7, 9, 11], we model the weights ${J}_{ij}$ of the binary synapses as products of two independent binary stochastic processes, ${J}_{ij}={w}_{ij}{s}_{ij}$. The first stochastic variable ${w}_{ij}$ indicates the presence (${w}_{ij}=1$) or absence (${w}_{ij}=0$) of a morphological connection, with probability $Pr({w}_{ij}=1)={c}_{m}$, called *morphological connectivity*. The second process ${s}_{ij}$ is called synaptic state and will be used to store memories. In the potentiated state (${s}_{ij}=1$), a presynaptic spike increments the postsynaptic potential by 1, whereas in the silent state (${s}_{ij}=0$), the postsynaptic potential remains unaffected. According to (1), neuron *i* fires a spike at cycle $t+1$ if its postsynaptic potential ${h}_{i}(t)={\sum}_{j=1}^{N}{w}_{ij}{s}_{ij}{x}_{j}(t)$ at time *t* exceeds the threshold *θ*.

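Willshaw’s [10] clipped Hebbian rule is used to set the synaptic states ${s}_{ij}$ such that the network is able to recall the memory sequences: a synapse is in the potentiated state if and only if it connects two neurons that are activated in sequence at least once, i.e., the presynaptic neuron is active in some pattern ${\xi}_{k}$ and the postsynaptic neuron is active in the following pattern ${\xi}_{k+1}$.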
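A minimal sketch of storing one sequence with the clipped rule and performing one recall step, assuming the product form $J_{ij}=w_{ij}s_{ij}$ described above. The network size, pattern size, sequence length and the choice $c_m=1$ are illustrative assumptions (with $c_m=1$, every neuron of the next pattern receives exactly *M* inputs from the cue); all helper names are hypothetical.

```python
import random

def random_pattern(N, M, rng):
    """Binary pattern of length N with exactly M active neurons."""
    active = set(rng.sample(range(N), M))
    return [1 if i in active else 0 for i in range(N)]

def clipped_hebbian(seq, N):
    """Willshaw clipped rule for xi_0 -> xi_1 -> ...: s[i][j] = 1 iff neuron j
    is active in some xi_k and neuron i is active in the next pattern xi_{k+1}."""
    s = [[0] * N for _ in range(N)]
    for pre, post in zip(seq[:-1], seq[1:]):
        pre_idx = [j for j in range(N) if pre[j]]
        for i in range(N):
            if post[i]:
                for j in pre_idx:
                    s[i][j] = 1
    return s

def weights(s, c_m, rng):
    """J_ij = w_ij * s_ij with morphological connections w_ij ~ Bernoulli(c_m)."""
    N = len(s)
    return [[s[i][j] * (1 if rng.random() < c_m else 0) for j in range(N)]
            for i in range(N)]

def recall_step(J, x, theta):
    """x_i(t+1) = 1 if h_i = sum_j J_ij x_j(t) exceeds theta, else 0."""
    return [1 if sum(Jij * xj for Jij, xj in zip(row, x)) > theta else 0
            for row in J]

rng = random.Random(0)
N, M, Q = 200, 10, 6                 # illustrative sizes, not the paper's values
seq = [random_pattern(N, M, rng) for _ in range(Q)]
J = weights(clipped_hebbian(seq, N), c_m=1.0, rng=rng)

# Cue with xi_2 and compare the network response with the stored successor xi_3.
x_next = recall_step(J, seq[2], theta=M - 0.5)
hits = sum(a and b for a, b in zip(x_next, seq[3]))
false_alarms = sum(a and not b for a, b in zip(x_next, seq[3]))
```

With $c_m<1$, the On neurons receive only about $c_m M$ inputs, so the threshold has to be lowered accordingly.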

In the case of homogeneous sparseness, where all patterns have the same number *M* of active neurons, Willshaw’s rule relates the fraction $c/{c}_{m}$ of potentiated synapses in the network to the number *P* of stored associations by

$$\frac{c}{c_m} = 1 - \left(1 - f^2\right)^P, \tag{2}$$

where $f=M/N$ is the coding ratio. The effective connectivity *c* defines the noise level during recall, i.e., how many spurious inputs a neuron receives that are not part of the memory pattern to be recalled. If *c* is too large, the network exhibits many spurious activations and the memory can no longer be recalled. Equation (2) thus provides a capacity estimate of the network: it specifies how many associations can be stored before the noise level reaches its maximum tolerable value *c*.
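Because the homogeneous relation $c/c_m = 1-(1-f^2)^P$ is explicit in *P*, it can be inverted to give the number of associations at which a target connectivity is reached, $P = \ln(1-c/c_m)/\ln(1-f^2)$. The sketch below does this; $c=0.05$ and $f=0.01$ are values used in this paper, whereas $c_m=0.1$ is a hypothetical choice for illustration only.

```python
import math

def potentiated_fraction(f: float, P: float) -> float:
    """Homogeneous Willshaw relation: c/c_m = 1 - (1 - f**2)**P."""
    return 1.0 - (1.0 - f * f) ** P

def capacity(f: float, c: float, c_m: float) -> float:
    """Invert the relation for P; log1p keeps precision for small f**2."""
    return math.log1p(-c / c_m) / math.log1p(-f * f)

# f and c as in the text; c_m = 0.1 is a hypothetical value (so c/c_m = 0.5).
P_max = capacity(f=0.01, c=0.05, c_m=0.1)
print(round(P_max))  # 6931
```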

In the case of inhomogeneous sparseness, this formula is no longer valid, so we sought its generalization. To this end, we introduce the coding ratio vector

$$\boldsymbol{\phi} = (f_0, f_1, \dots, f_P), \tag{3}$$

where ${M}_{k}={f}_{k}N$ is the number of active neurons in the stored pattern ${\xi}_{k}$. The elements of $\boldsymbol{\phi}$ are considered to be random variables distributed according to the coding ratio distribution ${p}_{\varphi}(\varphi )$, with mean coding ratio ${\varphi}_{0}$ and standard deviation ${\sigma}_{\varphi}$. For now, we use a Gamma distribution (Fig. 1) for ${p}_{\varphi}(\varphi )$ with mean ${\varphi}_{0}=0.01$. By varying the standard deviation ${\sigma}_{\varphi}$, we control how much the elements of the vector $\boldsymbol{\phi}$ deviate from their mean ${\varphi}_{0}$.

In analogy to (2), the probability of synaptic potentiation $\varsigma =c/{c}_{m}$ can be computed analytically for any given coding ratio vector $\boldsymbol{\phi}$ (see the Appendix) as follows:

$$\varsigma = 1 - \prod_{k=1}^{P}\left(1 - f_{k-1} f_k\right). \tag{4}$$

This expression, however, depends only implicitly on the number *P* of patterns and, moreover, depends on the specific choice of $\boldsymbol{\phi}$. We therefore asked how much *ς* varies over the statistical ensemble of $\boldsymbol{\phi}$. In addition to addressing this question numerically, we derived analytical expressions for the mean $\langle \varsigma \rangle$ and variance ${\sigma}_{\varsigma}^{2}$ of *ς* over all possible realizations of $\boldsymbol{\phi}$; these depend on the first- to fourth-order moments of the coding ratio distribution ${p}_{\varphi}(\varphi )$ (see the Appendix). For sufficiently small variances ${\sigma}_{\varphi}^{2}$, the Gaussian approximations resulting from $\langle \varsigma \rangle$ and ${\sigma}_{\varsigma}^{2}$ fit the empirical distributions of *ς* very well; see Fig. 2 for $10^{6}$ random samples of $\boldsymbol{\phi}$.

The results show that the variability in *ς* is relatively large (about 10 % for ${\sigma}_{\varphi}=0.1{\varphi}_{0}$) and even increases with the number of associations *P*. We therefore decided not to use the expectation value $\langle \varsigma \rangle$ in the following, but to show empirical distributions over many realizations of $\boldsymbol{\phi}$ whenever possible.
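The numerical side of this question can be sketched as a Monte Carlo estimate: draw the $P+1$ coding ratios independently from a Gamma distribution with mean ${\varphi}_0$ and standard deviation ${\sigma}_\varphi$, evaluate $\varsigma = 1-\prod_{k=1}^{P}(1-f_{k-1}f_k)$ for each draw, and collect the empirical distribution. This product form is our reading of the generalized potentiation probability (it reduces to the homogeneous Willshaw relation for equal $f_k$); *P*, the number of samples and the helper names are illustrative assumptions.

```python
import random
import statistics

def varsigma(phi):
    """Potentiation probability 1 - prod_{k=1}^{P} (1 - f_{k-1} f_k)
    for a coding-ratio vector phi = (f_0, ..., f_P)."""
    remain = 1.0
    for f_prev, f in zip(phi[:-1], phi[1:]):
        remain *= 1.0 - f_prev * f
    return 1.0 - remain

def sample_phi(P, phi0, sigma, rng):
    """P+1 i.i.d. Gamma coding ratios with mean phi0 and standard deviation sigma."""
    shape, scale = (phi0 / sigma) ** 2, sigma ** 2 / phi0
    return [rng.gammavariate(shape, scale) for _ in range(P + 1)]

rng = random.Random(1)
P, phi0 = 2000, 0.01                      # illustrative values
homogeneous = varsigma([phi0] * (P + 1))  # equals 1 - (1 - phi0**2)**P
samples = [varsigma(sample_phi(P, phi0, 0.1 * phi0, rng)) for _ in range(100)]
print(statistics.mean(samples), statistics.stdev(samples))
```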

In order to evaluate the dynamics of sequence retrieval, we mostly do not simulate the full neural network but use a mean field approximation [7, 11] based on two macroscopic dynamic variables: the number ${m}_{t}\in [0,{M}_{t}]$ of correctly activated neurons (*hits*) and the number ${n}_{t}\in [0,N-{M}_{t}]$ of incorrectly activated neurons (*false alarms*). For large network sizes *N* and large pattern sizes ${M}_{t}$, the central limit theorem predicts the distribution of the total number of synaptic inputs $h(t)$ to be Gaussian, and the variables ${m}_{t}$ and ${n}_{t}$ can be reinterpreted as expectation values over many realizations of the connectivity matrix. Denoting the mean number of synaptic inputs as $\mu \equiv \langle h(t)\rangle$ and the variance as ${\sigma}^{2}\equiv \langle h{(t)}^{2}\rangle-{\langle h(t)\rangle}^{2}$, we obtain (see the Appendix) for the **On** population (should fire),

and for the **Off** population (should not fire),

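Willshaw’s learning rule introduces correlations between synaptic states, which are captured by the terms proportional to ${V}_{\varsigma}^{2}$, the squared coefficient of variation of the potentiation probability across activation schedules (see the Appendix).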

The discrete-time network dynamics in (1) maps to the mean field model such that

with

where $\Phi (z)\equiv [1+erf(z/\sqrt{2})]/2$ denotes the cumulative distribution function (cdf) of the normal distribution.

In the framework of this model and following [7, 12], inhibition is introduced as instantaneous negative feedback proportional to the total number ${m}_{t}+{n}_{t}$ of active neurons at time *t*. Formally, this is achieved by substituting $\theta \to \theta +{h}_{t}$ in (10) and (11), where ${h}_{t}=b({m}_{t}+{n}_{t})$. The inhibitory weight is taken as $b={c}_{m}\varsigma $ throughout the paper (for discussion see [7]).
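A simplified version of this mean field map can be sketched as follows. It is an illustration under stated assumptions, not the paper's equations: synapses are treated as independent, so the correlation terms proportional to $V_\varsigma^2$ are omitted, and the On/Off input statistics are taken as the binomial expressions $\mu_{\mathrm{On}} = c_m m_t + c\,n_t$ and $\mu_{\mathrm{Off}} = c\,(m_t+n_t)$ with the matching binomial variances. The network size, pattern size and $c_m$ in the usage lines are hypothetical; $b=c$ follows the choice $b=c_m\varsigma$ stated above.

```python
import math

def Phi(z):
    """Standard normal cdf, Phi(z) = [1 + erf(z / sqrt(2))] / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_above(mu, var, theta):
    """P(h > theta) for Gaussian h ~ N(mu, var); deterministic if var == 0."""
    if var <= 0.0:
        return 1.0 if mu > theta else 0.0
    return Phi((mu - theta) / math.sqrt(var))

def mean_field_step(m, n, M_next, N, c_m, c, theta, b):
    """One step of a SIMPLIFIED mean-field map (independent synapses, no
    V_varsigma^2 terms) with linear inhibition theta -> theta + b (m + n)."""
    h_inh = b * (m + n)
    mu_on = c_m * m + c * n
    var_on = c_m * (1 - c_m) * m + c * (1 - c) * n
    mu_off = c * (m + n)
    var_off = c * (1 - c) * (m + n)
    m_next = M_next * prob_above(mu_on, var_on, theta + h_inh)
    n_next = (N - M_next) * prob_above(mu_off, var_off, theta + h_inh)
    return m_next, n_next

N, M = 100_000, 1_000          # hypothetical network and (homogeneous) pattern size
c_m, c = 0.1, 0.05             # hypothetical c_m; c = c_m * varsigma
b, theta = c, 28.0             # b = c_m * varsigma, as in the text
m, n = float(M), 0.0           # perfect cue
for t in range(20):
    m, n = mean_field_step(m, n, M, N, c_m, c, theta, b)
print(m / M, n / (N - M))
```

In this homogeneous example the state settles close to perfect retrieval ($m_t/M_t$ of roughly 0.99 with only a few false alarms), and a silent network stays silent.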

## 3 Results

### 3.1 Inhomogeneous Sparseness Reduces Dynamic Stability

Example numerical evaluations of the mean field dynamics, equation (9) and following, for different firing thresholds *θ* and different pattern size inhomogeneities ${\sigma}_{\varphi}$ are shown in Fig. 3. Although these are only random samples, the plots already reveal the impact of increasing inhomogeneity in the coding ratio vector $\boldsymbol{\phi}$ on the network’s ability to retrieve the stored patterns. As the standard deviation ${\sigma}_{\varphi}$ grows, the activity fluctuations during replay become more pronounced and, at some point, lead to dynamic instability during recall. This is clearly visible for a firing threshold $\theta =28$, where perfect pattern retrieval (${m}_{t}/{M}_{t}=1$ and ${n}_{t}/(N-{M}_{t})=0$) is interrupted more and more frequently as ${\sigma}_{\varphi}$ increases, eventually preventing retrieval of the full sequence at ${\sigma}_{\varphi}/{\varphi}_{0}=20\,\%$. There, the network falls silent because a small pattern in the sequence no longer generates the synaptic drive required to retrieve the following pattern. For lower thresholds ($\theta =24,26$), the network activity may instead explode prematurely because a big pattern generates too much synaptic drive and sets the network into a permanently active (epileptic) state. Because the network has instantaneous feedback inhibition, the epileptic state is characterized by approximately half of the neurons being active at any time (${m}_{t}/{M}_{t}\approx {n}_{t}/(N-{M}_{t})\approx 1/2$), with the subset of active neurons changing from time step to time step.

In summary, the range of thresholds under which the network successfully replays the full sequence is severely reduced as the pattern sizes become more and more inhomogeneous.

#### 3.1.1 Replay Success Rate

In order to analyze the numerical results more quantitatively, we introduce a criterion for what we consider to be a successful replay. Following [11], we define the retrieval quality

as the relative difference between hit ratio and false alarm ratio, and consider a pattern at time *t* to be retrieved successfully if ${\Gamma}_{t}>0.5$. By running the mean field equations many times with different random realizations of the vector $\boldsymbol{\phi}$, we obtain an empirical *replay success rate* ${\varrho}_{t}$ as the fraction of runs with successful retrieval ${\Gamma}_{t}>0.5$.
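The bookkeeping behind ${\varrho}_t$ can be sketched as follows. The code *assumes* the form ${\Gamma}_t = m_t/M_t - n_t/(N-M_t)$, our reading of "difference between hit ratio and false alarm ratio"; it gives 1 for perfect recall and about 0 for both the silent and the epileptic state. The trajectories in the usage lines are made-up numbers, and the function names are hypothetical.

```python
def retrieval_quality(m, n, M, N):
    """ASSUMED form of Gamma_t: hit ratio m/M minus false-alarm ratio n/(N - M)."""
    return m / M - n / (N - M)

def replay_success_rate(runs, t, threshold=0.5):
    """Fraction of runs with Gamma_t > threshold; runs[r][t] is Gamma_t of run r."""
    return sum(1 for gammas in runs if gammas[t] > threshold) / len(runs)

# Three made-up Gamma_t trajectories (t = 0, 1, 2), one per realization of phi.
runs = [[1.0, 0.9, 0.2],
        [1.0, 0.8, 0.7],
        [1.0, 0.1, 0.0]]
rho = [replay_success_rate(runs, t) for t in range(3)]   # [1.0, 2/3, 1/3]
```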

Figure 4 shows replay success rates for a sequence of length $Q=100$ for ${\varphi}_{0}=0.01$ and varying inhomogeneities ${\sigma}_{\varphi}$. When the *P* stored associations are still approximately homogeneous (*top left*), the full sequence can be retrieved with probability one for a large range of firing thresholds *θ*. As inhomogeneity increases, this range becomes narrower (*top right*) and eventually collapses (*bottom*), so that only the first items can be retrieved with high probability. Hence, inhomogeneous sparseness strongly affects the replay of long sequences but does relatively little harm to short sequences ($Q\lessapprox 5$).

#### 3.1.2 Region of Stable Replay

Sequence retrieval depends critically not only on the firing threshold *θ* but also on the mean coding ratio ${\varphi}_{0}$. We therefore searched for regions of stable replay in $({\varphi}_{0},\theta )$ space (Fig. 5) at time step $t=100$. The region where sequence replay for homogeneous patterns (${\sigma}_{\varphi}=0$) is infeasible is shown in white for comparison. Replay regions exhibit the typical wedge shape [7, 11]. If the firing threshold *θ* is too low or the mean coding ratio ${\varphi}_{0}$ is too large, all neurons immediately start to fire and the network falls into an all-active state. If the firing threshold is too high or the mean coding ratio is too small, the network immediately falls into an all-silent state. As patterns become less homogeneous, the wedge-shaped region of replay becomes narrower and eventually vanishes for highly inhomogeneous patterns (*bottom right*). The region of retrieval obtained from the mean field equations was validated with computer simulations of the corresponding networks of binary neurons (*white discs*, corresponding to 95 % replay success rate).

### 3.2 Storage Capacity

As in the classical Willshaw network ([10]; see (2)), the mean synaptic connectivity *c* for a network with inhomogeneous pattern sizes also depends on the number *P* of stored associations; see (4). This allowed us to fix the mean connectivity at $c=0.05$ by adjusting the number *P* of stored associations in the network. So far, we have kept *c* constant and varied the width parameter ${\sigma}_{\varphi}$ of the size distribution, and we saw that for large inhomogeneities (${\sigma}_{\varphi}/{\varphi}_{0}\gtrapprox 20\,\%$) replay of long sequences is hardly possible. But what if we reduce *c*? Intuitively, replay should become more stable if we reduce the “noise” connectivity *c*. As shown in Fig. 6, this is indeed the case: if the number of stored associations, and thus *c*, is decreased, sequence retrieval is robust under high inhomogeneity and may even allow replay of the full sequence for a whole range of firing thresholds.

However, reducing *c* comes at the cost of a reduced memory capacity *P*, so we have to quantify the trade-off between replay stability and capacity for a network with inhomogeneous pattern sizes. To this end, we define the *maximum retrievable sequence length*

$$\mathcal{T} = \max_{\theta}\, T_{90},$$

where ${T}_{90}$ is the maximum number of time steps for which the replay success rate ${\varrho}_{T}$ remains above 90 % for a given pattern size vector $\boldsymbol{\phi}$ and morphological connectivity ${c}_{m}$. Since replay stability strongly depends on the firing threshold, the maximum is taken over all possible firing thresholds *θ*.
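The definition of the maximum retrievable sequence length amounts to two nested maxima, sketched below. The success-rate curves are invented for illustration, we read "remains above 90 %" as $\varrho_t > 0.9$ for every $t \le T$, and the function names are hypothetical.

```python
def t90(success_rates, level=0.9):
    """Largest T with rho_t > level for every t = 1..T (0 if none);
    success_rates[t - 1] holds rho_t."""
    T = 0
    for rho in success_rates:
        if rho > level:
            T += 1
        else:
            break
    return T

def max_retrievable_length(rates_by_theta, level=0.9):
    """Maximum over firing thresholds theta of T_90(theta)."""
    return max(t90(rates, level) for rates in rates_by_theta.values())

# Made-up success-rate curves for three thresholds.
rates_by_theta = {
    26: [1.0, 0.95, 0.85, 0.99],
    28: [1.0, 0.99, 0.97, 0.92, 0.5],
    30: [0.8, 1.0],
}
T_max = max_retrievable_length(rates_by_theta)   # 4, attained at theta = 28
```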

Figure 7 shows $\mathcal{T}$ as a function of the number *P* of stored associations in the network (as well as of the corresponding mean connectivity *c*). When the total number of stored associations is small, even a sequence consisting of all stored associations may be retrieved under high inhomogeneity (${\sigma}_{\varphi}/{\varphi}_{0}=25\,\%$). As *P* grows, the curve reaches a maximum and then decreases very quickly. The breakdown point and slope depend critically on the degree of inhomogeneity. For the homogeneous case ${\sigma}_{\varphi}\to 0$, $\mathcal{T}$ decreases infinitely steeply, and the number *P* of patterns at the breakdown determines the “classical” storage capacity. For finite inhomogeneities ${\sigma}_{\varphi}$, $\mathcal{T}$ decreases according to a power law $\mathcal{T}\propto {P}^{-\alpha}$ that trades stability against capacity *P*. Since the exponent *α* is much smaller for high inhomogeneities ${\sigma}_{\varphi}/{\varphi}_{0}$, the net decrease of capacity for short sequences (⪅10 associations) is relatively small compared to a network with homogeneous sparseness (a decrease of ∼1.8 for ${\sigma}_{\varphi}/{\varphi}_{0}=25\,\%$).
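The exponent *α* of a power law $\mathcal{T}\propto P^{-\alpha}$ is conventionally estimated by a straight-line fit in log–log coordinates. The sketch below is a generic least-squares fit; the data are synthetic, generated from an exact power law, and are not values read from Fig. 7. Since $\mathcal{T}(P)$ first rises and then falls, only the decreasing branch beyond the maximum should enter such a fit.

```python
import math

def fit_power_law(P, T):
    """Least-squares line through (log P, log T); returns (A, alpha)
    for the model T = A * P**(-alpha)."""
    xs = [math.log(p) for p in P]
    ys = [math.log(t) for t in T]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope

# Synthetic data from an exact power law (NOT values from Fig. 7).
P_vals = [2000, 4000, 8000, 16000]
T_vals = [1e7 * p ** -1.5 for p in P_vals]
A, alpha = fit_power_law(P_vals, T_vals)
```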

### 3.3 Asymmetry of the Size Distribution

From our observations of single runs in Fig. 3, we already gained some anecdotal insight into the mechanisms underlying the breakdown of sequence replay: network activity may cease after a small pattern, whereas large patterns may lead to epileptic activity. However, it is unclear which of these two ways of terminating replay is more problematic, or whether both occur equally often.

In order to tackle this question about the mechanisms of sequence termination, we investigate the effect of *skewness* (or asymmetry) in the pattern size distribution, i.e., an imbalance between larger-than-average and smaller-than-average patterns. So far, we have used the Gamma distribution shown in Fig. 1, which is relatively symmetric for small coefficients of variation. To control skewness more directly, we now switch to triangular distributions (Fig. 8).

A symmetric triangular distribution (Fig. 8b) is used for comparison. To study the effect of an excess of small patterns, we constructed a negatively skewed distribution (Fig. 8a) by cutting away all patterns above the line of symmetry (${\varphi}_{max}$) and adding smaller patterns instead. Since this distribution has a lower mean coding ratio ${\varphi}_{0}={\varphi}_{max}-\sqrt{2}{\sigma}_{\varphi}$, we increased the number *P* of patterns to obtain the same “noise” connectivity *c* as in the symmetric case. The region of stable replay is clearly smaller for the negatively skewed distribution than for the symmetric one. This reduction could arise either because small pattern sizes are intrinsically bad or because the asymmetry of the distribution is a limiting factor. We therefore also considered the case of an excess of large patterns (Fig. 8c). This positively skewed distribution has the same degree of asymmetry as the negatively skewed one, only mirrored; interestingly, however, its region of stable replay is larger than that of the symmetric distribution. Again, the connectivity was adjusted to the same value, here by reducing the number *P* of associations to compensate for the higher mean coding ratio ${\varphi}_{0}={\varphi}_{max}+\sqrt{2}{\sigma}_{\varphi}$.
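The mean shifts $\pm\sqrt{2}\sigma_\varphi$ follow from elementary properties of a right-triangular distribution, assuming (as the construction suggests) that all mass lies on one side of the mode $\varphi_{max}$ over a width $d$. The mean is then $\varphi_{max}-d/3$ (mass below the mode) or $\varphi_{max}+d/3$ (mass above), and the variance is $d^2/18$, so $d=3\sqrt{2}\sigma_\varphi$ yields $\varphi_0=\varphi_{max}\mp\sqrt{2}\sigma_\varphi$. The sketch checks this analytically and by sampling with Python's `random.triangular`; $\varphi_{max}=0.01$ and $\sigma_\varphi=0.002$ are illustrative values, and the function names are hypothetical.

```python
import math
import random
import statistics

def triangular_moments(low, high, mode):
    """Mean and variance of a triangular distribution."""
    mean = (low + high + mode) / 3.0
    var = (low ** 2 + high ** 2 + mode ** 2
           - low * high - low * mode - high * mode) / 18.0
    return mean, var

def skewed_support(phi_max, sigma, side):
    """(low, high, mode) of a right-triangular distribution with its mode at
    phi_max and all mass below (side = -1) or above (side = +1) it; the width
    d = 3*sqrt(2)*sigma makes the standard deviation equal to sigma."""
    d = 3.0 * math.sqrt(2.0) * sigma
    if side < 0:
        return phi_max - d, phi_max, phi_max
    return phi_max, phi_max + d, phi_max

phi_max, sigma = 0.01, 0.002          # illustrative values
neg = skewed_support(phi_max, sigma, -1)
pos = skewed_support(phi_max, sigma, +1)
mean_neg, var_neg = triangular_moments(*neg)
mean_pos, var_pos = triangular_moments(*pos)

# Empirical check for the negatively skewed case.
rng = random.Random(2)
draws = [rng.triangular(*neg) for _ in range(100_000)]
```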

From these observations, we conclude that small patterns are indeed much more problematic for replay with inhomogeneous pattern sizes than large patterns. To understand why, we compared the shape of the replay regions for the three distributions (a, b, and c) and observed that the slope of the lower side of the wedge is relatively insensitive to skewness, whereas the slope of the upper side differs markedly between the three cases. Failures owing to activity explosion (the lower side of the wedge) are almost independent of skewness because of the instantaneous feedback inhibition in the mean field equations. The upper side of the wedge, on the other hand, is determined by the network falling into a silent state, which feedback inhibition cannot prevent. Thus, the positively skewed distribution (c) is the most robust. Note that this was also apparent in Fig. 5, where the reduction of the region of stability with increasing inhomogeneity was much more pronounced on the upper side of the wedges than on the lower side, despite the relatively symmetric Gamma distribution used there.

Finally, to illustrate more directly how replay terminates after small patterns, we show a scatter plot of $10^{4}$ sample pairs $({M}_{\tau},{M}_{\tau +1})$, where *τ* is the last time step at which the pattern was replayed with sufficient quality ${\Gamma}_{\tau}>0.5$ (Fig. 9; note that here we again used a Gamma distribution to allow comparison with Fig. 4). Points below the red line (${M}_{\tau}>{M}_{\tau +1}$) represent sequences for which the last correctly replayed pattern ${\xi}_{\tau}$ was bigger than the following pattern ${\xi}_{\tau +1}$, whereas points above the red line (${M}_{\tau}<{M}_{\tau +1}$) represent sequences for which ${\xi}_{\tau}$ was smaller than ${\xi}_{\tau +1}$. On the low-threshold edge of the stability region, which is prone to over-excitation ($\theta =25$, cf. Fig. 5), a big-to-small pattern transition is as likely to lead to a failure in sequence replay as a small-to-big transition. However, on the high-threshold edge of the stability region ($\theta =30$), where replay eventually dies out, most failures are caused by small-to-big pattern transitions ($\sim 80\,\%$ of points above the red line).

Again, these results show that the small patterns are more detrimental for sequence replay than the large patterns, since in the latter case fluctuations can be compensated for by feedback inhibition, whereas the former have no compensatory mechanism.

### 3.4 Nonlinear Inhibition

So far, we have assumed a linear dependence of instantaneous feedback inhibition on the total network activity, i.e., ${h}_{t}=b({m}_{t}+{n}_{t})$, since it was shown to optimize replay quality [7, 12]. In this final section, we investigate how a particular *nonlinear* form of inhibition could improve the network’s resilience to inhomogeneity. Two observations motivate this choice: (a) physiological data from cortical inhibitory networks suggest a supralinear dependence on input [13, 14], and (b) supralinear inhibition effectively provides positive feedback (relative to linear inhibition) when activity is too low.

To compare the nonlinear and linear cases as directly as possible, we constructed a nonlinearity that implements such a positive boost only for too-low activities ${m}_{t}+{n}_{t}$ and remains linear for too-high activities (see upper panel of Fig. 10b). Formally, it is obtained by replacing the linear term $b({m}_{t}+{n}_{t})$ by the function ${h}_{t}=h({m}_{t}+{n}_{t})$, where

with parameters

chosen such that the slope of $h(x)$ at the operation point $x={m}_{t}+{n}_{t}={\varphi}_{0}N$ is equal to *b* in both the linear and nonlinear parts.

Figure 10 shows the mean retrieval quality ${\overline{\Gamma}}_{t}$ at time step $t=100$ (averaged over $10^{2}$ random realizations of the vector $\boldsymbol{\phi}$) in the $({\varphi}_{0},\theta )$ plane, for linear (a) and nonlinear (b) inhibition and two levels of inhomogeneity ${\sigma}_{\varphi}/{\varphi}_{0}$.

For low inhomogeneity (${\sigma}_{\varphi}/{\varphi}_{0}=5\,\%$), although the region of stability is wider in the nonlinear case, the retrieval quality in the gained region is not as good as in the region shared by both feedback strategies (see the lighter red stripe in the middle panel of Fig. 10b). This finding agrees well with previous reports that linear inhibition maximizes replay quality for homogeneous pattern sizes: the gain in robustness comes mostly at the cost of reduced replay quality. For a large inhomogeneity (${\sigma}_{\varphi}/{\varphi}_{0}=20\,\%$), linear feedback almost completely extinguishes replay, whereas nonlinear inhibition recovers a considerable region of stable replay with high retrieval quality.

Supralinear inhibitory feedback at low activity levels thus significantly widens the replay region, making the network resilient to higher levels of inhomogeneity than linear feedback allows. The underlying mechanism can be explained as follows: smaller-than-average patterns generate only little negative feedback and thereby keep up the activity in the network, whereas bigger-than-average patterns are compensated for optimally by linear negative feedback.

## 4 Discussion

This paper extends previous models of sequence memory [7, 9, 11, 12] that were based on Willshaw’s learning rule [10] to *inhomogeneous* pattern sizes, i.e., patterns of variable sparseness. Our work reveals that inhomogeneity in the sparseness of stored patterns is detrimental to a recurrent network’s dynamic stability during sequence retrieval. Bigger-than-average patterns tend to drive the network into an all-active epileptic state as a result of excessively high synaptic drive, whereas smaller-than-average patterns tend to lead to an all-silent state as a result of insufficient synaptic drive. In either case, sequence retrieval is terminated prematurely due to dynamic instability. As expected, the higher the variability in pattern sizes, the higher the probability of premature sequence termination. Our results thus suggest that a plasticity mechanism that ensures a certain degree of homogeneity in the sparseness of hippocampal representations would be useful for the reliable retrieval of long sequences.

Instantaneous linear feedback inhibition can compensate to a certain degree for bigger-than-average patterns, but it does nothing to prevent the network from falling silent, since it cannot compensate for an insufficient synaptic drive. This asymmetry is reflected in the relative impact of differently skewed pattern size distributions. Compared to a symmetric distribution, negative skewness leads to a smaller region of stable replay, whereas positive skewness leads to a larger region. Positively skewed pattern size distributions are thus more resilient to premature sequence termination under linear feedback. The higher vulnerability to smaller-than-average patterns can be corrected for by introducing a nonlinear negative feedback that is close to zero for lower-than-average network activity. Such supralinear inhibition can make the network resilient to higher levels of inhomogeneity than linear feedback inhibition.

Memory networks with variable sparseness have been studied by Amit and Huang [15, 16] under a different learning paradigm in which old memories are gradually overwritten by new memories, and for several more involved synaptic (meta-)plasticity rules. There, inhomogeneity in the pattern sizes was shown to decrease the signal-to-noise ratio during recall as well.

In contrast to palimpsest models [17–22], in which old memories are overwritten, our model assumes that all memories are equally well preserved in the synaptic states of the network, which argues for additional plasticity rules that continuously readjust the synaptic matrix to keep old memories fresh. Such a mechanism would necessarily require ongoing plasticity, which may then easily be linked to some form of pattern size homeostasis that tries to keep the sparseness homogeneous. Such persistent network remodeling fits experimental findings that, at least for a few weeks after memory acquisition, existing memories can be erased by blocking protein synthesis in conjunction with memory reactivation [23], hinting at the presence of plasticity mechanisms during early retrieval.

## Appendix

In this appendix, we give the mathematical details of how we derive the expectation values necessary for our dynamical model from the underlying stochasticity of the recurrent synaptic matrix. The effect of learning on the synaptic connections is modeled by binary random variables that take a value ${s}_{ij}=1$ if the putative synapse from neuron *j* to neuron *i* is in a potentiated state (is able to transmit signals), whereas ${s}_{ij}=0$ if it is inactive and cannot contribute to the postsynaptic depolarization. Since not all neurons are considered to be synaptically connected, the real synaptic weight is a product ${w}_{ij}{s}_{ij}$ [9] where ${w}_{ij}=1$ or 0 according to a binomial process with probability ${c}_{m}$ (the morphological connectivity) that is supposed to model the existence of a physical synapse. The two random variables ${s}_{ij}$ and ${w}_{ij}$ are considered to be independent.

The vector of pattern sizes $\boldsymbol{\phi}=({f}_{0},\dots ,{f}_{P})$ defines how many neurons ${M}_{t}={f}_{t}N$ fire in each pattern ${\xi}_{t}$. According to Willshaw’s rule, a given sequence of stored patterns, with sizes specified by $\boldsymbol{\phi}$, uniquely defines the matrix of synaptic states ${s}_{ij}$: only those synapses are potentiated for which the presynaptic neuron is active in pattern ${\xi}_{k}$ *and* the postsynaptic neuron is active in pattern ${\xi}_{k+1}$ for at least one value $k=0,\dots ,P-1$. To translate this learning rule into formulas, we introduce the theoretical concept of an *activation schedule*.

### A.1 Activation Schedule

Different neurons participate differently in the replay of memory sequences. Formally, each neuron is described by its activation schedule $A=\{{a}_{1},{a}_{2},\dots ,{a}_{P}\}$, ${a}_{k}\in \{0,1\}$, which indicates in which patterns the neuron fires (${a}_{k}=1$) or not (${a}_{k}=0$). Since we assume the participation in a pattern to be random, the probability for a specific activation schedule is computed as

where $|A|$ is the number of patterns in which the neuron is active and $\alpha :\{1,\dots ,P\}\to \{1,\dots ,P\}$ is a reordering of the patterns such that those in which the neuron is active have the $|A|$ lowest indices. If $|A|=0$, the first factor equals 1 as indicated by the Kronecker symbol ${\delta}_{|A|=0}$.

The activation schedule of a neuron allows us to compute the fraction ${\varsigma}_{A}$ of putative activated synapses at a postsynaptic neuron with activation schedule *A* in analogy to the classical Willshaw idea,

where the product on the right-hand side is the fraction of synapses remaining inactive after storing *P* sequential activations, corresponding to $|A|$ learning steps.

In order to compute the mean connectivity in the network, we average over all postsynaptic neurons (i.e., activation schedules) and obtain

where we have introduced the abbreviation ${\chi}_{k}=\frac{{f}_{k}(1-{f}_{k-1})}{1-{f}_{k}}$ and used the algebraic identity
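The average can be checked by brute force for small *P*: enumerate all $2^P$ activation schedules, weight each by $\prod_k f_k^{a_k}(1-f_k)^{1-a_k}$ (random, independent participation), and average $\varsigma_A = 1-\prod_{k:\,a_k=1}(1-f_{k-1})$. Summing over schedules factorizes into $\prod_k[(1-f_k)+f_k(1-f_{k-1})]=\prod_k(1-f_{k-1}f_k)$, which equals $\prod_k(1-f_k)(1+\chi_k)$ with $\chi_k$ as defined above. This product form of $P(A)$ and $\varsigma_A$ is our reading of the text; the exaggerated coding ratios are chosen only to make the check non-trivial, and the helper names are hypothetical.

```python
from itertools import product

def schedule_probability(A, phi):
    """P(A) = prod_{k=1}^{P} f_k^{a_k} (1 - f_k)^{1 - a_k} (independent participation)."""
    p = 1.0
    for a, f in zip(A, phi[1:]):
        p *= f if a else 1.0 - f
    return p

def varsigma_A(A, phi):
    """Potentiated fraction onto a neuron with schedule A: every learning step
    with a_k = 1 leaves a fraction (1 - f_{k-1}) of its inputs unpotentiated."""
    remain = 1.0
    for a, f_prev in zip(A, phi[:-1]):
        if a:
            remain *= 1.0 - f_prev
    return 1.0 - remain

def moments_by_enumeration(phi):
    """E[varsigma_A] and E[varsigma_A^2] over all 2^P schedules."""
    P = len(phi) - 1
    m1 = m2 = 0.0
    for A in product((0, 1), repeat=P):
        p, s = schedule_probability(A, phi), varsigma_A(A, phi)
        m1 += p * s
        m2 += p * s * s
    return m1, m2

phi = [0.2, 0.1, 0.3, 0.25, 0.15, 0.05]   # exaggerated, illustrative f_0..f_P
m1, m2 = moments_by_enumeration(phi)

# Closed forms from factorizing the sum over schedules.
R1 = R2 = 1.0
for f_prev, f in zip(phi[:-1], phi[1:]):
    R1 *= 1.0 - f_prev * f                      # = (1 - f_k)(1 + chi_k)
    R2 *= (1.0 - f) + f * (1.0 - f_prev) ** 2
V2 = m2 / m1 ** 2 - 1.0                         # squared coefficient of variation
```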

Similarly, the second moment $E[{\varsigma}_{A}^{2}]$ is computed as

### A.2 Mean and Variance of Total Synaptic Input

With the above two moments, we can find means and variances for the synaptic inputs. We start with the probability of total synaptic input to a postsynaptic cell with activation schedule *A*, which is binomially distributed according to

The probability of total synaptic input to an average postsynaptic cell can then be obtained as

The mean value of *h* depends on whether the postsynaptic cell belongs to the **On** population (should fire) or the **Off** population (should not fire). For the **Off** population, we have

and for the **On** population,

In order to obtain the variance of *h*, we compute the second moment of *h* for the **Off** population,

The variance is then given by

where

is the squared variation coefficient of ${\varsigma}_{A}$ over all activation schedules *A*. Similarly, for the **On** population, we get
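The role of the squared variation coefficient can be illustrated with the law of total variance. Assuming, as stated above, that the input *h* to a cell with schedule *A* is binomial with *K* active presynaptic cells and success probability $p_A=c_m\varsigma_A$, one gets $\langle h\rangle = K\bar p$ and $\mathrm{Var}(h) = K\bar p(1-\bar p) + K(K-1)\bar p^{2}V^{2}$, where $\bar p$ is the schedule average of $p_A$ and $V^2$ its squared variation coefficient (equal to that of $\varsigma_A$). This closed form is our own check, not a transcription of the paper's displays; the toy values and function names are hypothetical.

```python
import math

def binom_pmf(h, K, p):
    return math.comb(K, h) * p ** h * (1.0 - p) ** (K - h)

def mixture_moments(K, ps, weights):
    """Exact mean and variance of h when, given schedule class A
    (probability weights[A]), h ~ Binomial(K, ps[A])."""
    mean = second = 0.0
    for p, w in zip(ps, weights):
        for h in range(K + 1):
            q = w * binom_pmf(h, K, p)
            mean += q * h
            second += q * h * h
    return mean, second - mean ** 2

def total_variance_formula(K, ps, weights):
    """K pbar (1 - pbar) + K (K - 1) pbar^2 V^2, V^2 = Var(p_A) / pbar^2."""
    pbar = sum(w * p for p, w in zip(ps, weights))
    V2 = sum(w * (p - pbar) ** 2 for p, w in zip(ps, weights)) / pbar ** 2
    return K * pbar, K * pbar * (1.0 - pbar) + K * (K - 1) * pbar ** 2 * V2

# Toy example: K active presynaptic cells, three schedule classes with
# success probabilities c_m * varsigma_A (values are illustrative).
K = 30
ps = [0.02, 0.05, 0.10]
weights = [0.5, 0.3, 0.2]
exact = mixture_moments(K, ps, weights)
formula = total_variance_formula(K, ps, weights)
```

The $K(K-1)$ term is what grows with network activity: correlations induced by Willshaw's rule add variance beyond the independent-synapse binomial term.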

### A.3 Mean and Variance of *ς* over Pattern Size Distribution

So far, all formulas were obtained for a specific realization of the pattern size (coding ratio) vector **ϕ**. The pattern sizes themselves can, however, be considered as resulting from a stochastic process as well. We are therefore interested in expectation values over the pattern size distribution, which account for the average connectivity over many realizations of the network. Such an average connectivity $\langle\varsigma\rangle$ upon imprinting the memory with *P* patterns is given by

where $\langle\cdot\rangle$ indicates the expected value over the size distribution ${p}_{\varphi}(\varphi )$. The last term can be expanded as follows:

The last term on the right, as well as the higher-order terms, contains overlapping indices (e.g., when ${t}^{\prime}=t+1$). Hence, a term of order $2k$ (for $k=1,\dots ,P$) whose indices form $j$ non-adjacent sets (for $j=1,\dots ,k$) has $2j$ isolated indices, each contributing a factor $\langle f\rangle$, and $(k-j)$ duplicated indices, each contributing a factor $\langle {f}^{2}\rangle$. Therefore, we can write

where ${n}_{j}^{(P,k)}$ is the number of $k$-combinations of $P$ elements that consist of exactly $j$ mutually non-adjacent sets of consecutive elements. For example, given $P$ elements ${a}_{1},{a}_{2},\dots ,{a}_{P}$, the 3-combination ${a}_{1}{a}_{2}{a}_{3}$ consists of $j=1$ such set, ${a}_{1}{a}_{3}{a}_{4}$ of $j=2$, and ${a}_{1}{a}_{3}{a}_{5}$ of $j=3$. After some algebra, we arrive at the expression
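The counts $n_j^{(P,k)}$ can be verified by brute force. The closed form used below, $n_j^{(P,k)}=\binom{k-1}{j-1}\binom{P-k+1}{j}$, is a standard combinatorial identity (split the $k$ chosen elements into $j$ nonempty blocks of consecutive elements, then place these blocks into $j$ of the $P-k+1$ gaps left by the unchosen elements); it is our addition and not necessarily the form used in the paper. All function names are hypothetical.

```python
from itertools import combinations
from math import comb

def count_runs(indices):
    """Number of maximal blocks of consecutive integers in a sorted tuple."""
    return sum(1 for i, x in enumerate(indices) if i == 0 or x != indices[i - 1] + 1)

def n_brute_force(P, k, j):
    """k-combinations of {1, ..., P} with exactly j blocks, by enumeration."""
    return sum(1 for c in combinations(range(1, P + 1), k) if count_runs(c) == j)

def n_closed_form(P, k, j):
    """Compositions of k into j parts times choices of j gaps out of P-k+1 (k, j >= 1)."""
    return comb(k - 1, j - 1) * comb(P - k + 1, j)

# The three examples from the text: a1a2a3, a1a3a4, a1a3a5.
print(count_runs((1, 2, 3)), count_runs((1, 3, 4)), count_runs((1, 3, 5)))  # 1 2 3
```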

The mean probability of synaptic potentiation $\langle\varsigma\rangle$ can thus be expressed as a function of the first and second order moments of the coding ratio distribution ${p}_{\varphi}(\varphi )$ as follows:

with the first and second moments
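As a sanity check of this expansion, the sketch below computes $\langle\varsigma\rangle$ from $\langle f\rangle$ and $\langle f^2\rangle$ alone and compares it with an exact average over a hypothetical two-point coding-ratio distribution. It assumes that $f_0,\dots,f_P$ are drawn i.i.d. from $p_\varphi$, that the quantity being averaged is $1-\prod_{k=1}^{P}(1-f_kf_{k-1})$ (our reading of the mean connectivity for a fixed realization), and it uses the combinatorial count $\binom{k-1}{j-1}\binom{P-k+1}{j}$ for $n_j^{(P,k)}$. Names and numbers are illustrative.

```python
from itertools import product
from math import comb, isclose, prod

def mean_varsigma_from_moments(P, m1, m2):
    """<varsigma> from <f> = m1 and <f**2> = m2 via the run expansion.

    Hypothetical assumption: f_0, ..., f_P are i.i.d.  A k-term with j
    non-adjacent sets contributes m1**(2j) * m2**(k-j).
    """
    expansion = 1.0  # k = 0 term of prod_k (1 - f_k f_{k-1})
    for k in range(1, P + 1):
        for j in range(1, k + 1):
            n_jk = comb(k - 1, j - 1) * comb(P - k + 1, j)
            expansion += (-1) ** k * n_jk * m1 ** (2 * j) * m2 ** (k - j)
    return 1.0 - expansion

def mean_varsigma_exact(P, values, weights):
    """Exact <varsigma> for a discrete coding-ratio distribution (P + 1 patterns)."""
    total = 0.0
    for idx in product(range(len(values)), repeat=P + 1):
        w = prod(weights[i] for i in idx)
        f = [values[i] for i in idx]
        total += w * (1.0 - prod(1.0 - f[k] * f[k - 1] for k in range(1, P + 1)))
    return total

values, weights = [0.05, 0.15], [0.5, 0.5]  # hypothetical two-point distribution
m1 = sum(w * v for v, w in zip(values, weights))
m2 = sum(w * v * v for v, w in zip(values, weights))
print(isclose(mean_varsigma_from_moments(6, m1, m2), mean_varsigma_exact(6, values, weights)))
```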

The second moment $\langle{\varsigma}^{2}\rangle$ of the probability of synaptic potentiation is given by

where the last term equals

Here, we use

and

with

If ${p}_{\varphi}(\varphi )$ is the Gamma distribution, the higher moments can be computed as
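The raw moments of a Gamma distribution have a simple closed form, $\langle\varphi^n\rangle=\theta^n\,\kappa(\kappa+1)\cdots(\kappa+n-1)$ for shape $\kappa$ and scale $\theta$. The sketch below parameterizes the distribution by its mean and coefficient of variation ($\kappa=1/\mathrm{CV}^2$, $\theta=\text{mean}\cdot\mathrm{CV}^2$), which is our hypothetical choice, and checks the formula against samples from Python's `random.gammavariate(alpha, beta)`, whose mean is `alpha * beta`. A Gamma variable is unbounded, so coding ratios above 1 have a small but nonzero probability.

```python
import random
from math import prod

def gamma_raw_moment(n, mean, cv):
    """n-th raw moment <phi**n> of a Gamma distribution given its mean and CV.

    Hypothetical parameterization: shape kappa = 1/cv**2, scale theta = mean*cv**2,
    so that <phi**n> = theta**n * kappa (kappa + 1) ... (kappa + n - 1).
    """
    kappa, theta = 1.0 / cv**2, mean * cv**2
    return theta**n * prod(kappa + i for i in range(n))

mean, cv = 0.1, 0.3  # hypothetical mean coding ratio and size variability
rng = random.Random(1)
samples = [rng.gammavariate(1.0 / cv**2, mean * cv**2) for _ in range(200_000)]
for n in range(1, 5):
    empirical = sum(x**n for x in samples) / len(samples)
    print(n, gamma_raw_moment(n, mean, cv), empirical)
```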

Finally, the variance ${\sigma}_{\varsigma}^{2}$ of the probability of synaptic potentiation over all possible realizations of the coding ratio vector **ϕ** is given by
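The mean and variance over realizations of **ϕ** can also be estimated by Monte Carlo sampling, which is a useful cross-check of the series expressions. The sketch below assumes i.i.d. Gamma-distributed coding ratios $f_0,\dots,f_P$ (shape $1/\mathrm{CV}^2$, scale $\text{mean}\cdot\mathrm{CV}^2$), takes $\varsigma=1-\prod_{k=1}^{P}(1-f_kf_{k-1})$ for each realization, and compares the sample mean with the run expansion in $\langle f\rangle$ and $\langle f^2\rangle$. All parameter values and names are hypothetical.

```python
import random
from math import comb, prod

def varsigma(f):
    """Mean connectivity 1 - prod_k (1 - f_k f_{k-1}) for one realization
    (f_0, ..., f_P); this product form is an assumption of the sketch."""
    return 1.0 - prod(1.0 - f[k] * f[k - 1] for k in range(1, len(f)))

def mean_varsigma_from_moments(P, m1, m2):
    """Run expansion of <varsigma> in <f> = m1 and <f**2> = m2 (i.i.d. f assumed)."""
    expansion = 1.0  # k = 0 term
    for k in range(1, P + 1):
        for j in range(1, k + 1):
            n_jk = comb(k - 1, j - 1) * comb(P - k + 1, j)
            expansion += (-1) ** k * n_jk * m1 ** (2 * j) * m2 ** (k - j)
    return 1.0 - expansion

def monte_carlo(P, mean, cv, trials=20_000, seed=0):
    """Sample mean and variance of varsigma over hypothetical Gamma coding ratios."""
    rng = random.Random(seed)
    shape, scale = 1.0 / cv**2, mean * cv**2
    xs = [varsigma([rng.gammavariate(shape, scale) for _ in range(P + 1)])
          for _ in range(trials)]
    mu = sum(xs) / trials
    return mu, sum((x - mu) ** 2 for x in xs) / (trials - 1)

P, mean, cv = 20, 0.05, 0.5  # hypothetical parameters
mu, var = monte_carlo(P, mean, cv)
m2 = mean**2 * (1.0 + cv**2)  # second raw moment of the Gamma distribution
print("Monte Carlo <varsigma>:", mu)
print("series <varsigma>:", mean_varsigma_from_moments(P, mean, m2))
print("sigma_varsigma^2:", var)
```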

## References

1. Little WA: The existence of persistent states in the brain. *Math Biosci* 1974, 19:101–120. 10.1016/0025-5564(74)90031-5
2. Hopfield JJ: Neural networks and physical systems with emergent collective computational abilities. *Proc Natl Acad Sci USA* 1982, 79(8):2554–2558. 10.1073/pnas.79.8.2554
3. Wennekers T, Palm G: Modelling generic cognitive functions with operational Hebbian cell assemblies. In *Neural Network Research Horizons*. Edited by: Weiss M. Nova Science Publishers, New York; 2007:225–294.
4. Lee AK, Wilson MA: Memory of sequential experience in the hippocampus during slow wave sleep. *Neuron* 2002, 36(6):1183–1194. 10.1016/S0896-6273(02)01096-6
5. Diba K, Buzsaki G: Forward and reverse hippocampal place-cell sequences during ripples. *Nat Neurosci* 2007, 10(10):1241–1242. 10.1038/nn1961
6. Maier N, Tejero-Cantero Á, Dorrn AL, Winterer J, Beed PS, Morris G, Kempter R, Poulet JF, Leibold C, Schmitz D: Coherent phasic excitation during hippocampal ripples. *Neuron* 2011, 72:137–152. 10.1016/j.neuron.2011.08.016
7. Kammerer A, Tejero-Cantero Á, Leibold C: Inhibition enhances memory capacity: optimal feedback, transient replay and oscillations. *J Comput Neurosci* 2013, 34:125–136. 10.1007/s10827-012-0410-z
8. Nadal JP: Associative memory: on the (puzzling) sparse coding limit. *J Phys A* 1991, 24:1093–1101. 10.1088/0305-4470/24/5/023
9. Gibson WG, Robinson J: Statistical analysis of the dynamics of a sparse associative memory. *Neural Netw* 1992, 5:645–661. 10.1016/S0893-6080(05)80042-5
10. Willshaw DJ, Buneman OP, Longuet-Higgins HC: Non-holographic associative memory. *Nature* 1969, 222(5197):960–962. 10.1038/222960a0
11. Leibold C, Kempter R: Memory capacity for sequences in a recurrent network with biological constraints. *Neural Comput* 2006, 18(4):904–941. 10.1162/neco.2006.18.4.904
12. Hirase H, Recce M: A search for the optimal thresholding sequence in an associative memory. *Network* 1996, 4:741–756.
13. Kapfer C, Glickfeld L, Atallah B, Scanziani M: Supralinear increase of recurrent inhibition during sparse activity in the somatosensory cortex. *Nat Neurosci* 2007, 10:743–753. 10.1038/nn1909
14. Silberberg G, Markram H: Disynaptic inhibition between neocortical pyramidal cells mediated by Martinotti cells. *Neuron* 2007, 53:735–746. 10.1016/j.neuron.2007.02.012
15. Amit Y, Huang Y: Precise capacity analysis in binary networks with multiple coding level inputs. *Neural Comput* 2010, 22(3):660–688. 10.1162/neco.2009.02-09-967
16. Huang Y, Amit Y: Capacity analysis in multi-state synaptic models: a retrieval probability perspective. *J Comput Neurosci* 2011, 30(3):699–720. 10.1007/s10827-010-0287-7
17. Amit DJ, Fusi S: Learning in neural networks with material synapses. *Neural Comput* 1994, 6:957–982. 10.1162/neco.1994.6.5.957
18. Fusi S, Drew PJ, Abbott LF: Cascade models of synaptically stored memories. *Neuron* 2005, 45(4):599–611. 10.1016/j.neuron.2005.02.001
19. Leibold C, Kempter R: Sparseness constrains the prolongation of memory lifetime via synaptic metaplasticity. *Cereb Cortex* 2008, 18:67–77. 10.1093/cercor/bhm037
20. Barrett AB, van Rossum MC: Optimal learning rules for discrete synapses. *PLoS Comput Biol* 2008, 4(11): Article ID e1000230.
21. Päpper M, Kempter R, Leibold C: Synaptic tagging, evaluation of memories, and the distal reward problem. *Learn Mem* 2011, 18:58–70.
22. van Rossum MC, Shippi M, Barrett AB: Soft-bound synaptic plasticity increases storage capacity. *PLoS Comput Biol* 2012, 8(12): Article ID e1002836.
23. Milekic MH, Alberini CM: Temporally graded requirement for protein synthesis following memory reactivation. *Neuron* 2002, 36(3):521–525. 10.1016/S0896-6273(02)00976-5

## Acknowledgements

This work was funded by the German Federal Ministry for Education and Research (BMBF) under grant numbers 01GQ0981 (Bernstein Fokus on Neuronal Basis of Learning: Plasticity of Neuronal Dynamics) and 01GQ1004A (Bernstein Center for Computational Neuroscience Munich).

The authors are grateful for comments and discussions to Álvaro Tejero Cantero, Axel Kammerer, and Alexander Mathis.

## Additional information

### Competing Interests

The authors declare that they have no competing interests.

### Authors’ Contributions

DM and CL performed the mathematical analysis. DM carried out the computer simulations. DM and CL drafted the manuscript.

## About this article

### Keywords

- Associative memory
- Sequence memory
- Memory capacity
- Sparse coding