Open Access

Multiscale analysis of slow-fast neuronal learning models with noise

The Journal of Mathematical Neuroscience 2012, 2:13

DOI: 10.1186/2190-8567-2-13

Received: 19 April 2012

Accepted: 26 October 2012

Published: 22 November 2012

Abstract

This paper deals with the application of temporal averaging methods to recurrent networks of noisy neurons undergoing a slow and unsupervised modification of their connectivity matrix called learning. Three time-scales arise for these models: (i) the fast neuronal dynamics, (ii) the intermediate external input to the system, and (iii) the slow learning mechanisms. Based on this time-scale separation, we apply an extension of the mathematical theory of stochastic averaging with periodic forcing in order to derive a reduced deterministic model for the connectivity dynamics. We focus on a class of models where the activity is linear, in order to understand the specificity of several learning rules (Hebbian, trace, or anti-symmetric learning). In a weakly connected regime, we study the equilibrium connectivity which gathers the entire ‘knowledge’ of the network about the inputs. We develop an asymptotic method to approximate this equilibrium. We show that the symmetric part of the connectivity post-learning encodes the correlation structure of the inputs, whereas the anti-symmetric part corresponds to the cross-correlation between the inputs and their time derivative. Moreover, the time-scales ratio appears as an important parameter revealing temporal correlations.

Keywords

slow-fast systems; stochastic differential equations; inhomogeneous Markov process; averaging; model reduction; recurrent networks; unsupervised learning; Hebbian learning; STDP

1 Introduction

Complex systems are made of a large number of interacting elements leading to non-trivial behaviors. They arise in various areas of research such as biology, social sciences, physics or communication networks. In particular in neuroscience, the nervous system is composed of billions of interconnected neurons interacting with their environment. Two specific features of this class of complex systems are that (i) external inputs and (ii) internal sources of random fluctuations influence their dynamics. Their theoretical understanding is a great challenge and involves high-dimensional non-linear mathematical models integrating non-autonomous and stochastic perturbations.

Modeling these systems gives rise to many different scales both in space and in time. In particular, learning processes in the brain involve three time-scales: from neuronal activity (fast), external stimulation (intermediate) to synaptic plasticity (slow). Here, the fast time-scale corresponds to a few milliseconds and the slow time-scale to minutes or hours, while the intermediate time-scale generally ranges between the two, although some stimuli may be faster than the neuronal activity time-scale (e.g., sub-millisecond auditory signals [1]). The separation of these time-scales is an important and useful property in their study. Indeed, multiscale methods appear particularly relevant to handle and simplify such complex systems.

First, the stochastic averaging principle [2, 3] is a powerful tool to analyze the impact of noise on slow-fast dynamical systems. This method relies on approximating the fast dynamics by its quasi-stationary measure and averaging the slow evolution with respect to this measure. In the asymptotic regime of perfect time-scale separation, this leads to a slow reduced system whose analysis enables a better understanding of the original stochastic model.

Second, periodic averaging theory [4], which was originally developed for celestial mechanics, is particularly relevant to study the effect of fast deterministic and periodic perturbations (external input) on dynamical systems. This method also leads to a reduced model where the external perturbation is time-averaged.

It seems appropriate to gather these two methods to address our case of a noisy and input-driven slow-fast dynamical system. This combined approach provides a novel way to understand the interactions between the three time-scales relevant in our models. More precisely, we will consider the following class of multiscale stochastic differential equations (SDEs), with $\epsilon_1, \epsilon_2 > 0$ two small parameters:
$$\begin{cases} dv^\epsilon = \dfrac{1}{\epsilon_1} F\!\left(v^\epsilon, w^\epsilon, u\!\left(\tfrac{t}{\epsilon_2}\right)\right) dt + \dfrac{1}{\sqrt{\epsilon_1}}\, \Sigma\, dB(t),\\[4pt] dw^\epsilon = G(v^\epsilon, w^\epsilon)\, dt, \end{cases} \tag{1}$$

where $v^\epsilon \in \mathbb{R}^p$ represents the fast activity of the individual elements, $w^\epsilon \in \mathbb{R}^q$ represents the connectivity weights that vary slowly due to plasticity, and $u(t) \in \mathbb{R}^p$ represents the value of the external input at time t. Random perturbations are included in the form of a diffusion term, and $(B(t))$ is a standard Brownian motion.

We are interested in the double limit $\epsilon_1 \to 0$ and $\epsilon_2 \to 0$ to describe the evolution of the slow variable w in the asymptotic regime where both the variable v and the external input are much faster than w. This asymptotic regime corresponds to the study of a neuronal network in which both the external input u and the neuronal activity v operate on a faster time-scale than the slow plasticity-driven evolution of the synaptic weights w. To account for the possible difference of time-scales between v and the input, we introduce the time-scale ratio $\mu = \epsilon_1/\epsilon_2 \in [0, \infty]$. In the interesting case where $\mu \in (0, \infty)$, one needs to understand the long-time behavior of the rescaled periodically forced SDE, for any fixed $w_0$,
$$dv = F(v, w_0, \mu t)\, dt + \Sigma(v, w_0)\, dB(t).$$
Recently, in an important contribution [5], a precise understanding of the long-time behavior of such processes has been obtained using methods from partial differential equations. In particular, conditions ensuring the existence of a periodic family of probability measures to which the law of v converges as time grows have been identified, together with a sharp estimation of the speed of mixing. These results are at the heart of the extension of the classical stochastic averaging principle [2] to the case of periodically forced slow-fast SDEs [6]. As a result, we obtain a reduced equation describing the slow evolution of variable w in the form of an ordinary differential equation,
$$\frac{dw}{dt} = \bar{G}(w),$$

where $\bar{G}$ is constructed as an average of G with respect to a specific probability measure, as explained in Section 2.

This paper first introduces the appropriate mathematical framework and then focuses on applying these multiscale methods to learning neural networks.

The individual elements of these networks are neurons or populations of neurons. A common assumption at the basis of mathematical neuroscience [7] is to model their behavior by a stochastic differential equation which is made of four different contributions: (i) an intrinsic dynamics term, (ii) a communication term, (iii) a term for the external input, and (iv) a stochastic term for the intrinsic variability. Assuming that their activity is represented by the fast variable $v \in \mathbb{R}^n$, the first equation of system (1) is a generic representation of a neural network (the function F corresponds to the first three terms contributing to the dynamics). In the literature, the level of non-linearity of the function F ranges from a linear (or almost-linear) system to spiking neuron dynamics [8], yet the structure of the system is universal.

These neurons are interconnected through a connectivity matrix which represents the strength of the synapses connecting the real neurons together. The slow modification of the connectivity between the neurons is commonly thought to be the essence of learning. Unsupervised learning rules update the connectivity exclusively based on the value of the activity variable. Therefore, this mechanism is represented by the slow equation above, where $w \in \mathbb{R}^{n \times n}$ is the connectivity matrix and G is the learning rule. Probably the most famous of these rules is the Hebbian learning rule introduced in [9]. It says that if both neurons A and B are active at the same time, then the synapses from A to B and from B to A should be strengthened proportionally to the product of the activity of A and B. There are many different variations of this correlation-based principle, which can be found in [10, 11]. Another recent, unsupervised, biologically motivated learning rule is spike-timing-dependent plasticity (STDP), reviewed in [12]. It is similar to Hebbian learning except that it focuses on causation instead of correlation and that it occurs on a faster time-scale. Both of these types of rules correspond to G being quadratic in v.

The previous literature about dynamic learning networks is extensive, yet we take a significantly different approach to the problem. A historical focus was the understanding of feedforward deterministic networks [13–15]. Another approach consisted in precomputing the connectivity of a recurrent network according to the principles underlying the Hebbian rule [16]. Actually, most current research in the field is focused on STDP and is based on the precise times of the spikes, making them explicit in computations [17–20]. Our approach is different from the others regarding at least one of the following points: (i) we consider recurrent networks, (ii) we study the evolution of the coupled system activity/connectivity, and (iii) we consider bounded dynamical systems for the activity without requiring them to be spiking. Besides, our approach is a rigorous mathematical analysis in a field where most results rely heavily on heuristic arguments and numerical simulations. To our knowledge, this is the first time such models expressed in a slow-fast SDE formalism are analyzed using temporal averaging principles.

The purpose of this application is to understand what the network learns from exposure to time-dependent inputs. In other words, we are interested in the evolution of the connectivity variable, which evolves on a slow time-scale, under the influence of the external input and of noise added on the fast variable. More precisely, we intend to compute explicitly the equilibrium connectivities of such systems. This final matrix corresponds to the knowledge the network has extracted from the inputs. Although the derivation of the results is mathematically involved, we have tried to extract widely understandable conclusions from our mathematical results, and we believe this paper brings novel elements to the debate about the role and mechanisms of learning in large-scale networks.

Although the averaging method is a generic principle, we have made significant assumptions to keep the analysis of the averaged system mathematically tractable. In particular, we will assume that the activity evolves according to a linear stochastic differential equation. This is not very realistic when modeling individual neurons, but it seems more reasonable when modeling populations of neurons; see Chapter 11 of [7].

The paper is organized as follows. Section 2 is devoted to introducing the temporal averaging theory. Theorem 2.2 is the main result of this section; it provides the technical tool to tackle learning neural networks. Section 3 applies the mathematical tools developed in the previous section to models of learning neural networks. A generic model is described, and three particular models of increasing complexity are analyzed: first Hebbian learning, then trace learning, and finally STDP learning, in each case for linear activities. Finally, Section 4 is a discussion of the consequences of the previous results from the viewpoint of their biological interpretation.

2 Averaging principles: theory

In this section, we present multiscale theoretical results concerning stochastic averaging of periodically forced SDEs (Section 2.3). These results combine ideas from singular perturbations, classical periodic averaging and stochastic averaging principles. Therefore, we briefly recall, in Sections 2.1 and 2.2, several basic features of these principles, providing examples that are closely related to the application developed in Section 3.

2.1 Periodic averaging principle

We present here an example of a slow-fast ordinary differential equation perturbed by a fast external periodic input. We have chosen this example since it readily illustrates many ideas that will be developed in the following sections. In particular, this example shows how the ratio between the time-scale separation of the system and the time-scale of the input appears as a new crucial parameter.

Example 2.1 Consider the following linear time-inhomogeneous dynamical system with $\epsilon_1, \epsilon_2 > 0$ two parameters:
$$\frac{dv^\epsilon}{dt} = \frac{1}{\epsilon_1}\left(-v^\epsilon + \sin\!\left(\frac{t}{\epsilon_2}\right)\right), \qquad \frac{dw^\epsilon}{dt} = -w^\epsilon + (v^\epsilon)^2.$$
This system is particularly handy since one can solve analytically the first ordinary differential equation, namely
$$v(t) = \frac{1}{1+\mu^2}\left(\sin\!\left(\frac{t}{\epsilon_2}\right) - \mu\cos\!\left(\frac{t}{\epsilon_2}\right)\right) + v_0 e^{-t/\epsilon_1},$$
where we have introduced the time-scales ratio
$$\mu := \frac{\epsilon_1}{\epsilon_2}.$$

In this system, one can distinguish various asymptotic regimes when $\epsilon_1$ and $\epsilon_2$ are small, according to the asymptotic value of μ:

  • Regime 1: Slow input, $\mu = 0$:

First, if $\epsilon_1 \to 0$ and $\epsilon_2$ is fixed, then $v(t)$ is close to $\sin(t/\epsilon_2)$, and from geometric singular perturbation theory [21, 22] one can approximate the slow variable $w^\epsilon$ by the solution of
$$\frac{dw}{dt} = -w + \sin^2\!\left(\frac{t}{\epsilon_2}\right).$$
Now taking the limit $\epsilon_2 \to 0$ and applying the classical averaging principle [4] for periodically driven differential equations, one can approximate $w^\epsilon$ by the solution of
$$\frac{dw}{dt} = -w + \frac{1}{2},$$
since $\frac{1}{2\pi}\int_0^{2\pi} \sin^2(s)\, ds = \frac{1}{2}$.

  • Regime 2: Fast input, $\mu = \infty$:

If $\epsilon_2 \to 0$ and $\epsilon_1$ is fixed, then the classical averaging principle implies that $v^\epsilon$ is close to the solution of
$$\frac{dv}{dt} = -\frac{v}{\epsilon_1},$$
so that $w^\epsilon$ can be approximated by
$$\frac{dw}{dt} = -w + \left(v_0 e^{-t/\epsilon_1}\right)^2,$$
and when $\epsilon_1 \to 0$, one does not recover the same asymptotic behavior as in Regime 1.

  • Regime 3: Time-scales matching, $0 < \mu < \infty$:

Now consider the intermediate case where $\epsilon_1$ is asymptotically proportional to $\epsilon_2$. In this case, $v^\epsilon$ can be approximated on the fast time-scale $t/\epsilon_1$ by the periodic solution $\bar{v}_\mu(t) = \frac{1}{1+\mu^2}(\sin(\mu t) - \mu\cos(\mu t))$ of $\frac{dv}{dt} = -v + \sin(\mu t)$. As a consequence, $w^\epsilon$ will be close to the solution of
$$\frac{dw}{dt} = -w + \frac{1}{2(1+\mu^2)},$$
since $\frac{1}{2\pi}\int_0^{2\pi} \bar{v}_\mu(t/\mu)^2\, dt = \frac{1}{2(1+\mu^2)}$.

Thus, we have seen in this example that
  1. the two limits $\epsilon_1 \to 0$ and $\epsilon_2 \to 0$ do not commute,
  2. the ratio μ between the internal time-scale separation $\epsilon_1$ and the input time-scale $\epsilon_2$ is a key parameter in the study of slow-fast systems subject to a time-dependent perturbation.
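These three regimes can be checked numerically. The following minimal sketch (ours, not part of the original article; the forward Euler scheme and all parameter values are arbitrary illustration choices) integrates the system for several values of μ and compares the late-time value of $w^\epsilon$ with the Regime 3 prediction $\frac{1}{2(1+\mu^2)}$:

```python
# Forward Euler simulation of Example 2.1 (illustrative sketch).
import numpy as np

def simulate(eps1, eps2, T=10.0, dt=1e-5, v0=0.0, w0=0.0):
    v, w = v0, w0
    for k in range(int(T / dt)):
        t = k * dt
        v += dt * (-v + np.sin(t / eps2)) / eps1   # fast variable
        w += dt * (-w + v**2)                      # slow variable
    return w

for mu in (0.5, 1.0, 2.0):
    eps2 = 1e-3            # input period is 2*pi*eps2
    eps1 = mu * eps2       # internal time-scale, so eps1/eps2 = mu
    print(f"mu={mu}: w(T)={simulate(eps1, eps2):.4f}, "
          f"predicted {1 / (2 * (1 + mu**2)):.4f}")
```

The sequential limits of Regimes 1 and 2 are recovered as the μ → 0 and μ → ∞ limits of the prediction (1/2 and 0, respectively).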

2.2 Stochastic averaging principle

Time-scales separation is a key property to investigate the dynamical behavior of non-linear multiscale systems, with techniques ranging from averaging principles to geometric singular perturbation theory. This property appears to be also crucial to understanding the impact of noise. Instead of carrying out a small-noise analysis, a multiscale approach based on the stochastic averaging principle [2] can be a powerful tool to unravel subtle interplays between noise properties and non-linearities. More precisely, consider a system of SDEs in $\mathbb{R}^{p+q}$:
$$\begin{cases} dv_t^\epsilon = \dfrac{1}{\epsilon} F(v_t^\epsilon, w_t^\epsilon)\, dt + \dfrac{1}{\sqrt{\epsilon}}\, \Sigma(v_t^\epsilon, w_t^\epsilon)\, dB(t),\\[4pt] dw_t^\epsilon = G(v_t^\epsilon, w_t^\epsilon)\, dt, \end{cases}$$

with initial conditions $v^\epsilon(0) = v_0$, $w^\epsilon(0) = w_0$, and where $w^\epsilon \in \mathbb{R}^q$ is called the slow variable, $v^\epsilon \in \mathbb{R}^p$ is the fast variable, F, G, Σ are smooth functions ensuring existence and uniqueness of the solution $(v^\epsilon, w^\epsilon)$, and $B(t)$ is a p-dimensional standard Brownian motion defined on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Time-scale separation is encoded in the small parameter ϵ, which denotes in this section a single positive real number.

In order to approximate the behavior of $(v^\epsilon, w^\epsilon)$ for small ϵ, the idea is to average out the equation for the slow variable with respect to the stationary distribution of the fast one. More precisely, one first assumes that for each fixed $w \in \mathbb{R}^q$, the frozen fast SDE,
$$dv_t = F(v_t, w)\, dt + \Sigma(v_t, w)\, dB(t),$$
admits a unique invariant measure, denoted $\rho_w(dv)$. Then, one defines the averaged drift vector field $\bar{G}$,
$$\bar{G}(w) := \int_{\mathbb{R}^p} G(v, w)\, \rho_w(dv), \tag{2}$$

and w the solution of $\frac{dw}{dt} = \bar{G}(w)$ with the initial condition $w(0) = w_0$. Under some dissipativity assumptions, the stochastic averaging principle [2] states:

Theorem 2.1 For any $\delta > 0$ and $T > 0$,
$$\lim_{\epsilon \to 0} \mathbb{P}\left[\sup_{t \in [0,T]} \|w_t^\epsilon - w_t\|^2 > \delta\right] = 0. \tag{3}$$

As a consequence, analyzing the behavior of the deterministic solution w can help to understand useful features of the stochastic process $(v^\epsilon, w^\epsilon)$.

Example 2.2 In this example, we consider a system similar to that of Example 2.1, but with a noise term instead of the periodic perturbation. Namely, we consider $(v^\epsilon, w^\epsilon)$ the solution of the system of SDEs,
$$dv^\epsilon = -\frac{1}{\epsilon} v^\epsilon\, dt + \frac{\sigma}{\sqrt{\epsilon}}\, dB(t), \qquad dw^\epsilon = \left(-w^\epsilon + (v^\epsilon)^2\right) dt,$$
with $\epsilon > 0$ a small parameter and $\sigma > 0$ a positive constant. From Theorem 2.1, the stochastic slow variable $w^\epsilon$ can be approximated in the sense of (3) by the deterministic solution w of
$$\frac{dw}{dt} = \int_{v \in \mathbb{R}} \left(-w + v^2\right) \rho(dv),$$
where $\rho(dv)$ is the stationary measure of the linear diffusion process,
$$dv = -v\, dt + \sigma\, dB(t),$$
that is,
$$\rho(dv) = \frac{1}{\sigma\sqrt{\pi}}\, e^{-v^2/\sigma^2}\, dv.$$
Consequently, $w^\epsilon$ can be approximated in the limit $\epsilon \to 0$ by the solution of
$$\frac{dw}{dt} = -w + \frac{\sigma^2}{2}.$$
Applying (3) leads to the following result: for any $T > 0$ and $\delta > 0$,
$$\lim_{\epsilon \to 0} \mathbb{P}\left[\sup_{t \in [0,T]} \left|w_t^\epsilon - \left(w_0 - \frac{\sigma^2}{2}\right)e^{-t} - \frac{\sigma^2}{2}\right|^2 > \delta\right] = 0.$$

Interestingly, the asymptotic behavior of $w^\epsilon$ for small ϵ is characterized by a deterministic trajectory that depends on the strength σ of the noise applied to the system. Thus, the stochastic averaging principle appears particularly interesting when unraveling the impact of noise strength on slow-fast systems.
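This convergence is easy to observe numerically. Below is a minimal Euler-Maruyama sketch (ours, with arbitrary parameter values; not code from the original article) comparing $w^\epsilon$ at time T with the averaged trajectory $(w_0 - \sigma^2/2)e^{-t} + \sigma^2/2$:

```python
# Euler-Maruyama simulation of Example 2.2 (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
eps, sigma, dt, T = 1e-3, 1.0, 1e-5, 5.0
v, w = 0.0, 0.0
for _ in range(int(T / dt)):
    v += -v / eps * dt + sigma / np.sqrt(eps) * np.sqrt(dt) * rng.standard_normal()
    w += (-w + v**2) * dt

w_avg = (0.0 - sigma**2 / 2) * np.exp(-T) + sigma**2 / 2   # averaged solution at t = T
print(f"w^eps(T) = {w:.4f}, averaged prediction = {w_avg:.4f}")   # both close to 0.5
```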

Many other results have been developed since, extending the set-up to the case where the slow variable has a diffusion component or to infinite-dimensional settings for instance, and also refining the convergence study, providing homogenization results concerning the limit of $\epsilon^{-1/2}(w^\epsilon - w)$ or establishing large deviation principles (see [23] for a recent monograph). However, fewer results are available in the case of non-homogeneous SDEs, that is, when the system is perturbed by an external time-dependent signal. This setting is of particular interest in the framework of stochastic learning models, and we present the main relevant mathematical results in the following section.

2.3 Double averaging principle

Combining the ideas of periodic and stochastic averaging introduced previously, we present here theoretical results concerning multiscale SDEs driven by an external time-periodic input. Consider $(v^\epsilon, w^\epsilon)$ the solution of
$$\begin{cases} dv^\epsilon = \dfrac{1}{\epsilon_1} F\!\left(v^\epsilon, w^\epsilon, \tfrac{t}{\epsilon_2}\right) dt + \dfrac{1}{\sqrt{\epsilon_1}}\, \Sigma(v^\epsilon, w^\epsilon)\, dB(t),\\[4pt] dw^\epsilon = G(v^\epsilon, w^\epsilon)\, dt, \end{cases} \tag{4}$$

with $t \mapsto F(v, w, t) \in \mathbb{R}^p$ a τ-periodic function and $\epsilon = (\epsilon_1, \epsilon_2) \in \mathbb{R}_+^2$. The parameter $\epsilon_1$ represents the internal time-scale separation and $\epsilon_2$ the input time-scale. We consider the case where both $\epsilon_1$ and $\epsilon_2$ are small, that is, a strong time-scale separation between the fast variable $v^\epsilon \in \mathbb{R}^p$ and the slow one $w^\epsilon \in \mathbb{R}^q$, and a fast periodic modulation of the fast drift $F(v, w, \cdot)$.

We further denote $z = (v, w)$.

Definition 2.1 We define the asymptotic time-scale ratio
$$\mu := \lim_{|\epsilon| \to 0} \frac{\epsilon_1}{\epsilon_2}. \tag{5}$$

Accordingly, we denote $\lim^{\mu}_{|\epsilon| \to 0}$ the distinguished limit when $\epsilon_1 \to 0$, $\epsilon_2 \to 0$ with $\epsilon_1/\epsilon_2 \to \mu$.

The following assumption is made to ensure existence and uniqueness of a strong solution to system (4). In the following, $\langle z_1, z_2 \rangle$ will denote the usual scalar product for vectors.

Assumption 2.1 (Existence and uniqueness of a strong solution)
  (i) The functions F, G, and Σ are locally Lipschitz continuous in the space variable z. More precisely, for any $R > 0$, there exists a constant $\alpha_R$ such that
$$\|F(z) - F(z')\| \le \alpha_R \|z - z'\| \quad \text{for any } z, z' \in \mathbb{R}^{p+q} \text{ with } \|z\| \le R \text{ and } \|z'\| \le R.$$
  (ii) There exists a constant $R > 0$ such that
$$\sup_{\|z\| > R,\, t > 0} \frac{\left\langle \left(F(z, t), G(z)\right), z \right\rangle}{\|z\|^2} < 0.$$

To control the asymptotic behavior of the fast variable, one further assumes the following.

Assumption 2.2 (Asymptotic behavior of the fast process)
  (i) The diffusion matrix Σ is bounded,
$$\exists M_\Sigma > 0 \ \text{s.t.}\ \forall z,\quad \|\Sigma(z)\| < M_\Sigma,$$
and uniformly non-degenerate,
$$\exists \eta_0 > 0 \ \text{s.t.}\ \forall v, z,\quad \left\langle \Sigma(z)\Sigma(z)^* v, v \right\rangle \ge \eta_0 \|v\|^2.$$
  (ii) There exists $r_0 < 0$ such that for all $t \ge 0$ and for all $z, x \in \mathbb{R}^{p+q}$,
$$\left\langle \nabla_z F(z, t)\, x, x \right\rangle \le r_0 \|x\|^2.$$

According to the value of $\mu \in [0, \infty]$, the stochastic averaging principle is based on a description of the asymptotic behavior of various rescaled fast frozen processes. More precisely, under Assumptions 2.1 and 2.2, one can deduce that:

  • For any fixed $w_0 \in \mathbb{R}^q$ and $t_0 > 0$, the law of the rescaled time-homogeneous frozen process,
$$dv = F(v, w_0, t_0)\, dt + \Sigma(v, w_0)\, dB(t),$$
converges exponentially fast to a unique invariant probability measure denoted by $\rho_{w_0, t_0}(dv)$.

  • For any fixed $w_0 \in \mathbb{R}^q$, there exists a $\frac{\tau}{\mu}$-periodic evolution system of measures $\nu_\mu^{w_0}(t, dv)$, different from $\rho_{w_0, t}(dv)$ above, such that the law of the rescaled time-inhomogeneous frozen process,
$$dv = F(v, w_0, \mu t)\, dt + \Sigma(v, w_0)\, dB(t), \tag{6}$$
converges exponentially fast towards $\nu_\mu^{w_0}(t, \cdot)$, uniformly with respect to $w_0$ (cf. the Appendix, Theorem A.1).

  • For any fixed $w_0 \in \mathbb{R}^q$, the law of the rescaled time-homogeneous frozen process,
$$dv = \bar{F}(v, w_0)\, dt + \Sigma(v, w_0)\, dB(t),$$
where $\bar{F}(v, w_0) := \tau^{-1} \int_0^\tau F(v, w_0, t)\, dt$, converges exponentially fast towards a unique invariant probability measure denoted by $\bar{\rho}_{w_0}(dv)$.

According to the value of μ, we introduce a vector field $\bar{G}_\mu$ which will play a role similar to that of $\bar{G}$ introduced in equation (2).

Definition 2.2 We define $\bar{G}_\mu : \mathbb{R}^q \to \mathbb{R}^q$ as follows. In the time-scale matching case, that is, when $0 < \mu < \infty$,
$$\bar{G}_\mu(w) := \left(\frac{\tau}{\mu}\right)^{-1} \int_0^{\tau/\mu} \int_{v \in \mathbb{R}^p} G(v, w)\, \nu_\mu^w(t, dv)\, dt. \tag{7}$$

Notation We may denote the periodic system of measures $\nu_\mu^w(t, dv)$ associated with (6) by $\nu_\mu^w[F, \Sigma](t, dv)$ to emphasize its relationship with F and Σ. Accordingly, we may denote $\bar{G}_\mu(w)$ by $\bar{G}_\mu[F, \Sigma](w)$.

We are now able to present our main mathematical result. Extending Theorem 2.1, the following theorem describes the asymptotic behavior of the slow variable $w^\epsilon$ when $\epsilon \to 0$ with $\epsilon_1/\epsilon_2 \to \mu$. We refer to [6] for more details about the full mathematical proof of this result.

Theorem 2.2 Let $\mu \in (0, \infty)$. If w is the solution of
$$\frac{dw}{dt} = \bar{G}_\mu(w) \quad \text{with } w(0) = w^\epsilon(0), \tag{8}$$
then the following convergence result holds for all $T > 0$ and $\delta > 0$:
$$\lim^{\mu}_{|\epsilon| \to 0} \mathbb{P}\left[\sup_{t \in [0,T]} |w_t^\epsilon - w_t|^2 > \delta\right] = 0.$$
Remark 2.1

1. The extremal cases $\mu = 0$ and $\mu = \infty$ are not covered in full rigor by Theorem 2.2. However, the study of the sequential limits, $\epsilon_1 \to 0$ followed by $\epsilon_2 \to 0$, or $\epsilon_2 \to 0$ followed by $\epsilon_1 \to 0$, can be deduced from an appropriate combination of classical periodic and stochastic averaging theorems:

  • Slow input: If the limit $\epsilon_1 \to 0$ is taken first, then from Theorem 2.1, with fast variable $v^\epsilon$ and slow variables $w^\epsilon$ and t (with the trivial equation $\dot{t} = 1$), $w^\epsilon$ is close in probability on finite time-intervals to the solution of the following inhomogeneous ordinary differential equation:
$$\frac{d\tilde{w}}{dt} = \int_{v \in \mathbb{R}^p} G(v, \tilde{w})\, \rho_{\tilde{w}, t/\epsilon_2}(dv) =: \tilde{G}(\tilde{w}, t/\epsilon_2).$$
Then taking the limit $\epsilon_2 \to 0$, one can apply the deterministic averaging principle to the fast periodic vector field $\tilde{G}(w, t/\epsilon_2)$, so that $\tilde{w}$ converges when $\epsilon_2 \to 0$ to the solution of
$$\frac{dw}{dt} = \tau^{-1} \int_0^\tau \tilde{G}(w, t)\, dt = \bar{G}_0(w),$$
where
$$\bar{G}_0(w) := \tau^{-1} \int_0^\tau \int_{v \in \mathbb{R}^p} G(v, w)\, \rho_{w, t}(dv)\, dt.$$
  • Fast input: If the limit $\epsilon_2 \to 0$ is taken first, one first has to perform a classical averaging of the periodic drift $F(v, w, t/\epsilon_2)$, leading to the homogeneous system of SDEs (4), but with $\bar{F}(v, w)$ instead of $F(v, w, t/\epsilon_2)$. Then, an application of Theorem 2.1 to this system gives the averaged vector field
$$\bar{G}_\infty(w) := \int_{v \in \mathbb{R}^p} G(v, w)\, \bar{\rho}_w(dv).$$
2. To study the extremal cases $\mu = 0$ and $\mu = \infty$ in full generality, one would need to consider all the possible relationships between $\epsilon_1$ and $\epsilon_2$, not only the linear one as in the present article, but also, for example, those of the type $\epsilon_1 = \epsilon_2^\alpha$. In this case, we believe that the regime $\alpha < 1$ converges to the same limit as taking the limit $\epsilon_2 \to 0$ first, and that the regime $\alpha > 1$ corresponds to taking the limit $\epsilon_1 \to 0$ first. The intermediate regime $\alpha = 1$ seems to be the only one for which the limit cannot be obtained by combining classical averaging principles. Therefore, the present article is focused on this case, in which the averaged system depends explicitly on the scaling parameter μ. Moreover, in terms of applications, this parameter has a relatively easy interpretation as the ratio of time-scales between intrinsic neuronal activity and typical stimulus time-scales in a given situation. Although the zeroth order limit (i.e., the averaged system) seems to depend only on the position of α with respect to 1, it seems reasonable to expect that the fluctuations around the limit depend on the precise value of α. This is a difficult question which may deserve further analysis.

The case $0 < \mu < \infty$ is already very rich in the sense that it combines simultaneously the periodic and stochastic averaging principles in a new way that cannot be recovered by sequential applications of those principles. A particular role is played by the frozen periodically forced SDE (6). The equivalent of the quasi-stationary measure $\rho_w$ of Theorem 2.1 is given by the asymptotically periodic behavior of equation (6), represented by the periodic family of measures $\nu_\mu^w(t, dv)$.
3. By a rescaling of the frozen process (6), one deduces the following scaling relationships:
$$\nu_\mu^w[F, \Sigma](t, dv) = \nu_1^w\!\left[\tfrac{F}{\mu}, \tfrac{\Sigma}{\sqrt{\mu}}\right](\mu t, dv)$$
and
$$\bar{G}_\mu[F, \Sigma](w) = \bar{G}_1\!\left[\tfrac{F}{\mu}, \tfrac{\Sigma}{\sqrt{\mu}}\right](w).$$
Therefore, if one knows, in the case $\mu = 1$, the averaged vector field associated with the fast process generated by a drift F and a diffusion coefficient Σ, denoted $\bar{G}_1[F, \Sigma]$, it is possible to deduce $\bar{G}_\mu$ in the general case $\mu \in (0, \infty)$ through the changes $F \to F/\mu$ and $\Sigma \to \Sigma/\sqrt{\mu}$ (a worked check on a scalar example is given after this remark).
4. It seems reasonable to expect that the above result remains valid when considering an ergodic, but not necessarily periodic, time dependence of the function $F(v, w, \cdot)$. In equation (7), instead of integrating $\nu_\mu^w(t, dv)$ over one period, one should integrate it with respect to an ergodic stationary measure. However, this extension requires non-trivial technical improvements of [5] which are beyond the scope of this paper.
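As a sanity check of the scaling relationships in point 3 (this verification is ours, not part of the original argument), consider the scalar linear case with drift $F(v, w, t) = -v + \sin(t)$, constant diffusion $\Sigma = \sigma$ and period $\tau = 2\pi$. The rescaled process with drift $F/\mu$ and diffusion $\Sigma/\sqrt{\mu}$,
$$dv = \frac{1}{\mu}\left(-v + \sin t\right) dt + \frac{\sigma}{\sqrt{\mu}}\, dB(t),$$
has the $2\pi$-periodic attractor $\bar{v}(t) = \frac{\sin t - \mu \cos t}{1+\mu^2}$ and the stationary variance $\frac{\sigma^2/\mu}{2/\mu} = \frac{\sigma^2}{2}$. These are exactly the attractor and variance of the frozen process $dv = (-v + \sin(\mu t))\, dt + \sigma\, dB(t)$ after the time change $t \mapsto \mu t$, in agreement with $\nu_\mu^w[F, \Sigma](t, dv) = \nu_1^w[F/\mu, \Sigma/\sqrt{\mu}](\mu t, dv)$.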

2.3.1 Case of a fast linear SDE with periodic input

We present here an elementary case where one can compute explicitly the quasi-stationary time-periodic family of measures $\nu_\mu^w(t, dv)$, namely when the equation for the fast variable is linear. We consider $v \in \mathbb{R}^p$ the solution of
$$dv(t) = \left(-A v(t) + u(\mu t)\right) dt + \Sigma\, dB(t),$$

with initial condition $v(0) = v_0 \in \mathbb{R}^p$, where $A \in \mathbb{R}^{p \times p}$ is a matrix whose eigenvalues have positive real parts and $u(\cdot)$ is a τ-periodic function.

We are interested in the large-time behavior of the law of $v(t)$, which is a time-inhomogeneous Ornstein-Uhlenbeck process. From [5] we know that its law converges to a τ-periodic family of probability measures $\nu(t, dv)$. Due to the linearity of the previous equation, $\nu(t, dv)$ is Gaussian with a time-dependent mean and a constant covariance matrix,
$$\nu(t, dv) = \mathcal{N}_{\bar{v}(t), Q}(dv),$$
where $\bar{v}$ is the $\frac{\tau}{\mu}$-periodic attractor of $\frac{d\bar{v}}{dt} = -A\bar{v}(t) + u(\mu t)$, i.e.,
$$\bar{v}(t) = \int_{-\infty}^t e^{-A(t-s)} u(\mu s)\, ds,$$
and Q is the unique solution of the Lyapunov equation
$$AQ + QA^* = \Sigma\Sigma^*. \tag{9}$$
Indeed, if one denotes $c(t) = v(t) - \bar{v}(t)$, then $c(t)$ is a solution of the classical homogeneous Ornstein-Uhlenbeck equation
$$dc(t) = -Ac(t)\, dt + \Sigma\, dB(t),$$
whose stationary distribution is known to be a centered Gaussian measure with covariance matrix Q, the solution of (9); see Chapter 3.2 of [24]. Notice that if A is self-adjoint with respect to $(\Sigma\Sigma^*)^{-1}$ (i.e., $A(\Sigma\Sigma^*) = (\Sigma\Sigma^*)A^*$), then the solution is $Q = \frac{1}{2}A^{-1}(\Sigma\Sigma^*) = \frac{1}{2}(\Sigma\Sigma^*)A^{*-1}$, a fact which will be used in Appendix B.2.
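For concreteness, Q can be obtained numerically with a standard Lyapunov solver; the following sketch (ours, with an arbitrary choice of A and Σ) also checks the result against a long Euler-Maruyama run:

```python
# Solving the Lyapunov equation A Q + Q A* = Sigma Sigma* for the stationary
# covariance of dc = -A c dt + Sigma dB (illustrative sketch).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[2.0, 1.0], [0.0, 3.0]])      # eigenvalues 2 and 3, positive real parts
Sigma = np.array([[0.5, 0.0], [0.2, 0.4]])

Q = solve_continuous_lyapunov(A, Sigma @ Sigma.T)   # solves A Q + Q A^T = Sigma Sigma^T
print("Q =\n", Q)

# Monte Carlo check of the stationary covariance
rng = np.random.default_rng(1)
dt, n = 1e-3, 500_000
c = np.zeros(2)
samples = np.empty((n, 2))
for k in range(n):
    c += -A @ c * dt + Sigma @ (np.sqrt(dt) * rng.standard_normal(2))
    samples[k] = c
print("empirical cov =\n", np.cov(samples[n // 10:].T))   # discard the transient
```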

Hence, in the linear case, the averaged vector field of equation (7) becomes
$$\bar{G}_\mu(w) := \left(\frac{\tau}{\mu}\right)^{-1} \int_0^{\tau/\mu} \int_{v \in \mathbb{R}^p} G(\bar{v}(t) + v, w)\, \mathcal{N}_{0, Q}(dv)\, dt, \tag{10}$$
where $\mathcal{N}_{x, Q}$ is the Gaussian law with mean $x \in \mathbb{R}^p$ and covariance matrix $Q \in \mathbb{R}^{p \times p}$.

Therefore, due to the linearity of the fast SDE, the periodic system of measures ν is just a constant Gaussian distribution shifted by a periodic function of time $\bar{v}(t)$. In case G is quadratic in v, this remark implies that one can perform independently the integral over time and over $\mathbb{R}^p$ in formula (10) (noting that the crossed term has a zero average). In this case, contributions from the periodic input and from noise appear in the averaged vector field in an additive way.

Example 2.3 In this last example, we consider a combination of Example 2.1 and Example 2.2, namely the following system of periodically forced SDEs:
$$\begin{cases} dv^{\epsilon} = \frac{1}{\epsilon_1}\left(-v^{\epsilon} + \sin\left(\frac{t}{\epsilon_2}\right)\right)dt + \frac{\sigma}{\sqrt{\epsilon_1}}\, dB(t), \\ dw^{\epsilon} = \left(-w^{\epsilon} + (v^{\epsilon})^2\right)dt. \end{cases}$$

As in Example 2.1 and as shown above, the behavior of this system when both $\epsilon_1$ and $\epsilon_2$ are small depends on the parameter μ defined in (5). More precisely, we have the following three regimes (compared numerically in the sketch after this list):

  • Regime 1: slow input:
    $$\bar{G}_0(w) = -w + \frac{\sigma^2}{2} + \frac{1}{2}.$$
  • Regime 2: fast input:
    $$\bar{G}_{\infty}(w) = -w + \frac{\sigma^2}{2}.$$
  • Regime 3: time-scale matching:
    $$\bar{G}_{\mu}(w) = -w + \frac{\sigma^2}{2} + \frac{1}{2(1 + \mu^2)}.$$
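As a sanity check of the regimes above, the following sketch (ours, with arbitrary parameter values) simulates the slow-fast SDE of Example 2.3 by Euler-Maruyama in the time-scale matching regime and compares the long-run value of $w^{\epsilon}$ with the equilibrium of $\bar{G}_{\mu}$.

```python
import numpy as np

# A minimal sketch (ours, not from the paper): Euler-Maruyama simulation of
# Example 2.3 in the time-scale matching regime, compared with the equilibrium
# of the averaged field G_mu(w) = -w + sigma^2/2 + 1/(2(1+mu^2)).
rng = np.random.default_rng(0)
sigma, eps1, mu = 0.5, 1e-3, 1.0
eps2 = eps1 / mu                       # mu is the ratio eps1/eps2 of time-scales
dt, T = eps1 / 50, 5.0                 # dt must resolve the fast dynamics
v, w = 0.0, 0.0
for k in range(int(T / dt)):
    dB = rng.normal(0.0, np.sqrt(dt))
    v += (-v + np.sin(k * dt / eps2)) / eps1 * dt + sigma / np.sqrt(eps1) * dB
    w += (-w + v**2) * dt
print("simulated w(T):      ", w)
print("averaged equilibrium:", sigma**2 / 2 + 1 / (2 * (1 + mu**2)))
```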

2.4 Truncation and asymptotic well-posedness

In some cases, Assumptions 2.1-2.2 may not be satisfied on the entire phase space $\mathbb{R}^{p} \times \mathbb{R}^{q}$, but only on a subset. Such situations will appear in Section 3 when considering learning models. We introduce here a more refined set of assumptions ensuring that Theorem 2.2 still applies.

Let us start with an example, namely the following bi-dimensional system with white noise input:
$$\begin{cases} dv^{\epsilon} = \frac{1}{\epsilon}\left(-l v^{\epsilon} + w^{\epsilon} v^{\epsilon}\right)dt + \frac{\sigma}{\sqrt{\epsilon}}\, dB(t), \\ dw^{\epsilon} = \left(-\kappa w^{\epsilon} + (v^{\epsilon})^2\right)dt, \end{cases} \tag{11}$$

with $\epsilon > 0$, $\sigma > 0$, $l > 0$, $\kappa > 0$.

For the fast drift $(-l + w)v$ to be non-explosive, it is necessary to have $w < l - \alpha$ with $\alpha > 0$ for all time. The concern with this system comes from the fact that the slow variable w may reach l due to the fluctuations captured in the term $v^2$, for instance if κ is not large enough. Such a system may have exponentially growing trajectories. However, we claim that for small enough ε, $w^{\epsilon}$ will remain close to its averaged limit w for a very long time, and if this limit remains below $l - \alpha$, then $w^{\epsilon}$ can be considered as well-posed in the asymptotic limit $\epsilon \to 0$. To make this argument rigorous, we suggest the following definition.

Definition 2.3 A stochastic differential equation with a given initial condition is asymptotically well posed in probability if, for the given initial condition,
  1. a unique solution exists until a random time $\tau^{\epsilon}$;
  2. for all $T > 0$,
    $$\lim_{\epsilon \to 0} \mathbb{P}\left[\tau^{\epsilon} \geq T\right] = 1.$$

We give in the following property sufficient conditions for system (4) to be asymptotically well posed in probability and to satisfy the conclusions of Theorem 2.2.

Let us introduce the following set of additional assumptions.

Assumption 2.3 Moment conditions:
  (i) There exists $p > 2$ such that
    $$\text{for any } T > 0, \quad \sup_{\epsilon}\, \mathbb{E}\left[\sup_{0 \leq t \leq T} \|v_t^{\epsilon}\|^p + \|w_t^{\epsilon}\|^p\right] < \infty.$$
  (ii) For any $T > 0$ and any bounded subset K of $\mathbb{R}^{q}$,
    $$\sup_{\epsilon_1 > 0,\ \epsilon_2 > 0,\ w \in K}\, \mathbb{E}\left[\sup_{0 \leq t \leq T} \left\|G(v_t^{\epsilon}, w)\right\|^2\right] < \infty.$$

Remark 2.2 This last set of assumptions is satisfied in all the applications of Section 3: there we consider linear models with additive noise for the equation of v, so that this variable is Gaussian, and the function G only involves quadratic moments of v; therefore, the moment conditions (i) and (ii) hold without difficulty. Moreover, if one considers non-linear models for the variable v, the Gaussian property may be lost; however, adding a sigmoidal non-linearity has, in general, the effect of bounding the dynamics, which makes these moment assumptions reasonable to check in most models of interest.

Property 2.3 If there exists a subset E of $\mathbb{R}^{q}$ such that
  1. the functions F, G, Σ satisfy Assumptions 2.1-2.3 restricted to $\mathbb{R}^{p} \times E$;
  2. E is invariant under the flow of $\bar{G}_{\mu}$, as defined in (7);
then, for any initial condition $w_0 \in E$, system (4) is asymptotically well posed in probability and $w^{\epsilon}$ satisfies the conclusion of Theorem 2.2.

Proof See Appendix A.2. □

Here, we show that it applies to system (11). First, with $E_{\alpha} = \{w \in \mathbb{R} : w < l - \alpha\}$ for some $\alpha \in\, ]0, l[$, it is possible to show that Assumptions 2.1-2.2 are satisfied on $\mathbb{R} \times E_{\alpha}$. Then, as a special case of (10), we obtain the following averaged system:
$$\frac{dw}{dt} = -\kappa w + \frac{\sigma^2}{2(l - w)} =: \bar{G}(w).$$
It remains to check that the solution of this system satisfies
$$\exists \alpha > 0 \text{ such that } w(0) < l - \alpha \implies \forall t > 0,\ w(t) < l - \alpha,$$

that is, that the subset $E_{\alpha}$ is invariant under the flow of $\bar{G}$.

This property is satisfied as soon as
$$\eta := \frac{2\sigma^2}{\kappa l^2} < 1.$$
Indeed, one can show that $\bar{G}(w) = 0$ admits two solutions iff $\eta < 1$,
$$w_{\pm} = \frac{l}{2}\left(1 \pm \sqrt{1 - \eta}\right) \in (0, l),$$

and that $w_-$ is stable whereas $w_+$ is unstable. Thus, if $w(0) < l - \alpha$ with $\alpha = l - w_+ > 0$, then $w(t) < l - \alpha$ for all $t > 0$. In fact, the invariance property holds for all $\alpha \in\, ]l - w_+,\ l - w_-[$.
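The following sketch (ours; the parameter values are arbitrary) computes the two equilibria $w_{\pm}$ of $\bar{G}$ and checks their stability numerically.

```python
import numpy as np

# A sketch (ours): fixed points of the averaged field
# G(w) = -kappa*w + sigma^2 / (2*(l - w)) for system (11), and the invariance
# threshold eta = 2*sigma^2 / (kappa*l^2) < 1.
kappa, l, sigma = 1.0, 1.0, 0.5
eta = 2 * sigma**2 / (kappa * l**2)
assert eta < 1, "no equilibria: trajectories may run away"
w_minus = l / 2 * (1 - np.sqrt(1 - eta))   # stable
w_plus = l / 2 * (1 + np.sqrt(1 - eta))    # unstable

G = lambda w: -kappa * w + sigma**2 / (2 * (l - w))
dG = lambda w, h=1e-6: (G(w + h) - G(w - h)) / (2 * h)   # numerical derivative
print("w- =", w_minus, "stable:", dG(w_minus) < 0)
print("w+ =", w_plus, "stable:", dG(w_plus) < 0)
print("G(w-), G(w+) ~ 0:", G(w_minus), G(w_plus))
```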

3 Averaging learning neural networks

In this section, we apply the temporal averaging methods derived in Section 2 to models of unsupervised learning neural networks. First, we design a generic learning model and show that one can formally define an averaged system via equation (7). However, going beyond the mere definition of the averaged system seems very difficult, and we only manage to get explicit results for simple systems where the fast activity dynamics is linear. In the three last subsections, we carry out the analysis for three examples of increasing complexity.

In the following, we always consider that the initial connectivity is 0. This choice is arbitrary but has no consequences, because we focus on the regime where there is a single globally stable equilibrium point (see Section 3.2.3).

3.1 A generic learning neural network

We now introduce a large class of stochastic neuronal networks with learning. They are defined as coupled systems describing the simultaneous evolution of the activity of $n \in \mathbb{N}$ neurons and of the connectivity between them. We define $v \in \mathbb{R}^{n}$, the activity field of the network, and $W \in \mathbb{R}^{n \times n}$, the connectivity matrix.

Each neuron variable $v_i$ is assumed to follow the SDE
$$dv_i = \left(f_i(v_i) + u_i\right)dt + \Sigma\, dB_i(t),$$

where the function $f_i$ characterizes the intrinsic non-linear dynamical behavior of neuron i and $u_i$ is the input received by neuron i. The stochastic term $\Sigma\, dB_i(t)$ is added to account for internal sources of noise. In terms of notation, $(B(t))_{t \geq 0}$ is a standard n-dimensional Brownian motion, Σ is an $n \times n$ matrix, possibly a function of v or other variables, and $\Sigma\, dB_i(t)$ denotes the i-th component of the vector $\Sigma\, dB(t)$.

The input $u_i$ to neuron i has two main components: the external input $u_i^{\text{ext}}$ and the input coming from the other neurons in the network, $u_i^{\text{syn}}$. The latter is a priori a complex combination of post-synaptic potentials coming from many other neurons. The coefficient $W_{ij}$ of the connectivity matrix accounts for the strength of the synapse $j \to i$. Note that neurons can be connected to themselves, i.e., $W_{ii}$ is not necessarily null. Thus, we can write
$$u_i^{\text{syn}} := S\left(\sum_{j=1}^{n} W_{ij}\, H(v_i, v_j)\right),$$

where $S : \mathbb{R} \to \mathbb{R}$ and H is a functional taking the history of $v_i$ and $v_j$ and returning a real number for each time t (to take convolutions into account). In practical cases, S is often taken to be a sigmoidal function. We abusively redefine S and H as vector-valued operators corresponding to the element-wise application of their real counterparts. We also define the function $F : \mathbb{R}^{n} \to \mathbb{R}^{n}$ such that $F(v)_i = f_i(v_i)$. Together with a slow generic learning rule, this leads to defining a stochastic learning model as the following system of SDEs.

Definition 3.1
$$\begin{cases} dv^{\epsilon} = \frac{1}{\epsilon}\left[F(v^{\epsilon}) + S\left(W^{\epsilon} H(v^{\epsilon})\right) + u^{\text{ext}}(t)\right]dt + \frac{1}{\sqrt{\epsilon}}\, \Sigma(v^{\epsilon}, W^{\epsilon})\, dB(t), \\ dW^{\epsilon} = G(W^{\epsilon}, v^{\epsilon})\, dt. \end{cases}$$

Before applying the general theory of Section 2, let us make several comments on this generic model of a neural network with learning. This model is a non-autonomous, stochastic, non-linear slow-fast system.

In order to apply Theorem 2.2, one needs Assumptions 2.1, 2.2, and 2.3 to be satisfied, which restricts the space of possible functions S, H, F, Σ, and G. In particular, Assumption 2.2(ii) seems rather restrictive since it excludes systems with multiple equilibria and suggests that the general theory is better suited to rate-based networks. However, one should keep in mind that these assumptions are only sufficient, and that the double averaging principle may work as well in systems which do not readily satisfy them.

As we will show in Section 3.3, a particular form of history-dependence can be taken into account, to a certain extent. For instance, if the function H is actually a functional of the past trajectory of the variable $v^{\epsilon}$ which can be expressed as the solution of an additional SDE, then it may be possible to include a certain form of history-dependence. However, purely time-delayed systems do not fall within the scope of this theory, although it might be possible to derive an analogous averaging method in that framework.

The noise term can be purely additive or given by a particular function $\Sigma(v, W)$, as long as it satisfies Assumption 2.2(i), meaning that it must be uniformly non-degenerate.

In the following subsections, we apply the averaging theory to various combinations of neuronal network models, embodied by choices of the functions S, H, F, Σ, and of various learning rules, embodied by a choice of the function G. We also analyze the obtained averaged systems, describing the slow dynamics of the connectivity matrix in the limit of perfect time-scale separation, and, in particular, study the convergence of these averaged systems to an equilibrium point.

3.2 Symmetric Hebbian learning

One of the simplest, yet non-trivial, stochastic learning models is obtained when considering

  • A linear model for neuronal activity, namely $f_i(v_i) = -l v_i$ with l a positive constant.

  • A linear model for synaptic transmission, namely $S(v_i) = v_i$ and $H(v_i, v_j) = v_j$.

  • A constant diffusion matrix Σ (additive noise) proportional to the identity, $\Sigma = \sigma\, \mathrm{Id}$ (spatially uncorrelated noise).

  • A Hebbian learning rule with linear decay, namely $G_{ij}(W, v) = -\kappa W_{ij} + v_i v_j$. This corresponds to the tensor product $\{v \otimes v\}_{ij} = v_i v_j$.

This model can be written as follows:
$$\begin{cases} dv^{\epsilon} = \frac{1}{\epsilon_1}\left(-L v^{\epsilon} + W^{\epsilon} v^{\epsilon} + u\left(\frac{t}{\epsilon_2}\right)\right)dt + \frac{\sigma}{\sqrt{\epsilon_1}}\, dB(t), \\ \frac{dW^{\epsilon}}{dt} = G(v^{\epsilon}, W^{\epsilon}) = -\kappa W^{\epsilon} + v^{\epsilon} \otimes v^{\epsilon}, \end{cases} \tag{12}$$

where the neurons are assumed to have the same decay constant: $L = l\, \mathrm{Id}$; u is a continuous periodic input (it replaces $u^{\text{ext}}$ of the previous section); $\sigma, \epsilon_1, \epsilon_2, \kappa \in \mathbb{R}_+$ with $\epsilon_1, \epsilon_2 \ll 1$; and $B(t)$ is an n-dimensional Brownian motion.

The first question that arises is that of the well-posedness of the system: On which interval are the solutions of system (12) defined? Do they explode in finite time? At first sight, it seems there may be a runaway of the solution if the largest real part among the eigenvalues of W grows beyond l. In fact, it turns out this scenario can be avoided if the following assumption linking the parameters of the system is satisfied.

Assumption 3.1 There exists $p \in\, ]0, 1[$ such that
$$\frac{\sigma^2 l}{2 p (1 - p)} + \frac{u_m^2}{p (1 - p)^2} < \kappa l^3,$$

where $u_m = \sup_{t \in \mathbb{R}_+} \|u(t)\|_2$.

It corresponds to making sure that the external (i.e., $u_m$) or internal (i.e., σ) excitations are not too large compared to the decay mechanisms (represented by κ and l). Note that if $p \in\, ]0, 1[$, $u_m$ and σ are fixed, it is sufficient to increase κ or l for this assumption to be satisfied.

Under this assumption, the space
$$E_p = \left\{W \in \mathbb{R}^{n \times n} : W \text{ is symmetric},\ W \geq 0 \text{ and } W < pL\right\}$$

is invariant under the flow of the averaged system $\bar{G}$, where $W \geq 0$ means that W is positive semi-definite and $W < pL$ means that $pL - W$ is positive definite. Therefore, the averaged system is defined and bounded on $\mathbb{R}_+$. The slow-fast system being asymptotically close to the averaged system, it is therefore asymptotically well defined in probability. This is summarized in the following theorem.

Theorem 3.1 If Assumption 3.1 is verified for $p \in\, ]0, 1[$, then system (12) is asymptotically well posed in probability and the connectivity matrix $W^{\epsilon}$, the solution of system (12), converges to W in the sense that for all $\delta, T > 0$,
$$\lim_{\epsilon \overset{\mu}{\to} 0}\, \mathbb{P}\left[\sup_{t \in [0,T]} \left\|W_t^{\epsilon} - W_t\right\|^2 > \delta\right] = 0,$$
where W is the deterministic solution of
$$\frac{dW_{ij}}{dt} = \bar{G}(W)_{ij} = \underbrace{-\kappa W_{ij}}_{\text{decay}} + \underbrace{\frac{\mu}{\tau} \int_0^{\tau/\mu} \bar{v}_i(s)\, \bar{v}_j(s)\, ds}_{\text{correlation}} + \underbrace{\frac{\sigma^2}{2}\left((L - W)^{-1}\right)_{ij}}_{\text{noise}}, \tag{13}$$

where $\bar{v}(t)$ is the $\frac{\tau}{\mu}$-periodic attractor of $\frac{d\bar{v}}{dt} = (W - L)\bar{v} + u(\mu t)$, in which $W \in \mathbb{R}^{n \times n}$ is held fixed.

Proof See Theorem B.1 in Appendix B.2. □

In the following, we focus on the averaged system described by (13). Its right-hand side is made of three terms: a linear homogeneous decay, a correlation term, and a noise term. The last two terms are made explicit in what follows.

3.2.1 Noise term

As seen in Section 2, in the linear case, the noise term Q is the unique solution of the Lyapunov equation (9) with $A = W - L$ and $\Sigma = \sigma\, \mathrm{Id}$. Because the noise is spatially uncorrelated and identical for each neuron, and also because the connectivity is symmetric, observe that $Q = \frac{\sigma^2}{2}(L - W)^{-1}$ is the unique solution of the system.
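This closed form is easy to check numerically; the sketch below (ours, not from the paper) draws a random symmetric W with $\|W\| < l$ and verifies that $Q = \frac{\sigma^2}{2}(L - W)^{-1}$ solves (9).

```python
import numpy as np

# A sketch (ours): check that Q = (sigma^2/2)(L - W)^{-1} solves the Lyapunov
# equation (9) with A = W - L and Sigma = sigma*Id, for a random symmetric W.
rng = np.random.default_rng(1)
n, l, sigma = 4, 12.0, 0.05
M = rng.normal(size=(n, n))
W = 0.1 * (M + M.T)                   # small symmetric connectivity, ||W|| < l
L = l * np.eye(n)
A = W - L                             # stable since ||W|| < l
Q = sigma**2 / 2 * np.linalg.inv(L - W)
residual = A @ Q + Q @ A.T + sigma**2 * np.eye(n)
print("Lyapunov residual:", np.max(np.abs(residual)))  # ~ machine precision
```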

In more complicated cases, the computation of this term appears to be much more difficult as we will see in Section 3.4.

3.2.2 Correlation term

This term corresponds to the auto-correlation of the neuronal activity. It is only implicitly defined; this section is therefore devoted to finding an explicit form depending only on the parameters l, μ, τ, the connectivity W, and the inputs u. Actually, one can perform an expansion of this term with respect to a small parameter, corresponding to a weakly connected expansion: most terms vanish if the connectivity W is small compared to the strength l of the intrinsic decaying dynamics of the neurons.

The auto-correlation term of the $\frac{\tau}{\mu}$-periodic function $\bar{v}$ can be rewritten as
$$\left\{\bar{v} \cdot \bar{v}'\right\}_{ij} = \int_0^{\tau/\mu} \bar{v}_i(s)\, \bar{v}_j(s)\, ds.$$

With this notation, it is simple to think of $\bar{v}$ as a 'semi-continuous matrix' in $\mathbb{R}^{n \times [0, \tau/\mu[}$. Hence, the operator '·' can be thought of as a matrix multiplication. Similarly, the transpose operator turns a matrix $\bar{v} \in \mathbb{R}^{n \times [0, \tau/\mu[}$ into a matrix $\bar{v}' \in \mathbb{R}^{[0, \tau/\mu[ \times n}$. See Appendix B.1 for details about these notations.

It is common knowledge, see [17] for instance, that this term gathers information about the correlation of the inputs. Indeed, if we assume that the input is sufficiently slow, then $\bar{v}$ has enough time to converge to the fixed point associated with the current value $u(t)$, for all $t \in [0, +\infty[$. Therefore, at first order, $\bar{v}(t) \simeq -(W - L)^{-1} u(t)$. This leads to $\bar{v} \cdot \bar{v}' \simeq (W - L)^{-1} \cdot u \cdot u' \cdot (W - L)^{-1}$. In the weakly connected regime, one can assume that $W - L \simeq -L$, leading to $\bar{v} \cdot \bar{v}' \simeq \frac{1}{l^2}\, u \cdot u'$, which is the auto-correlation of the inputs.

Actually, without the assumption of a slow input, lagged correlations of the input appear in the averaged system. Before giving the expression of these temporal correlations, we need to introduce some notation. First, define the convolution filter $g_{l/\mu} : t \mapsto \frac{l}{\mu} e^{-\frac{l}{\mu} t} H(t)$, where H is the Heaviside function. This family of functions is displayed for different values of $\frac{l}{\mu}$ in Figure 4(a). Note that $g_{l/\mu} \to \delta_0$ when $\frac{l}{\mu} \to +\infty$, where $\delta_0$ is the Dirac distribution centered at the origin. In this asymptotic regime, the convolution filter and its iterates $g_{l/\mu} * g_{l/\mu} * \cdots$ reduce to the identity.

We also define the filtered correlations of the inputs $C^{k,q} \in \mathbb{R}^{n \times n}$ by
$$C^{k,q} \overset{\text{def}}{=} \frac{1}{u_m^2 \tau}\left(u * g_{l/\mu}^{(k+1)}\right) \cdot \left(u * g_{l/\mu}^{(q+1)}\right)',$$
where $g_{l/\mu}^{(k+1)} = g_{l/\mu} * \cdots * g_{l/\mu}$ is the convolution of $k+1$ copies of $g_{l/\mu}$ and $u_m = \sup_{t \in \mathbb{R}_+} \|u(t)\|_2$. This is the correlation matrix of the inputs filtered by two different functions. It is easy to show that this is equivalent to computing the cross-correlation of the inputs with the inputs filtered by another function,
$$C^{k,q} = \frac{1}{u_m^2 \tau}\left(u * \left(g_{l/\mu}^{(k+1)} * g_{l/\mu}^{\prime\,(q+1)}\right)\right) \cdot u' = \frac{1}{u_m^2 \tau}\, u \cdot \left(u * \left(g_{l/\mu}^{\prime\,(k+1)} * g_{l/\mu}^{(q+1)}\right)\right)', \tag{14}$$
which motivates the definition of the $(k, q)$-temporal profile $g_{l/\mu}^{(k+1)} * g_{l/\mu}^{\prime\,(q+1)}$, where $(g_{l/\mu}')^{(k)}(t) = (g_{l/\mu}^{(k)})'(t) = g_{l/\mu}^{(k)}(-t)$. This notation is deliberately similar to that of the transpose operator used in the proofs. These functions are shown in Figure 1. We have not found a way to make them explicit; the following remarks are therefore based only on numerical observations. When $k = q$, the temporal profiles are centered. The larger the difference $k - q$, the further the center of the bell is shifted. The larger the sum $k + q$, the larger the standard deviation. This motivates the idea that $C^{k,q}$ can be thought of as the $(k - q)$-lagged correlation of the inputs. One can also say that $C^{10,10}$ is more blurred than $C^{0,0}$, in the sense that the inputs are temporally integrated over a 'wider' window in the first case.
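Since the temporal profiles have no closed form, the filtered correlations are naturally computed numerically. The sketch below (ours; the function name C_kq and the sinusoidal test input are our own choices) estimates $C^{k,q}$ for a sampled periodic input by discrete convolution with the Gamma-shaped filters $g_{l/\mu}^{(m+1)}$.

```python
import numpy as np
from scipy.special import factorial

# A sketch (ours, not from the paper): numerical estimate of the filtered
# input correlations C^{k,q} for a tau-periodic input u sampled on a grid.
# g^{(m+1)}(t) = l^{m+1} t^m e^{-l t} / m!  (here with mu = 1).

def C_kq(u, k, q, dt, l, tau):
    """u: array (n, T) holding one period of the input; returns C^{k,q}."""
    t = np.arange(0.0, 20.0 / l, dt)                 # filter support (decays fast)
    g = lambda m: l**(m + 1) * t**m * np.exp(-l * t) / factorial(m)

    def filtered(m):
        tiled = np.tile(u, (1, 3))                   # periodise to kill transients
        y = np.array([np.convolve(row, g(m))[:tiled.shape[1]] for row in tiled]) * dt
        return y[:, u.shape[1]:2 * u.shape[1]]       # keep the middle period

    uk, uq = filtered(k), filtered(q)
    um2 = np.max(np.sum(u**2, axis=0))               # u_m^2 = sup_t ||u(t)||_2^2
    return uk @ uq.T * dt / (um2 * tau)

dt, tau = 0.01, 2 * np.pi
ts = np.arange(0.0, tau, dt)
u = np.vstack([np.sin(ts), np.cos(ts)])              # two input channels
print(C_kq(u, 0, 0, dt, 1.0, tau))                   # ||C^{0,0}||_2 <= 1
print(C_kq(u, 1, 0, dt, 1.0, tau))                   # a lagged correlation
```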
Fig. 1 The $(k, q)$-temporal profiles with $\frac{l}{\mu} = 1$, i.e., the functions $g_1^{(k+1)} * g_1^{\prime\,(q+1)}$ for $q = 0$ and k ranging from 0 to 6. For $k = q = 0$, the temporal profile is even, and this remains true for any $k = q$. When $k > q$, the function reaches its maximum at strictly positive values that grow with the difference $k - q$. Besides, the temporal profiles are flattened as $k + q$ increases.

Observe that $g_{l/\mu}^{(k+1)}(t) = \frac{l^{k+1}}{\mu^{k+1}\, k!}\, t^k e^{-\frac{l}{\mu} t} H(t)$. Therefore, $\|g_{l/\mu}^{(k+1)}\|_1 = \frac{\Gamma(k+1)}{k!} = 1$. Thanks to Young's inequality for convolutions, which says that $\|u * g_{l/\mu}^{(k)}\|_2 \leq \|u\|_2\, \|g_{l/\mu}^{(k)}\|_1$, it can be proved that $\|C^{k,q}\|_2 \leq 1$.

We intend to express the correlation term as an infinite convergent sum involving these filtered correlations. To this end, we use a result we proved in [25] to write the solution of a general class of non-autonomous linear systems (e.g., $\frac{d\bar{v}}{dt} = (W - L)\bar{v} + u(t)$) as an infinite sum, in the case $\mu = 1$.

Lemma 3.2 If $\bar{v}$ is the solution, with zero as initial condition, of $\frac{d\bar{v}}{dt} = (W - L)\bar{v} + u(t)$, it can be written as the sum below, which converges if W is in $E_p$ for some $p \in\, ]0, 1[$:
$$\bar{v} = \sum_{k=0}^{+\infty} \frac{W^k}{l^{k+1}}\, u * g_l^{(k+1)},$$

where $g_l : t \mapsto l e^{-l t} H(t)$.

Proof See Lemma B.2 in Appendix B.2. □
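A quick numerical check of Lemma 3.2 (our sketch, with a random weakly connected W): integrate the ODE directly and compare with the series truncated after a few terms.

```python
import numpy as np
from scipy.special import factorial

# A sketch (ours): check Lemma 3.2 numerically.  We integrate
# dv/dt = (W - L) v + u(t) directly and compare with the truncated series
# sum_k W^k / l^{k+1} (u * g_l^{(k+1)}), g_l^{(k+1)}(t) = l^{k+1} t^k e^{-l t}/k!.
rng = np.random.default_rng(2)
n, l, dt, T = 3, 2.0, 1e-3, 6.0
M = rng.normal(size=(n, n)); W = 0.3 * (M + M.T)     # ||W|| < l (weakly connected)
ts = np.arange(0.0, T, dt)
u = np.vstack([np.sin(ts), np.cos(2 * ts), 0 * ts + 1.0])

# Direct Euler integration from v(0) = 0.
v = np.zeros(n); traj = []
for j in range(ts.size):
    traj.append(v.copy())
    v = v + ((W - l * np.eye(n)) @ v + u[:, j]) * dt
traj = np.array(traj).T

# Truncated series of Lemma 3.2.
approx = np.zeros_like(traj)
for k in range(8):
    g = l**(k + 1) * ts**k * np.exp(-l * ts) / factorial(k)
    ug = np.array([np.convolve(row, g)[:ts.size] for row in u]) * dt
    approx += np.linalg.matrix_power(W, k) @ ug / l**(k + 1)

print("max error:", np.abs(traj - approx).max())     # shrinks with more terms
```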

This is a decomposition of the solution of a linear differential system on a basis of operators where the spatial and temporal parts are decoupled. This important step in the detailed study of the averaged equation cannot be achieved easily in models with non-linear activity. Everything is now set up to introduce the explicit expansion of the correlation term used in what follows. Indeed, we use the previous result to rewrite the correlation term as follows.

Property 3.3 The correlation term can be written
$$\frac{\mu}{\tau}\, \bar{v} \cdot \bar{v}' = \frac{u_m^2}{l^2} \sum_{k,q=0}^{+\infty} \frac{W^k}{l^k}\, C^{k,q}\, \frac{W^q}{l^q}.$$

Proof See Theorem B.3 in Appendix B.2. □

This infinite sum of convolved filters is reminiscent of a property of Hawkes processes that have a linear input-output gain [26].

The speed of the inputs, characterized by μ, only appears in the temporal profiles $g_{l/\mu}^{(k)} * g_{l/\mu}^{\prime\,(q)}$. In particular, if the inputs are much slower than the neuronal activity time-scale, i.e., $\mu = 0$, then $g_{+\infty} = \delta_0$ and $u * g_{+\infty} = u$. Therefore, $C^{k,q} = C^{0,0}$ and the sums in the formula of Property 3.3 are separable, leading to $\bar{v} \cdot \bar{v}' = (L - W)^{-1} \cdot u \cdot u' \cdot (L - W)^{-1}$, which corresponds to the heuristic result explained previously.

Therefore, the averaged equation can be explicitly rewritten as
$$\frac{dW}{dt} = \bar{G}(W) = -\kappa W + \frac{u_m^2}{l^2} \sum_{k,q=0}^{+\infty} \frac{W^k}{l^k}\, C^{k,q}\, \frac{W^q}{l^q} + \frac{\sigma^2}{2}(L - W)^{-1}. \tag{15}$$
In Figure 2, we illustrate this result by comparing, for different values of $\epsilon = \epsilon_1 = \epsilon_2$ (i.e., we choose $\mu = 1$ in this example), the stochastic system and its averaged version. The above decomposition has been used as the basis for the numerical computation of the trajectories of the averaged system; a sketch of such an integration loop is given below.
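For reference, a minimal sketch (ours, not the paper's code) of such an integration loop: the double sum of (15) is truncated at K terms and the averaged ODE is integrated by forward Euler from zero connectivity. The placeholder correlations are hypothetical and stand in for an estimator like the C_kq routine sketched after (14).

```python
import numpy as np

# A sketch (ours): integrate averaged equation (15) with the double sum
# truncated at K terms.  C(k, q) should return the filtered correlation
# C^{k,q}; here a crude placeholder makes the sketch runnable.
def averaged_rhs(W, C, kappa, l, sigma, um2, K=4):
    n = W.shape[0]
    corr = sum(np.linalg.matrix_power(W, k) @ C(k, q) @ np.linalg.matrix_power(W, q)
               / l**(k + q) for k in range(K) for q in range(K))
    noise = sigma**2 / 2 * np.linalg.inv(l * np.eye(n) - W)
    return -kappa * W + um2 / l**2 * corr + noise

n = 3
C = lambda k, q: 0.5**(abs(k - q)) * np.eye(n)       # placeholder correlations
W = np.zeros((n, n))                                 # zero initial connectivity
dt = 1e-4
for _ in range(2000):
    W = W + averaged_rhs(W, C, kappa=100.0, l=12.0, sigma=0.05, um2=1.0) * dt
print(W)                                             # converges to the unique W*
```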
Fig. 2 Panels (a) and (b) represent the evolution of the connectivity for the original stochastic system (12), superimposed with the averaged system (13), for two different values of ε: respectively $\epsilon = 0.01$ and $\epsilon = 0.001$, where we have chosen $\epsilon = \epsilon_1 = \epsilon_2$. Each color corresponds to the weight of an edge in a network made of $n = 3$ neurons. As expected, the smaller ε, the better the approximation. This can be seen in panel (c), where the precision is plotted on the y-axis against ε on the x-axis. The parameters used here are $l = 12$, $\mu = 1$, $\kappa = 100$, $\sigma = 0.05$. The inputs have a random (but frozen) spatial structure and evolve according to a sinusoidal function.

3.2.3 Global stability of the equilibrium point

Now that we have found an explicit formulation of the averaged system, it is natural to study its dynamics. Actually, we prove in the following that if the connectivity W is kept smaller than $\frac{l}{3}$, i.e., Assumption 3.1 is verified with $p \leq \frac{1}{3}$, then the dynamics is trivial: the system converges to a single equilibrium point. Indeed, under the previous assumption, the system can be written $\bar{G}(W) = -\kappa W + F(W)$, where F is a contraction on $E_{1/3}$. Therefore, one can prove the uniqueness of the fixed point with the Banach fixed-point argument and exhibit an energy function for the system.

Theorem 3.4 If Assumption 3.1 is verified for $p \leq \frac{1}{3}$, then there is a unique equilibrium point in the invariant subset $E_p$, which is globally asymptotically stable.

Proof See Theorem B.4 in Appendix B.2. □

The fact that the equilibrium point is unique means that the 'knowledge' of the network about its environment (corresponding, by hypothesis, to the connectivity) is eventually unique: for a given input and any initial condition, the network can only converge to the same 'knowledge' or 'understanding' of this input.

3.2.4 Explicit expansion of the equilibrium point

When the network is weakly connected, the high-order terms in expansion (15) may be neglected. In this section, we follow this idea and derive an explicit expansion of the equilibrium connectivity, where the strength of the connectivity is the small parameter enabling the expansion: the weaker the connectivity, the more terms can be neglected.

In fact, it is not natural to speak about a weakly connected learning network, since the connectivity is a variable. However, we can identify a weak connectivity index which controls the strength of the connectivity. We say the connectivity is weak when it is negligible compared to the intrinsic leak term, i.e., when $\frac{\|W\|}{l}$ is small. We show in the Appendix that this weak connectivity index depends only on the parameters of the network and can be written
$$\tilde{p} = \frac{u_m^2}{\kappa l^3} + \frac{\sigma^2}{2 \kappa l^2}.$$

In the asymptotic regime $\tilde{p} \to 0$, we have $\frac{\|W\|}{\tilde{p}\, l} = O(1)$. This index is the 'small' parameter needed to perform the expansion. We also define $\lambda = \frac{\sigma^2 l}{2 u_m^2}$, which carries information about the way $\tilde{p}$ converges to zero; in fact, it is the ratio of the two terms of $\tilde{p}$.

With these definitions, we can prove that the equilibrium connectivity $W^*$ has the following asymptotic expansion in $\tilde{p}$.

Theorem 3.5
$$W^* = \frac{\tilde{p}\, l}{1 + \lambda}\left(\lambda + C^{0,0}\right) + \frac{\tilde{p}^2\, l}{(1 + \lambda)^2}\left(\lambda^2 + \lambda\left(C^{0,0} + C^{1,0} + C^{0,1}\right) + C^{0,0} C^{1,0} + C^{0,1} C^{0,0}\right) + O(\tilde{p}^3).$$

Proof See Theorem B.5 in Appendix B.2. □

At first order, the final connectivity is essentially $C^{0,0}$, the correlation of the inputs filtered by a bell-shaped, centered temporal profile. In the case of Figure 3, this is quite a good approximation of the final connectivity; a sketch of the first-order formula is given below.
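The first-order term is straightforward to evaluate. The sketch below (ours; the placeholder $C^{0,0}$ is hypothetical, and we read the scalar λ as $\lambda\, \mathrm{Id}$ in the matrix sum) assembles $W^*$ at first order from the network parameters.

```python
import numpy as np

# A sketch (ours): first-order approximation of the equilibrium connectivity
# from Theorem 3.5, W* ~ (p_tilde * l / (1 + lambda)) * (lambda*Id + C^{0,0}).
def W_order1(C00, l, kappa, sigma, um):
    p_tilde = um**2 / (kappa * l**3) + sigma**2 / (2 * kappa * l**2)
    lam = sigma**2 * l / (2 * um**2)
    n = C00.shape[0]
    return p_tilde * l / (1 + lam) * (lam * np.eye(n) + C00)

# Example with the parameters of Figure 3 and a placeholder C^{0,0}
# (two orthogonal patterns give a diagonal correlation matrix).
C00 = np.eye(8)
print(W_order1(C00, l=12.0, kappa=100.0, sigma=0.02, um=1.0))
```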
Fig. 3 (a) shows the temporal evolution of the input to a network of $n = 8$ neurons. It is made of two spatially random patterns that are shown alternately. (b) shows the correlation matrix of the inputs. The off-diagonal terms are null because the two patterns are spatially orthogonal. (c), (d), and (e) represent the first order of the expansion of Theorem 3.5 for different μ. Actually, this approximation is quite good, since the percentage of error between the averaged system and the first order, computed as $\mathrm{error} = \frac{\|W_{\mathrm{order\,1}} - W^*\|_1}{\|W^*\|_1}$, has an order of magnitude of 10⁻⁴% for the three figures. These figures make it possible to observe the role of μ. If μ is small, i.e., the inputs are slow, then the transient can be neglected and the learned connectivity is roughly the correlation of the inputs; see (c). If μ increases, i.e., the inputs are faster, then the connectivity starts to encode a link between the two patterns, which were flashed circularly and elicited responses that did not fade away before the other pattern appeared. The temporal structure of the inputs is thus also learned when μ is large. The parameters used in this figure are $\epsilon = 0.001$, $l = 12$, $\kappa = 100$, $\sigma = 0.02$.

Not only is the spatial correlation encoded in the weights; there is also some information about the temporal correlations: two successive but spatially orthogonal events occurring in the inputs will be wired together in the connectivity, although they do not appear in the spatial correlations; see Figure 3 for an example.

3.3 Trace learning: band-pass filter effect

In this section, we study an improvement of the learning model obtained by adding a certain form of history dependence to the system, and we explain how it changes the results of the previous section. Given that Theorem 2.2 only applies to instantaneous processes, we will only be able to treat history-dependent systems which can be reformulated as instantaneous processes. Actually, this class of systems contains models which are biologically more relevant than the previous one and which exhibit interesting additional functional behaviors. In particular, it covers the following features:

  • Trace learning.

It is likely that a biological learning rule integrates activity over a short time window. As Földiák suggested in [27], it makes sense to consider the learning equation as being
$$\frac{dW^{\epsilon}}{dt} = -\kappa W^{\epsilon} + \left(v^{\epsilon} * g_1\right) \otimes \left(v^{\epsilon} * g_1\right),$$

where ∗ is the convolution and $g_1 : t \in \mathbb{R} \mapsto \beta_1 e^{-\beta_1 t} H(t)$. Rolls and Deco show numerically in [15] that the temporal convolution, leading to spatio-temporal learning, makes it possible to perform invariant object recognition. Besides, trace learning appears to be the symmetric part of the biological STDP rule that we detail in Section 3.4.

  • Damped oscillatory neurons.

Many neurons have an oscillatory behavior. Although we cannot take this into account in a linear model, we can model a neuron by a damped oscillator, which also introduces a new important time-scale into the system. Adding adaptation to the neuronal dynamics is an elementary way to implement this idea. This corresponds to modeling a single neuron without inputs by the equivalent formulations (checked numerically in the sketch after this list)
$$\begin{cases} \frac{dv^{\epsilon}}{dt} = -l z^{\epsilon}, \\ \frac{dz^{\epsilon}}{dt} = \beta_2\left(v^{\epsilon} - z^{\epsilon}\right) \end{cases} \iff \frac{dv^{\epsilon}}{dt} = -l\, v^{\epsilon} * g_2, \quad \text{where } g_2(t) = \beta_2 e^{-\beta_2 t} H(t).$$
  • Dynamic synapses.

The electro-chemical process of synaptic communication is very complicated and non-linear. Yet, one of the features of synaptic communication we can take into account in a linear model is the shape of the post-synaptic potentials. In this section, we consider that each synapse is a linear filter whose finite impulse response (i.e., the post-synaptic potential) has the shape g 3 ( t ) = β 3 e β 3 t H ( t ) https://static-content.springer.com/image/art%3A10.1186%2F2190-8567-2-13/MediaObjects/13408_2012_Article_24_IEq271_HTML.gif. This is a common assumption which, for instance, is at the basis of traditional rate based models; see Chapter 11 of [7].
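Referring back to the second bullet, the following sketch (ours; arbitrary parameters with $4l > \beta$) checks the claimed equivalence between the adaptation ODE and the convolution form by integrating both from the same initial condition.

```python
import numpy as np

# A sketch (ours): check that dv/dt = -l z, dz/dt = beta (v - z) matches
# dv/dt = -l (v * g_2) with g_2(t) = beta e^{-beta t} H(t) and z(0) = 0.
l, beta, dt, T = 12.0, 20.0, 1e-4, 2.0   # 4l > beta: damped oscillations
steps = int(T / dt)

v1, z = 1.0, 0.0                         # ODE formulation
v2 = 1.0                                 # convolution formulation
hist = np.zeros(steps)                   # past values of v2 for the convolution
g = beta * np.exp(-beta * dt * np.arange(steps))
for k in range(steps):
    v1, z = v1 - l * z * dt, z + beta * (v1 - z) * dt
    hist[k] = v2
    conv = np.dot(g[:k + 1][::-1], hist[:k + 1]) * dt   # (v2 * g_2)(t_k)
    v2 = v2 - l * conv * dt
print("difference:", abs(v1 - v2))       # small (first-order Euler error)
```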

For mathematical tractability, we assume in the following that $\beta = \beta_1 = \beta_2 = \beta_3 \in \mathbb{R}_+$, so that $g_{\beta} = g_1 = g_2 = g_3$, i.e., the time-scales of the neurons, of the synapses, and of the learning windows are the same. Actually, there is a large variety of temporal scales for neurons, synapses, and learning windows, which makes this assumption not absurd; besides, in many brain areas, examples of these time constants lie in the same range (≈10 ms). Yet, investigating the impact of breaking this assumption would be necessary to model biological networks more accurately. This leads to the following system:
$$\begin{cases} dv^{\epsilon} = \frac{1}{\epsilon_1}\left((W^{\epsilon} - L)\, v^{\epsilon} * g_{\beta} + u\left(\frac{t}{\epsilon_2}\right)\right)dt + \frac{\sigma}{\sqrt{\epsilon_1}}\, dB(t), \\ \frac{dW^{\epsilon}}{dt} = -\kappa W^{\epsilon} + \left(v^{\epsilon} * g_{\beta}\right) \otimes \left(v^{\epsilon} * g_{\beta}\right), \end{cases} \tag{16}$$

where the notations are the same as in Section 3.2. The behavior of a single neuron is oscillatory damped if $\Delta = \sqrt{1 - \frac{4l}{\beta}}$ is a pure imaginary number, i.e., if $4l > \beta$. This is the regime on which we focus. Actually, the Hebbian linear case of Section 3.2 corresponds to $\beta = +\infty$ in this delayed system.

To comply with the hypotheses of Theorem 2.2 (i.e., no dependence on the history of the process), we can add a variable z to the system which takes care of integrating the variable v over an exponential window. This leads to the equivalent system (in the limit $\sigma_z \to 0$)
$$\begin{cases} d\begin{pmatrix} v^{\epsilon} \\ z^{\epsilon} \end{pmatrix} = \frac{1}{\epsilon_1}\left[\begin{pmatrix} 0 & W^{\epsilon} - L \\ \beta & -\beta \end{pmatrix}\begin{pmatrix} v^{\epsilon} \\ z^{\epsilon} \end{pmatrix} + \begin{pmatrix} u\left(\frac{t}{\epsilon_2}\right) \\ 0 \end{pmatrix}\right]dt + \begin{pmatrix} \frac{\sigma}{\sqrt{\epsilon_1}}\, dB(t) \\ \frac{\sigma_z}{\sqrt{\epsilon_1}}\, dB(t) \end{pmatrix}, \\ \frac{dW^{\epsilon}}{dt} = -\kappa W^{\epsilon} + z^{\epsilon} \otimes z^{\epsilon}. \end{cases}$$

This trick makes it possible to deal with history-dependent processes where the dependence on the past is exponential.

It turns out most of the results of Section 3.2 remain true for system (16) as detailed in the following. The existence of the solution on R + https://static-content.springer.com/image/art%3A10.1186%2F2190-8567-2-13/MediaObjects/13408_2012_Article_24_IEq189_HTML.gif is proved in Theorem B.6. The computations show that in the averaged system, the noise term remains identical, whereas the correlation term is to be replaced by μ τ ( v ¯ g β ) ( v ¯ g β ) https://static-content.springer.com/image/art%3A10.1186%2F2190-8567-2-13/MediaObjects/13408_2012_Article_24_IEq278_HTML.gif. Besides, Lemma 3.2 can be extended to our delayed system by changing only the temporal filters; see Lemma 34. Together with Lemma C.3, this proves the result of Theorem B.8.
$$\overline{(\bar v * g_\beta)(\bar v * g_\beta)^\top}^{\,\mu\tau} = \frac{u_m^2\,\|v\|_1^2}{l^2}\sum_{k,q=0}^{+\infty}\frac{W^k}{(l/\|v\|_1)^k}\;\tilde C_{k,q}\;\frac{(W^\top)^q}{(l/\|v\|_1)^q},$$
where
$$\tilde C_{k,q} = \frac{1}{u_m^2\,\tau\,\|v\|_1^{k+q+2}}\int_0^{\tau}\big(u * v^{(k+1)}\big)(t)\,\big(u * v^{(q+1)}\big)(t)^\top\,dt,$$

where $v : t \mapsto \frac{l}{\mu\Delta}\big(e^{-\frac{\beta}{2\mu}(1-\Delta)t} - e^{-\frac{\beta}{2\mu}(1+\Delta)t}\big)\,H(t)$. Observe that applying Young's inequality for convolutions leads to $\|\tilde C_{k,q}\|_2 \le 1$. Actually, Lemma C.3 shows that the k-fold self-convolution of v is $v^{(k)} = v_k : t \mapsto \frac{\sqrt{\pi}\,\beta}{k!}\,e^{-\frac{\beta}{2}t}\big(\frac{t}{|\Delta|}\big)^{k+\frac{1}{2}}\,J_{k+\frac{1}{2}}\!\big(\frac{\beta|\Delta|}{2}\,t\big)\,H(t)$, where $J_n(z)$ is the Bessel function of the first kind. The value of the $L^1$ norm of v is computed in Appendix C.3: $\|v\|_1 = \coth\!\big(\frac{\pi}{2|\Delta|}\big)$ if Δ is a pure imaginary number, and $\|v\|_1 = 1$ otherwise.
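These closed forms are easy to check numerically. The sketch below is our own; β = l = 1 (with μ = 1) is an arbitrary choice inside the oscillatory regime 4l > β, where v reduces to the damped sinusoid $\frac{2l}{|\Delta|}e^{-\beta t/2}\sin\big(\frac{\beta|\Delta|}{2}t\big)$:

```python
import numpy as np

# Temporal filter v(t) in the oscillatory regime (4l > beta), mu = 1:
# v(t) = (2l/|Delta|) * exp(-beta*t/2) * sin(beta*|Delta|*t/2) for t >= 0,
# the real form of the expression above when Delta = i*|Delta|.
beta, l = 1.0, 1.0                        # arbitrary values with 4l > beta
absD = np.sqrt(4.0 * l / beta - 1.0)      # |Delta|

dt = 1e-4
t = np.arange(0.0, 60.0, dt)              # v decays like exp(-beta*t/2)
v = (2.0 * l / absD) * np.exp(-0.5 * beta * t) * np.sin(0.5 * beta * absD * t)

L1 = np.sum(np.abs(v)) * dt
print(L1, 1.0 / np.tanh(np.pi / (2.0 * absD)))   # both ~1.39: coth(pi/(2|Delta|))
```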

Therefore, the averaged system can be rewritten
$$\frac{dW}{dt} = \bar G(W) = -\kappa\,W + \frac{u_m^2\,\|v\|_1^2}{l^2}\sum_{k,q=0}^{+\infty}\frac{W^k}{(l/\|v\|_1)^k}\;\tilde C_{k,q}\;\frac{(W^\top)^q}{(l/\|v\|_1)^q} + \frac{\sigma^2}{2}\,(L - W)^{-1}.$$

As before, the existence and uniqueness of a globally attractive equilibrium point is guaranteed if Assumption 3.1 is verified for $\frac{p}{2}\big(\|v\|_1^{3}+1\big)$; see Theorem B.9.

Besides, the weakly connected expansion of the equilibrium point performed in Section 3.2.4 can be derived in this case as well (see Theorem B.10). At the first order, this leads to the equilibrium connectivity
$$W = \frac{\tilde p\,l}{1+\lambda}\Big(\lambda + \|v\|_1^2\,\tilde C_{0,0}\Big) + O\big(\tilde p^2\,\|v\|_1\big).$$
The second order is given in Theorem B.10. The main difference with the Hebbian linear case is the shape of the temporal filters: here they have an oscillatory damped behavior when Δ is purely imaginary. These two cases are compared in Figure 4.
Fig. 4

These plots represent the temporal filter $v : t \mapsto v(t)$ for different parameters. (a) When $\beta = +\infty$, we are in the Hebbian linear case of Appendix B.2. The temporal filters are simply decaying exponentials which average the signal over a past window. (b) When the dynamics of the neurons and synapses are oscillatory damped, oscillations appear in the temporal filters. The number of oscillations depends on Δ. If Δ is real, there are no oscillations, as in the previous case. However, when Δ becomes a pure imaginary number, a few oscillations appear, and they become more numerous as $|\Delta|$ increases.

These oscillatory damped filters have the effect of amplifying a particular frequency of the input signal. As shown in Figure 5, if Δ is a pure imaginary number, then $\tilde C_{0,0}$ is the cross-correlation of the band-pass filtered inputs with themselves. This band-pass filtering effect can also be observed in the higher-order terms of the weakly connected expansion. This suggests that the biophysical oscillatory behavior of neurons and synapses leads to selecting the corresponding frequency of the inputs while performing the same computation as in the Hebbian linear case of the previous section: computing the correlation of the (filtered) inputs.
Fig. 5

This is the spectral profile $|\widehat{v * v}|(\xi)$ for $\beta = 1$ and $l \in\,]0, 2]$, where $\widehat{v * v}$ denotes the Fourier transform of $v * v$. When $4l < \beta$, the filter reaches its maximum at the null frequency, but when l increases beyond $\frac{\beta}{4}$, the filter becomes a band-pass filter with long tails in $\frac{1}{\xi^2}$.
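The profile of Figure 5 can also be evaluated from the closed form of the transfer function of v, namely $\hat v(\xi) = \frac{\beta l}{\beta l - \xi^2 + i\beta\xi}$ (for μ = 1), which follows from the Laplace transform of the expression of v given above. The sketch below (our own) locates the peak frequency of $|\widehat{v * v}| = |\hat v|^2$ as l varies:

```python
import numpy as np

# Spectral profile |FT(v*v)|(xi) = |v_hat(xi)|^2 from the closed form
# v_hat(xi) = beta*l / (beta*l - xi^2 + i*beta*xi), with mu = 1 and beta = 1.
beta = 1.0
xi = np.linspace(0.0, 5.0, 50_001)
for l in (0.1, 1.0, 2.0):
    profile = np.abs(beta * l / (beta * l - xi**2 + 1j * beta * xi)) ** 2
    print(f"l = {l:3.1f}: peak at xi = {xi[np.argmax(profile)]:.3f}")
# small l: maximum at xi = 0 (low-pass); larger l: the peak moves to a
# nonzero frequency (band-pass) with ~1/xi^2 tails.
```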

3.4 Asymmetric ‘STDP’ learning with correlated noise

Here, we extend the results to temporally asymmetric learning rules and spatially correlated noise. We consider a learning rule similar to spike-timing-dependent plasticity (STDP), which is closer to biological experiments than the previous Hebbian rules. It has been observed that the change in strength of the connection between two neurons depends mainly on the difference between the times of the spikes emitted by each neuron, as shown in Figure 6; see [12].
Fig. 6

This figure represents the synaptic strength modification when the post-synaptic neuron emits a spike. The y-axis corresponds to an additive or multiplicative update of the connectivity; for instance, in [28], this is $\frac{\Delta W_{ij}}{W_{ij}}$ for the negative part of the curve. However, we assume an additive update in this paper. The x-axis is the time at which a pre-synaptic spike reaches the synapse, relative to the time of the post-synaptic spike, chosen to be 0.

Assuming that the decay times of the positive and negative parts of Figure 6 are equal, we approximate this function by $t \mapsto a_+\,g_\gamma(t) - a_-\,g_\gamma(-t)$, where $g_\gamma(t) = \gamma e^{-\gamma t} H(t)$. Actually, this corresponds to $\dot W^\epsilon_{ij} = -\kappa W^\epsilon_{ij} + a_+\,v_i^\epsilon\,(v_j^\epsilon * g_\gamma) - a_-\,(v_i^\epsilon * g_\gamma)\,v_j^\epsilon$. If the neuron has a spiking behavior, then the term $a_+\,v_i^\epsilon(t)\,(v_j^\epsilon * g_\gamma)(t)$ is significant when the post-synaptic neuron i is spiking at time t: it then counts the number of previous spikes from the pre-synaptic neuron j that might have caused the post-synaptic spike, weighted by an exponentially decaying window. This accounts for the left part of Figure 6. The last term $a_-\,(v_i^\epsilon * g_\gamma)\,v_j^\epsilon$ takes the opposite perspective: it is significant when the pre-synaptic neuron j is spiking and counts the number of previous spikes from the post-synaptic neuron i that are not likely to have been caused by the pre-synaptic neuron, weighted by the time-mirrored exponential window. This accounts for the right part of Figure 6. This leads to the coupled system
$$\begin{cases} dv^\epsilon = \dfrac{1}{\epsilon_1}\Big(f(v^\epsilon) + W v^\epsilon + u\big(\tfrac{t}{\epsilon_2}\big)\Big)\,dt + \dfrac{1}{\sqrt{\epsilon_1}}\,\Sigma\,dB(t),\\[4pt] \dfrac{dW^\epsilon}{dt} = G(v^\epsilon, W^\epsilon) = -\kappa\,W^\epsilon + a_+\,v^\epsilon\,(v^\epsilon * g_\gamma)^\top - a_-\,(v^\epsilon * g_\gamma)\,{v^\epsilon}^\top, \end{cases}$$
(17)

where the non-linear intrinsic dynamics of the neurons is represented by f. Indeed, the term $\{a_+\,v^\epsilon(t)\,(v^\epsilon * g_\gamma)(t)^\top\}_{ij} = a_+\,v_i^\epsilon(t)\,(v^\epsilon * g_\gamma)_j(t)$ is negligible when the neuron is quiet and maximal at the top of the spikes emitted by neuron i. Therefore, it records the value of the pre-synaptic membrane potential filtered by $g_\gamma$ when the post-synaptic neuron spikes. This accounts for the positive part of Figure 6. Similarly, the negative part corresponds to $a_-\,(v^\epsilon * g_\gamma)\,{v^\epsilon}^\top$.
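To make the shape of the learning window concrete, here is a minimal sketch (our own; the parameter values are arbitrary) of the approximation $t \mapsto a_+\,g_\gamma(t) - a_-\,g_\gamma(-t)$, with the sign convention that t > 0 corresponds to a pre-synaptic spike preceding the post-synaptic one:

```python
import numpy as np

# Sketch of the STDP window a_plus*g_gamma(t) - a_minus*g_gamma(-t); here
# t > 0 means the pre-synaptic spike precedes the post-synaptic one
# (potentiation) and t < 0 the reverse (depression). Arbitrary parameters.
a_plus, a_minus, gamma = 1.0, 1.0, 3.0

def g(t):
    # causal exponential window g_gamma(t) = gamma * exp(-gamma*t) * H(t)
    return gamma * np.exp(-gamma * np.clip(t, 0.0, None)) * (t >= 0)

t = np.linspace(-3.0, 3.0, 601)
window = a_plus * g(t) - a_minus * g(-t)
print(window[t > 0].max(), window[t < 0].min())  # potentiation vs depression
```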

Actually, this formulation is valid for any non-linear activity with correlated noise. However, studying the role of STDP in spiking networks is beyond the scope of this paper, since we are only able to obtain explicit results for models with linear activity. Therefore, we assume that the activity is linear, while keeping the learning rule as it was derived in the spiking case, i.e., we set $f(v) = -l\,v = -L\,v$ in the system above.

We also use the trick of adding auxiliary variables to get rid of the history dependence. This reads
$$\begin{cases} d\begin{pmatrix} v^\epsilon \\ z^\epsilon \end{pmatrix} = \dfrac{1}{\epsilon_1}\left[\begin{pmatrix} W - L & 0 \\ \gamma & -\gamma \end{pmatrix}\begin{pmatrix} v^\epsilon \\ z^\epsilon \end{pmatrix} + \begin{pmatrix} u\big(\tfrac{t}{\epsilon_2}\big) \\ 0 \end{pmatrix}\right] dt + \begin{pmatrix} \tfrac{1}{\sqrt{\epsilon_1}}\,\Sigma\,dB(t) \\ \tfrac{\sigma_z}{\sqrt{\epsilon_1}}\,dB(t) \end{pmatrix},\\[4pt] \dfrac{dW^\epsilon}{dt} = -\kappa\,W^\epsilon + a_+\,v^\epsilon\,{z^\epsilon}^\top - a_-\,z^\epsilon\,{v^\epsilon}^\top. \end{cases}$$

In this framework, the method exposed in Section 3.2 holds with small changes. First, the well-posedness assumption becomes

Assumption 3.2 There exists $p \in\,]0, 1[$ such that
$$\frac{|a_+| + |a_-|}{p(1-p)}\left(\frac{s^2}{\gamma^2}\Big(1 + \frac{\gamma/l}{p}\Big) + \frac{u_m^2}{1-p}\right) < \kappa\,l^3,$$

where $s^2$ is the maximal eigenvalue of $\Sigma\Sigma^\top$.

Under this assumption, the system is asymptotically well posed in probability (Theorem B.11), and we show that the averaged system is
$$\frac{dW}{dt} = \bar G(W) = -\kappa\,W + \frac{u_m^2\,(|a_+| + |a_-|)}{l^2}\sum_{k,q=0}^{+\infty}\frac{W^k}{l^k}\;D_{k,q}\;\frac{(W^\top)^q}{l^q} + Q,$$
(18)
where we have used Theorem B.12 to expand the correlation term. The noise term Q is equal to $\gamma\,Q_{11}\,(L + \gamma - W^\top)^{-1}$, where $Q_{11}$ is the unique solution of the Lyapunov equation $(W - L)\,Q_{11} + Q_{11}\,(W - L)^\top + \Sigma\Sigma^\top = 0$. Lemma D.1 gives a solution of this equation, which leads to $Q = \gamma\sum_{k=0}^{+\infty} W^k\,\Sigma\Sigma^\top\,(2L - W^\top)^{-(k+1)}\,(L + \gamma - W^\top)^{-1}$. In equation (18), the correlation matrices $D_{k,q}$ are given by
$$D_{k,q} = \frac{1}{u_m^2\,\tau\,(|a_+| + |a_-|)}\int_0^\tau \Big(u * g_{l/\mu}^{(k+1)} * \big(a_+\,g_\gamma - a_-\,g_\gamma^-\big)\Big)(t)\,\Big(u * g_{l/\mu}^{(q+1)}\Big)(t)^\top\,dt,\qquad\text{where } g_\gamma^-(t) = g_\gamma(-t).$$
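The series representation of $Q_{11}$ can be checked against a generic Lyapunov solver. The sketch below is our own (the random weak W, the choice $L = l\,\mathrm{Id}$ and the random Σ are arbitrary, and the transpose placement in the series is part of our reading of Lemma D.1):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Check the series solution of the Lyapunov equation
# (W - L) Q11 + Q11 (W - L)^T + Sigma Sigma^T = 0
# against scipy's generic solver; weak coupling so the series converges.
rng = np.random.default_rng(0)
n, l = 4, 10.0
L = l * np.eye(n)
W = 0.5 * rng.standard_normal((n, n))
Sigma = rng.standard_normal((n, n)) / np.sqrt(n)
SS = Sigma @ Sigma.T

Q11_ref = solve_continuous_lyapunov(W - L, -SS)   # solves A X + X A^T = -SS

# truncated series: Q11 = sum_k W^k SS (2L - W^T)^{-(k+1)}
Minv = np.linalg.inv(2.0 * L - W.T)
Q11, term = np.zeros((n, n)), SS @ Minv
for _ in range(60):
    Q11 += term
    term = W @ term @ Minv
print(np.max(np.abs(Q11 - Q11_ref)))              # agreement to round-off
```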

According to Theorem B.13, the system is also globally asymptotically convergent to a single equilibrium, which we study in the following.

We perform a weakly connected expansion of the equilibrium connectivity of system (18). As shown in Theorem B.14, the first order of the expansion is
$$W = \frac{\tilde p\,l}{1+\lambda}\left(\lambda\,(\alpha_+ - \alpha_-)\,\frac{\Sigma\Sigma^\top}{d} + D_{0,0}\right) + O\big(\tilde p^2\big).$$
Writing $D_{0,0} = \frac{1}{u_m^2\,\tau\,(|a_+| + |a_-|)}\,(S + A)$, where S is symmetric and A is skew-symmetric, leads to
$$S = \frac{a_+ - a_-}{2}\int_0^\tau \big(u * g_{l/\mu} * (g_\gamma + g_\gamma^-)\big)(t)\,\big(u * g_{l/\mu}\big)(t)^\top\,dt,\qquad A = \frac{a_+ + a_-}{2}\int_0^\tau \big(u * g_{l/\mu} * (g_\gamma - g_\gamma^-)\big)(t)\,\big(u * g_{l/\mu}\big)(t)^\top\,dt.$$
According to Lemma C.1, the symmetric part is very similar to the trace-learning case of Section 3.3. Applying Lemma C.2 leads to
$$S = (a_+ - a_-)\int_0^\tau \big(u * g_{l/\mu} * g_\gamma\big)(t)\,\big(u * g_{l/\mu} * g_\gamma\big)(t)^\top\,dt,\qquad A = \frac{a_+ + a_-}{\gamma}\int_0^\tau \Big(\frac{du}{dt} * g_{l/\mu} * g_\gamma\Big)(t)\,\big(u * g_{l/\mu} * g_\gamma\big)(t)^\top\,dt.$$
(19)

Therefore, the STDP learning rule simply adds an antisymmetric part to the final connectivity while keeping the symmetric part of the Hebbian case. Besides, the antisymmetric part corresponds to the cross-correlation of the inputs with their derivative. For higher-order terms, this remains true, although the temporal profiles differ from those of the first order. These results are in line with previous works underlining the similarity between STDP learning and differential Hebbian learning, where $G(v) \sim \dot v\,v^\top$; see [29].

Figure 7 shows an example of purely antisymmetric STDP learning, i.e., $a_+ = a_-$. The final connectivity matrix is therefore antisymmetric, as shown in Figure 7(b), and the noise has no impact on learning. The figure shows that the network finally approximates the connectivity given in (19).
Fig. 7

Antisymmetric STDP learning for a network of n = 3 neurons. (a) Temporal evolution of the inputs to the network. The three neurons are successively and periodically excited. The red color corresponds to an excitation of 1 and the blue to no excitation. (b) Equilibrium connectivity. The matrix is antisymmetric and shows that each neuron excites one of its neighbors and is inhibited by the other. (c) Temporal evolution of the connectivity strength. The colors correspond to those of (b). The connectivity of system (17) corresponds to the thin oscillatory curves. The connectivity of the averaged system (18) (with k, q < 4) corresponds to the thick lines. Note that each curve corresponds to the superposition of three connections which remain equal throughout learning. The dashed curves correspond to the antisymmetric part in (19). The parameters chosen for this simulation were l = 10, κ = 100, γ = 3, $a_+ = a_- = 1$, τ = 3, σ = 0.001, μ = 1, ϵ = 0.001. The system was simulated on the fast time-scale during T = 10,000 time steps of size dt = 0.01.
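For readers wishing to reproduce Figure 7, here is a minimal Euler-Maruyama sketch of system (17) with linear activity and purely antisymmetric STDP. It is our own reconstruction: the simulation is written directly on the fast time-scale (so the learning term carries a factor ϵ), and the exact input waveform is a guess (each neuron excited in turn over a period τ); the parameter values follow the caption:

```python
import numpy as np

# Euler-Maruyama sketch of system (17), linear activity, purely antisymmetric
# STDP (a_plus = a_minus), on the fast time-scale: the learning term then
# carries the factor eps. Parameters follow the caption of Figure 7.
rng = np.random.default_rng(1)
n = 3
l, kappa, gamma = 10.0, 100.0, 3.0
a_plus = a_minus = 1.0
tau, sigma, eps = 3.0, 0.001, 0.001
dt, nsteps = 0.01, 10_000

def u(t):
    out = np.zeros(n)
    out[int(n * (t % tau) / tau)] = 1.0     # neurons excited in turn
    return out

v, z, W = np.zeros(n), np.zeros(n), np.zeros((n, n))
Id = np.eye(n)
for k in range(nsteps):
    t = k * dt
    v_new = v + ((W - l * Id) @ v + u(t)) * dt \
              + sigma * np.sqrt(dt) * rng.standard_normal(n)
    z_new = z + gamma * (v - z) * dt        # exponential trace z ~ v * g_gamma
    W += eps * (-kappa * W + a_plus * np.outer(v, z)
                - a_minus * np.outer(z, v)) * dt
    v, z = v_new, z_new

print(np.round(W / np.abs(W).max(), 2))     # antisymmetric, cf. Figure 7(b)
```

By construction, the update $a_+\,v z^\top - a_-\,z v^\top$ with $a_+ = a_-$ is antisymmetric, so W stays exactly antisymmetric along the trajectory, in agreement with Figure 7(b).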

4 Discussion

We have applied temporal averaging methods to slow-fast systems modeling the learning mechanisms at work in linear stochastic neural networks. When the connectivity is guaranteed to remain small, the dynamics of the averaged system turns out to be simple: the connectivity always converges to a unique equilibrium point. We then performed a weakly connected expansion of this final connectivity, whose terms are combinations of the noise covariance and the lagged correlations of the inputs: the first-order term is simply the sum of the noise covariance and the correlation of the inputs.

  • As opposed to the former input/output vision of neurons, we have considered the membrane potential v to be the solution of a dynamical system. The consequence of this modeling choice is that not only the spatial correlations but also the temporal correlations of the inputs are learned. Because the transients are taken into account, the activity never converges: it travels between the representations of the inputs and thereby links concepts together.

The parameter μ is the ratio of the time-scales between the inputs and the activity variable. If μ = 0, the inputs are infinitely slow and the activity variable has enough time to converge towards its equilibrium point. When μ grows, the dynamics becomes more and more transient; it has no time to converge. Therefore, if the inputs are extremely slow, the network only learns the spatial correlations of the inputs; if the inputs are fast, it also learns their temporal correlations. This is illustrated in Figure 3.

This suggests that learning associations between concepts, for instance learning words in two different languages, may be optimized by presenting the two words to be associated alternately at a certain frequency. Indeed, increasing the frequency (with a fixed duration of exposure to each word) amounts to increasing μ. The network then learns the temporal correlations of the inputs better and thus strengthens the link between the two concepts.

  • According to the resonator-neuron model [30], Section 3.3 suggests that neurons and synapses with a preferred oscillation frequency will preferentially extract the correlation of the inputs filtered by a band-pass filter centered on the intrinsic frequency of the neurons.

Actually, it has been observed that the auditory cortex is tonotopically organized, i.e., the neurons are arranged by frequency [31]. It is traditionally thought that this organization is achieved thanks to a particular connectivity between the neurons. We exhibit here another mechanism for selecting this frequency, based solely on the parameters of the neurons: a network made of many different neurons whose intrinsic frequencies are uniformly spread is likely to perform a Fourier-like operation, decomposing the signal by frequency.

In particular, this emphasizes the fact that the network does not treat space and time similarly. Roughly speaking, associating several pictures and associating several sounds are therefore two different tasks which involve different mechanisms.

  • In this paper, the original hierarchy of the network has been neglected: the network is simply made of neurons which receive external inputs. A natural way to include a hierarchical structure (with layers, for instance), without changing the setup of the paper, is to remove the external input to some neurons. However, according to Theorem 3.5 (and its extensions, Theorems B.10 and B.14), these neurons will be disconnected from the others at the first order (if the noise is spatially uncorrelated). Linear activities thus imply that the high-level neurons disconnect from the others, which is a problem. One can observe, however, that the second-order term in Theorem 3.5 is not null if the noise matrix Σ is not diagonal: it is the noise shared between neurons which recruits the high-level neurons and builds connections from and to them.

It is likely that a significant part of the noise in the brain is locally induced, e.g., local perturbations due to blood vessels or local chemical signals. In a way, neurons close to each other share their noise, and it seems reasonable to choose the matrix Σ so that it reflects the biological proximity between neurons. In this sense, Σ specifies the original structure of the network and makes it possible for close-by neurons to recruit each other.

Another idea to address hierarchy in networks would be to replace the synaptic decay term $-\kappa W$ by another homeostatic term [32] which would enforce the emergence of a strong hierarchical structure.

  • It is also interesting to observe that most of the noise contribution to the equilibrium connectivity for STDP learning (see Theorem B.14) vanishes if the learning is purely skew-symmetric, i.e., $a_+ = a_-$. In fact, it is only the symmetric part of the learning, i.e., the Hebbian mechanism, that writes the noise into the connectivity.

  • We have shown that there is a natural analogue, for our linear neurons, of the STDP learning rule for spiking neurons. This asymmetric rule converges to a final connectivity which can be decomposed into symmetric and skew-symmetric parts. The former is similar to the symmetric Hebbian learning case, emphasizing that STDP is nothing more than an asymmetric Hebbian-like learning rule. The skew-symmetric part of the final connectivity is the cross-correlation between the inputs and their derivative.

This has an interesting interpretation when looking at the spontaneous activity of the network post-learning. Indeed, if we assume that the inputs are generated by an autonomous system $\frac{du}{dt} = \zeta(u)$, then, according to the bottom equation in formula (19), the spontaneous activity is governed by
$$dv = \big(\zeta(u)\,u^\top v - l\,v\big)\,dt + \Sigma\,dB(t).$$

In a way, the noise terms generate random patterns which tend to be forgotten by the network due to the leak term $-l\,v$. The only drift is due to $\zeta(u)\,u^\top v \simeq \mathbb{E}_{\langle v, u \rangle}\big(\zeta(u)\big)$, which is the expectation of the vector field defining the dynamics of the inputs, under a measure given by the scalar product between the activity variable and the inputs. In other words, if the activity is close to the inputs at a given time $t \in \mathbb{R}_+$, i.e., if $\langle v, u(t) \rangle$ is large, then the activity will evolve in the same direction as this input would have done. The network has modeled the temporal structure of the inputs: the spontaneous activity predicts and replays the inputs the network has learned.

Numerous challenges remain to be addressed in this direction.

First, it seems natural to look for an application of these mathematical methods to more realistic models. The two main limitations of the class of models we studied in Section 3 are that (i) the activity variable is governed by a linear equation and (ii) all the neurons are assumed to be identical. The mathematical analysis in this paper was made possible by the assumption that the neural network has linear dynamics, which does not reflect the intrinsic non-linear behavior of neurons. However, the cornerstone of the application of temporal averaging methods to a learning neural network, namely Property 3.3, is similar to the behavior of Poisson processes [26], which have useful applications for learning neural networks [19, 20]. This suggests that the dynamics studied in this paper might be quite similar to that of some non-linear network models. Studying more rigorously the extension of the present theory to non-linear and heterogeneous models is the next step toward a better modeling of biologically plausible neural networks.

Second, we have shown that the equilibrium connectivity is made of a symmetric and an antisymmetric term. In terms of the statistical analysis of data sets, the symmetric part corresponds to classical correlation matrices. However, the antisymmetric part suggests a way to improve the purely correlation-based approach used in many statistical analyses (e.g., PCA) toward a causality-oriented framework which might be better suited to dynamical data.

Appendix A: Stochastic and periodic averaging

A.1 Long-time behavior of inhomogeneous Markov processes

In order to construct the averaged vector field $\bar G^\mu(w)$ in the time-scale matching case ($0 < \mu < \infty$), one needs to understand properly the long-time behavior of the rescaled inhomogeneous frozen process
$$dv = F(v, w_0, \mu t)\,dt + \Sigma(v, w_0)\,dB(t).$$
(20)
Under regularity and dissipativity conditions, [5] proves the following general result about the asymptotic behavior of the solution of
$$dX_t = b(X_t, t)\,dt + \sigma(X_t, t)\,dB(t),\qquad t > s,\quad X_s = x,$$

where $t \mapsto b(x, t)$ and $t \mapsto \sigma(x, t)$ are τ-periodic.

The first point of the following theorem gives the definition of an evolution system of measures, which generalizes the notion of invariant measure to inhomogeneous Markov processes. The exponential estimate of point 2 in the theorem below is a key ingredient in the proof of the averaging principle of Theorem 2.2.

Theorem A.1 ([5])

1. There exists a unique τ-periodic family of probability measures $\{\mu(s, \cdot),\ s \in \mathbb{R}\}$ such that, for all continuous and bounded functions ϕ,
$$\int_{\mathbb{R}^p} \mathbb{E}\big[\phi\big(X_t^{s,x}\big)\big]\,\mu(s, dx) = \int_{\mathbb{R}^p} \phi(x)\,\mu(t, dx).$$
Such a family is called an evolution system of measures.

2. Furthermore, under a stronger dissipativity condition, the convergence of the law of X towards μ is exponentially fast. More precisely, for any $r \in (1, +\infty)$, there exist $M > 0$ and $\omega < 0$ such that, for all ϕ in $L^r(\mathbb{R}^p, \mu(t, \cdot))$, the space of r-integrable functions with respect to $\mu(t, \cdot)$,
$$\int_{\mathbb{R}^p} \Big|\mathbb{E}\big[\phi\big(X_t^{s,x}\big)\big] - \int_{\mathbb{R}^p} \phi(y)\,\mu(t, dy)\Big|^r\,\mu(s, dx) \le M\,e^{\omega(t-s)} \int_{\mathbb{R}^p} \big|\phi(x)\big|^r\,\mu(t, dx).$$
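As a toy illustration of this theorem (our own example, unrelated to the networks of Section 3), consider the scalar Ornstein-Uhlenbeck process $dX = (-X + \sin(2\pi t/\tau))\,dt + s\,dB(t)$: it admits a unique τ-periodic evolution system of Gaussian measures, and after a transient the empirical moments taken one period apart coincide:

```python
import numpy as np

# dX = (-X + sin(2*pi*t/tau)) dt + s dB: after a transient, the law of X_t
# is tau-periodic (evolution system of measures); compare moments one
# period apart over a cloud of paths started from an arbitrary law.
rng = np.random.default_rng(2)
tau, s, dt = 1.0, 0.5, 1e-3
npaths = 20_000
nsteps = int(5 * tau / dt)                  # integrate over 5 periods

X = rng.standard_normal(npaths)             # arbitrary initial law
for k in range(nsteps):
    t = k * dt
    X += (-X + np.sin(2 * np.pi * t / tau)) * dt \
         + s * np.sqrt(dt) * rng.standard_normal(npaths)
    if k == nsteps - int(tau / dt) - 1:     # snapshot one period before the end
        m1, v1 = X.mean(), X.var()

print(m1, X.mean())                         # means one period apart agree
print(v1, X.var())                          # and so do the variances
```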

A.2 Proof of Property 2.3

Property A.2 If there exists a smooth subset E of $\mathbb{R}^q$ such that

1. the functions F, G, Σ satisfy Assumptions 2.1-2.3 restricted to $\mathbb{R}^p \times E$;

2. E is invariant under the flow of $\bar G^\mu$, as defined in (7);

then, for any initial condition $w_0 \in E$, system (4) is asymptotically well posed in probability and $w^\epsilon$ satisfies the conclusion of Theorem 2.2.

Proof The idea of the proof is to truncate the original system, replacing G by a smooth truncation which coincides with G on E and which is close to 0 outside E. More precisely, for $\beta > 0$, we introduce a regular (locally Lipschitz) function $\psi_\beta : \mathbb{R}^q \to [0, 1]$ such that $\psi_\beta(w) = 0$ if $w \notin E$ or $w \in \partial E$, and $\lim_{\beta \to 0} \psi_\beta(w) = 1$ if $w \in E \setminus \partial E$. We define
$$\tilde G_\beta(v, w) = G(v, w)\,\psi_\beta(w).$$
Then, we introduce $(\tilde v^{\epsilon,\beta}, \tilde w^{\epsilon,\beta})$, the solution of the auxiliary system
$$\begin{aligned} d\tilde v^{\epsilon,\beta} &= \frac{1}{\epsilon_1}\,F\Big(\tilde v^{\epsilon,\beta}, \tilde w^{\epsilon,\beta}, \frac{t}{\epsilon_2}\Big)\,dt + \frac{1}{\sqrt{\epsilon_1}}\,\Sigma\big(\tilde v^{\epsilon,\beta}, \tilde w^{\epsilon,\beta}\big)\,dB(t),\\ d\tilde w^{\epsilon,\beta} &= \tilde G_\beta\big(\tilde v^{\epsilon,\beta}, \tilde w^{\epsilon,\beta}\big)\,dt, \end{aligned}$$

with the same initial condition as $(v^\epsilon, w^\epsilon)$.

Let $T, \delta, \eta > 0$ be three positive reals. Let us introduce a few more notations. We will need to consider a subset of E defined by
$$E_\beta := \big\{w \in E;\ \psi_\beta(w) \ge 1 - \sqrt{\eta}\,\delta\big\}.$$
We also introduce the following stopping times:
$$\tau^\epsilon := \inf\big\{t \ge 0;\ w_t^\epsilon \notin E\big\},\qquad \tau_\beta^\epsilon := \inf\big\{t \ge 0;\ w_t^\epsilon \notin E_\beta\big\},\qquad \tilde\tau^\epsilon := \inf\big\{t \ge 0;\ \tilde w_t^{\epsilon,\beta} \notin E\big\},\qquad \tilde\tau_\beta^\epsilon := \inf\big\{t \ge 0;\ \tilde w_t^{\epsilon,\beta} \notin E_\beta\big\}.$$

Finally, we define $T^\epsilon := \min(T, \tau^\epsilon, \tilde\tau^\epsilon)$ and $T_\beta^\epsilon := \min(T, \tau_\beta^\epsilon, \tilde\tau_\beta^\epsilon)$.

Let us remark at this point that, in order to prove that $\mathbb{P}[\tau^\epsilon \ge T] \to 1$ (which is our aim), it is sufficient to work with the bounded stopping time $\min(T, \tau^\epsilon)$, since $\mathbb{P}[\tau^\epsilon \ge T] = \mathbb{P}[\min(T, \tau^\epsilon) \ge T]$. In other words, the realizations of $w^\epsilon$ which stay inside E longer than T are not problematic. Therefore, we introduce $\hat\tau^\epsilon := \min(T, \tau^\epsilon)$.

Our first claim is that, on finite time intervals [0, T], $\tilde w^{\epsilon,\beta}$ is a good approximation of $w^\epsilon$ inside E, provided one chooses β sufficiently small. To prove this claim, we proceed in two steps, first working inside $E_\beta$ and then in $E \setminus E_\beta$:

1. For any $\beta > 0$, one controls the difference between $w^\epsilon$ and $\tilde w^{\epsilon,\beta}$ on $E_\beta$, since one controls the difference between the drifts. By an application of Lemma A.3 below (we need here the moment Assumption 2.3(i)), there exists a constant C (which may depend on T, β, E) such that
$$\mathbb{E}\Big[\sup_{0 \le t \le T_\beta^\epsilon}\big\|w_t^\epsilon - \tilde w_t^{\epsilon,\beta}\big\|^2\Big] \le C\,\eta\,\delta^2.$$
(21)
We conclude by an application of the Markov inequality, implying
$$\mathbb{P}\Big[\sup_{0 \le t \le T_\beta^\epsilon}\big\|w_t^\epsilon - \tilde w_t^{\epsilon,\beta}\big\| > \delta\Big] \le \frac{1}{\delta^2}\,\mathbb{E}\Big[\sup_{0 \le t \le T_\beta^\epsilon}\big\|w_t^\epsilon - \tilde w_t^{\epsilon,\beta}\big\|^2\Big] \le C\,\eta.$$
(22)
2. One now needs to control the situation outside $E_\beta$, that is, on $E \setminus E_\beta$. The idea is that, while one no longer controls the difference between G and $\tilde G_\beta$, one can still choose β sufficiently small for $E_\beta$ to become arbitrarily close to E, implying that $\hat\tau^\epsilon$ and $T_\beta^\epsilon$ are arbitrarily close with high probability, namely
$$\forall \theta, \lambda > 0,\ \exists \beta > 0,\qquad \mathbb{P}\big[\hat\tau^\epsilon - T_\beta^\epsilon > \lambda\big] < \theta.$$
(23)
With $\theta = (\delta\eta)^2$ and $\lambda = \delta\eta$, one obtains that, for sufficiently small β,
$$\mathbb{P}\big[\hat\tau^\epsilon - T_\beta^\epsilon > \eta\delta\big] < (\eta\delta)^2.$$
(24)
Let us denote $S := \sup_{T_\beta^\epsilon \le t \le \hat\tau^\epsilon}\|w_t^\epsilon - \tilde w_t^{\epsilon,\beta}\|$. Then, one can split the computation of $\mathbb{E}[S]$ according to the event $A = \{\hat\tau^\epsilon - T_\beta^\epsilon > \eta\delta\}$:
$$\mathbb{E}[S] = \mathbb{E}[S\,\mathbb{1}_A] + \mathbb{E}[S\,\mathbb{1}_{A^c}] \le \big(2 K_G\, T\, \mathbb{P}[A]\big)^{1/2} + \big(2 K_G\, \mathbb{E}\big[(\hat\tau^\epsilon - T_\beta^\epsilon)^2\,\mathbb{1}_{A^c}\big]\big)^{1/2} \le C_2\,\eta\,\delta,$$

where we have used the Cauchy-Schwarz inequality and the moment Assumption 2.3(ii) (yielding the constant $K_G$) in the second inequality.

So, we deduce by the Markov inequality that $\sup_{T_\beta^\epsilon \le t \le \hat\tau^\epsilon}\|w_t^\epsilon - \tilde w_t^{\epsilon,\beta}\|$ is arbitrarily small in probability.

From the combination of 1. and 2., we deduce that one can choose β small enough such that
$$\mathbb{P}\Big[\sup_{0 \le t \le T \wedge \tau^\epsilon}\big\|w_t^\epsilon - \tilde w_t^{\epsilon,\beta}\big\| > \delta\Big] \le (C_1 + C_2)\,\eta.$$
(25)
We can now proceed to the application of Theorem 2.2 to the truncated system. As $(\tilde v^{\epsilon,\beta_0}, \tilde w^{\epsilon,\beta_0})$ remains in $\mathbb{R}^p \times E$, one can extend F and Σ smoothly outside E so that (F, Σ) satisfies Assumptions 2.1-2.2. Therefore, one can apply Theorem 2.2 to the auxiliary system: for all $\delta, T > 0$,
$$\lim_{\epsilon \to 0_\mu} \mathbb{P}\Big[\sup_{t \in [0,T]}\big\|\tilde w_t^{\epsilon,\beta_0} - w_t\big\| > \delta\Big] = 0,$$
where w is defined by (8). As a consequence, there exists $\epsilon_0 > 0$ such that, for all $\epsilon < \epsilon_0$,
$$\mathbb{P}\Big[\sup_{t \in [0,T]}\big\|\tilde w_t^{\epsilon,\beta_0} - w_t\big\| > \delta\Big] < \eta.$$
Then, since $\|\hat w_t^\epsilon - w_t\| \le \|\hat w_t^\epsilon - \tilde w_t^{\epsilon,\beta_0}\| + \|\tilde w_t^{\epsilon,\beta_0} - w_t\|$, one deduces that, for all $\epsilon < \epsilon_0$,
$$\mathbb{P}\Big[\sup_{t \in [0,T]}\big\|\hat w_t^\epsilon - w_t\big\| > \delta\Big] < (C_1 + C_2 + 1)\,\eta,$$
that is to say,
$$\lim_{\epsilon \to 0_\mu} \mathbb{P}\Big[\sup_{t \in [0,T]}\big\|\hat w_t^\epsilon - w_t\big\| > \delta\Big] = 0.$$
We know, by assumption 2. of the statement of Property 2.3, that $w_t \in E$ for all $t \ge 0$, so we conclude the proof by observing that, for all $T > 0$,
$$\lim_{\epsilon \to 0_\mu} \mathbb{P}\big[\tau^\epsilon \ge T\big] = 1.$$

 □

In the following lemma, we show that the solutions of two SDEs, whose drifts are close on a subset of the state space, remain close on a finite time interval. The difficulty here lies in the fact that we deal with only locally Lipschitz coefficients.

Lemma A.3 Suppose x and y are solutions, with identical initial conditions in $H \subset \mathbb{R}^n$, of the following stochastic differential equations in $\mathbb{R}^n$:
$$dx_t = a(x_t, t)\,dt + b(x_t, t)\,dB(t),$$
(26)
$$dy_t = h(y_t)\,a(y_t, t)\,dt + b(y_t, t)\,dB(t).$$