Mesoscopic population equations for spiking neural networks with synaptic short-term plasticity

Coarse-graining microscopic models of biological neural networks to obtain mesoscopic models of neural activities is an essential step towards multi-scale models of the brain. Here, we extend a recent theory for mesoscopic population dynamics with static synapses to the case of dynamic synapses exhibiting short-term plasticity (STP). The extended theory offers an approximate mean-field dynamics for the synaptic input currents arising from populations of spiking neurons and synapses undergoing Tsodyks–Markram STP. The approximate mean-field dynamics accounts for both finite number of synapses and correlation between the two synaptic variables of the model (utilization and available resources) and its numerical implementation is simple. Comparisons with Monte Carlo simulations of the microscopic model show that in both feedforward and recurrent networks, the mesoscopic mean-field model accurately reproduces the first- and second-order statistics of the total synaptic input into a postsynaptic neuron and accounts for stochastic switches between Up and Down states and for population spikes. The extended mesoscopic population theory of spiking neural networks with STP may be useful for a systematic reduction of detailed biophysical models of cortical microcircuits to numerically efficient and mathematically tractable mean-field models.


Introduction
One of the primary goals in computational neuroscience is to understand how brain functions arise from the interactions of billions of nerve cells and their underlying biophysical processes at the microscopic scale. Towards that goal, a crucial step is to develop a theoretical framework that links biophysically detailed networks of spiking neurons at the microscopic scale with simplified firing-rate or neural-mass models [1] for neuronal populations at the coarse-grained mesoscopic or macroscopic scale. Firing-rate models are mathematically tractable and thus permit a theoretical understanding of neuronal population dynamics implicated in various neural computations [2][3][4][5][6]. However, firing rate models are heuristic models that lack a clear link to the underlying microscopic proper-ties. On the other hand, highly detailed biophysical models of cortical microcircuits [7] and simplified networks of spiking point neurons [8][9][10] are closely linked to biophysical properties but lack mathematical tractability and do not provide a mechanistic understanding of emergent functional behavior. However, if we were able to systematically reduce biophysically detailed models to simplified networks of spiking point neurons [11] and further to coarse-grained firing-rate models [12], we might be able to understand neural computations at the population level in terms of biophysical parameters. One common strategy for coarse-graining a cortical network is to identify approximately homogeneous neuronal populations that consist of cells with similar properties and inputs; e.g., neurons may be grouped with respect to their cell type and cortical location [12,13]. Biological data suggests that the size of such neuronal populations are typically small containing only on the order of hundred to thousand of neurons [14]. Recently, a mesoscopic mean-field theory that accounts for finite-size noise has been proposed [12] based on the refractory density equations for macroscopic homogeneous populations of neurons (where the number of neurons tends to infinity) [15][16][17][18]. However, in [12], synapses are assumed to be static in the sense that the effective synaptic coupling between two populations of neurons is constant over time.
A ubiquitous feature of cortical dynamics is synaptic short-term plasticity (STP) [19][20][21][22], i.e. dynamic changes of synaptic strength on time scales of 100 ms to 1000 ms induced by presynaptic neural activity. Theoretical studies have shown that STP exerts profound effects on network activity [23][24][25] and information processing capabilities [19,[26][27][28][29]. In particular, in a recent biophysically detailed microcircuit model [7], STP has been a critical factor for reproducing experimentally observed activity patterns. Therefore, a faithful reduction to population rate models should incorporate the effect of STP. Mean-field descriptions for populations of spiking neurons with dynamic synapses are central for such a reduction. Although mean-field theories for STP have been developed for the case of macroscopic populations [4,[30][31][32], a mesoscopic mean-field theory that would account for finite-size fluctuations is still lacking.
In this work, we extend the mesoscopic theory of [12] for static synapses to the case of dynamic synapses exhibiting Tsodyks-Markram STP [30]. In Sect. 2 we expose our theory by considering a feedforward setup [27,33]. We use an assumption (loosely speaking a Poissonian assumption on the spike statistics), Assumption 1, which enables the derivation of mesoscopic mean-field dynamics for the effective input. We then compare numerically the effective input given by the mesoscopic mean-field dynamics with simulations of the full microscopic population, in the case where the presynaptic population consists of N Poisson neurons. The Poisson case is of special interest because Assumption 1 is satisfied. In Sect. 3, we first explain how the theory of Sect. 2 can be applied to general mesoscopic circuits. We then illustrate how the mesoscopic STP model accurately replicates population spikes and switches between Up and Down states exhibited by a recurrent network of time-inhomogeneous Poisson neurons. Finally, we incorporate the mesoscopic STP model into our previous mesoscopic population model [12] for generalized integrate and fire (GIF) neurons. We show that the resulting extension faithfully reproduces population spikes observed in a microscopic simulation. In Sect. 4, we discuss the limitations of our mesoscopic model for GIF neurons and mention possible theoretical extensions. The numerical implementation of the mesoscopic equations are detailed in the Appendix.

Feedforward network with a finite number of dynamic synapses 2.1 Network setting and theoretical approach
To derive the mesoscopic theory, let us consider a feedforward setup: N neurons from a presynaptic population are connected to a given postsynaptic neuron via N synapses ( Fig. 1(A)). This setup is important because under the approximations we will use, the theory obtained for this simple case can be directly applied to general mesoscopic circuits. In addition, the setup is important for biological modeling because feedforward pathways exhibiting STP are prominent in the nervous system. Examples include visual [19], auditory [34], somatosensory [35] and periform [36] cortices.
Before we start with the treatment of dynamic synapses, it is instructive to first recall the simpler case of static synapses [12]. In this case, the synaptic input current (in a currentbased model) I N syn (t), t > t 0 , is modeled microscopically as the sum of synaptically-filtered spike trains: Here, J is the synaptic weight (in units of electrical charge) assumed to be identical for all synapses onto the postsynaptic neuron, (t) is a synaptic filtering kernel (defined as the postsynaptic current normalized by the total charge elicited by a single spike) and is the Dirac-delta spike train of neuron j with spike times {t j k } k∈Z + , t 0 < t j 1 < t j 2 < · · · < ∞. Here and in the following, a superscript N (as in I N syn ) denotes a functional of the N presynaptic spike trains s 1 (t), . . . , s N (t), and hence a mesoscopic quantity.
In mesoscopic models of homogeneous neuronal populations, the central mesoscopic variable is the population activity A N (t) defined as the sum of spike trains of all neurons driving the STP dynamics of u j (t) and x j (t) for each of the N synapses. The postsynaptic input resulting from synapse j is u j (t)x j (t)s j (t) and the total postsynaptic input is y(t) = N -1 N j=1 u j (t)x j (t)s j (t). (C) Mesoscopic picture of one effective synapse with mean-field STP dynamics driven by the population activity A N (t) of N neurons. The population activity A N (t) is defined as the population average of the spike trains of each of the N neurons forming the population. Thus, when the individual spike trains s j (t) are known, A N (t) = N -1 N j=1 s j (t) in a population divided by the number of neurons. In our case, the mesoscopic activity of the presynaptic population is thus given by Using the mesoscopic activity of the presynaptic population, the synaptic input current, Eq. (1), can be rewritten as Thus, for static synapses, the synaptic input current is completely determined by the past population activities {A N (t )} t <t . This property is crucial for mesoscopic population models that are formulated in terms of mesoscopic population activities [12,37]. In particular, in simulations of such mesoscopic models forward in time, the information about past population activities is available at each point in time and can thus be used to compute the input current at present time. In our approach, we thus aim at finding a dynamics of the synaptic input conditioned on the history of the population activity A N (t). We would like to stress that this aim is markedly different from well-known diffusion approximations [38,39] (see also [40,41] for examples in neuroscience), where a jump process is approximated, when jumps become frequent and small, by a diffusion process. In particular, a diffusion approximation would yield a stochastic dynamics if conditioned on the past population activities, in stark contrast to the deterministic conditional dynamics, Eq. (4). Finding a deterministic relationship between the synaptic input and a given realization A N (t), as in Eq. (4), is no longer possible in the case of dynamic synapses. In this case, the input spike trains s j (t) are modulated by a dynamic factor R j (t) modeling the effect of STP. In contrast to static synapses, Eq. (1), the synaptic current for dynamic synapses reads Inverting the sum and the integral we get where we introduced the total postsynaptic input (TPSI) y N (t). Equation (6) shows that determining the synaptic input I N syn (t) from the knowledge of the population activity A N (t) is an underconstrained problem because y N (t) is a weighted average of spike trains whereas A N (t) is an unweighted average. Thus, in mesoscopic simulations, we expect that the synaptic input is strongly but not fully constrained by the knowledge of past population activities. To capture STP in a mesoscopic model, our approach is to find an approximate mean-field dynamics for y N (t) by introducing additional mesoscopic variables that are driven by the population activity and noise that accounts for the inevitable uncertainty of y N (t) given the history of the population activity. Importantly, such mean-field dynamics would be mesoscopic in the sense that its simulation does not require simulating all the individual presynaptic neurons and synapses but only a few mesoscopic variables. In addition, for such an approximation to be useful for coarse-graining, its numerical implementation should be computationally economical compared with the simulation of the full microscopic model. Now we fully specify the dynamics of y N (t) as defined Eq. (6) (the microscopic model) in Sect. 2.2 and derive its mesoscopic approximation in Sect. 2.3.

Microscopic model
In order to fully specify the dynamics of y N (t) defined in Eq. (6) given the collection of spike trains {s j } j=1,...,N , we need to define the dynamics of {R j } j=1,...,N , which modulates the amplitude of the spikes in s j (t). For each synapse j, the time evolution of R j is deterministic given s j and follows the Tsodyks-Markram model for STP [30]. The modulation factor R j (t) can be seen as the amount of neurotransmitter that would be released if a spike occurs at time t. It is given by the product of two synaptic variables, R j (t) = u j (t -)x j (t -), where x j (t) is the amount of available synaptic resources and u j (t) is the utilization of available resources (i.e. the fraction of these available resources that would be released if a spike occurs) at synapse j. Given the presynaptic spike trains s j (t), these variables obey the dynamics where τ F and τ D are the facilitation and depression time constants, respectively, U 0 is the baseline utilization of synaptic resources, U determines the increase in the utilization of synaptic resources by a spike. Here, u j (t -) is a shorthand for the left limit at time t. Note that the presence of the product u j x j in Eq. (7b) introduces a nonlinear coupling between the dynamics of u j and x j . Having specified the STP dynamics, we can rewrite the TPSI, Eq. (6), as follows: In the next section, we present our main result, which provides a mean-field approximation y(t) that is determined by the history of the mesoscopic presynaptic population activity A N (t) rather than individual presynaptic spike trains s j (t) or synaptic variables u j and x j .

Mesoscopic approximation
To relate the TPSI, Eq. (8), to the presynaptic population activity we consider the collection {t k } k∈Z + of all spike times of the superposition spike train S N (t) := NA N (t) = j s j . At the mesoscopic scale, we only know that there is a presynaptic spike arriving at time t k but we do not know at which synapse. That is, we do not know the mapping j(k) that maps spike arrival times t k to specific synapses j. In terms of the spike times t k and the mapping j(k), the population activity and the TPSI can be rewritten as and respectively. Equation (10) still contains the microscopic variables u j and x j and thus the precise spike arrival times t j k at synapse j. To derive a mean-field approximation y(t) that only depends on the spike times {t k } k∈Z + of the superposition spike train S N (t), we make two approximation steps: (i) a randomization of synapse indices j(k) at each spike time t k and (ii) a Gaussian approximation concerning the variables u j(k) (t) and x j(k) (t) determining the "amplitudes" of the delta spikes in Eq. (10). The purpose of the first approximation step is a probabilistic description of the values u j(k) (tk ) and x j(k) (tk ) upon spiking that accounts for the lack of knowledge about synapse identities j(k) at spike times t k . This step rests on an assumption on the law of {s j } j=1,...,N . In feedforward models with biologically realistic spike train statistics or in recurrent networks of spiking neurons, this assumption has to be understood as a useful approximation as it is not satisfied in general.

Assumption 1
The set of spike trains {s j } j=1,...,N has the same law as a set of spike trains {s * j * } j * =1,...,N that is constructed by picking independently and uniformly, for each spike time {t k } k∈Z + of the superposition spike train S N (t), an integer j * (k) between 1 and N and assigning a spike to neuron j * (k) at time t k .
An important case for which Assumption 1 is satisfied is the case where {s j } j=1,...,N is given by N independent Poisson processes with the same (time-varying) intensity. This special case will be extensively used in numerical simulations. Note that Assumption 1 is strictly weaker than assuming that the spike trains of the neurons are given by independent Poisson processes. Here is a simple counterexample: consider the process where at each time t k = t 0 + kτ for k ∈ Z + andτ > 0, a spike occurs at neuron j k , j k being chosen randomly, uniformly and independently at each time t k . This process satisfies Assumption 1 but is clearly not Poisson. Assumption 1 will be useful in the following because it will enable us to replace the unknown synapse j(k) at some spike time t k by a randomly chosen synapse j * (k). This randomization of synapse indices yields a new set of presynaptic spike trains {s * j } j=1,...,N whose population activity A * (t) := 1 N N j=1 s * j (t) is identical to the population activity A N (t) and which by assumption is statistically equivalent to the original spike trains {s j } j=1,...,N . Furthermore, the spike trains {s * j } j=1,...,N induce new STP variables u * j and x * j that obey, analog to Eqs. (7a)-(7b), the dynamics with initial conditions u * j (t 0 ) = u j (t 0 ), x * j (t 0 ) = x j (t 0 ). Assumption 1 implies that the jump processes [u * j (t), x * j (t)], j = 1, . . . , N , have the same law as the original STP variables [u j (t), x j (t)], j = 1, . . . , N .
Using the random indices j * (k), we define a two-dimensional càdlàg jump process i.e.û * (t) andû * (t) are equal to u * j (t) and x * j (t) of that neuron j which had the most recent spike prior or equal to time t. In the following, the process {(û * (t),x * (t))} t>t 0 will appear in the STP dynamics only as prefactors of Dirac-delta spikes. Therefore, only the values (û * (t k ),x * (t k )) at times t k will enter the dynamics. The formulation at all time t > t 0 , Eq. (12), will allow us to factorize modulated spike trains into modulation factor and population activity. For example, using the process (û * (t),x * (t)) we can define the new quantity where the second equality in Eq. (13) follows from the properties of the Dirac-delta function. In the last equation, we have introduced the shorthand notationû * :=û * (t) and x * :=x * (t). By construction of the processesû * (t) andx * (t), we can state an important fact.
Fact 1 Under Assumption 1, the processes y N (t) and y * (t) have the same law. Moreover, conditioned on the population activity A N (t), both processes have the same spike times.
Therefore, we expect that y * (t) yields an accurate approximation of the TPSI y N (t) that is not only statistically equivalent but also captures the precise realization of spike times of the mesoscopic activity A N (t). However, y * (t) is not yet useful for a mesoscopic simulation because the process (û * (t),x * (t)) is constructed from the microscopic variables u * j (t) and x * j (t), j = 1, . . . , N , which still need to be simulated individually.
In order to obtain an approximation that does not rely on microscopic simulations, we introduce new mesoscopic variables given by the empirical means and covariances of u j (t) and x j (t): Similar to y N , we can find an accurate approximation for the new mesoscopic variables using analog definitions based on the asterisk ( * ) variables: have the same law, we can state the following.
are multi-dimensional jump processes that have the same law. Moreover, conditioned on the population activity A N (t), the two jump processes have the same jump times.
The proof is presented in Appendix A. So far, we obtained approximations y * and [u * , x * , P * , Q * , R * ], in whichû * andx * are chosen randomly at spike times so as to account for the missing information about synapse identities at the microscopic scale. However, the process [u * (t), is not yet useful for a mesoscopic simulation because the processesû * (t) andx * (t) require the simulation of the microscopic variables u * j (t) and x * j (t), j = 1, . . . , N . To resolve this problem, we make a Gaussian approximation of the two-dimensional random variables (û * (t k ),x * (t k )) that is solely based on mesoscopic variables. We note again that only the values (û * (t k ),x * (t k )) at spike times matter for the STP dynamics marked with asterisks, Eqs. (14) and (17a)-(17e), because they only appear as prefactors of Dirac delta spikes. For a Gaussian approximation of [û * (t k ),x * (t k )], we need its first two empirical moments given by Eqs. (16a)-(16c). This means that at each spike . To close the system, we also need to approximate [u * , x * , P * , Q * , R * ] by new variables [u, x, P, Q, R] that obey Eqs. (17a)-(17e) but in which [û * (t k ),x * (t k )] is replaced by its Gaussian approximation [û(t k ),x(t k )]. Likewise, we use [û(t k ),x(t k )] in Eq. (14) to approximate the TPSI y * by a variable y. Thus, we obtain our main result.

Second-order mean-field approximation Let {t k } k∈Z + be the spike times of superposition spike train S = NA N . Consider the process (u(t), x(t), P(t), Q(t), R(t),û(t),x(t)) defined by the system of equations
, and for the irrelevant values between spikes we arbitrarily set (û(t), The mesoscopic mean-field approximation of the synaptic input is (18g)

Remarks on the approximation
(1) Various initial conditions are possible for the mesoscopic dynamics. One reasonable choice is The approximation is mesoscopic because the process (u, x, P, Q, R,û,x) defined by Eqs. (18a)-(18e) and (18f) does not involve any microscopic simulations. The process is solely driven by the mesoscopic population activity A N (t).
The key heuristic we use is the Gaussian approximation of the random variables u j * (k) (tk ) and x j * (k) (tk ) at spike times t k . (3) In principle, the jump process, Eqs. (18a)-(18g), can be simulated by a discrete-time forward scheme, where at each spike time t k a bivariate Gaussian random number is drawn. In practice, however, such a simulation would be inefficient because for large N the rate of spike times t k is large and thus the discretization time step needs to be chosen extremely small in order to resolve each spike time. In Appendix B, we present an efficient simulation scheme that allows many spikes per time step. (4) The process (u, x, P, Q, R,û,x) associated with the mesoscopic approximation is well defined. First, as the number of neurons N is finite, we can safely assume that the number of spike in the population is finite on finite time intervals a.s. (all spiking neuron model have a finite number of spikes on finite time interval a.s.). Between spikes, the evolution of the process is deterministic and easy to see. The more tricky increments are at spike times {t k } k∈Z + . At spike time t k , we first draw a random sample (û(t k ),x(t k )) according to Eq. (18f). Then we updateû andx. Finally, we update u, x, P, Q and R according to Eqs. (18a)-(18g). Thus, (u, x, P, Q, R,û,x) is a jump process with random jump sizes, even when conditioned on A N . (5) It is plausible that the second approximation step, in which the process Here is a heuristic argument: (6)). Assuming that is smooth and bounded, I N syn (t) is determined by the small time step integrals t + t t y N (t ) dt , for all t < t and t > 0 arbitrarily small. If the number of spikes in the interval [t , t + t] grows linearly with N , and if t is small enough, under Assumption 1, we can see (heuristically) t + t t y N (t ) dt as the sum of αN (α is some constant) i.i.d random jumps scaled by 1/N . Hence, by a central limit theorem type of argument, t + t t y N (t ) dt only depends on the mean and the variance of the random jumps. In other words, even if the [û * ,x * ] do not become Gaussian when N is large, Eqs. (18a)-(18g) should become accurate when N is sufficiently large because what matters (according to the above heuristic argument) is that [û,x] has the right mean and variance. A rigorous proof of this heuristic argument would be of mathematical interest but goes beyond the scope of the present work. (6) By construction, the process {y(t)} t>t 0 can be easily conditioned on the process {A N (t)} t>t 0 : for any realization of the process A N (t), there is a well defined conditioned process y |A N (t). This is a very practical feature because it means that we have a well defined approximation for any A N (t), stochastic or deterministic. Note that the process A N (t) does not need Assumption 1 to be satisfied to be well defined. Hence, y |A N (t) is well defined even if Assumption 1 is not satisfied. Furthermore, this conditioning feature allows us to generalize the current mesoscopic approximation for feedforward networks to general mesoscopic networks as it will be explained in Sect. 3.
Instead of the Gaussian approximation, Eq. (18f), it is tempting to consider a first-order approximation, whereû * andx * are approximated by the empirical means u * and x * neglecting their variance and covariance. Setting the covariance matrix in Eq. (18f) to zero yields the following.

First-order mean-field approximation We have
This approximation is very similar to the classic mean-field equations derived for N → ∞ by [30] except that it is driven by a sum of spike trains A N (t) = 1 N N j=1 s j (t) and not a continuous rate r(t). In this paper, we name Eqs. (18a)-(18g) the second-order mean-field theory (abbreviated second-order MF) and Eqs. (19a)-(19c) the first-order mean-field theory (abbreviated first-order MF).
In the rest of this work, we compare numerically the more sophisticated second-order MF to the simpler first-order MF. In this section, we focus on the case where the presynaptic spike trains {s j } j=1,...,N are given by N independent Poisson processes with constant rate because in this case, Assumption 1 is satisfied. In Appendix B, we provide an efficient simulation algorithm for Eqs. (18a)-(18g).
Trajectories of u j , x j and R j as well as u, x and R obtained from a microscopic simulation are shown in Fig. 2. The mesoscopic variable R is tracked by the second-order MF dynamics with high accuracy (Fig. 2(b4)). The first-order MF also yields reasonable results, although a small deviation in the mean R over time is apparent (Fig. 2(b4)). This is consistent with previous findings [30], where it has been shown analytically that in the stationary case, relative correlations are small but significant. Note that the second-order MF distinguishes two sources of finite-size noise: noise that comes from the finite-size fluctuations of A N and second, noise that comes from the sampling ofû andx at each spike. This second source of noise is absent in the first-order MF. In a numerical simulation with time step t, it is possible to isolate this second source of noise: if N · t · r (where r is the rate of the Poisson process) is a strictly positive integer α, we can choose, independently at each time step, α neurons uniformly across the N neurons and make them spike. This procedure generates N discretized Poisson spike trains of rate r with a "constant" population activity A N over time. Here, "constant" means that the time-discretized population activity A N (t) = n(t)/(N t) in the numerical simulation with time step t is constant, i.e. the total number of spikes in [t, t + t) is fixed. Note that in this case, the spike trains are not independent of each other but this does not affect our derivation. This procedure is followed in Fig. 2(c1-4): the first-order MF predicts noiseless STP dynamics whereas the second-order MF accurately reproduces the residual finite-size fluctuations.
The deviations of the first-order MF become more pronounced during non-stationary transients caused by stepwise increases of the rate of the Poisson process (Fig. 3). The response to step increases accurately traced by the second-order MF, but not by the firstorder MF which neglects the correlations between u j and x j .

Statistics of the total postsynaptic input
To compare microscopic and mesoscopic descriptions more systematically, we measured the first-and second-order statistics from simulations for varying parameters. At first, we computed the mean of the modulation factor R for the stationary process ( s j (t) = r = const.) using the microscopic dynamics, where · denotes the ensemble (trial) average, i.e. the average over realizations of the Poisson processes s j (t). The mean TPSI is proportional to the mean modulation factor because y = 1 N j u j x j s j = R r. This simple proportionality follows from the fact that u j (t)x j (t) at time t is uncorrelated with s j (t) at the same time because of the Poisson statistic of the spike train and the update of variables after a spike; cf. Eqs. (7a) and (7b). As known from previous work [30], the mean modulation R of a facilitating synapse increases with increasing firing rate of presynaptic neurons when firing rates are small, and decreases again at high rates due to depression (Fig. 4(C)). The first-order MF shows small deviations in the mean modulation R , which are removed by the second-order MF (Fig. 4(C)). A closer inspection of the full parameter regime reveals that the deviation of first-order MF never On the x-and y-axes, τ F · r is a unitless quantity. In (A), the maximum relative error is 4.7% for the first-order MF and 0.3% for the second-order MF. In (B), the maximum relative error is 28.6% for the first-order MF and 4.0% for the second-order MF. As scaling τ F and τ D is equivalent to scaling the firing rate r, the relative error at different firing rates can be read moving along the diagonal (white line). (C) Mean modulating factor R t over time predicted by the first-and second-order MF (dotted blue and solid red lines respectively) compared to microscopic simulations (dashed black line) as a function of the firing rate r for a specific set of synaptic parameters. (D) TPSI variance over time (Var(y) t ) predicted by the first-and second-order MF compared to microscopic simulations as a function of the firing rate r for a specific set of synaptic parameters. Synaptic parameters used in (C)-(D) correspond to the white line in (A)-(B) and are: τ D = 0.3 s, τ F = 0.3 s and U = U 0 = 0.2. Simulation time step is 0.5 ms exceeds 5% (Fig. 4(A)). Therefore the stationary mean TPSI is sufficiently well explained by the first-order MF.
Also, we compared the statistics of fluctuations of the TPSI by measuring the respective power spectral densities (PSD). The PSD can be computed as whereỹ(f ) = T 0 dt exp(2πift)y(t) denotes the Fourier transform of y(t) for a finite but large enough time window T. We found that the second-order MF significantly better captured the variance (Fig. 4(D)) and the PSD (Fig. 5) of the stationary fluctuations than the first-order MF. A closer inspection of the coefficient of variation of the fluctuations, Var(y)/ y , over the full parameter space revealed that the first-order MF deviated up to 30% (especially for slow synaptic dynamics or high rates, Fig. 5(ii), (iv)), whereas the second-order model performed well in the whole parameter space (Fig. 4(B)). We should specify that the error of the first-order MF is negative, i.e. the first-order MF underestimates the coefficient of variation up to 30%. This comes from the fact that in the mesoscopic equations for the first-order MF equations (19a)-(19c), finite-size fluctuations of the mesoscopic variables u and x are ignored.
In conclusion, while mean responses for stationary cases are well captured by the firstorder MF, the second-order MF gives a significantly better description of transient responses (Fig. 3)

Microscopic model
As shown in our previous work [12], networks of multiple interacting homogeneous populations of spiking neurons, with static synapses, can be accurately predicted with a mesoscopic model. Incorporating the effect of STP in this general model using the mesoscopic approximations of Sect. 2.3 for the feedforward case is actually easy. This is due to the fact the mesoscopic approximation y(t) (both the first-order MF and the second-order MF) can be conditioned on the population activity A N (t), as mentioned in Sect. 2.3. In order to illustrate this, we consider for simplicity the special case of a single population with recurrent connectivity. The network architecture is random with fixed in-degree C = pN , where N is the number of neurons in the population and p is the connection probability. The synaptic strength is constant with magnitude w (in mV). The TPSI y i (t) and the synaptic current I syn,i (t) driving the postsynaptic neuron i are related by Eq. (1) with J = τ m R m Npw. In this paper, we use a synaptic filtering kernel with instantaneous rise and exponential decay corresponding to the first-order kinetics τ s dI syn,i dt = -I syn,i + Jy i (t), where τ s is the synaptic filtering time constant. Importantly, the effect of STP is contained in the TPSI y i (t) = 1 C j∈Γ i u j (t)x j (t)s j (t) via the synaptic variables u j and x j given by the dynamics Eq. (7a) and (7b). Here, Γ i denotes the index set of presynaptic neurons that connect to neuron i.
As our derivation of the mesoscopic theory of STP uses the assumption that neurons have Poisson statistics, we first apply our theory to Poisson rate neurons, which do not exhibit a dependence on spike history. Then, using the same setup, the theory will be applied to a network of generalized integrate-and-fire (GIF) neurons with pronounced spike-history effects.

Poisson rate model
The Poisson rate model is defined by a continuous variable h i (t), called input potential. The input potential of neuron i obeys the dynamics where τ m represents the membrane time constant and μ(t) = V rest + R m I ext (t) is the total drive in the absence of synaptic input (constant resting potential V rest and common external stimulus I ext (t)). In the fully-connected network, the synaptic current I syn,i (t) is the same for all neurons i and is given by Eq. (22). In each time interval [t, t + dt), spikes are drawn with probability λ i (t) dt, where the firing rate λ i (t) depends on the input potential as follows: Here, we choose a sigmoidal transfer function of the form Φ(h) = r max /[1 + exp(-β(hh 0 ))] and h 0 = 0 mV.

GIF model
The GIF model for the postsynaptic neuron dynamics is determined by the membrane potential V i (t), the dynamic threshold ϑ i (t) and a conditional intensity λ i (t) for the stochastic spike generating mechanism. Here, the index i = 1, . . . , N represents the neuron label. Between the spikes, the membrane potential satisfies the dynamics where the quantities τ m , μ(t), R m and I syn,i (t) are the same as in the rate model above. After a spike, the voltage is immediately reset to the potential V r , where it is clamped for an absolute refractory period t ref = 4 ms.
Spikes are generated by a conditional intensity (hazard rate) of an exponential form: if the last spike was emitted less than t ref ago, where V th is a constant. This means that the conditional intensity, and hence the probability λ i (t) dt to fire in the interval [t, t + dt), depends on the momentary distance between the membrane potential and threshold. This completes the definition of the microscopic model.

Mesoscopic mean-field model
As explained in [12], the random connectivity can be well approximated by a fully connected network (C = N ), with rescaled synaptic weights pw, corresponding to a mean-field approximation. In the following, we shall therefore choose p = 1 unless stated otherwise. In the mean-field approximation, the TPSI y(t) and the synaptic current I syn (t) do not depend on the identities j of the postsynaptic neurons and are related by Eq. (1) with J = τ m R m Npw. For the case of exponential synapses, the synaptic current reads whereû andx obey the mean-field equations (18a)-(18g). As shown in [12], the population activity A N (t) can be determined by a single mesoscopic variable, the instantaneous rate r(t), whose dynamics depends on the microscopic model and the history of the population activity H t = {A(t )|t < t}. Specifically, A N (t) is given by the normalized spike train where n N (t) is a counting process with (conditional) intensityλ(t) = Nr(t) representing the total number of spikes in the population up to time t. The second equality means that A N (t) is proportional to a spike train with spike times t k generated with rate Nr(t). This is similar to the superposition of Poisson spike train in the feedforward case, Sect. 2, where the pooled spike train also exhibits the rate Nr(t).

Poisson rate model
In the Poisson rate model, the rate r(t) is given by Importantly, the coupling to the STP dynamics is contained in the synaptic current I syn governed by Eq. (27) and the synaptic mean-field dynamics given by Eqs. (18a)-(18g).

GIF population model
For the model with spike-history dependence, the rate r(t) is obtained from an integral over refractory states. A possible representation of the neuronal refractory state is given by the time τ since the last spike ("age of the neurons"; an alternative representation in terms of the last spike timest = tτ is given in Appendix D). In the following, we assume that the process is initialized at t 0 = -∞. Given the distribution of ages in the population, q(τ , t), the rate at time t results from [12], where the density q(τ , t) evolves according to the first-order partial differential equation with time-dependent boundary condition at τ = 0: Here, A N (t) is given by Eq. (28). In Eqs. (31) and (32), the functions λ and Λ are given by where V and W are dynamical variables that obey the following dynamics: The agedependent membrane potential V (τ , t) and variance function W (τ , t) follow the first-order partial differential equations with boundary conditions V (t, 0) = V r and W (t, 0) = 0. The coupling to the STP mean-field dynamics, Eqs. (18a)-(18g), is contained in the synaptic current I syn governed by Eq. (27), which influences the voltage V (t, τ ) (Eq. (34)), and hence changes λ(t, τ ) and Λ(t).
The population equations (31)- (35) have been efficiently integrated numerically by the algorithm presented in [12]. The numerical integration of the STP mean-field dynamics is given in Appendix B.

Finite-size noise induced population spikes
An interesting example of collective neural dynamics, potentially linked to synaptic depression, is the phenomenon of population spikes in cultured neural networks [37]. We asked whether population spikes, a brief period of high average population activity, can be explained by our finite-size population theory with STP. As in previous work [37], we considered a single excitatory population endowed with STP. The mesoscopic mean-field equations allowed us to choose parameters of this model such that the macroscopic meanfield dynamics (N → ∞, see Eqs. (56a)-(56e)) is in an excitable regime for the secondorder MF but not for the first-order MF. Here excitable regime means that the macroscopic dynamics converges to an equilibrium point if the total drive remains below a certain threshold. However, if the threshold is exceeded (e.g. by a brief excitable stimulus or an increase of recurrent synaptic excitation), the activity rises rapidly to large values due to the positive feedback of recurrent excitation. The explosive rise of the activity is terminated by the beginning of synaptic depression, which acts as negative feedback and ultimately As expected for an excitable system driven by noise [42], the population activity exhibits irregular population spikes if the population size is small (here N = 100), i.e. if finite-size noise is sufficiently strong (Fig. 6(A)(i)-(iii)). In our case, the drive that causes population spikes originates from finite-size fluctuations as expressed by the stochastic terms in Eq. (28). For N = 100, the second-order MF accurately predicts the mean activity ( Fig. 6(B)) and power spectrum ( Fig. 6(C)) of the full microscopic simulation whereas the first-order MF deviates quantitatively. Importantly, in the limit of large population size, population spikes vanish in the second-order MF theory consistent with microscopic simulations (Fig. 6(A)(vi), (iv), N = 5000). In marked contrast to microscopic simulations, highly regular population spikes persist even for N = 5000 in the first-order MF approximation corresponding to a deterministic limit-cycle dynamics (Fig. 6(A)(v)).
In summary, the second-order MF approximation accurately reproduces the qualitative behavior and the mean and the power spectrum of excitatory networks of Poisson neurons with synaptic STP. The statistical properties of the first-order MF dynamics exhibit quantitative deviations of statistical properties and in some cases fails to reproduce the qualitative behavior if the system is poised near a bifurcation. The large discrepancies between the first-order MF and the microscopic model in the example we show (Fig. 6) are mainly caused by the error in the mean modulation factor R. Indeed, for our choice of τ D = τ F = 1 s, correlations between u j and x j are relatively strong but are neglected by the first-order MF. We note that this error appears already for the deterministic (i.e. N → ∞)  Fig. 6). A microscopic network of N = 1000 neurons is simulated with three different connection probabilities: p = 1.0, 0.2 and 0.1. Synaptic weights w are set such that the rescaled synaptic weights pw is constant. These three microscopic networks share the same mean-field approximation. Notice that the second-order MF is more accurate that the first-order MF even if the connection probability is reduced. Here, the model parameters are slightly modified from Fig. 6 to ensure that population spikes occur in the microscopic simulation 1000 neurons. Parameters are detailed in Appendix E. Superscript ( N ) of A N is omitted in the legends dynamics. Inaccuracies in the correct description of finite-size noise in the first-order MF model may yield additional sources of errors.
Furthermore, the second-order MF remains accurate when the connection probability p of the network is smaller than 1 (given that the number pN of incoming synapses per neuron is roughly greater than 100). When p is smaller than 1, we argue in Sect. 3.2 that the microscopic network can be approximated by a fully-connected network with rescaled synaptic weights (mean-field approximation). We test this claim numerically in Fig. 7: using the same setup as in Fig. 6, for a population size of N = 1000 neurons, we show that the second-order MF yields more accurate prediction than the first-order MF even when the connection probability is reduced down to 0.1.

Bistable switching between Up and Down states induced by finite-size fluctuations
Another collective phenomenon in neural networks is multistability. In the presence of finite-size noise, systems with multistable behavior exhibit switches between different attractor states [12,43,44]. In particular, bistable neural systems driven by noise support stochastic switches between high and low population activity ("Up and Down states") [12,31,45,46]. As a starting point of our simulations of Up and Down states, following [31], we use an excitatory population with synaptic depression in the bistable regime. The qualitative behavior of the microscopic model exhibiting Up and Down states is captured by both first-and second-order MF (Fig. 8). A closer look at the mean firing rate, which is mainly determined by the ratio of the time spent in the Up or Down state, reveals that the first-order MF dynamics predicts significantly longer residence times in the Up state ( Fig. 8(B)). In contrast, the second-order MF approximation accurately matches the simulation of the microscopic model. In this example we have chosen τ D τ F such that the correlations between u j and x j are negligible. As the consequence, the mean modulation factors R predicted by the first-and second-order MF theories, and hence the mean TPSI y , are almost equal (cf. Fig. 4(A)). The error made by the first-order MF approximation

Recurrent network of GIF neurons-microscopic vs. mesoscopic simulations
As a final demonstration of the mesoscopic MF theory with STP, we consider an excitable regime generating population spikes as in Sect. 3.3.1 but with more realistic neurons described by a GIF spiking neuron model (Sects. 3.1.2 and 3.2.2). Because of spike-history dependencies such as refractoriness, spike arrivals at synapses are no longer Poisson processes. Hence, the Poisson assumption of first-and second-order MF theories is not fulfilled anymore for recurrent GIF networks. Nevertheless, it is interesting to see whether population spikes can still be captured by the mesoscopic MF equations. To this end, we simulated the recently developed mesoscopic population equations for populations of GIF neurons [12] given by Eqs. (22), (31)-(35) extended by the MF equations for STP, Eqs. (18a)-(18g). The full MF theory qualitatively reproduces population spikes at small population sizes (Fig. 9(A)) and their extinction for large populations (Fig. 9(B)). Both the mean (Fig. 9(C)) and fluctuation statistics ( Fig. 9(D)) are roughly captured by the MF equations albeit with small deviations from the microscopic simulation. However, a clear advantage of 2nd vs. first-order approximation is not apparent. This indicates that the second-order approximation does not necessarily yield a better approximation for net-

Discussion
We have derived stochastic mean-field (MF) equations that capture the effect of synaptic short-term plasticity (STP) at the level of populations. These equations generalize previous MF theories for deterministic population rates [4,30,31] to the case of finite-size populations with stochastic population rates (mesoscopic description). The mesoscopic STP dynamics is compatible with a recent mesoscopic population model [12], which has been originally derived for static synapses. The mesoscopic MF dynamics of STP can thus be easily included into existing mesoscopic models. We find that a first-order mean-field approximation that accounts for stochastic rates but neglects correlations between facilitation and depression variables (as in [47]) approximates well the mean stationary input. This mean input is slightly improved by a second-order approximation, which accounts for correlations but neglects third and higher-order cumulants. The main strength of the second-order MF theory lies in the prediction of fluctuations and transient responses of the STP variables. We have shown that population spikes and UP and Down state switches in a one-population model with synaptic depression can be well described by the extended mesoscopic model. In particular, the second-order MF equations accurately replicate simulations of a network of Poisson neurons coupled via dynamic synapses. For networks of GIF spiking neurons the agreement is less accurate but still captures the qualitative collective dynamics.
In simulations of neuronal populations with STP, our mesoscopic mean-field model yields a considerable reduction of computational complexity. Compared to a network with static synapses, each neuron is endowed with two additional variables u j and x j that capture the effect of dynamic synapses onto its postsynaptic target neurons. In a single population of N neurons and connection probability p, a microscopic simulation thus requires the numerical integration of 2pN additional equations. By contrast, a simulation of the mesoscopic model only needs 4 additional equations per population. Thus we expect that our extended mesoscopic dynamics offers a significant speed up of large-scale simulations of cortical circuits with dynamic synapses [7].
An interesting question that has been studied theoretically [27][28][29] is how STP affects information transmission through a large ensemble of dynamic synapses. Our reduction of a synaptic ensemble to a four-dimensional nonlinear mean-field dynamics offers a mathematical framework to derive approximate analytical expressions for measures of information transmission. Analysing information processing capabilities of STP in the context of our mean-field theory is an interesting topic for future studies.
We have employed the deterministic STP model of Tsodyks and Markram [30]. While the resulting mean-field equations hold for this specific model, the same approach can be applied straightforwardly to other deterministic models of STP (e.g. [21,48]). It is less obvious how to treat stochastic models of STP. Biological synapses are highly stochastic owing to the small number of synaptic vesicles that are randomly released upon spike arrival. This includes a finite probability of transmission failure. Using stochastic models of STP, it has been shown that synaptic stochasticity has a strong impact on information transmission [28] and postsynaptic neural responses [49]. On the population level, it seems to be feasible to treat this source of randomness in a similar manner as we did in Sect. 2.3. A generalization to a mesoscopic STP model that is applicable for stochastic synapses, will be an important subject for further studies.
The mean-field equations for the STP dynamics have been derived under the assumption that presynaptic spike trains are, loosely speaking, Poisson (Assumption 1). We have tested the mean-field equations in a feedforward setup and a recurrent network of Poisson rate units, where Assumption 1 holds true, and we found excellent agreements with microscopic simulations. For the application to recurrent networks of generalized integrateand-fire (GIF) neurons in Sect. 3.4, Assumption 1 is not fulfilled because of refractoriness and other spike-history dependencies of single neurons [50,51]. Despite the non-Poisson (colored-noise) statistics of spike trains in integrate-and-fire networks, a Poisson (whitenoise) assumption is commonly used in mean-field theories as a "first-order approximation" [52]. In a similar spirit, we here simply assumed that synaptic input can be treated as a Poisson process so as to apply our MF theory for STP to networks of GIF neurons. For a simple one-population model with excitatory synaptic connections and STP that exhibits non-trivial dynamics in the form of population spikes, we have shown that the MF equations reproduce the qualitative behavior. This indicates that the MF theory may be valid beyond networks of Poisson neurons. A theoretical analysis of the effect of non-Poisson inputs and the region of validity of the present MF model is beyond the scope of the present paper and remains to be studied. To this end, theoretical approaches to treat dynamic synapses driven by renewal processes [53][54][55] might be a promising starting point.
Note that although the derivation of the mesoscopic mean-field approximation is systematic we do not obtain any mathematical guarantee that the process y(t) is close (in any sense) to the process y N (t) we want to approximate. In the case where the spike trains {s j } j=1,...,N are N independent Poisson processes of rate λ(t), it might be possible to prove that the processes y(t) and y N (t) converge to the same diffusion approximation as N tends to infinity. Obtaining such a proof would be, however, challenging because in our case, we do not know around which mean value the process fluctuates, i.e. we do not know the deterministic N → ∞ limit. This is due to the product u j x j in Eq. (7b) and the fact that we do not have a closed form expression for the limit of 1 N N j=1 u j x j when N → ∞. Note that if one considers purely facilitating (or purely depressing) synapses, the deterministic N → ∞ limit can be computed (see [32]). Even in the case of purely facilitating or depressing synapses, deriving the diffusion approximation for the evolution of the mesoscopic variables u(t) or x(t) would be non-trivial because the jump sizes are modulated by the microscopic u j and x j (see Eqs. (38a)-(38b)). These are two independent reasons why standard techniques from the fluid limit literature (see [38][39][40]) cannot be applied directly. Also, one clear advantage of our current approach over the diffusion approximation approach is that our mesoscopic approximation can be conditioned on the process A N (t), i.e. it can be conditioned on any sequence of spike times {t k } k∈Z + .
In our previous work [12], we have developed a mean-field theory of neuronal populations that incorporates spike-history dependencies, such as refractoriness and adaptation, and finite-size fluctuations in a consistent manner. By adding another important feature-synaptic short-term plasticity-we have here made a further step towards a microscopically-grounded mesoscopic population model of a cortical circuit.

Appendix A: Proof of Lemma 2.1
To derive the system of equations (16a)-(16c), it is useful to rewrite the microscopic synaptic dynamics Eqs. (11a)-(11b) in differential form: Here, dn j (t) denotes the increment of the counting process n j (t) = t 0 s j (t ) dt at the jth synapse and uj is a shorthand for the left limit. The stochastic differential equations for the mesoscopic quantities u N (t) and x N (t) are then With some calculations, we can write similar equations for P N (t), Q N (t) and R N (t). For instance, to compute dP N we need to evaluate d(u 2 j ). Using Itô formula for jump processes, we get Similarly, we write We now compute the jump terms: Summing the differentials d(u 2 j ), d(x 2 j ) and d(u j x j ) over all N synapses, we obtain For notational simplicity, we have derived Eqs. (38a)-(38b) and (42a)-(42c) for the variables [u N , x N , P N , Q N , R N ] defined by Eqs. (15a)-(15c) starting from microscopic variables u j and x j defined by Eqs. (7a)-(7b). A completely analogous derivation can be performed for the variables [u * , x * , P * , Q * , R * ] defined by Eqs. (16a)-(16c) starting from the corresponding microscopic variables u * j and x * j defined by Eqs. (11a)-(11b). For the variables marked with asterisk, we thus have where u * j and x * j are shorthands for u * j (t -) and x * j (t -), respectively.
The sums over the weighted spike counts dn * j cannot be expressed as a deterministic function of the mesoscopic spike count dn N = j n * j (t). However, we can make use of the fact that in an infinitesimally small time interval dt at most one spike can occur and contribute to the sums, which thereby simplify considerably: dn * j is either zero for all j (i.e. dn N = 0), or there exists one and only one j for which dn * j = 1 (in this case dn N = 1). This implies that we can write for any function g(u * j , x * j ) multiplying dn * where j * (k) is the random synapse index at time t k and (û * (t),x * (t)) is given by Eq. (12). Applying relation (44) to Eqs. (43a)-(43e), we obtain we recover Eqs. (17a)-(17e) of the main text.

Appendix B: Efficient numerical implementation of the mesoscopic approximation
The mesoscopic STP dynamics given by Eqs. (18a)-(18g) are driven by a point process A N (t) or increments dn(t) that are multiplied by a stochastic factor of the form f (û(t),x(t)). In simulations, these stochastic amplitudes require some care. A straightforward discretization of Eqs. (18a)-(18g) would be to drawû(t) andx(t) from their joint Gaussian distribution (18f) in each time step independently, compute f (û(t),x(t)) and multiply by the number of spikes that occur in the discretization interval [t, t + t). However, this approach is only correct if the discretization time step t is small enough such that n contains at most one spike. Because spikes result from a population of many neurons, this condition would require an extremely small time step (such that Nr(t) t 1), and would thus yield a highly inefficient simulation algorithm. Luckily, the independence of the factors f (û(t k ),x(t k )) across spikes in the interval [t, t + t) allows us to use larger time steps that may contain multiple spikes: a due to the independence, the integration of the stochastic term in Eqs. (18a)-(18g) symmetrically, ε u ε 2 x ≈ 0; and finally, These approximations allow for the completion of the derivation. In summary, the Euler scheme corresponding to Eqs. (18a)-(18g) is u(t + t) = u(t) + u, x(t + t) = x(t) + x, P(t + t) = P(t) + P, Q(t + t) = Q(t) + Q, with increments given by Here, we abbreviated μ P = U P(U -2) -2u(U -1) + U , ε P = 2U 1 + u(U -2) -U ε u , μ Q = PQ -2Qu + 2 R + (u -2)x (Rux), ε Q = 2(u -1)x 2 ε u + 2u(u -2)xε x , The generation of correlated centered Gaussian random variables ε u (t) and ε x (t) with covariance matrix (49) can be implemented by standard methods. For instance, one may compute in each time step the correlation coefficient In the initial warm-up phase, it may happen that the numerical values of the variances Pu 2 and Qx 2 are non-positive or the absolute value of the correlation coefficient |ρ| exceeds unity. One simple practical solution to this problem is to set ε u and ε x to zero in these cases. Otherwise, we generate random variables by the formula where z 1 and z 2 are independent standard normal random numbers. Finally, we note that in numerical simulations it is convenient to operate on the spike counts n(t) rather than on the population activity A N (t). The discretized population activity can be easily obtained from the spike counts via the formula

Appendix C: Equations for infinite-size populations and steady-state formulas
A macroscopic theory of STP for infinite-size populations for what we call the first-order MF has been presented in [30]. We detail here the adaptation of our second-order MF to the case if infinite-size populations.
Note that these steady-states formulas are very useful for carrying out precise phase-plane analyses.

Appendix D: Mesoscopic population equations for network of GIF neurons-ODE representation
The equation for the population rate r(t) for GIF neurons involves the integral, Eq. (31), over all possible refractory states and a set of partial differential equations (so-called quasilinear equations) for the quantities q(τ , t), V (τ , t) and W (τ , t). Instead of the age of the neuron (i.e. the time since its last spike), the refractory state can be equivalently specified by the last spike timet = tτ . This variable transformation turns the partial differential equations into ordinary differential equations (ODEs), which yields an alternative formulation of the mesoscopic population equations. If the refractory state is specified by the last spike timet = tτ , we need to consider the density of last spike times Q(t, t) ≡ q(t -t, t). Instead of Q, it is slightly more convenient to write Q(t, t) = S(t|t)A N (t), where we introduced the survivor function S(t|t). With this notation the population rate, Eq. (31), becomes [12] Using the method of characteristics, the quasi-linear equation (32) for q(τ , t) has an equivalent ODE representation for the characteristic curves {Q(t, t)} t>t : dQ/dt = -λQ with initial condition Q(t,t) = A N (t). This corresponds to an ODE for the survivor function: dS(t|t) dt = -λ(t|t)S(t|t), S(t|t) = 1.