 Open Access
Laws of Large Numbers and Langevin Approximations for Stochastic Neural Field Equations
The Journal of Mathematical Neuroscience volume 3, Article number: 1 (2013)
Abstract
In this study, we consider limit theorems for microscopic stochastic models of neural fields. We show that the Wilson–Cowan equation can be obtained as the limit in uniform convergence on compacts in probability for a sequence of microscopic models when the number of neuron populations distributed in space and the number of neurons per population tend to infinity. This result also allows one to obtain limits for qualitatively different stochastic convergence concepts, e.g., convergence in the mean. Further, we present a central limit theorem for the martingale part of the microscopic models which, suitably rescaled, converges to a centred Gaussian process with independent increments. These two results provide the basis for presenting the neural field Langevin equation, a stochastic differential equation taking values in a Hilbert space, which is the infinite-dimensional analogue of the chemical Langevin equation in the present setting. On a technical level, we apply recently developed laws of large numbers and central limit theorems for piecewise deterministic processes taking values in Hilbert spaces to a master equation formulation of stochastic neuronal network models. Because these theorems are valid for processes taking values in Hilbert spaces, they are able to incorporate spatial structures of the underlying model.
Mathematics Subject Classification (2000): 60F05, 60J25, 60J75, 92C20.
1 Introduction
The present study is concerned with the derivation and justification of neural field equations from finite-size stochastic particle models, i.e., stochastic models for the behaviour of individual neurons distributed in finitely many populations, in terms of mathematically precise probabilistic limit theorems. We illustrate this approach with the example of the Wilson–Cowan equation

$$\tau\frac{\partial}{\partial t}\nu(t,x) = -\nu(t,x) + f\Big(\int_{D} w(x,y)\,\nu(t,y)\,\mathrm{d}y + I(t,x)\Big). \tag{1.1}$$
We focus on the following two aspects:

(A)
Often one wants to study deterministic equations such as Eq. (1.1) in order to obtain results on the ‘behaviour in the mean’ of an intrinsically stochastic system. Thus, we first discuss limit theorems of the law of large numbers type for the limit of infinitely many particles. These theorems connect the trajectories of the stochastic particle models to the deterministic solution of mean field equations, and hence provide a justification for studying Eq. (1.1) in order to draw conclusions about the behaviour of the stochastic system.

(B)
Secondly, we aim to characterise the internal noise structure of the complex discrete stochastic models, since in the limit of large numbers of neurons the noise is expected to be close to a simpler stochastic process. Ultimately, this yields a stochastic neural field model in terms of a stochastic evolution equation conceptually analogous to the Chemical Langevin Equation. The Chemical Langevin Equation is widely used in the study of chemical reaction networks for which stochastic effects cannot be neglected but a numerical or analytical study of the exact discrete model is not feasible due to its inherent complexity.
In this study, we understand by a microscopic model a description as a stochastic process, usually a Markov chain model, also called a master equation formulation (cf. [3, 5, 8, 9, 22], which contain various master equation formulations of neural dynamics). In contrast, a macroscopic model is a deterministic evolution equation such as (1.1). Deterministic mean field equations have long been used widely to model and analyse the large-scale behaviour of the brain. In their original deterministic form, they have been used successfully to model geometric visual hallucinations, orientation tuning in the visual cortex and wave propagation in cortical slices, to mention only a few applications. We refer to [7] for a recent review and an extensive list of references. The derivation of these equations is based on a number of arguments from statistical physics, and for a long time a justification from microscopic models was not available. The interest in deriving mean field equations from stochastic microscopic models has been revived recently, as this offers the possibility of deriving deterministic ‘corrections’ to the mean field equations, also called second-order approximations. These corrections might account for the inherent stochasticity and thus incorporate so-called finite-size effects. This has been achieved either by applying a path-integral approach to the master equation [8, 9] or by a van Kampen system-size expansion of the master equation [5]. In more detail, the author of the latter reference proposes a particular master equation for a finite number of neuron populations and derives the Wilson–Cowan equation as the first-order approximation to the mean by employing the van Kampen system-size expansion and then taking the continuum limit for a continuum of populations. By also keeping the second-order terms, a ‘stochastic’ version of the mean field equation is presented, in the sense that the equation for the first moments is coupled to an equation for the second moments.
However, the van Kampen system-size expansion does not give a precise mathematical connection: it neither quantifies the type of convergence (quality of the limit) nor states conditions under which the convergence is valid, nor does it allow one to characterise the speed of convergence. Furthermore, particular care has to be taken in systems whose macroscopic equation possesses multiple fixed points, and we refer to [5] for a discussion of this aspect in the neural field setting. The limited applicability of the van Kampen system-size expansion was already well known to van Kampen himself; see Sect. 10 in [33]. In parallel to the work of van Kampen, T. Kurtz derived precise limit theorems connecting sequences of continuous-time Markov chains to solutions of systems of ordinary differential equations; see the seminal studies [19, 20] or the monograph [15]. Limit theorems of this type are usually called fluid limits, thermodynamic limits, or hydrodynamic limits; for a review, see, e.g., [13].
As is thoroughly discussed in [5], establishing the connection between master equation models and mean field equations involves two limit procedures: first, a limit which takes the number of particles, in this case neurons per population, to infinity (thermodynamic limit), and second, a limit which yields the mean field by taking the number of populations to infinity (continuum limit). In this ‘double limit’, the theorems by Kurtz describe the first step: taking the number of neurons per population to infinity yields a system of ordinary differential equations, one for each population. The extension from a finite- to an infinite-dimensional state space is then obtained by a continuum limit. This procedure corresponds to the approach in [5]. Taking the double limit step by step, however, raises the question of what happens if we first take the spatial limit and then the fluid limit, thus reversing the order of the limit procedures, or if we take both limits simultaneously. Recently, in an extension of the work of Kurtz, one of the present authors and coauthors established limit theorems that achieve this double limit [27], and are thus able to connect finite population master equation formulations directly to spatiotemporal limit systems, e.g., partial differential equations or integro-differential equations such as the Wilson–Cowan equation (1.1). These limit theorems were derived in the general framework of Piecewise Deterministic Markov Processes on Hilbert spaces, which in addition to the jump evolution also allow for a coupled deterministic continuous evolution. This generality was motivated by applications to neuron membrane models consisting of microscopic models of the ion channels coupled to a deterministic equation for the transmembrane potential. We find that this generality is also advantageous in the present situation of a pure jump model, as it allows us to include time-dependent inputs.
In this study, we employ these theorems to achieve the aims (A) and (B) focussing on the example of the deterministic limit given by the Wilson–Cowan equation (1.1).
Finally, we state what this study does not contain, which in particular distinguishes the present study from [5, 8, 9] beyond mathematical technique. Our aim is not to derive moment equations, i.e., a deterministic set of equations that approximate the moments of the Markovian particle model, but rather processes (deterministic or stochastic) to which a sequence of microscopic models converges, under suitable conditions, in a probabilistic sense. This means that a microscopic model which is close to the limit (presently corresponding to a large number of neurons in a large number of populations) can be assumed to be close to the limiting process in structure and pathwise dynamics, as indicated by the quality of the stochastic limit. Hence, the present work is conceptually close to [30], though not in technique or results; there, using a propagation of chaos approach in the context of neural field equations, the author also derives in a mathematically precise way a limiting process for finite particle models. It is, however, an obvious consequence that the convergence of the models necessarily implies a close resemblance of their moment equations. This provides the connection to [5, 8, 9], on which we briefly comment in Appendix B.
As a guide, we close this introduction with an outline of the subsequent sections and some general remarks on the notation employed in this study. In Sects. 1.1 to 1.3, we first discuss the two types of models in more detail: on the one hand, the Wilson–Cowan equation as the macroscopic limit and, on the other hand, a master equation formulation of a stochastic neural field. The main results of the paper are found in Sect. 2. There we set up the sequence of microscopic models and state conditions for convergence. Limit theorems of the law of large numbers type are presented in Theorem 2.1 and Theorem 2.2 in Sect. 2.1. The first is a classical weak law of large numbers providing uniform convergence on compacts in probability, and the second provides convergence in the mean uniformly over the whole positive time axis. Next, a central limit theorem for the martingale part of the microscopic models is presented in Sect. 2.2, characterising the internal fluctuations of the model as being of a diffusive nature in the limit. This part of the study concludes in Sect. 2.3 with the Langevin approximations that arise as a result of the preceding limit theorems. The proofs of the theorems in Sect. 2 are deferred to Sect. 4. The study concludes in Sect. 3 with a discussion of the implications of the presented results and an extension of these limit theorems to different master equation formulations or mean field equations.
Notations and Conventions Throughout the study, we denote by $L^{p}(D)$, $1\le p\le \infty$, the Lebesgue spaces of real functions on a domain $D\subset \mathbb{R}^{d}$, $d\ge 1$. Physically reasonable choices are $d\in \{1,2,3\}$; however, for the mathematical theory presented, the spatial dimension can be arbitrary. In the present study, spatial domains $D$ are always bounded with a sufficiently smooth boundary, where the minimal assumption is a strong local Lipschitz condition; see [2]. For bounded domains $D$, this condition simply means that for every point on the boundary, its neighbourhood on the boundary is the graph of a Lipschitz continuous function. Furthermore, for $\alpha \in \mathbb{N}$ we denote by $H^{\alpha}(D)$ the Sobolev spaces, i.e., subspaces of $L^{2}(D)$, with the corresponding Sobolev norm. For $\alpha \in \mathbb{R}_{+}\setminus\mathbb{N}$ we denote by $H^{\alpha}(D)$ the interpolating Besov spaces. In this study, $H^{-\alpha}(D)$ is the dual space of $H^{\alpha}(D)$, which is in contrast to the widespread notation denoting by $H^{-\alpha}(D)$, $\alpha \ge 0$, the dual space of $H_{0}^{\alpha}(D)$. As usual, we have $H^{0}(D)=L^{2}(D)=H^{-0}(D)$. We thus obtain a continuous scale of Hilbert spaces $H^{\alpha}(D)$, $\alpha \in \mathbb{R}$, such that $H^{\alpha_{2}}(D)$ is continuously embedded^{a} in $H^{\alpha_{1}}(D)$ for all $\alpha_{1}<\alpha_{2}$. Next, a pairing $(\cdot ,\cdot )_{H^{\alpha}}$ denotes the inner product of the Hilbert space $H^{\alpha}(D)$, and pairings in angle brackets $\langle\cdot ,\cdot \rangle_{H^{\alpha}}$ denote the duality pairing for the Hilbert space $H^{\alpha}(D)$. That is, for $\psi \in H^{\alpha}(D)$ and $\varphi \in H^{-\alpha}(D)$ the expression $\langle\varphi ,\psi \rangle_{H^{\alpha}}$ denotes the application of the real, linear functional $\varphi$ to $\psi$.
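Furthermore, the spaces $H^{\alpha}(D)$, $L^{2}(D)$ and $H^{-\alpha}(D)$ form an evolution triplet, i.e., the embeddings are dense and the application of linear functionals and the inner product in $L^{2}(D)$ satisfy the relation

$$\langle\varphi,\psi\rangle_{H^{\alpha}} = (\varphi,\psi)_{L^{2}} \quad \text{for all } \varphi\in L^{2}(D),\ \psi\in H^{\alpha}(D).$$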
Norms in Hilbert spaces are denoted by $\|\cdot\|_{H^{\alpha}}$, and $\|\cdot\|_{0}$ denotes the supremum norm of real functions, i.e., for $f:\mathbb{R}\to \mathbb{R}$ we have $\|f\|_{0}=\sup_{z\in \mathbb{R}}|f(z)|$. Further, $|\cdot|$ denotes either the absolute value for scalars or the Lebesgue measure for measurable subsets of Euclidean space. Finally, we use $\mathbb{N}_{0}$ to denote the set of nonnegative integers, i.e., the natural numbers including zero.
1.1 The Macroscopic Limit
Neural field equations are usually classified into two types: rate-based and activity-based models. The prototype of the former is the Wilson–Cowan equation (1.1), which we restate below, and the prototype of the latter is the Amari equation; see Eq. (3.7) in Sect. 3. Besides having a different structure, the two types of equations, owing to their derivation, describe variables with completely different interpretations. In rate-based models, the variable describes the average rate of activity at a certain location and time, roughly corresponding to the fraction of active neurons in an infinitesimal area. In activity-based models, the macroscopic variable is an average electrical potential produced by neurons at a certain location. For a concise physical derivation of these models, we refer to [5]. In the following, we consider rate-based equations, in particular the classical Wilson–Cowan equation, to discuss the type of limit theorems we are able to obtain. We remark that the results are essentially analogous for activity-based models.
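Thus, the macroscopic model of interest is given by the equation

$$\tau\frac{\partial}{\partial t}\nu(t,x) = -\nu(t,x) + f\Big(\int_{D} w(x,y)\,\nu(t,y)\,\mathrm{d}y + I(t,x)\Big), \tag{1.3}$$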
where $\tau >0$ is a decay time constant and $f:\mathbb{R}\to \mathbb{R}_{+}$ is a gain (or response) function that relates the input a neuron receives to its activity. In (1.3), the value $f(z)$ can be interpreted as the fraction of neurons that receive at least threshold input. Furthermore, $w(x,y)$ is a weight function, which states the connectivity strength of a neuron located at $y$ to a neuron located at $x$, and finally, $I(t,x)$ is an external input, which is received by a neuron at $x$ at time $t$. For the weight function $w:D\times D\to \mathbb{R}$ and the external input $I$, we assume that $w\in L^{2}(D\times D)$ and $I\in C(\mathbb{R}_{+},L^{2}(D))$. As for the gain function $f$, we assume in this study that $f$ is nonnegative and bounded, and that it satisfies a global Lipschitz condition with constant $L>0$, i.e.,

$$|f(z_{1})-f(z_{2})|\le L\,|z_{1}-z_{2}| \quad \text{for all } z_{1},z_{2}\in\mathbb{R}. \tag{1.4}$$

From an interpretive point of view, it is reasonable and consistent to stipulate that $f$ is bounded by one, being a fraction, and that it is monotone. The latter property corresponds to the fact that higher input results in higher activity. In specific models, $f$ is often chosen to be a sigmoidal function, e.g., $f(z)=(1+\mathrm{e}^{-(\beta_{1}z+\beta_{2})})^{-1}$ in [6] or $f(z)=(\tanh(\beta_{1}z+\beta_{2})+1)/2$ in [3], which both satisfy $f(z)\in [0,1]$. Moreover, the most common choices of $f$ are even infinitely differentiable with bounded derivatives, which already implies the Lipschitz condition (1.4).
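For the two sigmoidal examples, the best constant in (1.4) is $\sup_z|f'(z)|$, namely $\beta_1/4$ for the logistic function and $\beta_1/2$ for the $\tanh$ variant (the latter equals the logistic function with slope $2\beta_1$). The following Python sketch, with the hypothetical choice $\beta_1=2$, $\beta_2=0$, checks this numerically through difference quotients on a grid; it is only an illustration and not part of the analysis.

```python
import math


def logistic_gain(z, beta1=1.0, beta2=0.0):
    """f(z) = 1 / (1 + exp(-(beta1*z + beta2))), evaluated without overflow."""
    u = beta1 * z + beta2
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def tanh_gain(z, beta1=1.0, beta2=0.0):
    """f(z) = (tanh(beta1*z + beta2) + 1) / 2."""
    return 0.5 * (math.tanh(beta1 * z + beta2) + 1.0)


def empirical_lipschitz(f, lo=-20.0, hi=20.0, m=40001):
    """Largest difference quotient |f(x_{i+1}) - f(x_i)| / h on a uniform grid.

    By the mean value theorem this does not exceed sup |f'| (up to rounding),
    so it is a lower estimate of the best Lipschitz constant L in (1.4).
    """
    h = (hi - lo) / (m - 1)
    values = [f(lo + i * h) for i in range(m)]
    return max(abs(values[i + 1] - values[i]) / h for i in range(m - 1))


beta1 = 2.0  # hypothetical slope parameter, beta2 = 0
L_log = empirical_lipschitz(lambda z: logistic_gain(z, beta1))
L_tanh = empirical_lipschitz(lambda z: tanh_gain(z, beta1))
print(f"logistic: {L_log:.6f}  (beta1/4 = {beta1 / 4})")
print(f"tanh:     {L_tanh:.6f}  (beta1/2 = {beta1 / 2})")
```

Both estimates approach the analytical values because the derivative is maximal at the inflection point, which lies on the grid here.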
The Wilson–Cowan equation (1.3) is well-posed in the strong sense as an integral equation in $L^{2}(D)$ under the above conditions. That is, Eq. (1.3) possesses a unique, continuously differentiable global solution $\nu$ to every initial condition $\nu (0)=\nu_{0}\in L^{2}(D)$, i.e., $\nu \in C^{1}([0,T],L^{2}(D))$ for all $T>0$, which depends continuously on the initial condition. Furthermore, if the initial condition satisfies $\nu_{0}(x)\in [0,\|f\|_{0}]$ almost everywhere in $D$, then for all $t>0$ it holds that $\nu (t,x)\in [0,\|f\|_{0}]$ for almost all $x\in D$. For a brief derivation of these results, we refer to Appendix A, where we also state a result about higher spatial regularity of the solution: Let $\alpha \in \mathbb{N}$ be such that $\alpha >d/2$. If $\nu_{0}\in H^{\alpha}(D)$, if $f$ is at least $\alpha$ times differentiable with bounded derivatives, and if the weights and the input function satisfy $w\in H^{\alpha}(D\times D)$ and $I\in C(\mathbb{R}_{+},H^{\alpha}(D))$, then the equation is well-posed in $H^{\alpha}(D)$, i.e., $\nu \in C^{1}([0,T],H^{\alpha}(D))$ for all $T>0$. In particular, since $H^{\alpha}(D)$ embeds continuously into the continuous functions on $\overline{D}$ for $\alpha>d/2$ (Sobolev embedding), this implies that the solution $\nu$ is jointly continuous on $\mathbb{R}_{+}\times D$.
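The invariance of $[0,\|f\|_0]$ has a simple discrete counterpart, sketched below in Python under the following assumptions (all ours, chosen only for illustration): $D=[0,1]$, the integral in (1.3) approximated by the midpoint rule on $m$ cells, a hypothetical kernel combining local excitation with global inhibition, a hypothetical oscillating input, and an explicit Euler step. If $\Delta t\le\tau$, each Euler step is the convex combination $(1-\Delta t/\tau)\,\nu+(\Delta t/\tau)\,f(\cdot)$, so values in $[0,\sup f]$ remain there.

```python
import numpy as np


def wilson_cowan_euler(nu0, w, inp, f, tau, dt, steps, dx):
    """Explicit Euler for tau * dnu/dt = -nu + f(int_D w(x, y) nu(y) dy + I(t, x)).

    nu0 : initial values at m grid points (cell midpoints of D = [0, 1])
    w   : (m, m) array with w[i, j] = w(x_i, x_j)
    inp : callable t -> array of length m with I(t, x_i)
    dx  : midpoint-rule weight approximating the integral over D
    """
    if dt > tau:
        raise ValueError("need dt <= tau for the convex-combination property")
    nu = np.array(nu0, dtype=float)
    t = 0.0
    for _ in range(steps):
        drive = w @ nu * dx + inp(t)
        # Convex combination of the old value and f(drive), both in [0, sup f].
        nu = (1.0 - dt / tau) * nu + (dt / tau) * f(drive)
        t += dt
    return nu


m = 200
x = (np.arange(m) + 0.5) / m
# Hypothetical kernel (local excitation, global inhibition) and input.
W = 5.0 * np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1) - 2.0
inp = lambda t: 0.5 * np.sin(2 * np.pi * x) * np.cos(t)
f = lambda z: 1.0 / (1.0 + np.exp(-z))           # logistic gain, ||f||_0 = 1
nu0 = 0.5 * (1.0 + np.cos(2 * np.pi * x))         # values in [0, 1]

nu_T = wilson_cowan_euler(nu0, W, inp, f, tau=1.0, dt=0.05, steps=200, dx=1.0 / m)
print(nu_T.min(), nu_T.max())
```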
1.2 Master Equation Formulations of Neural Network Models
For the microscopic model, we concentrate on a variation of the model considered in [5, 6], which is itself an improvement on a model introduced in [11]. We extend the model by including variations among neuron populations and, above all, time-dependent inputs. We chose this model over the master equation formulations in [8, 9] as it provides a more direct connection between the microscopic and macroscopic models; see also the discussion in Sect. 3. We describe the main ingredients of the model, beginning with the simpler, time-independent model as prevalent in the literature. Subsequently, in Sect. 1.3, the final, time-dependent model is defined.
We denote by $P$ the number of neuron populations in the model. Further, we assume that the $k$th neuron population consists of identical neurons, each of which is in one of two possible states: active, i.e., emitting action potentials, or inactive, i.e., quiescent or not emitting action potentials. Transitions between states occur instantaneously and at random times. For all $k=1,\dots ,P$, the random variable $\Theta_{t}^{k}$ denotes the number of active neurons at time $t$. An integer $l(k)$ is used to characterise the population size. This number $l(k)$ can be interpreted as the number of neurons in the $k$th population, at least for sufficiently large values. However, this interpretation is not accurate in the literal sense, as it is possible with positive probability for a population to contain more than $l(k)$ active neurons. Nevertheless, the interpretation can be salvaged a posteriori from the obtained limit theorems.^{b} It is a corollary of these that the probability of more than $l(k)$ neurons being active at some time becomes arbitrarily small for large enough $l(k)$. Hence, for physiologically reasonable neuron numbers, the probability of observing trajectories that are ‘nonphysiological’ in this interpretation becomes ever smaller.
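A quantitative illustration is possible in a simplified special case, which is an assumption of this sketch and not the general model: a single population whose gain term is constant, $\overline{f}_1\equiv\bar f$ (for instance, no recurrent coupling and constant input). With the transition rates introduced below, activations then occur at the constant rate $\tau^{-1}l\bar f$ and each active neuron deactivates at rate $\tau^{-1}$, so $\Theta_t$ is an immigration-death process whose stationary distribution is Poisson with mean $l\bar f$, independently of $\tau$. The Python sketch evaluates the stationary probability $\mathbb{P}[\Theta>l]$ of the ‘nonphysiological’ states for the hypothetical value $\bar f=0.5$.

```python
import math


def poisson_tail_above(l, mean, extra=5000):
    """P[X > l] for X ~ Poisson(mean), summing the pmf from l + 1 upwards.

    Terms are computed in log space to avoid overflow; for mean < l they
    decrease monotonically, so stopping after `extra` terms is harmless here.
    """
    log_mean = math.log(mean)
    return sum(
        math.exp(-mean + j * log_mean - math.lgamma(j + 1))
        for j in range(l + 1, l + 1 + extra)
    )


f_bar = 0.5  # hypothetical constant gain term
for l in (10, 100, 1000):
    print(l, poisson_tail_above(l, l * f_bar))
```

The tail drops from about $10^{-2}$ for $l=10$ to astronomically small values for $l=1000$, in line with the qualitative statement above.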
Proceeding with notation, $\Theta_{t}=(\Theta_{t}^{1},\dots ,\Theta_{t}^{P})$ is an (unbounded) piecewise constant stochastic process taking values in $\mathbb{N}_{0}^{P}$. The stochastic transitions from the active to the inactive state and vice versa for a neuron in population $k$ are governed by a constant inactivation rate $\tau^{-1}>0$, uniform for all populations, and by inputs from other neurons depending on the current network state. The resulting nonnegative activation rate of population $k$ is given by $\tau^{-1}l(k)\overline{f}_{k}(\theta)$ for $\theta \in \mathbb{N}_{0}^{P}$. For the definition of $\overline{f}_{k}$, we consider weights $\overline{W}_{kj}$, $k,j=1,\dots ,P$, which weigh the input one neuron in population $k$ receives from a neuron in population $j$. Then the activation rate of a neuron in population $k$ is proportional to

$$\overline{f}_{k}(\theta) := f\Big(\sum_{j=1}^{P}\overline{W}_{kj}\,\theta_{j}\Big) \tag{1.5}$$
for a nonnegative function $f:\mathbb{R}\to \mathbb{R}$, which obviously corresponds to the gain function $f$ in the Wilson–Cowan equation (1.3). We remark that here $f$ is not the activation rate of a single neuron: in this model, the activation rate of a population is not proportional to the number of inactive neurons but to $l(k)$, which stands for the total number of neurons in the population. In [5], this rate is thus interpreted as the rate with which a neuron becomes or remains active.
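The population-level rates just described can be computed directly. The Python sketch below evaluates, for a hypothetical two-population network (the sizes, weights, state and logistic gain are illustrative choices), the activation rates $\tau^{-1}l(k)\overline{f}_k(\theta)$ built from (1.5) and the deactivation rates $\tau^{-1}\theta_k$. It makes visible that the activation rate scales with $l(k)$ and not with the number $l(k)-\theta_k$ of currently inactive neurons.

```python
import numpy as np


def population_rates(theta, l, W, f, tau):
    """Population-level jump rates of the time-independent model.

    up[k]   = tau^{-1} * l[k] * f(sum_j W[k, j] * theta[j])   (theta_k -> theta_k + 1)
    down[k] = tau^{-1} * theta[k]                             (theta_k -> theta_k - 1)
    """
    theta = np.asarray(theta, dtype=float)
    f_bar = f(W @ theta)                              # the quantities (1.5)
    up = np.asarray(l, dtype=float) * f_bar / tau
    down = theta / tau
    return up, down


# Hypothetical two-population example.
l = np.array([100, 200])
W = np.array([[0.01, -0.005], [0.002, 0.0]])
theta = np.array([30, 50])
f = lambda z: 1.0 / (1.0 + np.exp(-z))
up, down = population_rates(theta, l, W, f, tau=2.0)
print(up, down)
```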
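It follows that the process $(\Theta_{t})_{t\ge 0}$ is a continuous-time Markov chain, which is usually defined via the following master equation, where $e_{k}$ denotes the $k$th basis vector of $\mathbb{R}^{P}$,

$$\frac{\mathrm{d}}{\mathrm{d}t}\mathbb{P}[\theta,t] = \frac{1}{\tau}\sum_{k=1}^{P}\Big[(\theta_{k}+1)\,\mathbb{P}[\theta+e_{k},t] - \theta_{k}\,\mathbb{P}[\theta,t] + l(k)\,\overline{f}_{k}(\theta-e_{k})\,\mathbb{P}[\theta-e_{k},t] - l(k)\,\overline{f}_{k}(\theta)\,\mathbb{P}[\theta,t]\Big], \tag{1.6}$$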
which is endowed with the boundary conditions $\mathbb{P}[\theta ,t]=0$ if $\theta \notin \mathbb{N}_{0}^{P}$. In (1.6), $\mathbb{P}[\theta ,t]$ denotes the probability that the process $\Theta_{t}$ is in state $\theta$ at time $t$. Finally, the definition is completed by stating an initial law $\mathcal{L}$, the distribution of $\Theta_{0}$, which provides an initial value for the ODE system (1.6).
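For a single population ($P=1$) the master equation can be integrated numerically once the state space is truncated. The Python sketch below assumes a truncation at $n-1$ active neurons in which activations out of the last state are switched off, so that total probability is conserved; the truncation, the constant gain $\bar f=0.5$, the coupling $w=0$ and the explicit Euler step are all illustrative assumptions. The boundary condition $\mathbb{P}[-1,t]=0$ is built in because state $0$ receives no inflow from below.

```python
import numpy as np


def master_rhs(p, l, w, f, tau):
    """Right-hand side of the master equation (1.6) for one population (P = 1).

    p[theta] approximates P[theta, t] for theta = 0, ..., n - 1.
    f must be vectorised and return an array of the same shape as its input.
    """
    n = len(p)
    theta = np.arange(n)
    up = l * f(w * theta) / tau      # tau^{-1} l f_bar(theta): theta -> theta + 1
    up[-1] = 0.0                      # truncation assumption: no exit upwards
    down = theta / tau                # tau^{-1} theta: theta -> theta - 1
    dp = -(up + down) * p             # outflow from every state
    dp[1:] += up[:-1] * p[:-1]        # inflow from theta - 1 by activation
    dp[:-1] += down[1:] * p[1:]       # inflow from theta + 1 by deactivation
    return dp


l, tau, n = 50, 1.0, 151
f_const = lambda z: np.full_like(z, 0.5, dtype=float)  # hypothetical constant gain
p = np.zeros(n)
p[0] = 1.0                                   # Theta_0 = 0 almost surely
dt, T = 1e-3, 5.0
for _ in range(int(round(T / dt))):
    p = p + dt * master_rhs(p, l, 0.0, f_const, tau)
mean = np.arange(n) @ p
print(p.sum(), mean, l * 0.5 * (1 - np.exp(-T / tau)))
```

For $w=0$ the mean satisfies, up to the negligible truncation effect, $\tau\frac{\mathrm{d}}{\mathrm{d}t}\mathbb{E}[\Theta_t]=l\bar f-\mathbb{E}[\Theta_t]$, so $\mathbb{E}[\Theta_t]/l$ solves the uncoupled, spatially constant version of (1.3); the last printed value is the exact solution of that equation.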
Another definition of a continuous-time Markov chain is via its generator; see, e.g., [15]. Although the master equation is widely used in the physics and chemical reaction literature, the mathematically more appropriate object for the study of a Markov process is its generator, and the master equation is an object derived from the generator; see Sect. V in [33]. The generator of a Markov process is an operator defined on the space of real functions over the state space of the process. For the above model defined by the master equation (1.6), the generator is given by

$$\mathcal{A}g(\theta) = \lambda(\theta)\int_{\mathbb{N}_{0}^{P}}\big(g(\xi)-g(\theta)\big)\,\mu(\theta,\mathrm{d}\xi) \tag{1.7}$$
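for all suitable $g:\mathbb{N}_{0}^{P}\to \mathbb{R}$. For details, we refer to [15]. Here, $\lambda$ is the total instantaneous jump rate, given by

$$\lambda(\theta) = \frac{1}{\tau}\sum_{k=1}^{P}\big(\theta_{k} + l(k)\,\overline{f}_{k}(\theta)\big), \tag{1.8}$$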
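and it determines the distribution of the waiting time until the next jump, i.e., denoting by $T$ the time of the first jump after time $t$,

$$\mathbb{P}\big[T-t>s \,\big|\, \Theta_{t}=\theta\big] = \mathrm{e}^{-\lambda(\theta)s}, \quad s\ge 0.$$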
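Further, the measure $\mu$ in (1.7) is a Markov kernel on the state space of the process defining the conditional distribution of the post-jump value, i.e., with $T$ denoting a jump time,

$$\mathbb{P}\big[\Theta_{T}\in A \,\big|\, \Theta_{T-}=\theta\big] = \mu(\theta,A)$$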
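for all sets $A\subseteq \mathbb{N}_{0}^{P}$. In the present case, for each $\theta$ the measure $\mu$ is given by the discrete distribution

$$\mu(\theta,\{\theta+e_{k}\}) = \frac{l(k)\,\overline{f}_{k}(\theta)}{\tau\,\lambda(\theta)}, \qquad \mu(\theta,\{\theta-e_{k}\}) = \frac{\theta_{k}}{\tau\,\lambda(\theta)}, \qquad k=1,\dots,P. \tag{1.10}$$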
The importance of the generator lies in the fact that it fully characterises a Markov process and that convergence of Markov processes is strongly connected to the convergence of their generators; see [15].
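The pair $(\lambda,\mu)$ also yields the standard jump-chain construction of sample paths (often called the Gillespie or stochastic simulation algorithm): wait an exponential time with rate $\lambda(\theta)$ from (1.8), then draw the post-jump state from $\mu(\theta,\cdot)$ in (1.10). The following Python sketch implements this for the time-independent model; the network size, weights, initial state and logistic gain in the usage example are hypothetical.

```python
import numpy as np


def simulate_jump_chain(theta0, l, W, f, tau, t_end, rng):
    """Sample a path of the time-independent model on [0, t_end].

    Returns the jump times (starting with 0) and the state after each jump.
    """
    theta = np.array(theta0, dtype=np.int64)
    l = np.asarray(l, dtype=float)
    P = theta.size
    t = 0.0
    times, states = [0.0], [theta.copy()]
    while True:
        up = l * f(W @ theta) / tau          # theta_k -> theta_k + 1
        down = theta / tau                   # theta_k -> theta_k - 1
        rates = np.concatenate([up, down])
        lam = rates.sum()                    # lambda(theta), cf. (1.8)
        if lam <= 0.0:                       # absorbing state: no further jumps
            break
        t += rng.exponential(1.0 / lam)      # Exp(lambda) waiting time
        if t > t_end:
            break
        i = rng.choice(2 * P, p=rates / lam)  # post-jump value from mu, cf. (1.10)
        theta[i % P] += 1 if i < P else -1
        times.append(t)
        states.append(theta.copy())
    return np.array(times), np.array(states)


rng = np.random.default_rng(0)
l = np.array([200, 200])
W = np.array([[0.02, -0.01], [0.01, 0.0]])   # hypothetical weights
f = lambda z: 1.0 / (1.0 + np.exp(-z))
times, states = simulate_jump_chain([20, 20], l, W, f, tau=1.0, t_end=5.0, rng=rng)
print(len(times), states[-1] / l)
```

Dividing the states by $l$ gives the fractions of active neurons, which is the scaling used later for the comparison with the Wilson–Cowan equation.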
1.3 Including External Time-Dependent Input
Until now, the microscopic model does not incorporate any time-dependent input. In analogy to the macroscopic equation (1.3), this input enters the model inside the activation rate function $\overline{f}_{k}$. Thus, let $\overline{I}_{k}(t)$ denote the external input into a neuron in population $k$ at time $t$; then the time-dependent activation rate is given by

$$\overline{f}_{k}(t,\theta) := f\Big(\sum_{j=1}^{P}\overline{W}_{kj}\,\theta_{j} + \overline{I}_{k}(t)\Big). \tag{1.11}$$
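The most important qualitative difference when replacing (1.5) by (1.11) is that the corresponding Markov process is no longer time-homogeneous. In particular, the waiting times between jumps are no longer exponentially distributed, but satisfy

$$\mathbb{P}\big[T-t>s \,\big|\, \Theta_{t}=\theta\big] = \exp\Big(-\int_{t}^{t+s}\lambda(r,\theta)\,\mathrm{d}r\Big), \quad s\ge 0,$$

where $T$ denotes the time of the first jump after time $t$ and $\lambda(r,\theta)$ is the total jump rate at time $r$, i.e., (1.8) with $\overline{f}_{k}(r,\theta)$ in place of $\overline{f}_{k}(\theta)$.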
Hence, the resulting process is an inhomogeneous continuous-time Markov chain; see, e.g., Sect. 2 in [36]. It is straightforward to write down the corresponding master equation analogously to (1.6), yielding a system of nonautonomous ordinary differential equations; cf. the master equation formulation in [8]. Similarly, there exists the notion of a time-dependent generator for inhomogeneous Markov processes; cf. Sect. 4.7 in [15]. Employing a standard trick, namely suitably extending the state space of the process, we can transform an inhomogeneous Markov process into a homogeneous one [15, 28]. That is, the space-time process $Y_{t}:=(\Theta_{t},t)$ is again a homogeneous Markov process. The initial law of the associated space-time process is $\mathcal{L}\times \delta_{0}$ on $\mathbb{N}_{0}^{P}\times \mathbb{R}_{+}$. We emphasise that the definitions of the space-time process and its initial law imply that the time component starts at 0 a.s. and, moreover, moves continuously and deterministically. That is, between jumps the trajectories satisfy the differential equation

$$\frac{\mathrm{d}}{\mathrm{d}t}Y_{t} = (0,1)\in\mathbb{R}^{P}\times\mathbb{R},$$
where the jump intensity $\lambda$ is given by the sum of all individual time-dependent rates, analogously to (1.8). Finally, the post-jump value is given by the Markov kernel $\mu((\theta,t),\cdot)\times\delta_{t}$, as the time component itself never jumps, and $\mu$ is the obvious time-dependent modification of (1.10).
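With time-dependent rates, the exponential waiting times of the homogeneous case no longer apply. Since $0\le f\le\|f\|_0$, however, the total rate is bounded as long as $\theta$ does not change, $\lambda(t,\theta)\le\bar\lambda(\theta):=\tau^{-1}\big(\sum_k\theta_k+\|f\|_0\sum_k l(k)\big)$, so paths can be sampled by thinning (a Lewis-Shedler/Ogata type scheme, which we name here as a swapped-in simulation technique, not a method of this study): propose candidate times at rate $\bar\lambda(\theta)$ and accept a candidate at time $t$ with probability $\lambda(t,\theta)/\bar\lambda(\theta)$. The Python sketch below assumes hypothetical weights, inputs $\overline{I}_k(t)$ and a logistic gain.

```python
import numpy as np


def simulate_with_input(theta0, l, W, I, f, f_sup, tau, t_end, rng):
    """Sample a path of the time-dependent model on [0, t_end] by thinning.

    I     : callable t -> array with the inputs I_bar_k(t)
    f_sup : an upper bound for f, e.g. ||f||_0
    """
    theta = np.array(theta0, dtype=np.int64)
    l = np.asarray(l, dtype=float)
    P = theta.size
    t = 0.0
    times, states = [0.0], [theta.copy()]
    while True:
        # Bound on lambda(s, theta), valid for all s while theta is frozen.
        lam_bar = (theta.sum() + f_sup * l.sum()) / tau
        t += rng.exponential(1.0 / lam_bar)   # next candidate time
        if t > t_end:
            break
        up = l * f(W @ theta + I(t)) / tau    # activation rates from (1.11)
        down = theta / tau
        rates = np.concatenate([up, down])
        lam = rates.sum()                     # lambda(t, theta)
        if lam == 0.0 or rng.uniform() * lam_bar > lam:
            continue                          # rejected candidate: no jump
        i = rng.choice(2 * P, p=rates / lam)  # post-jump value from mu
        theta[i % P] += 1 if i < P else -1
        times.append(t)
        states.append(theta.copy())
    return np.array(times), np.array(states)


rng = np.random.default_rng(1)
l = np.array([200, 200])
W = np.array([[0.02, -0.01], [0.01, 0.0]])              # hypothetical weights
I = lambda t: np.array([np.sin(t), 0.5 * np.cos(t)])    # hypothetical inputs
f = lambda z: 1.0 / (1.0 + np.exp(-z))                  # ||f||_0 = 1
times, states = simulate_with_input([20, 20], l, W, I, f, 1.0, 1.0, 5.0, rng)
print(len(times), states[-1] / l)
```

Restarting the candidate clock after every accepted or rejected candidate is valid by the memorylessness of the exponential distribution, and the bound is recomputed whenever $\theta$ changes.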
It thus follows that the space-time process $(\Theta_{t},t)_{t\ge 0}$ is a homogeneous Piecewise Deterministic Markov Process (PDMP); see, e.g., [14, 16, 26]. This connection is particularly important, as in the course of the present study we apply limit theorems developed for this type of process; see [27]. Finally, for the space-time process $(\Theta_{t},t)_{t\ge 0}$, we obtain for suitable functions $g:\mathbb{N}_{0}^{P}\times \mathbb{R}_{+}\to \mathbb{R}$ the generator

$$\mathcal{A}g(\theta,t) = \frac{\partial g}{\partial t}(\theta,t) + \frac{1}{\tau}\sum_{k=1}^{P}\Big[l(k)\,\overline{f}_{k}(t,\theta)\big(g(\theta+e_{k},t)-g(\theta,t)\big) + \theta_{k}\big(g(\theta-e_{k},t)-g(\theta,t)\big)\Big].$$
2 A Precise Formulation of the Limit Theorems
In this section, we present the precise formulations of the limit theorems. To this end, we first define a suitable sequence of microscopic models, which gives the connection between the defining objects of the Wilson–Cowan equation (1.3) and the microscopic models discussed in Sect. 1.2. Thus, $(Y_{t}^{n})_{t\ge 0}=(\Theta_{t}^{n},t)_{t\ge 0}$, $n\in \mathbb{N}$, denotes a sequence of microscopic PDMP neural field models of the type defined in Sect. 1.3. Each process $(Y_{t}^{n})_{t\ge 0}$ is defined on a filtered probability space $(\Omega^{n},\mathcal{F}^{n},(\mathcal{F}_{t}^{n})_{t\ge 0},\mathbb{P}^{n})$, which satisfies the usual conditions. Hence, the defining objects of the jump models now depend on an additional index $n$. That is, $P(n)$ denotes the number of neuron populations in the $n$th model, $l(k,n)$ is the number of neurons in the $k$th population of the $n$th model, and analogously we use the notations $\overline{W}_{kj}^{n}$, $\overline{I}_{k,n}$ and $\overline{f}_{k,n}$. However, we note from the beginning that the decay rate $\tau^{-1}$ is independent of $n$, where $\tau$ is the time constant in the Wilson–Cowan equation (1.3). In the following paragraphs, we discuss the connection of the defining components of this sequence of microscopic models to the components of the macroscopic limit.
Connection to the Spatial Domain D A key step in connecting the microscopic models to the solution of Eq. (1.3) is to relate the individual neuron populations to the spatial domain $D$ on which the solution of (1.3) lives. To this end, we assume that each population is located within a subdomain of $D$ and that the subdomains of the individual populations are non-overlapping. Hence, for each $n\in \mathbb{N}$, we obtain a collection $\mathcal{D}_{n}$ of $P(n)$ non-overlapping subsets of $D$ denoted by $D_{1,n},\dots ,D_{P(n),n}$. We assume that each subdomain is measurable and convex. The convexity of the subdomains is a technical condition that allows us to apply Poincaré’s inequality, cf. (4.1). We do not consider this condition too restrictive, as the most common partition elements, e.g., cubes and triangles, are convex. Furthermore, for all reasonable domains $D$, e.g., all Jordan measurable domains, a sequence of convex partitions can be found such that the conditions imposed in the limit theorems below are also satisfied. One may think of obtaining the collection $\mathcal{D}_{n}$ by partitioning the domain into $P(n)$ convex subdomains $D_{1,n},\dots ,D_{P(n),n}$ and confining each neuron population to one subdomain. However, it is required neither that the union of the sets in $\mathcal{D}_{n}$ amounts to the full domain $D$ nor that the partitions consist of refinements. Necessary conditions on the limiting behaviour of the subdomains are strongly connected to the convergence of the initial conditions of the models, which is a condition in the limit theorems; see below. For the sake of terminological simplicity, we refer to the collections $\mathcal{D}_{n}$ simply as partitions.
We now define some notation for parameters characterising the partitions $\mathcal{D}_{n}$: the minimum and maximum Lebesgue measure, i.e., length, area, or volume depending on the spatial dimension, are denoted by

$$v_{-}(n) := \min_{k=1,\dots,P(n)}|D_{k,n}|, \qquad v_{+}(n) := \max_{k=1,\dots,P(n)}|D_{k,n}|,$$
and the maximum diameter of the partition is denoted by

$$\delta_{+}(n) := \max_{k=1,\dots,P(n)}\operatorname{diam}(D_{k,n}),$$
where the diameter of a set $D_{k,n}$ is defined as $\operatorname{diam}(D_{k,n}):=\sup_{x,y\in D_{k,n}}|x-y|$. In the special case of domains obtained as unions of cubes with edge length $n^{-1}$, it obviously holds that $v_{\pm}(n)=n^{-d}$ and $\delta_{+}(n)=\sqrt{d}\,n^{-1}$. It is a necessary condition in all the subsequent limit theorems that $\lim_{n\to \infty}\delta_{+}(n)=0$. On the one hand, this condition implies that $\lim_{n\to \infty}v_{+}(n)=0$, as the Lebesgue measure of a set is bounded in terms of its diameter; on the other hand, in all but degenerate cases, it implies $\lim_{n\to \infty}P(n)=\infty$ owing to the necessary convergence of the initial conditions. That is, in order to obtain a limit, the sequence of partitions usually consists of ever finer sets and the number of populations diverges. Finally, each domain $D_{k,n}$ of the partition $\mathcal{D}_{n}$ contains one neuron population ‘consisting’ of $l(k,n)\in \mathbb{N}$ neurons. We denote by $\ell_{+}(n)$ and $\ell_{-}(n)$ the maximum and minimum number of neurons in the populations of the $n$th model, i.e.,

$$\ell_{-}(n) := \min_{k=1,\dots,P(n)} l(k,n), \qquad \ell_{+}(n) := \max_{k=1,\dots,P(n)} l(k,n).$$
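Connection to the Weight Function w We assume that there exists a function $w:D\times D\to \mathbb{R}$ such that the discrete weights are given by

$$\overline{W}_{kj}^{n} = \frac{1}{l(j,n)\,|D_{k,n}|}\int_{D_{k,n}}\int_{D_{j,n}} w(x,y)\,\mathrm{d}y\,\mathrm{d}x,$$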
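where $w$ is the same function as in the Wilson–Cowan equation (1.3). For the definition of the activation rate at time $t$, we thus obtain

$$\overline{f}_{k,n}(t,\theta) = f\Big(\sum_{j=1}^{P(n)}\overline{W}_{kj}^{n}\,\theta_{j} + \overline{I}_{k,n}(t)\Big). \tag{2.5}$$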
As already highlighted by Bressloff [5], the transition rates are not uniquely determined by the requirement that a possible limit of the microscopic models is given by the Wilson–Cowan equation (1.3). If in (2.5) the definition of the transition rates is changed to

$$\overline{f}_{k,n}(t,\theta) = f^{n}\Big(\sum_{j=1}^{P(n)}\overline{W}_{kj}^{n}\,\theta_{j} + \overline{I}_{k,n}(t)\Big),$$
where $f^{n}$, $n\in \mathbb{N}$, is a sequence of functions converging uniformly to $f$, then all limit theorems remain valid. The proof can be carried out as presented, adding and subtracting the appropriate term, where the additional difference term vanishes because $\sup_{x\in \mathbb{R}}|f^{n}(x)-f(x)|\to 0$ for $n\to \infty$. Hence, any microscopic model with gain rates $\overline{f}_{k,n}$ of this form reduces to the same Wilson–Cowan equation in the limit. Clearly, the same applies analogously to the decay rate $\tau$, the weights $w$, and the input $I$.
Connection to the Input Current I
The external input applied to neurons in a certain population is obtained by spatially averaging a space-time input over the subdomain in which that population is located, i.e.,

$$\overline{I}_{k,n}(t) := \frac{1}{|D_{k,n}|}\int_{D_{k,n}} I(t,x)\,\mathrm{d}x.$$
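This completes the definition of the Markov jump processes $(\Theta_{t}^{n},t)_{t\ge 0}$. For the sake of completeness, we repeat the definition of the total jump rate

$$\lambda^{n}(\theta,t) = \frac{1}{\tau}\sum_{k=1}^{P(n)}\big(\theta_{k} + l(k,n)\,\overline{f}_{k,n}(t,\theta)\big),$$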
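and the transition measure $\mu^{n}$ is defined by

$$\mu^{n}\big((\theta,t),\{(\theta+e_{k},t)\}\big) = \frac{l(k,n)\,\overline{f}_{k,n}(t,\theta)}{\tau\,\lambda^{n}(\theta,t)}, \qquad \mu^{n}\big((\theta,t),\{(\theta-e_{k},t)\}\big) = \frac{\theta_{k}}{\tau\,\lambda^{n}(\theta,t)}$$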
for all $k=1,\dots ,P(n)$.
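The discrete weights and inputs are cell averages of $w$ and $I$. The Python sketch below computes them for the hypothetical case $D=[0,1]$ cut into $P$ equal intervals, approximating every integral by a $q$-point midpoint rule inside each cell; the domain, the quadrature and the example kernel and input are assumptions made only for illustration.

```python
import numpy as np


def discretise_1d(w, I, P, l, q=20):
    """Discrete weights and inputs for D = [0, 1] cut into cells D_k = [k/P, (k+1)/P).

    W[k, j]     = 1 / (l[j] |D_k|) * int_{D_k} int_{D_j} w(x, y) dy dx
    I_bar(t)[k] = 1 / |D_k| * int_{D_k} I(t, x) dx
    Integrals use a q-point midpoint rule per cell; w and I must broadcast.
    """
    h = 1.0 / P
    pts = (np.arange(P)[:, None] + (np.arange(q)[None, :] + 0.5) / q) * h  # (P, q)
    X = pts[:, :, None, None]          # quadrature points x in D_k
    Y = pts[None, None, :, :]          # quadrature points y in D_j
    double_int = w(X, Y).sum(axis=(1, 3)) * (h / q) ** 2   # shape (P, P)
    W = double_int / (np.asarray(l, dtype=float)[None, :] * h)
    I_bar = lambda t: I(t, pts).mean(axis=1)
    return W, I_bar


P = 50
l = np.full(P, 100)
W, I_bar = discretise_1d(lambda x, y: np.exp(-np.abs(x - y) / 0.1),  # hypothetical kernel
                         lambda t, x: np.cos(t) * x,                  # hypothetical input
                         P, l)
print(W.shape, I_bar(0.0)[:3])
```

The factor $1/l(j,n)$ makes $\sum_j\overline{W}^n_{kj}\theta_j$ the average over $D_{k,n}$ of $\int_D w(x,y)\,\nu^n(y)\,\mathrm{d}y$ when $\nu^n$ is the step function of active fractions $\theta_j/l(j,n)$.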
Connection to the Solution ν As functions of time, the paths of the PDMP $(\Theta_{t}^{n},t)_{t\ge 0}$ and the solution $\nu$ live on different state spaces: the former takes values in $\mathbb{N}_{0}^{P(n)}\times \mathbb{R}_{+}$ and the latter in $L^{2}(D)$. Thus, in order to compare the two, we have to introduce a mapping that maps the stochastic process into $L^{2}(D)$. In [27], the authors called such a mapping a coordinate function, which is also the terminology used in [13]. In fact, the limit theorems we subsequently present are statements about the processes obtained by composing the coordinate functions with the PDMPs. Here, it is important to note that the coordinate functions may, and usually do, differ for each $n\in \mathbb{N}$; however, they all project the process into the common space $L^{2}(D)$. For the mean field models, we define the coordinate functions for all $n\in \mathbb{N}$ by

$$\nu^{n}(\theta) := \sum_{k=1}^{P(n)}\frac{\theta_{k}}{l(k,n)}\,\mathbb{1}_{D_{k,n}}.$$
Clearly, each ${\nu}^{n}$ is a measurable map into ${L}^{2}(D)$. For the composition of ${\nu}^{n}$ with the stochastic process ${({\Theta}_{t}^{n},t)}_{t\ge 0}$, we use the abbreviation ${\nu}_{t}^{n}:={\nu}^{n}({\Theta}_{t}^{n})$; the resulting stochastic process ${({\nu}_{t}^{n})}_{t\ge 0}$ is an adapted càdlàg process taking values in ${L}^{2}(D)$. This process describes the activity at a location $x\in D$ as the fraction of active neurons in the population located around x.
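For a partition of D = (0, 1) into intervals, a coordinate function of the kind described above can be sketched as follows. Since the defining display is not reproduced here, we assume that $\nu^n(\theta)$ equals $\theta^k/l(k,n)$ on $D_{k,n}$, i.e., the fraction of active neurons in the population located there. The one-dimensional setting and the function names `coordinate_function` and `l2_norm_squared` are hypothetical.

```python
import numpy as np

def coordinate_function(theta, l, edges):
    """Map population counts theta_k (population sizes l_k) on the cells
    D_k = (edges[k], edges[k+1]) to the piecewise-constant activity
    nu(x) = theta_k / l_k for x in D_k (zero outside the union of the cells)."""
    frac = np.asarray(theta, float) / np.asarray(l, float)
    edges = np.asarray(edges, float)

    def nu(x):
        x = np.asarray(x, float)
        k = np.searchsorted(edges, x, side="right") - 1   # index of the cell containing x
        inside = (k >= 0) & (k < len(frac))
        return np.where(inside, frac[np.clip(k, 0, len(frac) - 1)], 0.0)

    return nu

def l2_norm_squared(theta, l, edges):
    """||nu^n(theta)||_{L^2}^2 = sum_k |D_k| (theta_k / l_k)^2 for the piecewise-constant map."""
    frac = np.asarray(theta, float) / np.asarray(l, float)
    return float(np.sum(np.diff(edges) * frac ** 2))

edges = np.linspace(0.0, 1.0, 5)                 # four populations of ten neurons each
nu = coordinate_function([2, 5, 0, 10], [10, 10, 10, 10], edges)
print(nu([0.1, 0.3, 0.6, 0.9]), l2_norm_squared([2, 5, 0, 10], [10] * 4, edges))
```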
Connection of the Initial Conditions
One condition in the subsequent limit theorems is the convergence of initial conditions in probability, i.e., the assumption that
It is easy to see that such a sequence of initial conditions ${\Theta}_{0}^{n}$, $n\in \mathbb{N}$, can be found for any deterministic initial condition ${\nu}_{0}$ under some reasonable conditions on the domain D and the sequence of partitions ${\mathcal{D}}_{n}$. Hence, the assumption (2.8) can always be satisfied. For example, we may define such a sequence of initial conditions by
Next, assume that the partitions fill the whole domain D as $n\to \infty$, i.e., $\lim_{n\to \infty}|D\setminus \bigcup_{k=1}^{P(n)}{D}_{k,n}|=0$, where $|\cdot|$ denotes the Lebesgue measure, and that the maximal diameter of the sets decreases to zero, i.e., $\lim_{n\to \infty}{\delta}_{+}(n)=0$. Then it is easy to see using the Poincaré inequality (4.1) that the above definition of the initial condition implies $\|{\nu}_{0}^{n}-\nu (0)\|_{{L}^{2}(D)}\to 0$ and $\sup_{n\in \mathbb{N}}\|{\nu}_{0}^{n}\|_{{L}^{2}(D)}^{2r}<\infty$ for all $r\ge 1$. Then (2.8) holds trivially, as the initial condition is deterministic and converges. A simple nondegenerate sequence of initial conditions is obtained by choosing random initial conditions with the above value as their mean and sufficiently fast decreasing fluctuations. Furthermore, a sequence of partitions that satisfies the above conditions exists for a large class of reasonable domains D. Assume that D is Jordan measurable, i.e., a bounded domain whose boundary is a Lebesgue null set, and let ${\mathcal{C}}_{n}$ be the smallest grid of cubes with edge length $1/n$ covering D. We define ${\mathcal{D}}_{n}$ to be the set of all cubes that lie fully in D. As D is Jordan measurable, these partitions fill up D from the inside and ${\delta}_{+}(n)\to 0$. For a more detailed discussion of these aspects, we refer to [26].
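The cube construction for Jordan measurable domains can be checked numerically. The sketch below takes D to be the unit disc in $\mathbb{R}^2$ (a convex set, so a square lies in D as soon as its four corners do), keeps the squares of edge $1/n$ from the grid $(1/n)\mathbb{Z}^2$ that lie fully in D, and reports the uncovered area $|D\setminus\bigcup_k D_{k,n}|$ together with $\delta_+(n)=\sqrt{2}/n$. The choice of domain and the function name `inner_cube_partition` are ours.

```python
import numpy as np

def inner_cube_partition(n, inside, lo=-1.0, hi=1.0):
    """Squares of edge 1/n on the grid (1/n) Z^2 covering [lo, hi]^2 that lie fully in D.
    Valid for convex D, where checking the four corners suffices.
    Returns the lower-left corners and the edge length."""
    h = 1.0 / n
    ks = np.arange(np.floor(lo * n), np.ceil(hi * n))
    corners = []
    for i in ks:
        for j in ks:
            x0, y0 = i * h, j * h
            pts = [(x0, y0), (x0 + h, y0), (x0, y0 + h), (x0 + h, y0 + h)]
            if all(inside(x, y) for x, y in pts):
                corners.append((x0, y0))
    return np.array(corners), h

disc = lambda x, y: x * x + y * y <= 1.0            # D = unit disc, |D| = pi
for n in (4, 16, 64):
    cubes, h = inner_cube_partition(n, disc)
    uncovered = np.pi - len(cubes) * h * h           # |D minus the union of the squares|
    print(n, len(cubes), round(uncovered, 4), round(np.sqrt(2) * h, 4))
```

Because the coarse grids are nested in the finer ones (4 divides 16, which divides 64), the inner unions grow with n, so the uncovered area cannot increase; it is bounded by $\pi(1-(1-\sqrt{2}/n)^2)$, which tends to zero.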
In the remainder of this section, we now collect the main results of this article. We start with the law of large numbers, which establishes the connection to the deterministic mean field equation, and then proceed to central limit theorems which provide the basis for a Langevin approximation. The proofs of the results are deferred to Sect. 4.
2.1 A Law of Large Numbers
The first law of large numbers takes the following form. Note that the assumptions imply that the number of neuron populations diverges.
Theorem 2.1 (Law of large numbers)
Let $w\in {L}^{2}(D)\times {L}^{2}(D)$ and $I\in {L}_{\mathrm{loc}}^{2}({\mathbb{R}}_{+},{H}^{1}(D))$. Assume that the sequence of initial conditions converges to $\nu (0)$ in probability in the space ${L}^{2}(D)$, i.e., (2.8) holds, that ${\mathbb{E}}^{n}{\Theta}_{0}^{k,n}\le l(k,n)$, and that
holds. Then the sequence of ${L}^{2}(D)$-valued jump processes ${({\nu}_{t}^{n})}_{t\ge 0}$ converges uniformly on compact time intervals in probability to the solution ν of the Wilson–Cowan equation (1.3), i.e., for all $T,\epsilon >0$ it holds that
Moreover, if for $r\ge 1$ the initial conditions additionally satisfy ${sup}_{n\in \mathbb{N}}{\mathbb{E}}^{n}{\parallel {\nu}_{0}^{n}\parallel}_{{L}^{2}(D)}^{2r}<\mathrm{\infty}$, then convergence in the $r$th mean holds, i.e., for all $T>0$
Remark 2.1 The norm of uniform convergence ${sup}_{t\in [0,T]}{\parallel \cdot \parallel}_{{L}^{2}(D)}$, which we used in Theorem 2.1, is a very strong norm on the space of ${L}^{2}(D)$-valued càdlàg functions on $[0,T]$. Hence, due to continuous embeddings, the result immediately extends to weaker norms, e.g., the norms of ${L}^{p}((0,T),{L}^{2}(D))$ for all $1\le p\le \infty$. Also, for the state space, weaker spatial norms can be chosen, e.g., ${L}^{p}(D)$ with $1\le p\le 2$ or any norm on the duals ${H}^{-\alpha}(D)$ of Sobolev spaces with $\alpha >0$. If weaker norms for the state space are considered, it is possible to relax the conditions of Theorem 2.1 by sharpening some estimates in its proof. The results in the following corollary cover the whole range $\alpha \ge 0$ and split it into sections with weakening conditions. In particular, note that after passing to weaker norms, convergence does not require the number of neurons per population to diverge. However, the condition on the divergence of the number of neuron populations (${\delta}_{+}(n)\to 0$) cannot be relaxed.
Corollary 2.1 Let $\alpha \ge 0$ and set
Further, assume that $w\in {L}^{q}(D)\times {L}^{2}(D)$ and $I\in {L}_{\mathrm{loc}}^{2}({\mathbb{R}}_{+},{H}^{1}(D))$, that the sequence of initial conditions converges to $\nu (0)$ in probability in the space ${H}^{-\alpha}(D)$, that ${lim}_{n\to \mathrm{\infty}}{\delta}_{+}(n)=0$, and that
where 1− denotes an arbitrary positive number strictly smaller than 1. Then it holds for all $T,\epsilon >0$ that
and for $r\ge 1$, if the additional boundedness assumptions of Theorem 2.1 are satisfied, that for all $T>0$
Remark 2.2 We believe that fruitful and illustrative comparisons of these convergence results and their conditions to the results in Kotelenez [17, 18], and particularly, Blount [4] can be made. Here, we just mention that the latter author conjectured the conditions (2.13) to be optimal for the convergence, but was not able to prove this result in his model of chemical reactions with diffusions for the region $\alpha \in (0,d/2]$. For our model, we could achieve these rates.
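To illustrate Theorem 2.1 in the simplest possible setting, the sketch below simulates a single population (P = 1, so the spatial structure is suppressed) with the Gillespie algorithm and compares the activity fraction $\theta_T/l$ with the deterministic Wilson–Cowan dynamics. Since the transition rates (2.5) are not reproduced here, we assume Bressloff-type birth–death rates: an activation rate $\tau^{-1}l\,f(w\theta/l+I)$, i.e., the activation rate $\tau^{-1}l(k)\overline{f}_k(\theta)$ of (1.6) with $\overline{f}_k$ replaced by a one-population sigmoid (an assumption), and a decay rate $\tau^{-1}\theta$. The gain f, the weight w, the input I, and all numerical values are hypothetical; the printed stochastic values depend on the seed.

```python
import numpy as np

def f(u):
    # Hypothetical sigmoidal gain function
    return 1.0 / (1.0 + np.exp(-u))

def gillespie_population(l, T, theta0, w=1.0, I=-0.5, tau=1.0, seed=0):
    """Exact stochastic simulation of one population of l neurons with the ASSUMED rates
    theta -> theta + 1 at rate l * f(w * theta / l + I) / tau and
    theta -> theta - 1 at rate theta / tau. Returns the activity fraction theta_T / l."""
    rng = np.random.default_rng(seed)
    t, theta = 0.0, theta0
    while True:
        up = l * f(w * theta / l + I) / tau
        down = theta / tau
        total = up + down                     # > 0 because f > 0
        t += rng.exponential(1.0 / total)     # waiting time to the next jump
        if t > T:
            return theta / l
        theta += 1 if rng.random() * total < up else -1

def wilson_cowan_1pop(nu0, T, w=1.0, I=-0.5, tau=1.0, steps=20000):
    """Forward Euler for the one-population limit tau * nu' = -nu + f(w * nu + I)."""
    dt, nu = T / steps, nu0
    for _ in range(steps):
        nu += dt * (-nu + f(w * nu + I)) / tau
    return nu

T, nu0 = 3.0, 0.2
for l in (100, 1000, 10000):
    print(l, gillespie_population(l, T, int(nu0 * l)), wilson_cowan_1pop(nu0, T))
```

As l grows, the fluctuations of $\theta_T/l$ around $\nu(T)$ shrink roughly like $l^{-1/2}$, which is the one-population shadow of the uniform convergence in probability stated in Theorem 2.1.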
2.1.1 Infinite-Time Convergence
In the law of large numbers, Theorem 2.1, and its Corollary 2.1 we have presented results of convergence over finite time intervals. Employing a different technique, we are also able to derive a convergence result over the whole positive time axis motivated by a similar result in [32]. The proof of the following theorem is deferred to Sect. 4.3. Restricted to finite time intervals, the subsequent result is strictly weaker than Theorem 2.1. However, the result is important when one wants to analyse the mean long time behaviour of the stochastic model via a bifurcation analysis of the deterministic limit as (2.14) suggests that ${\mathbb{E}}^{n}{\nu}_{t}^{n}$ is close to $\nu (t)$ for all times $t\ge 0$ for sufficiently large n.
Theorem 2.2 Let $\alpha \ge 0$ and assume that the conditions of Corollary 2.1 are satisfied. We further assume that the current input function $I\in {L}_{\mathrm{loc}}^{2}({\mathbb{R}}_{+},{H}^{1}(D))$ satisfies ${\parallel {\mathrm{\nabla}}_{x}I\parallel}_{{L}^{\mathrm{\infty}}({\mathbb{R}}_{+},{L}^{2}(D))}<\mathrm{\infty}$, i.e., it is square integrable in ${H}^{1}(D)$ over bounded intervals, and possesses first spatial derivatives bounded for almost all $t\ge 0$ in ${L}^{2}(D)$. Then it holds that
2.2 A Martingale Central Limit Theorem
In this section, we present a central limit theorem for a sequence of martingales associated with the jump processes ${\nu}^{n}$. A brief, heuristic discussion of the method of proof for the law of large numbers explains the importance of these martingales and motivates their study. In the proof of the law of large numbers, the central argument relies on the fact that the process ${({\nu}_{t}^{n})}_{t\ge 0}$ satisfies the decomposition
Here, the process ${({M}_{t}^{n})}_{t\ge 0}$ is a Hilbert space-valued, square-integrable, càdlàg martingale, with (2.15) serving as its definition. We have used this representation of the process ${\nu}^{n}$ in the proof of Theorem 2.2; see Sect. 4.3. We note that the Bochner integral in (2.15) is a.s. well defined due to bounded second moments of the integrand; see (4.7) in the proof of Theorem 2.1. A heuristic argument for the convergence to the solution of the Wilson–Cowan equation is now the following: the initial conditions converge, the martingale term ${M}^{n}$ converges to zero, and the integral term on the right-hand side of (2.15) converges to the right-hand side of the Wilson–Cowan equation (1.3). Hence, the ‘solution’ ${\nu}^{n}$ of (2.15) converges to the solution ν of the Wilson–Cowan equation (1.3). Interpreting Eq. (2.15) as a stochastic evolution equation driven by the martingale ${({M}_{t}^{n})}_{t\ge 0}$ sheds light on the importance of studying this term, because, from this point of view, the martingale part of the decomposition (2.15) contains all the stochasticity inherent in the system. The idea for deriving a Langevin or linear noise approximation is then to find a nontrivial stochastic limit (in distribution) for the sequence of martingales and to substitute, heuristically, this limiting martingale into the stochastic evolution equation. One then expects that this new and much less complex process behaves similarly to the process ${({\nu}_{t}^{n})}_{t\ge 0}$ for sufficiently large n. Deriving a suitable limit for ${({M}_{t}^{n})}_{t\ge 0}$ is what we set out to do next. The result is stated in Theorem 2.3 below and takes the form of a central limit theorem.
First of all, what has been said so far implies the necessity of rescaling the martingale with a diverging sequence in order to obtain a nontrivial limit. The conditions in the law of large numbers imply in particular that the martingale converges uniformly in the mean square to zero, i.e.,
which in turn implies convergence in probability and convergence in distribution to the zero limit.
Furthermore, in contrast to the Euclidean case, norms on infinite-dimensional spaces are usually not equivalent. In Corollary 2.1, we exploited this fact, as it allowed us to obtain convergence results under less restrictive conditions by changing to strictly weaker norms. In the formulation and proof of central limit theorems, the change to weaker norms even becomes an essential ingredient. It is often observed in the literature, see, e.g., [4, 17, 18], that central limit theorems cannot be proven in the strongest norm for which the law of large numbers holds, e.g., ${L}^{2}(D)$ in the present setting, but only in a strictly weaker norm. Here, this norm is the norm in the dual of an appropriate Sobolev space. Hence, from now on, we consider for all $n\in \mathbb{N}$ the processes ${({\nu}_{t}^{n})}_{t\ge 0}$ and the martingales ${({M}_{t}^{n})}_{t\ge 0}$ as taking values in the space ${H}^{-\alpha}(D)$ for some $\alpha >d$, where d is the dimension of the spatial domain D, using the embedding of ${L}^{2}(D)$ into ${H}^{-\alpha}(D)$. The technical significance of the restriction $\alpha >d$ is that these are the indices for which there exists an embedding of ${H}^{\alpha}(D)$ into some ${H}^{{\alpha}_{1}}(D)$ with $d/2<{\alpha}_{1}<\alpha $ that is of Hilbert–Schmidt type^{c} due to Maurin’s theorem, while ${H}^{{\alpha}_{1}}(D)$ is embedded into $C(\overline{D})$ due to the Sobolev embedding theorem. These two properties are essential for the proof of the central limit theorem, and their role will be made clear subsequently.
The limit we propose for the rescaled martingale sequence is a centred diffusion process in ${H}^{-\alpha}(D)$, that is, a centred continuous Gaussian stochastic process ${({X}_{t})}_{t\ge 0}$ taking values in ${H}^{-\alpha}(D)$ with independent increments and given covariance $C(t)$, $t\ge 0$; see, e.g., [12, 25] for a discussion of Gaussian processes in Hilbert spaces. Such a process is uniquely defined by its covariance operator and, conversely, each family of linear, bounded operators $C(t):{H}^{\alpha}(D)\to {H}^{-\alpha}(D)$, $t\ge 0$, uniquely defines a diffusion process^{d} if

(i)
each $C(t)$ is symmetric and positive, i.e.,
$${\langle C(t)\varphi ,\psi \rangle}_{{H}^{\alpha}(D)}={\langle C(t)\psi ,\varphi \rangle}_{{H}^{\alpha}(D)}\quad \text{and}\quad {\langle C(t)\varphi ,\varphi \rangle}_{{H}^{\alpha}(D)}\ge 0,$$
(ii)
each $C(t)$ is of trace class, i.e., for one (and thus every) orthonormal basis ${\phi}_{j}$, $j\in \mathbb{N}$, in ${H}^{\alpha}(D)$ it holds that
$$\sum _{j=1}^{\infty}{\langle C(t){\phi}_{j},{\phi}_{j}\rangle}_{{H}^{\alpha}(D)}<\infty ,\tag{2.16}$$
(iii)
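and the family $C(t)$, $t\ge 0$, is continuously increasing in t in the sense that the map $t\mapsto {\langle C(t)\varphi ,\psi \rangle}_{{H}^{\alpha}(D)}$ is continuous for all $\varphi ,\psi \in {H}^{\alpha}(D)$ and the map $t\mapsto {\langle C(t)\varphi ,\varphi \rangle}_{{H}^{\alpha}(D)}$ is increasing for all $\varphi \in {H}^{\alpha}(D)$.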
We next define, via its covariance, the process that will be the limit identified in the martingale central limit theorem. In order to define the operator C, we first define a family of linear operators $G(\nu (t),t)$ mapping from ${H}^{\alpha}(D)$ into the dual space ${H}^{-\alpha}(D)$ via the bilinear form
It is obvious that this bilinear form is symmetric and positive and, as $\nu (t)$ is continuous in t, the map $t\mapsto {\langle G(\nu (t),t)\varphi ,\psi \rangle}_{{H}^{\alpha}(D)}$ is continuous for all $\varphi ,\psi \in {H}^{\alpha}(D)$. Furthermore, it is easy to see that the operator is bounded, i.e.,
as the solution ν of the Wilson–Cowan equation and the gain function f are pointwise bounded. Hence, due to the Cauchy–Schwarz inequality, $|{\langle G(\nu (t),t)\varphi ,\psi \rangle}_{{H}^{\alpha}(D)}|$ is bounded by a constant multiple of the product ${\parallel \varphi \parallel}_{{L}^{2}(D)}{\parallel \psi \parallel}_{{L}^{2}(D)}$, and for any $\alpha \ge 0$ the Sobolev embedding theorem now gives a uniform bound in terms of the norms of ϕ, ψ in ${H}^{\alpha}(D)$. As a final property, we show that these operators are of trace class if $\alpha >d/2$. Thus, let ${({\phi}_{j})}_{j\in \mathbb{N}}$ be an orthonormal basis in ${H}^{\alpha}(D)$; then the Cauchy–Schwarz inequality yields
Summing these inequalities for all $j\in \mathbb{N}$, we find that the resulting righthand side is finite as due to Maurin’s theorem the embedding of ${H}^{\alpha}(D)$ into ${L}^{2}(D)$ is of Hilbert–Schmidt type. Moreover, their trace is even bounded independently of t.
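For d = 1 the threshold $\alpha>d/2$ can be seen explicitly. Assume D = (0, 1) and take, as one convenient (assumed) choice of norm, $\|u\|_{H^\alpha}^2=\sum_{j\ge0}(1+(\pi j)^2)^{\alpha}|\langle u,e_j\rangle_{L^2}|^2$, built from the cosine eigenfunctions $e_j$ of the Neumann Laplacian. Then $(1+(\pi j)^2)^{-\alpha/2}e_j$ is an orthonormal basis of $H^\alpha(D)$, and the squared Hilbert–Schmidt norm of the embedding $H^\alpha(D)\to L^2(D)$ equals $\sum_j(1+(\pi j)^2)^{-\alpha}$, which is finite exactly when $\alpha>1/2$. The sketch computes partial sums: for α = 1 they settle at $(1+\coth 1)/2\approx1.1565$, whereas for α = 1/2 they keep growing like $(\ln J)/\pi$.

```python
import numpy as np

def hs_norm_sq_partial(alpha, J):
    """Partial sum sum_{j=0}^{J} (1 + (pi j)^2)^(-alpha): the squared Hilbert-Schmidt norm of
    the embedding H^alpha(0,1) -> L^2(0,1), truncated after J+1 cosine modes
    (for the assumed spectral definition of the H^alpha norm)."""
    j = np.arange(J + 1, dtype=float)
    return float(np.sum((1.0 + (np.pi * j) ** 2) ** (-alpha)))

for alpha in (1.0, 0.5):
    print(alpha, [round(hs_norm_sq_partial(alpha, J), 4) for J in (10**3, 10**4, 10**5, 10**6)])
```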
Now, the map $t\mapsto G(\nu (t),t)$ is continuous, taking values in the Banach space of trace class operators; hence we define trace class operators $C(t)$ from ${H}^{\alpha}(D)$ into ${H}^{-\alpha}(D)$ via the Bochner integral for all $t\ge 0$
Clearly, the resulting bilinear form ${\langle C(t)\cdot ,\cdot \rangle}_{{H}^{\alpha}(D)}$ inherits the properties of the bilinear form (2.17). Moreover, due to the positivity of the integrands, ${\langle C(t)\varphi ,\varphi \rangle}_{{H}^{\alpha}(D)}$ is increasing in t for all $\varphi \in {H}^{\alpha}(D)$. Hence, the family of operators $C(t)$, $t\ge 0$, satisfies the above conditions (i)–(iii), and thus uniquely defines an ${H}^{-\alpha}(D)$-valued diffusion process.
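A finite-dimensional analogue makes the link between a covariance family $C(t)=\int_0^tG\,ds$ and a Gaussian process with independent increments concrete. The sketch assumes a hypothetical $3\times3$ positive definite matrix $G(s)=(1+s)G_0$ as a stand-in for $G(\nu(s),s)$ (in finite dimensions the Riesz map is the identity), forms $Z_t=\int_0^t\sqrt{G(s)}\,dW_s$ by an Itô sum with midpoint evaluation of G, and compares the empirical covariance of $Z_1$ with $C(1)=\int_0^1G(s)\,ds=1.5\,G_0$.

```python
import numpy as np

def psd_sqrt(A):
    """Symmetric positive semidefinite square root via the spectral decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def simulate_Z(G, t, steps=50, n_paths=20000, seed=0):
    """Monte Carlo samples of Z_t = int_0^t sqrt(G(s)) dW_s for a time-dependent PSD matrix G(s),
    evaluating G at the midpoints of a uniform time grid."""
    rng = np.random.default_rng(seed)
    dt = t / steps
    dim = G(0.0).shape[0]
    Z = np.zeros((n_paths, dim))
    for i in range(steps):
        S = psd_sqrt(G((i + 0.5) * dt))
        dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, dim))
        Z += dW @ S.T              # each row receives S dW for one path
    return Z

G0 = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 0.5]])
G = lambda s: (1.0 + s) * G0       # hypothetical stand-in for G(nu(s), s)
Z = simulate_Z(G, t=1.0)
print(np.round(np.cov(Z, rowvar=False), 2))   # compare with C(1) = 1.5 * G0
```

Because the increments are independent and Gaussian, $\mathrm{Cov}(Z_t)=\sum_iG(s_i)\,\Delta t$ exactly, and for G affine in s the midpoint sum equals $\int_0^tG(s)\,ds$; the symmetry, positivity, and monotonicity in t required in (i) to (iii) are then inherited from G.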
We are now able to state the martingale central limit theorem. The proof of the theorem is deferred to Sect. 4.4.
Theorem 2.3 (Martingale central limit theorem)
Let $\alpha >d$ and assume that the conditions of Theorem 2.1 are satisfied. In particular, convergence in the mean holds, i.e., (2.11) holds for $r=1$. Additionally, we assume it holds that
Then it follows that the sequence of rescaled ${H}^{-\alpha}(D)$-valued martingales
converges weakly on the space of ${H}^{-\alpha}(D)$-valued càdlàg functions to the ${H}^{-\alpha}(D)$-valued diffusion process defined by the covariance operator $C(t)$ given by (2.18).
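The scaling in Theorem 2.3 can be illustrated in the one-population setting, where we take a single population of l neurons occupying a domain of unit size, so that the rescaling factor $\ell_-(n)/v_+(n)$ reduces to l (an assumption of this sketch). With the same assumed birth–death rates as in our one-population illustration of the law of large numbers, $M_T=\nu_T-\nu_0-\int_0^T\tau^{-1}(-\nu_s+f(w\nu_s+I))\,ds$ is a martingale whose predictable quadratic variation is $l^{-1}\int_0^T\tau^{-1}(\nu_s+f(w\nu_s+I))\,ds$. Hence $l\,\mathrm{Var}(M_T)$ should stay of order one and approach $\int_0^T\tau^{-1}(\nu(s)+f(w\nu(s)+I))\,ds$ along the deterministic limit; this is the one-population analogue of $C(T)$ under our assumed rates, not necessarily the exact form of (2.17). All parameters and names are hypothetical.

```python
import numpy as np

def f(u):
    # Hypothetical sigmoidal gain function
    return 1.0 / (1.0 + np.exp(-u))

def martingale_at_T(l, T, nu0, rng, w=1.0, I=-0.5, tau=1.0):
    """One sample of M_T = nu_T - nu_0 - int_0^T tau^{-1}(-nu_s + f(w nu_s + I)) ds for a single
    population with the ASSUMED rates l f(w nu + I)/tau (activation) and theta/tau (decay)."""
    t, theta = 0.0, int(round(nu0 * l))
    nu_start, drift_int = theta / l, 0.0
    while True:
        nu = theta / l
        up, down = l * f(w * nu + I) / tau, theta / tau
        total = up + down
        wait = rng.exponential(1.0 / total)
        drift_int += min(wait, T - t) * (-nu + f(w * nu + I)) / tau   # drift constant between jumps
        t += wait
        if t >= T:
            return nu - nu_start - drift_int
        theta += 1 if rng.random() * total < up else -1

def predicted_variance(nu0, T, w=1.0, I=-0.5, tau=1.0, steps=20000):
    """int_0^T tau^{-1}(nu(s) + f(w nu(s) + I)) ds along the deterministic limit (forward Euler)."""
    dt, nu, acc = T / steps, nu0, 0.0
    for _ in range(steps):
        acc += dt * (nu + f(w * nu + I)) / tau
        nu += dt * (-nu + f(w * nu + I)) / tau
    return acc

rng = np.random.default_rng(1)
l, T, nu0 = 200, 1.0, 0.2
samples = np.array([martingale_at_T(l, T, nu0, rng) for _ in range(2000)])
print("l * Var(M_T):", l * samples.var(), " predicted:", predicted_variance(nu0, T))
```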
Remark 2.3 In connection with the results of Theorem 2.3, two questions may arise. First, in what sense are the rescaling sequence, and hence the limiting diffusion, unique? That is, does a different scaling also produce a (nontrivial) limit, or, rephrased, is the proposed scaling the correct one to look at? Secondly, the theorem deals with the norms for the range $\alpha >d$ in the Hilbert scale; what can be said about convergence in the stronger norms corresponding to the range $\alpha \in [0,d]$? Does a limit exist? We conclude this section by addressing these two issues.
Regarding the first question, it is immediately obvious that the rescaling sequence $\frac{{\ell}_{-}(n)}{{v}_{+}(n)}$, which we denote by ${\rho}_{n}$ in the following, is not the only sequence yielding a nontrivial limit. Rescaling the martingales ${M}^{n}$ by any sequence of the form $\sqrt{c{\rho}_{n}}$, $c>0$, yields a convergent martingale sequence. However, the limiting diffusion then differs only in its covariance operator, which is rescaled by c, and hence the limit is essentially the same process with either ‘stretched’ or ‘shrunk’ variability. By contrast, the asymptotic behaviour of the rescaling sequences that allow for a nontrivial weak limit is unique. In general, by considering different rescaling sequences ${\rho}_{n}^{\ast}$, we obtain three possibilities for the convergence of the sequence $\sqrt{{\rho}_{n}^{\ast}}{M}^{n}$. If ${\rho}_{n}^{\ast}$ diverges at the same speed as ${\rho}_{n}$, i.e., if ${\rho}_{n}^{\ast}/{\rho}_{n}\to c>0$, the rescaled sequence again converges to a diffusion process whose covariance operator is proportional to (2.18). This is just a rescaling by a sequence (asymptotically) proportional to ${\rho}_{n}$, as discussed above. Secondly, if the divergence is slower, i.e., ${\rho}_{n}^{\ast}=o({\rho}_{n})$, then the same methods as in the law of large numbers show that the sequence converges to zero uniformly on compacts in probability; hence convergence in distribution to the degenerate zero process also follows. Thus, one only obtains the trivial limit. Finally, if we rescale by a sequence that diverges faster, i.e., ${\rho}_{n}=o({\rho}_{n}^{\ast})$, we can show that no limit exists. This follows from general necessary conditions for the preservation of weak limits under transformations, which require that $\sqrt{{\rho}_{n}^{\ast}/{\rho}_{n}}M$ converges in distribution in order for $\sqrt{{\rho}_{n}^{\ast}}{M}^{n}$ to possess a limit in distribution; see Theorem 2 in [29].
As the sequence ${\rho}_{n}^{\ast}/{\rho}_{n}$ diverges, this clearly cannot hold.
Unfortunately, the second question cannot be answered with the same clarity when considering nontrivial limits. Essentially, we can only say that the currently used methods do not allow any conclusion on convergence. The limitations are the following. The central problem is that for the parameter range $\alpha \in [0,d]$ the current method does not provide tightness of the rescaled martingale sequence; hence we cannot infer that the sequence possesses a convergent subsequence. However, if tightness can be established in a different way, then for the range $\alpha \in (max\{1,d/2\},d]$ the limit has to be the diffusion process defined by the operator (2.18), as follows from the characterisation of any limit in the proof of the theorem. Here, the lower bound $max\{1,d/2\}$ results, on the one hand, from our estimation technique, which necessitates $\alpha \ge 1$, and, on the other hand, from the definition of the limiting diffusion: recall that the covariance operator is only of trace class for $\alpha >d/2$. Hence, for $\alpha \in [0,d/2]$, we can no longer infer that the limiting diffusion even exists.
2.3 The Mean-Field Langevin Equation
An important property of the limiting diffusion, in view of analytic and numerical studies, is that it can be represented by a stochastic integral with respect to a cylindrical or Q-Wiener process. For a general discussion of infinite-dimensional stochastic integrals, we refer to [12]. First, let ${({W}_{t})}_{t\ge 0}$ be a cylindrical Wiener process on ${H}^{-\alpha}(D)$ with covariance operator being the identity. Then $G(\nu (t),t)\circ {\iota}^{-1}$ is a trace class operator on ${H}^{-\alpha}(D)$ for suitable values of α. Here, ${\iota}^{-1}:{H}^{-\alpha}(D)\to {H}^{\alpha}(D)$ is the Riesz representation, i.e., the usual identification of a Hilbert space with its dual. The operator $G(\nu (t),t)\circ {\iota}^{-1}$ possesses a unique nonnegative square root, which we denote by $\sqrt{G(\nu (t),t)\circ {\iota}^{-1}}$ and which is a Hilbert–Schmidt operator on ${H}^{-\alpha}(D)$. It follows that the stochastic integral process
is a diffusion process in ${H}^{-\alpha}(D)$ with covariance operator $C(t)$. That is, ${({Z}_{t})}_{t\ge 0}$ is a version of the limiting diffusion in Theorem 2.3. Now, formally substituting the limits into (2.15) yields the linear noise approximation
or in differential notation
where ${\epsilon}_{n}=\sqrt{{v}_{+}(n)/{\ell}_{-}(n)}$ is small for large n. Here, we have used the operator notation
Equation (2.21) is an infinite-dimensional stochastic differential equation with additive (linear) noise. Here, additive means that the coefficient in the diffusion term does not depend on the solution ${U}_{t}$. A second formal substitution yields the Langevin approximation. Here, the dependence of the diffusion coefficient on the deterministic limit ν is formally replaced by a dependence on the solution. That is, we obtain a stochastic partial differential equation with multiplicative noise given by
or in differential notation
Note that the derivation of the above equations was only formal; hence we still have to address the existence and uniqueness of solutions and the proper setting for these equations. This is left for future work. Which of the two, if either, is the correct diffusion approximation to use is an ongoing discussion, and without a criterion of approximation quality it is probably undecidable. First of all, note that for both versions the noise term vanishes as $n\to \infty$, and thus both have the Wilson–Cowan equation as their limit. Also, neither of them approximates even the first moment of the microscopic models exactly. That is, for neither of them does the mean solve the Wilson–Cowan equation, which would only be the case if f were linear. However, their means are close to the mean of the discrete process. We discuss this aspect in Appendix B.
Furthermore, we already observe in the central limit theorem, and thus also in the linear noise and Langevin approximations, that the covariance (2.18), as well as the drift and the structure of the diffusion terms in (2.21) and (2.22), respectively, are independent of objects resulting from the microscopic models: they are defined purely in terms of the macroscopic limit. This observation supports the conjecture that these approximations do not depend on which of the possibly different microscopic models converging to the same deterministic limit is used. Analogous statements also hold for derivations from the van Kampen system size expansion [5] and in related limit theorems for reaction diffusion models [4, 17, 18]. The only object reminiscent of the microscopic models in the continuous approximations is the rescaling sequence ${\epsilon}_{n}$. However, ${\epsilon}_{n}^{-2}={\ell}_{-}(n)/{v}_{+}(n)$, i.e., the number of neurons per population divided by the size of the area it occupies, which is just the local density of neurons. Therefore, in the approximations, the noise scales inversely to the square root of the neuron density, which, interpreted in this way, can also be considered a macroscopic fixed parameter and chosen independently of the approximating sequence.
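A numerical sketch of a Langevin-type approximation on a one-dimensional domain may help to see how $\epsilon_n$ enters. It rests on several assumptions that are not taken from the equations above: D = (0, 1) is divided into m cells of width h; the drift is the Wilson–Cowan right-hand side in the form $\tau^{-1}(-U+f(\int_Dw(\cdot,y)U(y)\,dy+I))$; the noise intensity is assumed to be $g(U)=\tau^{-1}(U+f(\int_Dw\,U\,dy+I))$, applied pointwise to a cylindrical Wiener process on $L^2(D)$ whose cell increments are $\Delta W_k/\sqrt{h}$; and the kernel w, input I, gain f, and the neuron count per cell are hypothetical. Replacing $\sqrt{g(U)}$ by $\sqrt{g(\nu)}$ along a deterministic solution would give the linear noise variant instead. The scheme is Euler–Maruyama.

```python
import numpy as np

def f(u):
    # Hypothetical sigmoidal gain function
    return 1.0 / (1.0 + np.exp(-u))

def langevin_neural_field(eps, T=5.0, dt=1e-3, m=100, tau=1.0, seed=0):
    """Euler-Maruyama sketch on D = (0, 1) of
        dU = tau^{-1}(-U + f(w*U + I)) dt + eps * sqrt(g(U)) dW,
    with the ASSUMED intensity g(U) = tau^{-1}(U + f(w*U + I)) and W a cylindrical Wiener
    process on L^2(D), discretised on m cells of width h (cell increments dW_k / sqrt(h))."""
    rng = np.random.default_rng(seed)
    h = 1.0 / m
    x = (np.arange(m) + 0.5) * h                                  # cell midpoints
    w = 2.0 * np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.02)    # hypothetical kernel w(x, y)
    I = -1.0 + 0.5 * np.sin(2.0 * np.pi * x)                      # hypothetical input I(x)
    U = np.full(m, 0.1)
    for _ in range(int(round(T / dt))):
        u_in = h * (w @ U) + I                    # midpoint rule for int_D w(x, y) U(y) dy + I(x)
        drift = (-U + f(u_in)) / tau
        g = np.clip((U + f(u_in)) / tau, 0.0, None)   # clip: noise may push U slightly below 0
        dW = rng.normal(scale=np.sqrt(dt), size=m) / np.sqrt(h)
        U = U + drift * dt + eps * np.sqrt(g) * dW
    return x, U

h = 1.0 / 100
_, U_det = langevin_neural_field(0.0)                  # eps = 0: deterministic Euler scheme
for neurons_per_cell in (100, 10_000):                 # hypothetical population sizes
    eps = np.sqrt(h / neurons_per_cell)                # eps_n = sqrt(v_+(n) / l_-(n)) with v_+ = h
    _, U = langevin_neural_field(eps)
    print(neurons_per_cell, np.max(np.abs(U - U_det)))
```

With $v_+=h$, the per-cell noise amplitude $\epsilon_n/\sqrt{h}=\ell_-(n)^{-1/2}$ does not depend on the grid, consistent with the observation that the noise is governed by the neuron density alone.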
Remark 2.4 The stochastic partial differential equations (2.21) and (2.22), which we proposed as the linear noise and Langevin approximations, respectively, are not necessarily unique, as the representation of the limiting diffusion as a stochastic integral process (2.20) may not be unique. It will be the subject of further research to analyse the practical implications and usability of this Langevin approximation. Let Q be a trace class operator, ${({W}_{t}^{Q})}_{t\ge 0}$ a Q-Wiener process, and let $B(\nu (t),t)$ be operators such that $B(\nu (t),t)\circ Q\circ B{(\nu (t),t)}^{\ast}=G(\nu (t),t)\circ {\iota}^{-1}$, where ^{∗} denotes the adjoint operator. Then also the stochastic integral process
is a version of the limiting diffusion in Theorem 2.3, and the corresponding linear noise and Langevin approximations are given by
and
We conclude this section by presenting one particular choice of a diffusion coefficient and a Wiener process. We take ${({W}_{t}^{Q})}_{t\ge 0}$ to be a cylindrical Wiener process on ${L}^{2}(D)$ with covariance $Q={\mathrm{Id}}_{{L}^{2}}$. Then we can choose $B(t)=j\circ (\cdot \sqrt{g(t)})\in L({L}^{2}(D),{H}^{-\alpha}(D))$, where j is the embedding operator ${L}^{2}(D)\hookrightarrow {H}^{-\alpha}(D)$ in the sense of (1.2), and $(\cdot \sqrt{g(t)})\in L({L}^{2}(D),{L}^{2}(D))$ denotes pointwise multiplication of a function in ${L}^{2}(D)$, i.e.,
We first investigate the operator $G(\nu (t),t)\circ {\iota}^{1}$ and write it in more detail as the following composition of operators:
where k is the embedding operator ${H}^{\alpha}(D)\hookrightarrow {L}^{2}(D)$. Next, the Hilbert adjoint ${B}^{\ast}\in L({H}^{-\alpha},{L}^{2})$ is given by ${B}^{\ast}=(\cdot \sqrt{g})\circ k\circ {\iota}^{-1}$, which is easy to verify. Hence, the stochastic integral of $B(t)$ with respect to ${W}^{Q}$ is again a version of the limiting martingale, as
3 Discussion and Extensions
In this article, we have presented limit theorems that connect finite, discrete microscopic models of neural activity to the Wilson–Cowan neural field equation. The results state qualitative connections between the models formulated as precise probabilistic convergence concepts. Thus, the results strengthen the connection derived in a heuristic way from the van Kampen system size expansion.
A general limitation of mathematically precise approaches to approximations (cf. also the propagation of chaos limit theorems in [30]) is that the microscopic models are usually defined via the limit. In other words, the limit has to be known a priori, and we look for models which converge to this limit. Thus, in contrast to the van Kampen system size expansion, the presented results are not a step-by-step modelling procedure in the sense that, via a constructive limiting procedure, a microscopic model yields a deterministic or stochastic approximation. Hence, it might be objected that the presented method can only be used a posteriori in order to justify a macroscopic model from a constructed microscopic model, and that somehow one has to ‘guess’ the correct limit in advance. Several remarks can be made to answer this objection.
First, this observation is certainly true, but not necessarily a drawback. On the contrary, when both microscopic and macroscopic models are available, it is rather important to know how they are connected and to characterise this connection qualitatively and quantitatively. For neural field models, such a precise connection was simply not available so far for the well-established Wilson–Cowan model. Furthermore, when one starts from a stochastic microscopic description and works through the proof of the convergence conditions for a given microscopic model, one obtains very strong hints on the structure of a possible deterministic limit. Therefore, our results can also ease the procedure of ‘guessing the correct limit’.
Secondly, a phenomenological, deterministic model that approximates an inherently probabilistic process is often derived from ad hoc heuristic arguments. Given that the model has proved useful, one often aims to derive a justification from first principles and/or a stochastic version, which keeps the features of the deterministic model but also accounts for the formerly neglected fluctuations. A standard, though somewhat simple, approach to obtain stochastic versions consists of adding (small) noise to the deterministic equations. This article provides a second approach, which consists of finding microscopic models that converge to the deterministic limit and obtaining a stochastic correction via a central limit argument.
Thirdly and finally, the method also provides an argument for new equations, i.e., the Langevin and linear noise approximations, which can be used to study the stochastic fluctuations in the model. Furthermore, in contrast to previous studies, we do not provide deterministic moment equations but stochastic processes, which can be studied, e.g., via Monte Carlo simulations, with respect to a large number of pathwise properties and dynamics beyond first and second moments.
We now conclude this article by commenting on the feasibility of our approach connecting microscopic Markov models to deterministic macroscopic equations when dealing with different master equation formulations that appear in the literature. Additionally, the following discussions also relate the model (1.6) considered in this article to other master equation formulations. We conjecture that results analogous to those presented for the Wilson–Cowan equation (1.3) in Sect. 2 also hold for these variations of the master equations. This should be achievable by an adaptation of the methods of proof presented, although we have not performed the computations in detail.
3.1 A Variation of the Master Equation Formulation
A first variation of the discrete model discussed in Sect. 1.2 was considered in [8, 9], and a version restricted to a bounded state space also appears in [31]. This model consists of the master equation stated below in (3.2), which closely resembles (1.6). In the earlier reference [8], the model was introduced with a different interpretation, called the effective spike model. We briefly explain this interpretation before presenting the master equation. Instead of interpreting P as the number of neuron populations, in this model P denotes the number of different neurons in the network located within a spatial domain D. Then ${\Theta}_{t}^{k}$, the state of the $k$th neuron, counts the number of ‘effective’ spikes this neuron has emitted up to time t. Effective spikes are those spikes that still influence the dynamics of the system, e.g., via a postsynaptic potential. State transitions adding or subtracting one effective spike for the $k$th neuron are governed by a firing rate function ${\tilde{f}}_{k}$, which depends on the input into neuron k, and a decay rate ${\tau}^{-1}$. The constant decay rate indicates that emitted spikes are effective for a time interval of length τ, and the gain function is defined (neglecting external input) by
where ${f}^{\ast}$ is a certain nonnegative, real function. It is stated clearly in [9] that the function ${f}^{\ast}$ is not equal to the gain function f in the proposed limiting Wilson–Cowan equation (1.3), but rather connected to f such that
The authors in [9] state that for any function f such a function ${f}^{\ast}$ can be found. Then the process ${\Theta}_{t}=({\Theta}_{t}^{1},\dots ,{\Theta}_{t}^{P})$ is a jump Markov process given by the master equation
with boundary conditions $\mathbb{P}[\theta ,t]=0$ if $\theta \notin {\mathbb{N}}_{0}^{P}$, as stated in [9]. The advantage of the effective spike model interpretation over the interpretation as neurons per population is that it justifies the unbounded state space of the model: in principle, an arbitrary number of spikes emitted in the past can still be active. However, a disadvantage of the master equation (3.2) is that, for taking the limit, it lacks a parameter corresponding to the system size, which provides the natural small parameter in the van Kampen system size expansion. This explains the shift in the interpretation of the master equation in the study [9] following [8], and subsequently in [5], to the interpretation we presented in Sect. 1.2, which provides the system-size parameters $l(k)$.
On the level of Markov jump processes, the master equation (3.2) obviously describes dynamics similar to the master equation (1.6), only replacing the activation rate ${\tau}^{-1}l(k){\overline{f}}_{k}(\theta )$ in (1.6) by ${\tilde{f}}_{k}(\theta )$, which is independent of the parameter $l(k)$. Thus, the model (3.2) can be understood as resulting from (1.6) after a limit procedure taking $l(k)\to \infty$ has been applied, with the firing rate functions connected via the formal limit ${lim}_{l(k)\to \mathrm{\infty}}l(k){\overline{f}}_{k}(\theta )={\tilde{f}}_{k}(\theta )$. A qualitative interpretation of this limit procedure connecting the two types of models is given in [8]. This observation motivated the model in [5], which steps back one limit procedure and thus provides the correct framework for the derivation of limit theorems.
It would be an interesting addition to the limit theorems in Theorem 2.1 to derive a law of large numbers for the models (3.2) with stochastic mean activity ${\nu}^{n}$ as defined in (2.7) and suitably chosen weights ${\tilde{W}}_{kj}$. The macroscopic limit should, of course, be given by the Wilson–Cowan equation (1.3). We conjecture that the appropriate condition for the function ${f}^{\ast}$ in the present setting, including time-dependent inputs, is
such that the higher order terms are uniformly bounded and vanish in the limit $n\to \mathrm{\infty}$, and where the weights ${\overline{W}}_{kj}^{n}$ and inputs ${\overline{I}}_{k,n}(t)$ are defined as in (2.4) and (2.6). Property (3.3) closely resembles condition (3.1) and trivially holds for linear f with ${f}^{\ast}=f$.
3.2 Bounded State Space Master Equations
We already noted when introducing the microscopic model in Sect. 1.2 that the interpretation of the parameter $l(k)$ as the number of neurons in the $k$th population is not literally correct. The state space of the process is unbounded; hence arbitrarily many neurons can be active, and thus each population would have to contain arbitrarily many neurons. To overcome this interpretation problem, it has been proposed to consider the master equation only on a bounded state space. That is, the $k$th population consists of $l(k)$ neurons, and $0\le {\Theta}_{t}^{k}\le l(k)$ almost surely. Such master equations are obtained simply by setting the rates of the transitions of ${\theta}^{k}$ from $l(k)$ to $l(k)+1$ to zero.
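The bounded state space construction only changes one rate. The following is a minimal sketch, not taken from the cited works, of a single population in which the helper `activation_rate` (a hypothetical name) switches off the transition from $l(k)$ to $l(k)+1$; for simplicity it assumes a constant, strictly positive ${\overline{f}}_{k}$, whereas in the model this quantity depends on the state. Note that the capped rate is discontinuous in $\theta^{k}$, which is exactly the feature that causes trouble later in this section.

```python
import numpy as np

def activation_rate(theta_k, l_k, fbar_k, tau, bounded):
    """Rate of theta^k -> theta^k + 1 (hypothetical helper).

    Unbounded model: l(k) * fbar_k / tau. Bounded model: the same rate,
    but set to zero once theta^k has reached l(k).
    """
    if bounded and theta_k >= l_k:
        return 0.0
    return l_k * fbar_k / tau

def simulate_one_population(l_k, fbar_k, tau, T, bounded, seed=0):
    """Gillespie simulation of one population with constant fbar_k > 0.

    Returns the largest number of active neurons observed on [0, T].
    """
    rng = np.random.default_rng(seed)
    t, theta, theta_max = 0.0, 0, 0
    while True:
        up = activation_rate(theta, l_k, fbar_k, tau, bounded)
        down = theta / tau                     # each active neuron deactivates at rate 1/tau
        total = up + down                      # > 0 because fbar_k > 0
        t += rng.exponential(1.0 / total)      # waiting time until the next jump
        if t > T:
            return theta_max
        theta += 1 if rng.random() * total < up else -1
        theta_max = max(theta_max, theta)

# with l(k) = 20 and fbar_k = 0.9 the unbounded process frequently exceeds 20
print(simulate_one_population(20, 0.9, 1.0, 50.0, bounded=True))
print(simulate_one_population(20, 0.9, 1.0, 50.0, bounded=False))
```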
A first master equation of this form was considered in [22]; in the present notation, it takes the form
Versions of such a master equation, e.g., for one population only or for coupled inhibitory and excitatory populations, were considered in [3, 22], and a van Kampen system size expansion was carried out. Here, the bound of the state space provides a natural parameter for the rescaling, and thus a small parameter for the expansion. The setup of this problem closely resembles the structure of excitable membranes, for which limits have been obtained with the present technique by one of the present authors and coworkers in [27]. Therefore, we conjecture that, with minor adaptations, our limit theorems also apply to this setting with essentially the same conditions and results as in Sect. 2. However, the macroscopic limit obtained will not conform to the Wilson–Cowan equation but will be given by
Next, we return to the master equation (1.6) as discussed in Sect. 1.2, and to the comment regarding bounded state spaces made in the footnote on page 7. In our primary reference for this model [5], a bounded state space version of the master equation was actually considered, where the activation rate for the event ${\theta}^{k}\to {\theta}^{k}+1$ is
replacing $l(k){\overline{f}}_{k}(\theta ,t)$ in (1.6). The van Kampen system size expansion was then applied to this bounded state space master equation, tacitly neglecting possible difficulties that might arise from the discontinuity of (3.6) considered as a function on ${\mathbb{R}}^{P}$. For mathematically precise convergence results, however, the bounded state space originally suggested in [5] is problematic: the discontinuous activation rate (3.6) causes the machinery developed in [27], which depends on Lipschitz-type estimates, to break down. Nevertheless, we strongly expect that in this case, too, the law of large numbers holds with the deterministic limit given by the Wilson–Cowan equation (1.3), and that the Langevin approximations agree with the equations discussed in Sect. 2.3. We have not yet been able to prove such a theorem. We further conjecture that the results in this article can be used to prove convergence for the bounded state space model by a domination argument. Heuristically, a bounded process should be dominated by a process that has the same dynamics inside the state space of the bounded process but can leave that bounded domain. Hence, as the limit of the potentially larger process lies within the domain where the two processes agree, the dominated process should also converge to the same limit. Mathematically, this line of argument relies on nontrivial estimates between occupation measures of high-dimensional Markov processes. This is work in progress.
3.3 Activity Based Neural Field Model
Finally, we return to a distinction in neural field theory mentioned at the beginning. In contrast to rate-based neural field models of the Wilson–Cowan type (1.1), there exists a second essential class of neural field models, so-called activity-based models, the prototype of which is the Amari equation
We conjecture that a phenomenological microscopic model can also be constructed for this type of equation by a suitable adaptation of the activation rates, and that limit theorems analogous to the results in Sect. 2.1 hold. A Langevin equation for this model could then also be obtained and used for further analysis.
4 Proofs of the Main Results
In this section, we present the proofs of the limit theorems. For the convenience of the reader, we first state the Poincaré inequality, as it is an important tool in the subsequent proofs. Let $D\subset {\mathbb{R}}^{d}$ be a convex domain; then for any function $\varphi \in {H}^{1}(D)$ it holds that
where ${\overline{\varphi}}_{D}$ is the mean value of the function ϕ on the domain D, i.e.,
Moreover, the constant in the right-hand side of (4.1) is optimal and depends only on the diameter of the domain D, cf. [1, 23]. Whenever the spatial domain is omitted in the notation for norms or inner products in ${L}^{2}(D)$ or in Sobolev spaces ${H}^{\alpha}(D)$, they are to be understood as taken over the whole domain D. If a norm is taken only over a subset ${D}_{k,n}$, this is always indicated explicitly.
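The display (4.1) itself is not reproduced here. For convex domains, the classical optimal form (Payne–Weinberger) is ${\parallel \varphi -{\overline{\varphi}}_{D}\parallel}_{{L}^{2}(D)}\le \frac{\mathrm{diam}(D)}{\pi}{\parallel \nabla \varphi \parallel}_{{L}^{2}(D)}$. Assuming (4.1) takes this form, the sketch below checks it numerically on $D=(0,1)$, where $\varphi (x)=\cos (\pi x)$ attains the constant $1/\pi$; the other test functions are arbitrary choices.

```python
import numpy as np

N = 200_000
x = (np.arange(N) + 0.5) / N          # midpoint grid on D = (0, 1), diam(D) = 1

def l2_norm(g):
    """L^2(0, 1) norm approximated by the midpoint rule."""
    return np.sqrt(np.mean(g ** 2))

# test functions paired with their exact derivatives
examples = {
    "cos(pi x)": (np.cos(np.pi * x), -np.pi * np.sin(np.pi * x)),   # extremal case
    "x^2": (x ** 2, 2 * x),
    "exp(x)": (np.exp(x), np.exp(x)),
}

ratios = {}
for name, (phi, dphi) in examples.items():
    # ||phi - mean(phi)|| / ||phi'||, to be compared with diam(D) / pi
    ratios[name] = l2_norm(phi - phi.mean()) / l2_norm(dphi)
    print(f"{name:10s} {ratios[name]:.6f}   bound {1 / np.pi:.6f}")
```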
For the benefit of the reader, we next repeat the limiting equation
We denote by F the Nemytzkii operator on ${L}^{2}(D)$ defined by
and for all $\theta \in {\mathbb{N}}_{0}^{P}$ we define a discrete version of the Nemytzkii operator via
Note that, for $\varphi \in {L}^{2}(D)$, the expression $-{\tau}^{-1}{(\varphi ,{\nu}^{n}(\theta ))}_{{L}^{2}}+{\tau}^{-1}{(\varphi ,{\overline{F}}^{n}({\nu}^{n}(\theta ),t))}_{{L}^{2}}$ corresponds to the generator of ${({\Theta}_{t}^{n},t)}_{t\ge 0}$ applied to the function $(\theta ,t)\mapsto {(\varphi ,{\nu}^{n}(\theta ))}_{{L}^{2}}$.
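This identity can be checked by applying the jump generator directly. The sketch assumes that ${\nu}^{n}(\theta )={\sum}_{k}{\theta}^{k}/l(k,n){\mathbb{I}}_{{D}_{k,n}}$ as in (2.7), that ${\overline{F}}^{n}={\sum}_{k}{\overline{f}}_{k,n}{\mathbb{I}}_{{D}_{k,n}}$ as in (4.11), and that the jump rates are $\tau^{-1}l(k,n){\overline{f}}_{k,n}$ upwards and $\tau^{-1}\theta^{k}$ downwards (the drift in (4.17)). The numbers $c_k$ stand in for $(\varphi ,{\mathbb{I}}_{{D}_{k,n}})_{L^2}$; all names and values are illustrative.

```python
import numpy as np

def generator_of_linear_functional(theta, l, fbar, c, tau):
    """Apply the jump generator to G(theta) = sum_k c_k theta^k / l(k).

    c_k plays the role of (phi, 1_{D_k})_{L^2}; fbar holds fbar_k(theta, t).
    """
    G = lambda th: np.sum(c * th / l)
    result = 0.0
    for k in range(len(theta)):
        e_k = np.zeros_like(theta)
        e_k[k] = 1
        result += l[k] * fbar[k] / tau * (G(theta + e_k) - G(theta))   # activation
        result += theta[k] / tau * (G(theta - e_k) - G(theta))          # deactivation
    return result

def drift_formula(theta, l, fbar, c, tau):
    """-tau^{-1} (phi, nu^n(theta)) + tau^{-1} (phi, Fbar^n) in the same notation."""
    return -np.sum(c * theta / l) / tau + np.sum(c * fbar) / tau

rng = np.random.default_rng(3)
theta = np.array([3, 0, 7, 12])
l = np.array([10, 20, 15, 30])
fbar, c = rng.uniform(size=4), rng.normal(size=4)
print(generator_of_linear_functional(theta, l, fbar, c, 0.7),
      drift_formula(theta, l, fbar, c, 0.7))
```

Each activation changes $G$ by $c_k/l(k)$, which cancels the factor $l(k)$ in the rate; this is why the limit object involves ${\overline{f}}_{k,n}$ without any system-size factor.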
Finally, another useful property is that the means of the process’ components are bounded. For each k, n it holds that
see also (B.1). Therefore, it holds that $\mathbb{E}{\Theta}_{t}^{k,n}\le {m}_{t}^{k,n}$, where ${m}_{t}^{k,n}$ solves the deterministic initial value problem
i.e.,
Here, we also used the assumption ${\mathbb{E}}^{n}{\Theta}_{0}^{k,n}\le l(k,n)$ on the initial condition.
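The displays defining ${m}_{t}^{k,n}$ are not reproduced here. Assuming the bounding problem is $\tau \dot{m}=-m+l(k,n){\parallel f\parallel}_{0}$ with $m(0)=l(k,n)$, which follows from the drift in (4.17) with ${\overline{f}}_{k,n}\le {\parallel f\parallel}_{0}$ and the assumption on the initial condition, the solution is explicit and stays below $l(k,n)(1+{\parallel f\parallel}_{0})$, the form of the bound used in part (a) below. A short numerical check:

```python
import numpy as np

def m_closed_form(t, l, f_sup, tau):
    """Solution of tau m' = -m + l ||f||_0 with m(0) = l (assumed bounding ODE)."""
    return l * np.exp(-t / tau) + l * f_sup * (1.0 - np.exp(-t / tau))

def m_euler(T, l, f_sup, tau, dt=1e-4):
    """Explicit Euler approximation of the same ODE, as an independent check."""
    m = float(l)
    for _ in range(int(round(T / dt))):
        m += dt / tau * (-m + l * f_sup)
    return m

l, tau, T = 100, 0.5, 3.0
results = []
for f_sup in (0.4, 2.5):                       # ||f||_0 below and above 1
    t = np.linspace(0.0, T, 301)
    m = m_closed_form(t, l, f_sup, tau)        # moves monotonically from l towards l ||f||_0
    results.append((f_sup, m.max(), abs(m_euler(T, l, f_sup, tau) - m[-1])))
    print(f_sup, m.max(), l * (1 + f_sup))
```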
4.1 Proof of Theorem 2.1 (Law of Large Numbers)
In order to prove the law of large numbers, Theorem 2.1, we apply the law of large numbers for Hilbert space valued PDMPs, see Theorem 4.1 in [27], to the sequence of homogeneous PDMPs ${({Y}_{t}^{n})}_{t\ge 0}={({\Theta}_{t}^{n},t)}_{t\ge 0}$. For the application of this theorem, recall that the first, piecewise constant, vector-valued component of this process counts the number of active neurons in each subpopulation, and the second, deterministic component is time. The process ${({Y}_{t}^{n})}_{t\ge 0}$ is the usual space–time process, i.e., the homogeneous Markov process obtained from the inhomogeneous process ${({\Theta}_{t}^{n})}_{t\ge 0}$ by a state-space extension. The continuous component satisfies the simple ODE $\dot{t}=1$, $t(0)=0$, and thus the full process is a PDMP. In the terminology of [27], the sequence of coordinate functions mapping the different state spaces of the PDMPs ${({Y}_{t}^{n})}_{t\ge 0}$ into a common Hilbert space is given by the maps ${\nu}^{n}$ (2.7), with the common Hilbert space ${L}^{2}(D)$. Thus, in order to infer the convergence in probability (2.10) from Theorem 4.1 in [27], it is sufficient to verify the following conditions:
(LLN1) For fixed $T>0$, it holds that
(LLN2) The Nemytzkii operator F satisfies a Lipschitz condition in ${L}^{2}(D)$ uniformly with respect to t, $t\ge 0$, i.e., there exists a constant ${L}_{0}>0$ such that
(LLN3) For fixed $T>0$, it holds that
Note that the final condition of Theorem 4.1 in [27], i.e., the convergence of the initial conditions, is satisfied by assumption. For a discussion of these conditions, we refer to [27]; we proceed to verify them for the present model in the subsequent parts (a) to (c).

(a)
In order to prove condition (4.7), we write the integral with respect to the discrete probability measure ${\mu}^{n}$ as a sum. This yields
$$\begin{array}{rl}{\mathbb{E}}^{n}\lambda ({Y}_{t}^{n}){\int}_{{\mathbb{N}}^{P}}{\parallel {\nu}^{n}(\xi )-{\nu}^{n}({\Theta}_{t}^{n})\parallel}_{{L}^{2}}^{2}\,{\mu}^{n}({Y}_{t}^{n},\mathrm{d}\xi ) & =\frac{1}{\tau}\sum _{k=1}^{P}{\mathbb{E}}^{n}\frac{1}{l{(k,n)}^{2}}\left({\Theta}_{t}^{k,n}+l(k,n){\overline{f}}_{k,n}({Y}_{t}^{n})\right)|{D}_{k,n}|\\ & \le \frac{1}{\tau}\frac{1+2{\parallel f\parallel}_{0}}{{\ell}_{-}(n)}|D|,\end{array}$$(4.10)
where we have used the upper bound (4.6) on the expectation ${\mathbb{E}}^{n}{\Theta}_{t}^{k,n}$ and the assumption on the initial conditions. Next, integrating over $[0,T]$ and employing the assumption ${\lim}_{n\to \infty}{\ell}_{-}(n)=\infty$ in (2.9) establishes condition (4.7).
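The identity behind the first line of (4.10) is that a single jump ${\theta}^{k}\to {\theta}^{k}\pm 1$ changes ${\nu}^{n}$ by $\pm 1/l(k,n)$ on ${D}_{k,n}$ only, so that ${\parallel {\nu}^{n}(\xi )-{\nu}^{n}(\theta )\parallel}_{{L}^{2}}^{2}=|{D}_{k,n}|/l{(k,n)}^{2}$. Assuming ${\nu}^{n}(\theta )={\sum}_{k}{\theta}^{k}/l(k,n){\mathbb{I}}_{{D}_{k,n}}$ as in (2.7), the sketch checks this on a hypothetical four-cell partition of $[0,1]$ and evaluates the intensity-weighted sum for one state with ${\theta}^{k}\le l(k)(1+{\parallel f\parallel}_{0})$ against the final bound.

```python
import numpy as np

# hypothetical partition of D = [0, 1] into P = 4 cells with population sizes l(k, n)
edges = np.array([0.0, 0.2, 0.5, 0.6, 1.0])
l = np.array([50, 80, 20, 100])
M = 100_000
x = (np.arange(M) + 0.5) / M
cell = np.searchsorted(edges, x, side="right") - 1     # index k with x in D_k

def nu(theta):
    """Piecewise constant nu^n(theta) = sum_k theta^k / l(k) 1_{D_k} on the grid."""
    return (theta / l)[cell]

def l2_sq(g):
    """||g||^2_{L^2(0, 1)} by the midpoint rule."""
    return np.mean(g ** 2)

theta = np.array([30, 100, 5, 60])
jumps, predicted = [], []
for k in range(4):
    e_k = np.eye(4, dtype=int)[k]
    jumps.append(l2_sq(nu(theta + e_k) - nu(theta)))
    predicted.append((edges[k + 1] - edges[k]) / l[k] ** 2)   # |D_k| / l(k)^2

# intensity-weighted sum as in (4.10), with ||f||_0 = 1 and hypothetical fbar values
tau, f_sup = 1.0, 1.0
fbar = np.array([0.3, 0.9, 0.5, 0.1])
lhs = np.sum((theta + l * fbar) * np.diff(edges) / l ** 2) / tau
bound = (1 + 2 * f_sup) / l.min() / tau * 1.0              # |D| = 1
print(lhs, bound)
```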

(b)
The Lipschitz condition (4.8) of the Nemytzkii operators is a straightforward consequence of the Lipschitz continuity (1.4) of the gain function f as
$$\begin{array}{rl}{\parallel F({g}_{1},t)-F({g}_{2},t)\parallel}_{{L}^{2}}^{2} & ={\int}_{D}{\left|f\left({\int}_{D}w(x,y){g}_{1}(y)\,\mathrm{d}y+I(x,t)\right)-f\left({\int}_{D}w(x,y){g}_{2}(y)\,\mathrm{d}y+I(x,t)\right)\right|}^{2}\mathrm{d}x\\ & \le {L}^{2}{\int}_{D}{\left|{\int}_{D}w(x,y)\left({g}_{1}(y)-{g}_{2}(y)\right)\mathrm{d}y\right|}^{2}\mathrm{d}x\\ & \le {L}^{2}{\int}_{D}{\parallel w(x,\cdot )\parallel}_{{L}^{2}}^{2}{\parallel {g}_{1}-{g}_{2}\parallel}_{{L}^{2}}^{2}\,\mathrm{d}x\\ & ={L}^{2}{\parallel w\parallel}_{{L}^{2}\times {L}^{2}}^{2}{\parallel {g}_{1}-{g}_{2}\parallel}_{{L}^{2}}^{2}.\end{array}$$
Therefore, (4.8) holds with Lipschitz constant ${L}_{0}:=L{\parallel w\parallel}_{{L}^{2}\times {L}^{2}}$.
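The same chain of inequalities holds verbatim for Riemann-sum discretisations, because the discrete Cauchy–Schwarz inequality is exact. The sketch below checks ${\parallel F({g}_{1})-F({g}_{2})\parallel}_{{L}^{2}}\le L{\parallel w\parallel}_{{L}^{2}\times {L}^{2}}{\parallel {g}_{1}-{g}_{2}\parallel}_{{L}^{2}}$ for random ${g}_{1},{g}_{2}$; the kernel, the input at a fixed time, and the gain $\tanh$ (with $L=1$) are hypothetical choices.

```python
import numpy as np

M = 400
h = 1.0 / M
x = (np.arange(M) + 0.5) * h
w = np.exp(-np.abs(x[:, None] - x[None, :]))      # hypothetical kernel w(x, y)
I = np.sin(2 * np.pi * x)                          # hypothetical input I(x, t) at a fixed t
f, L = np.tanh, 1.0                                # tanh is Lipschitz with constant 1

def F(g):
    """Discretised Nemytzkii operator F(g)(x) = f(int w(x, y) g(y) dy + I(x))."""
    return f(w @ g * h + I)

def l2(g):
    return np.sqrt(np.sum(g ** 2) * h)

w_norm = np.sqrt(np.sum(w ** 2) * h * h)          # ||w||_{L^2 x L^2}
rng = np.random.default_rng(0)
worst = 0.0
for _ in range(200):
    g1, g2 = rng.normal(size=(2, M))
    worst = max(worst, l2(F(g1) - F(g2)) / l2(g1 - g2))
print(worst, L * w_norm)                          # observed ratio vs. L_0 = L ||w||
```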

(c)
Finally, we prove the convergence of the generators (4.9). To this end, we use the characterisation of the norm in ${L}^{2}(D)$ by ${\parallel \eta \parallel}_{{L}^{2}}={\sup}_{{\parallel \varphi \parallel}_{{L}^{2}}=1}{(\varphi ,\eta )}_{{L}^{2}}$ for all $\eta \in {L}^{2}(D)$, and thus first consider the scalar product of elements $\varphi \in {L}^{2}(D)$ with ${\parallel \varphi \parallel}_{{L}^{2}}=1$ with the difference inside the norm in (4.9). On the one hand, using definition (4.5), we obtain
$${(\varphi ,{\overline{F}}^{n}({\nu}_{t}^{n},t))}_{{L}^{2}}={(\varphi ,\sum _{k=1}^{P}{\overline{f}}_{k,n}\left({Y}_{t}^{n}\right){\mathbb{I}}_{{D}_{k,n}})}_{{L}^{2}}.$$(4.11)
On the other hand, applying the Nemytzkii operator F defined in (4.4) to ${\nu}^{n}(t)$ and taking the inner product of the result with ϕ, we obtain
Subtracting (4.12) from (4.11), we obtain the integrated difference
We proceed to estimate the norm of the term in the right-hand side. Using the Lipschitz condition (1.4) on f, the triangle inequality, and finally the Cauchy–Schwarz inequality on the resulting second term, we obtain the estimate
Here, the term marked $(\ast \ast )$ in the right-hand side is further estimated using the Cauchy–Schwarz inequality and the Poincaré inequality (4.1), which yields
We now consider the term marked $(\ast )$. Inserting the definition of ${\overline{W}}_{kj}^{n}$ given in (2.4), reordering the summations, and changing the order of integration yields
We next apply the Cauchy–Schwarz inequality to the integral inside the square brackets in the last term. Thus, we obtain the estimate
Now the Poincaré inequality (4.1) is applied to the innermost integral inside the square brackets, which yields
Finally, using once more the Cauchy–Schwarz inequality on the innermost summation we obtain
Now, a combination of the estimates (4.13) and (4.14) on the terms $(\ast )$ and $(\ast \ast )$ yields
Here, the right-hand side is independent of ϕ; hence, taking the supremum over all ϕ with ${\parallel \varphi \parallel}_{{L}^{2}}=1$ yields
Finally, integrating over $(0,T)$ and taking the expectation on both sides results in
Here, we have used (4.6) and a combination of the Cauchy–Schwarz and Poincaré inequalities (4.1) in order to estimate
The upper bound in (4.15) is of order $\mathcal{O}({\delta}_{+}(n))$ and therefore converges to zero for $n\to \infty$ due to assumption (2.9). Hence, condition (4.9) is satisfied, which completes the proof of the convergence in probability (2.10).
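The mechanism behind the $\mathcal{O}({\delta}_{+}(n))$ rate can be seen numerically. Assuming that ${\sum}_{j}{\overline{W}}_{kj}^{n}{\nu}_{j}+{\overline{I}}_{k,n}$ equals the average over ${D}_{k,n}$ of ${\int}_{D}w(x,y){\nu}^{n}(y)\,\mathrm{d}y+I(x)$ (which is what (2.4) and (2.6) amount to if the weights and inputs are cell averages), the generator gap is $f$ of a cell average minus $f$ evaluated pointwise, and the Lipschitz property of $f$ together with the Poincaré inequality on each cell gives the rate. The sketch measures this gap in ${L}^{2}(0,1)$ for halving cell widths; kernel, input, gain and test activity are hypothetical.

```python
import numpy as np

M = 960                                          # fine grid on D = [0, 1], divisible by all P used
h = 1.0 / M
x = (np.arange(M) + 0.5) * h
w = np.exp(-np.abs(x[:, None] - x[None, :]))     # hypothetical Lipschitz kernel
I = 0.5 * np.cos(2 * np.pi * x)                  # hypothetical input
f = np.tanh                                      # hypothetical gain, L = 1
nu_smooth = 0.5 + 0.3 * np.sin(np.pi * x)        # hypothetical activity profile

def cell_average(g, P):
    """Replace g on each of the P equal cells by its mean over that cell."""
    m = M // P
    return np.repeat(g.reshape(P, m).mean(axis=1), m)

def generator_gap(P):
    """|| Fbar^n(nu^n) - F(nu^n) ||_{L^2} for the piecewise constant nu^n on P cells."""
    nu_n = cell_average(nu_smooth, P)
    u = w @ nu_n * h + I              # argument of F(nu^n) at every x
    u_bar = cell_average(u, P)        # = sum_j bar W_kj nu_j + bar I_k on D_k (assumed)
    return np.sqrt(np.sum((f(u_bar) - f(u)) ** 2) * h)

errs = [generator_gap(P) for P in (4, 8, 16, 32)]
print(errs)       # roughly halves each time delta_+ = 1/P is halved
```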
It is now easy to extend this result to convergence in the $r$th mean. First of all, the convergence in probability (2.10) implies, for all $r\ge 1$, the convergence in probability of the random variables ${\sup}_{t\in [0,T]}{\parallel {\nu}_{t}^{n}-\nu (t)\parallel}_{{L}^{2}}^{r}$ to zero. As convergence in the mean of real-valued random variables is equivalent to convergence in probability together with uniform integrability, it remains to prove the latter for the families ${\sup}_{t\in [0,T]}{\parallel {\nu}_{t}^{n}-\nu (t)\parallel}_{{L}^{2}}^{r}$, $n\in \mathbb{N}$.
We first consider the case $r=1$ and establish a uniform bound on the second moments ${\mathbb{E}}^{n}{\sup}_{t\in [0,T]}{\parallel {\nu}_{t}^{n}-\nu (t)\parallel}_{{L}^{2}}^{2}$. Then the de la Vallée–Poussin theorem, cf. App., Proposition 2.2 in [15], implies that the random variables ${\sup}_{t\in [0,T]}{\parallel {\nu}_{t}^{n}-\nu (t)\parallel}_{{L}^{2}}$, $n\in \mathbb{N}$, are uniformly integrable.
Without loss of generality, we can assume that there exist^{e} Poisson processes ${({N}_{t}^{k,n})}_{t\ge 0}$ with rates ${\Lambda}_{k,n}=l(k,n)(1+{\parallel f\parallel}_{0})/\tau $, which dominate ${({\Theta}_{t}^{k,n}-{\Theta}_{0}^{k,n})}_{t\ge 0}$ pathwise. Then we obtain almost surely
Here, the right-hand side is independent of $t\le T$, and thus we obtain
where we have used that ${N}_{T}^{k,n}$ is Poisson distributed with parameter $T{\Lambda}_{k,n}$, and thus ${\mathbb{E}}^{n}{({N}_{T}^{k,n})}^{2}=T{\Lambda}_{k,n}+{T}^{2}{\Lambda}_{k,n}^{2}$. Here, ${C}_{T}$ is some finite constant which depends on T and the overall parameters of the model, i.e., τ, f, D, but is independent of k and n. Using this upper bound, the triangle inequality yields the estimate
Therefore, using the assumption ${sup}_{n\in \mathbb{N}}{\mathbb{E}}^{n}{\parallel {\nu}_{0}^{n}\parallel}_{{L}^{2}}^{2}<\mathrm{\infty}$ it holds that
The general case $r>1$ works analogously. Note that the $r$th moment of a Poisson distribution is a polynomial of degree $r$ in its parameter; hence, for parameters bounded away from zero, it is bounded by a constant multiple of the $r$th power of the parameter. Thus, just as in the case $r=1$, the term
can be bounded from above by some constant ${C}_{T}$ independent of k and n. This completes the proof of Theorem 2.1.
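The Poisson moment facts used above can be checked directly. For $N\sim \mathrm{Poisson}(\lambda )$, ${\mathbb{E}}{N}^{r}={\sum}_{j=0}^{r}S(r,j){\lambda}^{j}$ (the Touchard polynomial, with $S(r,j)$ the Stirling numbers of the second kind); for $r=2$ this is $\lambda +{\lambda}^{2}$, as used for ${N}_{T}^{k,n}$, and for $\lambda \ge 1$ it yields ${\mathbb{E}}{N}^{r}\le {B}_{r}{\lambda}^{r}$ with the Bell number ${B}_{r}={\sum}_{j}S(r,j)$. The sketch compares the polynomial with a direct summation of the probability mass function.

```python
import math

def poisson_moment(r, lam):
    """E[N^r] for N ~ Poisson(lam), by summing the pmf far into the tail."""
    kmax = int(lam + 20 * math.sqrt(lam) + 50)
    p, total = math.exp(-lam), 0.0
    for k in range(kmax + 1):
        total += k ** r * p
        p *= lam / (k + 1)          # P(N = k + 1) = P(N = k) * lam / (k + 1)
    return total

def stirling2(r, j):
    """Stirling numbers of the second kind via S(r, j) = j S(r-1, j) + S(r-1, j-1)."""
    if r == j:
        return 1
    if r == 0 or j == 0:
        return 0
    return j * stirling2(r - 1, j) + stirling2(r - 1, j - 1)

def touchard(r, lam):
    """E[N^r] = sum_j S(r, j) lam^j, a polynomial of degree r in lam."""
    return sum(stirling2(r, j) * lam ** j for j in range(r + 1))

def bell(r):
    return sum(stirling2(r, j) for j in range(r + 1))

print(touchard(2, 7.5), 7.5 + 7.5 ** 2)        # second moment: lam + lam^2
for lam in (1.0, 7.5, 40.0):
    print(lam, [round(touchard(r, lam) / lam ** r, 3) for r in (1, 2, 3, 4)])
```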
4.2 Proof of Corollary 2.1 (Corollary to the Law of Large Numbers)
For $\alpha =0$, the statement of the corollary coincides with that of Theorem 2.1; hence, we consider $\alpha >0$. As in the proof of Theorem 2.1, we apply Theorem 4.1 in [27] to the PDMPs ${({Y}_{t}^{n})}_{t\ge 0}$, this time, however, for the functions ${\nu}^{n}$ understood as taking values in the Hilbert space ${H}^{-\alpha}(D)$ instead of ${L}^{2}(D)$. Thus, we have to verify conditions (LLN1)–(LLN3) again, with the norm in ${L}^{2}(D)$ replaced throughout by the norm in ${H}^{-\alpha}(D)$. The essential step is to sharpen the estimates in part (a) of the proof of Theorem 2.1 using optimal Sobolev embedding theorems, such that the conditions (2.13) imply (LLN1). This is presented in part (a) of the proof below. The Lipschitz condition (LLN2) of the Nemytzkii operator F in the spaces ${H}^{-\alpha}(D)$ is established in part (b). Finally, as the condition ${\delta}_{+}(n)\to 0$ remains as in Theorem 2.1, the condition (LLN3) follows immediately from the proof of Theorem 2.1 due to the continuous embedding of ${L}^{2}(D)$ into ${H}^{-\alpha}(D)$.

(a)
In the case $\alpha =0$, i.e., ${H}^{\alpha}(D)={L}^{2}(D)$, we used in (4.10) that ${\parallel {\mathbb{I}}_{{D}_{k,n}}\parallel}_{{L}^{2}}^{2}=|{D}_{k,n}|$. For general $\alpha >0$, we use the representation
$${\parallel {\mathbb{I}}_{{D}_{k,n}}\parallel}_{{H}^{-\alpha}}=\underset{{\parallel \varphi \parallel}_{{H}^{\alpha}}=1}{\sup}\left|{(\varphi ,{\mathbb{I}}_{{D}_{k,n}})}_{{L}^{2}}\right|.$$
In order to estimate the terms inside the supremum in the right-hand side, we use Hölder’s inequality and the Sobolev embedding theorems, i.e., ${H}^{\alpha}(D)\hookrightarrow {L}^{\infty}(D)$ for $\alpha >d/2$ and ${H}^{\alpha}(D)\hookrightarrow {L}^{r}(D)$ with $r=d/(d/2-\alpha )$ for $0<\alpha <d/2$, see Theorem 7.34 and Corollary 7.17 in [2]. Thus, we obtain
where the constants K are the constants arising from the continuous embeddings of the Sobolev spaces into the Lebesgue spaces. Evaluating the norms in the right-hand side and estimating further by the maximal Lebesgue measure of the elements of the partition yields
Note that the upper bounds are consistent with the condition in Theorem 2.1 for $\alpha =0$. Finally, as ${H}^{d/2}(D)\hookrightarrow {H}^{d/2-\epsilon}(D)$ for all sufficiently small $\epsilon >0$, the result for $\alpha =d/2$ follows from the result above as
where C is the constant resulting from the continuous embedding of ${H}^{d/2}(D)$ into ${H}^{d/2-\epsilon}(D)$. Thus, we obtain for all $\epsilon >0$ the estimate
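The exponent bookkeeping in part (a) can be reproduced with exact arithmetic. For $0\le \alpha <d/2$, Hölder's inequality with the conjugate exponent $r'$ of $r=d/(d/2-\alpha )$ gives $|(\varphi ,{\mathbb{I}}_{{D}_{k,n}})|\le K{\parallel \varphi \parallel}_{{H}^{\alpha}}|{D}_{k,n}{|}^{1/r'}$, so the square of the supremum in the representation above is at most ${K}^{2}|{D}_{k,n}{|}^{1+2\alpha /d}$; for $\alpha >d/2$ the ${L}^{\infty}$ embedding gives ${K}^{2}|{D}_{k,n}{|}^{2}$. The helper `indicator_exponent` (a hypothetical name) returns this exponent, omits the constants K, and leaves the borderline case $\alpha =d/2$ to the $\epsilon$-argument in the text.

```python
from fractions import Fraction

def indicator_exponent(alpha, d):
    """Exponent e with sup_{||phi||_{H^alpha} = 1} (phi, 1_{D_k})^2 <= K^2 |D_k|^e.

    0 <= alpha < d/2: H^alpha embeds into L^r with r = d / (d/2 - alpha), and
    Hoelder with the conjugate exponent r' gives ||1_{D_k}||_{L^{r'}} = |D_k|^{1/r'}.
    alpha > d/2: H^alpha embeds into L^infinity, so the L^1 norm |D_k| is used.
    """
    alpha, d = Fraction(alpha), Fraction(d)
    if alpha > d / 2:
        return Fraction(2)
    if alpha == d / 2:
        raise ValueError("alpha = d/2: use the embedding into H^(d/2 - eps) instead")
    r = d / (d / 2 - alpha)          # Sobolev exponent
    r_conj = r / (r - 1)             # Hoelder conjugate r'
    return 2 / r_conj                # equals 1 + 2 alpha / d

for alpha, d in [(0, 2), (Fraction(1, 2), 2), (Fraction(1, 4), 1), (2, 2)]:
    print(alpha, d, indicator_exponent(alpha, d))
```

For $\alpha =0$ the exponent is 1, recovering ${\parallel {\mathbb{I}}_{{D}_{k,n}}\parallel}_{{L}^{2}}^{2}=|{D}_{k,n}|$ from (4.10); larger $\alpha$ gives a higher power of the cell measure and hence weaker requirements on the population sizes.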

(b)
Next, we have to establish that the Nemytzkii operator F on ${L}^{2}(D)$ is also Lipschitz continuous with respect to the norms ${\parallel \cdot \parallel}_{{H}^{-\alpha}}$, $\alpha \ge 0$, i.e., for all $\alpha \ge 0$ there exists a constant ${L}_{\alpha}$ such that
$${\parallel F({g}_{1},t)-F({g}_{2},t)\parallel}_{{H}^{-\alpha}}\le {L}_{\alpha}{\parallel {g}_{1}-{g}_{2}\parallel}_{{H}^{-\alpha}}\quad \forall t\ge 0,\ {g}_{1},{g}_{2}\in {L}^{2}(D).$$(4.16)
Since f is Lipschitz continuous, and hence absolutely continuous, we obtain
where
Applying Hölder’s inequality and the essential boundedness of the derivative ${f}^{\prime}$, we obtain the estimate
Next, as by assumption $w(x,\cdot )\in {H}^{\alpha}(D)$, we obtain
Overall, this yields the estimate
Hence, taking the supremum on both sides of this inequality over all ${\parallel \varphi \parallel}_{{H}^{\alpha}}=1$, we obtain the Lipschitz condition (4.16) with ${L}_{\alpha}:=L{K}_{\alpha}{\parallel w\parallel}_{{L}^{q}\times {H}^{\alpha}}$, where ${K}_{\alpha}$ is the constant resulting from the continuous embedding of ${H}^{\alpha}(D)$ into ${L}^{p}(D)$ and the Lipschitz constant L of f satisfies $L\ge {\parallel {f}^{\prime}\parallel}_{{L}^{\mathrm{\infty}}}$.
4.3 Proof of Theorem 2.2 (Infinite Time Convergence)

(a)
We first present an alternative representation for the jump processes ${({\Theta}_{t}^{n})}_{t\ge 0}$ and the solution ν of the Wilson–Cowan equation (1.3). Using the generator of the PDMP ${({\Theta}_{t}^{n},t)}_{t\ge 0}$, we obtain that the components ${\Theta}^{k,n}$ satisfy
$$\begin{array}{rcl}{\Theta}_{t}^{k,n}& =& {\Theta}_{0}^{k,n}+{\int}_{0}^{t}{\lambda}^{n}({\Theta}_{s}^{n},s){\int}_{{\mathbb{N}}^{P}}({\xi}^{k}-{\Theta}_{s}^{k,n})\,{\mu}^{n}({\Theta}_{s}^{n},s;\mathrm{d}\xi )\,\mathrm{d}s+{M}_{t}^{k,n}\\ & =& {\Theta}_{0}^{k,n}+{\int}_{0}^{t}\left(-\frac{1}{\tau}{\Theta}_{s}^{k,n}+\frac{1}{\tau}l(k,n){\overline{f}}_{k,n}({\Theta}_{s}^{n},s)\right)\mathrm{d}s+{M}_{t}^{k,n},\end{array}$$(4.17)
where ${({M}_{t}^{k,n})}_{t\ge 0}$ is a square-integrable càdlàg martingale given by
As the jump process is regular, this martingale is almost surely of finite variation, and it could also be written as a stochastic integral with respect to the associated martingale measure of the PDMP [16]. Next, interpreting ${\Theta}^{k,n}$ as the solution of the stochastic evolution equation (4.17) driven by the martingale ${M}^{k,n}$, it follows from the variation-of-constants formula that it satisfies
This formula can also easily be verified path by path by inserting (4.19) into (4.17) and using integration by parts. Note that here the stochastic integral with respect to the martingale is just a Riemann–Stieltjes integral, as the martingale is of finite variation. For the sake of completeness, we briefly sketch the arguments. Inserting (4.19) into (4.17) yields
Considering the three terms marked $(\ast )$–$(\ast \ast \ast )$ separately, we show that this right-hand side equals (4.19). For the first term $(\ast )$, simply evaluating the integral yields
which gives the first term in the right-hand side of (4.19). Next, we simplify the term $(\ast \ast )$ by applying integration by parts to its first term, which yields
Subtracting the second term in $(\ast \ast )$ from this right-hand side, we thus obtain
This is just the second term in the right-hand side of (4.19). It remains to consider the term marked $(\ast \ast \ast )$. We have already stated that the stochastic integral with respect to the martingale (4.18) is defined path by path as a Riemann–Stieltjes integral, and thus satisfies
where ${\tau}_{j}^{n}$ denotes the $j$th jump time of the $n$th PDMP. Integrating the sum in this right-hand side over $(0,t)$ yields
Next, we apply integration by parts to the integral over $(0,t)$ of the second term above analogously to the application to term $(\ast \ast )$, and obtain
Hence, overall these considerations show that
and we obtain the third and final term in the right-hand side of (4.19). This completes the proof that (4.19) solves Eq. (4.17).
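The path-by-path verification can be replayed numerically. Assuming (4.19) has the standard variation-of-constants form ${\Theta}_{t}^{k,n}={e}^{-t/\tau}{\Theta}_{0}^{k,n}+{\tau}^{-1}{\int}_{0}^{t}{e}^{-(t-s)/\tau}l(k,n){\overline{f}}_{k,n}\,\mathrm{d}s+{\int}_{0}^{t}{e}^{-(t-s)/\tau}\,\mathrm{d}{M}_{s}^{k,n}$, where the last integral is the Riemann–Stieltjes integral (jumps minus compensator), the sketch simulates one population with a hypothetical state-dependent ${\overline{f}}$, evaluates each term exactly on the intervals where the path is constant, and checks that their sum reproduces ${\Theta}_{T}$.

```python
import math
import random

def simulate_path(theta0, l, tau, T, fbar, seed=0):
    """Gillespie path of one population; states[i] holds on [times[i], times[i+1])."""
    rng = random.Random(seed)
    t, theta = 0.0, theta0
    times, states = [0.0], [theta0]
    while True:
        up, down = l * fbar(theta) / tau, theta / tau
        total = up + down
        t += rng.expovariate(total)
        if t > T:
            return times, states
        theta += 1 if rng.random() * total < up else -1
        times.append(t)
        states.append(theta)

def exp_integral(a, b, t, tau):
    """int_a^b exp(-(t - s) / tau) ds."""
    return tau * (math.exp(-(t - b) / tau) - math.exp(-(t - a) / tau))

def variation_of_constants(times, states, T, l, tau, fbar):
    """Right-hand side of the assumed form of (4.19), evaluated at time T."""
    grid = times + [T]
    drift_term, mart_term = 0.0, 0.0
    for i, th in enumerate(states):
        E = exp_integral(grid[i], grid[i + 1], T, tau)
        drift_term += l * fbar(th) / tau * E
        # compensator part of dM: subtract the drift -th/tau + l fbar/tau on this interval
        mart_term -= (-th / tau + l * fbar(th) / tau) * E
    for i in range(1, len(times)):
        # jump part of dM: exp(-(T - tau_j)/tau) times the jump size at tau_j
        mart_term += math.exp(-(T - times[i]) / tau) * (states[i] - states[i - 1])
    return math.exp(-T / tau) * states[0] + drift_term + mart_term

fbar = lambda th: 1.0 / (1.0 + math.exp(-(th / 50 - 0.2)))   # hypothetical fbar(theta)
times, states = simulate_path(10, 50, 1.0, 3.0, fbar, seed=2)
print(states[-1], variation_of_constants(times, states, 3.0, 50, 1.0, fbar))
```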
Further, we obtain from the variation of constants formula for ${\Theta}_{t}^{k,n}$ also a representation for the stochastic mean activity ${\nu}^{n}$ by inserting (4.19) into its definition (2.7). This gives
Finally, in order to compare stochastic and deterministic solutions we use that the solution of the Wilson–Cowan equation can also be given via the variation of constants formula, i.e., it holds that for all $t\ge 0$
Thus, subtracting (4.22) from (4.21), and taking the expectation of the norm in ${H}^{\alpha}(D)$ yields the estimate