In this section, we present the precise formulations of the limit theorems. To this end, we first define a suitable sequence of microscopic models, which provides the connection between the defining objects of the Wilson–Cowan equation (1.3) and the microscopic models discussed in Sect. 1.2. Thus, ${({Y}_{t}^{n})}_{t\ge 0}={({\Theta}_{t}^{n},t)}_{t\ge 0}$, $n\in \mathbb{N}$, denotes a sequence of microscopic PDMP neural field models of the type defined in Sect. 1.3. Each process ${({Y}_{t}^{n})}_{t\ge 0}$ is defined on a filtered probability space $({\Omega}^{n},{\mathcal{F}}^{n},{({\mathcal{F}}_{t}^{n})}_{t\ge 0},{\mathbb{P}}^{n})$, which satisfies the usual conditions. Hence, the defining objects for the jump models now depend on an additional index *n*. That is, $P(n)$ denotes the number of neuron populations in the *n*th model, $l(k,n)$ is the number of neurons in the *k*th population of the *n*th model, and analogously we use the notations ${\overline{W}}_{kj}^{n}$, ${\overline{I}}_{k,n}$, and ${\overline{f}}_{k,n}$. However, we note from the beginning that the decay rate ${\tau}^{-1}$ is independent of *n*, where *τ* is the time constant in the Wilson–Cowan equation (1.3). In the following paragraphs, we discuss the connection of the defining components of this sequence of microscopic models to the components of the macroscopic limit.

*Connection to the Spatial Domain D* A key step in connecting the microscopic models to the solution of Eq. (1.3) is that we need to relate the individual neuron populations to the spatial domain *D* on which the solution of (1.3) lives. To this end, we assume that each population is located within a sub-domain of *D* and that the sub-domains of the individual populations are non-overlapping. Hence, for each $n\in \mathbb{N}$, we obtain a collection ${\mathcal{D}}_{n}$ of $P(n)$ non-overlapping subsets of *D* denoted by ${D}_{1,n},\dots ,{D}_{P(n),n}$. We assume that each sub-domain is measurable and convex. The convexity of the sub-domains is a technical condition that allows us to apply Poincaré’s inequality, cf. (4.1). We do not think that this condition is too restrictive, as most reasonable partition elements, e.g., cubes and triangles, are convex. Furthermore, for all reasonable domains *D*, e.g., all Jordan measurable domains, a sequence of convex partitions can be found such that, in addition, the conditions imposed in the limit theorems below are satisfied. One may think of obtaining the collection ${\mathcal{D}}_{n}$ by partitioning the domain into $P(n)$ convex sub-domains ${D}_{1,n},\dots ,{D}_{P(n),n}$ and confining each neuron population to one sub-domain. However, it is required neither that the union of the sets in ${\mathcal{D}}_{n}$ amounts to the full domain *D* nor that the partitions form a sequence of refinements. Necessary conditions on the limiting behaviour of the sub-domains are strongly connected to the convergence of the initial conditions of the models, which is a condition in the limit theorems; see below. For the sake of terminological simplicity, we refer to ${\mathcal{D}}_{n}$ simply as partitions.

We now define some notation for the parameters characterising the partitions ${\mathcal{D}}_{n}$: the minimum and maximum Lebesgue measure, i.e., length, area, or volume depending on the spatial dimension, are denoted by

${v}_{-}(n):=\underset{k=1,\dots ,P(n)}{min}|{D}_{k,n}|,\phantom{\rule{2em}{0ex}}{v}_{+}(n):=\underset{k=1,\dots ,P(n)}{max}|{D}_{k,n}|,$

(2.1)

and the maximum diameter of the partition is denoted by

${\delta}_{+}(n):=\underset{k=1,\dots ,P(n)}{max}diam({D}_{k,n}),$

(2.2)

where the diameter of a set ${D}_{k,n}$ is defined as $diam({D}_{k,n}):={sup}_{x,y\in {D}_{k,n}}|x-y|$. In the special case of domains obtained as unions of cubes with edge length ${n}^{-1}$, it obviously holds that ${v}_{\pm}(n)={n}^{-d}$ and ${\delta}_{+}(n)=\sqrt{d}{n}^{-1}$. It is a necessary condition in all the subsequent limit theorems that ${lim}_{n\to \mathrm{\infty}}{\delta}_{+}(n)=0$. This condition implies, on the one hand, that ${lim}_{n\to \mathrm{\infty}}{v}_{+}(n)=0$, as the Lebesgue measure of a set is bounded in terms of its diameter, and, on the other hand (at least in all but degenerate cases, due to the necessary convergence of initial conditions), that ${lim}_{n\to \mathrm{\infty}}P(n)=\mathrm{\infty}$. That is, in order to obtain a limit, the sequence of partitions usually consists of ever finer sets and the number of populations diverges. Finally, each domain ${D}_{k,n}$ of the partition ${\mathcal{D}}_{n}$ contains one neuron population ‘consisting’ of $l(k,n)\in \mathbb{N}$ neurons. Then we denote by ${\ell}_{\pm}(n)$ the minimum and maximum number of neurons in the populations of the *n*th model, i.e.,

${\ell}_{-}(n):=\underset{k=1,\dots ,P(n)}{min}l(k,n),\phantom{\rule{2em}{0ex}}{\ell}_{+}(n):=\underset{k=1,\dots ,P(n)}{max}l(k,n).$

(2.3)
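For the cube partitions mentioned above, these parameters can be checked numerically. The following Python sketch (our own illustration; the function name is hypothetical) evaluates (2.1) and (2.2) for the partition of $D=(0,1)^{d}$ into cubes of edge length ${n}^{-1}$ and confirms ${v}_{\pm}(n)={n}^{-d}$ and ${\delta}_{+}(n)=\sqrt{d}{n}^{-1}$.

```python
import itertools
import math

def cube_partition_parameters(n, d):
    """Partition of D = (0,1)^d into the n**d cubes of edge length 1/n;
    returns v_-(n), v_+(n) from (2.1) and delta_+(n) from (2.2)."""
    corners = list(itertools.product(range(n), repeat=d))  # one corner per cube D_{k,n}
    measures = [(1.0 / n) ** d for _ in corners]           # |D_{k,n}|
    diameters = [math.sqrt(d) / n for _ in corners]        # diam(D_{k,n})
    return min(measures), max(measures), max(diameters)

n, d = 10, 2
v_minus, v_plus, delta_plus = cube_partition_parameters(n, d)
assert abs(v_minus - n ** (-d)) < 1e-15          # v_-(n) = n^{-d}
assert abs(delta_plus - math.sqrt(d) / n) < 1e-15  # delta_+(n) = sqrt(d)/n
```

For a general convex partition, the minima and maxima in (2.1) and (2.2) would of course be taken over genuinely different cell measures and diameters.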

*Connection to the Weight Function w* We assume that there exists a function $w:D\times D\to \mathbb{R}$ such that the connection to the discrete weights is given by

${\overline{W}}_{kj}^{n}:=\frac{1}{|{D}_{k,n}|}{\int}_{{D}_{k,n}}({\int}_{{D}_{j,n}}w(x,y)\phantom{\rule{0.2em}{0ex}}\mathrm{d}y)\phantom{\rule{0.2em}{0ex}}\mathrm{d}x,$

(2.4)

where *w* is the same function as in the Wilson–Cowan equation (1.3). For the definition of the activation rate at time *t*, we thus obtain

${\overline{f}}_{k,n}({\theta}^{n},t):=f(\sum _{j=1}^{P(n)}{\overline{W}}_{kj}^{n}\frac{{\theta}^{j,n}}{l(j,n)}+{\overline{I}}_{k,n}(t)).$

(2.5)
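As an illustration of the averaging in (2.4) and the resulting rates (2.5), the following Python sketch approximates the weights by a midpoint quadrature rule on an interval partition of $D=(0,1)$; the kernel *w*, the gain *f*, and all names are our own illustrative assumptions, not taken from the model.

```python
import numpy as np

def averaged_weights(w, n, m=20):
    """W[k, j] ~ (1/|D_k|) * double integral of w over D_k x D_j, cf. (2.4),
    for the interval partition D_{k,n} = ((k-1)/n, k/n) of D = (0,1)."""
    nodes = [(k + (np.arange(m) + 0.5) / m) / n for k in range(n)]  # midpoint nodes per cell
    W = np.empty((n, n))
    for k in range(n):
        for j in range(n):
            X, Y = np.meshgrid(nodes[k], nodes[j], indexing="ij")
            W[k, j] = w(X, Y).mean() / n   # mean of w times |D_j| = 1/n
    return W

def rates(W, theta, l, I_bar, f):
    """Activation rates f_bar_{k,n}(theta, t), cf. (2.5)."""
    return f(W @ (theta / l) + I_bar)

w = lambda x, y: np.exp(-np.abs(x - y))   # illustrative kernel
f = lambda u: 1.0 / (1.0 + np.exp(-u))    # illustrative sigmoidal gain
n = 8
W = averaged_weights(w, n)
theta = np.array([3.0, 5.0, 2.0, 0.0, 7.0, 4.0, 1.0, 6.0])  # active neurons per population
print(rates(W, theta, np.full(n, 10.0), np.zeros(n), f))    # one rate per population
```

Note that for $w\equiv 1$ each row of the matrix sums to $|D|=1$, which is a convenient sanity check for the quadrature.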

As already highlighted by Bressloff [5], the transition rates are not uniquely determined by the requirement that a possible limit of the microscopic models is given by the Wilson–Cowan equation (1.3). If in (2.5) the definition of the transition rates is changed to

${\overline{f}}_{k,n}({\theta}^{n},t):={f}^{n}(\sum _{j=1}^{P(n)}{\overline{W}}_{kj}^{n}\frac{{\theta}^{j,n}}{l(j,n)}+{\overline{I}}_{k,n}(t)),$

where ${f}^{n}$, $n\in \mathbb{N}$, is a sequence of functions converging uniformly to *f*, then all limit theorems remain valid. The proof can be carried out as presented, adding and subtracting the appropriate term, where the additional difference term vanishes due to ${sup}_{x\in \mathbb{R}}|{f}^{n}(x)-f(x)|\to 0$ for $n\to \mathrm{\infty}$. Hence, any microscopic model with gain rates ${\overline{f}}_{k,n}$ of such a form reduces to the same Wilson–Cowan equation in the limit. Clearly, the same applies analogously to the decay rate *τ*, the weights *w*, and the input *I*.

*Connection to the Input Current I* The external input applied to the neurons in a certain population is obtained by spatially averaging a space-time input over the sub-domain that population is located in, i.e.,

${\overline{I}}_{k,n}(t):=\frac{1}{|{D}_{k,n}|}{\int}_{{D}_{k,n}}I(t,x)\phantom{\rule{0.2em}{0ex}}\mathrm{d}x.$

(2.6)

This completes the definition of the Markov jump processes ${({\Theta}_{t}^{n},t)}_{t\ge 0}$. For the sake of completeness, we repeat the definition of the total jump rate

${\lambda}^{n}({\theta}^{n},t):=\frac{1}{\tau}\sum _{k=1}^{P(n)}({\theta}^{k,n}+l(k,n){\overline{f}}_{k,n}({\theta}^{n},t))$

and the transition measure ${\mu}^{n}$, which is defined by

$\begin{array}{rcl}{\mu}^{n}(({\theta}^{n},t),\{{\theta}^{n}-{e}_{k}\})& :=& \frac{1}{\tau}\frac{{\theta}^{k,n}}{{\lambda}^{n}({\theta}^{n},t)},\\ {\mu}^{n}(({\theta}^{n},t),\{{\theta}^{n}+{e}_{k}\})& :=& \frac{1}{\tau}\frac{l(k,n){\overline{f}}_{k,n}({\theta}^{n},t)}{{\lambda}^{n}({\theta}^{n},t)}\end{array}$

for all $k=1,\dots ,P(n)$.
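The total rate ${\lambda}^{n}$ and the transition measure ${\mu}^{n}$ translate directly into a direct (Gillespie-type) simulation: wait an exponential time with rate ${\lambda}^{n}$, then select a transition ${\theta}^{n}\mp {e}_{k}$ according to ${\mu}^{n}$. The following Python sketch is our own illustration with toy rates and parameter values; it is not a scheme prescribed by the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta0, l, rate_fn, tau, T):
    """Direct simulation of Theta^n on [0, T]: exponential waiting times with
    the total rate lambda^n, transitions theta -> theta -/+ e_k chosen with
    the probabilities given by the transition measure mu^n."""
    theta, t = theta0.astype(float), 0.0
    path = [(t, theta.copy())]
    while True:
        down = theta / tau                  # rates of theta -> theta - e_k
        up = l * rate_fn(theta, t) / tau    # rates of theta -> theta + e_k
        lam = down.sum() + up.sum()         # total jump rate lambda^n(theta, t)
        t += rng.exponential(1.0 / lam)
        if t > T:
            break
        probs = np.concatenate([down, up]) / lam
        i = rng.choice(2 * len(theta), p=probs)
        k = i % len(theta)
        theta[k] += -1.0 if i < len(theta) else 1.0
        path.append((t, theta.copy()))
    return path

P, tau = 4, 1.0
l = np.full(P, 50.0)
f = lambda u: 1.0 / (1.0 + np.exp(-u))          # illustrative gain
rate_fn = lambda theta, t: f(theta / l - 0.5)   # toy stand-in for (2.5)
path = simulate(np.full(P, 25.0), l, rate_fn, tau, T=1.0)
```

Since the rate of ${\theta}^{n}\to {\theta}^{n}-{e}_{k}$ is ${\theta}^{k,n}/\tau$, the counts can never become negative in this simulation.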

*Connection to the Solution ν* As functions of time, the paths of the PDMP ${({\Theta}_{t}^{n},t)}_{t\ge 0}$ and the solution *ν* live on different state spaces. The former takes values in ${\mathbb{N}}_{0}^{P(n)}\times {\mathbb{R}}_{+}$ and the latter in ${L}^{2}(D)$. Thus, in order to compare the two, we have to introduce a mapping that maps the stochastic process into ${L}^{2}(D)$. In [27], the authors called such a mapping a *coordinate function*, which is also the terminology used in [13]. In fact, the limit theorems we subsequently present are actually for the processes obtained from the composition of the coordinate functions with the PDMPs. Here, it is important to note that for each $n\in \mathbb{N}$ the coordinate functions may, and usually do, differ; however, they project the processes into the common space ${L}^{2}(D)$. For the mean field models, we define the coordinate functions for all $n\in \mathbb{N}$ by

${\nu}^{n}:{\mathbb{N}}_{0}^{P(n)}\to {L}^{2}(D):{\theta}^{n}\mapsto \sum _{k=1}^{P(n)}\frac{{\theta}^{k,n}}{l(k,n)}{\mathbb{I}}_{{D}_{k,n}}.$

(2.7)

Clearly, each ${\nu}^{n}$ is a measurable map into ${L}^{2}(D)$. For the composition of ${\nu}^{n}$ with the stochastic process ${({\Theta}_{t}^{n},t)}_{t\ge 0}$, we also use the abbreviation ${\nu}_{t}^{n}:={\nu}^{n}({\Theta}_{t}^{n})$; hence, the resulting stochastic process ${({\nu}_{t}^{n})}_{t\ge 0}$ is an adapted càdlàg process taking values in ${L}^{2}(D)$. This process thus states the activity at a location $x\in D$ as the fraction of active neurons in the population located around this location.
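The coordinate function (2.7) is a step function on the partition; for an interval partition of $D=(0,1)$ it can be evaluated directly, as in the following sketch (our own illustration, with hypothetical names).

```python
import numpy as np

def nu_n(theta, l, x):
    """Evaluate nu^n(theta) = sum_k (theta_k / l_k) 1_{D_{k,n}} at points x,
    for the interval partition D_{k,n} = ((k-1)/P, k/P) of D = (0,1)."""
    P = len(theta)
    k = np.minimum((np.asarray(x) * P).astype(int), P - 1)  # cell index containing x
    return theta[k] / l[k]

theta = np.array([2.0, 8.0, 5.0])   # active neurons per population
l = np.full(3, 10.0)                # l(k, n) = 10
vals = nu_n(theta, l, np.array([0.1, 0.5, 0.9]))   # fractions 0.2, 0.8, 0.5
# L2(D) norm of the step function, computed exactly from the cell values:
norm = np.sqrt(np.sum((theta / l) ** 2) / len(theta))
assert np.allclose(vals, [0.2, 0.8, 0.5])
```

The exact ${L}^{2}(D)$ norm used here, $\sum_{k}({\theta}^{k,n}/l(k,n))^{2}|{D}_{k,n}|$, is the quantity controlled in the convergence statements below.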

*Connection of the Initial Conditions* One condition in the subsequent limit theorems is the convergence of the initial conditions in probability, i.e., the assumption that

$\underset{n\to \mathrm{\infty}}{lim}{\mathbb{P}}^{n}[{\parallel {\nu}^{n}\left({\Theta}_{0}^{n}\right)-{\nu}_{0}\parallel}_{{L}^{2}(D)}>\u03f5]=0\phantom{\rule{1em}{0ex}}\mathrm{\forall}\u03f5>0.$

(2.8)

It is easy to see that such a sequence of initial conditions ${\Theta}_{0}^{n}$, $n\in \mathbb{N}$, can be found for any deterministic initial condition ${\nu}_{0}$ under some reasonable conditions on the domain *D* and the sequence of partitions ${\mathcal{D}}_{n}$. Hence, the assumption (2.8) can always be satisfied. For example, we may define such a sequence of initial conditions by

${\Theta}_{0}^{k,n}=\underset{i=1,\dots ,l(k,n)}{argmin}|\frac{i}{l(k,n)}-\frac{1}{|{D}_{k,n}|}{\int}_{{D}_{k,n}}\nu (0,x)\phantom{\rule{0.2em}{0ex}}\mathrm{d}x|.$

Next, assuming that the partitions fill the whole domain *D* for $n\to \mathrm{\infty}$, i.e., ${lim}_{n\to \mathrm{\infty}}|D\mathrm{\setminus}{\bigcup}_{k=1}^{P(n)}{D}_{k,n}|=0$, and that the maximal diameter of the sets decreases to zero, i.e., ${lim}_{n\to \mathrm{\infty}}{\delta}_{+}(n)=0$, it is easy to see using the Poincaré inequality (4.1) that the above definition of the initial condition implies ${\parallel {\nu}_{0}^{n}-\nu (0)\parallel}_{{L}^{2}(D)}\to 0$ and ${sup}_{n\in \mathbb{N}}{\parallel {\nu}_{0}^{n}\parallel}_{{L}^{2}(D)}^{2r}<\mathrm{\infty}$ for all $r\ge 1$. Then (2.8) holds trivially, as the initial condition is deterministic and converges. A simple non-degenerate sequence of initial conditions is obtained by choosing random initial conditions with the above value as their mean and sufficiently fast decreasing fluctuations. Furthermore, a sequence of partitions satisfying the above conditions also exists for a large class of reasonable domains *D*. Assume that *D* is Jordan measurable, i.e., a bounded domain whose boundary is a Lebesgue null set, and let ${\mathcal{C}}_{n}$ be the smallest grid of cubes with edge length $1/n$ covering *D*. We define ${\mathcal{D}}_{n}$ to be the set of all cubes which lie fully in *D*. As *D* is Jordan measurable, these partitions fill up *D* from the inside and ${\delta}_{+}(n)\to 0$. For a more detailed discussion of these aspects, we refer to [26].
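The argmin construction above is straightforward to implement; the following sketch (interval partition of $D=(0,1)$, with our own hypothetical names and an illustrative choice of $\nu(0,\cdot)$) computes these initial counts.

```python
import numpy as np

def initial_counts(nu0, P, l, m=100):
    """Theta_0^{k,n} for the interval partition of D = (0,1) into P cells with
    l neurons each: the count i in {1, ..., l} minimising
    |i/l - |D_k|^{-1} int_{D_k} nu(0, x) dx|, as in the formula above."""
    counts = np.empty(P, dtype=int)
    i = np.arange(1, l + 1)
    for k in range(P):
        x = (k + (np.arange(m) + 0.5) / m) / P   # midpoint quadrature nodes in D_{k,n}
        target = np.mean(nu0(x))                 # local average of nu(0, .)
        counts[k] = i[np.argmin(np.abs(i / l - target))]
    return counts

nu0 = lambda x: 0.5 + 0.4 * np.sin(2 * np.pi * x)   # illustrative nu(0, .)
counts = initial_counts(nu0, P=10, l=20)
# counts / l approximates the cell averages of nu(0, .) up to 1/(2l)
```

The resulting step function $\sum_{k}(\Theta_{0}^{k,n}/l(k,n)){\mathbb{I}}_{{D}_{k,n}}$ then approximates $\nu(0,\cdot)$ with an error of order ${\delta}_{+}(n)+{\ell}_{-}(n)^{-1}$ per cell.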

In the remainder of this section, we now collect the main results of this article. We start with the law of large numbers, which establishes the connection to the deterministic mean field equation, and then proceed to central limit theorems which provide the basis for a Langevin approximation. The proofs of the results are deferred to Sect. 4.

### 2.1 A Law of Large Numbers

The first law of large numbers takes the following form. Note that the assumptions imply that the number of neuron populations diverges.

**Theorem 2.1** (Law of large numbers)

*Let* $w\in {L}^{2}(D\times D)$ *and* $I\in {L}_{\mathrm{loc}}^{2}({\mathbb{R}}_{+},{H}^{1}(D))$. *Assume that the sequence of initial conditions converges to* $\nu (0)$ *in probability in the space* ${L}^{2}(D)$, *i.e.*, (2.8) *holds*, *that* ${\mathbb{E}}^{n}{\Theta}_{0}^{k,n}\le l(k,n)$, *and that*

$\underset{n\to \mathrm{\infty}}{lim}{\delta}_{+}(n)=0,\phantom{\rule{2em}{0ex}}\underset{n\to \mathrm{\infty}}{lim}{\ell}_{-}(n)=\mathrm{\infty}$

(2.9)

*holds*. *Then it follows that the sequence of* ${L}^{2}(D)$-*valued jump processes* ${({\nu}_{t}^{n})}_{t\ge 0}$ *converges uniformly on compact time intervals in probability to the solution* *ν* *of the Wilson–Cowan equation* (1.3), *i.e.*, *for all* $T,\u03f5>0$ *it holds that*

$\underset{n\to \mathrm{\infty}}{lim}{\mathbb{P}}^{n}[\underset{t\in [0,T]}{sup}{\parallel {\nu}_{t}^{n}-\nu (t)\parallel}_{{L}^{2}(D)}>\u03f5]=0.$

(2.10)

*Moreover*, *if for* $r\ge 1$ *the initial conditions satisfy in addition* ${sup}_{n\in \mathbb{N}}{\mathbb{E}}^{n}{\parallel {\nu}_{0}^{n}\parallel}_{{L}^{2}(D)}^{2r}<\mathrm{\infty}$, *then convergence in the* *rth mean holds*, *i.e.*, *for all* $T>0$

$\underset{n\to \mathrm{\infty}}{lim}{\mathbb{E}}^{n}\underset{t\in [0,T]}{sup}{\parallel {\nu}_{t}^{n}-\nu (t)\parallel}_{{L}^{2}(D)}^{r}=0.$

(2.11)

*Remark 2.1* The norm of uniform convergence, ${sup}_{t\in [0,T]}{\parallel \cdot \parallel}_{{L}^{2}(D)}$, which we used in Theorem 2.1, is a very strong norm on the space of ${L}^{2}(D)$-valued càdlàg functions on $[0,T]$. Hence, due to continuous embeddings, the result immediately extends to weaker norms, e.g., the norms of ${L}^{p}((0,T),{L}^{2}(D))$ for all $1\le p\le \mathrm{\infty}$. Also, for the state space, weaker spatial norms can be chosen, e.g., ${L}^{p}(D)$ with $1\le p\le 2$ or any norm on the duals ${H}^{-\alpha}(D)$ of Sobolev spaces with $\alpha >0$. If weaker norms for the state space are considered, it is possible to relax the conditions of Theorem 2.1 by sharpening some estimates in the proof of the theorem. The results in the following corollary cover the whole range of $\alpha \ge 0$ and split it into sections with successively weaker conditions. In particular, note that after passing to weaker norms, the convergence no longer necessitates that the number of neurons per population diverges. However, the condition ${\delta}_{+}(n)\to 0$, which entails the divergence of the number of neuron populations, cannot be relaxed.

**Corollary 2.1**
*Let* $\alpha \ge 0$ *and set*

$q:=\{\begin{array}{cc}\frac{2d}{d+2\alpha}\hfill & \mathit{\text{if}}\phantom{\rule{0.2em}{0ex}}0\le \alpha <d/2,\hfill \\ 1-\hfill & \mathit{\text{if}}\phantom{\rule{0.2em}{0ex}}\alpha =d/2,\hfill \\ 1\hfill & \mathit{\text{if}}\phantom{\rule{0.2em}{0ex}}d/2<\alpha <\mathrm{\infty}.\hfill \end{array}$

(2.12)

*Further*, *assume that* $w\in {L}^{q}(D)\times {L}^{2}(D)$ *and* $I\in {L}_{\mathrm{loc}}^{2}({\mathbb{R}}_{+},{H}^{1}(D))$, *that the sequence of initial conditions converges to* $\nu (0)$ *in probability in the space* ${H}^{-\alpha}(D)$, *that* ${lim}_{n\to \mathrm{\infty}}{\delta}_{+}(n)=0$, *and that*

$\begin{array}{ll}{lim}_{n\to \mathrm{\infty}}\frac{{v}_{+}{(n)}^{2\alpha /d}}{{\ell}_{-}(n)}=0& \mathit{\text{if}}\phantom{\rule{0.2em}{0ex}}0\le \alpha <d/2,\\ {lim}_{n\to \mathrm{\infty}}\frac{{v}_{+}{(n)}^{1-}}{{\ell}_{-}(n)}=0& \mathit{\text{if}}\phantom{\rule{0.2em}{0ex}}\alpha =d/2,\\ {lim}_{n\to \mathrm{\infty}}\frac{{v}_{+}(n)}{{\ell}_{-}(n)}=0& \mathit{\text{if}}\phantom{\rule{0.2em}{0ex}}d/2<\alpha <\mathrm{\infty},\end{array}\}$

(2.13)

*where* 1− *denotes an arbitrary positive number strictly smaller than* 1. *Then it holds for all* $T,\u03f5>0$ *that*

$\underset{n\to \mathrm{\infty}}{lim}{\mathbb{P}}^{n}[\underset{t\in [0,T]}{sup}{\parallel {\nu}_{t}^{n}-\nu (t)\parallel}_{{H}^{-\alpha}(D)}>\u03f5]=0$

*and for* $r\ge 1$, *if the additional boundedness assumptions of Theorem* 2.1 *are satisfied*, *that for all* $T>0$

$\underset{n\to \mathrm{\infty}}{lim}{\mathbb{E}}^{n}\underset{t\in [0,T]}{sup}{\parallel {\nu}_{t}^{n}-\nu (t)\parallel}_{{H}^{-\alpha}(D)}^{r}=0.$

*Remark 2.2* We believe that fruitful and illustrative comparisons of these convergence results and their conditions to the results in Kotelenez [17, 18], and particularly, Blount [4] can be made. Here, we just mention that the latter author conjectured the conditions (2.13) to be optimal for the convergence, but was not able to prove this result in his model of chemical reactions with diffusions for the region $\alpha \in (0,d/2]$. For our model, we could achieve these rates.

#### 2.1.1 Infinite-Time Convergence

In the law of large numbers, Theorem 2.1, and its Corollary 2.1, we have presented results on convergence over finite time intervals. Employing a different technique, we are also able to derive a convergence result over the whole positive time axis, motivated by a similar result in [32]. The proof of the following theorem is deferred to Sect. 4.3. Restricted to finite time intervals, the subsequent result is strictly weaker than Theorem 2.1. However, the result is important when one wants to analyse the mean long-time behaviour of the stochastic model via a bifurcation analysis of the deterministic limit, as (2.14) suggests that ${\mathbb{E}}^{n}{\nu}_{t}^{n}$ is close to $\nu (t)$ for all times $t\ge 0$ for sufficiently large *n*.

**Theorem 2.2** *Let* $\alpha \ge 0$ *and assume that the conditions of Corollary* 2.1 *are satisfied*. *We further assume that the input current* $I\in {L}_{\mathrm{loc}}^{2}({\mathbb{R}}_{+},{H}^{1}(D))$ *satisfies* ${\parallel {\mathrm{\nabla}}_{x}I\parallel}_{{L}^{\mathrm{\infty}}({\mathbb{R}}_{+},{L}^{2}(D))}<\mathrm{\infty}$, *i.e.*, *it is square integrable in* ${H}^{1}(D)$ *over bounded intervals and possesses first spatial derivatives bounded for almost all* $t\ge 0$ *in* ${L}^{2}(D)$. *Then it holds that*

$\underset{n\to \mathrm{\infty}}{lim}\underset{t\ge 0}{sup}{\mathbb{E}}^{n}{\parallel {\nu}_{t}^{n}-\nu (t)\parallel}_{{H}^{-\alpha}(D)}=0.$

(2.14)

### 2.2 A Martingale Central Limit Theorem

In this section, we present a central limit theorem for a sequence of martingales associated with the jump processes ${\nu}^{n}$. A brief, heuristic discussion of the method of proof for the law of large numbers explains the importance of these martingales and motivates their study. In the proof of the law of large numbers, the central argument relies on the fact that the process ${({\nu}_{t}^{n})}_{t\ge 0}$ satisfies the decomposition

${\nu}_{t}^{n}={\nu}_{0}^{n}+{\int}_{0}^{t}{\lambda}^{n}({\Theta}_{s}^{n},s){\int}_{{\mathbb{N}}_{0}^{P(n)}}({\nu}^{n}(\xi )-{\nu}^{n}\left({\Theta}_{s}^{n}\right)){\mu}^{n}(({\Theta}_{s}^{n},s),\mathrm{d}\xi )\phantom{\rule{0.2em}{0ex}}\mathrm{d}s+{M}_{t}^{n}.$

(2.15)

Here, the process ${({M}_{t}^{n})}_{t\ge 0}$ is a Hilbert space-valued, square-integrable, càdlàg martingale, with (2.15) serving as its definition. We have used this representation of the process ${\nu}^{n}$ in the proof of Theorem 2.2; see Sect. 4.3. We note that the Bochner integral in (2.15) is a.s. well defined due to bounded second moments of the integrand; see (4.7) in the proof of Theorem 2.1. Now, a heuristic argument to obtain the convergence to the solution of the Wilson–Cowan equation is the following: the initial conditions converge, the martingale term ${M}^{n}$ converges to zero, and the integral term on the right-hand side of (2.15) converges to the right-hand side of the Wilson–Cowan equation (1.3). Hence, the ‘solution’ ${\nu}^{n}$ of (2.15) converges to the solution *ν* of the Wilson–Cowan equation (1.3). Interpreting Eq. (2.15) as a stochastic evolution equation driven by the martingale ${({M}_{t}^{n})}_{t\ge 0}$ sheds light on the importance of studying this term: from this point of view, the martingale part in the decomposition (2.15) contains all the stochasticity inherent in the system. The idea for deriving a Langevin or linear noise approximation is then to find a non-trivial stochastic limit (in distribution) for the sequence of martingales and to heuristically substitute this limiting martingale into the stochastic evolution equation. It is then expected that this new and much less complex process behaves similarly to the process ${({\nu}_{t}^{n})}_{t\ge 0}$ for sufficiently large *n*. Deriving a suitable limit for ${({M}_{t}^{n})}_{t\ge 0}$ is what we set out to do next. The result can be found in Theorem 2.3 below and takes the form of a central limit theorem.

First of all, what has been said so far implies the necessity of re-scaling the martingale by a diverging sequence in order to obtain a non-trivial limit. The conditions in the law of large numbers imply in particular that the martingale converges uniformly in the mean to zero, i.e.,

$\underset{n\to \mathrm{\infty}}{lim}{\mathbb{E}}^{n}\underset{t\in [0,T]}{sup}{\parallel {M}_{t}^{n}\parallel}_{{L}^{2}(D)}=0,$

which in turn implies convergence in probability and convergence in distribution to the zero limit.

Furthermore, in contrast to Euclidean spaces, norms on infinite-dimensional spaces are usually not equivalent. In Corollary 2.1, we exploited this fact, as it allowed us to obtain convergence results under less restrictive conditions by changing to strictly weaker norms. In the formulation and proof of central limit theorems, the change to weaker norms even becomes an essential ingredient. It is often observed in the literature (see, e.g., [4, 17, 18]) that central limit theorems cannot be proven in the strongest norm for which the law of large numbers holds, e.g., ${L}^{2}(D)$ in the present setting, but only in a strictly weaker norm. Here, this norm is the norm of the dual of an appropriate Sobolev space. Hence, from now on, we consider for all $n\in \mathbb{N}$ the processes ${({\nu}_{t}^{n})}_{t\ge 0}$ and the martingales ${({M}_{t}^{n})}_{t\ge 0}$ as taking values in the space ${H}^{-\alpha}(D)$ for an $\alpha >d$, where *d* is the dimension of the spatial domain *D*, using the embedding of ${L}^{2}(D)$ into ${H}^{-\alpha}(D)$. The technical significance of the restriction $\alpha >d$ is that for these indices there exists an embedding of ${H}^{\alpha}(D)$ into an ${H}^{{\alpha}_{1}}(D)$ with $d/2<{\alpha}_{1}<\alpha $ which is of Hilbert–Schmidt type^{c} due to Maurin’s theorem, while ${H}^{{\alpha}_{1}}(D)$ is embedded into $C(\overline{D})$ due to the Sobolev embedding theorem. These two properties are essential for the proof of the central limit theorem, and their use will be made clear subsequently.

The limit we propose for the re-scaled martingale sequence is a *centred diffusion process* in ${H}^{-\alpha}(D)$, that is, a centred continuous Gaussian stochastic process ${({X}_{t})}_{t\ge 0}$ taking values in ${H}^{-\alpha}(D)$ with independent increments and given covariance $C(t)$, $t\ge 0$; see, e.g., [12, 25] for a discussion of Gaussian processes in Hilbert spaces. Such a process is uniquely defined by its covariance operator and, conversely, each family of linear, bounded operators $C(t):{H}^{\alpha}(D)\to {H}^{-\alpha}(D)$, $t\ge 0$, uniquely defines a diffusion process^{d} if

- (i) each $C(t)$ is *symmetric* and *positive*, i.e.,

  ${\u3008C(t)\varphi ,\psi \u3009}_{{H}^{\alpha}(D)}={\u3008C(t)\psi ,\varphi \u3009}_{{H}^{\alpha}(D)}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\u3008C(t)\varphi ,\varphi \u3009}_{{H}^{\alpha}(D)}\ge 0,$

- (ii) each $C(t)$ is of *trace class*, i.e., for one (and thus every) orthonormal basis ${\phi}_{j}$, $j\in \mathbb{N}$, in ${H}^{\alpha}(D)$ it holds that

  $\sum _{j=1}^{\mathrm{\infty}}{\u3008C(t){\phi}_{j},{\phi}_{j}\u3009}_{{H}^{\alpha}(D)}<\mathrm{\infty},$

  (2.16)

- (iii) and the family $C(t)$, $t\ge 0$, is *continuously increasing* in *t* in the sense that the map $t\mapsto {\u3008C(t)\varphi ,\psi \u3009}_{{H}^{\alpha}(D)}$ is continuous and increasing for all $\varphi ,\psi \in {H}^{\alpha}(D)$.
We next define the process which will be identified as the limit in the martingale central limit theorem via its covariance. In order to define the operator *C*, we first define a family of linear operators $G(\nu (t),t)$ mapping from ${H}^{\alpha}(D)$ into the dual space ${H}^{-\alpha}(D)$ via the bilinear form

$\begin{array}{r}{\u3008G(\nu (t),t)\varphi ,\psi \u3009}_{{H}^{\alpha}(D)}\\ \phantom{\rule{1em}{0ex}}={\int}_{D}\varphi (x)(\frac{1}{\tau}\nu (t,x)\\ \phantom{\rule{2em}{0ex}}+\frac{1}{\tau}f({\int}_{D}w(x,y)\nu (t,y)\phantom{\rule{0.2em}{0ex}}\mathrm{d}y+I(t,x)))\psi (x)\phantom{\rule{0.2em}{0ex}}\mathrm{d}x.\end{array}$

(2.17)
It is obvious that this bilinear form is symmetric and positive and, as $\nu (t)$ is continuous in *t*, the map $t\mapsto {\u3008G(\nu (t),t)\varphi ,\psi \u3009}_{{H}^{\alpha}(D)}$ is continuous for all $\varphi ,\psi \in {H}^{\alpha}(D)$. Furthermore, it is easy to see that the operator is bounded, i.e.,

${\parallel G(\nu (t),t)\parallel}_{L({H}^{\alpha}(D),{H}^{-\alpha}(D))}=\underset{{\parallel \varphi \parallel}_{{H}^{\alpha}(D)}=1}{sup}\underset{{\parallel \psi \parallel}_{{H}^{\alpha}(D)}=1}{sup}\left|{\u3008G(\nu (t),t)\varphi ,\psi \u3009}_{{H}^{\alpha}(D)}\right|<\mathrm{\infty},$

as the solution *ν* of the Wilson–Cowan equation and the gain function *f* are pointwise bounded. Hence, due to the Cauchy–Schwarz inequality, the form $|{\u3008G(\nu (t),t)\varphi ,\psi \u3009}_{{H}^{\alpha}(D)}|$ is bounded by a constant multiple of the product ${\parallel \varphi \parallel}_{{L}^{2}(D)}{\parallel \psi \parallel}_{{L}^{2}(D)}$, and for any $\alpha \ge 0$ the Sobolev embedding theorem now gives a uniform bound in terms of the norms of *ϕ*, *ψ* in ${H}^{\alpha}(D)$. As a final property, we show that these operators are of trace class if $\alpha >d/2$. Thus, let ${({\phi}_{j})}_{j\in \mathbb{N}}$ be an orthonormal basis in ${H}^{\alpha}(D)$; then the Cauchy–Schwarz inequality yields

$\left|{\u3008G(\nu (t),t){\phi}_{j},{\phi}_{j}\u3009}_{{H}^{\alpha}(D)}\right|\le \frac{1}{\tau}(1+{\parallel f\parallel}_{0})|D|{\parallel {\phi}_{j}\parallel}_{{L}^{2}(D)}^{2}.$

Summing these inequalities over all $j\in \mathbb{N}$, we find that the resulting right-hand side is finite, as due to Maurin’s theorem the embedding of ${H}^{\alpha}(D)$ into ${L}^{2}(D)$ is of Hilbert–Schmidt type. Moreover, the trace is even bounded independently of *t*.

Now, as the map $t\mapsto G(\nu (t),t)$ is continuous taking values in the Banach space of trace class operators, we can define trace class operators $C(t)$ from ${H}^{\alpha}(D)$ into ${H}^{-\alpha}(D)$ via the Bochner integral, for all $t\ge 0$,

$C(t):={\int}_{0}^{t}G(\nu (s),s)\phantom{\rule{0.2em}{0ex}}\mathrm{d}s.$

(2.18)

Clearly, the resulting bilinear form ${\u3008C(t)\cdot ,\cdot \u3009}_{{H}^{\alpha}(D)}$ inherits the properties of the bilinear form (2.17). Moreover, due to the positivity of the integrands, it follows that ${\u3008C(t)\varphi ,\varphi \u3009}_{{H}^{\alpha}(D)}$ is increasing in *t* for all $\varphi \in {H}^{\alpha}(D)$. Hence, the family of operators $C(t)$, $t\ge 0$, satisfies the above conditions (i)–(iii), and thus uniquely defines an ${H}^{-\alpha}(D)$-valued diffusion process.
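Since the bilinear form (2.17) is given by integration against a pointwise density, the quadratic forms of $G(\nu (t),t)$ and $C(t)$ can be approximated by simple quadrature. The following sketch illustrates the symmetry, positivity, and monotonicity properties discussed above; the grid discretisation and all concrete choices of *ν*, *f*, *w*, and *I* are our own illustrative assumptions.

```python
import numpy as np

P, tau = 200, 1.0
h = 1.0 / P
x = (np.arange(P) + 0.5) * h
f = lambda u: 1.0 / (1.0 + np.exp(-u))                          # illustrative gain
w = np.exp(-np.abs(x[:, None] - x[None, :]))                    # illustrative kernel on the grid
nu = lambda t: 0.5 + 0.3 * np.sin(2 * np.pi * x) * np.exp(-t)   # stand-in for the solution
I = lambda t: np.zeros(P)                                       # illustrative input

def G_form(t, phi, psi):
    """Quadrature approximation of <G(nu(t), t) phi, psi>, cf. (2.17)."""
    g = (nu(t) + f(w @ nu(t) * h + I(t))) / tau   # pointwise density of the form
    return np.sum(phi * g * psi) * h

def C_form(t, phi, psi, m=200):
    """C(t) as the time integral of G, cf. (2.18), via the midpoint rule."""
    s = (np.arange(m) + 0.5) * t / m
    return sum(G_form(si, phi, psi) for si in s) * t / m

phi = np.sin(np.pi * x)
psi = np.cos(np.pi * x)
# symmetry of the form, and monotone growth of t -> <C(t) phi, phi>:
assert abs(G_form(0.3, phi, psi) - G_form(0.3, psi, phi)) < 1e-12
assert C_form(1.0, phi, phi) > C_form(0.5, phi, phi) > 0.0
```

The strict positivity and monotonicity hold here because the density is strictly positive for the chosen *ν* and *f*, mirroring the argument for properties (i) and (iii) above.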

We are now able to state the martingale central limit theorem. The proof of the theorem is deferred to Sect. 4.4.

**Theorem 2.3** (Martingale central limit theorem)

*Let* $\alpha >d$ *and assume that the conditions of Theorem* 2.1 *are satisfied*. *In particular*, *convergence in the mean holds*, *i.e.*, (2.11) *holds for* $r=1$. *Additionally*, *we assume that*

$\underset{n\to \mathrm{\infty}}{lim}\frac{{v}_{-}(n)}{{v}_{+}(n)}\frac{{\ell}_{-}(n)}{{\ell}_{+}(n)}=1.$

(2.19)

*Then it follows that the sequence of re-scaled* ${H}^{-\alpha}(D)$-*valued martingales* ${\left(\sqrt{\frac{{\ell}_{-}(n)}{{v}_{+}(n)}}{M}_{t}^{n}\right)}_{t\ge 0}$ *converges weakly on the space of* ${H}^{-\alpha}(D)$-*valued càdlàg functions to the* ${H}^{-\alpha}(D)$-*valued diffusion process defined by the covariance operator* $C(t)$ *given by* (2.18).
*Remark 2.3* In connection with the results of Theorem 2.3, two questions may arise. First, in what sense is the re-scaling sequence, and hence the limiting diffusion, unique? That is, does a different scaling also produce a (non-trivial) limit, or, rephrased, is the proposed scaling the correct one to look at? Secondly, the theorem deals with the norms in the Hilbert scale for the range $\alpha >d$; what can be said about convergence in the stronger norms corresponding to the range $\alpha \in [0,d]$? Does there exist a limit? We conclude this section by addressing these two issues.

Regarding the first question, it is immediately obvious that the re-scaling sequence $\frac{{\ell}_{-}(n)}{{v}_{+}(n)}$, which we denote by ${\rho}_{n}$ in the following, is not the unique sequence yielding a non-trivial limit. Re-scaling the martingales ${M}^{n}$ by any sequence of the form $\sqrt{c{\rho}_{n}}$ yields a convergent martingale sequence. However, the limiting diffusion differs only in its covariance operator, which is also re-scaled by *c*, and hence the limit is essentially the same process with either ‘stretched’ or ‘shrunk’ variability. The asymptotic behaviour of the re-scaling sequences that allow for a non-trivial weak limit is, however, unique. In general, by considering different re-scaling sequences ${\rho}_{n}^{\ast}$, we obtain three possibilities for the convergence of the sequence $\sqrt{{\rho}_{n}^{\ast}}{M}^{n}$. If ${\rho}_{n}^{\ast}$ has the same speed of convergence as ${\rho}_{n}$, i.e., ${\rho}_{n}^{\ast}=\mathcal{O}({\rho}_{n})$, the thus re-scaled sequence converges again to a diffusion process whose covariance operator is proportional to (2.18). This is then just a re-scaling by a sequence (asymptotically) proportional to ${\rho}_{n}$, as discussed above. Secondly, if the convergence is slower, i.e., ${\rho}_{n}^{\ast}=o({\rho}_{n})$, then the same methods as in the law of large numbers show that the sequence converges to zero uniformly on compacts in probability; hence, convergence in distribution to the degenerate zero process also follows. Thus, one only obtains the trivial limit. Finally, if we re-scale by a sequence that diverges faster, i.e., ${\rho}_{n}=o({\rho}_{n}^{\ast})$, we can show that there does not exist a limit. This follows from general necessary conditions for the preservation of weak limits under transformation, which presuppose that $\sqrt{{\rho}_{n}^{\ast}/{\rho}_{n}}M$ has to converge in distribution in order for $\sqrt{{\rho}_{n}^{\ast}}{M}^{n}$ to possess a limit in distribution; see Theorem 2 in [29]. As the sequence ${\rho}_{n}^{\ast}/{\rho}_{n}$ diverges, this is clearly impossible.

Unfortunately, the second question cannot be answered with the same clarity when considering non-trivial limits. Essentially, we can only say that the currently used methods do not allow for any conclusion on convergence. The limitations are the following: the central problem is that for the parameter range $\alpha \in [0,d]$ the current method does not provide tightness of the re-scaled martingale sequence; hence, we cannot infer that the sequence possesses a convergent subsequence. However, if tightness can be established in a different way, then for the range $\alpha \in (max\{1,d/2\},d]$ the limit has to be the diffusion process defined by the operator (2.18), as follows from the characterisation of any limit in the proof of the theorem. Here, the lower bound $max\{1,d/2\}$ results, on the one hand, from our estimation technique, which necessitates $\alpha \ge 1$, and, on the other hand, from the definition of the limiting diffusion. Recall that the covariance operator is only of trace class for $\alpha >d/2$. Hence, for $\alpha \in [0,d/2]$, we can no longer infer that the limiting diffusion even exists.

### 2.3 The Mean-Field Langevin Equation

An important property of the limiting diffusion, with a view toward analytic and numerical studies, is that it can be represented by a stochastic integral with respect to a cylindrical or *Q*-Wiener process. For a general discussion of infinite-dimensional stochastic integrals, we refer to [12]. First, let ${({W}_{t})}_{t\ge 0}$ be a cylindrical Wiener process on ${H}^{-\alpha}(D)$ with covariance operator being the identity. Then $G(\nu (t),t)\circ {\iota}^{-1}$ is a trace class operator on ${H}^{-\alpha}(D)$ for suitable values of *α*. Here, ${\iota}^{-1}:{H}^{-\alpha}(D)\to {H}^{\alpha}(D)$ is the Riesz representation, i.e., the usual identification of a Hilbert space with its dual. The operator $G(\nu (t),t)\circ {\iota}^{-1}$ possesses a unique square root, which we denote by $\sqrt{G(\nu (t),t)\circ {\iota}^{-1}}$, and which is a Hilbert–Schmidt operator on ${H}^{-\alpha}(D)$. It follows that the stochastic integral process

${Z}_{t}:={\int}_{0}^{t}\sqrt{G(\nu (s),s)\circ {\iota}^{-1}}\phantom{\rule{0.2em}{0ex}}\mathrm{d}{W}_{s}$

(2.20)

is a diffusion process in ${H}^{-\alpha}(D)$ with covariance operator $C(t)$. That is, ${({Z}_{t})}_{t\ge 0}$ is a version of the limiting diffusion in Theorem 2.3. Now, formally substituting for the limits in (2.15) yields the *linear noise approximation*

${U}_{t}={\nu}_{0}+{\int}_{0}^{t}{\tau}^{-1}(-{U}_{s}+F({U}_{s},s))\phantom{\rule{0.2em}{0ex}}\mathrm{d}s+{\epsilon}_{n}{\int}_{0}^{t}\sqrt{G(\nu (s),s)\circ {\iota}^{-1}}\phantom{\rule{0.2em}{0ex}}\mathrm{d}{W}_{s},$

or in differential notation

$\mathrm{d}{U}_{t}={\tau}^{-1}(-{U}_{t}+F({U}_{t},t))\phantom{\rule{0.2em}{0ex}}\mathrm{d}t+{\epsilon}_{n}\sqrt{G(\nu (t),t)\circ {\iota}^{-1}}\phantom{\rule{0.2em}{0ex}}\mathrm{d}{W}_{t},\phantom{\rule{1em}{0ex}}{U}_{0}={\nu}_{0},$

(2.21)

where ${\epsilon}_{n}=\sqrt{{v}_{+}(n)/{\ell}_{-}(n)}$ is small for large *n*. Here, we have used the operator notation

$F:{H}^{-\alpha}(D)\times {\mathbb{R}}_{+}\to {H}^{-\alpha}(D):\phantom{\rule{1em}{0ex}}F(g,t)(x)=f({\langle g,w(x,\cdot )\rangle}_{{H}^{\alpha}(D)}+I(t,x)).$

Equation (2.21) is an infinite-dimensional stochastic differential equation with additive (linear) noise. Here, additive means that the coefficient in the diffusion term does not depend on the solution ${U}_{t}$. A second formal substitution yields the *Langevin approximation*: the dependence of the diffusion coefficient on the deterministic limit *ν* is formally replaced by a dependence on the solution itself. That is, we obtain a stochastic partial differential equation with multiplicative noise given by

${V}_{t}={V}_{0}+{\int}_{0}^{t}{\tau}^{-1}(-{V}_{s}+F({V}_{s},s))\phantom{\rule{0.2em}{0ex}}\mathrm{d}s+{\epsilon}_{n}{\int}_{0}^{t}\sqrt{G({V}_{s},s)\circ {\iota}^{-1}}\phantom{\rule{0.2em}{0ex}}\mathrm{d}{W}_{s},$

or in differential notation

$\mathrm{d}{V}_{t}={\tau}^{-1}(-{V}_{t}+F({V}_{t},t))\phantom{\rule{0.2em}{0ex}}\mathrm{d}t+{\epsilon}_{n}\sqrt{G({V}_{t},t)\circ {\iota}^{-1}}\phantom{\rule{0.2em}{0ex}}\mathrm{d}{W}_{t}.$

(2.22)

Note that the derivation of the above equations was only formal; hence, the existence and uniqueness of solutions and the proper setting for these equations still have to be addressed. This is left for future work. Which of the two (if either) is the correct diffusion approximation to use is an ongoing discussion and, lacking a criterion of approximation quality, probably undecidable. First of all, note that for both versions the noise term vanishes for $n\to \mathrm{\infty}$, and thus both have the Wilson–Cowan equation as their limit. Moreover, neither of them approximates even the first moment of the microscopic models exactly: for neither does the mean solve the Wilson–Cowan equation, which would only be the case if *f* were linear. However, both are close to the mean of the discrete process. We discuss this aspect in Appendix B.
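To illustrate how the Langevin approximation (2.22) can be used numerically, the following is a minimal Euler–Maruyama sketch on a regular grid for $D=[0,1]$. All concrete choices here (the kernel *w*, the sigmoidal gain *f*, vanishing input *I*, the values of *τ*, ${\epsilon}_{n}$ and the grid) are illustrative assumptions, not taken from the models above.

```python
import numpy as np

def simulate_langevin(P=64, T=0.5, dt=1e-3, tau=1.0, eps=0.05, seed=0):
    """Euler-Maruyama sketch of the Langevin approximation (2.22) on [0, 1].

    The drift is tau^{-1} (-V + F(V, t)) (the -V term is the decay), and
    the squared diffusion coefficient is the rate sum
    g(t, x) = tau^{-1} V + tau^{-1} f(...), cf. the multiplication
    operator presented at the end of this section.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, P)
    h = x[1] - x[0]                                      # mesh width
    w = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)   # kernel w(x, y)
    f = lambda u: 1.0 / (1.0 + np.exp(-u))               # sigmoidal gain
    V = np.zeros(P)                                      # initial condition V_0 = 0
    for k in range(int(T / dt)):
        drive = f(h * (w @ V))                           # F(V_t, t), with I = 0
        drift = (-V + drive) / tau
        # squared diffusion coefficient; clipped at 0 since the numerical
        # solution may become slightly negative
        g = np.clip((V + drive) / tau, 0.0, None)
        noise = rng.standard_normal(P) * np.sqrt(dt / h)  # discretised dW
        V = V + drift * dt + eps * np.sqrt(g) * noise
    return x, V
```

Note the $1/\sqrt{h}$ scaling of the increments, which mimics spatially white noise on a grid of width *h*; a rigorous discretisation would of course have to respect the ${H}^{-\alpha}(D)$ setting of the limit theorems.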

Furthermore, we already observe in the central limit theorem, and thus also in the linear noise and Langevin approximations, that the covariance (2.18), as well as the drift and the structure of the diffusion terms in (2.21) and (2.22), respectively, are independent of objects resulting from the microscopic models. They are defined purely in terms of the macroscopic limit. This observation supports the conjecture that these approximations are independent of the possibly different microscopic models converging to the same deterministic limit. Analogous statements hold for derivations from the van Kampen system size expansion [5] and in related limit theorems for reaction–diffusion models [4, 17, 18]. The only object reminiscent of the microscopic models in the continuous approximations is the re-scaling sequence ${\epsilon}_{n}$. However, ${\epsilon}_{n}$ is inversely proportional to the square root of ${\ell}_{-}(n)/{v}_{+}(n)$, i.e., the number of neurons in an area divided by the size of that area, which is just the local density of particles. Therefore, in the approximations, the noise scales inversely to the square root of the neuron density in this model, which, interpreted in this way, can also be considered a fixed macroscopic parameter chosen independently of the approximating sequence.

*Remark 2.4* The stochastic partial differential equations (2.21) and (2.22), which we proposed as the linear noise and Langevin approximations, respectively, are not necessarily unique, as the representation of the limiting diffusion as a stochastic integral process (2.20) may not be unique. It will be the subject of further research efforts to analyse the practical implications and usability of this Langevin approximation. Let *Q* be a trace class operator, ${({W}_{t}^{Q})}_{t\ge 0}$ be a *Q*-Wiener process, and let $B(\nu (t),t)$ be operators such that $B(\nu (t),t)\circ Q\circ B{(\nu (t),t)}^{\ast}=G(\nu (t),t)\circ {\iota}^{-1}$, where ^{∗} denotes the adjoint operator. Then the stochastic integral process

${Z}_{t}^{Q}:={\int}_{0}^{t}B(\nu (s),s)\phantom{\rule{0.2em}{0ex}}\mathrm{d}{W}_{s}^{Q}$

is also a version of the limiting diffusion in Theorem 2.3, and the corresponding linear noise and Langevin approximations are given by

$\mathrm{d}{U}_{t}^{Q}={\tau}^{-1}(-{U}_{t}^{Q}+F({U}_{t}^{Q},t))\phantom{\rule{0.2em}{0ex}}\mathrm{d}t+{\epsilon}_{n}B(\nu (t),t)\phantom{\rule{0.2em}{0ex}}\mathrm{d}{W}_{t}^{Q}$

and

$\mathrm{d}{V}_{t}^{Q}={\tau}^{-1}(-{V}_{t}^{Q}+F({V}_{t}^{Q},t))\phantom{\rule{0.2em}{0ex}}\mathrm{d}t+{\epsilon}_{n}B({V}_{t}^{Q},t)\phantom{\rule{0.2em}{0ex}}\mathrm{d}{W}_{t}^{Q}.$

We conclude this section by presenting one particular choice of a diffusion coefficient and a Wiener process. We take ${({W}_{t}^{Q})}_{t\ge 0}$ to be a cylindrical Wiener process on ${L}^{2}(D)$ with covariance $Q={\mathrm{Id}}_{{L}^{2}}$. Then we can choose $B(t)=j\circ (\cdot \sqrt{g(t)})\in L({L}^{2}(D),{H}^{-\alpha}(D))$, where *j* is the embedding operator ${L}^{2}(D)\hookrightarrow {H}^{-\alpha}(D)$ in the sense of (1.2) and $(\cdot \sqrt{g(t)})\in L({L}^{2}(D),{L}^{2}(D))$ denotes the pointwise multiplication of a function in ${L}^{2}(D)$ by $\sqrt{g(t)}$, i.e.,

$(\varphi \cdot \sqrt{g(t)})(x)=\varphi (x){({\tau}^{-1}\nu (t,x)+{\tau}^{-1}f({\int}_{D}w(x,y)\nu (t,y)\phantom{\rule{0.2em}{0ex}}\mathrm{d}y+I(t,x)))}^{1/2}.$

We first investigate the operator $G(\nu (t),t)\circ {\iota}^{-1}$ and write it in more detail as the following composition of operators:

$G(\nu (t),t)\circ {\iota}^{-1}=j\circ (\cdot g(t))\circ k\circ {\iota}^{-1},$

where *k* is the embedding operator ${H}^{\alpha}(D)\hookrightarrow {L}^{2}(D)$. Next, the Hilbert adjoint ${B}^{\ast}\in L({H}^{-\alpha},{L}^{2})$ is given by ${B}^{\ast}=(\cdot \sqrt{g})\circ k\circ {\iota}^{-1}$, which is easy to verify. Hence, the stochastic integral of $B(t)$ with respect to ${W}^{Q}$ is again a version of the limiting martingale, as

$\begin{array}{rcl}B(t)\circ Q\circ {B}^{\ast}(t)& =& j\circ (\cdot \sqrt{g(t)})\circ {\mathrm{Id}}_{{L}^{2}}\circ (\cdot \sqrt{g(t)})\circ k\circ {\iota}^{-1}\\ & =& j\circ (\cdot g(t))\circ k\circ {\iota}^{-1}=G(\nu (t),t)\circ {\iota}^{-1}.\end{array}$
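In a finite-dimensional analogue, the closing computation reduces to a simple matrix identity: with $Q=\mathrm{Id}$ and *B* the multiplication by $\sqrt{g}$, the composition $B\circ Q\circ {B}^{\ast}$ is multiplication by *g*. The following minimal numerical check illustrates this; the values of *g* are arbitrary illustrative rates.

```python
import numpy as np

# Finite-dimensional analogue of B(t) . Q . B(t)^* = G(nu(t), t) . iota^{-1}:
# with Q = Id, multiplication by sqrt(g) is the diagonal matrix B below,
# and B Q B^* equals multiplication by g. The entries of g are illustrative.
g = np.array([0.3, 1.2, 0.7, 2.5])       # sampled values of the rate g(t, x)
B = np.diag(np.sqrt(g))                  # multiplication by sqrt(g)
Q = np.eye(len(g))                       # identity covariance
G = np.diag(g)                           # multiplication by g
product = B @ Q @ B.T                    # B is diagonal, so B^* = B^T = B
assert np.allclose(product, G)
```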