A Stochastic Version of the Jansen and Rit Neural Mass Model: Analysis and Numerics

Neural mass models provide a useful framework for modelling mesoscopic neural dynamics and in this article we consider the Jansen and Rit neural mass model (JR-NMM). We formulate a stochastic version of it which arises by incorporating random input and has the structure of a damped stochastic Hamiltonian system with nonlinear displacement. We then investigate path properties and moment bounds of the model. Moreover, we study the asymptotic behaviour of the model and provide long-time stability results by establishing the geometric ergodicity of the system, which means that the system—independently of the initial values—always converges to an invariant measure. In the last part, we simulate the stochastic JR-NMM by an efficient numerical scheme based on a splitting approach which preserves the qualitative behaviour of the solution.

summary on their history, applications and an outlook on their future possible use, we refer to [11]. In general, neural mass models can be derived as a mean-field limit from microscopic models [12] and involve just a few state variables such as average membrane potentials and average population firing rates.
In this article, we focus on the Jansen and Rit neural mass model (JR-NMM) [13], which has been introduced as a model in the context of human electroencephalography (EEG) rhythms and visual evoked potentials [14]. It dates back to the work of Lopes da Silva and Van Rotterdam [3,5,15]. The JR-NMM is a biologically motivated convolution-based model of a neuronal population involving two subpopulations, i.e. excitatory and inhibitory interneurons forming feedback loops, which can describe background activity, alpha activity, sporadic and also rhythmic epileptic activity.
The original JR-NMM is formulated as a set of three coupled second-order nonlinear ordinary differential equations (ODEs), i.e. these constitute a system of coupled nonlinear oscillators, often rewritten as the six-dimensional system of first-order equations. After introducing this system in Sect. 2, we rewrite the system in the format of classical mechanics, that is, as a damped Hamiltonian system with a nonlinear displacement. Furthermore, in most of the literature, the JR-NMM includes a term representing extrinsic input or background noise, which essentially is done by declaring that input function to be a stochastic process. Mathematically this implies that the solution process of the ODE system then also is a stochastic process inheriting the analytical properties of the input process and requiring some framework of stochastic analysis for its mathematical treatment. In Sect. 3 we discuss options for such a framework and in this article we choose to formulate a stochastic JR-NMM as a stochastic differential equation (SDE) with additive noise, in particular a stochastic damped Hamiltonian system with a nonlinear term. Systems of SDEs of this or similar form are well studied in the molecular dynamics literature, where they are often called Langevin equations. 1 In this article we provide a range of results employing various techniques available in the framework of stochastic analysis developed for SDEs: In Sect. 4 we establish basic properties of the SDE such as moment bounds and bounds on the path behaviour. Section 5 augments existing analysis of the dynamics of the deterministic JR-NMM, in particular we consider stochastic versions of equilibrium solutions, i.e. invariant measures, as well as the long-time behaviour of solutions of the SDE with respect to this invariant measure. These results may be interpreted as starting points for studies of phenomenological stochastic bifurcations or noise-induced transitions. Finally, in Sect. 6, we present efficient numerical methods designed for stochastic Hamiltonian problems and show that these numerical methods, which represent discrete stochastic systems for any fixed step-size, respect the properties previously established for the SDE system (subject to mild conditions on the step-size). Thus the resulting numerical methods will not only be quite efficient for future computational studies with the stochastic JR-NMM, they also provide reliable computational results.

Description of the Original Jansen and Rit Neural Mass Model
A detailed summary of the model derivation from the neuroscientific point of view can be found in [18][19][20]. The main neural population, the excitatory and inhibitory interneurons, are in each case described by both a second-order ordinary differential operator, which transforms the mean incoming firing rate into the mean membrane potential, and a nonlinear function, which transforms the mean membrane potential into the mean output firing rate. For t ∈ [0, T ] with T ∈ R + , the JR-NMM proposed in [13] consists of three coupled nonlinear ODEs of second order which can be written as the six-dimensional first-order ODE systeṁ x 0 (t) = x 3 (t), with initial value (x 0 (0), . . . , x 5 (0)) T = x 0 ∈ R 6 . Here, x i for i ∈ {0, 1, 2} describe the mean postsynaptic potentials of distinct neuronal populations. The output signal y(t) := x 1 (t) − x 2 (t) describes the average membrane potential of the main family, i.e. the principal neurons of the JR-NMM (see [18,19,21]). The function p : [0, T ] → R describes the external input which may originate both from external sources or the activity of neighbouring neural populations. We will discuss the mathematical modelling of p in more detail at the end of this section. The sigmoid function Sigm : R → [0, ν max ], ν max > 0 (as suggested in [4]) is given by and works as a gain function transforming the average membrane potential of a neural population into an average firing rate (see [22,23]). The constant ν max denotes the maximum firing rate of the neural population, v 0 ∈ R is the value for which 50% of the maximum firing rate is attained and r > 0 determines the slope of the sigmoid function at v 0 .
System (2) includes 11 parameters A, B, a, b, C 1 , C 2 , C 3 , C 4 , ν max , r, v 0 and typical values for these parameters, taken from [13,19], are given in Table 1. The parameters A, B, a and b model basic features of postsynaptic potentials. In particular, A and B denote the excitatory and inhibitory synaptic gain, respectively, and  [13]). The solution behaviour of System (2) depends sensitively on the values of the parameters (we refer to the bifurcation analyses in [18,19,24]). Especially, changes in the connectivity constants C i can result in drastic changes of the solution path. Subsequently, we will employ the Hamiltonian formulation of classical mechanics to study coupled oscillators such as System (1) or (2). Let Q := (x 0 , x 1 , x 2 ) T and P := (x 3 , x 4 , x 5 ) T denote three-dimensional vectors, then System (2) can be written as a damped Hamiltonian system with nonlinear displacement, In this formulation, the system consists of a Hamiltonian part with Hamiltonian function H : a damping part with damping matrix Γ = diag[a, a, b] ∈ R 3×3 , and a nonlinear part given by the function G : The Hamiltonian H (Q, P ) may be interpreted as the total energy of an electrical RCL parallel resonant circuit; see [22]. In particular, H (Q, P ) is proportional to the sum of the inductive and capacitive energy of the neuronal population, respectively. If the input p(t) is a bounded deterministic function, the solution curve and the total energy H (Q, P ) are bounded (this is an immediate result of Theorem 4.2 in Sect. 4) and the time change in the total energy is given by In the original paper by Jansen and Rit [13], the external input p(t) has been used to represent spontaneous background noise as well as peak-like functions for generating evoked potentials. In the latter case the extrinsic input has been modelled as a deterministic periodic function (see also [25]) and with this type of input, the solution of the System (1) (or (2) or (3)) remains a deterministic function and the mathematical background to treat it is deterministic analysis. In the former case, i.e. when p(t) represents spontaneous background noise and is modelled as a stochastic process, the mathematical background immediately changes to be stochastic analysis. In particular, the solution of the Systems (1), (2) or (3) becomes a stochastic process and it inherits the mathematical properties of the input process p(t). Within stochastic analysis, (1), (2) or (3) may be interpreted in different frameworks, with consequences depending on the specific choices of p(t).
(i) Random Ordinary Differential Equation (RODE) framework: RODEs are pathwise ODEs involving a stochastic process in their right-hand side, i.e. for a sufficiently smooth function f : R m × R d → R d and an m-dimensional stochastic process ξ(t), a d-dimensional system of RODEs is given bẏ with an appropriate initial value. One may then choose the stochastic input process for example as a Wiener process or a coloured noise process, these processes exist in the classical sense and have almost surely continuous paths. In this framework standard deterministic analysis for e.g. guaranteeing existence and uniqueness of solutions can be applied pathwise; see for example [26], Chap. 1. However, the solution of this equation inherits the smoothness properties of the driving stochastic process ξ(t), independent of the smoothness of the function f . Analysis of properties and dynamics of solutions of RODEs may be performed pathwise by standard analysis techniques, bearing in mind that the low smoothness of the solutions limits the applicability of many classical results, such as Taylor's theorem. We further refer to [27] for relevant results concerning random dynamical systems. Another consequence concerns the numerical treatment: as the order of convergence of classical numerical schemes for ODEs is determined by the smoothness of the solution of that ODE, when such schemes are applied pathwise to RODEs, they usually converge with a lower order than their expected one. In particular, they converge with order at most 1/2 when the input process is chosen as the Wiener process or a coloured noise process, as their paths are only Hölder continuous of order less than 1/2. We refer to [28] and its references for further information on numerical methods specifically designed for RODEs.
(ii) Stochastic Differential Equation framework: If one were to choose the stochastic input process in an RODE as above as a Gaussian white noise process, one would need to deal with the fact that such a process exists only in the sense of generalised stochastic processes; see [29], Sect. 3.2, or [30], Appendix I. In particular, Gaussian white noise is usually interpreted as the (generalised) derivative of the Wiener process, which itself is almost surely nowhere differentiable in the classical sense. It is much more convenient to work in the classical stochastic analysis framework designed to deal with differential equations 'subject to (white) noise' and interpret Systems (1), (2) or (3) as a stochastic differential equation; see also [29], Sect. 4.1. A considerable amount of results concerning analysis, dynamics, numerics, statistics, etc. of SDEs is available and for stochastic numerics we refer for example to [31], which also treats SDEs driven by coloured noise.

Jansen and Rit Neural Mass Model as a Damped Stochastic Hamiltonian System with Nonlinear Displacement
Let (Ω, F, P) be a complete probability space together with the filtration {F t } t∈[0,T ] which is right-continuous and complete. We extend the model of System (2) by allowing perturbation terms such as p(t) not only in x 1 (t) but in both x 0 (t) and x 2 (t) as well. For this purpose, we define the functions μ i : [0, T ] → R and σ i : [0, T ] → R + for i ∈ {3, 4, 5}. The functions μ i will be used for representing deterministic input whereas σ i will be used for scaling the stochastic components. In an analogous way to the exposition concerning stochastic oscillators in [32], Chap. 8, or in [33], Chap. 14.2, we symbolically introduce Gaussian white noiseẆ i representing the stochastic input into Eq. (1) as follows: with deterministic initial value (X 0 (0), . . . , X 5 (0)) T = X 0 ∈ R 6 . Here, the processes W i (t) for i ∈ {3, 4, 5} are independent, F t -adapted Wiener processes on (Ω, F, P). Note that as the system above is an additive noise system the Itô and Stratonovich interpretations of that SDE system coincide. As for the deterministic case in Sect. 2, we can use the (Q, P )-notation of classical mechanics with initial values and nonlinear displacement As before, we define the output signal as Systems of the type (5), typically called Langevin equations, have received considerable attention in the literature of molecular dynamics (see [16] for an overview). In particular, the long-time properties of such systems have been intensively studied in [34][35][36]. We employ these techniques in Sect. 5 to study the long-time behaviour of System (5).
We briefly discuss the existence of a solution of Eq. (5). As the sigmoid function Sigm is globally Lipschitz continuous, the existence and pathwise uniqueness of an F t -adapted solution is a standard result; see e.g. in [29], Theorem 6.2.2. In particular, Q is continuously differentiable. In the current context, it makes sense to assume that the functions μ i and σ i are smooth and bounded which we will do in the following.
We simulate the solution of Eq. (5) with the splitting integrator (24) proposed in Sect. 6 and illustrate the output signal in Fig. 1. The coefficients and the noise components are chosen in such a way that the simulation results of [14] for varying connectivity constants C can be reproduced. The numerical values for the parameters are given in Table 1. For the deterministic part of the external inputs we set μ 3 = μ 5 = 0 and μ 4 = 220, for the diffusion components we set σ 3 = σ 5 = 10 and σ 4 = 1,000 such that 'weak noise' is acting on the components X 3 and X 5 ; X 4 receives a stronger noise input. As in the original paper [14] we see (noisy) α-rhythm-like behaviour as well as spiking behaviour for varying connectivity constants C. In Fig. 2 we provide an illustration of changes in the system behaviour induced by including noise with plots of the phase portrait of the output signal for the case C = 135 and C = 270. The top two pictures show simulations of y of System (2), i.e. without noise, where the solution curves quickly converge towards a limit cycle. The bottom two pictures show a path of Y of System (5) and in particular for C = 135, the behaviour of the path is markedly different from the deterministic case.

Moment Bounds and Path Behaviour
We have already mentioned in Sect. 2 that the solution paths of Eq. (1) take values in a bounded set. It is natural to ask in which sense this behaviour transfers to the stochastic setting. We answer this question via a twofold strategy. On the one hand we will study the time evolution of the moments of the solution, which describes the average behaviour of all solution paths. On the other hand we will study the behaviour on the level of single paths and estimate the probability that a specific path exceeds a given threshold. Before we study these qualitative properties of Eq. (5) we provide a convolution-based representation for the Q-component of Eq. (5) which simplifies the corresponding calculations considerably.

Convolution-Based Representation of the JR-NMM
In this section we rewrite Eq. (5) using X = (Q, P ) T as where and

Fig. 2 Phase portraits
Here, we denote by O 3 , I 3 ∈ R 3×3 the zero and identity matrix, respectively. Moreover, we define 0 3 := (0, 0, 0) T and 1 3 : Note that M is a block matrix with diagonal submatrices. Hence, we can calculate an explicit expression for the matrix exponential, Obviously, the matrix exponential fulfils e Mt M = Me Mt . This allows one to represent solutions of Eq. (5) via the following convolution-based formula.
Proof Applying Itô's formula ( [29], Theorem 5.3.8) to the JR-NMM in Eq. (6) and using the commutativity of M and e Mt we obtain which reads in integral form Since the nonlinear part N only depends on Q the equation for Q is given by Eq. (8).
Remark 1 From the latter proof we also get a convolution-based representation for P , however, this formula depends on Q. Indeed, for t ∈ [0, T ], Remark 2 System (1) has originally been deduced by using convolutions of impulse response functions with functions of the output from other subpopulations within the neural mass (see [13,18,37,38]). These response functions have the same shape as the kernel κ in Eq. (8), which thus can be interpreted as the stochastic version of this kernel representation.

Moment Bounds
Using representation (8) we provide bounds on the first and second moment of Q; analogous results can be derived for P . In the remainder of this section we will perform various componentwise calculations and estimations. For ease and consistency of notation we define the following: Let x, y ∈ R n , then x ≤ y denotes x i ≤ y i for all 1 ≤ i ≤ n. Furthermore, for U, V ∈ R n×k we denote the Hadamard product of U and V as U V , which is defined as the elementwise product (see [39,40]) such that each element of the n × k matrix U V is given as In addition, we define U 2 := U U and U (1/2) as the elementwise root with (U (1/2) Proof We write .
Note that E[u(t)] = u(t) and that the expectation of an Itô integral is zero, i.e.
ds and integration of κ yield the desired estimates.
Obviously, the bounds provided by Theorem 4.2 also hold for the deterministic equation (3) which justifies our claim in the introduction.

Remark 3
The upper bound depends linearly on μ i,max and the connectivity constants C i whereas u(t) decays exponentially fast towards 0 3 . In particular, Similar calculations can be done for the second moments of the components of Q(t). We obtain the following result.

Theorem 4.3 Let the assumptions of Theorem 4.2 hold and assume
In particular,  Proof From the proof of Theorem 4.2 it immediately follows that The last equality can be shown by applying the Itô isometry. For notational simplicity we omit the dependence on t in the following. By using the Cauchy-Schwarz inequality we bound Applying the bounds (9a)-(9b), the desired result follows from In Fig. 3 we employ Monte Carlo simulation to estimate E[X 1 (t)] for varying coupling parameter C. The results for the second moment E[X 2 1 (t)] are essentially the same; see Fig. 4. Similar results can be obtained for X 2 and X 3 . The numerical approximations of the expectation (blue curves) stay well within the theoretical bounds (red curves), whereas single trajectories (purple curves) of course may exceed the bounds of the average. Note that, for C = 68, 135 and 675, the approximations of E[X 1 (t)] rapidly converge towards fixed values for growing t. The same behaviour can be observed for C = 270 on larger time scales. We will give a theoretical explanation for this phenomenon in Sect. 5 when we study the long-time behaviour of Eq. (5).

Pathwise Bounds
Theorem 4.2 states that on average the solution of Eq. (5) stays in some bounded set. However, the theorem gives no information for single solution paths, which can in principle attain arbitrarily large values with positive probability; see Lemma A.2 in the Appendix. In this section we want to quantify the probability of such large values. The following theorem provides an upper bound on the escape probability of the components of Q, i.e. the probability that for i ∈ {0, 1, 2} the solution X i is larger than a given threshold x th i ∈ R + .

Theorem 4.4 Let the assumptions of Theorem
Then the probability that the components X i (t) for i ∈ {0, 1, 2} exceed the given thresholds x th i ∈ R + is bounded by Proof From Eq. (8), the bound on G and again integrating κ, we immediately see that each path of Q is bounded by the stochastic process Y defined by The process Y (t) is Gaussian distributed with mean u(t) + Γ −2 (I 3 − ϑ(t))C G and covariance matrix 1 , which were calculated in Theorems 4.2 and 4.3. Then, for each t ∈ [0, T ], Remark 4 Theorem 4.4 can, for example, be used for calibration of the noise parameters in Σ. Let Σ = diag[σ 3 , σ 4 , σ 5 ]. Suppose we want to choose σ 3 such that the corresponding component X 0 (t) stays below some given threshold x th 0 with high probability α. Then a suitable choice of σ 3 is implicitly given by In Fig. 5 we illustrate numerical trajectories of Eq. (5) and the corresponding bounds for varying levels of α, i.e. for a given time point t, the probability of X 1 (t) to be below the red, purple and black curve is at least 60, 90 and 99 percent, respectively.

Long-Time Behaviour and Stationary Solutions
A further property of interest of an SDE concerns the asymptotic behaviour of solution trajectories. The classical approach in ODE theory for analysing the long-time behaviour of ODE systems is to study the stability of equilibrium solutions and limit cycles. Even in the simplest case of constant input p(t) = p ∈ R, the deterministic equation (3) can possess several equilibrium solutions (both stable and unstable) as well as limit cycles with typically nontrivial basins of attraction (again we refer to the bifurcation analyses in [18,19,24]). Thus, the choice of the initial value can have large impact on the long-time behaviour of the solution curves of Eq. (3). From a practical point of view, this fact may be problematic, as it is not at all obvious how to estimate the initial value of the P -component.
In this section, we will analyse a stochastic counterpart of equilibrium solutions, more precisely invariant measures, and study the long-time asymptotics of Eq. (5). Our main tool is the theory of ergodic Markov processes, for convenience of the reader we recapitulate the basic definitions.
Let (X(t)) t∈[0,T ] be the solution process of Eq. (5). Standard stochastic analysis shows that X(t) is a Markov process and the corresponding transition probability P t (A, x), i.e. the probability that X(t) reaches a Borel-set A ⊂ R 6 at time t, when it started in the point x ∈ R 6 at time t = 0, is given by We use the definition provided in [41], Chap. 2, to characterise invariant measures. For simplicity, let η be a probability measure on (R 6 , B(R 6 )) (in general η can degenerate on some lower dimensional space). The measure η is called invariant if In particular, if we set the initial value (Q 0 , P 0 ) to be a random vector with distribution η, then there exists a Markov process X(t) which satisfies Eq. (5) and the distribution of X(t) is η for all t ∈ [0, T ]. In this sense, the concept of invariant measures can be seen as a natural extension of stationary solutions of deterministic ODEs.
We are interested in the following questions: (i) Does Eq. (5) have an invariant measure? (ii) Is the invariant measure unique? (iii) Do quantities of the type E[h(X(t))] converge towards stationary values for a suitable class of functions h : R 6 → R and any initial value (Q 0 , P 0 )?
We answer these questions in two steps: In the first step we will show that Eq. (5) fulfills a Lyapunov condition ensuring the existence of (possibly many) invariant measures. The questions (ii) and (iii) will be answered positively via the concept of geometric ergodicity. In classical mechanics and molecular dynamics, equations with a similar structure as Eq. (5), termed Langevin equations in the corresponding literature (see e.g. [16,31,36]), are well studied and we make use of relevant results concerning the long-time behaviour. In particular, we follow the presentation in [34]. As in the bifurcation analyses for Eq. (3) mentioned before, we assume that the deterministic parts of the perturbation as well as the diffusion matrix are constant, i.e.
Thus, G does not depend on t and we simply write G(Q).

Existence of Invariant Measures and Geometric Ergodicity
The existence of invariant measures for Eq. (5) can be established by finding a suitable Lyapunov function. Heuristically speaking, the existence of a Lyapunov function ensures both that the solution trajectories stay in some bounded domain (except for some rare excursions) and in the case of excursions, the trajectories return to the bounded set. The following lemma shows that a perturbed version of the Hamiltonian H in Eq. (5) can act as a Lyapunov function (see [42]).
Lemma 5.1 Assume a, b > 0 and let for n ∈ N V n (Q, P ) Then V n is a Lyapunov function for Eq. (5) in the following sense: where L denotes the generator of Eq. (5), Here, ∇ Q and ∇ P denote the gradient with respect to the Q and P component, respectively.
Proof For a, b > 0, Property (i) is satisfied by construction and we only have to prove (ii). In a first step we set n = 1 and analyse the action of L on V 1 . Note that V 1 is quadratic, therefore the second derivatives in L result in constants. Since P T Γ 2 Q = Q T Γ 2 P , we obtain The first two terms in Eq. (11) are quadratic and non-positive. As V 1 is quadratic and non-negative, there always exist constants α < 0 and β > 0 such that the first three terms in Eq. (11) can be bounded in the following way: Furthermore, Young's inequality implies where > 0 can be chosen arbitrarily small. Thus, there exist α < α < 0 and β > 0 sufficiently large such that In a second step, we will extend this procedure to V n for n ∈ N. For brevity we use the notation L 1 is a first-order differential operator and the action of L 1 on V n can be bounded in the following way by using Eq. (12): Applying L 2 to V n leads to where Γ i,j denotes the entry of Γ at the intersection of the ith row and j th column. Note that This implies that there exist c, c n > 0 such that and thus L 2 V n ≤ c n V n−1 .
As a consequence, there exist α n < 0 (possibly close to zero) and β n > 0 such that LV n = L 1 V n + L 2 V n ≤ αnV n + (nβ + c n )V n−1 ≤ α n V n + β n , and Property (ii) follows.
Lemma 5.1 has two immediate consequences. First, applying Itô's formula on V n we obtain the following bounds (see [34]).

Corollary 5.2 Let Assumption (10) hold and s, t ∈ [0, T ] with t ≥ s. Then
In particular, Second, the existence of a Lyapunov function ensures the existence of an invariant measure (see e.g. [43], Corollary 1.11).

Corollary 5.3 Let Assumption (10) hold and let X(t) denote the solution of Eq. (5). Then there exists an invariant measure η of X(t).
Lemma 5.1 does not give any information on the uniqueness of the invariant measure. If we further assume that the three Wiener processes W i act on all components of P , i.e. σ i > 0 for i ∈ {3, 4, 5}, we can establish the uniqueness of the invariant measure. Furthermore, the Markov process X fulfills the property of geometric ergodicity in the sense of [34]. We give a modification of the result in [34], Theorem 3.2, including the nonlinear function G. Then for any n ∈ N and any initial value X(0) = (Q 0 , P 0 ) there exist positive constants C n , λ n such that Proof The proof is the same as in [34]. The Lyapunov condition has been established in Lemma 5.1, the corresponding results for the necessary smoothness of the transition probabilities and the irreducibility of the Markov process are given in the Appendix in Lemma A.2 and A.1. Both lemmas rely on the assumption that σ i > 0 for i ∈ {3, 4, 5}.
Theorem 5.4 has two implications for the numerical simulation of Eq. (5). First, the actual choice of the initial value is insignificant as the impact of the initial value on the distribution of X(t) diminishes exponentially fast for growing t and an appropriate approximation of the system behaviour should be obtained with any choice of (Q 0 , P 0 ) provided that the system is simulated on a large enough time horizon. Second, due to the correspondence of the time averages and "space averages" of ergodic systems (see [41], Theorem 3.2.4.), one can estimate quantities of the type E[h(X(t))] (for t sufficiently large) by computing the time average of a single path of X(t) on a large time horizon instead of using Monte Carlo estimation which requires one to compute a large number of paths of X(t). Of course, both aspects hold only true if the numerical method reproduces the geometric ergodicity of the original system (see Sect. 6).
The computation of the invariant measure for nonlinear systems is highly nontrivial. One possibility would be to solve the corresponding Fokker-Planck equation, which is a six-dimensional nonlinear PDE. A standard alternative is to use stochastic simulation techniques to approximate the marginal densities of η. Several possibilities have been proposed in the literature how to estimate the distribution of the solution of SDEs; see e.g. [44] for an approach based on Malliavin calculus and kernel density estimation (see [45], Chap. 2). We use the latter approach where we choose the kernel functions to be Gaussian. The numerical samples are calculated as a longtime simulation of a single path with the splitting integrator Eq. (24) proposed in Sect. 6. In Fig. 6 we compare approximations of the stationary probability density of the output signal Y for varying coupling parameter C. We observe the change from unimodal densities (for C = 68, 135) to multi-modal densities C = 270 and to the peak-like structure C = 675. This behaviour can be interpreted as a phenomenological stochastic bifurcation as discussed, for example, in [46] or a noise-induced transition (see [47]).

Numerical Simulation
In order to obtain an approximation of Eq. (5) which accurately reproduces the qualitative behaviour, it is highly important to construct numerical integrators which on the one hand fulfil the properties of Eq. (5) derived in Sects. 4 and 5, and on the other hand are computationally efficient such that large ensembles of trajectories can be calculated in reasonable time.
We want to emphasise that the difficulty does not lie in the construction of a meansquare convergent integrator for Eq. (5). In fact, as the coefficient functions of Eq. (5) are globally Lipschitz continuous, any standard integrator (e.g. the Euler-Maruyama method) converges in the mean-square sense. However, it has already been shown for linear stochastic oscillators that the Euler-Maruyama method does not preserve second moment properties of that system [48] and it is expected that this negative result extends to nonlinear stochastic oscillators as well. The splitting methods fulfil the following properties: (i) The methods preserve the moment bounds proposed in Theorems 4.2 and 4.3.
Furthermore, for Σ = O 3 , the numerical method preserves the bounds of the exact solution. (ii) The Markov process generated by the numerical method is geometrically ergodic and fulfils a Lyapunov condition under very mild step-size restrictions.

Splitting Integrators for the JR-NMM
For convenience of the reader we provide a brief introduction to splitting integrators. Further details can be found, for example, in the classical monograph [49], Chapter II, for the deterministic case and [35,50] for stochastic Langevin-type equations.
The main idea of splitting integrators is the following: Assume for simplicity we want to approximate a deterministic ODE systeṁ for which the function f : R n → R n can be written as Of course, there can be several possibilities to decompose f . The goal is to choose f [j ] in such a way that the subsystemṡ can be solved exactly. Let ϕ [j ] t (y 0 ) denote the exact flow of the Subsystem (13) with initial value y 0 . Then the following compositions of flows define integrators of deter-ministic order one (Lie-Trotter splitting) and two (Strang splitting): This strategy can be extended to the stochastic setting (see [35] and the references therein for splitting integrators in the field of molecular dynamics, [50] for quasisymplectic splitting integrators, [51] for variational integrators based on a splitting approach and [52] for splitting integrators in a Lie group setting). In particular, splitting integrators have been applied efficiently to Langevin equations with a similar structure as Eq. (5), see [35,50], thus we extend this approach here to Eq. (5). The main step in the construction of splitting integrators is to choose a suitable decomposition of the coefficient functions of Eq. (5). The right-hand side decomposes into three rather distinct parts: First, a damped, linear oscillatory part, second, a nonlinear and non-autonomous coupling part which does not depend on the Pcomponent, and third, a stochastic part which does only arise in the P -component. Therefore, with the nonlinear term given by It makes sense to split the linear and the nonlinear drift contributions, thus providing two options to incorporate the stochastic noise. This yields two different sets of subsystems and therefore two different sets of numerical methods. In the first case, the stochastic subsystem defines a general Ornstein-Uhlenbeck process and we denote the corresponding splitting integrator as Ornstein-Uhlenbeck integrator. In the second case, the stochastic subsystem defines a Wiener process with drift and we denote the splitting integrator as Wiener integrator.
In the following let 0 = t 0 < · · · < t N = T with N ∈ N be an equidistant partition of [0, T ] with step-size t.
For both subsystems we can easily derive explicit representations of the exact solutions which can be used directly for the numerical simulation. Subsystem (14a) is a six-dimensional Ornstein-Uhlenbeck process. Let X [1] (t i ) = (Q [1] (t i ), P [1] (t i )) T denote the solution of Eq. (14a) at time point t i for i ∈ {0, . . . , N − 1}, then the exact solution at time point t i+1 > t i can be represented as where e M t is defined in Eq. (7). X [1] is a Gaussian process with conditional expectation E X [1] (t i+1 )|F t i = e M t X [1] (t i ) and the conditional covariance matrix (see [29], Theorem 8.2.6) In particular, the integral term in Eq. (15) can be simulated exactly. Indeed, it is Gaussian distributed with mean zero and covariance matrix Cov(t i+1 ), which is for t ≥ t i given as the unique solution of the matrix-valued ODE In the special case of a constant diffusion matrix Σ(t) = Σ ∈ R 3×3 , the exact solution of Eq. (16) can be explicitly calculated for t ≥ 0 as .
In general, Eq. (16) has to be solved by numerical approximation, however, it only needs to be precomputed once for the step-size t. In either case, we obtain where ξ i ( t) are iid six-dimensional Gaussian random vectors with expectation E[ξ i ( t)] = 0 6 and covariance matrix Cov( t). Subsystem (14b) is a deterministic system and the solution can be obtained by integration with respect to time. As before let X [2] (t i ) = (Q [2] (t i ), P [2] (t i )) T denote the solution of Eq. (14b) at time point t i , then the exact solution at time point t i+1 > t i is given by Q [2] (t i+1 ) = Q [2] (t i ), P [2] (t i+1 ) = P [2] (t i ) + t i+1 t i G s, Q [2] (s) ds (18) = P [2] (t i ) + tG II Q [2] where we assume that the last integral can be calculated exactly. Now, let ϕ ou, [1] t and ϕ ou, [2] t denote the exact flows of Eq. (14a) and (14b) given via Eq. (17) and (18), respectively. Let x ∈ R 6 , then a one-step integrator is defined by the composition of the flows
(20b) Subsystem (20a) is a deterministic system. Let X [1] (t i ) = (Q [1] (t i ), P [1] (t i )) T denote the solution of Eq. (20a) at time point t i , then the exact solution at time point t i+1 is given by The solution of subsystem (20b) is-by definition-given by Q [2] (t i+1 ) = Q [2] (t i ), P [2] (t i+1 ) = P [2] (t i ) + tG II Q [2] (t i ) where the last term can be simulated exactly as a three-dimensional Gaussian random vector with zero mean and covariance matrix In the case of a constant diffusion matrix Σ(t) = Σ ∈ R 3×3 , the covariance matrix is given as tΣ 2 .

Order of Convergence and Strang Splitting
As the noise in Eq. (5) is additive, standard integrators such as the Euler-Maruyama method converge with mean-square order one. The same holds true for the splitting integrators constructed above.
Theorem 6.1 Let 0 = t 0 < · · · < t N = T be an equidistant partition of [0, T ] with step-size t, and let X ou (t i ) and X w (t i ) denote the numerical solutions defined by Eq. (19) and (23) at time point t i starting at initial value (Q 0 , P 0 ) ∈ R 6 . Then the one-step methods defined in Eq. (19) and (23) are of mean-square order one, i.e. there exist constants C 1 , C 2 > 0 such that for sufficiently small t the inequalities hold for all time points t i .
Proof The result can be proved in the same way as in [50], Lemma 2.1.. For deterministic ODE systems the convergence order of splitting methods can be increased by using composition based on fractional steps (see e.g. [49], Chapter II). We will illustrate this approach for the method based on the subsystems (20a) and (20b), the other method can be treated analogously. Using a Strang splitting we can compose the integrator For Σ = O 3 , Eq. (24) is a second-order method for the deterministic Eq. (2), however, the mean-square order of Eq. (24) is still one. To increase the mean-square order one has to include higher-order stochastic integrals to reproduce the interactions of the Subsystems (20a) and (20b) (see [50], Sect. 2, for details). Note that even without including the higher-order stochastic integrals the Strang splitting integrator given by Eq. (24) performs considerably better in our numerical simulations than the Lie-Trotter methods, thus we recommend to use this type of integrator. We have not yet studied the reason for this improved performance, but expect that the symmetry of the Strang splitting or the weak noise acting on the system may contribute. We illustrate the mean-square convergence of our proposed methods in Fig. 7 and compare the Strang splitting Eq. (24) with the standard Euler-Maruyama method for the coupling parameters C = 68 and C = 135. As expected, both methods have meansquare order one, however, for C = 135 the mean-square error (MSE) of the splitting method is significantly smaller than the MSE of the Euler-Maruyama method. Obviously, one might use smaller step-sizes for the Euler-Maruyama method; however, this quickly becomes highly inefficient, e.g. for the JR-NMM for multiple populations  or when the Euler-Maruyama method is embedded in a continuous-time particle filter.
Figs. 8 and 9 also demonstrate the efficiency of the splitting scheme, as the correct (as not only observed from both methods with small step-sizes, but also based on our analysis of the method's properties) results can still be produced with much larger Fig. 9 Densities of the invariant measure of Y step-sizes than those required for the Euler-Maruyama method. The other important feature of the proposed method is its reliability. Figure 8 shows several plots of the phase portrait of one single path of the output Y , with the splitting method and the Euler-Maruyama method and different step-sizes. It can be observed that the phase portrait obtained with the latter method changes markedly with increasing step-size. These phase portraits have been computed for the coupling parameter C = 135, initial value X(0) = 0 6 , σ 3 = σ 5 = 1, σ 4 = 200 and t ∈ {10 −4 , 10 −3 , 2 · 10 −3 }. Figure 9 corresponds to the upper right plot in Fig. 6, which itself can be interpreted as a computational study of a phenomenological stochastic bifurcation for varying coupling parameter C. It shows the densities of the invariant measure of Y for C = 135, σ 3 = σ 5 = 10 and σ 4 = 10 3 and compares the Strang splitting scheme with the Euler-Maruyama method over the time-step-sizes t ∈ {10 −3 , 2 · 10 −3 , 5 · 10 −3 }. The Euler-Maruyama method with moderately small step-sizes would report a change from a unimodal to a bimodal density for the parameter C = 135, whereas the correct value of C for this change to happen should be much larger.

Moment Bounds and Geometric Ergodicity
The following two lemmas represent the properties presented in Sect. 4 for the numerical approximation schemes defined by Eq. (19) and Eq. (23). Let X ou = (Q ou , P ou ) and X w = (Q w , P w ) denote the numerical solutions defined by Eq. (19) and Eq. (23), respectively. We start with proving analogous bounds to those in Theorem 4.2 for the expected value of Q ou and Q w . It is well known already in the deterministic setting that the Euler scheme does not preserve such properties, see [49], Chap. 1, in the stochastic case negative results for the Euler-Maruyama method for (simple) stochastic oscillators have been observed in [48]. Note that the following two lemmas also hold when commuting the compositions in Eq. (19) and Eq. (23).
Proof We prove the result for the numerical method ψ w , ψ ou can be treated analogously. Bearing in mind the notation in Sect. 4, we obtain from Eq. (23) that and in particular its Q-component reads From the proof of Theorem 4.2 we obtain 0 3 ≤ E[G(t i−k , X w (t i−k ))] ≤ C G . Obviously, the lower bound of E[Q w (t i )] is fulfilled for any time step-size t. To prove the upper bound it remains to show that which is in each component smaller than Γ −2 for any time step-size t.
Remark 5 Another viewpoint of Eq. (23) is to apply the rectangle method (using the left boundary point of the integral) in order to approximate the convolution-based formula Eq. (8) in the form Moreover, Eq. (25) permits better insight into the distinction of the numerical schemes: The sum in Eq. (25) corresponds to the rectangle method in order to approximate the convolution integral E[v(t)] defined in the proof of Theorem 4.2, where the right boundary point is used in each approximation interval. Analogously, when commuting the composition in Eq. (23) one obtains the rectangle method evaluating the left boundary points. In the case of the Strang splitting scheme given by Eq. (24), the function κ is evaluated at the midpoints (t k + t k+1 )/2.

Remark 6
It can be shown analogously that the second moment E[(Q w (t i )) 2 ] (and also E[(Q ou (t i )) 2 ]) is bounded by The last point we discuss in this article is the geometric ergodicity of the discrete Markov processes X ou and X w defined by Eq. (19) and (23). In analogy to Sect. 5 we assume that Assumption (10) holds. Due to the global Lipschitz continuity of the coefficients of Eq. (5), one would expect that standard numerical methods such as the Euler-Maruyama method are again geometrically ergodic for small enough stepsizes t (see [34], Theorem 7.3). The advantage of our proposed splitting integrators is that we can directly prove a discrete analog of Lemma 5.1, i.e. a discrete Lyapunov condition for the same Lyapunov function under very mild restrictions on t. We formulate the result for the Wiener integrator, the Ornstein-Uhlenbeck integrator can be treated analogously. Lemma 6.3 Let 0 < t 0 · · · < t N = T be an equidistant partition of [0, T ] with step-size t < 1/(2 Γ L ∞ ) and let X w denote the numerical solutions defined by Eq. (23). Then the functional V (X) := V 1 (Q, P ) defined in Lemma 5.1 is a Lyapunov function for X w , i.e. there exist constants α ∈ (0, 1) and β ≥ 0 such that Proof For the sake of simplicity we set a = b, which implies e −Γ t x = e −at x for any x ∈ R 3 . Furthermore, we denote Q := Q(t i ), P := P (t i ) to shorten notation. The one-step approximations Q(t i+1 ) and P (t i+1 ) can be written as is a three-dimensional Gaussian vector independent of F t i . By elementary calculations and application of Young's inequality we obtain where > 0 is a parameter which can be freely chosen. Thus, one can find C 1 > 0 sufficiently large such that In the same spirit we can find C 2 > 0 such that with free parameters˜ ,ˆ > 0. Combining the bounds above we can find a suitable α ∈ (0, 1) if for any given t there exists a choice˜ * such that Note that andˆ can be chosen arbitrarily small, therefore the corresponding terms can be neglected. Now let t < 1/(2a), then Eq. (26a) and (26b) are fulfilled for which implies the result.
In analogy to Sect. 5, geometric ergodicity of the (discrete) Markov processes X w and X ou can be established by proving smoothness of the transition probabilities and irreducibility of the processes. Both properties can be proven in exactly the same way as in [34], Corollary 7.4, thus we only sketch the proof for X w : (i) Smoothness of the transition probability densities: Due to Assumption (10) the transition probability of two (or more) consecutive steps ψ w t • ψ w t of our integrator has a smooth density. (ii) Irreducibility: As in the time-continuous case in Sect. 5 we have to establish a reachability condition, i.e. the numerical method starting at x ∈ R 6 can reach any y ∈ R 6 after a fixed number of steps. For our splitting method, two consecutive steps are sufficient to reach any point y by suitably choosing the vectors ξ( t) such that In fact, Eq. (27) is a six-dimensional system of equations with six degrees of freedom (three Gaussian random variables for each step ψ w t ) which can always be solved under Assumption 10.
To summarise, the numerical approximations X ou and X w are geometrically ergodic with respect to a unique invariant measure η ou t and η w t under mild restrictions on the time-step-size t. Furthermore, as X ou and X w converge towards X in the meansquare sense, η ou t and η w t are convergent approximations of the original invariant measure η (see [53], Theorem 3.3, for details). Thus, our numerical approximations of the marginal densities in Sect. 5 (see Fig. 6) are supported by the theory.

Summary and Conclusions
We proposed a version of the original JR-NMM incorporating random input, as a stochastic Hamiltonian system with nonlinear displacement, and discussed a range of properties based on results available in the framework of stochastic analysis, in particular properties such as moment bounds and the existence of invariant measures. The latter represent a step towards analysing the dynamical properties of a stochastic formulation of the JR-NMM. Furthermore, we presented an efficient numerical scheme based on a splitting approach which preserves the qualitative behaviour of the solution of the system. We have also discussed the advantages of applying such a scheme designed according to the obtained features of the stochastic JR-NMM for future computational studies in contrast to applying other numerical methods such as the Euler-Maruyama scheme. By a suitable introduction of noise our results can be generalised to both the extension of the JR-NMM to multiple populations [37,[54][55][56][57] and the extension to multiple areas, e.g. the 2-column model in [13] or the multi-area neural mass model in [56].
As (∂f 0 )g i = Σ (:,i) −2Γ Σ (:,i) and (∂g i )f 0 = 0 6 , the statement directly follows from Assumption (10) and σ i > 0 for i ∈ {3, 4, 5}. Note that the nonlinear term N does not play any role in the computation as its Jacobian is only non-zero at the derivatives corresponding to the Q-component, which are multiplied with the first three components of the vectors g i . However, these three components are always zero.
Irreducibility can be established via a control type argument. For this purpose, let P t (A, x) denote the transition probabilities of the Markov process X.
where W : [0, T ] → R 3 is a continuously differentiable function. If then Property (28) holds as long as X(τ ) = X τ . Now, let Q(t) be a smooth curve such that The control W (t) is given via the second-order differential equation which has a unique solution as G is globally Lipschitz. Consequently, the point X τ is reachable for solutions of the controlled System (29). Condition (30) (and therefore irreducibility) is now an immediate consequence of the Stroock-Varadhan Support Theorem (see [59]).

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.