### 5.1 Background: The Polynomial Chaos Expansion

In this subsection, we briefly review the polynomial chaos (PC) expansion method, which underpins the coarse-graining of the single- and double-cluster state dynamics in the following subsection. Wiener’s polynomial chaos expansion method [25], which has been widely used in the context of uncertainty quantification, allows one to obtain useful solutions to certain stochastic dynamical systems [26]. Consider a system described by a set of stochastic ODEs

$$
\frac{d\mathbf{x}}{dt} = \mathbf{f}(\mathbf{x}, t; \boldsymbol{\omega}), \tag{4}
$$

where $\mathbf{x}$ is the *n*-dimensional model variable and $\boldsymbol{\omega}$ is the stochastic variable or parameter, an *m*-dimensional vector of prescribed i.i.d. random variables, each of which is drawn from the probability space (*Ω*, ℱ, *μ*). Here *Ω* is the sample space, ℱ the *σ*-field generated by subsets of *Ω*, and *μ* the probability measure defined on ℱ. More complicated cases, e.g., where the dynamics is described by PDEs, can be formulated as well; however, such cases are not relevant to the current study.

Given a prescribed i.i.d. random variable *ω*, this method suggests decomposing the solution in the Hilbert space spanned by appropriately chosen polynomials of the random variable:

$$
x(t, \omega) = \sum_{i=0}^{\infty} \alpha_i(t)\, \Phi_i(\omega), \tag{5}
$$

where $\Phi_i$ is the member function or the basis function in the Hilbert space, and $\alpha_i$ is called the *i*th order PC coefficient. Here, a one-dimensional relation is considered for simplicity; however, the concept itself can be readily extended to cases of higher dimension for the functionals and/or the random variables. The basis polynomial functions are orthonormal in the sense that

$$
\langle \Phi_i, \Phi_j \rangle \equiv \int_{\Omega} \Phi_i(\omega)\, \Phi_j^{*}(\omega)\, d\mu(\omega) = \delta_{ij}, \tag{6}
$$

where $\Phi_j^{*}$ is the complex conjugate of $\Phi_j$ and $\delta_{ij}$ is the Kronecker delta. From this orthonormality condition, $\alpha_i$ can be computed by

$$
\alpha_i(t) = \langle x(t, \omega), \Phi_i \rangle = \int_{\Omega} x(t, \omega)\, \Phi_i^{*}(\omega)\, d\mu(\omega). \tag{7}
$$

In practice, the above expansion is truncated at a certain order. Previous studies [35, 36] confirm that the orthonormal polynomials chosen from the Askey scheme for a given probability measure *μ* make the PC expansion converge exponentially with the truncation order, at a rate set by a constant *κ*. However, the number of PC coefficients may increase rapidly as the random variable dimension *m* becomes larger, posing a computational challenge.
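The projection that yields the PC coefficients can be illustrated numerically. The following is a minimal sketch, assuming a standard normal *ω* and an orthonormal basis built from probabilists' Hermite polynomials; the test function exp(*ω*), the helper names, and the quadrature size are hypothetical choices made only for illustration, not taken from the study.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def orthonormal_hermite(j, w):
    """Probabilists' Hermite polynomial He_j(w), scaled to unit norm under N(0, 1)."""
    coeffs = np.zeros(j + 1)
    coeffs[j] = 1.0
    return hermeval(w, coeffs) / math.sqrt(math.factorial(j))


def pc_coefficients(x_of_omega, order, n_quad=40):
    """Approximate alpha_i = <x, Phi_i> for i = 0..order by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_quad)
    weights = weights / math.sqrt(2.0 * math.pi)  # weights now sum to 1 (N(0, 1) measure)
    values = x_of_omega(nodes)
    return np.array([
        np.sum(weights * values * orthonormal_hermite(i, nodes))
        for i in range(order + 1)
    ])


# Hypothetical test function x(omega) = exp(omega), chosen because its
# coefficients are known in closed form: alpha_i = exp(1/2) / sqrt(i!).
alpha = pc_coefficients(np.exp, order=6)
exact = np.array([math.exp(0.5) / math.sqrt(math.factorial(i)) for i in range(7)])

for i, (a, e) in enumerate(zip(alpha, exact)):
    print(f"alpha_{i}: quadrature = {a:.8f}, closed form = {e:.8f}")
```

For this smooth test function the coefficients shrink factorially with the order, which is the kind of rapid decay that makes truncation at a low order practical.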

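For low-dimensional random dynamical systems, where the PC expansion converges quickly, one can substitute the expansion of Eq. (5), truncated at order $l$, into Eq. (4):

$$
\frac{d}{dt} \sum_{i=0}^{l} \alpha_i(t)\, \Phi_i(\omega) = f\!\left( \sum_{i=0}^{l} \alpha_i(t)\, \Phi_i(\omega),\, t;\, \omega \right). \tag{8}
$$

Taking the Galerkin projection of both sides onto each basis function $\Phi_j$, the following weak form [26, 36] is obtained:

$$
\frac{d\alpha_j}{dt} = \left\langle f\!\left( \sum_{i=0}^{l} \alpha_i\, \Phi_i,\, t;\, \omega \right),\, \Phi_j \right\rangle, \qquad j = 0, 1, \ldots, l, \tag{9}
$$

consisting of a set of coupled ODEs for the PC coefficients $\alpha_j$, which provide an alternative description of the system dynamics to the original model, once such a description is confirmed to exist.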
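To make the Galerkin weak form concrete, here is a small sketch for a scalar linear test problem, dx/dt = −(a + s·ω)·x with x(0) = 1 and a standard normal ω. The test problem, parameter values, and function names are all assumptions introduced for illustration; they are not the neuronal model of this paper. For this problem the projection reduces to a linear system dα/dt = −Aα, and the zeroth coefficient can be checked against the exact mean exp(−a·t + s²t²/2).

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def orthonormal_hermite(j, w):
    """Probabilists' Hermite polynomial He_j(w), scaled to unit norm under N(0, 1)."""
    coeffs = np.zeros(j + 1)
    coeffs[j] = 1.0
    return hermeval(w, coeffs) / math.sqrt(math.factorial(j))


def galerkin_matrix(a, s, order, n_quad=40):
    """A[j, i] = <(a + s*omega) Phi_i, Phi_j> under the N(0, 1) measure."""
    nodes, weights = hermegauss(n_quad)
    weights = weights / math.sqrt(2.0 * math.pi)
    basis = np.array([orthonormal_hermite(j, nodes) for j in range(order + 1)])
    return (basis * weights * (a + s * nodes)) @ basis.T


def integrate_rk4(A, alpha0, t_end, n_steps=2000):
    """Integrate the coupled ODEs d(alpha)/dt = -A @ alpha with classical RK4."""
    h = t_end / n_steps
    alpha = alpha0.copy()
    for _ in range(n_steps):
        k1 = -A @ alpha
        k2 = -A @ (alpha + 0.5 * h * k1)
        k3 = -A @ (alpha + 0.5 * h * k2)
        k4 = -A @ (alpha + h * k3)
        alpha = alpha + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return alpha


a, s, order, t_end = 1.0, 0.3, 6, 1.0  # hypothetical parameter values
A = galerkin_matrix(a, s, order)
alpha0 = np.zeros(order + 1)
alpha0[0] = 1.0  # x(0) = 1 for every omega, so only the zeroth mode is excited
alpha = integrate_rk4(A, alpha0, t_end)

mean_exact = math.exp(-a * t_end + 0.5 * (s * t_end) ** 2)
print(f"alpha_0(t_end) = {alpha[0]:.8f}, exact mean = {mean_exact:.8f}")
```

Seven coupled ODEs here replace an ensemble of sampled trajectories; for a nonlinear right-hand side the projection integrals would have to be re-evaluated at every step rather than assembled once into a matrix.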

### 5.2 Coarse-Graining of the Clustering Dynamics

A computational dynamical analysis at the individual neuron level, such as the one presented in the previous section, is too complicated to perform for any realistic population size; a coarse-grained, population-level dynamical description and analysis become not only preferred, but necessary. Instead of keeping track of the state of every single neuron, we need to keep only a few collective descriptors of these states. Yet, since the neurons are not homogeneous in their synaptic dynamics, a few moments of the distribution of the states are not sufficient: we need to know not only the average and standard deviation of the states, but also *which neurons* (e.g., the low-*τ* or the high-*τ* ones) have low or high state values. In this joint distribution of neuron identities and neuron states, the *marginal* distribution of neuron states is not informative enough. That is why we turn to PC coefficients, which quantify the correlation between the neuron *identities* and the neuron *states*.

As was observed in the single-cluster formation in a few different networks of oscillators [24, 27–29], a similar type of correlation between the dynamical variables ($V_i$, $m_i$, $n_i$, $h_i$) of the *i*th oscillator and its heterogeneity parameter $\omega_i$ rapidly develops in each of the clusters separately during the initial transient (Fig. 2). The PC approach introduced to study the single-cluster states [24] thus needs to be extended for the coarse-grained description of the double- and multiple-cluster states. In order to examine the possibility of applying the PC expansion to the double-cluster states, we first need to identify the distribution characteristics of the random (i.e., heterogeneity) parameters for *each* cluster after the split.

When the network breaks up into two sub-networks, the original random parameters, the $\omega_i$’s, are divided into two sets in a number of seemingly random ways, depending on the initial conditions of the neurons. Repeated numerical simulations from random initial configurations reveal that the random parameters for each cluster consistently span more or less the *same range* as the original random parameters (Fig. 2), and that the breaking of the original random parameter set into two subsets occurs in various permutations of the neuron identities. We quantitatively examine the statistical characteristics of the divided random parameter subsets using the Kolmogorov–Smirnov (KS) and the Wilk–Shapiro (WS) statistical tests [37], which compare the properties of an observed sample with those of a known distribution. As an illustrative example, we consider the case of a *normal* heterogeneity distribution.

The KS test compares the quantiles, or the cumulative distribution functions (CDFs). Denoting the sample CDF and the CDF of a known distribution as $\hat{F}$ and *F*, respectively, let the largest difference between the two be

$$
D = \sup_{\omega} \left| \hat{F}(\omega) - F(\omega) \right|, \tag{10}
$$

where *ω* is an i.i.d. random variable. For test statistics such as $D$, the corresponding *p* value is the probability of obtaining a value of $D$ at least as extreme as that observed. For a given *p* value, the threshold value of $D$ can be computed. If $D$ exceeds the threshold, then the distribution of the sample is said to be *inconsistent* with the assumed distribution with significance level *p*. For $D$ below the threshold, all that can be said is that the distribution of the sample is *not inconsistent* with the assumed distribution characteristics at the corresponding significance level. The KS test is applied to the double-cluster states formed from a variety of initial configurations, in the case of the normal distribution of *ω*. When the population size exceeds hundreds of neurons, we find that the *p* value becomes very small, of the order of, or even less than, 0.01. The CDFs for varying network sizes are shown in Fig. 8. In addition, the WS test, which compares the ordered sample data with the expected values of the rank scores (the normal scores, or “rankits” [38]), leads to the same conclusion.
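The largest CDF difference used by the KS test can be computed directly from a sorted sample. The sketch below is illustrative only: the cluster membership is replaced by a random split of 200 standard normal draws (in the study it comes from the network simulations), the function names are hypothetical, and the threshold 1.36/√n is the common large-sample approximation of the 5% critical value rather than a value taken from the paper.

```python
import math

import numpy as np


def normal_cdf(w):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))


def ks_statistic(sample, cdf):
    """Largest absolute difference between the empirical CDF and a reference CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    ref = np.array([cdf(v) for v in x])
    # The empirical CDF jumps at each data point, so check both sides of each jump.
    d_plus = np.max(np.arange(1, n + 1) / n - ref)
    d_minus = np.max(ref - np.arange(0, n) / n)
    return float(max(d_plus, d_minus))


rng = np.random.default_rng(0)
omega = rng.standard_normal(200)           # heterogeneity parameters of all neurons
in_first = rng.permutation(200) < 100      # random stand-in for cluster membership
clusters = [omega[in_first], omega[~in_first]]

for idx, sub in enumerate(clusters, start=1):
    d = ks_statistic(sub, normal_cdf)
    threshold = 1.36 / math.sqrt(sub.size)  # approximate 5% critical value
    print(f"cluster {idx}: n = {sub.size}, D = {d:.4f}, threshold = {threshold:.4f}")
```

A value of D below the threshold means only that the subset is not inconsistent with the normal distribution at that level, in line with the interpretation given above.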

Based on the above statistical tests, we conclude that the heterogeneity distribution *within each of the two sub-populations* is not inconsistent with the heterogeneity distribution of the entire population; therefore, the same type of PC expansion used to coarse-grain the single-cluster state [24] can be applied to each of the double clusters *independently*, using the same basis functions and range. The PC expansion of the dynamical variables for each cluster reads

$$
y^{(i)}(\omega, t) = \sum_{j=0}^{l} \alpha_j^{(i)}(t)\, \Phi_j(\omega), \tag{11}
$$

where $y^{(i)}$ is the dynamical variable (e.g., *V*, *m*, *n*, or *h*) of the *i*th neuronal cluster, which is expanded up to the *l*th order in the basis polynomials $\Phi_j(\omega)$. Here $\Phi_j$ is the *j*th order basis polynomial, which is chosen according to the characteristics of the random variable *ω*, following the generalized PC framework of the Askey scheme [36]. For instance, for a uniformly distributed *ω*, Legendre polynomials are the appropriate choice that leads to fast convergence. Likewise, Hermite polynomials [$1$, $\omega$, $\omega^2 - 1$, $\omega^3 - 3\omega$, …] are appropriate for a normal random variable (Fig. 9), as in Wiener’s original work [25]. In the end, the states of 100 neurons in two clusters can be summarized in terms of a few PC coefficients per state variable per cluster; in our case 100 neurons (400 total variables, excluding the synaptic variable) will be seen to usefully reduce to three coefficients per state variable (and thus 12 variables) for each cluster, for a total of 24 variables, a reduction by a factor of 16.7; 200 or even 2000 total neurons would still reduce to 24 coarse variables!

In an infinitely large network, where the distribution of the random variable can be treated as continuous, the coefficients $\alpha_j^{(i)}$ can be determined by the orthogonality relationship among the basis functions (Eq. (7)). However, in a finite-size network, as is often the case in practice, or when a truncated distribution is considered (e.g., the constrained normal distribution used for the current system), the orthogonality no longer holds exactly, and regression, such as a least squares fit, determines the PC coefficients better. The PC coefficients $\alpha_j^{(i)}$ ($j = 0, \ldots, l$) for a particular variable *y* at a given time *t* are obtained by minimizing the residual of the *i*th cluster

$$
R^{(i)}(t) = \sum_{k=1}^{N_i} \left[ y_k^{(i)}(t) - \sum_{j=0}^{l} \alpha_j^{(i)}(t)\, \Phi_j(\omega_k) \right]^2, \tag{12}
$$

where $y_k^{(i)}$ is a variable associated with the *k*th neuron belonging to the *i*th cluster, which consists of $N_i$ neurons, and $\omega_k$ is that neuron’s heterogeneity parameter. The first two coefficients have the following geometrical meaning on the coarse-grained level: $\alpha_0^{(i)}$ is the average value, and $\alpha_1^{(i)}$ measures the level of linear spread of the variable among the neurons around the average value $\alpha_0^{(i)}$, as a consequence of the heterogeneity. For the case of the membrane potential (when *y* is *V*), $\alpha_0^{(i)}$ measures the average potential, and $\alpha_1^{(i)}$ roughly measures the instantaneous spread of the potential among the neurons in the *i*th cluster. The higher order PC coefficients are related to higher order moments of the spread of the individual neurons’ variables in each cluster.
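The least squares determination of the PC coefficients for one cluster at one time instant can be sketched as follows. This is an illustration under stated assumptions: the basis is taken to be the unnormalized probabilists' Hermite polynomials He_0, He_1, He_2, the data are synthetic (50 hypothetical neurons whose variable depends exactly quadratically on ω), and `fit_pc_coefficients` is a name introduced here, not code from the study.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander


def fit_pc_coefficients(omega, y, order):
    """Least-squares PC coefficients of y on He_0..He_order evaluated at omega."""
    design = hermevander(omega, order)  # column j holds He_j(omega_k)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


rng = np.random.default_rng(1)
omega_k = rng.standard_normal(50)  # heterogeneity parameters of one cluster

# Hypothetical coefficients: average value, linear spread, quadratic correction.
true_alpha = np.array([-60.0, 2.0, 0.3])
y_k = hermevander(omega_k, 2) @ true_alpha  # synthetic, noise-free neuron states

alpha_fit = fit_pc_coefficients(omega_k, y_k, order=2)
print("fitted coefficients:", alpha_fit)
print("plain sample mean of y_k:", y_k.mean())
```

With a finite sample the plain mean of the states generally differs slightly from the fitted zeroth coefficient, because the sample averages of the higher basis polynomials are not exactly zero; this is the loss of exact orthogonality that motivates regression over direct projection.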

The individual-level details, such as the exact composition of the neurons in each cluster, vary among different initial conditions and different draws of the random variable *ω*. However, the temporal trajectories of the PC coefficients remain robust over such microscopically distinguishable states, with a small level of statistical fluctuation. The PC expansion Eq. (11) converges rapidly; the magnitudes of the PC coefficients decrease rapidly with increasing order *j* (Fig. 9), as expected from the Askey scheme. Upon ensemble averaging, the PC description provides an appropriate statistical representation of the coarse-grained state.

So far, the random parameters in the divided clusters have been assumed to follow the same distribution as the original one for the entire network, based on the findings of the statistical tests. However, even if statistically unlikely, the previously mentioned extreme case of a “split-in-the-middle” state, where one cluster is formed by the neurons with $\omega < \omega_c$ ($\omega_c$ is a specific value around the middle value of 0) while the other cluster consists of the neurons with the remaining values of *ω*, does exist; an artificially prepared double-cluster state conforming to this grouping (whether $\omega_c = 0$ or not) is indeed found to be stable. There exist only a few limit cycle solutions of this type, and such states would be statistically insignificant in the coarse-grained description. Should such a split arise, the heterogeneity characteristics of each sub-network are clearly *inconsistent* with the full heterogeneity distribution. In this case, the heterogeneity sub-domain corresponding to each cluster should be treated separately to account for the split at $\omega = \omega_c$. A variant or extension of the multi-element PC method developed for stochastic differential equations [39] should be considered in this case.