Measuring Edge Importance: A Quantitative Analysis of the Stochastic Shielding Approximation for Random Processes on Graphs

Schmidt, Deena R; Thomas, Peter J

doi:10.1186/2190-8567-4-6

Research
Open access
Published: 17 April 2014

Deena R Schmidt^1,2 &
Peter J Thomas^1,2,3

The Journal of Mathematical Neuroscience volume 4, Article number: 6 (2014)

Abstract

Mathematical models of cellular physiological mechanisms often involve random walks on graphs representing transitions within networks of functional states. Schmandt and Galán recently introduced a novel stochastic shielding approximation as a fast, accurate method for generating approximate sample paths from a finite state Markov process in which only a subset of states are observable. For example, in ion-channel models, such as the Hodgkin–Huxley or other conductance-based neural models, a nerve cell has a population of ion channels whose states comprise the nodes of a graph, only some of which allow a transmembrane current to pass. The stochastic shielding approximation consists of neglecting fluctuations in the dynamics associated with edges in the graph not directly affecting the observable states. We consider the problem of finding the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the choice of a particular linear measurement functional on the graph. The partitioning of ion-channel states into conducting versus nonconducting states provides a case in point. In addition to establishing that Schmandt and Galán’s approximation is in fact optimal in a specific sense, we use recent results from random matrix theory to provide heuristic error estimates for the accuracy of the stochastic shielding approximation for an ensemble of random graphs. Moreover, we provide a novel quantitative measure of the contribution of individual transitions within the reaction graph to the accuracy of the approximate process.

1 Introduction

Many biological systems exhibit a combination of stochastic (chance, random, noisy) and deterministic dynamics [1–3]. For example, mathematical models involving stochastic processes arise in physiology [4–7], ecology [8–10], and genetic regulatory systems [11–13]. Such mathematical models often originate as intrinsically complex, high-dimensional systems with many degrees of freedom, and many sources of variability. This inherent complexity presents two related challenges. First, the essential dynamics of such systems may be hard to discern, and model reduction based on first principles for stochastic systems on complex networks is difficult. Second, in order to predict the behavior of such systems under normal, pathological or experimental conditions, one must usually resort to numerical simulation studies. Even with the tremendous progress in computing power over the last decades, intrinsically high-dimensional stochastic systems remain prohibitive to simulate exhaustively. Moreover, because of their dimensionality, the results of ensembles of stochastic simulations can be challenging to interpret. Therefore, there is demand for efficient dimension reduction methods, both to provide high quality approximate numerical solutions to the stochastic evolution equations arising in high-dimensional systems, and to provide an efficient conceptual framework for interpretation of the behavior of such systems.

In [14], Schmandt and Galán introduced a stochastic shielding approximation as a fast, accurate method for generating sample paths from a finite state Markov process in which only a subset of states are observable. For example, in ion-channel models, such as the Hodgkin–Huxley or other conductance-based neural models, a nerve cell has a population of ion channels whose configurational states comprise the nodes of a graph, only some of which allow a transmembrane current to pass. That is, each vertex of the ion-channel state graph is labeled with a scalar “conductance”, which is either zero (nonconducting) or one (conducting). In a population of ion channels, the flux of individual channels making the transition from a state i to a state j is a stochastic process with mean rate, and it has fluctuations around the mean rate that depend on the population at state i. The stochastic shielding approximation consists of neglecting fluctuations associated with edges in the graph not directly affecting the observable states. Specifically, the random fluxes along edges connecting identically labeled states are replaced by the mean fluxes along those edges, while the random fluxes associated with edges connecting distinguishable states are left unchanged. This approximation is an example of complexity reduction, in the sense of reducing a stochastic process generated by K independent processes to a process on a smaller sample space, i.e. generated by $K^{'} < K$ processes. Schmandt and Galán observe that, remarkably, the variance of the observable state (the membrane conductance) is almost identical in the reduced and the unreduced system.^{Footnote 1} While the approximate process does not faithfully reproduce all aspects of the full process, it reproduces those features relevant to the neurophysiologist as well as to the larger biological system in which it is embedded.

Here we consider the problem of finding the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the choice of a particular linear measurement functional on the graph. The partitioning of ion-channel states into conducting versus nonconducting states provides a case in point. In this paper we establish that Schmandt and Galán’s approximation is in fact optimal in a specific sense. We derive a quantitative measure of the contributions of individual edges in the graph to the accuracy of the approximation, relative to the chosen measurement functional. This approach allows quantitative comparison of edge importance, and sheds light on the parametric dependence of relative edge importance, for instance in a voltage-gated ion channel. In addition, we provide heuristic error estimates for the accuracy of the stochastic shielding approximation for an ensemble of symmetric random graphs.

Motivated by [14], we consider a multidimensional Ornstein–Uhlenbeck process on a graph $G = (V, E)$ with n nodes and m edges (reactions), and a linear measurement functional $M \in R^{n}$ . We show that the stochastic shielding approximation is the most accurate dimension reduction possible among those neglecting fluctuations in the same number of underlying processes. Neglecting a set of reactions in the full stochastic process X creates an approximate process $\tilde{X}$ which matches the behavior of the full process in the mean but deviates from the full process in the fluctuations.

Extending this idea to an ensemble of symmetric directed graphs $G = (V, E)$ , we establish two main results. Lemma 1, our first main result, allows us to find the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the measurement M. Neglecting the fluctuations associated with a subset $E^{'}$ of the edge set $E$ defines a new process $\tilde{X} (t)$ that deviates from the full process $X (t)$ by an amount that we call the deficiency, $U (t) = \tilde{X} (t) - X (t)$ . The observed error, given M, is then $M^{⊺} U$ ; its mean is zero by construction, and its variance is $R = E [{(M^{⊺} U)}^{2}]$ . In Lemma 1 we provide an exact formula for the contribution of the k th edge to this error. This formula, which arises from a spectral decomposition of the graph Laplacian associated with the full process, gives an explicit criterion for choosing the k most important edges in the graph, for any $0 < k < m$ .

Our second main result, Theorem 2, applies this criterion to networks generated from a broad class of random graph ensembles with a randomly chosen binary measurement vector M. We show that the importance measures of individual edges cluster tightly around one of two values. For moderately large graphs, these clusters correspond with very high accuracy to Schmandt and Galán’s stochastic shielding heuristic; an extremely accurate, reduced complexity approximation is obtained by neglecting fluctuations associated with edges connecting states that are indistinguishable under the measurement M. We illustrate this result with a sample from the Erdös–Rényi random graph ensemble in Sect. 3.3.

The analysis of Schmandt and Galán focused on an accurate, efficient approximation of Markov processes arising from ion-channel models. In Sect. 4 we apply our analysis to processes on two graphs arising from the classical Hodgkin–Huxley system of ion channels: the 5-state model for the voltage-gated potassium channel, and the 8-state model for the voltage-gated sodium channel. In a more general setting, the transition rates connecting adjacent states in these models are voltage-dependent. Here we restrict attention to the stationary case, corresponding biologically to the behavior of the channels under “voltage clamped” conditions. For both the voltage-gated potassium and voltage-gated sodium channel state graphs we show that our ranking reproduces the Schmandt–Galán stochastic shielding heuristic over all physiologically relevant voltages. This example also demonstrates that our results apply to graphs with non-symmetric adjacency matrices, as well as to the symmetric case.

In Sect. 5 we discuss possible extensions of our results to examples including signal transduction networks and calcium-induced calcium release models, as well as systems with graded rather than binary measurement functionals.

2 Model

2.1 Connection to the Population Process

We develop our results in the context of stationary Ornstein–Uhlenbeck processes. In contrast, Schmandt and Galán [14] introduced stochastic shielding in the broader context of density dependent random walks on a graph from which our OU process arises as a large population approximation. To set the stage before moving to the OU process framework, we briefly describe a population process on a graph of the type considered by Schmandt and Galán. In particular, we consider a stationary stochastic process on a directed graph $G = (V, E)$ where $| V | = n$ and $| E | = m$ , the number of nodes and edges in the graph, respectively. Each directed edge corresponds to one reaction in the system. The k th edge $i j (k) = (i (k), j (k)) \in E$ is defined to start at node $i (k)$ and end at node $j (k)$ , so that the k th reaction effects a transition from state i to state j. Following [15, 16], we let $ζ_{k}$ be the stoichiometry vector associated with edge $i j (k) \in E$ . That is, the i th component of $ζ_{k}$ is −1, the j th component is 1, and all other components are zero.

ζ_{k} = (\begin{array}{c} ζ_{k} (1) \\ ⋮ \\ ζ_{k} (i) \\ ⋮ \\ ζ_{k} (j) \\ ⋮ \\ ζ_{k} (n) \end{array}) = (\begin{array}{c} 0 \\ ⋮ \\ - 1 \\ ⋮ \\ 1 \\ ⋮ \\ 0 \end{array}) .

(1)

Under stationary conditions, such as a population of ion channels under voltage clamp, the occupancy numbers of different states of a continuous time Markov process can be represented as the solution of the stochastic equation obtained from a random time change representation in terms of Poisson processes [17]. If $α_{k}$ gives the instantaneous per capita transition rate from state $i (k)$ to state $j (k)$ , then the full Markov process is specified by a collection of independent standard (unit rate) Poisson processes $Y_{k}$ each representing the occurrence of $i (k) \to j (k)$ transitions as follows. Letting $N (t) \in N^{n}$ be the nonnegative integer-valued vector representing the number of individuals in each of n states, we may write $N (t)$ as a sum of transitions occurring at random times specified by the collection of $Y_{k}$ .

N (t) = N (0) + \sum_{k \in E} ζ_{k} Y_{k} (\int_{0}^{t} α_{k} N_{i (k)} (s) d s) .

(2)

Because each transition preserves the total number of individuals (i.e. the components of $ζ_{k}$ sum to zero for each k), we have $\sum_{i} N_{i} (t) = N_{tot} = \sum_{i} N_{i} (0)$ for all $t > 0$ .
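As a concrete illustration of Eq. 2, the following minimal sketch simulates the population process with Gillespie's direct method, which generates sample paths with the same law as the random time change representation (it is a swapped-in simulation technique, not the construction used in the text). The function name `simulate_population`, the 0-based edge list, and the parameter values are assumptions made for this example only.

```python
import numpy as np

def simulate_population(N0, edges, alpha, t_end, rng):
    """Gillespie simulation of the occupancy vector N(t) up to time t_end.

    N0    : initial occupancy numbers, one per state
    edges : list of (i, j) pairs (0-based), reaction k moves one individual i -> j
    alpha : per capita rates alpha_k, one per reaction
    """
    N = np.array(N0, dtype=np.int64)
    src = np.array([i for i, _ in edges])
    dst = np.array([j for _, j in edges])
    alpha = np.asarray(alpha, dtype=float)
    t = 0.0
    while True:
        rates = alpha * N[src]            # propensity alpha_k * N_{i(k)}
        total = rates.sum()
        if total <= 0.0:
            break                         # no reaction can fire
        t += rng.exponential(1.0 / total) # waiting time to the next event
        if t > t_end:
            break
        k = rng.choice(len(edges), p=rates / total)
        N[src[k]] -= 1                    # apply stoichiometry vector zeta_k
        N[dst[k]] += 1
    return N

# Hypothetical example: 3-state chain 1 <-> 2 <-> 3 with unit rates.
chain_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
N_final = simulate_population([100, 0, 0], chain_edges, [1.0] * 4, 5.0,
                              np.random.default_rng(0))
print(N_final, N_final.sum())
```

Because every event subtracts one individual from the source state and adds one to the destination state, the total population is conserved along each sample path, as stated above.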

In Appendix B we show that, provided $N_{tot}$ is sufficiently large, we can approximate the deviation of $N (t)$ from its mean $\bar{N} \in R^{n}$ by a multidimensional Gaussian Ornstein–Uhlenbeck process $X (t) \in R^{n}$ , $X (t) \approx N (t) - \bar{N}$ , which satisfies a stochastic differential equation of the form given in Eq. 4 below. In particular, we show that $X (t)$ approximately satisfies the SDE

d X (t) = \sum_{k \in E} ζ_{k} (X_{i (k)} (t) α_{k} d t + \sqrt{{\bar{N}}_{i (k)} α_{k}} d W_{k} (t)) .

(3)

2.2 Multidimensional Ornstein–Uhlenbeck Process

To obtain our main mathematical result, we consider a multidimensional Ornstein–Uhlenbeck process $X \in R^{n}$ on the directed graph $G = (V, E)$ where $| V | = n$ and $| E | = m$ . The state of the system at time t, $X (t)$ , satisfies Eq. 3, which we write in the equivalent form

d X = L X d t + B d W .

(4)

Here $L = {(A - D)}^{⊺}$ is the graph Laplacian (A is the weighted adjacency matrix of G, with entries $A_{i j} = α_{k} > 0$ if there is an edge from node $i (k)$ to $j (k)$ and zero otherwise, and D is the diagonal matrix such that entry $D_{i i} = \sum_{j} A_{i j}$ is the weighted out-degree of node i). B is an $n \times m$ matrix, and $W \in R^{m}$ is an m-dimensional Brownian motion, i.e. each component $d W_{k}$ represents the increment of an independent standard Brownian motion capturing the fluctuations of the k th reaction about its mean.^{Footnote 2} Matrix B decomposes into a sum over the m reactions

B = \sum_{k = 1}^{m} B_{k}

(5)

where the k th column of $B_{k}$ is $σ_{k} ζ_{k}$ and all other columns of $B_{k}$ are zero. Comparing Eq. 4 with Eq. 3 identifies the noise amplitude as $σ_{k} = \sqrt{{\bar{N}}_{i (k)} α_{k}}$ .
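The construction of L and B from an edge list can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `build_ou_matrices` and the 0-based edge list are assumptions, and the example uses the 3-state chain of Sect. 2.3 with unit rates and unit noise amplitudes.

```python
import numpy as np

def build_ou_matrices(n, edges, alpha, sigma):
    """Return (L, B, zeta) for dX = L X dt + B dW (Eq. 4).

    edges : list of (i, j) pairs (0-based), one per directed reaction i -> j
    alpha : per capita rates alpha_k
    sigma : noise amplitudes sigma_k
    """
    m = len(edges)
    A = np.zeros((n, n))
    zeta = np.zeros((n, m))
    for k, (i, j) in enumerate(edges):
        A[i, j] = alpha[k]       # weighted adjacency entry for edge k
        zeta[i, k] = -1.0        # stoichiometry vector: leave state i ...
        zeta[j, k] = 1.0         # ... and enter state j
    D = np.diag(A.sum(axis=1))   # weighted out-degrees on the diagonal
    L = (A - D).T                # graph Laplacian as defined in the text
    B = zeta * np.asarray(sigma, dtype=float)  # column k is sigma_k * zeta_k
    return L, B, zeta

# 3-state chain 1 <-> 2 <-> 3 (states indexed from 0 here).
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
L, B, zeta = build_ou_matrices(3, edges, alpha=[1.0] * 4, sigma=[1.0] * 4)
print(L)
print(B)
```

Each column of B sums to zero, which is the matrix form of the statement that every reaction conserves the total population.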

The stochastic shielding approximation for a system of the form given in Eq. 4 amounts to preserving the mean, but neglecting the fluctuations, for the processes driving a subset of the reactions, i.e. replacing B with an alternative matrix $\tilde{B}$ obtained by replacing a subset of columns in B with null vectors. The trajectories of the resulting SDE, $\tilde{X} (t)$ (see Eq. 7), are approximations of the trajectories of the full system.

In order to compare different complexity reduction choices, we define the deficiency of an approximation to be the difference between the true and approximate trajectories, $U (t) = \tilde{X} (t) - X (t)$ , when projected onto the measurement functional of interest M. As suggested by Schmandt and Galán, the stationary variance of the projection of the deficiency on the measurement vector provides an appropriate measure for comparing the quality of alternative reductions. That is, we use $R = Var [M^{⊺} U] = Var [M^{⊺} (\tilde{X} - X)]$ as our error measure. We focus on reductions that preserve the behavior of the system (Eq. 4) relative to a given linear measurement functional $M \in R^{n}$ . In the case of ion channels, $M \in \{0, 1\}^{n}$ represents the conductance of each channel state. We consider the case of graded rather than binary measurements in Sect. 5. Whether binary or graded, the measurement vector identifies the stochastic process of interest as the projection $Y (t) = M^{⊺} X (t)$ .

Formally, we consider two processes $X (t)$ (full process) and $\tilde{X} (t)$ (reduced process) defined on a common probability space $(Ω, F_{t}, P)$ . The sample space $Ω = C {[0, \infty)}^{n}$ , filtration $F_{t}$ , and Wiener measure P are those associated with m independent copies of the standard Brownian process. The approximate process $\tilde{X} (t)$ has the same sample space Ω and is measurable with respect to the same filtration $F_{t}$ , but also with respect to a smaller filtration ${\tilde{F}}_{t} \subset F_{t}$ generated by the Wiener processes associated with a subset of edges of the graph. The covariance of the deficiency, then, is well defined in terms of the underlying measure P on the full probability space.

In Appendix C.1 we show the standard result [18] that the stationary covariance matrix of the full process decomposes into a sum of the contributions from the m different reaction processes:

\operatorname{Cov} [X (t), X^{⊺} (t)] = \lim_{t \to \infty} \int_{0}^{t} \sum_{k = 1}^{m} σ_{k}^{2} \exp [L (t - t^{'})] ζ_{k} ζ_{k}^{⊺} \exp [L^{⊺} (t - t^{'})] d t^{'} .

(6)

Similarly, the variance of the projection $Y (t) = M^{⊺} X (t)$ also decomposes into a sum, because $Var [Y] = M^{⊺} Cov [X] M$ .

Because the (left) eigenvector corresponding to the leading (0) eigenvalue of L has constant components, it is orthogonal to $ζ_{k}$ for each k. (If L is symmetric, the right and left eigenvectors are interchangeable.) Therefore the corresponding eigenspace is contained in the kernel of the matrix $B_{k} B_{k}^{⊺}$ , for each k, which guarantees that the limit on the RHS of Eq. 6 remains finite.
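To make the finiteness argument explicit, the following sketch expands the integrand of Eq. 6 in the eigenbasis of L. It assumes the symmetric, irreducible case treated in Lemma 1, so that L has orthonormal eigenvectors $v_{i}$ with $λ_{1} = 0$ and $λ_{i} < 0$ for $i \geq 2$ . The integration variable $s = t - t^{'}$ is introduced here for convenience, and `\top` stands for the transpose written ⊺ in the text.

```latex
\begin{aligned}
v_{1} &= \tfrac{1}{\sqrt{n}} (1, \dots, 1)^{\top},
\qquad
v_{1}^{\top} \zeta_{k} = \tfrac{1}{\sqrt{n}} (1 - 1) = 0, \\
e^{L s} \zeta_{k}
&= \sum_{i=1}^{n} e^{\lambda_{i} s} (v_{i}^{\top} \zeta_{k}) \, v_{i}
 = \sum_{i=2}^{n} e^{\lambda_{i} s} (v_{i}^{\top} \zeta_{k}) \, v_{i}, \\
\int_{0}^{\infty} e^{L s} \zeta_{k} \zeta_{k}^{\top} e^{L^{\top} s} \, ds
&= \sum_{i=2}^{n} \sum_{j=2}^{n}
   \left( \int_{0}^{\infty} e^{(\lambda_{i} + \lambda_{j}) s} \, ds \right)
   (v_{i}^{\top} \zeta_{k}) (\zeta_{k}^{\top} v_{j}) \, v_{i} v_{j}^{\top} \\
&= \sum_{i=2}^{n} \sum_{j=2}^{n}
   \frac{-1}{\lambda_{i} + \lambda_{j}}
   (v_{i}^{\top} \zeta_{k}) (\zeta_{k}^{\top} v_{j}) \, v_{i} v_{j}^{\top} .
\end{aligned}
```

Every exponent $λ_{i} + λ_{j}$ with $i, j \geq 2$ is strictly negative, so each integral converges. Multiplying by $σ_{k}^{2}$ and projecting with M on both sides gives the summand structure that appears in Eq. 10.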

Neglecting a set of reactions $E^{'} \subset E$ creates an approximate process, $\tilde{X} (t)$ , which matches the behavior of the full process in the mean, but deviates from the full process in the fluctuations. This reduced process satisfies the following SDE

d \tilde{X} = L \tilde{X} d t + \tilde{B} d W,

(7)

where $\tilde{B} = \sum_{k \in E ∖ E^{'}} B_{k}$ sums over the edges we keep. Given the linear measurement functional $M \in R^{n}$ above, we define the approximate projection $\tilde{Y} (t) = M^{⊺} \tilde{X} (t)$ . Note that in the case of an ion-channel system, M is binary so Y and $\tilde{Y}$ just pull out the observable (i.e., conducting) states of each system. In Sect. 2.3, for instance, we consider a 3-state chain with one observable state (state 3) as a simple model of an ion-channel system. In that case, $M = {[0, 0, 1]}^{⊺}$ and $Y (t) = M^{⊺} X (t) = X_{3} (t)$ .

Neglecting a subset of reactions also introduces an error in the representation of the measurement $Y (t)$ versus $\tilde{Y} (t)$ due to the difference between $X (t)$ and $\tilde{X} (t)$ . Recall that $U (t) = \tilde{X} (t) - X (t)$ is the deficiency of the reduced model compared to the full model. Then $\tilde{Y} (t) - Y (t) = M^{⊺} U (t)$ , and $U (t)$ satisfies the SDE

d U = L U d t + (\tilde{B} - B) d W .

(8)

It is important to note that the noise source dW appearing in Eqs. 4 and 7 refers to the same noise process W in both cases, so that the retained noise terms cancel in Eq. 8 and only the neglected reactions drive U. We quantify the error of the approximation relative to the full process by the long-time limit of the mean squared error (MSE) of $\tilde{Y} - Y$ (equivalent to the stationary variance of $\tilde{Y} - Y$ ), which, as shown in the proof of Lemma 1, can be written as a sum over the neglected reactions.
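The role of the shared noise process can be illustrated with a minimal Euler–Maruyama sketch that integrates Eqs. 4 and 7 with the same Brownian increments and records the deficiency U. The discretization scheme, the function name `simulate_shared_noise`, the step size, and the choice of the 3-state chain with unit rates are all assumptions made for this example.

```python
import numpy as np

def simulate_shared_noise(L, B, keep, dt, n_steps, rng):
    """Integrate the full and reduced SDEs with identical noise increments.

    keep : indices of the reactions whose fluctuations are retained
    Returns the deficiency U = X_reduced - X at every step, shape (n_steps, n).
    """
    n, m = B.shape
    B_red = np.zeros_like(B)
    B_red[:, keep] = B[:, keep]          # neglected columns become null vectors
    X = np.zeros(n)
    X_red = np.zeros(n)
    U = np.empty((n_steps, n))
    for s in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=m)  # one increment, used twice
        X = X + L @ X * dt + B @ dW
        X_red = X_red + L @ X_red * dt + B_red @ dW
        U[s] = X_red - X
    return U

# 3-state chain with unit rates and sigma_k = 1 (cf. Sect. 2.3).
L = np.array([[-1.0, 1.0, 0.0], [1.0, -2.0, 1.0], [0.0, 1.0, -1.0]])
B = np.array([[-1.0, 1.0, 0.0, 0.0],
              [1.0, -1.0, -1.0, 1.0],
              [0.0, 0.0, 1.0, -1.0]])
rng = np.random.default_rng(1)
U = simulate_shared_noise(L, B, keep=[2, 3], dt=1e-3, n_steps=2000, rng=rng)
U_all = simulate_shared_noise(L, B, keep=[0, 1, 2, 3], dt=1e-3, n_steps=2000, rng=rng)
print(np.abs(U_all).max(), np.abs(U.sum(axis=1)).max())
```

Retaining every reaction gives a deficiency that is identically zero, and in any reduction the components of U sum to zero because both L and B have zero column sums. Drawing independent increments for the two processes would instead measure the difference of two unrelated sample paths rather than the deficiency of Eq. 8.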

Lemma 1 For an irreducible graph with a symmetric Laplacian L, let X and $\tilde{X}$ be the full and reduced processes defined by Eqs. 4 and 7, respectively, and let $M \in R^{n}$ . Let $E^{'} \subset E$ be the subset of edges neglected in the definition of $\tilde{X}$ . Let L be diagonalizable with eigenpairs ${(λ_{i}, v_{i})}_{i = 1}^{n}$ listed with eigenvalues $λ_{i}$ in order of decreasing real part and ${∥ v_{i} ∥}_{2} = 1$ . Then the stationary variance of the discrepancy $\tilde{Y} - Y = M^{⊺} (\tilde{X} - X)$ satisfies

R [E^{'}] \equiv \lim_{t \to \infty} \operatorname{Var} (\tilde{Y} - Y) = \sum_{k \in E^{'}} R_{k},

(9)

where

R_{k} = σ_{k}^{2} \sum_{i = 2}^{n} \sum_{j = 2}^{n} (\frac{- 1}{λ_{i} + λ_{j}}) (M^{⊺} v_{i}) (v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j}) (v_{j}^{⊺} M) .

(10)

We can rank the error terms $R_{k}$ in descending order, thereby ordering the corresponding reactions in terms of their “importance”. The most important reaction is the one with the largest value of $R_{k}$ ; if neglected, it would introduce the largest error. See Appendix C.2 for the proof of Lemma 1. Note that an individual term in the sum (10) will be zero if either $ζ_{k} ⊥ v_{i}$ or if $M ⊥ v_{i}$ for a given eigenvector $v_{i}$ . Typically, however, these vectors will not be orthogonal. Therefore, it is of interest to know how the values of $R_{k}$ are distributed for different examples: graphs of actual ion-channel states such as those in the classical Hodgkin–Huxley model, and more generally, ensembles of random graphs. In Sect. 4, we compute the distribution of $R_{k}$ for the graphs of the potassium and sodium channel states in the Hodgkin–Huxley model. In Sect. 3, we consider an ensemble of random graphs such as the Erdös–Rényi ensemble with randomly assigned binary measurement vector M and prove our main result, which is a statement about the expected value of $R_{k}$ . Should our random graph ensemble produce a graph that does not consist of a single connected component, then we may apply Lemma 1 to each isolated component of the graph separately. However, for the random graph ensembles we consider, the probability of drawing a disconnected graph decays very rapidly as $n \to \infty$ . We discuss this point further in Appendix D.
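Eq. 10 translates directly into a few lines of linear algebra. The sketch below is one possible implementation under the assumptions of Lemma 1 (symmetric L, connected graph, so exactly one zero eigenvalue); the function name `edge_importance` and the array layout are choices made for this example, not part of the original text.

```python
import numpy as np

def edge_importance(L, zeta, M, sigma):
    """Evaluate R_k of Eq. 10 for every reaction k.

    L     : symmetric graph Laplacian, shape (n, n)
    zeta  : stoichiometry vectors as columns, shape (n, m)
    M     : measurement vector, shape (n,)
    sigma : noise amplitudes sigma_k, shape (m,)
    """
    lam, V = np.linalg.eigh(L)            # eigenvalues in ascending order
    lam, V = lam[:-1], V[:, :-1]          # drop the leading (zero) eigenpair
    G = -1.0 / (lam[:, None] + lam[None, :])   # -1 / (lambda_i + lambda_j)
    c = V.T @ np.asarray(M, dtype=float)  # M^T v_i
    Z = V.T @ zeta                        # v_i^T zeta_k
    a = c[:, None] * Z                    # (M^T v_i)(v_i^T zeta_k)
    R = np.einsum("ik,ij,jk->k", a, G, a) # double sum over i and j
    return np.asarray(sigma, dtype=float) ** 2 * R

# Check against the 3-state chain of Sect. 2.3 (unit rates, sigma_k = 1).
L = np.array([[-1.0, 1.0, 0.0], [1.0, -2.0, 1.0], [0.0, 1.0, -1.0]])
zeta = np.array([[-1.0, 1.0, 0.0, 0.0],
                 [1.0, -1.0, -1.0, 1.0],
                 [0.0, 0.0, 1.0, -1.0]])
R = edge_importance(L, zeta, M=[0.0, 0.0, 1.0], sigma=np.ones(4))
ranking = np.argsort(-R)                  # most important reaction first
print(np.round(R, 4), ranking)
```

Sorting the returned values in descending order gives the importance ranking described above. For a non-symmetric Laplacian the left and right eigenvectors differ, and this sketch does not apply as written.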

For a random graph ensemble, the eigenvectors of the graph Laplacian are distributed randomly on the unit sphere [19, 20]. Hence, they are unlikely to be exactly orthogonal to either $ζ_{k}$ or M. Given a series of assumptions (see Sect. 3.1) that are true for naturally occurring random ensembles such as the symmetric Gaussian and Erdös–Rényi ensembles, we state our main result.

Theorem 2 Given an ensemble of symmetric directed graphs $G = (V, E)$ with n nodes satisfying assumptions A0–A5 (see Sect. 3.1), a binary measurement vector $M \in \{0, 1\}^{n}$ satisfying $0 < \sum_{i} M_{i} \sim O (1)$ as $n \to \infty$ , and a stoichiometry vector $ζ_{k}$ corresponding to the kth reaction, the mean squared error $R_{k}$ resulting from neglecting the kth reaction has expected value

E [R_{k} | M] = \frac{σ_{k}^{2} | M^{⊺} ζ_{k} |}{n C} + O (n^{- q}), \quad \text{as } n \to \infty, \text{ for some } q > 1,

(11)

where the constant C depends on the mean edge weight.

This result shows that the edges in the graph naturally decompose into two classes, distinguished by their asymptotic behavior for large n. The first class of edges represents connections between differently labeled nodes, in terms of the measurement vector M. The first class comprises the “important” edges in the graph, in the sense that these edges have mean $R_{k}$ values that scale as order $1 / n$ . The second class of edges connects identically labeled nodes. These edges have mean $R_{k}$ values of order less than $n^{- q}$ , where $q > 1$ is driven by the fourth moment of the eigenvector components (see assumption A4a in Sect. 3.1 for details). As n increases, these edges become relatively “unimportant” and, hence, can be neglected under the stochastic shielding approximation with minimal loss of accuracy. For the case of the Gaussian ensemble, $q = 2$ . Empirically, for the Erdös–Rényi random graph ensemble, $q \approx 5 / 3$ (see discussion in Sect. 3.3 and also Fig. 4). The proof of Theorem 2 is given in Sect. 3.2. Before discussing more complicated examples, we illustrate the decomposition of the full process into approximate subprocesses for a simple 3-state example in the next subsection.

2.3 3-State Example

We illustrate Schmandt and Galán’s [14] stochastic shielding heuristic with the following simple example they considered. Figure 1 shows a 3-state chain which has adjacency matrix entries $A_{i j} = α_{k} = 1$ if there is an edge from $i (k)$ to $j (k)$ and zero otherwise. State 3 is designated as the only observable state. We think of this as the conducting state in an ion-channel model. Table 1 illustrates the notation introduced in Eq. 1 for this case.

Table 1 Indexing of nodes and edges for the 3-state process, cf. Eq. 1 and Fig. 1. The first column gives the reaction number, the middle column gives the direction of the reaction, and the last column gives the contribution of the reaction to the measurement $Y = M^{⊺} X$

In this case, we suppose $σ_{k} = 1$ in the matrix B and use the linear measurement functional $M = {[0, 0, 1]}^{⊺}$ to pull out the third component of $X (t)$ , yielding the projection $Y (t) = M^{⊺} X (t) = X_{3} (t)$ . The vector $X (t) = {(X_{1} (t), X_{2} (t), X_{3} (t))}^{⊺}$ gives the occupancy of the system states at time t and satisfies the constant coefficient SDE given in Eq. 4 with

L = {(A - D)}^{⊺} = (\begin{array}{ccc} - 1 & 1 & 0 \\ 1 & - 2 & 1 \\ 0 & 1 & - 1 \end{array}),

(12)

B = (\begin{array}{cccc} σ_{1} ζ_{1} & σ_{2} ζ_{2} & σ_{3} ζ_{3} & σ_{4} ζ_{4} \end{array}) = (\begin{array}{cccc} - 1 & 1 & 0 & 0 \\ 1 & - 1 & - 1 & 1 \\ 0 & 0 & 1 & - 1 \end{array}),

(13)

W (t) = (\begin{array}{c} W_{1} (t) \\ W_{2} (t) \\ W_{3} (t) \\ W_{4} (t) \end{array}),

(14)

where the $W_{k} (t)$ are independent and identically distributed standard Brownian motions, and

A = (\begin{array}{ccc} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{array}), D = (\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{array}) .

(15)

Since we are assuming $σ_{k} = 1$ for all k, the k th column of B is exactly the stoichiometry vector associated with the k th reaction, and in particular, $B_{k} B_{k}^{⊺} = ζ_{k} ζ_{k}^{⊺}$ .

The full process $X (t)$ has four stochastic transitions and a reduced process $\tilde{X} (t)$ is defined by keeping a subset of the four stochastic transitions. We use the notation $\tilde{X} = X_{(i, j, k)}$ to explicitly define which columns of the full matrix B are neglected in the approximate process, i.e. which stochastic transitions are neglected. We are interested in the accuracy of the approximation of the trajectory itself.

Figure 2 illustrates the deficiency $U_{(i, j)} (t) = X_{(i, j)} (t) - X (t)$ between the full process and all possible two-noise-source reductions $X_{(i, j)}$ on the 3-state chain, as projected onto each of the three components in the system. The “optimal complexity reduction” is not well defined in general because it is underspecified. For example, asking to reduce the norm of the deficiency U while eliminating two of the four noise sources gives no preference between the six possible reductions. Asking for the best reduction to preserve a specific component may give an answer: to preserve the trajectory as projected onto the first component, keep the two noise sources that directly affect it (edges 1 and 2, the transitions between states 1 and 2); for the third component, keep the other two (edges 3 and 4, the transitions between states 2 and 3); for the second component there is no preference since it is affected directly by all transitions. This gives an intuitive explanation of stochastic shielding consistent with Schmandt and Galán’s explanation.

If we fix a point in the underlying sample space (a choice of four Poisson processes $Y_{k} (t)$ in the system $N (t)$ or a choice of four white noise processes $d W_{k} (t)$ in the system $X (t)$ ) and then choose to neglect the fluctuations in two of the four, i.e. by replacing $Y_{k} (t)$ with $E [Y_{k} (t)]$ or $d W_{k} (t)$ with zero, respectively, then the question is: which choice leads to the most accurate representation of the process as seen by the measurement?

By Lemma 1, we have the following expression for the edge importance terms $R_{k}$ :

R_{k} = \sum_{i = 2}^{3} \sum_{j = 2}^{3} (\frac{- 1}{λ_{i} + λ_{j}}) (M^{⊺} v_{i}) (v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j}) (v_{j}^{⊺} M) .

(16)

Evaluating this expression for the measurement functional $M = {[0, 0, 1]}^{⊺}$ yields

\begin{aligned} R_{1} & = R_{2} = 0.0417, \\ R_{3} & = R_{4} = 0.2917 . \end{aligned}
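These values can be checked by hand. The sketch below uses the eigen-decomposition of the matrix L in Eq. 12, which is not written out in the text and is supplied here as a computed assumption, together with the shorthand $a_{i}$ for the product $(M^{⊺} v_{i}) (v_{i}^{⊺} ζ_{k})$ , introduced only for this example. `\top` stands for the transpose written ⊺ in the text.

```latex
\begin{aligned}
&\lambda_{2} = -1, \quad v_{2} = \tfrac{1}{\sqrt{2}} (1, 0, -1)^{\top},
\qquad
\lambda_{3} = -3, \quad v_{3} = \tfrac{1}{\sqrt{6}} (1, -2, 1)^{\top}, \\
&a_{i} \equiv (M^{\top} v_{i}) (v_{i}^{\top} \zeta_{k}),
\qquad
R_{k} = \frac{a_{2}^{2}}{2} + \frac{2 a_{2} a_{3}}{4} + \frac{a_{3}^{2}}{6}, \\
&\zeta_{1} = (-1, 1, 0)^{\top}: \quad a_{2} = \tfrac{1}{2}, \; a_{3} = -\tfrac{1}{2},
\quad R_{1} = \tfrac{1}{8} - \tfrac{1}{8} + \tfrac{1}{24} = \tfrac{1}{24} \approx 0.0417, \\
&\zeta_{3} = (0, -1, 1)^{\top}: \quad a_{2} = \tfrac{1}{2}, \; a_{3} = \tfrac{1}{2},
\quad R_{3} = \tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{24} = \tfrac{7}{24} \approx 0.2917 .
\end{aligned}
```

Since $ζ_{2} = - ζ_{1}$ and $ζ_{4} = - ζ_{3}$ , and Eq. 16 is quadratic in $ζ_{k}$ , the same values hold for $R_{2}$ and $R_{4}$ . The cross term cancels the leading contribution for reactions 1 and 2 and doubles it for reactions 3 and 4, which is the source of the sevenfold difference.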

Table 2 shows the stationary variance of the discrepancy $M^{⊺} U_{(i, j, k)} = M^{⊺} (X_{(i, j, k)} - X)$ for all possible reduced processes $X_{(i, j, k)}$ . For instance, $X_{(1, 2)}$ is the reduced process that neglects fluctuations in reactions 1 and 2 and the stationary variance of $M^{⊺} U_{(1, 2)}$ is $R_{1} + R_{2} = 0.0833$ . Note that $X_{(1, 2)}$ is the optimal reduced process in terms of the Schmandt and Galán stochastic shielding approximation (among all approximations neglecting exactly two edges) for the 3-state chain.

Table 2 Table of discrepancies $M^{⊺} U_{(i, j, k)} = M^{⊺} (X_{(i, j, k)} - X)$ for the 3-state Markov process. The discrepancy $M^{⊺} U_{(1, 2)}$ (marked by ∗) corresponds to reduced process $X_{(1, 2)}$ projected onto the third component, which is the optimal two-edge-neglecting approximation of X for this example, in agreement with Schmandt and Galán [14]

Figure 3 shows the mean squared error as a function of time for $M^{⊺} U_{(i, j)} (t)$ corresponding to the three classes of reduced processes $X_{(i, j)} (t)$ on the 3-state chain (i.e., the classes are $X_{(1, 2)}$ , $X_{(3, 4)}$ , and $\{X_{(1, 3)}, X_{(1, 4)}, X_{(2, 3)}, X_{(2, 4)}\}$ , corresponding to the three different $M^{⊺} U_{(i, j)} (t)$ values shown in Table 2 above). The error function is shown with the theoretical MSE ( $\sum_{k \in E^{'}} R_{k}$ ) for each case. Since $M^{⊺} ζ_{1} = M^{⊺} ζ_{2} = 0$ , $M^{⊺} ζ_{3} = 1$ , and $M^{⊺} ζ_{4} = - 1$ , we confirm the claim made by Schmandt and Galán [14] that reactions 3 and 4 are important whereas reactions 1 and 2 are unimportant in terms of stochastic shielding for this 3-state example.
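The three classes follow from the additivity in Eq. 9, as the short enumeration below illustrates. It assumes the exact values $R_{1} = R_{2} = 1/24$ and $R_{3} = R_{4} = 7/24$ , which round to the reported 0.0417 and 0.2917 and are consistent with $R_{1} + R_{2} = 0.0833$ ; the variable names are chosen for this example only.

```python
from fractions import Fraction
from itertools import combinations

# Assumed exact edge importances for the 3-state chain (reactions 1..4).
R = {1: Fraction(1, 24), 2: Fraction(1, 24), 3: Fraction(7, 24), 4: Fraction(7, 24)}

# Eq. 9: the stationary error of a reduction is the sum of R_k over neglected edges.
errors = {pair: R[pair[0]] + R[pair[1]] for pair in combinations(sorted(R), 2)}

best = min(errors, key=errors.get)       # pair of neglected edges with least error
classes = sorted(set(errors.values()))   # distinct error levels

for pair, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(pair, float(err))
```

Neglecting both reactions that touch the observable state is the worst choice, the four mixed choices are intermediate, and neglecting reactions 1 and 2 gives the smallest error.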

3 Analysis of Stochastic Shielding for a Random Graph Ensemble

For any particular Ornstein–Uhlenbeck process on a graph, Lemma 1 provides the edge importance values $R_{k}$ (Eq. 10), which may be used to compute explicitly the contribution to the deficiency made by neglecting any particular reaction, relative to a given measurement vector M. In order to make general observations about the stochastic shielding approximation, we now consider an ensemble of random graphs. The proof of our main result (Theorem 2, restated below) will rely on properties of the joint distribution of components of eigenvectors of L, the graph Laplacian. Previously, we used i and j to refer to the source and destination nodes in a reaction. In this section, we will adapt the notation so that edge k is a reaction from node $l_{-}$ to node $l_{+}$ , denoted by $l_{\pm} (k) \in E$ (see Eq. 17). In this section, i and j will instead index eigenvectors of L.

ζ_{k} = (\begin{array}{c} ζ_{k} (1) \\ ⋮ \\ ζ_{k} (l_{-}) \\ ⋮ \\ ζ_{k} (l_{+}) \\ ⋮ \\ ζ_{k} (n) \end{array}) = (\begin{array}{c} 0 \\ ⋮ \\ - 1 \\ ⋮ \\ 1 \\ ⋮ \\ 0 \end{array}) .

(17)

Because our methods combine heuristic numerical evidence with probabilistic calculations, we use “≈” to represent “heuristic equality”. Where precise order estimates are available, we use “O” notation. For the reader’s convenience, we restate Theorem 2.

Theorem 2 Given an ensemble of symmetric directed graphs $G = (V, E)$ with n nodes satisfying assumptions A0–A5 (see Sect. 3.1), a binary measurement vector $M \in \{0, 1\}^{n}$ satisfying $0 < \sum_{i} M_{i} \sim O (1)$ as $n \to \infty$ , and a stoichiometry vector $ζ_{k}$ corresponding to the kth reaction, the mean squared error $R_{k}$ resulting from neglecting the kth reaction has expected value

E [R_{k} | M] = \frac{σ_{k}^{2} | M^{⊺} ζ_{k} |}{n C} + O (n^{- q}), \quad \text{as } n \to \infty, \text{ for some } q > 1,

(18)

where the constant C depends on the mean edge weight.

In other words, since

| M^{⊺} ζ_{k} | = \begin{cases} 1, & \text{if reaction } k \text{ connects nodes with different } M \text{ values}, \\ 0, & \text{if reaction } k \text{ connects nodes with the same } M \text{ value} \end{cases}

(19)

reactions connecting nodes with identical values of M have a small contribution to the error, so these reactions can be neglected under the stochastic shielding approximation. This result relies on a list of assumptions which are described in detail below. The proof of this theorem requires Lemma 3, which is stated after the assumptions and proved in Appendix C.3.
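The two classes of edges can be observed numerically on a single draw from a random graph ensemble. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's experiments: it samples a symmetric Erdös–Rényi graph with unit edge weights, takes $σ_{k} = 1$ for all reactions, marks a few randomly chosen nodes as conducting, and evaluates Eq. 10 for every directed edge. The function name and the parameter values (n = 60, p = 0.5, three conducting nodes) are hypothetical choices.

```python
import numpy as np

def er_edge_importance(n, p, n_conducting, rng):
    """Sample a symmetric Erdos-Renyi graph and return (R, important).

    R         : edge importance R_k (Eq. 10) for every directed edge, sigma_k = 1
    important : True where the edge joins nodes with different M values
    """
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A = (upper | upper.T).astype(float)          # symmetric, unit weights
    L = (A - np.diag(A.sum(axis=1))).T           # graph Laplacian
    lam, V = np.linalg.eigh(L)                   # ascending eigenvalues
    if lam[-2] > -1e-9:
        raise ValueError("graph is disconnected; resample")
    lam, V = lam[:-1], V[:, :-1]                 # drop the zero eigenpair
    M = np.zeros(n)
    M[rng.choice(n, size=n_conducting, replace=False)] = 1.0
    src, dst = np.nonzero(A)                     # one reaction per directed edge
    c = V.T @ M                                  # M^T v_i
    Z = V[dst].T - V[src].T                      # v_i^T zeta_k for every edge k
    a = c[:, None] * Z
    G = -1.0 / (lam[:, None] + lam[None, :])
    R = np.einsum("ik,ij,jk->k", a, G, a)
    important = M[src] != M[dst]
    return R, important

R, important = er_edge_importance(60, 0.5, 3, np.random.default_rng(0))
print("mean R, important edges:  ", R[important].mean())
print("mean R, unimportant edges:", R[~important].mean())
```

For a draw of this size the mean importance of edges joining differently labeled nodes should exceed that of the remaining edges by a wide margin, in line with Eq. 19. Repeating the experiment over several values of n would be needed to estimate the scaling exponents, which this sketch does not attempt.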

3.1 Assumptions on the Random Graph Ensemble

We state a sequence of assumptions on the random graph ensemble needed to establish our main result. Each assumption is reasonable for a broad class of graphs of interest, for reasons articulated in the Remarks following each assumption. In several instances we impose on our random graph ensemble, as assumptions, properties that are known to hold for broad classes of random matrices, such as the Wigner ensemble [19, 20]. The ensemble we consider is not equivalent to a generalized Wigner ensemble. Nevertheless, for the reasons detailed below, it appears reasonable that certain aspects of the eigenvector and eigenvalue distributions may be similar in the two cases.

We consider an ensemble of symmetric directed graphs $G = (V, E)$ with $| V | = n$ . Let $ζ_{k}$ be the stoichiometry vector corresponding to the k th reaction (Eq. 17) and let $(λ_{i}, v_{i})$ denote the eigenpairs of the graph Laplacian $L = {(A - D)}^{⊺}$ listed with eigenvalues in descending order. We assume that the eigenvector components are $l_{2}$ -normalized with mean 0 and variance $1 / n$ , and we assume the following:

A0. (Following [21].) Let $a_{i j} \geq 0$ , the entries of the adjacency matrix, be random variables defined on a common probability space, with ${a_{i j}, 1 \leq i < j \leq n}$ independent (but not necessarily identically distributed), with $a_{i j} = a_{j i}$ , $E [a_{i j}] = μ_{A}$ , $V [a_{i j}] = σ_{A}^{2} > 0$ for all $1 \leq i < j \leq n$ , and ${sup}_{1 \leq i < j \leq n} E | (a_{i j} - μ_{A}) / σ_{A} |^{κ} < \infty$ for some $κ > 0$ .

A1a. The graph is drawn from a random ensemble with the property that the eigenvalues $λ_{i}$ and eigenvectors $v_{i}$ of the associated graph Laplacian are nearly independent. That is, for any $i, j, k, l \in {1, \dots, n}$ and arbitrary measurable functions $f : R^{2} \to R$ and $g : R^{n} \times R^{n} \to R$

\begin{aligned} E [f (λ_{i}, λ_{j}) g (v_{k}, v_{l})] = & E [f (λ_{i}, λ_{j})] E [g (v_{k}, v_{l})] \\ + O (\frac{1}{n^{4}}), as n \to \infty . \end{aligned}

(20)

Remark 1a: Assumption A1a holds for the symmetric Gaussian ensemble as well as for the more general Wigner ensemble [19, 20]. Indeed, for these ensembles the eigenvalues and eigenvectors are independent. The weaker assumption, that they are at most weakly correlated, appears reasonable for, e.g., the ensemble of graph Laplacians obtained from the symmetric Erdös–Rényi random graph ensemble.

A1b. The graph is drawn from a random ensemble with the property that the joint (eigenvalue, eigenvector) distribution is nearly invariant under permutation of eigenvectors. That is, for any $i, j, k, l \in {1, \dots, n}$

\begin{aligned} E [f (λ_{i}, λ_{j}) g (v_{i}, v_{j})] = & E [f (λ_{i}, λ_{j}) g (v_{k}, v_{l})] \\ + O (\frac{1}{n^{4}}), as n \to \infty . \end{aligned}

(21)

Remark 1b: The symmetric Gaussian and Wigner ensembles are fully invariant under permutation of eigenvectors, and the weaker assumption of near invariance appears reasonable for the Erdös–Rényi ensemble. In particular, the pair $(\frac{- 1}{λ_{i} + λ_{j}})$ , $(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k} ζ_{k}^{⊺} v_{j} v_{j}^{⊺} M)$ appearing in the definition of $R_{k}$ (Lemma 1) are assumed to be approximately uncorrelated. This assumption is reasonable by virtue of the approximate rotational symmetry of the eigenvector distribution under our choice of random graph model, which we expect to be close (heuristically) to the eigenvector distribution of the symmetric Gaussian ensemble [19, 20].

A2. $E [v_{i} (l)] = 0$ for any $i, l \in {1, \dots, n}$ where $v_{i} (l)$ denotes the l th component of the i th eigenvector. Remark 2a: Note that $E [v_{i} {(l)}^{2}] = 1 / n$ by the $l_{2}$ -normalization of the eigenvectors because ${∥ v ∥}_{2} = \sqrt{\sum_{l = 1}^{n} v {(l)}^{2}} = 1$ for each eigenvector v. This normalization leaves a 2-fold ambiguity in the choice of eigenvector v. Since +v and −v both have ${∥ v ∥}_{2} = 1$ , we choose randomly between them so that the first non-zero component is positive with probability $1 / 2$ .^{Footnote 3} Remark 2b: By the symmetry of our random graph ensemble under the symmetric group acting on the change of labels, assumption A2 holds not just for the Gaussian and Wigner ensembles, but for any reasonable symmetric ensemble. In particular, it holds for the symmetric Erdös–Rényi random graph ensemble.

A3. For any $i, j \in {2, \dots, n}$ and $l, l^{'} \in {1, \dots, n}$ ,

a. $E [v_{i} (l) v_{j} (l^{'})] = O (n^{- 3})$ as $n \to \infty$ , for $i \neq j$ .
b. $E [v_{i} (l) v_{i} (l^{'})] = O (n^{- 2})$ as $n \to \infty$ , for $l \neq l^{'}$ .

Remark 3: Figure 4 provides numerical evidence for the plausibility of assumption A3 in the Erdös–Rényi case. As described in the figure, the empirical expectation of $v_{i} (l) v_{i} (l^{'})$ scales as $O (n^{- 2})$ for $10 \leq n \leq 1000$ ; over this range the empirical expectation of $v_{i} (l) v_{j} (l^{'})$ , $i \neq j$ , is within machine error (≤10⁻¹⁹) of zero.

A4. For any $i \in {2, \dots, n}$ and $l, l^{'} \in {1, \dots, n}$ ,

a. $E [v_{i} {(l)}^{4}] = O (n^{- q})$ as $n \to \infty$ , for some $q > 1$ .
b. $E [v_{i} {(l)}^{2} v_{i} {(l^{'})}^{2}] = O (n^{- 2})$ as $n \to \infty$ , for $l \neq l^{'}$ .

Remark 4: Assumption A4a holds for the Gaussian case for $q = 2$ . For the Erdös–Rényi case, empirically we see that assumption A4a holds for $q \approx 5 / 3$ as shown in Fig. 4. Specifically, empirical evidence suggests that $E [v_{i} {(l)}^{4}] \approx \sqrt{2} n^{- 5 / 3}$ in this case.
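The moment conditions above can be probed numerically. The following Python sketch estimates the second and fourth moments of the eigenvector components for the unweighted symmetric Erdös–Rényi ensemble. It assumes NumPy; the helper names, the seed, and the choices $n = 100$, $p = 0.5$ and 20 trials are illustrative assumptions, and this is not the code used to produce Fig. 4.

```python
import numpy as np

def er_laplacian(n, p, rng):
    """Laplacian L = A - D of an unweighted symmetric Erdos-Renyi graph G(n, p)."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    A = upper + upper.T
    return A - np.diag(A.sum(axis=1))

def eigenvector_moments(n, p, trials, rng):
    """Empirical second and fourth moments of the components of v_i, i >= 2."""
    second, fourth = [], []
    for _ in range(trials):
        L = er_laplacian(n, p, rng)
        _, vecs = np.linalg.eigh(L)   # ascending eigenvalues, orthonormal columns
        V = vecs[:, :-1]              # drop the eigenvector of the largest eigenvalue (zero)
        second.append(np.mean(V**2))
        fourth.append(np.mean(V**4))
    return np.mean(second), np.mean(fourth)

rng = np.random.default_rng(0)
n = 100
m2, m4 = eigenvector_moments(n, p=0.5, trials=20, rng=rng)
# Second moment times n, and the fourth moment under the two candidate scalings.
print(m2 * n, m4 * n**2, m4 * n**(5 / 3))
```

Because the eigenvectors are $l_{2}$ -normalized, the second moment equals $1 / n$ up to rounding. Repeating the estimate over a range of n and fitting the slope on a log-log scale is one way to estimate the exponent q.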

A5. Suppose that $p_{1}$ , $p_{2}$ , $p_{3}$ , and $p_{4}$ are nonnegative integers with $\sum_{m = 1}^{4} p_{m} = 4$ , at least three of which are non-zero. Then for any $i \in {2, \dots, n}$ and for any distinct components ${l_{1}, l_{2}, l_{3}, l_{4}}$

E [{(v_{i} (l_{1}))}^{p_{1}} {(v_{i} (l_{2}))}^{p_{2}} {(v_{i} (l_{3}))}^{p_{3}} {(v_{i} (l_{4}))}^{p_{4}}] = O (n^{- 3}) as n \to \infty .

(22)

Remark 5: The reason for this assumption will become clear in the proof of Theorem 2. It is similar in spirit to the four moment theorem for eigenvector components of a Wigner or Gaussian random matrix, different versions of which have been established by Tao and Vu [20] and Knowles and Yin [19]. Figure 4 provides numerical evidence for the plausibility of assumption A5 in the Erdös–Rényi case.

In addition to assumptions A0–A5 on the random graph ensemble, the statement of Theorem 2 places an assumption on the measurement vector $M \in {0, 1}^{n}$ . This vector contains $n_{1} > 0$ ones and $n_{0} > 0$ zeros such that $n_{1} + n_{0} = n$ . We assume $n_{1} = O (1)$ as $n \to \infty$ , that is, we exclude the case where $n_{1}$ grows without bound as n grows. (If M has the same value for all nodes, the output is constant and the error is identically zero. The expression in Theorem 2 holds trivially so we ignore this case.)

To motivate this assumption, Table 3 shows the total number of states (n) and the number of conducting states ( $n_{1}$ ) for representative ion-channel models. Model refinements driven by empirical evidence have tended to increase the total number of states relative to Hodgkin and Huxley’s original model, without significantly increasing the number of conducting states.

Table 3 Total number of states (n) and number of conducting states ( $n_{1}$ ) for different ion-channel models. Empirically based model refinements have led to increasing numbers of channel states, without dramatically increasing the number of conducting states

Although assuming that $n_{1} = O (1)$ is biologically plausible, we make this assumption mainly for technical reasons as indicated in the proof of Theorem 2. We note, however, that in the numerical example in Sect. 3.3, the conclusions of Theorem 2 appear to hold equally well when $n_{1} = n_{0} = n / 2$ .

Lemma 3 Suppose assumptions A0–A5 hold and $M \in {0, 1}^{n}$ satisfies $0 < \sum_{i} M_{i} \sim O (1)$ as $n \to \infty$ . Then as $n \to \infty$ ,

A. $E [M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}] = E [\sum_{l \in 1_{M}} v_{i} (l) (v_{i} (l_{+}) - v_{i} (l_{-}))] = \frac{1}{n} M^{⊺} ζ_{k} + O (n^{- 2})$ .
B. $E {[M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}]}^{2} = E {[\sum_{l \in 1_{M}} v_{i} (l) (v_{i} (l_{+}) - v_{i} (l_{-}))]}^{2} = \frac{1}{n^{2}} | M^{⊺} ζ_{k} | + O (n^{- 4})$ .
C. $E [{(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k})}^{2}] = E [{(\sum_{l \in 1_{M}} v_{i} (l))}^{2} {(v_{i} (l_{+}) - v_{i} (l_{-}))}^{2}] = O (n^{- q})$ for some $q > 1$ .

Note that the exponent $q > 1$ in part C is governed by the fourth moment of the eigenvector components of the graph Laplacian (see assumption A4a). The proof of Lemma 3 is given in Appendix C.3.

3.2 Proof of Main Theorem

Suppose assumptions A0–A5 hold and $M \in {0, 1}^{n}$ satisfies $0 < \sum_{i} M_{i} \sim O (1)$ as $n \to \infty$ . By Lemma 1, $R_{k}$ denotes the contribution of the k th reaction to the deficiency of the approximate process. Given the measurement vector M, we have (exactly)

E [R_{k} | M] = E [σ_{k}^{2} \sum_{i = 2}^{n} \sum_{j = 2}^{n} (\frac{- 1}{λ_{i} + λ_{j}}) (M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j} v_{j}^{⊺} M)] .

(23)

This expectation is taken over the space of symmetric directed graphs $G = (V, E)$ where edge k is chosen at random from the set of $(\begin{matrix} n \\ 2 \end{matrix})$ possible bidirectional edges. If $l_{\pm} (k) \notin E$ , then $E [R_{k} | M] = 0$ .

If the graph Laplacian were drawn from a symmetric Gaussian ensemble (or Wigner ensemble; see [19, 20]), then the eigenvalues and the eigenvectors would be independent. For other ensembles we impose the weaker condition of near independence (assumption A1a), which in this case means that for each $i \geq 2$ and $j \geq 2$ , we assume

\begin{aligned} E [(\frac{- 1}{λ_{i} + λ_{j}}) (M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j} v_{j}^{⊺} M)] \\ = E [(\frac{- 1}{λ_{i} + λ_{j}})] E [(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j} v_{j}^{⊺} M)] + O (\frac{1}{n^{4}}), as n \to \infty . \end{aligned}

(24)

Under assumption A1b, the joint distribution of eigenvalues and eigenvectors is approximately separable into the product of two measures, one for the eigenvalues and a second for the eigenvectors. In this case the expectation $E [(\frac{- 1}{λ_{i} + λ_{j}})]$ in the sum (23) can be replaced by its average,

S \equiv \frac{1}{{(n - 1)}^{2}} \sum_{i = 2}^{n} \sum_{j = 2}^{n} \frac{- 1}{λ_{i} + λ_{j}},

(25)

to obtain

E [R_{k} | M] = σ_{k}^{2} E [S] E [\sum_{i = 2}^{n} \sum_{j = 2}^{n} (M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j} v_{j}^{⊺} M)] + O (\frac{1}{n^{2}}) .

(26)

As shown in [21], assumption A0 implies that the empirical eigenvalue distribution for the graph Laplacian L,

{\tilde{F}}_{n} (x) = \frac{1}{n} \sum_{i = 1}^{n} I {\frac{λ_{i} + n μ_{A}}{\sqrt{n} σ_{A}} \leq x},

(27)

converges weakly (with probability one) as $n \to \infty$ to the free convolution γ of the semicircle law, $ρ_{sc} (x) = \frac{1}{2 π} \sqrt{4 - x^{2}} I (| x | \leq 2)$ , with the standard Gaussian, $g (x) = exp [- x^{2} / 2] / \sqrt{2 π}$ . The measure γ becomes concentrated around $λ_{i} \approx - n μ_{A}$ as n grows. In particular, most terms in the sum (Eq. 25) concentrate around $1 / (2 n μ_{A})$ , as $n \to \infty$ . Therefore, by imposing assumption A0 and setting $C = 2 μ_{A}$ , we have $E [S] \to 1 / (n C)$ , as $n \to \infty$ , yielding in the limit

E [R_{k} | M] = \frac{σ_{k}^{2}}{n C} E [\sum_{i = 2}^{n} \sum_{j = 2}^{n} (M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j} v_{j}^{⊺} M)] + O (\frac{1}{n^{2}}) .

(28)

For the Erdös–Rényi ensemble with n nodes and edge probability p, we have $E [S] \to 1 / (n C)$ for $C = 2 p$ . Figure 5 shows that the sample mean of S over 10 realizations (i.e. 10 different Erdös–Rényi random graph configurations with the same parameters) rapidly approaches $1 / (2 p n)$ , as n increases, for values of p ranging from 0.3 to 0.9. As the factor of $1 / n$ is common across all k, it does not affect the stochastic shielding argument.
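The convergence of S toward $1 / (2 p n)$ can be checked with a few lines of Python. This is a minimal sketch assuming NumPy and an unweighted symmetric Erdös–Rényi graph; the helper names, the seed, and the choice $n = 200$ are illustrative assumptions rather than the settings behind Fig. 5.

```python
import numpy as np

def er_laplacian(n, p, rng):
    """Laplacian L = A - D of an unweighted symmetric Erdos-Renyi graph G(n, p)."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    A = upper + upper.T
    return A - np.diag(A.sum(axis=1))

def mean_inverse_eigen_sum(L):
    """S from Eq. 25: the average of -1 / (lambda_i + lambda_j) over i, j >= 2."""
    lam = np.linalg.eigvalsh(L)[::-1]   # descending order; lam[0] is the zero eigenvalue
    lam = lam[1:]                       # assumes a connected graph (a single zero eigenvalue)
    return np.mean(-1.0 / (lam[:, None] + lam[None, :]))

rng = np.random.default_rng(1)
n, p = 200, 0.5
# Sample mean of S over 10 realizations, as in the text.
S = np.mean([mean_inverse_eigen_sum(er_laplacian(n, p, rng)) for _ in range(10)])
print(S, 1.0 / (2 * p * n))
```

The sketch drops only the largest eigenvalue, so it presumes that each realization is connected, which is overwhelmingly likely for these parameter values.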

To prove Theorem 2, we will show that

\begin{aligned} E [\sum_{i = 2}^{n} \sum_{j = 2}^{n} (M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j} v_{j}^{⊺} M)] \\ = \begin{cases} 1 + O (n^{1 - q}), & | M^{⊺} ζ_{k} | = 1, \\ O (n^{1 - q}), & | M^{⊺} ζ_{k} | = 0, \end{cases} \quad as n \to \infty, \end{aligned}

(29)

for some $q > 1$ , corresponding to the parameter q appearing in assumption A4. This dichotomy is the basis for neglecting the edges k such that $M^{⊺} ζ_{k} = 0$ , as in the stochastic shielding approximation. To do this, we will use assumption A3a and Lemma 3 to show the following:

E [\sum_{i = 2}^{n} \sum_{j = 2}^{n} (M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j} v_{j}^{⊺} M)]

(30)

= \sum_{i = 2}^{n} \sum_{j \neq i} E [(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (M^{⊺} v_{j} v_{j}^{⊺} ζ_{k})] + \sum_{i = 2}^{n} E [{(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k})}^{2}]

(31)

\begin{aligned} = \sum_{i = 2}^{n} \sum_{j \neq i} E [M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}] E [M^{⊺} v_{j} v_{j}^{⊺} ζ_{k}] \\ + \sum_{i = 2}^{n} E [{(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k})}^{2}] + O (\frac{1}{n}), as n \to \infty \end{aligned}

(32)

= | M^{⊺} ζ_{k} | + O (n^{1 - q}), as n \to \infty .

(33)

It suffices to show that the first term in Eq. 32 is

\sum_{i = 2}^{n} \sum_{j \neq i} E [M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}] E [M^{⊺} v_{j} v_{j}^{⊺} ζ_{k}] = | M^{⊺} ζ_{k} | + O (\frac{1}{n}), as n \to \infty,

(34)

and the second term is

\sum_{i = 2}^{n} E [{(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k})}^{2}] = O (n^{1 - q}), as n \to \infty .

(35)

Starting with the first term in Eq. 31, it follows from assumption A3a that, as $n \to \infty$ ,

\begin{aligned} E [(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (M^{⊺} v_{j} v_{j}^{⊺} ζ_{k})] \\ = E [M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}] E [M^{⊺} v_{j} v_{j}^{⊺} ζ_{k}] + O (\frac{1}{n^{3}}) \end{aligned}

(36)

which means

\begin{aligned} \sum_{i = 2}^{n} \sum_{j \neq i} E [(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}) (M^{⊺} v_{j} v_{j}^{⊺} ζ_{k})] \\ = \sum_{i = 2}^{n} \sum_{j \neq i} E [M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}] E [M^{⊺} v_{j} v_{j}^{⊺} ζ_{k}] + O (\frac{1}{n}) . \end{aligned}

(37)

We can expand the left hand side of Eq. 34 by using the definitions $M^{⊺} v_{i} = \sum_{l \in 1_{M}} v_{i} (l)$ and $v_{i}^{⊺} ζ_{k} = v_{i} (l_{+}) - v_{i} (l_{-})$ , which yield

\sum_{i = 2}^{n} \sum_{j \neq i} E [M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}] E [M^{⊺} v_{j} v_{j}^{⊺} ζ_{k}]

(38)

= (n - 1) (n - 2) E [M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}] E [M^{⊺} v_{j} v_{j}^{⊺} ζ_{k}]

(39)

= (n - 1) (n - 2) E {[\sum_{l \in 1_{M}} v_{i} (l) (v_{i} (l_{+}) - v_{i} (l_{-}))]}^{2} .

(40)

By Lemma 3 part B, we have that $E {[\sum_{l \in 1_{M}} v_{i} (l) (v_{i} (l_{+}) - v_{i} (l_{-}))]}^{2} = \frac{1}{n^{2}} | M^{⊺} ζ_{k} | + O (n^{- 4})$ , as $n \to \infty$ . Continuing Eq. 40 above we have

= (n - 1) (n - 2) [\frac{1}{n^{2}} | M^{⊺} ζ_{k} | + O (n^{- 4})]

(41)

= | M^{⊺} ζ_{k} | + O (n^{- 1})

(42)

as $n \to \infty$ , which establishes the first term (Eq. 34).

We now focus on the second term in Eq. 32. In Lemma 3 part C, we establish that as $n \to \infty$

E [{(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k})}^{2}] = E [{(\sum_{l \in 1_{M}} v_{i} (l))}^{2} {(v_{i} (l_{+}) - v_{i} (l_{-}))}^{2}] = O (n^{- q}) .

(43)

Hence, $(n - 1) E [{(\sum_{l \in 1_{M}} v_{i} (l))}^{2} {(v_{i} (l_{+}) - v_{i} (l_{-}))}^{2}] = O (n^{1 - q})$ as $n \to \infty$ , which establishes the second term (Eq. 35). Therefore, we have established Theorem 2.

3.3 Symmetric Erdös–Rényi Random Graph Ensemble

Many varieties of random graphs have been used to describe biological systems [26, 27]. Here, we restrict attention to an ensemble of symmetric Erdös–Rényi random graphs $G (n, p)$ on n nodes, for which each of $(n^{2} - n) / 2$ possible bidirectional edges occurs independently with probability p [28, 29]. Consider a graph drawn from the Erdös–Rényi ensemble for $n = 50$ and $p = 0.5$ . See Fig. 6 for an example. Take A to be the unweighted adjacency matrix ( $α_{k} \in {0, 1}$ ) and let $σ_{k} = 1$ for all reactions k so that the k th column of the matrix B is exactly the stoichiometry vector for reaction k. Specifying any measurement vector $M \in {0, 1}^{50}$ induces a partition of edges into “important” (type 0–1) or “unimportant” (types 0–0 or 1–1) classes. Let $E_{I}$ be the set of important edges and $E_{U}$ be the set of unimportant edges. Clearly, $E = E_{I} \cup E_{U}$ . In the following example, we consider a vector M such that half the entries are 1 and the other half are 0.

Theorem 2 says that if the matrix of eigenvector components of the Erdös–Rényi graph Laplacian is sufficiently similar to a random matrix drawn from the Gaussian ensemble (in terms of assumptions A0–A5) then one would expect the partitioning of the $R_{k}$ into two clusters. One cluster, containing the important edges, will be centered at $1 / n$ . A second cluster, containing the unimportant edges, will have smaller $R_{k}$ values ( $O (n^{- q})$ where $q > 1$ is governed by the fourth moment; see assumption A4a in Sect. 3.1). To the extent to which this similarity to the Gaussian ensemble holds, our calculation of $R_{k}$ involves projecting the measurement vector M and the vectors $ζ_{k}$ onto randomly chosen subspaces of $R^{n}$ .

As shown in Fig. 4, assumptions A0–A5 appear to be satisfied for the symmetric Erdös–Rényi random graph ensemble. In particular, the fourth moment condition on the eigenvector components (assumption A4a) appears to hold empirically for $q \approx 5 / 3$ ; specifically, we find that $E [v_{i} {(l)}^{4}] \approx \sqrt{2} n^{- 5 / 3}$ . This behavior suggests that the unimportant edges should have a mean $R_{k}$ value $≲ \sqrt{2} n^{- 5 / 3}$ . Setting $n = 50$ , for example, we would expect one cluster of $R_{k}$ values centered at $1 / 50 = 0.02$ for $k \in E_{I}$ and another cluster close to $\sqrt{2} \cdot 50^{- 5 / 3} \approx 0.0021$ for $k \in E_{U}$ . Figure 7 shows the rank order of edge importance values $R_{k}$ corresponding to the m reactions in the Erdös–Rényi random graph. The top cluster is centered at 0.02 (upper horizontal red line) and the bottom cluster is bounded above by 0.0021 (lower horizontal red line), consistent with Theorem 2 for the Erdös–Rényi random graph ensemble with 50 nodes and edge probability $p = 0.5$ . Since the measurement functional M is binary, we see a significant gap between the two clusters, as expected. If the components of M are graded, i.e. drawn uniformly from the unit interval, then this curve appears to be smooth (see discussion in Sect. 5).
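The clustering of edge importance values can be reproduced in outline with the following Python sketch. It evaluates the sum inside the expectation in Eq. 23 for a single realization with $n = 50$ , $p = 0.5$ , and $σ_{k} = 1$ . NumPy, the seed, the placement of the ones in M, and the variable names are assumptions made for illustration; this is not the code behind Fig. 7.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 0.5

# Unweighted symmetric Erdos-Renyi adjacency matrix and Laplacian L = A - D.
upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
A = upper + upper.T
L = A - np.diag(A.sum(axis=1))

# Eigenpairs in descending order; the first (zero eigenvalue) is dropped.
lam, V = np.linalg.eigh(L)
lam, V = lam[::-1][1:], V[:, ::-1][:, 1:]

# Binary measurement vector: half ones, half zeros.
M = np.zeros(n)
M[: n // 2] = 1.0

kernel = -1.0 / (lam[:, None] + lam[None, :])   # -1 / (lambda_i + lambda_j)
Mv = M @ V                                      # M^T v_i for each i >= 2

def edge_importance(a, b, sigma=1.0):
    """R_k for the reaction between nodes a and b, with zeta_k = e_b - e_a."""
    c = Mv * (V[b] - V[a])                      # (M^T v_i)(v_i^T zeta_k)
    return sigma**2 * c @ kernel @ c

# One entry per bidirectional edge; both directions share the same R_k.
edges = np.argwhere(np.triu(A, k=1) > 0)
R = np.array([edge_importance(a, b) for a, b in edges])
important = M[edges[:, 0]] != M[edges[:, 1]]
print(R[important].mean(), R[~important].mean(), 1.0 / n)
```

Sorting R and plotting it against rank gives a curve of the kind described for Fig. 7, with the important edges near $1 / n$ and the unimportant edges well below.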

Figure 8 illustrates the distribution of eigenvector components of the Erdös–Rényi graph Laplacian in comparison with a Gaussian random matrix (i.e., each entry has mean 0 and variance $1 / n$ ). The quantile–quantile plots show good agreement within one standard deviation and begin to deviate in the second standard deviation. This is consistent with the observation that the fourth moment in the Erdös–Rényi case deviates from the Gaussian case ( $q \approx 5 / 3$ for Erdös–Rényi and $q = 2$ for Gaussian). Nevertheless, Theorem 2 predicts that there will be two clusters of $R_{k}$ values as described above and shown in Fig. 7 for the Erdös–Rényi case with $n = 50$ and $p = 0.5$ .

4 Application: Stochastic Shielding of Hodgkin–Huxley Channels Under Voltage Clamp

Hodgkin and Huxley’s (HH) model for the generation and propagation of action potentials along the giant axon of the squid Loligo lies at the foundations of modern neuroscience [22, 30]. In the classic HH model, action potentials are generated through the interaction of a leak current and two voltage-gated ionic currents, carried by a sodium ion specific channel and a potassium ion specific channel. The potassium channel comprises four identical subunits that open and close independently with voltage-dependent rates. The channel carries a current when all four subunits are in the open state. At the molecular level, a single channel can be represented as a continuous time Markov jump process on a chain of five states, the fifth of which has non-zero conductance. Of the eight transitions connecting states along this chain, only the last two connect states with different conductances; therefore, the stochastic shielding approximation would preserve the fluctuations of these two transitions and neglect those of the other six.

The sodium channel involves two types of subunits, an activation subunit (“m”) present in three identical copies, and an inactivation subunit (“h”) present in a single copy.^{Footnote 4} The resulting graph has eight distinct states connected by 20 different transitions, each occurring with a voltage-dependent rate [31–33]. Four of these 20 transitions connect states with differing conductance values (zero versus non-zero); the fluctuations of the remaining 16 transitions are ignored under the stochastic shielding approximation.

Schmandt and Galán compared simulations of a system comprising 5000 individual potassium channels and 25000 individual sodium channels, both with and without the stochastic shielding approximation. It is possible to construct an exact simulation scheme, analogous to Gillespie’s stochastic simulation algorithm [34], that takes into account the nonstationarity of the transition rates (propensities) arising from their voltage dependence [35]. However, Schmandt and Galán used a discrete time approximation to this process. Appendix A discusses Schmandt and Galán’s approach in more detail. Here we apply our analysis to evaluate the edge importance $R_{k}$ of each transition in the graph for the classic HH potassium and sodium channels, respectively. Rather than consider the case of time-varying transition rates, we restrict attention to the “voltage clamped” case. If the membrane potential is experimentally held constant for a given cell, the per capita transition rates remain constant and the fluctuating ion-channel population forms a stationary Markov process. In particular, our analysis approximates this stationary population process with a linear multidimensional Ornstein–Uhlenbeck process (see Appendix B); this approximation is reasonable given the large numbers of individual channels considered in Schmandt and Galán’s simulations.

In general, the ion-channel state graphs for the potassium and sodium channels in the HH model have graph Laplacians L that are not symmetric. Therefore, we need to modify our definition of the edge importance $R_{k}$ (Eq. 10) in order to apply our results. When L is not symmetric, we will assume that L is nevertheless diagonalizable, i.e. that there are eigenvalues $λ_{i}$ and a biorthogonal system of vectors $v_{i}$ , $w_{i}$ (right and left eigenvectors) satisfying

\begin{array}{rcl} L v_{i} & = & λ_{i} v_{i}, \\ w_{i}^{⊺} L & = & λ_{i} w_{i}^{⊺}, \\ w_{i}^{⊺} v_{j} & = & δ_{i j} . \end{array}

(44)

In this case the decomposition of L becomes $L = \sum_{i} λ_{i} v_{i} w_{i}^{⊺}$ , and the definition of $R_{k}$ is modified as follows:

R_{k} = σ_{k}^{2} \sum_{i = 2}^{n} \sum_{j = 2}^{n} (\frac{- 1}{λ_{i} + λ_{j}}) (M^{⊺} v_{i}) (w_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} w_{j}) (v_{j}^{⊺} M) .

(45)
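The biorthogonal system in Eq. 44 can be obtained numerically by inverting the matrix of right eigenvectors. The sketch below assumes NumPy, real eigenvalues, and a hypothetical nonsymmetric 3-state chain with arbitrary rates; the function name is illustrative.

```python
import numpy as np

def biorthogonal_eigensystem(L):
    """Eigenvalues, right eigenvectors (columns of V) and left eigenvectors (rows of W).

    Assumes L is diagonalizable with real eigenvalues, as in the text.
    """
    lam, V = np.linalg.eig(L)
    order = np.argsort(-lam.real)        # descending; the zero eigenvalue comes first
    lam, V = lam[order].real, V[:, order].real
    W = np.linalg.inv(V)                 # row i is w_i^T, so that w_i^T v_j = delta_ij
    return lam, V, W

# Hypothetical nonsymmetric chain 1 <-> 2 <-> 3 with unequal rates.
A = np.array([[0.0, 2.0, 0.0],
              [1.0, 0.0, 3.0],
              [0.0, 0.5, 0.0]])
L = (A - np.diag(A.sum(axis=1))).T
lam, V, W = biorthogonal_eigensystem(L)

# Check biorthogonality and the decomposition L = sum_i lambda_i v_i w_i^T.
print(np.allclose(W @ V, np.eye(3)), np.allclose(V @ np.diag(lam) @ W, L))
```

With V and W in hand, each factor $(M^{⊺} v_{i}) (w_{i}^{⊺} ζ_{k})$ in Eq. 45 is a product of an entry of `M @ V` and an entry of `W @ zeta`.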

4.1 Hodgkin–Huxley Potassium Channel

The potassium channel state graph in the Hodgkin–Huxley model is a 5-state chain with one conducting state. Following the tau-leaping construction (Appendix B) we consider a stationary OU process $X (t) \in R^{5}$ , with linear measurement functional $M = {[0, 0, 0, 0, 1]}^{⊺}$ . See Fig. 9 for an illustration of this channel. The corresponding (weighted) adjacency matrix A is

A = (\begin{array}{c} 0 & 4 α_{n} (V) & 0 & 0 & 0 \\ β_{n} (V) & 0 & 3 α_{n} (V) & 0 & 0 \\ 0 & 2 β_{n} (V) & 0 & 2 α_{n} (V) & 0 \\ 0 & 0 & 3 β_{n} (V) & 0 & α_{n} (V) \\ 0 & 0 & 0 & 4 β_{n} (V) & 0 \end{array}),

(46)

which is evidently not symmetric. The voltage-dependent transition rates are given by

α_{n} (V) = \frac{0.01 (V + 55)}{1 - e^{(- 0.1 (V + 55))}},

(47)

β_{n} (V) = 0.125 e^{- (V + 65) / 80} .

(48)
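For numerical work, the rate in Eq. 47 needs care at $V = - 55$ , where numerator and denominator both vanish. The following Python sketch replaces the value there by its limit, 0.1. It assumes NumPy; the quantity `n_inf` (the equilibrium open probability of a single subunit) is introduced only for illustration.

```python
import numpy as np

def alpha_n(V):
    """Opening rate (Eq. 47); the 0/0 at V = -55 is replaced by its limit, 0.1."""
    x = np.asarray(V, dtype=float) + 55.0
    near_zero = np.abs(x) < 1e-9
    safe = np.where(near_zero, 1.0, x)   # placeholder value that avoids division by zero
    return np.where(near_zero, 0.1, 0.01 * safe / (1.0 - np.exp(-0.1 * safe)))

def beta_n(V):
    """Closing rate (Eq. 48)."""
    return 0.125 * np.exp(-(np.asarray(V, dtype=float) + 65.0) / 80.0)

V = np.linspace(-100.0, 100.0, 5)
n_inf = alpha_n(V) / (alpha_n(V) + beta_n(V))
print(n_inf)
```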

Then the graph Laplacian $L = {(A - D)}^{⊺}$ is voltage-dependent and is given by

L = (\begin{array}{c} - 4 α_{n} (V) & β_{n} (V) & 0 & 0 & 0 \\ 4 α_{n} (V) & - (β_{n} (V) + 3 α_{n} (V)) & 2 β_{n} (V) & 0 & 0 \\ 0 & 3 α_{n} (V) & - 2 (β_{n} (V) + α_{n} (V)) & 3 β_{n} (V) & 0 \\ 0 & 0 & 2 α_{n} (V) & - (3 β_{n} (V) + α_{n} (V)) & 4 β_{n} (V) \\ 0 & 0 & 0 & α_{n} (V) & - 4 β_{n} (V) \end{array}),

since the entries in the diagonal matrix D are the weighted out-degrees of each node for a given voltage V, i.e. $D_{i i} (V) = \sum_{j = 1}^{5} A_{i j} (V)$ . The matrix B is also voltage-dependent. Recall that the k th column of B corresponds to the k th reaction, and this can be written as $σ_{k} (V) ζ_{k}$ . If $r_{k}$ is the per capita rate of reaction k (transition from node $i (k)$ to $j (k)$ ), then $σ_{k} (V) = \sqrt{r_{k} (V) {\bar{N}}_{i} (V)}$ where ${\bar{N}}_{i} (V)$ is the average number of channels at state i at equilibrium for voltage V. Hence, B is given by

B = (\sqrt{r_{1} (V) {\bar{N}}_{i (1)} (V)} ζ_{1}, \dots, \sqrt{r_{k} (V) {\bar{N}}_{i (k)} (V)} ζ_{k}, \dots, \sqrt{r_{m} (V) {\bar{N}}_{i (m)} (V)} ζ_{m}) .

(49)

Figure 10 shows the edge importance $R_{k}$ as a function of voltage for each reaction $k \in {1, \dots, 8}$ in the potassium channel state graph. Note that since the process is at steady state, and respects detailed balance, the mean flux due to the two reactions connecting the same pair of nodes will be equal and opposite. Thus, in this case, $R_{1} = R_{2}$ , $R_{3} = R_{4}$ , $R_{5} = R_{6}$ , and $R_{7} = R_{8}$ . The blue curve ( $R_{7} = R_{8}$ ) corresponds to edges 7 and 8, the transitions between state 4 and conducting state 5, and has the largest edge importance value in the voltage range $[- 100, 100] mV$ . That is, neglecting the fluctuations of either or both of these reactions would contribute the most to the error.
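The edge importance values for the potassium channel can be evaluated directly from Eqs. 45–49. The Python sketch below does so for a single channel (total population set to 1, which rescales all $R_{k}$ equally). It assumes NumPy, real eigenvalues, and a reaction ordering in which reactions $2 i - 1$ and $2 i$ are the forward and backward transitions between states i and $i + 1$ ; this ordering and the function names are assumptions for illustration.

```python
import numpy as np

def alpha_n(V):
    return 0.01 * (V + 55.0) / (1.0 - np.exp(-0.1 * (V + 55.0)))   # Eq. 47, V != -55

def beta_n(V):
    return 0.125 * np.exp(-(V + 65.0) / 80.0)                      # Eq. 48

def potassium_edge_importance(V, n_channels=1.0):
    """Return R_1..R_8 (Eq. 45) for the 5-state chain at clamped voltage V."""
    a, b = alpha_n(V), beta_n(V)
    A = np.zeros((5, 5))
    for i in range(4):
        A[i, i + 1] = (4 - i) * a      # opening transitions (Eq. 46)
        A[i + 1, i] = (i + 1) * b      # closing transitions
    L = (A - np.diag(A.sum(axis=1))).T
    lam, Vr = np.linalg.eig(L)
    order = np.argsort(-lam.real)      # zero eigenvalue first
    lam, Vr = lam[order].real, Vr[:, order].real
    W = np.linalg.inv(Vr)              # rows are the left eigenvectors w_i^T
    pi = Vr[:, 0] / Vr[:, 0].sum()     # stationary distribution (null vector of L)
    M = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    Mv = (M @ Vr)[1:]
    kernel = -1.0 / (lam[1:, None] + lam[None, 1:])
    R = []
    for i in range(4):
        for src, dst in ((i, i + 1), (i + 1, i)):
            sigma2 = A[src, dst] * n_channels * pi[src]   # sigma_k^2 = r_k * mean occupancy
            c = Mv * (W[1:, dst] - W[1:, src])            # (M^T v_i)(w_i^T zeta_k)
            R.append(sigma2 * c @ kernel @ c)
    return np.array(R)

R = potassium_edge_importance(-65.0)
print(R)
```

Evaluating the function over a grid of voltages gives curves of the kind described for Fig. 10; the voltage $- 65$ used here is an arbitrary illustrative choice.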

Physically, it is the current rather than the state occupancy that holds the greatest interest. The current through a population of potassium channels with net conductance g is $I = g (V - V_{k})$ ; here $V_{k} = - 77 mV$ is the potassium reversal potential, and the conductance $g = g^{o} N_{o}$ is the product of the unitary or single channel conductance $g^{o}$ with the total number of channels in the open state, $N_{o}$ . The variance of the current is therefore ${(g^{o} (V - V_{k}))}^{2}$ times the variance of the occupancy number, meaning that near the reversal potential, the current can have low variance even if the channel state has high variance. For convenience we set $g^{o} = 1$ , which amounts to a change of nominal units for measuring the conductance.

Figure 11 shows the variance of the nominal current, $R_{k} * {(V - V_{k})}^{2}$ as a function of voltage V for each reaction k for the potassium channel. In addition to having the highest edge importance curve, the blue curve $R_{7} = R_{8}$ also has the highest variance (left panel). The right panel shows the probability of being in each state as a function of voltage.

4.2 Hodgkin–Huxley Sodium Channel

The sodium channel state graph in the Hodgkin–Huxley model consists of two linked 4-state chains, for a total of eight states, including one conducting state, and 20 reactions. Again following the tau-leaping construction (Appendix B) we consider a stationary OU process $X (t) \in R^{8}$ , with linear measurement functional $M = {[0, 0, 0, 0, 0, 0, 0, 1]}^{⊺}$ . See Fig. 12 for an illustration.

The adjacency matrix in this case is

A = (\begin{array}{c} 0 & 3 α_{m} (V) & 0 & 0 & α_{h} (V) & 0 & 0 & 0 \\ β_{m} (V) & 0 & 2 α_{m} (V) & 0 & 0 & α_{h} (V) & 0 & 0 \\ 0 & 2 β_{m} (V) & 0 & α_{m} (V) & 0 & 0 & α_{h} (V) & 0 \\ 0 & 0 & 3 β_{m} (V) & 0 & 0 & 0 & 0 & α_{h} (V) \\ β_{h} (V) & 0 & 0 & 0 & 0 & 3 α_{m} (V) & 0 & 0 \\ 0 & β_{h} (V) & 0 & 0 & β_{m} (V) & 0 & 2 α_{m} (V) & 0 \\ 0 & 0 & β_{h} (V) & 0 & 0 & 2 β_{m} (V) & 0 & α_{m} (V) \\ 0 & 0 & 0 & β_{h} (V) & 0 & 0 & 3 β_{m} (V) & 0 \end{array}),

(50)

where the voltage-dependent entries are defined by

α_{m} (V) = \frac{0.1 (V + 40)}{1 - e^{- (V + 40) / 10}}, β_{m} (V) = 4 e^{- (V + 65) / 18},

(51)

α_{h} (V) = 0.07 e^{- (V + 65) / 20}, β_{h} (V) = \frac{1}{1 + e^{- (V + 35) / 10}} .

(52)

The graph Laplacian $L = {(A - D)}^{⊺}$ is

L = (\begin{array}{cccccccc} - D_{11} (V) & β_{m} (V) & 0 & 0 & β_{h} (V) & 0 & 0 & 0 \\ 3 α_{m} (V) & - D_{22} (V) & 2 β_{m} (V) & 0 & 0 & β_{h} (V) & 0 & 0 \\ 0 & 2 α_{m} (V) & - D_{33} (V) & 3 β_{m} (V) & 0 & 0 & β_{h} (V) & 0 \\ 0 & 0 & α_{m} (V) & - D_{44} (V) & 0 & 0 & 0 & β_{h} (V) \\ α_{h} (V) & 0 & 0 & 0 & - D_{55} (V) & β_{m} (V) & 0 & 0 \\ 0 & α_{h} (V) & 0 & 0 & 3 α_{m} (V) & - D_{66} (V) & 2 β_{m} (V) & 0 \\ 0 & 0 & α_{h} (V) & 0 & 0 & 2 α_{m} (V) & - D_{77} (V) & 3 β_{m} (V) \\ 0 & 0 & 0 & α_{h} (V) & 0 & 0 & α_{m} (V) & - D_{88} (V) \end{array}),

where $D_{i i} (V) = \sum_{j = 1}^{8} A_{i j} (V)$ from the adjacency matrix above (Eq. 50). The matrix B is also voltage-dependent and is given by the general expression in Eq. 49.

Figure 13 shows the edge importance $R_{k}$ as a function of voltage for each reaction $k \in {1, \dots, 20}$ for the sodium channel state graph. The sodium channel also satisfies detailed balance, so each pair of complementary reactions $k_{i}$ , $k_{i + 1}$ connecting the same pair of nodes will have equal edge importance values $R_{k_{i}} = R_{k_{i + 1}}$ . The magenta curve corresponds to edges 11 and 12 and the yellow curve corresponds to edges 19 and 20, which are the transitions between state 7 and conducting state 8, and the transitions between state 4 and conducting state 8, respectively. Note that $R_{11} = R_{12} > R_{k}$ (magenta) for all other reactions k in the voltage range $[- 100, - 25] mV$ and then it switches so that $R_{19} = R_{20} > R_{k}$ (yellow) for all other reactions k in the range $[- 25, 100] mV$ . This means that, within the corresponding voltage range, neglecting the fluctuations of any of these four reactions would contribute the most to the error.
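The same calculation can be sketched for the sodium channel. The Python code below builds the adjacency matrix of Eq. 50, evaluates Eq. 45 for all 20 directed edges with $σ_{k}^{2}$ taken as the per capita rate times the stationary occupancy of the source state (a single channel), and reports the most important edge at two clamped voltages. NumPy, the function names, the labeling of edges by (source, target) state pairs, and the two voltages are assumptions made for illustration.

```python
import numpy as np

def rates(V):
    am = 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))   # Eq. 51, V != -40
    bm = 4.0 * np.exp(-(V + 65.0) / 18.0)
    ah = 0.07 * np.exp(-(V + 65.0) / 20.0)                       # Eq. 52
    bh = 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
    return am, bm, ah, bh

def sodium_adjacency(V):
    """8 x 8 adjacency matrix of Eq. 50 (0-based indices; state 8 in the text is index 7)."""
    am, bm, ah, bh = rates(V)
    A = np.zeros((8, 8))
    for row in (0, 4):                 # the two linked 4-state chains
        for i in range(3):
            A[row + i, row + i + 1] = (3 - i) * am
            A[row + i + 1, row + i] = (i + 1) * bm
    for i in range(4):                 # transitions linking the two chains
        A[i, i + 4] = ah
        A[i + 4, i] = bh
    return A

def edge_importance(A, M):
    """R_k of Eq. 45 for every directed edge, keyed by 1-based (source, target)."""
    L = (A - np.diag(A.sum(axis=1))).T
    lam, Vr = np.linalg.eig(L)
    order = np.argsort(-lam.real)      # zero eigenvalue first; assumes real eigenvalues
    lam, Vr = lam[order].real, Vr[:, order].real
    W = np.linalg.inv(Vr)              # rows are the left eigenvectors w_i^T
    pi = Vr[:, 0] / Vr[:, 0].sum()     # stationary distribution
    Mv = (M @ Vr)[1:]
    kernel = -1.0 / (lam[1:, None] + lam[None, 1:])
    R = {}
    for src, dst in np.argwhere(A > 0):
        c = Mv * (W[1:, dst] - W[1:, src])     # (M^T v_i)(w_i^T zeta_k)
        R[(int(src) + 1, int(dst) + 1)] = A[src, dst] * pi[src] * c @ kernel @ c
    return R

M = np.zeros(8)
M[7] = 1.0                             # only state 8 conducts
top = {}
for V in (-65.0, 0.0):
    R = edge_importance(sodium_adjacency(V), M)
    top[V] = max(R, key=R.get)
print(top)
```

Sweeping V over $[- 100, 100]$ and plotting each entry of R gives curves of the kind described for Fig. 13, including the change in the dominant edge between low and high voltages.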

Figure 14 shows the variance of the nominal current $R_{k} * {(V - V_{k})}^{2}$ as a function of voltage V for each reaction k where $V_{k} = 45 mV$ is the reversal potential for the sodium channel. Again, we choose units for conductance such that the unitary channel conductance equals 1. As before, we see that the edges with the highest edge importance have the largest variance (left panel). The switch between the dominant curves (magenta vs. yellow) agrees with the switch in Fig. 13 which occurs at −25 mV. The right panel in Fig. 14 shows the probability of being in each state and how that changes with voltage.

In summary, our analysis fully supports the accuracy of Schmandt and Galán’s stochastic shielding algorithm for the Hodgkin–Huxley system, at least for the voltage clamped case that we consider. More significantly, our analysis allows one to calculate the relative importance of each transition in a network of first-order reactions, allowing a new quantitative basis for reduction of complexity of stochastic network models. In the case of a simple chain of states such as the Hodgkin–Huxley potassium channel, the rank ordering of transitions by importance $R_{k}$ is the same for all voltages. As shown in Fig. 13, however, for more complicated gating schemes, such as the Hodgkin–Huxley sodium channel, the rank ordering of transitions by importance can differ at different voltages.

For instance, the most important transition at subthreshold voltages ( $V ≲ - 40 mV$ ) is the transition connecting the $[m = (1, 1, 0), h = 1]$ state (state 7 in Fig. 12) to the $[m = (1, 1, 1), h = 1]$ state (state 8, the conducting state). This transition corresponds biophysically to the nonconducting-to-conducting transition that occurs via activation or deactivation [22], that is, the opening (or closing) of the last of three m-activation gates in the ion channel. It is significant that this transition is the most “important” for subthreshold voltages, because the activation transition is typically the last subthreshold event during spike generation.

On the other hand, at suprathreshold voltages the most important transition is that connecting the $[m = (1, 1, 1), h = 1]$ state (state 8) with the $[m = (1, 1, 1), h = 0]$ state (state 4). Biophysically, this transition corresponds to inactivation and deinactivation, or the closing (and opening) of the h-inactivation gate. During action potential generation this transition plays an essential role in terminating the voltage spike upstroke, and it is significant that it should be most “important” at suprathreshold voltages.

For more general channel schemes, and more elaborate stochastic processes in general, the identification of the relative quantitative importance of different transitions or edges to the observable behavior of the system is a powerful new tool for principled complexity reduction.

5 Discussion

In the ongoing race between growth of empirical data sets and growth of available computing power, conceptual understanding of complex dynamical systems can get left behind. Finding efficient lower-dimensional representations of high-dimensional systems, that accurately capture relevant aspects of system behavior, not only takes better advantage of computational resources, but can provide insights into the essential components of a system. Hence, there has been a significant effort in recent years to develop principled complexity reduction techniques for naturally occurring complex networks.

Schmandt and Galán [14] developed a method for efficient simulation of stochastic ion-channel gating in the membrane of a neuron. The random gating of ion channels provides an important class of biological processes which are naturally represented as Markov chains on graphs [33, 35]. The graphs in this case arise from the different configurations of ion-channel subunits or “gates”. Typically each state carries one of two functional labels: open or closed. This coarse-grained representation of the ion channel corresponds to a linear measurement functional, in the sense that current flowing through open channels can be measured experimentally, and individual ion channels typically exhibit binary all-or-none conductance. Schmandt and Galán implemented a novel form of coarse graining technique that ignores fluctuations between indistinguishable transitions (open-to-open or closed-to-closed) while preserving fluctuations between distinguishable states. In order to gain a deeper understanding of why their “stochastic shielding approximation” works so well, we analyzed it in the context of a multidimensional Ornstein–Uhlenbeck process on a variety of networks. First, we showed that this form of model reduction can be represented as a mapping from a many-dimensional sample space to a lower-dimensional sample space, rather than as a mapping from a many-node network to a few-node network, and that one can formulate the problem as a search for the optimal such mapping. Second, we showed that for the specific 3-state example presented in Schmandt and Galán’s paper, their approximation is indeed optimal in a specific sense. Third, we obtained a theoretical result showing that stochastic shielding works for an ensemble of random graphs with arbitrarily chosen binary measurement vectors, analogous to the identification of nodes as conducting versus nonconducting in ion-channel models. 
Finally, we evaluated the stochastic shielding approach for the graph representing the ion-channel states of the classical Hodgkin–Huxley model, and showed that this approach is optimal for a wide range of fixed voltages under “voltage clamped” conditions.

5.1 Relationship Between Different Levels of Modeling

The underlying description of Schmandt and Galán’s model [14] is given by the population process described in Sect. 2.1, a more general framework than the Ornstein–Uhlenbeck process that we study. The OU process connects to the population process via a tau-leaping approximation, as described in Appendix B. The tau-leaping method involves two key assumptions. First, assuming that the transition propensities $α_{i j (k)}$ do not change dramatically in an interval of length τ, we can approximate the number of transitions in each interval by a collection of independent Poisson processes. This approach is closely related to the framework of Schmandt and Galán, except that they use a binomial distribution instead of a multinomial distribution (see Appendix A). Second, if the expected number of occurrences of each reaction in time τ is sufficiently large (i.e., in the tens or hundreds), then it is reasonable to use a Gaussian approximation to the Poisson process. The resulting model comprises the standard chemical Langevin formulation, in which the size of the fluctuations associated with each transition is state dependent. These two constraints can always be satisfied by taking a sufficiently large number of individuals in the population. The Ornstein–Uhlenbeck process is obtained by linearizing about the mean field steady state distribution of the tau-leaping model (see Appendix B). The intensity of the noise terms is determined by the mean steady state occupancy of each state, resulting in a linear OU process. A technical obstacle to extending our results beyond the linear OU process setting is the lack of an explicit closed form expression for the stationary covariance of the population process analogous to Eq. 6. Although our analysis is limited to the OU process version of the system, it is reasonable to expect that stochastic shielding will apply more broadly.
For example, in the full population process one can decompose the fluxes in the model into a sum of a mean component and a mean zero fluctuating component. In this case, stochastic shielding amounts to setting the fluctuating component to zero while preserving the mean for those transitions connecting observationally equivalent states.

Limiting the investigation to voltage clamped conditions facilitated a more thorough mathematical analysis of the stochastic shielding approximation, but also restricted the biological applicability of the results. By approximating the population process with a closely related Ornstein–Uhlenbeck process we effectively linearized the system about a fixed point given by the mean field behavior. Therefore our analysis does not address important nonlinear dynamical behaviors arising in many physical and biological systems, such as noise driven transport between multiple quasiequilibria, fluctuation induced spiking in excitable systems (including noise induced spiking in nerve cells), or limit cycle oscillations (including regular spiking in nerve cells). On the one hand, we anticipate that transitions in a state graph corresponding to directly observable state changes, such as between conducting and nonconducting ion-channel states, will remain “important” under more general measures accounting for global, nonlinear behaviors. On the other hand, it is certainly possible that additional transitions may also become important with respect to more general measures, if the linear measurement vectors employed here fail to capture their contribution to global dynamics.

5.2 Broader Applications

The stochastic shielding approximation can be directly applied to various biological networks, not just ion-channel models. For instance, Lu et al. [13] describe a signal transduction network in which the phosphorylation and transport events are arranged with a ladder topology. The two sides of the ladder denote molecules in the nucleus and in the cytoplasm, respectively. On each side, there are $M + 1$ species having different levels of phosphorylation (see Fig. 1 of [13] for an illustration). This is a more elaborate Markov process than a simple ion-channel state model, but it can still be described with a binary measurement vector. The readout is 1 if the system is both in the nucleus and in a specific phosphorylated state, and 0 otherwise. The application of stochastic shielding to such a system is quite natural.

Another broad class of examples includes calcium-induced calcium release Markov models. Nguyen, Mathias and Smith [36] studied a stochastic automata network description of instantaneously coupled intracellular calcium channels which they derived from Markov models of single channel gating that include calcium activation, inactivation, or both. This high-dimensional system involves a large number of functional transitions; the transition probabilities of one channel depend on the local calcium concentration which is typically influenced in turn by the state of other channels in the population. Such models can easily become very high dimensional. For example, DeRemigio et al. [37] considered a discrete state continuous time Markov model of coupled calcium channels, taking explicit channel position into account, which yields up to 1.6 million distinct states. Similarly, in order to investigate the relationship between single-molecule stochastic events and whole-cell behavior, Skupin et al. [38] implemented a multiscale calcium signaling and spike generation model. Their model connects channel state transitions on a millisecond time scale with interspike interval fluctuations on the scale of tens of seconds, and involves a large number of chemical states. For systems of such complexity, any reduction of the complexity of the stochastic process by stochastic shielding will likely be advantageous, both for simulation and for analysis.

We have focused here on discrete state ion-channel models with binary measurement vectors. However, it is possible that some ion channels may have a richer than binary readout structure. For example, Catterall [39] provides structural evidence that activation of a bacterial sodium channel may possess multiple non-equivalent conducting states, raising the possibility that conductance could be graded rather than binary. As another example which could lead to graded measurement vectors, adaptive evolution can be represented as a random walk on a graph representing genomic variants connected by possible mutation routes [40, 41]. While the stochastic process representing the evolution of a human pathogen such as influenza may have an enormous number of degrees of freedom [42, 43], the dynamics of interest may comprise a smaller number of dimensions, such as a strain’s virulence or fitness, which may naturally be graded rather than discrete quantities.

Stochastic shielding in a modified form would still apply even if the measurement functional were graded continuously. As an example, consider an Erdös–Rényi random graph on n nodes with edge probability p, with graded measurement vector $M \in {[0, 1]}^{n}$ instead of binary $M \in \{0, 1\}^{n}$ . The left panel of Fig. 15 shows the edge importance distribution for the case $n = 50$ and $p = 0.5$ where the components of M are chosen uniformly at random from the unit interval. The right panel of Fig. 15 illustrates the difference in measurement between nodes connected by edge k, $x = | M^{⊺} ζ_{k} |$ , versus the edge importance $R_{k}$ , and shows good agreement with the curve $y \approx x^{2} / n$ for the case $n = 50$ .

This empirical result (Fig. 15, right panel) suggests the following generalization of Theorem 2:

E [R_{k} | M] = \frac{σ_{k}^{2} {(M^{⊺} ζ_{k})}^{2}}{n C} + O (n^{- q}), as n \to \infty,

(53)

for some $q > 1$ (e.g., $q = 2$ for the Gaussian unitary ensemble, and $q \approx 5 / 3$ for the Erdös–Rényi ensemble, empirically). In the case of a binary measurement vector, $M \in \{0, 1\}^{n}$ , this formula would revert to the result given in Theorem 2. A rigorous derivation of Eq. 53 is beyond the scope of the present paper.
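To make the binary special case explicit, the following short calculation may help. It assumes, as in Eq. 84, that the stoichiometry vector $ζ_{k}$ has entry $- 1$ at the source node $i (k)$, entry $+ 1$ at the destination node $j (k)$, and zeros elsewhere.

```latex
M^{\intercal} \zeta_{k} = M_{j(k)} - M_{i(k)},
\qquad
\left( M^{\intercal} \zeta_{k} \right)^{2} =
\begin{cases}
1 & \text{if } M_{i(k)} \neq M_{j(k)}, \\
0 & \text{if } M_{i(k)} = M_{j(k)},
\end{cases}
\qquad \text{for } M \in \{0, 1\}^{n} .
```

Under this assumption the leading term of Eq. 53 equals $σ_{k}^{2} / (n C)$ on edges joining states with different measurement values and vanishes on all other edges, which is the qualitative behavior expected from stochastic shielding with a binary readout.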

The behavior of stochastic processes arising in first-order reaction networks has been explored in broad generality by Gadgil, Lee and Othmer [44]. They used a spectral approach to analyze a general system of first-order reaction networks, and studied the effect of changes in the network topology on the distribution of the number of reactant molecules, as well as the difference between conversion and catalytic networks with the same topology. Exploring sample space reductions conditioned on a linear measurement functional for such general classes of networks would be of interest.

5.3 Different Levels of Model Simplification

Model simplification is an important goal for Markov chain models in many scientific contexts, and complexity reduction has been pursued through a corresponding variety of approaches. Newman and others have extensively developed techniques based on community structure, aggregating or lumping nodes together based on topological considerations [45, 46]. When applied to a stochastic process on a graph, the aggregation of $N ≫ n$ to n nodes is equivalent to a projection of the original process onto a subspace in which the process components on the aggregated fine-grained nodes are averaged. In most cases, the resulting coarsened process is no longer Markov, although in some cases exact dimension reduction to a lower-dimensional Markov process can be accomplished [47–49]. Other aggregation schemes, such as spectral coarse graining [50–52], have been proposed based on the spectral properties of the graph Laplacian. Approaches based on topological or abstract spectral properties do not necessarily take into account functional properties of the system to be simplified. Because stochastic shielding simplifies the representation of a stochastic process taking into account the function of the system, namely by distinguishing conducting versus nonconducting ion-channel states, it may provide insights not afforded by graph aggregation based on modularity or graph spectra.

As another example of simplification based on functional properties, Bruno, Yang and Pearson [53] used independent open-closed transitions to describe a canonical form that can express all possible reaction schemes for binary ion channels.

Not all prior approaches to simplification of random processes on graphs proceed by aggregating nodes. For instance, Ullah, Bruno and Pearson [54] proposed model simplification by the elimination of nodes with low equilibrium occupancy probability using time scale separation arguments. The reduced system has fewer parameters, and the dynamics of the reduced system are identical to those of the original system except on very fast time scales. Other simplifications based on graph sparsification have been proposed by Koutis, Levin and Peng [55].

In this paper we have investigated a novel form of simplification of stochastic processes on graphs. Stochastic shielding is based on replacing a high-dimensional stochastic process defined on a graph with a lower-dimensional process on the same graph, rather than replacing a complex network with a simpler one. Specifically, we consider mappings from the original process to an approximate process defined on a significantly smaller sample space. In one sense, we can think of the full and a reduced system as two systems with partially shared stochastic input, and partially independent stochastic input of different magnitudes (magnitude zero, in one case). Structurally, this situation is analogous to the kind of mixed common-noise and independent-noise scenario studied in the context of neuronal synchronization [56–58]. In another sense, stochastic shielding can be seen as a different kind of projection, vs. that induced by lumping or pruning nodes. The latter methods simplify the graph, whereas stochastic shielding leaves the graph unchanged and simplifies the sample space on which the approximate process lives.

Appendix A: Stochastic Shielding Construction of Schmandt and Galán

In [14], Schmandt and Galán considered discrete time simulations approximating a continuous time, finite state Markov chain

N_{i} (t) = N_{i} (0) + \sum_{j \neq i} ({\tilde{N}}_{j i} (t) - {\tilde{N}}_{i j} (t)),

(54)

where $N_{i} (t)$ is the number of individuals in a population (of size $N_{tot}$ ) in state i at time t, and ${\tilde{N}}_{i j} (t)$ counts the number of $i \to j$ transitions that have occurred as of time t. The transition counts ${\tilde{N}}_{i j} (t)$ may be written using the random time change representation [17] as

{\tilde{N}}_{i j} (t) = Y_{i j} [\int_{s = 0}^{t} N_{i} (s) α_{i j} (s) d s] .

(55)

By convention we take ${\tilde{N}}_{i i} (t) \equiv 0$ and $α_{i i} (t) \equiv 0$ . The $Y_{i j}$ are independent unit rate Poisson processes driving the different state-to-state transitions. The transition from state i to state j occurs with per capita rate $α_{i j}$ . In a conductance-based model, such as a discrete stochastic version of the Hodgkin–Huxley equations, the vector $(N_{1} (t), \dots, N_{K} (t))$ would represent the number of ion channels in each of K distinct states, and the transition rates could vary with time, e.g. through dependence on membrane potential or second messenger concentration. Although Schmandt and Galán consider both the stationary and time-varying case, we restrict attention to the stationary case, which corresponds experimentally to a voltage clamped preparation.

One may (approximately) simulate trajectories of the Markov chain using a discrete time step approach. Following [14], we fix a time step $h > 0$ and define $N_{i j}$ as $N_{i j} (t) = {\tilde{N}}_{i j} (t + h) - {\tilde{N}}_{i j} (t)$ , that is, the number of $i \to j$ transitions occurring in the interval $(t, t + h]$ . The net increments in the state-occupancy numbers $N_{i}$ are then given by

Δ_{i} (t) \equiv N_{i} (t + h) - N_{i} (t) = \sum_{j \neq i} (N_{j i} (t) - N_{i j} (t)) .

(56)

To obtain a practical algorithm, Schmandt and Galán set $N_{i j} (t) \sim Binom [N_{i} (t), α_{i j} (t) h]$ . Since there is then a finite probability that $N_{i} (t + h) < 0$ , one must include an iterative resampling scheme to force $N_{i} (t + h) \geq 0$ . As an alternative, we consider instead a multinomial representation of the destinations of all $N_{i} (t)$ individuals beginning the time step at node i. That is, for each i, $1 \leq i \leq K$ , we set

\begin{aligned} & (N_{i 1}, \dots, N_{i i}, \dots, N_{i K}) \\ & \quad \sim Multi [N_{i} (t), (α_{i 1} h, \dots, (1 - \sum_{j \neq i} α_{i j} h), \dots, α_{i K} h)] . \end{aligned}

(57)

The multinomial distribution produces an integer-valued random vector with mean and marginal distributions the same as that given by the binomial distribution; the only difference is that transitions emanating from a common node are not assumed to be independent.
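The multinomial step of Eq. 57 can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in [14]; the function name `multinomial_step`, the dictionary layout of the rates, and the per-individual sampling loop are assumptions made for the example. Because every individual is assigned exactly one destination (or stays put), the occupancy of node i can never become negative, so no resampling is needed.

```python
import random


def multinomial_step(n_i, rates_out, h, rng):
    """Sample destinations for n_i individuals at node i over one step of length h.

    rates_out maps each destination j != i to the per capita rate alpha_ij.
    Returns (moves, stay): moves[j] counts i -> j transitions, stay counts
    individuals that remain at node i (the N_ii entry of Eq. 57).
    """
    probs = {j: alpha * h for j, alpha in rates_out.items()}
    if sum(probs.values()) > 1.0:
        raise ValueError("time step h too large: leaving probability exceeds 1")
    moves = {j: 0 for j in rates_out}
    stay = 0
    for _ in range(n_i):
        u = rng.random()
        acc = 0.0
        for j, p in probs.items():
            acc += p
            if u < acc:
                moves[j] += 1
                break
        else:
            # u fell beyond all exit probabilities: the individual stays at i
            stay += 1
    return moves, stay
```

A call such as `multinomial_step(1000, {1: 2.0, 3: 1.0}, 0.01, random.Random(0))` returns counts that always sum to 1000, in contrast to independent binomial draws for each exit.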

The first and second moments arising from the multinomial transition distribution are

E [N_{i j} | \vec{N} (t)] = N_{i} (t) α_{i j} h, for i \neq j,

(58)

E [N_{i i} | \vec{N} (t)] = N_{i} (t) (1 - \sum_{j \neq i} α_{i j} h) = N_{i} (t) - \sum_{j \neq i} E [N_{i j}],

(59)

V [N_{i j} | \vec{N} (t)] = N_{i} (t) α_{i j} h (1 - α_{i j} h), for i \neq j,

(60)

V [N_{i i} | \vec{N} (t)] = N_{i} (t) (\sum_{j \neq i} α_{i j} h) (1 - \sum_{j \neq i} α_{i j} h),

(61)

Cov [N_{i j}, N_{i j^{'}} | \vec{N} (t)] = - N_{i} (t) α_{i j} α_{i j^{'}} h^{2}, for j \neq j^{'}, j \neq i, j^{'} \neq i,

(62)

Cov [N_{i j}, N_{i i} | \vec{N} (t)] = - N_{i} (t) α_{i j} h (1 - \sum_{j^{'} \neq i} α_{i j^{'}} h), for j \neq i .

(63)

Here all expectations are conditioned on the current state of the system,

\vec{N} (t) = (N_{1} (t), \dots, N_{i} (t), \dots, N_{K} (t)) .
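As a sanity check on Eqs. 58–63, the moments can be computed exactly for a small population by enumerating all multinomial outcomes. The sketch below assumes a node with exactly two exits, with hypothetical exit probabilities `p1` $= α_{i j} h$ and `p2` $= α_{i j^{'}} h$; the function name and the returned dictionary keys are illustrative choices.

```python
from math import factorial


def multinomial_moments(n, p1, p2):
    """Exact moments of (N_ij, N_ij', N_ii) for n individuals by enumeration.

    p1 and p2 are the two exit probabilities; the probability of staying
    at node i is 1 - p1 - p2.
    """
    p0 = 1.0 - p1 - p2
    outcomes = []
    for a in range(n + 1):
        for b in range(n + 1 - a):
            c = n - a - b
            ways = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
            outcomes.append((a, b, c, ways * p1**a * p2**b * p0**c))

    def expect(f):
        return sum(prob * f(a, b, c) for a, b, c, prob in outcomes)

    m1 = expect(lambda a, b, c: a)
    m2 = expect(lambda a, b, c: b)
    m0 = expect(lambda a, b, c: c)
    return {
        "mean_ij": m1,
        "mean_ii": m0,
        "var_ij": expect(lambda a, b, c: (a - m1) ** 2),
        "var_ii": expect(lambda a, b, c: (c - m0) ** 2),
        "cov_ij_ijp": expect(lambda a, b, c: (a - m1) * (b - m2)),
        "cov_ij_ii": expect(lambda a, b, c: (a - m1) * (c - m0)),
    }
```

Comparing the returned values with $N_{i} α_{i j} h (1 - α_{i j} h)$, $- N_{i} α_{i j} α_{i j^{'}} h^{2}$, and the other closed forms above should give agreement to floating-point precision.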

The mean increment given the current distribution of the population, ${\bar{Δ}}_{i} (t) \equiv E [Δ_{i} (t) | \vec{N} (t)]$ , is written in terms of the mean transitions as

\begin{aligned} {\bar{Δ}}_{i} (t) & = \sum_{j \neq i} (E [N_{j i} (t) | N_{j} (t)] - E [N_{i j} (t) | N_{i} (t)]) \\ & = \sum_{j \neq i} (N_{j} (t) α_{j i} h - N_{i} (t) α_{i j} h) . \end{aligned}

(64)

The deviation of the actual net increment in the occupancy of state i from its expected value is

\begin{aligned} δ Δ_{i} (t) & \equiv Δ_{i} (t) - {\bar{Δ}}_{i} (t) \\ & = \sum_{j \neq i} ((N_{j i} (t) - N_{j} (t) α_{j i} h) - (N_{i j} (t) - N_{i} (t) α_{i j} h)) \\ & = \sum_{j \neq i} (δ N_{j i} (t) - δ N_{i j} (t)), \end{aligned}

(65)

where $δ N_{i j} (t) = N_{i j} (t) - E [N_{i j} (t) | \vec{N} (t)]$ is the deviation of the number of $i \to j$ transitions from the expected number. The mean of $δ N_{i j} (t)$ is zero for all i, j, and all t, by construction. The stochastic shielding approximation amounts to setting $δ N_{i j} (t)$ to zero for selected $i \to j$ transitions, namely for those transitions between “unobservable states”, or (equivalently) between any two states with the same value of the measurement observable, i.e. the conductance. Since $E [δ N_{i j} (t) | \vec{N} (t)] = 0$ already, the only error introduced by suppressing the fluctuations associated with the $i \to j$ transition comes from the propagation of the fluctuations through the network to the observable states. But the fluctuations in the transitions, $N_{i j}$ , are only weakly correlated with the fluctuations in the occupancy numbers of observable states, $N_{k} (t)$ , when i and j have the same conductance. To introduce this shielding effect, Schmandt and Galán calculate the second moments for the population increments $δ Δ_{i} (t)$ . As an example, in the three node case, for the multinomial transition model, the variances are given by

\begin{array}{rcl} E [δ Δ_{1}^{2} (t) | \vec{N} (t)] & = & V [N_{12} | \vec{N} (t)] + V [N_{21} | \vec{N} (t)] \\ & = & N_{1} (t) α_{12} h (1 - α_{12} h) + N_{2} (t) α_{21} h (1 - α_{21} h), \end{array}

(66)

\begin{array}{rcl} E [δ Δ_{2}^{2} (t) | \vec{N} (t)] & = & V [N_{12} | \vec{N} (t)] + V [N_{21} | \vec{N} (t)] + V [N_{23} | \vec{N} (t)] + V [N_{32} | \vec{N} (t)] \\ & & + 2 Cov [N_{21}, N_{23} | \vec{N} (t)] \end{array}

(67)

\begin{array}{rcl} & = & N_{1} (t) α_{12} h (1 - α_{12} h) + N_{2} (t) α_{21} h (1 - α_{21} h) \\ & & + N_{2} (t) α_{23} h (1 - α_{23} h) + N_{3} (t) α_{32} h (1 - α_{32} h) \\ & & - 2 N_{2} (t) α_{21} α_{23} h^{2}, \end{array}

(68)

\begin{array}{rcl} E [δ Δ_{3}^{2} (t) | \vec{N} (t)] & = & V [N_{23} | \vec{N} (t)] + V [N_{32} | \vec{N} (t)] \\ & = & N_{2} (t) α_{23} h (1 - α_{23} h) + N_{3} (t) α_{32} h (1 - α_{32} h), \end{array}

(69)

and the covariances are given by

\begin{array}{rcl} E [δ Δ_{1} (t) δ Δ_{2} (t) | \vec{N} (t)] & = & - V [N_{12} | \vec{N} (t)] - V [N_{21} | \vec{N} (t)] - Cov [N_{21}, N_{23} | \vec{N} (t)] \\ & = & - N_{1} α_{12} h (1 - α_{12} h) - N_{2} α_{21} h (1 - α_{21} h) \\ & & + N_{2} α_{21} α_{23} h^{2}, \end{array}

(70)

E [δ Δ_{1} (t) δ Δ_{3} (t) | \vec{N} (t)] = Cov [N_{21}, N_{23} | \vec{N} (t)] = - N_{2} (t) α_{21} α_{23} h^{2},

(71)

\begin{array}{rcl} E [δ Δ_{2} (t) δ Δ_{3} (t) | \vec{N} (t)] & = & - V [N_{23} | \vec{N} (t)] - V [N_{32} | \vec{N} (t)] - Cov [N_{21}, N_{23} | \vec{N} (t)] \\ & = & - N_{2} α_{23} h (1 - α_{23} h) - N_{3} α_{32} h (1 - α_{32} h) \\ & & + N_{2} α_{21} α_{23} h^{2} . \end{array}

(72)
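Because every transition removes one individual from one state and adds it to another, the increments satisfy $\sum_{i} Δ_{i} (t) = 0$, so each row of the covariance matrix of $δ Δ$ must sum to zero. The following sketch assembles the matrix from Eqs. 66–72 for the three node case and makes this check easy to run; the function name and argument order are illustrative assumptions.

```python
def increment_covariance(N, a12, a21, a23, a32, h):
    """3x3 conditional covariance matrix of (dDelta_1, dDelta_2, dDelta_3).

    N = (N1, N2, N3) is the current occupancy; a_ij are per capita rates
    and h is the time step, following the multinomial model of Eq. 57.
    """
    N1, N2, N3 = N
    v12 = N1 * a12 * h * (1 - a12 * h)
    v21 = N2 * a21 * h * (1 - a21 * h)
    v23 = N2 * a23 * h * (1 - a23 * h)
    v32 = N3 * a32 * h * (1 - a32 * h)
    # Covariance between the two exits of node 2 (Eq. 62)
    c = -N2 * a21 * a23 * h * h
    return [
        [v12 + v21, -v12 - v21 - c, c],
        [-v12 - v21 - c, v12 + v21 + v23 + v32 + 2 * c, -v23 - v32 - c],
        [c, -v23 - v32 - c, v23 + v32],
    ]
```

The only entry that differs in sign pattern from an independent-binomial model is the (1, 3) entry, which here equals $- N_{2} α_{21} α_{23} h^{2}$ rather than zero.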

Schmandt and Galán obtain similar expressions that agree up to order $O (h)$ ; the difference between the binomial and multinomial expressions only appears in the $O (h^{2})$ terms. For example, they assert that $E [δ Δ_{1} (t) δ Δ_{3} (t) | \vec{N} (t)] \equiv 0$ , while under the multinomial model this covariance is equal to $- N_{2} (t) α_{21} α_{23} h^{2}$ . Fortunately, this difference does not undermine the main argument.

From this point, Schmandt and Galán obtain an expression for the stationary covariance matrix of the reduced process (compare Eq. (8) in [14] with our Lemma 1) and decompose the covariance into a sum over direct and indirect connections to a single conducting or observable state. This situation corresponds, in our analysis, to the case where the measurement vector M contains a single non-zero entry. Schmandt and Galán argue that suppressing the fluctuations associated with transitions not directly affecting the observable state decreases their contribution to the variance of the observable state occupancy, while increasing the contribution of the direct transitions to the same variance. In addition, they show through numerical comparisons that the Hodgkin–Huxley equations driven by the full Markov process and by the reduced process are practically indistinguishable both under voltage clamp (stationary transition rates) and current clamp (time-varying transition rates) conditions.

Appendix B: Derivation of Tau-Leaping for an Arbitrary Finite Graph

B.1 Tau-Leaping: General Case

We will use standard tau-leaping arguments [59–61] to derive the multidimensional Ornstein–Uhlenbeck process in Sect. 2.2 (Eq. 4). Given a symmetric directed graph $G = (V, E)$ with n nodes, let $N (t) \in \mathbb{N}^{n}$ be the population process (Markov jump process) representing the number of individuals in each of n states at time t. Let $N_{tot} \geq 1$ be the total number of individuals in the system. Recall the random time change representation in terms of Poisson processes given in Eq. 2:

N (t) = N (0) + \sum_{k \in E} ζ_{k} Y_{k} (\int_{0}^{t} α_{k} N_{i (k)} (s) d s) .

(73)

Each $Y_{k}$ is an independent unit rate Poisson process counting the occurrence of reaction k (transition from state $i (k)$ to $j (k)$ ); $α_{k}$ is the per capita transition rate of reaction k; $N_{i (k)} (s)$ is the number of individuals at state $i (k)$ at time s, and $ζ_{k}$ is the stoichiometry vector for reaction k. For simplicity, we will suppress “k” in our notation so that state i means state $i (k)$ .

In the case $N_{tot} = 1$ , let $p_{i} (t) = P (X (t) = i)$ be the probability that a single random walker occupies state i at time t. Clearly, $\sum_{i = 1}^{n} p_{i} (t) = 1$ for each t. The time evolution of the probability vector $p (t) = [p_{1} (t), \dots, p_{n} (t)]$ is given by the following master equation

\frac{d p}{d t} = p L,

(74)

where

L = - \sum_{k \in E^{*}} α_{k} ζ_{k} ζ_{k}^{⊺}

(75)

is the graph Laplacian which can be represented as the sum over all undirected edges (denoted by the set $E^{*}$ ) given in Eq. 75.

Let π represent the steady state distribution, i.e. the row vector satisfying $π L = 0$ with entries such that $\sum_{i = 1}^{n} π_{i} = 1$ . Suppose we represent $N (t)$ as the deviation from its mean, $\bar{N} = π N_{tot}$ , so that $N (t) = \bar{N} + X (t)$ , where $X (t)$ is a mean zero stochastic process. Then

X (t) = N (t) - \bar{N}

(76)

= N (0) - \bar{N} + \sum_{k \in E} ζ_{k} Y_{k} (\int_{0}^{t} α_{k} N_{i} (s) d s)

(77)

= X (0) + \sum_{k \in E} ζ_{k} Y_{k} (\int_{0}^{t} α_{k} [\bar{N_{i}} + X_{i} (s)] d s)

(78)

= X (0) + \sum_{k \in E} ζ_{k} Y_{k} (t α_{k} \bar{N_{i}} + \int_{0}^{t} α_{k} X_{i} (s) d s),

(79)

since $N_{i} (s) = \bar{N_{i}} + X_{i} (s)$ and $α_{k}$ and $\bar{N_{i}}$ are constants.

Now following standard tau-leaping results [59–61],

\begin{array}{rcl} X (t + τ) - X (t) & = & \sum_{k \in E} ζ_{k} [Y_{k} ((t + τ) α_{k} \bar{N_{i}} + \int_{0}^{t + τ} α_{k} X_{i} (s) d s) \\ & & - Y_{k} (t α_{k} \bar{N_{i}} + \int_{0}^{t} α_{k} X_{i} (s) d s)] \\ & \approx & \sum_{k \in E} ζ_{k} {\tilde{Y}}_{k} (τ α_{k} \bar{N_{i}} + τ α_{k} X_{i} (t)) \end{array}

(80)

= \sum_{k \in E} ζ_{k} {\tilde{Y}}_{k} (τ α_{k} [\bar{N_{i}} + X_{i} (t)]),

(81)

which says that we can approximate Eq. 80 using an almost equivalent set of Poisson processes ${\tilde{Y}}_{k}$ where each ${\tilde{Y}}_{k}$ at time t is approximately Gaussian distributed with mean and variance $τ α_{k} [\bar{N_{i}} + X_{i} (t)]$ . Note that if X is a stationary irreducible Markov process on a finite state space, then the occupancy probability of state i satisfies $π_{i} > 0$ . By choosing $N_{tot} ≫ 1 / (\min_{i} \{π_{i}\})$ , we may guarantee that the mean population ${\bar{N}}_{i}$ for each i is as large as necessary for the Gaussian approximation to hold. Since we are assuming that $| X_{i} (t) | ≪ {\bar{N}}_{i}$ (uniformly in time), and since we want the noise amplitude to be independent of X, we further approximate

{\tilde{Y}}_{k} (τ α_{k} [\bar{N_{i}} + X_{i} (t)]) \approx N (τ α_{k} [\bar{N_{i}} + X_{i} (t)], τ α_{k} \bar{N_{i}})

(82)

by dropping the dependence of the variance on X.
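The quality of the Gaussian approximation to a Poisson count, as a function of the expected count, can be illustrated numerically. The sketch below measures the largest gap between the Poisson CDF with mean `lam` and the CDF of a normal distribution with the same mean and variance. The use of a half-integer continuity correction and the summation cutoff are choices made for this example, not part of the derivation above.

```python
from math import erf, exp, sqrt


def normal_cdf(x, mean, var):
    """CDF of a normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * var)))


def poisson_gauss_distance(lam):
    """Largest gap between the Poisson(lam) CDF and the N(lam, lam) CDF.

    The Gaussian CDF is evaluated at k + 0.5 (continuity correction), and
    the sum is truncated well beyond ten standard deviations above the mean.
    """
    k_max = int(lam + 10.0 * sqrt(lam) + 20.0)
    pmf = exp(-lam)  # P(Y = 0)
    cdf = 0.0
    worst = 0.0
    for k in range(k_max + 1):
        cdf += pmf
        worst = max(worst, abs(cdf - normal_cdf(k + 0.5, lam, lam)))
        pmf *= lam / (k + 1)  # recurrence P(Y = k + 1) = P(Y = k) * lam / (k + 1)
    return worst
```

Evaluating `poisson_gauss_distance` at expected counts of 1, 10, and 100 shows the gap shrinking as the mean grows, in line with the requirement that the expected number of events per step be in the tens or hundreds.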

Dividing by τ and taking the limit as $τ \to 0$ yields the SDE

d X = \sum_{k \in E} ζ_{k} ([{\bar{N}}_{i} + X_{i}] α_{k} d t + \sqrt{{\bar{N}}_{i} α_{k}} d W_{k}) .

(83)

Recalling that the k th reaction is from node $i (k)$ to $j (k)$ , the l th component of the k th reaction in the first term on the RHS of Eq. 83 can be written as

{[ζ_{k} ({\bar{N}}_{i} + X_{i}) α_{k} d t]}_{l} = \begin{cases} - ({\bar{N}}_{l} + X_{l}) α_{k} d t & if component l = i (k), \\ ({\bar{N}}_{i} + X_{i}) α_{k} d t & if component l = j (k), \\ 0 & otherwise. \end{cases}

(84)

Keeping track of components, we sum over the source and destination nodes for each reaction. Then for the l th component of X we have

d X_{l} = \sum_{i} ({\bar{N}}_{i} + X_{i}) α_{i l} d t - ({\bar{N}}_{l} + X_{l}) \sum_{j} α_{l j} d t

(85)

which yields

d X = (\bar{N} + X) Q d t, where {(Q)}_{i j} = \begin{cases} α_{i j} & if i \neq j, \\ - \sum_{j^{'} \neq i} α_{i j^{'}} & if i = j, \end{cases}

(86)

where Q is the generator matrix. Note that we changed notation slightly to illustrate that $α_{i j}$ is the transition rate from state i to j rather than indexing by reaction k. The graph Laplacian we consider in Eq. 4 is actually $L = Q^{⊺}$ so we have $d X = L (\bar{N} + X) d t$ . Since $\bar{N}$ is proportional to the stationary distribution π, we have that $L \bar{N} = 0$ , and hence the first term in the SDE is $d X = L X d t$ .

Now the second term in the RHS of Eq. 83 can be written as

{[ζ_{k} \sqrt{{\bar{N}}_{i} α_{k}} d W_{k}]}_{l} = \begin{cases} - \sqrt{{\bar{N}}_{l} α_{k}} d W_{k} & if component l = i (k), \\ \sqrt{{\bar{N}}_{i} α_{k}} d W_{k} & if component l = j (k), \\ 0 & otherwise. \end{cases}

(87)

Keeping track of components, here we sum over all m reactions to find

d X = (\begin{array}{c} \sqrt{{\bar{N}}_{i (1)} α_{1}} ζ_{1}, \sqrt{{\bar{N}}_{i (2)} α_{2}} ζ_{2}, \dots, \sqrt{{\bar{N}}_{i (m)} α_{m}} ζ_{m} \end{array}) (\begin{array}{c} d W_{1} \\ d W_{2} \\ ⋮ \\ d W_{m} \end{array})

(88)

= B d W,

(89)

where $σ_{k} = \sqrt{{\bar{N}}_{i (k)} α_{k}}$ in the definition of matrix B.

Therefore, putting the first and second terms together, we have derived the OU process $d X = L X d t + B d W$ given in Eq. 4.
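A minimal Euler–Maruyama sketch of the resulting OU process, built edge by edge from Eq. 83 with the stationary drift term $L \bar{N} = 0$ dropped, is given below. The function name, the edge-list format, and the `shielded` argument (a set of edge indices whose noise amplitude is set to zero, mimicking stochastic shielding) are assumptions made for illustration; the mean occupancies `n_bar` are taken as given rather than computed.

```python
import math
import random


def simulate_ou(edges, n_bar, shielded, dt, steps, rng):
    """Euler-Maruyama integration of dX = L X dt + B dW, assembled per edge.

    edges: list of (i, j, alpha) directed reactions i -> j (0-based nodes).
    n_bar: mean occupancy of each node, assumed to be stationary.
    shielded: set of edge indices whose noise term is suppressed.
    Returns the list of states X(0), X(dt), ..., X(steps * dt).
    """
    n = len(n_bar)
    x = [0.0] * n
    path = [list(x)]
    for _ in range(steps):
        dx = [0.0] * n
        for k, (i, j, alpha) in enumerate(edges):
            # Drift contribution of reaction k: alpha_k * X_i * dt
            flux = alpha * x[i] * dt
            if k not in shielded:
                # Noise amplitude sigma_k = sqrt(n_bar_i * alpha_k)
                sigma = math.sqrt(n_bar[i] * alpha)
                flux += sigma * rng.gauss(0.0, math.sqrt(dt))
            # Stoichiometry: -1 at the source node, +1 at the destination
            dx[i] -= flux
            dx[j] += flux
        x = [xi + di for xi, di in zip(x, dx)]
        path.append(list(x))
    return path
```

Since each reaction moves the same amount out of its source and into its destination, the components of X sum to zero at every step regardless of which edges are shielded, mirroring conservation of the total population.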

B.2 Tau-Leaping: 3-State Example

Here we will explicitly derive the OU process from the population process given in Sect. 2.1 by using the tau-leaping argument above for the 3-state example in Sect. 2.3. We have $N (t) \in \mathbb{N}^{3}$ and by Eq. 73,

N_{1} (t) = N_{1} (0) - Y_{1} [\int_{0}^{t} N_{1} (s) α_{1} d s] + Y_{2} [\int_{0}^{t} N_{2} (s) α_{2} d s],

(90)

\begin{array}{rcl} N_{2} (t) & = & N_{2} (0) + Y_{1} [\int_{0}^{t} N_{1} (s) α_{1} d s] - Y_{2} [\int_{0}^{t} N_{2} (s) α_{2} d s] \\ & & - Y_{3} [\int_{0}^{t} N_{2} (s) α_{3} d s] + Y_{4} [\int_{0}^{t} N_{3} (s) α_{4} d s], \end{array}

(91)

N_{3} (t) = N_{3} (0) + Y_{3} [\int_{0}^{t} N_{2} (s) α_{3} d s] - Y_{4} [\int_{0}^{t} N_{3} (s) α_{4} d s],

(92)

following the notation given in Sect. 2.3, specifically the labeling of reactions given in Table 1. Note that $α_{k}$ could be time dependent $α_{k} (t)$ .

The tau-leaping approximation above gives

\begin{array}{rcl} X_{1} (t) & = & X_{1} (0) - \int_{0}^{t} X_{1} (s) α_{1} d s - \int_{0}^{t} \sqrt{X_{1} (s) α_{1}} d W_{1} (s) \\ & & + \int_{0}^{t} X_{2} (s) α_{2} d s + \int_{0}^{t} \sqrt{X_{2} (s) α_{2}} d W_{2} (s), \end{array}

(93)

\begin{array}{rcl} X_{2} (t) & = & X_{2} (0) + \int_{0}^{t} X_{1} (s) α_{1} d s + \int_{0}^{t} \sqrt{X_{1} (s) α_{1}} d W_{1} (s) \\ & & - \int_{0}^{t} X_{2} (s) α_{2} d s - \int_{0}^{t} \sqrt{X_{2} (s) α_{2}} d W_{2} (s) \\ & & - \int_{0}^{t} X_{2} (s) α_{3} d s - \int_{0}^{t} \sqrt{X_{2} (s) α_{3}} d W_{3} (s) \\ & & + \int_{0}^{t} X_{3} (s) α_{4} d s + \int_{0}^{t} \sqrt{X_{3} (s) α_{4}} d W_{4} (s), \end{array}

(94)

\begin{array}{rcl} X_{3} (t) & = & X_{3} (0) + \int_{0}^{t} X_{2} (s) α_{3} d s + \int_{0}^{t} \sqrt{X_{2} (s) α_{3}} d W_{3} (s) \\ & & - \int_{0}^{t} X_{3} (s) α_{4} d s - \int_{0}^{t} \sqrt{X_{3} (s) α_{4}} d W_{4} (s) . \end{array}

(95)

Equivalently, we could write these integral equations in differential form

d X_{1} = - X_{1} α_{1} d t - \sqrt{X_{1} α_{1}} d W_{1} + X_{2} α_{2} d t + \sqrt{X_{2} α_{2}} d W_{2},

(96)

\begin{array}{rcl} d X_{2} & = & X_{1} α_{1} d t + \sqrt{X_{1} α_{1}} d W_{1} - X_{2} α_{2} d t - \sqrt{X_{2} α_{2}} d W_{2} \\ & & - X_{2} α_{3} d t - \sqrt{X_{2} α_{3}} d W_{3} + X_{3} α_{4} d t + \sqrt{X_{3} α_{4}} d W_{4}, \end{array}

(97)

d X_{3} = X_{2} α_{3} d t + \sqrt{X_{2} α_{3}} d W_{3} - X_{3} α_{4} d t - \sqrt{X_{3} α_{4}} d W_{4} .

(98)

These equations are nonlinear since the noise intensity depends on $X_{i}$ . Note that for any t, $X_{1} (t) + X_{2} (t) + X_{3} (t) = N_{tot}$ so that the total population is constant. The mean $\bar{X}$ satisfies

\frac{d \bar{X}}{d t} = \bar{X} (\begin{array}{ccc} - α_{1} & α_{1} & 0 \\ α_{2} & - (α_{2} + α_{3}) & α_{3} \\ 0 & α_{4} & - α_{4} \end{array})

(99)

where the matrix above is the generator Q, or our $L^{⊺}$ . In the case where Q is fixed, $\bar{X}$ is proportional to the null left eigenvector of Q; biologically, this is the voltage clamp case. Let $({\bar{X}}_{1}, \dots, {\bar{X}}_{n})$ be the corresponding stationary vector. Now we linearize Eqs. 96–98 around the stationary vector.
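
As a quick numerical illustration of the stationary vector described above, the following sketch builds the generator for the 3-state chain and checks that the stationary vector is a null left eigenvector. The rate values, the total population `n_tot`, and all helper names are assumptions made for this example only; the closed-form weights rely on detailed balance, which holds for this linear chain. The (3,3) entry is taken to be $- α_{4}$ so that every row of Q sums to zero.

```python
def generator(a1, a2, a3, a4):
    """Generator Q of the chain 1 <-> 2 <-> 3; each row sums to zero."""
    return [
        [-a1, a1, 0.0],
        [a2, -(a2 + a3), a3],
        [0.0, a4, -a4],
    ]


def stationary_vector(a1, a2, a3, a4, n_tot):
    """Stationary mean populations, from detailed balance on the linear chain."""
    weights = [1.0, a1 / a2, (a1 * a3) / (a2 * a4)]
    total = sum(weights)
    return [n_tot * w / total for w in weights]


def left_multiply(x, q):
    """Row vector times matrix: (x Q)_j = sum_i x_i Q_ij."""
    return [sum(x[i] * q[i][j] for i in range(len(x))) for j in range(len(q[0]))]


# Hypothetical rates, chosen only for illustration.
rates = (1.0, 2.0, 3.0, 4.0)
Q = generator(*rates)
x_bar = stationary_vector(*rates, n_tot=100.0)
residual = left_multiply(x_bar, Q)  # should vanish: x_bar Q = 0
```

The residual vanishes up to rounding error, and the components of `x_bar` sum to `n_tot`, consistent with conservation of the total population.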

Let $V = X - \bar{X}$ and assume that $\frac{| V |}{\bar{X}} ≪ 1$ . Then since $\sqrt{X_{i} α_{k}} = \sqrt{({\bar{X}}_{i} + V_{i}) α_{k}} = \sqrt{{\bar{X}}_{i} α_{k}} + O (\frac{V_{i}}{{\bar{X}}_{i}})$ , we have

d V_{1} = (- V_{1} α_{1} + V_{2} α_{2}) d t - \sqrt{{\bar{X}}_{1} α_{1}} d W_{1} + \sqrt{{\bar{X}}_{2} α_{2}} d W_{2} + O (\frac{| V |}{N_{tot}}),

(100)

\begin{array}{rcl} d V_{2} & = & (V_{1} α_{1} - V_{2} α_{2} - V_{2} α_{3} + V_{3} α_{4}) d t + \sqrt{{\bar{X}}_{1} α_{1}} d W_{1} - \sqrt{{\bar{X}}_{2} α_{2}} d W_{2} \\ & & - \sqrt{{\bar{X}}_{2} α_{3}} d W_{3} + \sqrt{{\bar{X}}_{3} α_{4}} d W_{4} + O (\frac{| V |}{N_{tot}}), \end{array}

(101)

d V_{3} = (V_{2} α_{3} - V_{3} α_{4}) d t + \sqrt{{\bar{X}}_{2} α_{3}} d W_{3} - \sqrt{{\bar{X}}_{3} α_{4}} d W_{4} + O (\frac{| V |}{N_{tot}}) .

(102)

Neglecting the $O (\frac{| V |}{N_{tot}})$ terms gives us the multidimensional Ornstein–Uhlenbeck process of Eq. 4 for the 3-state example.
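
The linearized system can be explored numerically. The sketch below is a plain Euler–Maruyama discretization of the linearized equations for the 3-state chain, with the noise amplitudes frozen at the stationary vector. The rates, stationary populations, step size, and function name are illustrative assumptions rather than values taken from the paper, and Euler–Maruyama is only one of several possible integration schemes.

```python
import math
import random


def simulate_linearized(alpha, x_bar, dt, n_steps, seed=0):
    """Euler-Maruyama path of V = X - x_bar for the linearized 3-state chain."""
    a1, a2, a3, a4 = alpha
    # Constant noise amplitudes, frozen at the stationary vector.
    s1 = math.sqrt(x_bar[0] * a1)
    s2 = math.sqrt(x_bar[1] * a2)
    s3 = math.sqrt(x_bar[1] * a3)
    s4 = math.sqrt(x_bar[2] * a4)
    rng = random.Random(seed)
    v = [0.0, 0.0, 0.0]
    path = [tuple(v)]
    for _ in range(n_steps):
        dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(4)]
        # Net increment carried from state 1 to 2, and from state 2 to 3.
        f12 = (v[0] * a1 - v[1] * a2) * dt + s1 * dw[0] - s2 * dw[1]
        f23 = (v[1] * a3 - v[2] * a4) * dt + s3 * dw[2] - s4 * dw[3]
        v = [v[0] - f12, v[1] + f12 - f23, v[2] + f23]
        path.append(tuple(v))
    return path


# Hypothetical rates and stationary vector (N_tot = 100), for illustration only.
alpha = (1.0, 2.0, 3.0, 4.0)
x_bar = (160.0 / 3.0, 80.0 / 3.0, 20.0)
path = simulate_linearized(alpha, x_bar, dt=1e-3, n_steps=2000)
max_drift = max(abs(sum(v)) for v in path)  # components of V should sum to ~0
```

Because each reaction removes from one state exactly what it adds to its neighbor, the components of V sum to zero along the whole path (up to rounding), mirroring the conservation of $N_{tot}$ in the nonlinear equations.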

Appendix C: Proofs and Calculations

C.1 Stationary Covariance of a Multidimensional OU Process

The SDE for $X (t)$ in Eq. 4 has the explicit solution (see [18], Chap. 4.5)

X (t) = exp (L t) X (0) + \int_{0}^{t} exp (L (t - t^{'})) B d W (t^{'}) .

(103)

If the initial condition is either deterministic or Gaussian, then $X (t)$ is Gaussian with mean

E [X (t)] = exp (L t) E [X (0)]

(104)

and covariance function

\begin{array}{rcl} Cov [X (t), X^{⊺} (s)] & = & exp (L t) Cov [X (0), X^{⊺} (0)] exp (L^{⊺} s) \\ & & + \int_{0}^{t \land s} exp [L (t - t^{'})] B B^{⊺} exp [L^{⊺} (s - t^{'})] d t^{'}, \end{array}

(105)

where $t \land s$ means the minimum of t and s. Setting $s = t$ and taking the limit as $t \to \infty$ , we obtain the stationary covariance function

Cov [X (t), X^{⊺} (t)] = lim_{t \to \infty} \int_{0}^{t} exp [L (t - t^{'})] B B^{⊺} exp [L^{⊺} (t - t^{'})] d t^{'} .

(106)

We exploit the fact that not only does B decompose into the sum $B = \sum_{k = 1}^{m} B_{k}$ , but in the case of a first-order reaction process, $B B^{⊺}$ also decomposes into the following sum:

B B^{⊺} = \sum_{k = 1}^{m} B_{k} B_{k}^{⊺},

(107)

and further, $B_{k} B_{k}^{⊺} = σ_{k}^{2} ζ_{k} ζ_{k}^{⊺}$ for each edge (reaction) $k \in E$ . Therefore, the stationary covariance of the full process decomposes into a sum of the contributions from the m different reaction processes:

Cov [X (t), X^{⊺} (t)] = lim_{t \to \infty} \int_{0}^{t} \sum_{k = 1}^{m} σ_{k}^{2} exp [L (t - t^{'})] ζ_{k} ζ_{k}^{⊺} exp [L^{⊺} (t - t^{'})] d t^{'} .

(108)

We note that the (left) eigenvector corresponding to the leading (0) eigenvalue of L has constant components, therefore it lies in the kernel of the matrix $B_{k} B_{k}^{⊺}$ for each k, which guarantees finite covariance in Eq. 108.
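
The decomposition in Eq. 107 can be checked directly on the 3-state example. The sketch below assumes that the kth column of B is $σ_{k} ζ_{k}$ (our reading of the definition of $B_{k}$, which is given earlier in the paper and not reproduced here), and uses made-up noise amplitudes. It also confirms that the constant vector is annihilated by $B B^{⊺}$.

```python
# Stoichiometry vectors for the four reactions of the chain 1 <-> 2 <-> 3:
# reaction 1: 1 -> 2, reaction 2: 2 -> 1, reaction 3: 2 -> 3, reaction 4: 3 -> 2.
zetas = [(-1, 1, 0), (1, -1, 0), (0, -1, 1), (0, 1, -1)]
sigmas = [2.0, 2.0, 1.5, 1.5]  # hypothetical noise amplitudes sigma_k

n, m = 3, 4

# Assumed form of B: column k equals sigma_k * zeta_k.
B = [[sigmas[k] * zetas[k][i] for k in range(m)] for i in range(n)]

# Left-hand side of Eq. 107: the matrix product B B^T.
BBt = [[sum(B[i][k] * B[j][k] for k in range(m)) for j in range(n)]
       for i in range(n)]

# Right-hand side: sum over edges of sigma_k^2 * zeta_k zeta_k^T.
edge_sum = [[sum(sigmas[k] ** 2 * zetas[k][i] * zetas[k][j] for k in range(m))
             for j in range(n)] for i in range(n)]

# B B^T applied to the constant vector (1, 1, 1): should be the zero vector.
ones_image = [sum(row) for row in BBt]
```

Cross terms between different edges never appear because each $B_{k}$ occupies its own column, so the product $B_{k} B_{l}^{⊺}$ is zero whenever $k \neq l$ under the assumed form of B.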

C.2 Computation of Edge Importance $R_{k}$ and Proof of Lemma 1

Using the spectral properties of the graph Laplacian L, we can rewrite the stationary covariance of $X (t)$ (Eq. 106) by replacing each matrix exponential with its expansion in the orthogonal eigendecomposition of L. Let $v_{i}$ be the ith eigenvector of L (written as a column vector), with eigenvalue $λ_{i}$ , i.e. $L v_{i} = λ_{i} v_{i}$ . Summing over the eigenvalues, we can write $L = \sum_{i = 1}^{n} λ_{i} v_{i} v_{i}^{⊺}$ , and hence $exp (L s) = \sum_{i = 1}^{n} e^{λ_{i} s} v_{i} v_{i}^{⊺}$ . Note that this decomposition is only valid when L is symmetric; the non-symmetric case is discussed in Sect. 4. Then we have the following expression for the integrand in Eq. 106:

exp [L (t - t^{'})] B B^{⊺} exp [L^{⊺} (t - t^{'})]

(109)

= (\sum_{i = 1}^{n} e^{λ_{i} (t - t^{'})} v_{i} v_{i}^{⊺}) B B^{⊺} (\sum_{j = 1}^{n} e^{λ_{j} (t - t^{'})} v_{j} v_{j}^{⊺})

(110)

= \sum_{i, j = 1}^{n} e^{(λ_{i} + λ_{j}) (t - t^{'})} (v_{i} v_{i}^{⊺}) (B B^{⊺}) (v_{j} v_{j}^{⊺}) .

(111)

Using the decomposition of matrix B (Eqs. 5 and 107), it follows that

B B^{⊺} = \sum_{k = 1}^{m} B_{k} B_{k}^{⊺} = \sum_{k = 1}^{m} σ_{k}^{2} ζ_{k} ζ_{k}^{⊺} .

(112)

The covariance of the full process X is therefore given by

\begin{aligned} Cov [X (t), X^{⊺} (t)] \\ = \int_{0}^{t} \sum_{i, j = 1}^{n} e^{(λ_{i} + λ_{j}) (t - t^{'})} (v_{i} v_{i}^{⊺}) (B B^{⊺}) (v_{j} v_{j}^{⊺}) d t^{'} \end{aligned}

(113)

= \sum_{k = 1}^{m} σ_{k}^{2} \sum_{i = 2}^{n} \sum_{j = 2}^{n} (\frac{1 - e^{(λ_{i} + λ_{j}) t}}{- (λ_{i} + λ_{j})}) (v_{i} v_{i}^{⊺}) (ζ_{k} ζ_{k}^{⊺}) (v_{j} v_{j}^{⊺}) .

(114)

By construction of the graph Laplacian, its leading eigenvalue $λ_{1} \equiv 0$ . The corresponding (right) eigenvector has constant components, $v_{1} = {(1, \dots, 1)}^{⊺} / \sqrt{n}$ . Therefore, for each stoichiometry vector we have $ζ_{k}^{⊺} v_{1} \equiv 0$ . Consequently the terms in the inner summation (114) with index $i = 1$ or $j = 1$ vanish, and may be omitted without changing the result. Taking the limit as $t \to \infty$ of the covariance function gives us the stationary covariance

\begin{aligned} Cov [X (t), X^{⊺} (t)] \\ = lim_{t \to \infty} \sum_{k = 1}^{m} σ_{k}^{2} \sum_{i = 2}^{n} \sum_{j = 2}^{n} (\frac{1 - e^{(λ_{i} + λ_{j}) t}}{- (λ_{i} + λ_{j})}) (v_{i} v_{i}^{⊺}) (ζ_{k} ζ_{k}^{⊺}) (v_{j} v_{j}^{⊺}) \end{aligned}

(115)

= \sum_{k = 1}^{m} σ_{k}^{2} \sum_{i = 2}^{n} \sum_{j = 2}^{n} (\frac{- 1}{λ_{i} + λ_{j}}) (v_{i} v_{i}^{⊺}) (ζ_{k} ζ_{k}^{⊺}) (v_{j} v_{j}^{⊺}) .

(116)

Recall that we are interested in the linear measurement functional $M \in R^{n}$ projected onto $X (t)$ , i.e. the projection $Y (t) = M^{⊺} X (t)$ . For edges $k \in E^{'}$ neglected in the approximation $\tilde{Y} = M^{⊺} \tilde{X} (t)$ , we take the limit as $t \to \infty$ of the mean squared error of $\tilde{Y} (t) - Y (t) = M^{⊺} U (t)$ to get

R [E^{'}] = lim_{t \to \infty} E [{∥ (\tilde{Y} (t) - Y (t)) ∥}_{2}^{2}]

(117)

= lim_{t \to \infty} E [{∥ M^{⊺} U (t) ∥}_{2}^{2}]

(118)

= lim_{t \to \infty} (M^{⊺} Cov [U (t), U^{⊺} (t)] M)

(119)

= \sum_{k \in E^{'}} σ_{k}^{2} \sum_{i = 2}^{n} \sum_{j = 2}^{n} (\frac{- 1}{λ_{i} + λ_{j}}) (M^{⊺} v_{i}) (v_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} v_{j}) (v_{j}^{⊺} M)

(120)

= \sum_{k \in E^{'}} R_{k} .

(121)
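
To make Eq. 120 concrete, the following sketch evaluates $R_{k}$ for a symmetric example: a path graph with unit transition rates, for which the eigenpairs of L are known in closed form (cosine eigenvectors). The choice of graph, the unit value of $σ_{k}^{2}$, the measurement vector, and the function names are all assumptions made for illustration; they are not taken from the paper's numerical examples.

```python
import math


def path_laplacian_eigs(n):
    """Eigenpairs of L for a path graph on n states with unit rates (L has eigenvalues <= 0)."""
    eigs = []
    for j in range(n):
        lam = -(2.0 - 2.0 * math.cos(math.pi * j / n))
        v = [math.cos(math.pi * j * (l + 0.5) / n) for l in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        eigs.append((lam, [x / norm for x in v]))
    return eigs


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def edge_importance(n, source, target, M, sigma2=1.0):
    """R_k of Eq. 120 for the edge source -> target (0-based state indices)."""
    zeta = [0.0] * n
    zeta[source], zeta[target] = -1.0, 1.0
    # Skip the zero eigenvalue (j = 0): its eigenvector is orthogonal to zeta.
    terms = [(lam, dot(M, v) * dot(v, zeta))
             for lam, v in path_laplacian_eigs(n)[1:]]
    return sigma2 * sum(-gi * gj / (li + lj)
                        for li, gi in terms for lj, gj in terms)


M = [0.0, 0.0, 1.0]                    # observe only the last state
r_far = edge_importance(3, 0, 1, M)    # edge not touching the observed state
r_near = edge_importance(3, 1, 2, M)   # edge leading into the observed state
```

For this three-state example the two values work out to 1/24 and 7/24 (in units of $σ_{k}^{2}$), so the edge adjacent to the observed state contributes seven times more to the stationary variance of the measurement than the edge one step removed.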

C.3 Proof of Lemma 3

Suppose that assumptions A0–A5 given in Sect. 3.1 hold. We assume that $M \in \{0, 1\}^{n}$ is an arbitrary measurement vector consisting of $n_{1} > 0$ ones and $n_{0} > 0$ zeros such that $n_{1} + n_{0} = n$ , and $n_{1} = O (1)$ as $n \to \infty$ . That is, we exclude the case where $n_{1}$ grows without bound as n grows. If we look at the corresponding measurement value of the $l_{-}$th and $l_{+}$th components of $ζ_{k}$ (see Eq. 17), we have three possible cases:

1. $l_{\pm} \in 1_{M}$ , i.e. $M (l_{-}) = M (l_{+}) = 1$ ;
2. $l_{\pm} \notin 1_{M}$ , i.e. $M (l_{-}) = M (l_{+}) = 0$ ;
3. $l_{-} \in 1_{M}$ and $l_{+} \notin 1_{M}$ , i.e. $M (l_{-}) = 1$ and $M (l_{+}) = 0$ (respectively, $M (l_{-}) = 0$ and $M (l_{+}) = 1$ , equivalent up to a sign change).

For each part of Lemma 3, we will prove the result for these three cases. If we let $n_{1}^{*}$ denote the number of terms in the set $1_{M} ∖ \{l_{\pm}\}$ , then

n_{1}^{*} = \begin{cases} n_{1} - 2, & \text{if } l_{\pm} \in 1_{M} \text{ (Case 1)}, \\ n_{1}, & \text{if } l_{\pm} \notin 1_{M} \text{ (Case 2)}, \\ n_{1} - 1, & \text{if } l_{-} \in 1_{M} \text{ and } l_{+} \notin 1_{M} \text{ (Case 3)}, \end{cases}

(122)

and we can consider all three cases at once using this notation where now $n_{1}^{*} = O (1)$ as $n \to \infty$ , by our assumption on M.

Let $a = v_{i} (l_{-})$ , $b = v_{i} (l_{+})$ , and $c = \sum_{l \in 1_{M} ∖ \{l_{\pm}\}} v_{i} (l)$ . By assumption A2, we have that $E [a] = E [b] = E [c] = 0$ and $E [a^{2}] = E [b^{2}] = n^{- 1}$ from the normalization of the eigenvectors, and it follows from assumption A3b that $E [c^{2}] = (n_{1}^{*}) n^{- 1} + O (n^{- 3})$ , as $n \to \infty$ . Assumption A3 gives conditions on second order terms. Assumptions A4 and A5 give conditions on fourth order moments and fourth order products of a, b, and c.

C.3.1 Proof of Part A

We will show that, as $n \to \infty$ ,

E [M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}] = \frac{1}{n} (M^{⊺} ζ_{k}) + O (\frac{1}{n^{2}}) .

(123)

By definition

E [M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}] = E [\sum_{l \in 1_{M}} v_{i} (l) (v_{i} (l_{+}) - v_{i} (l_{-}))]

(124)

since $M^{⊺} v_{i} = \sum_{l \in 1_{M}} v_{i} (l)$ and $v_{i}^{⊺} ζ_{k} = v_{i} (l_{+}) - v_{i} (l_{-})$ . We compute this expectation for the three cases listed at the beginning of Sect. C.3.

Using the notation introduced above, we note that this expectation has the form

E [(a + b + c) (b - a)] for Case 1,

(125)

E [c (b - a)] for Case 2,

(126)

E [(a + c) (b - a)] for Case 3 .

(127)

Case 1: $l_{\pm} \in 1_{M}$ . Expanding the expected value yields

E [(a + b + c) (b - a)] = E [b^{2} - a^{2} + b c - a c]

(128)

= E [b^{2}] - E [a^{2}] + E [b c] - E [a c]

(129)

= \frac{1}{n} - \frac{1}{n} + E [b c] - E [a c],

(130)

since $E [a^{2}] = E [b^{2}] = n^{- 1}$ by assumption A2 (eigenvector normalization). Note that $E [a c] = E [b c]$ , and each contains $n_{1}^{*}$ terms with the following expectation as $n \to \infty$ :

\begin{array}{rcl} E [a c] & = & E [v_{i} (l_{-}) \sum_{l \in 1_{M} ∖ \{l_{\pm}\}} v_{i} (l)] \\ & = & \sum_{l \in 1_{M} ∖ \{l_{\pm}\}} E [v_{i} (l_{-}) v_{i} (l)] \\ & = & n_{1}^{*} O (n^{- 2}) \end{array}

(131)

= O (n^{- 2}) .

(132)

This follows from the assumptions that, as $n \to \infty$ , $E [v_{i} (l) v_{i} (l^{'})] = O (n^{- 2})$ for $l \neq l^{'}$ (assumption A3b) and $n_{1}^{*} = O (1)$ (by assumption on M). Thus, since $M^{⊺} ζ_{k} = - 1 + 1 = 0$ in this case, as $n \to \infty$ ,

E [(a + b + c) (b - a)] = \frac{1}{n} (M^{⊺} ζ_{k}) + O (n^{- 2}) .

(133)

Case 2: $l_{\pm} \notin 1_{M}$ . Expanding the expected value yields

E [c (b - a)] = E [b c] - E [a c] = O (n^{- 2})

(134)

as $n \to \infty$ , by Eq. 131 in Case 1 above, which follows from assumption A3b and the assumption on M. Thus, since $M^{⊺} ζ_{k} = 0$ in this case, as $n \to \infty$ ,

E [c (b - a)] = \frac{1}{n} (M^{⊺} ζ_{k}) + O (n^{- 2}) .

(135)

Case 3: $l_{-} \in 1_{M}$ and $l_{+} \notin 1_{M}$ . Expanding the expected value yields

E [(a + c) (b - a)] = E [- a^{2} + a b + b c - a c]

(136)

= - E [a^{2}] + E [a b] + E [b c] - E [a c]

(137)

= - \frac{1}{n} + O (n^{- 2})

(138)

as $n \to \infty$ , which follows by Eq. 131 from Case 1 and by the assumptions that $E [v_{i} (l) v_{i} (l^{'})] = O (n^{- 2})$ for $l \neq l^{'}$ (assumption A3b) and $n_{1}^{*} = O (1)$ (by the assumption on M), as $n \to \infty$ . Since $M^{⊺} ζ_{k} = - 1$ in this case, then as $n \to \infty$ ,

E [(a + c) (b - a)] = \frac{1}{n} (M^{⊺} ζ_{k}) + O (n^{- 2}) .

(139)

Similarly, the alternate Case 3 where $l_{+} \in 1_{M}$ and $l_{-} \notin 1_{M}$ gives, as $n \to \infty$ ,

E [(b + c) (b - a)] = E [b^{2} - a b + b c - a c]

(140)

= E [b^{2}] - E [a b] + E [b c] - E [a c]

(141)

= \frac{1}{n} + O (n^{- 2}),

(142)

and since $M^{⊺} ζ_{k} = 1$ in this case, we have as $n \to \infty$

E [(b + c) (b - a)] = \frac{1}{n} (M^{⊺} ζ_{k}) + O (n^{- 2}) .

(143)
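
Part A can be illustrated with a small Monte Carlo experiment. The sketch below uses a uniformly random unit vector as a stand-in for an eigenvector $v_{i}$; this is an assumption made purely for illustration (such a vector has $E [v (l)^{2}] = n^{- 1}$ and uncorrelated components, but it is not drawn from the Laplacian of a random graph). The index set, sample size, and function names are likewise hypothetical.

```python
import math
import random


def random_unit_vector(n, rng):
    """Uniform random point on the unit sphere in R^n."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def estimate_part_a(n, ones, l_minus, l_plus, n_samples=20000, seed=1):
    """Monte Carlo estimate of E[M^T v v^T zeta], zeta = e_{l_plus} - e_{l_minus}."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        v = random_unit_vector(n, rng)
        total += sum(v[l] for l in ones) * (v[l_plus] - v[l_minus])
    return total / n_samples


n = 20
ones = [0, 1, 2]   # hypothetical index set 1_M (0-based), so n_1 = 3
# Case 3: l_- lies in 1_M and l_+ does not, hence M^T zeta = -1.
estimate = estimate_part_a(n, ones, l_minus=0, l_plus=5)
predicted = -1.0 / n   # leading term (1/n) M^T zeta
```

With these settings the estimate agrees with the leading term $(1 / n) (M^{⊺} ζ_{k})$ to within sampling error; changing `l_plus` to an index inside `ones` gives Case 1, where the leading term is zero.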

C.3.2 Proof of Part B

We will show that, as $n \to \infty$ ,

E {[M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}]}^{2} = \frac{1}{n^{2}} | M^{⊺} ζ_{k} | + O (\frac{1}{n^{4}}),

(144)

where now we take the absolute value of the term $M^{⊺} ζ_{k}$ . By definition (see Eq. 124), we have

E {[M^{⊺} v_{i} v_{i}^{⊺} ζ_{k}]}^{2} = E {[\sum_{l \in 1_{M}} v_{i} (l) (v_{i} (l_{+}) - v_{i} (l_{-}))]}^{2} .

(145)

Using the notation introduced above, this expectation has the following structure in each case:

E {[(a + b + c) (b - a)]}^{2} for Case 1,

(146)

E {[c (b - a)]}^{2} for Case 2,

(147)

E {[(a + c) (b - a)]}^{2} for Case 3 .

(148)

By Lemma 3 part A, we have, as $n \to \infty$ ,

E [(a + b + c) (b - a)] = 0 + O (n^{- 2}),

(149)

E [c (b - a)] = 0 + O (n^{- 2}),

(150)

E [(a + c) (b - a)] = - \frac{1}{n} + O (n^{- 2}),

(151)

E [(b + c) (b - a)] = \frac{1}{n} + O (n^{- 2}),

(152)

where the last two equations fall under Case 3. Squaring these terms yields, as $n \to \infty$ ,

E {[(a + b + c) (b - a)]}^{2} = 0 + O (n^{- 4}),

(153)

E {[c (b - a)]}^{2} = 0 + O (n^{- 4}),

(154)

E {[(a + c) (b - a)]}^{2} = \frac{1}{n^{2}} + O (n^{- 4}),

(155)

E {[(b + c) (b - a)]}^{2} = \frac{1}{n^{2}} + O (n^{- 4}) .

(156)

In this case, both versions of Case 3 are positive so we multiply $1 / n^{2}$ by $| M^{⊺} ζ_{k} |$ which gives us the desired result in Eq. 144.

C.3.3 Proof of Part C

We will show that, as $n \to \infty$ ,

E [{(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k})}^{2}] = O (n^{- q}) for some q > 1 .

(157)

It follows by definition that

\begin{aligned} E [{(M^{⊺} v_{i} v_{i}^{⊺} ζ_{k})}^{2}] & = E [(M^{⊺} v_{i}) (v_{i}^{⊺} ζ_{k} ζ_{k}^{⊺} v_{i}) (v_{i}^{⊺} M)] \\ & = E [{(\sum_{l \in 1_{M}} v_{i} (l))}^{2} {(v_{i} (l_{+}) - v_{i} (l_{-}))}^{2}], \end{aligned}

(158)

since $M^{⊺} v_{i} = v_{i}^{⊺} M = \sum_{l \in 1_{M}} v_{i} (l)$ and $v_{i}^{⊺} ζ_{k} ζ_{k}^{⊺} v_{i} = {(v_{i} (l_{+}) - v_{i} (l_{-}))}^{2}$ .

Note that this term has the following structure in each case:

E [{(a + b + c)}^{2} {(b - a)}^{2}] for Case 1,

(159)

E [c^{2} {(b - a)}^{2}] for Case 2,

(160)

E [{(a + c)}^{2} {(b - a)}^{2}] for Case 3 .

(161)

Expanding the sums above (Eqs. 159–161), we see that all but one term for Cases 2 and 3 also appear in Case 1, and that the term $E [a^{3} b]$ is of smaller order of magnitude than $E [a^{3} c]$ , which appears in Case 1. Thus, it suffices to consider only Case 1. Expanding the sum (Eq. 159) gives

\begin{aligned} E [{(a + b + c)}^{2} {(b - a)}^{2}] \\ = E [(a^{2} + b^{2} + c^{2} + 2 a b + 2 a c + 2 b c) (a^{2} - 2 a b + b^{2})] \end{aligned}

(162)

= E [a^{4} - 2 a^{2} b^{2} + b^{4} + a^{2} c^{2} - 2 a b c^{2} + b^{2} c^{2}

(163)

- 2 a b^{2} c - 2 a^{2} b c + 2 a^{3} c + 2 b^{3} c]

(164)

= E [a^{4}] + E [b^{4}] + O (n^{- 2}), as n \to \infty .

(165)

The leading order terms are $E [a^{4}] = E [b^{4}] = O (n^{- q})$ as $n \to \infty$ for some $q > 1$ by assumption A4a and the term $E [a^{2} b^{2}] = O (n^{- 2})$ as $n \to \infty$ by assumption A4b. Note that all terms involving powers of c carry an extra factor of $n_{1}^{*}$ (or ${(n_{1}^{*})}^{2}$ ), but this does not change the order of magnitude since $n_{1}^{*} = O (1)$ as $n \to \infty$ by our assumption on M. Therefore, the terms $E [a^{2} c^{2}]$ and $E [b^{2} c^{2}]$ are also $O (n^{- 2})$ , as shown below. As $n \to \infty$

E [a^{2} c^{2}] = E [a^{2} {(\sum_{l \in 1_{M} ∖ \{l_{\pm}\}} v_{i} (l))}^{2}]

(166)

= E [a^{2} v_{i} {(l_{1})}^{2} + \dots + a^{2} v_{i} {(l_{n_{1}^{*}})}^{2} + \sum_{l_{j}, l_{k} \in 1_{M} ∖ \{l_{\pm}\}, j \neq k} a^{2} v_{i} (l_{j}) v_{i} (l_{k})]

(167)

= n_{1}^{*} E [a^{2} v_{i} {(l_{1})}^{2}] + 2 \binom{n_{1}^{*}}{2} E [a^{2} v_{i} (l_{1}) v_{i} (l_{2})]

(168)

= O (n_{1}^{*} n^{- 2}) + O ({(n_{1}^{*})}^{2} n^{- 3}) by assumptions A4b and A5

(169)

= O (n^{- 2}) since n_{1}^{*} = O (1) .

(170)

The same holds for $E [b^{2} c^{2}]$ since $E [a^{2} c^{2}] = E [b^{2} c^{2}]$ . We can do a similar calculation for $E [a b c^{2}]$ , replacing $a^{2}$ with ab, and noting that assumption A5 holds for terms of the form $a b v_{i} (l_{1})$ and $a b v_{i} (l_{1}) v_{i} (l_{2})$ with distinct eigenvector components. Hence, $E [a b c^{2}] = O (n^{- 3})$ as $n \to \infty$ .

All other cross terms ( $E [a b^{2} c]$ , $E [a^{2} b c]$ , $E [a^{3} c]$ , $E [b^{3} c]$ ) are of order $O (n_{1}^{*} n^{- 3}) = O (n^{- 3})$ as $n \to \infty$ by assumption A5. Therefore, since the leading order terms are $O (n^{- q})$ , it follows that

E [{(a + b + c)}^{2} {(b - a)}^{2}] = O (n^{- q}), as n \to \infty, for some q > 1 .

(171)
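
The scaling in Part C can also be probed numerically. As a hedged illustration, the sketch below again replaces the eigenvector by a uniformly random unit vector (an assumption, not the ensemble treated in the paper) and takes the simplest Case 3 configuration, $M = e_{l_{-}}$ , so that the quantity of interest is $E [a^{2} {(b - a)}^{2}]$ . For a uniform unit vector the standard moment formulas $E [a^{4}] = 3 / (n (n + 2))$ and $E [a^{2} b^{2}] = 1 / (n (n + 2))$ give the exact value $4 / (n (n + 2))$ , i.e. $q = 2$ .

```python
import math
import random


def random_unit_vector(n, rng):
    """Uniform random point on the unit sphere in R^n."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def estimate_part_c(n, n_samples=20000, seed=2):
    """Monte Carlo estimate of E[(M^T v v^T zeta)^2] for M = e_0, zeta = e_1 - e_0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        v = random_unit_vector(n, rng)
        a, b = v[0], v[1]
        total += (a * (b - a)) ** 2
    return total / n_samples


def exact_part_c(n):
    """Exact value 4 / (n (n + 2)) for a uniform random unit vector."""
    return 4.0 / (n * (n + 2))


n = 10
estimate = estimate_part_c(n)
exact = exact_part_c(n)
```

Repeating the estimate for larger n shows the expected decay proportional to $n^{- 2}$ for this stand-in ensemble, which is consistent with the requirement $q > 1$ in Eq. 157.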

Appendix D: Disconnected Graphs

Our general results (Lemma 1 and Theorem 2) implicitly assume that zero is a simple eigenvalue of the graph Laplacian L, or, equivalently, that the graph is irreducible. If we consider a random graph ensemble for which the entries of the adjacency matrix are independent, there can be a strictly positive probability of drawing a disconnected graph. To address this case, suppose the graph $\mathcal{G} = (V, E)$ decomposes into G disconnected components, i.e.

\mathcal{G} = ⨁_{g = 1}^{G} \mathcal{G}_{g}

(172)

where $\mathcal{G}_{g} = (V_{g}, E_{g})$ and the gth component contains $n_{g}$ vertices. For each $g \in \{1, \dots, G\}$ we have the corresponding graph Laplacian $L_{g}$ restricted to the gth component. If we neglect fluctuations associated with edges $E^{'} = ∐_{g = 1}^{G} E_{g}^{'}$ , then the resulting error, $Var [M^{⊺} (\tilde{X} - X)]$ , depends on which component the initial condition $X (0) = x_{0} = \tilde{X} (0)$ belongs to. That is,

R^{g} [E^{'}] \equiv Var [M^{⊺} (\tilde{X} - X) | x_{0} \in V_{g}] = \sum_{k \in {E^{'}}_{g}} R_{k}^{g}

(173)

and

R_{k}^{g} = σ_{k}^{2} \sum_{i = 2}^{n_{g}} \sum_{j = 2}^{n_{g}} (\frac{- 1}{λ_{i}^{g} + λ_{j}^{g}}) (M^{⊺} {(v^{g})}_{i}) ({(v^{g})}_{i}^{⊺} ζ_{k}) (ζ_{k}^{⊺} {(v^{g})}_{j}) ({(v^{g})}_{j}^{⊺} M) .

(174)

Here the eigenpairs $(λ_{i}^{g}, v_{i}^{g})$ are the eigenvalues and eigenvectors of the gth Laplacian $L_{g}$ (we write ${(v^{g})}_{i}$ and $v_{i}^{g}$ interchangeably), and quantities such as $ζ_{k}^{⊺} {(v^{g})}_{j}$ and ${(v^{g})}_{j}^{⊺} M$ are interpreted with vectors $ζ_{k}$ and M restricted to those components that lie in the appropriate subspace of $R^{n}$ .

For the random graph ensembles we consider, the probability of drawing a disconnected graph, $P [\neg C]$ , decreases so rapidly that taking it into account does not affect our main result (Theorem 2). For example, consider the Erdös–Rényi ensemble with fixed edge probability $p \in (0, 1)$ as n grows. It is well known that $p_{n} = ln (n) / n$ is a sharp threshold for connectedness as $n \to \infty$ ; for example, if $p_{n} \geq 2 ln (n) / n$ , then $P [C] \to 1$ as $n \to \infty$ . Here we show that, if $p \in (0, 1)$ is fixed, then $P [\neg C]$ goes to zero faster than any inverse power of n, as $n \to \infty$ .

Draw a graph $\mathcal{G}$ from the standard Erdös–Rényi ensemble with parameters n and p. We call a subgraph of $\mathcal{G}$ an isolated k-graph if it is a connected subgraph, with k vertices, that is disconnected from the rest of the graph. Let $P_{k}$ be the probability that $\mathcal{G}$ has an isolated k-graph, conditioned on not having any isolated $k^{'}$ -graph for $k^{'} < k$ . Thus $P_{1}$ is the probability that $\mathcal{G}$ contains an isolated singleton, $P_{2}$ is the probability that $\mathcal{G}$ contains an isolated pair, given that it does not contain any isolated singletons, and so on. We set $P_{0} \equiv 0$ . If $\mathcal{G}$ is reducible, then it contains an isolated k-graph for some $1 \leq k \leq [n / 2]$ , where $[\cdot]$ denotes the integer part of its argument. The probability that $\mathcal{G}$ is disconnected is thus

P [\neg C] = \sum_{k = 1}^{[n / 2]} P_{k} (1 - P_{k - 1}) .

(175)

Disconnecting any given collection of k vertices from the remaining $n - k$ vertices in the graph requires k independent events, each of which has probability ${(1 - p)}^{(n - k)}$ . A crude estimate suffices for our purposes, namely, for all $1 \leq k \leq [n / 2]$ ,

P_{k} (1 - P_{k - 1}) \leq P_{k} \leq n {(1 - p)}^{k (n - k)} \leq n {(1 - p)}^{n - 1} .

(176)

Therefore we may conclude that

P [\neg C] \leq \sum_{k = 1}^{[n / 2]} P_{k} \leq \frac{n^{2}}{2} {(1 - p)}^{n - 1},

(177)

which decays exponentially fast as $n \to \infty$ , for any fixed p in the open interval $(0, 1)$ . For example, in Sect. 3.3 we illustrate our results with a sample taken from the Erdös–Rényi ensemble with $n = 50$ and $p = 0.5$ . The chance of drawing a reducible graph from this ensemble does not exceed $(50^{2}) / (2^{50}) ≲ 2.3 \times 10^{- 12}$ .
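
The bound in Eq. 177 is easy to evaluate, and for small graphs it can be compared with a direct simulation. The sketch below computes the bound for the $n = 50$ , $p = 0.5$ example and, as an additional illustration that is not part of the paper, estimates the disconnection frequency for a much smaller graph ( $n = 10$ ) where disconnected samples actually occur. The sample size, seed, and function names are assumptions.

```python
import random


def disconnection_bound(n, p):
    """Upper bound (n^2 / 2) (1 - p)^(n - 1) from Eq. 177."""
    return 0.5 * n * n * (1.0 - p) ** (n - 1)


def sample_er_graph(n, p, rng):
    """Adjacency lists of an Erdos-Renyi graph with independent edges."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def is_connected(adj):
    """Depth-first search from vertex 0; connected iff every vertex is reached."""
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def disconnected_fraction(n, p, n_samples, seed=0):
    """Fraction of sampled graphs that are disconnected."""
    rng = random.Random(seed)
    bad = sum(not is_connected(sample_er_graph(n, p, rng))
              for _ in range(n_samples))
    return bad / n_samples


bound_50 = disconnection_bound(50, 0.5)          # the n = 50, p = 0.5 example
bound_10 = disconnection_bound(10, 0.5)
frac_10 = disconnected_fraction(10, 0.5, 2000)   # small case, for comparison
```

For $n = 50$ the bound evaluates to roughly $2.2 \times 10^{- 12}$ , in line with the figure quoted above, so a simulation of that size would essentially never produce a disconnected sample; the $n = 10$ comparison is included only to show the bound sitting above an observable frequency.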

Notes

Cf. [14], Supplemental Material Sect. 5.
If $Q = (q_{i j})$ is the generator matrix of the stochastic process on the graph, with $q_{i j} = α_{k}$ whenever k is the edge leading from i to j, then $L = Q^{⊺}$ .
In contrast, Tao and Vu [20] always choose the first non-zero component to be positive to remove this ambiguity.
Modern measurements of purified sodium channel preparations suggest the presence of four activation gates [39]; for consistency with common usage we will restrict attention here to the classical $n^{4}$ potassium channel and $m^{3} h$ sodium channel formulations of the model.

References

1. Allen L: An Introduction to Stochastic Processes with Applications to Biology. Prentice Hall, Upper Saddle River; 2003.
2. Berg HC: Random Walks in Biology. Princeton University Press, Princeton; 1993.
3. Calvetti D, Somersalo E: Computational Mathematical Modeling: An Integrated Approach Across Scales. Society for Industrial and Applied Mathematics, Philadelphia; 2013.
4. Chennubhotla C, Bahar I: Signal propagation in proteins and relation to equilibrium fluctuations. PLoS Comput Biol 2007, 3(9):1716–1726. 10.1371/journal.pcbi.0030172
5. Ge H, Qian M, Qian H: Stochastic theory of nonequilibrium steady states. Part II: applications in chemical biophysics. Phys Rep 2012, 510(3):87–118. 10.1016/j.physrep.2011.09.001
6. Laing C, Lord GJ (Eds): Stochastic Methods in Neuroscience. Oxford University Press, New York; 2010.
7. Zhang X-J, Qian H, Qian M: Stochastic theory of nonequilibrium steady states and its applications, part I. Phys Rep 2012, 510(1–2):1–86. 10.1016/j.physrep.2011.09.002
8. Lugo CA, McKane AJ: Quasicycles in a spatial predator–prey model. Phys Rev E 2008, 78: Article ID 051911. 10.1103/PhysRevE.78.051911
9. Nisbet RM, Gurney WSC: Modeling Fluctuating Populations. Wiley-Interscience, New York; 1982.
10. Snyder RE: How demographic stochasticity can slow biological invasions. Ecology 2003, 84(5):1333–1339. 10.1890/0012-9658(2003)084[1333:HDSCSB]2.0.CO;2
11. Elowitz MB, Levine AJ, Siggia ED, Swain PS: Stochastic gene expression in a single cell. Science 2002, 297(5584):1183–1186. 10.1126/science.1070919
12. Golding I, Paulsson J, Zawilski SM, Cox EC: Real-time kinetics of gene activity in individual bacteria. Cell 2005, 123(6):1025–1036. 10.1016/j.cell.2005.09.031
13. Lu T, Shen T, Zong C, Hasty J, Wolynes PG: Statistics of cellular signal transduction as a race to the nucleus by multiple random walkers in compartment/phosphorylation space. Proc Natl Acad Sci USA 2006, 103(45):16752–16757. 10.1073/pnas.0607698103
14. Schmandt NT, Galán RF: Stochastic-shielding approximation of Markov chains and its application to efficiently simulate random ion-channel gating. Phys Rev Lett 2012, 109(11): Article ID 118101.
15. Higham DJ: Modeling and simulating chemical reactions. SIAM Rev 2008, 50(2):347–368. 10.1137/060666457
16. Lee C, Othmer H: A multi-time-scale analysis of chemical reaction networks: I. Deterministic systems. J Math Biol 2010, 60: 387–450. 10.1007/s00285-009-0269-4
17. Anderson DF, Kurtz TG: Continuous time Markov chain models for chemical reaction networks. In Design and Analysis of Biomolecular Circuits: Engineering Approaches to Systems and Synthetic Biology. Edited by: Koeppl H, Densmore D, Setti G, Bernardo M. Springer, New York; 2011:1–44.
18. Gardiner CW: Stochastic Methods: A Handbook for the Natural and Social Sciences. 4th edition. Springer, Berlin; 2009.
19. Knowles A, Yin J: Eigenvector distribution of Wigner matrices. Probab Theory Relat Fields 2013, 155(3–4):543–582. 10.1007/s00440-011-0407-y
20. Tao T, Vu V: Random matrices: universal properties of eigenvectors. Random Matrices: Theory Appl 2012, 01(01): Article ID 1150001. 10.1142/S2010326311500018
21. Ding X, Jiang T: Spectral distributions of adjacency and Laplacian matrices of random graphs. Ann Appl Probab 2010, 20(6):2086–2117. 10.1214/10-AAP677
22. Hodgkin AL, Huxley AF: A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol 1952, 117: 500–544.
23. Raman IM, Bean BP: Inactivation and recovery of sodium currents in cerebellar Purkinje neurons: evidence for two mechanisms. Biophys J 2001, 80(2):729–737. 10.1016/S0006-3495(01)76052-3
24. Milescu LS, Yamanishi T, Ptak K, Smith JC: Kinetic properties and functional dynamics of sodium channels during repetitive spiking in a slow pacemaker neuron. J Neurosci 2010, 30(36):12113–12127. 10.1523/JNEUROSCI.0445-10.2010
25. Carter BC, Giessel AJ, Sabatini BL, Bean BP: Transient sodium current at subthreshold voltages: activation by EPSP waveforms. Neuron 2012, 75(6):1081–1093. 10.1016/j.neuron.2012.08.033
26. Albert R, Barabási A-L: Statistical mechanics of complex networks. Rev Mod Phys 2002, 74: 47–97. 10.1103/RevModPhys.74.47
27. Durrett R: Random Graph Dynamics. Cambridge University Press, New York; 2007.
28. Erdös P, Rényi A: On random graphs. Publ Math (Debr) 1959, 6: 290–297.
29. Erdös P, Rényi A: On the evolution of random graphs. Publ Math Inst Hung Acad Sci 1960, 5: 17–61.
30. Catterall WA, Raman IM, Robinson HPC, Sejnowski TJ, Paulsen O: The Hodgkin–Huxley heritage: from channels to circuits. J Neurosci 2012, 32: 14064–14073. 10.1523/JNEUROSCI.3403-12.2012
31. Goldwyn JH, Shea-Brown E: The what and where of adding channel noise to the Hodgkin–Huxley equations. PLoS Comput Biol 2011, 7(11): Article ID e1002247. 10.1371/journal.pcbi.1002247
32. Schneidman E, Freedman B, Segev I: Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comput 1998, 10(7):1679–1703. 10.1162/089976698300017089
33. Skaugen E, Walløe L: Firing behaviour in a stochastic nerve membrane model based upon the Hodgkin–Huxley equations. Acta Physiol Scand 1979, 107(4):343–363. 10.1111/j.1748-1716.1979.tb06486.x
34. Gillespie DT: Exact stochastic simulation of coupled chemical reactions. J Phys Chem 1977, 81: 2340–2361. 10.1021/j100540a008
35. Clay JR, DeFelice LJ: Relationship between membrane excitability and single channel open–close kinetics. Biophys J 1983, 42(2):151–157. 10.1016/S0006-3495(83)84381-1
36. Nguyen V, Mathias R, Smith GD: A stochastic automata network descriptor for Markov chain models of instantaneously coupled intracellular Ca2+ channels. Bull Math Biol 2005, 67: 393–432. 10.1016/j.bulm.2004.08.010
37. DeRemigio H, LaMar MD, Kemper P, Smith GD: Markov chain models of coupled calcium channels: Kronecker representations and iterative solution methods. Phys Biol 2008, 5(3): Article ID 036003.
38. Skupin A, Kettenmann H, Falcke M: Calcium signals driven by single channel noise. PLoS Comput Biol 2010, 6(8): Article ID e1000870.
39. Catterall WA: Voltage-gated sodium channels at 60: structure, function and pathophysiology. J Physiol 2012, 590: 2577–2589. 10.1113/jphysiol.2011.224204
40. Kauffman S, Levin S: Towards a general theory of adaptive walks on rugged landscapes. J Theor Biol 1987, 128(1):11–45. 10.1016/S0022-5193(87)80029-2
41. McCandlish DM: Visualizing fitness landscapes. Evolution 2011, 65(6):1544–1558. 10.1111/j.1558-5646.2011.01236.x
42. Kryazhimskiy S, Bazykin GA, Plotkin JB, Dushoff J: Directionality in the evolution of influenza A haemagglutinin. Proc Biol Sci 2008, 275(1650):2455–2464. 10.1098/rspb.2008.0521
43. Nelson MI, Holmes EC: The evolution of epidemic influenza. Nat Rev Genet 2007, 8(3):196–205. 10.1038/nrg2053
44. Gadgil C, Lee CH, Othmer HG: A stochastic analysis of first-order reaction networks. Bull Math Biol 2005, 67: 901–946. 10.1016/j.bulm.2004.09.009
45. Newman MEJ: Modularity and community structure in networks. Proc Natl Acad Sci USA 2006, 103(23):8577–8582. 10.1073/pnas.0601602103
46. Newman MEJ: Networks: An Introduction. Oxford University Press, New York; 2010.
47. Agarwala EK, Chiel HJ, Thomas PJ: Pursuit of food versus pursuit of information in Markov chain models of a perception-action loop. J Theor Biol 2012, 304: 235–272.
48. Buchholz P: Exact and ordinary lumpability in finite Markov chains. J Appl Probab 1994, 31(1):59–75. 10.2307/3215235
49. Wei J, Kuo JCW: A lumping analysis in monomolecular reaction systems: analysis of the exactly lumpable system. Ind Eng Chem 1969, 8(1):114–123. 10.1021/i260029a020
50. Gfeller D: Simplifying complex networks: from a clustering to a coarse graining strategy. PhD thesis. Swiss Federal Institute of Technology in Lausanne; 2007.
51. Gfeller D, De Los Rios P: Spectral coarse graining of complex networks. Phys Rev Lett 2007, 99: Article ID 038701. 10.1103/PhysRevLett.99.038701
52. Gfeller D, De Los Rios P: Spectral coarse graining and synchronization in oscillator networks. Phys Rev Lett 2008, 100: Article ID 174104. 10.1103/PhysRevLett.100.174104
53. Bruno WJ, Yang J, Pearson JE: Using independent open-to-closed transitions to simplify aggregated Markov models of ion channel gating kinetics. Proc Natl Acad Sci USA 2005, 102(18):6326–6331. 10.1073/pnas.0409110102
54. Ullah G, Bruno WJ, Pearson JE: Simplification of reversible Markov chains by removal of states with low equilibrium occupancy. J Theor Biol 2012, 311: 117–129. 10.1016/j.jtbi.2012.07.007
55. Koutis I, Levin A, Peng R: Improved spectral sparsification and numerical algorithms for SDD matrices. Proceedings of the 29th Annual Symposium on Theoretical Aspects of Computer Science 2012.
56. de la Rocha J, Doiron B, Shea-Brown E, Josić K, Reyes A: Correlation between neural spike trains increases with firing rate. Nature 2007, 448(7155):802–806. 10.1038/nature06028
57. Josić K, Shea-Brown E, Doiron B, de la Rocha J: Stimulus-dependent correlations and population codes. Neural Comput 2009, 21(10):2774–2804. 10.1162/neco.2009.10-08-879
58. Shea-Brown E, Josić K, de la Rocha J, Doiron B: Correlation and synchrony transfer in integrate-and-fire neurons: basic properties and consequences for coding. Phys Rev Lett 2008, 100(10): Article ID 108102.
59. Ethier SN, Kurtz TG: Markov Processes: Characterization and Convergence. Wiley, New York; 2005.
60. Gillespie DT: Approximate accelerated stochastic simulation of chemically reacting systems. J Chem Phys 2001, 115: 1716–1733. 10.1063/1.1378322
61. Petzold LR, Gillespie DT: Improved leap-size selection for accelerated stochastic simulation. J Chem Phys 2003, 119: 8229–8234. 10.1063/1.1613254

Acknowledgements

This work was supported by the National Science Foundation (grant EF-1038677) and also in part by the Mathematical Biosciences Institute and the NSF (grant DMS 0931642), by a grant from the Simons Foundation (#259837 to PJT), and by the Council for the International Exchange of Scholars (CIES). The authors thank R. Galán and N. Schmandt for bringing the stochastic shielding approximation to their attention, and for helpful discussions. Thanks also to H. Chiel for critical comments on the manuscript.

Author information

Authors and Affiliations

Department of Mathematics, Applied Mathematics and Statistics, Case Western Reserve University, Cleveland, OH, 44106, USA
Deena R Schmidt & Peter J Thomas
Department of Biology, Case Western Reserve University, Cleveland, OH, 44106, USA
Deena R Schmidt & Peter J Thomas
Department of Cognitive Science, Case Western Reserve University, Cleveland, OH, 44106, USA
Peter J Thomas

Corresponding author

Correspondence to Deena R Schmidt.

Additional information

Competing Interests

The authors declare that they have no competing interests.

Authors’ Contributions

All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Schmidt, D.R., Thomas, P.J. Measuring Edge Importance: A Quantitative Analysis of the Stochastic Shielding Approximation for Random Processes on Graphs. J. Math. Neurosc. 4, 6 (2014). https://doi.org/10.1186/2190-8567-4-6

Received: 23 August 2013
Accepted: 24 January 2014
Published: 17 April 2014
DOI: https://doi.org/10.1186/2190-8567-4-6
