Measuring Edge Importance: A Quantitative Analysis of the Stochastic Shielding Approximation for Random Processes on Graphs

Mathematical models of cellular physiological mechanisms often involve random walks on graphs representing transitions within networks of functional states. Schmandt and Galán recently introduced a novel stochastic shielding approximation as a fast, accurate method for generating approximate sample paths from a finite state Markov process in which only a subset of states are observable. For example, in ion-channel models, such as the Hodgkin–Huxley or other conductance-based neural models, a nerve cell has a population of ion channels whose states comprise the nodes of a graph, only some of which allow a transmembrane current to pass. The stochastic shielding approximation consists of neglecting fluctuations in the dynamics associated with edges in the graph not directly affecting the observable states. We consider the problem of finding the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the choice of a particular linear measurement functional on the graph. The partitioning of ion-channel states into conducting versus nonconducting states provides a case in point. In addition to establishing that Schmandt and Galán’s approximation is in fact optimal in a specific sense, we use recent results from random matrix theory to provide heuristic error estimates for the accuracy of the stochastic shielding approximation for an ensemble of random graphs. Moreover, we provide a novel quantitative measure of the contribution of individual transitions within the reaction graph to the accuracy of the approximate process.


Introduction
Many biological systems exhibit a combination of stochastic (chance, random, noisy) and deterministic dynamics [1][2][3]. For example, mathematical models involving stochastic processes arise in physiology [4][5][6][7], ecology [8][9][10], and genetic regulatory systems [11][12][13]. Such mathematical models often originate as intrinsically complex, high-dimensional systems with many degrees of freedom, and many sources of variability. This inherent complexity presents two related challenges. First, the essential dynamics of such systems may be hard to discern, and model reduction based on first principles for stochastic systems on complex networks is difficult. Second, in order to predict the behavior of such systems under normal, pathological or experimental conditions, one must usually resort to numerical simulation studies. Even with the tremendous progress in computing power over the last decades, intrinsically high-dimensional stochastic systems remain prohibitive to simulate exhaustively. Moreover, because of their dimensionality, the results of ensembles of stochastic simulations can be challenging to interpret. Therefore, there is demand for efficient dimension reduction methods, both to provide high quality approximate numerical solutions to the stochastic evolution equations arising in high-dimensional systems, and to provide an efficient conceptual framework for interpretation of the behavior of such systems.
In [14], Schmandt and Galán introduced a stochastic shielding approximation as a fast, accurate method for generating sample paths from a finite state Markov process in which only a subset of states are observable. For example, in ion-channel models, such as the Hodgkin-Huxley or other conductance-based neural models, a nerve cell has a population of ion channels whose configurational states comprise the nodes of a graph, only some of which allow a transmembrane current to pass. That is, each vertex of the ion-channel state graph is labeled with a scalar "conductance", which is either zero (nonconducting) or one (conducting). In a population of ion channels, the flux of individual channels making the transition from a state i to a state j is a stochastic process with mean rate, and it has fluctuations around the mean rate that depend on the population at state i. The stochastic shielding approximation consists of neglecting fluctuations associated with edges in the graph not directly affecting the observable states. Specifically, the random fluxes along edges connecting identically labeled states are replaced by the mean fluxes along those edges, while the random fluxes associated with edges connecting distinguishable states are left unchanged. This approximation is an example of complexity reduction, in the sense of reducing a stochastic process generated by K independent processes to a process on a smaller sample space, i.e. generated by K < K processes. Schmandt and Galán observe that, remarkably, the variance of the observable state (the membrane conductance) is almost identical in the reduced and the unreduced system. 1 While the approximate process does not faithfully reproduce all aspects of the full process, it reproduces those features relevant to the neurophysiologist as well as to the larger biological system in which it is embedded.
Here we consider the problem of finding the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the choice of a particular linear measurement functional on the graph. The partitioning of ion-channel states into conducting versus nonconducting states provides a case in point. In this paper we establish that Schmandt and Galán's approximation is in fact optimal in a specific sense. We derive a quantitative measure of the contributions of individual edges in the graph to the accuracy of the approximation, relative to the chosen measurement functional. This approach allows quantitative comparison of edge importance, and sheds light on the parametric dependence of relative edge importance, for instance in a voltage-gated ion channel. In addition, we provide heuristic error estimates for the accuracy of the stochastic shielding approximation for an ensemble of symmetric random graphs.
Motivated by [14], we consider a multidimensional Ornstein-Uhlenbeck process on a graph G = (V, E) with n nodes and m edges (reactions), and a linear measurement functional M ∈ R n . We show that the stochastic shielding approximation is the most accurate dimension reduction possible among those neglecting fluctuations in the same number of underlying processes. Neglecting a set of reactions in the full stochastic process X creates an approximate processX which matches the behavior of the full process in the mean but deviates from the full process in the fluctuations.
Extending this idea for an ensemble of symmetric directed graphs G = (V, E), we establish two main results. Lemma 1, our first main result, allows us to find the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the measurement M. Neglecting the fluctuations associated with a subset E of the edge set E defines a new processX(t) that deviates from the full process X(t) by an amount that we call the deficiency, U(t) =X(t) − X(t). The observed error, given M, is then M U ; its mean is zero by construction, and its variance is R = E[(M U) 2 ]. In Lemma 1 we provide an exact formula for the contribution of the kth edge to this error. This formula, which arises from a spectral decomposition of the graph Laplacian associated with the full process, gives an explicit criterion for choosing the k most important edges in the graph, for any 0 < k < m.
Our second main result, Theorem 2, applies this criterion to networks generated from a broad class of random graph ensembles with a randomly chosen binary measurement vector M. We show that the importance measures of individual edges cluster tightly around one of two values. For moderately large graphs, these clusters correspond with very high accuracy to Schmandt and Galán's stochastic shielding heuristic; an extremely accurate, reduced complexity approximation is obtained by neglecting fluctuations associated with edges connecting states that are indistinguishable under the measurement M. We illustrate this result with a sample from the Erdös-Rényi random graph ensemble in Sect. 3.3.
The analysis of Schmandt and Galán focused on an accurate, efficient approximation of Markov processes arising from ion-channel models. In Sect. 4 we apply our analysis to processes on two graphs arising from the classical Hodgkin-Huxley system of ion channels: the 5-state model for the voltage-gated potassium channel, and the 8-state model for the voltage-gated sodium channel. In a more general setting, the transition rates connecting adjacent states in these models are voltage-dependent. Here we restrict attention to the stationary case, corresponding biologically to the behavior of the channels under "voltage clamped" conditions. For both the voltagegated potassium and voltage-gated sodium channel state graphs we show that our ranking reproduces the Schmandt-Galán stochastic shielding heuristic over all physiologically relevant voltages. This example also demonstrates that our results apply to graphs with non-symmetric adjacency matrices, as well as to the symmetric case.
In Sect. 5 we discuss possible extensions of our results to examples including signal transduction networks and calcium-induced calcium release models, as well as systems with graded rather than binary measurement functionals.

Connection to the Population Process
We develop our results in the context of stationary Ornstein-Uhlenbeck processes. In contrast, Schmandt and Galán [14] introduced stochastic shielding in the broader context of density dependent random walks on a graph from which our OU process arises as a large population approximation. To set the stage before moving to the OU process framework, we briefly describe a population process on a graph of the type considered by Schmandt and Galán. In particular, we consider a stationary stochastic process on a directed graph G = (V, E) where |V| = n and |E| = m, the number of nodes and edges in the graph, respectively. Each directed edge corresponds to one reaction in the system. The kth edge ij (k) = (i(k), j (k)) ∈ E is defined to start at node i(k) and end at node j (k), so that the kth reaction effects a transition from state i to state j . Following [15,16], we let ζ k be the stoichiometry vector associated with edge ij (k) ∈ E. That is, the ith component of ζ k is −1, the j th component is 1, and all other components are zero. (1) . . .
Under stationary conditions, such as a population of ion channels under voltage clamp, the occupancy numbers of different states of a continuous time Markov pro-cess can be represented as the solution of the stochastic equation obtained from a random time change representation in terms of Poisson processes [17]. If α k gives the instantaneous per capita transition rate from state i(k) to state j (k), then the full Markov process is specified by a collection of independent standard (unit rate) Poisson processes Y k each representing the occurrence of i(k) → j (k) transitions as follows. Letting N(t) ∈ N n be the nonnegative integer-valued vector representing the number of individuals in each of n states, we may write N(t) as a sum of transitions occurring at random times specified by the collection of Y k .
Because each transition preserves the total number of individuals (i.e. the components of ζ k sum to zero for each k), In Appendix B we show that, provided N tot is sufficiently large, we can approximate the deviation of N(t) from its meanN ∈ R n by a multidimensional, Gaussian, Ornstein-Uhlenbeck process X(t) ∈ R n , X(t) ≈ N(t) −N which satisfies a stochastic differential equation of the form given in Eq. 4 below. In particular, we show that X(t) can be approximated by an SDE of the form

Multidimensional Ornstein-Uhlenbeck Process
To obtain our main mathematical result, we consider a multidimensional Ornstein-Uhlenbeck process X ∈ R n on the directed graph G = (V, E) where |V| = n and |E| = m. The state of the system at time t, X(t), satisfies Eq. 3, which we write in the equivalent form Here L = (A − D) is the graph Laplacian (A is the weighted adjacency matrix of G with entries A ij = α k > 0 if there is an edge from node i(k) to j (k) and zero otherwise, and D is the diagonal matrix such that entry D ii = j A ij is the out-degree of node i). B is an n × m matrix, and W ∈ R m is an m-dimensional Brownian motion, i.e. each component dW k represents the increment of an independent standard Brownian motion capturing the fluctuations of the kth reaction about its mean. 2 Matrix B decomposes into a sum over the m reactions such that the kth column of matrix B k = σ k ζ k and all other columns of B k are zero.
The stochastic shielding approximation for a system of the form given in Eq. 4 amounts to preserving the mean, but neglecting the fluctuations, for the processes driving a subset of the reactions, i.e. replacing B with an alternative matrixB obtained by replacing a subset of columns in B with null vectors. The trajectories of the resulting SDE,X(t) (see Eq. 7), are approximations of the trajectories of the full system.
In order to compare different complexity reduction choices, we define the deficiency of an approximation to be the difference between the true and approximate trajectories, U(t) =X(t) − X(t), when projected onto the measurement functional of interest M. As suggested by Schmandt and Galán, the stationary variance of the projection of the deficiency on the measurement vector provides an appropriate measure for comparing the quality of alternative reductions. That is, we use ] as our error measure. We focus on reductions that preserve the behavior of the system (Eq. 4) relative to a given linear measurement functional M ∈ R n . In the case of ion channels, M ∈ {0, 1} n represents the conductance of each channel state. We consider the case of graded rather than binary measurements in Sect. 5. Whether binary or graded, the measurement vector identifies the stochastic process of interest as the projection Y (t) = M X(t).
Formally, we consider two processes X(t) (full process) andX(t) (reduced process) defined on a common probability space (Ω, F t , P ). The sample space Ω = C [0, ∞) n , filtration F t , and Wiener measure P are those associated with m independent copies of the standard Brownian process. The approximate processX(t) has the same sample space Ω and is measurable with respect to the same filtration F t , but also with respect to a smaller filtrationF t ⊂ F t generated by the Wiener processes associated with a subset of edges of the graph. The covariance of the deficiency, then, is well defined in terms of the underlying measure P on the full probability space.
In Appendix C.1 we show the standard result [18] that the stationary covariance matrix of the full process decomposes into a sum of the contributions from the m different reaction processes: Similarly, the variance of the projection Because the (left) eigenvector corresponding to the leading (0) eigenvalue of L has constant components, it is orthogonal to ζ k for each k. (If L is symmetric, the right and left eigenvectors are interchangeable.) Therefore the corresponding eigenspace is contained in the kernel of the matrix B k B k , for each k, which guarantees that the limit on the RHS of Eq. 6 remains finite.
Neglecting a set of reactions E ⊂ E creates an approximate processes,X(t), which matches the behavior of the full process in the mean, but deviates from the full process in the fluctuations. This reduced process satisfies the following SDE dX = LXdt +BdW, whereB = k∈E\E B k sums over the edges we keep. Given the linear measurement functional M ∈ R n above, we define the approximate projectionỸ (t) = M X (t).
Note that in the case of an ion-channel system, M is binary so Y andỸ just pull out the observable (i.e., conducting) states of each system. In Sect. 2.3, for instance, we consider a 3-state chain with one observable state (state 3) as a simple model of an ion-channel system. In that case, M = [0, 0, 1] and Neglecting a subset of reactions also introduces an error in the representation of the measurement Y (t) versusỸ (t) due to the difference between X(t) andX(t). Recall that U(t) =X(t) − X(t) is the deficiency of the reduced model compared to the full model.
It is important to note that the noise source dW that appears in Eqs. 4 and 7 refers to the same noise process W in both cases. The deficiency of the approximation relative to the full process is given by taking the limit of the mean squared error (MSE) of Y − Y (equivalent to the stationary variance ofỸ − Y ), which, as shown in the proof of Lemma 1, is an expression of the sum over all neglected reactions.

Lemma 1
For an irreducible graph with a symmetric Laplacian L, let X andX be the full and reduced processes defined by Eqs. 4 and 7, respectively, and let M ∈ R n . Let E ⊂ E be the subset of edges neglected in the definition ofX. Let L be diag- listed with eigenvalues λ i in order of decreasing real part and v i 2 = 1. Then the stationary variance of the discrepancỹ where We can rank the error terms R k in descending order, thereby ordering the corresponding reactions in terms of their "importance". The most important reaction is the one with the largest value of R k ; if neglected, it would introduce the largest error. See Appendix C.2 for the proof of Lemma 1. Note that an individual term in the sum (10) will be zero if either ζ k ⊥ v i or if M ⊥ v i for a given eigenvector v i . Typically, however, these vectors will not be orthogonal. Therefore, it is of interest to know how the values of R k are distributed for different examples: graphs of actual ion-channel states such as those in the classical Hodgkin-Huxley model, and more generally, ensembles of random graphs. In Sect. 4, we compute the distribution of R k for the graphs of the potassium and sodium channel states in the Hodgkin-Huxley model. In Sect. 3, we consider an ensemble of random graphs such as the Erdös-Rényi ensemble with randomly assigned binary measurement vector M and prove our main result, which is a statement about the expected value of R k . Should our random graph ensemble produce a graph that does not consist of a single connected component, then we may apply Lemma 1 to each isolated component of the graph separately. However, for the random graph ensembles we consider, the probability of drawing a disconnected graph decays very rapidly as n → ∞. We discuss this point further in Appendix D.
For a random graph ensemble, the eigenvectors of the graph Laplacian are distributed randomly on the unit sphere [19,20]. Hence, they are unlikely to be exactly orthogonal to either ζ k or M. Given a series of assumptions (see Sect. 3.1) that are true for naturally occurring random ensembles such as the symmetric Gaussian and Erdös-Rényi ensembles, we state our main result.
where the constant C depends on the mean edge weight.
This result shows that the edges in the graph naturally decompose into two classes, distinguished by their asymptotic behavior for large n. The first class of edges represents connections between differently labeled nodes, in terms of the measurement vector M. The first class comprises the "important" edges in the graph, in the sense that these edges have mean R k values that scale as order 1/n. The second class of edges connects identically labeled nodes. These edges have mean R k values of order less than n −q , where q > 1 is driven by the fourth moment of the eigenvector components (see assumption A4a in Sect. 3.1 for details). As n increases, these edges become relatively "unimportant" and, hence, can be neglected under the stochastic shielding approximation with minimal loss of accuracy. For the case of the Gaussian ensemble, q = 2. Empirically, for the Erdös-Rényi random graph ensemble, q ≈ 5/3 (see discussion in Sect. 3.3 and also Fig. 4). The proof of Theorem 2 is given in Sect. 3.2. Before discussing more complicated examples, we illustrate the decomposition of the full process into approximate subprocesses for a simple 3-state example in the next subsection.

3-State Example
We illustrate Schmandt and Galán's [14] stochastic shielding heuristic with the following simple example they considered. Figure 1 shows a 3-state chain which has adjacency matrix entries A ij = α k = 1 if there is an edge from i(k) to j (k) and zero otherwise. State 3 is designated as the only observable state. We think of this as the conducting state in an ion-channel model. Table 1 illustrates the notation introduced in Eq. 1 for this case. Graph with three nodes and four reactions (edges) such that a transition from state i(k) to state j (k) happens at rate α k . For this example, we assume that only state 3 is observed. This is the system shown in Fig. 1 of Schmandt and Galán [14] Table 1 Indexing of nodes and edges for the 3-state process, cf. Eq. 1 and Fig. 1. The first column gives the reaction number, the middle column gives the direction of the reaction, and the last column gives the contribution of the reaction to the measurement Y = M X In this case, we suppose σ k = 1 in the matrix B and use the linear measurement functional M = [0, 0, 1] to pull out the third component of X(t), yielding the projection gives the occupancy of the system states at time t and satisfies the constant coefficient SDE given in Eq. 4 with where the W k (t) are independent and identically distributed standard Brownian motions, and Since we are assuming σ k = 1 for all k, the kth column of B is exactly the stoichiometry vector associated with the kth reaction, and in particular, B k B k = ζ k ζ k . The full process X(t) has four stochastic transitions and a reduced processX(t) is defined by keeping a subset of the four stochastic transitions. We use the notation  (3,4) is essentially zero which shows that reduced process X (3,4) is optimal for preserving the accuracy of the first component. Second panel: no reduced process is optimal for preserving the accuracy of the second component. Third panel: U (1,2) is essentially zero which shows that X (1,2) is optimal for preserving the accuracy of the third component (the conducting state in our 3-state example). Bottom panel: squared norm of the deficiency U (i,j ) 2 = X (i,j ) − X 2 X = X (i,j,k) to explicitly define which columns of the full matrix B are neglected in the approximate process, i.e. which stochastic transitions are neglected. We are interested in the accuracy of the approximation of the trajectory itself. Figure 2 illustrates the deficiency U (i,j ) (t) = X (i,j ) (t) − X(t) between the full process and all possible two noise source reductions X (i,j ) on the 3-state chain, as projected onto each of the three components in the system. The "optimal complexity reduction" is not well defined in general because it is underspecified. For example, asking to reduce the norm of the deficiency U while eliminating two of the four noise sources gives no preference between the six possible reductions. Asking for the best reduction to preserve a specific component may give an answer: to preserve the trajectory as projected onto the first component, keep the two noise sources that directly affect it (transitions between edges 1 and 2); for the third component, keep the other two (transitions between edges 3 and 4); for the second component there is no preference since it is affected directly by all transitions. This gives an intuitive explanation of stochastic shielding consistent with Schmandt and Galán's explanation.
for the 3-state Markov process. The discrepancy M U (1,2) (marked by * ) corresponds to reduced process X (1,2) projected onto the third component, which is the optimal two-edge-neglecting approximation of X for this example, in agreement with Schmandt and Galán [14] M U (i,j,k) R k Value M U (2,3,4) If we fix a point in the underlying sample space (a choice of four Poisson processes Y k (t) in the system N(t) or a choice of four white noise processes dW k (t) in the system X(t)) and then choose to neglect the fluctuations in two of the four, i.e. by replacing Y k (t) with E[Y k (t)] or dW k (t) with zero, respectively, then the question is: which choice leads to the most accurate representation of the process as seen by the measurement?
By Lemma 1, we have the following expression for the edge importance terms R k : Evaluating this expression for the measurement functional M = [0, 0, 1] yields Table 2 shows the stationary variance of the discrepancy M U (i,j,k) = M (X (i,j,k) − X) for all possible reduced processes X (i,j,k) . For instance, X (1,2) is the reduced process that neglects fluctuations in reactions 1 and 2 and the stationary variance of M U (1,2) is R 1 + R 2 = 0.0833. Note that X (1,2) is the optimal reduced process in terms of the Schmandt and Galán stochastic shielding approximation (among all approximations neglecting exactly two edges) for the 3-state chain. Figure 3 shows the mean squared error as a function of time for M U (i,j ) (t) corresponding to the three classes of reduced processes X (i,j ) (t) on the 3-state chain (i.e., the classes are X (1,2) , X (3,4) , and {X (1,3) , X (1,4) , X (2,3) , X (2,4) }, corresponding to the three different M U (i,j ) (t) values shown in Table 2 above). The error function is shown with the theoretical MSE ( k∈E R k ) for each case. Therefore, since  (1,2) has the smallest MSE out of the three classes of 2-noise source reduced process, showing that, as observed by Schmandt and Galán, X (1,2) is optimal at preserving the accuracy of the full process with respect to the third component of the system Mζ 1 = Mζ 2 = 0, Mζ 3 = 1, and Mζ 4 = −1 we confirm the claim made by Schmandt and Galán [14] that reactions 3 and 4 are important whereas reactions 1 and 2 are unimportant in terms of stochastic shielding for this 3-state example.

Analysis of Stochastic Shielding for a Random Graph Ensemble
For any particular Ornstein-Uhlenbeck process on a graph, Lemma 1 provides the edge importance values R k (Eq. 10), which may be used to compute explicitly the contribution to the deficiency made by neglecting any particular reaction, relative to a given measurement vector M. In order to make general observations about the stochastic shielding approximation, we now consider an ensemble of random graphs. The proof of our main result (Theorem 2, restated below) will rely on properties of the joint distribution of components of eigenvectors of L, the graph Laplacian. Previously, we used i and j to refer to the source and destination nodes in a reaction. In this section, we will adapt the notation so that edge k is a reaction from node l − to node l + , denoted by l ± (k) ∈ E (see Eq. 17). In this section, i and j will instead index eigenvectors of L. (1) . . .
Because our methods combine heuristic numerical evidence with probabilistic calculations, we use "≈" to represent "heuristic equality". Where precise order estimates are available, we use "O" notation. For the reader's convenience, we restate Theorem 2.

Theorem 2 Given an ensemble of symmetric directed graphs
as n → ∞, and a stoichiometry vector ζ k corresponding to the kth reaction, the mean squared error R k resulting from neglecting the kth reaction has expected value where the constant C depends on the mean edge weight.
In other words, since reactions connecting nodes with identical values of M have a small contribution to the error, so these reactions can be neglected under the stochastic shielding approximation. This result relies on a list of assumptions which are described in detail below. The proof of this theorem requires Lemma 3, which is stated after the assumptions and proved in Appendix C.3.

Assumptions on the Random Graph Ensemble
We state a sequence of assumptions on the random graph ensemble needed to establish our main result. Each assumption is reasonable for a broad class of graphs of interest, for reasons articulated in the Remarks following each assumption. In several instances we impose on our random graph ensemble, as assumptions, properties that are known to hold for broad classes of random matrices, such as the Wigner ensemble [19,20]. The ensemble we consider is not equivalent to a generalized Wigner ensemble. Nevertheless, for the reasons detailed below, it appears reasonable, that certain aspects of the eigenvector and eigenvalue distribution may be similar in the two cases.
We consider an ensemble of symmetric directed graphs G = (V, E) with |V| = n. Let ζ k be the stoichiometry vector corresponding to the kth reaction (Eq. 17) and let (λ i , v i ) denote the eigenpairs of the graph Laplacian L = (A − D) listed with eigenvalues in descending order. We assume that the eigenvector components are l 2normalized with mean 0 and variance 1/n, and we assume the following: A0. (Following [21].) Let a ij ≥ 0, the entries of the adjacency matrix, be random variables defined on a common probability space, with A1a. The graph is drawn from a random ensemble with the property that the eigenvalues λ i and eigenvectors v i of the associated graph Laplacian are nearly independent. That is, for any i, j, k, l ∈ {1, . . . , n} and arbitrary measurable functions f : R 2 → R and g : Remark 1a: Assumption A1 holds for the symmetric Gaussian ensemble as well as for the more general Wigner ensemble [19,20]. Indeed for these ensembles the eigenvalues and eigenvectors are independent. The weaker assumption, that they are at most weakly correlated, appears reasonable for e.g. the ensemble of graph Laplacians obtained from the symmetric Erdös-Rényi random graph ensemble. A1b. The graph is drawn from a random ensemble with the property that the joint (eigenvalue, eigenvector) distribution is nearly invariant under permutation of eigenvectors. That is, for any Remark 1b: The symmetric Gaussian and Wigner ensembles are fully invariant under permutation of eigenvectors, and the weaker assumption of near invariance appears reasonable for the Erdös-Rényi ensemble. In particular, the pair appearing in the definition of R k (Lemma 1) are assumed to be approximately uncorrelated. This assumption is reasonable by virtue of the approximate rotational symmetry of the eigenvector distribution under our choice of random graph model, which we expect to be close (heuristically) to the eigenvector distribution of the symmetric Gaussian ensemble [19,20].
This normalization leaves a 2-fold ambiguity in the choice of eigenvector v. Since +v and −v both have v 2 = 1, we choose randomly between them so that the first non-zero component is positive with probability 1/2. 3 Remark 2b: By the symmetry of our random graph ensemble under the symmetric group acting on the change of labels, assumption A2 holds not just for the Gaussian and Wigner ensembles, but for any reasonable symmetric ensemble. In particular, it holds for the symmetric Erdös-Rényi random graph ensemble.
are both of order n −3 (green and magenta). This is numerical evidence for assumptions A3-A5 below A3. For any i, j ∈ {2, . . . , n} and l, l ∈ {1, . . . , n}, Remark 5: The reason for this assumption will become clear in the proof of Theorem 2. It is similar in spirit to the four moment theorem for eigenvector Sodium ("resurgent") 13 1 Raman and Bean [23] Sodium ("slowly inactivating") 26 2 Milescu et al. [24] Sodium ("allosteric") 12 1 Carter et al. [25] components of a Wigner or Gaussian random matrix, different versions have been established by Tao and Vu [20] and Knowles and Yin [19]. Figure 4 provides numerical evidence for the plausibility of assumption A5 in the Erdös-Rényi case.
In addition to assumptions A0-A5 on the random graph ensemble, the statement of Theorem 2 places an assumption on the measurement vector M ∈ {0, 1} n . This vector contains n 1 > 0 ones and n 0 > 0 zeros such that n 1 + n 0 = n. We assume n 1 = O(1) as n → ∞, that is, we exclude the case where n 1 grows without bound as n grows. (If M has the same value for all nodes, the output is constant and the error is identically zero. The expression in Theorem 2 holds trivially so we ignore this case.) To motivate this assumption, Table 3 shows the total number of states (n) and the number of conducting states (n 1 ) for representative ion-channel models. Model refinements driven by empirical evidence have tended to increase the total number of states relative to Hodgkin and Huxley's original model, without significantly increasing the number of conducting states.
Although assuming that n 1 = O(1) is biologically plausible, we make this assumption mainly for technical reasons as indicated in the proof of Theorem 2. We note, however, that in the numerical example in Sect. 3.3, the conclusions of Theorem 2 appear to hold equally well when n 1 = n 2 = n/2.
Note that the exponent q > 1 in part C is governed by the fourth moment of the eigenvector components of the graph Laplacian (see assumption A4a). The proof of Lemma 3 is given in Appendix C.3.

Proof of Main Theorem
Suppose assumptions A0-A5 hold and M ∈ {0, 1} n satisfies 0 < i M i ∼ O(1) as n → ∞. By Lemma 1, R k denotes the contribution of the kth reaction to the deficiency of the approximate process. Given the measurement vector M, we have (exactly) This expectation is taken over the space of symmetric directed graphs G = (V, E) where edge k is chosen at random from the set of n 2 possible bidirectional edges. If If the graph Laplacian were drawn from a symmetric Gaussian ensemble (or Wigner ensemble; see [19,20]), then the eigenvalues and the eigenvectors would be independent. For other ensembles we impose the weaker condition of near independence (assumption A1a), which in this case means that for each i ≥ 2 and j ≥ 2, we assume Under assumption A1b, the joint distribution of eigenvalues and eigenvectors is approximately separable into the product of two measures, one for the eigenvalues and a second for the eigenvectors. In this case the expectation E[( −1 λ i +λ j )] in the sum (23) can be replaced by its average, to obtain As shown in [21], assumption A0 implies that the empirical eigenvalue distribution for the graph Laplacian L, converges weakly (with probability one) as n → ∞ to the free convolution γ of the semicircle law, The measure γ becomes concentrated around λ i ≈ −nμ A as n grows. In particular, most terms in the sum (Eq. 25) concentrate around 1/(2nμ A ), as n → ∞. Therefore, by imposing assumption A0 and setting C = 2μ A , we have E[S] → 1/(nC), as n → ∞, yielding in the limit For the Erdös-Rényi ensemble with n nodes and edge probability p, we have Figure 5 shows that the sample mean of S over 10 realizations (i.e. 10 different Erdös-Rényi random graph configurations with the same parameters) rapidly approaches 1/(2pn), as n increases, for values of p ranging from 0.3 to 0.9. As the factor of 1/n is common across all k, it does not affect the stochastic shielding argument.
To prove Theorem 2, we will show that for some q > 1, corresponding to the parameter q appearing in assumption A4. This dichotomy is the basis for neglecting the edges k such that M ζ k = 0, as in the stochastic shielding approximation. To do this, we will use assumption A3a and Lemma 3 to show the following: It suffices to show that the first term in Eq. 32 is (34) and the second term is Starting with the first term in Eq. 31, it follows from assumption A3a that, as n → ∞, We can expand the left hand side of Eq. 34 by using the definitions Fig. 6 Erdös-Rényi random graph. Realization of an Erdös-Rényi random graph with n = 50 nodes and edge probability p = 0.5 as n → ∞, which establishes the first term (Eq. 34). We now focus on the second term in Eq. 32. In Lemma 3 part C, we establish that as n → ∞ as n → ∞, which establishes the second term (Eq. 35). Therefore, we have established Theorem 2.

Symmetric Erdös-Rényi Random Graph Ensemble
Many varieties of random graphs have been used to describe biological systems [26,27]. Here, we restrict attention to an ensemble of symmetric Erdös-Rényi random graphs G(n, p) on n nodes, for which each of (n 2 − n)/2 possible bidirectional edges occurs independently with probability p [28,29]. Consider a graph drawn from the Erdös-Rényi ensemble for n = 50 and p = 0.5. See Fig. 6 for an example. Take A to be the unweighted adjacency matrix (α k ∈ {0, 1}) and let σ k = 1 for all reactions k so that the kth column of the matrix B is exactly the stoichiometry vector for reaction k. Specifying any measurement vector M ∈ {0, 1} 50 induces a partition of edges into "important" (type 0-1) or "unimportant" (types 0-0 or 1-1) classes. Let E I be the set of important edges and E U be the set of unimportant edges. Clearly, E = E I ∪ E U . In the following example, we consider a vector M such that half the entries are 1 and other half are 0. Theorem 2 says that if the matrix of eigenvector components of the Erdös-Rényi graph Laplacian is sufficiently similar to a random matrix drawn from the Gaussian ensemble (in terms of assumptions A0-A5) then one would expect the partitioning of the R k into two clusters. One cluster, containing the important edges, will be centered at 1/n. A second cluster, containing the unimportant edges, will have smaller R k values (O(n −q ) where q > 1 is governed by the fourth moment; see assumption A4a in Sect. 3.1). To the extent to which this similarity to the Gaussian ensemble holds, our calculation of R k involves projecting the measurement vector M and the vectors ζ k onto randomly chosen subspaces of R n .
As shown in Fig. 4, assumptions A0-A5 appear to be satisfied for the symmetric Erdös-Rényi random graph ensemble. In particular, the fourth moment of the eigenvector components (assumption A4a) appears to hold empirically for q ≈ 5/3; in particular, we find that, empirically, E[v i (l) 4 ] ≈ √ 2n −5/3 . This behavior suggests that the unimportant edges should have a mean R k value √ 2n −5/3 . Setting n = 50, for example, we would expect one cluster of R k values centered at 1/50 = 0.02 for k ∈ E I and another cluster close to √ 2 · 50 −5/3 ≈ 0.0021 for k ∈ E U . Figure 7 shows the rank order of edge importance values R k corresponding to the m reactions in the Erdös-Rényi random graph. The top cluster is centered at 0.02 (upper horizontal red line) and the bottom cluster is bounded above by 0.0021 (lower horizontal red line) consistent with Theorem 2 for the Erdös-Rényi random graph ensemble with 50 nodes and edge probability p = 0.5. Since the measurement functional M is binary, we see a significant gap between the two clusters, as expected. If the components of M are graded, i.e. drawn uniformly from the unit interval, then this curve appears to be smooth (see discussion in Sect. 5). Figure 8 illustrates the distribution of eigenvector components of the Erdös-Rényi graph Laplacian in comparison with a Gaussian random matrix (i.e., each entry has mean 0 and variance 1/n). The quantile-quantile plots show good agreement within one standard deviation and begin to deviate in the second standard deviation. This is consistent with the observation that the fourth moment in the Erdös-Rényi case deviates from the Gaussian case (q ≈ 5/3 for Erdös-Rényi and q = 2 for Gaussian). Nevertheless, Theorem 2 predicts that there will be two clusters of R k values as described above and shown in Fig. 7 for the Erdös-Rényi case with n = 50 and p = 0.5.

Application: Stochastic Shielding of Hodgkin-Huxley Channels Under Voltage Clamp
Hodgkin and Huxley's (HH) model for the generation and propagation of action potentials along the giant axon of the squid Loligo lies at the foundations of modern neuroscience [22,30]. In the classic HH model, action potentials are generated through the interaction of a leak current and two voltage-gated ionic currents, carried by a sodium ion specific channel and a potassium ion specific channel. The potassium channel comprises four identical subunits that open and close independently with voltage-dependent rates. The channel carries a current when all four subunits are in the open state. At the molecular level, a single channel can be represented as a continuous time Markov jump process on a chain of five states, the fifth of which has non-zero conductance. Of the eight transitions connecting states along this chain, only the last two connect states with different conductances, therefore the stochastic shielding approximation would preserve the fluctuations of these transitions and not the other six. The sodium channel involves two types of subunits, an activation subunit ("m") present in three identical copies, and an inactivation subunit ("h") present in a single copy. 4 The resulting graph has eight distinct states connected by 20 different transitions, each occurring with a voltage-dependent rate [31][32][33]. Four of these 20 tran-sitions connect states with differing conductance values (zero versus non-zero); the fluctuations of the remaining 16 transitions are ignored under the stochastic shielding approximation.
Schmandt and Galán compared simulations of a system comprising 5000 individual potassium channels and 25000 individual sodium channels, both with and without the stochastic shielding approximation. It is possible to construct an exact simulation scheme, analogous to Gillespie's stochastic simulation algorithm [34], that takes into account the nonstationarity of the transition rates (propensities) arising from their voltage dependence [35]. However, Schmandt and Galán used a discrete time approximation to this process. Appendix A discusses Schmandt and Galán's approach in more detail. Here we apply our analysis to evaluate the edge importance R k of each transition in the graph for the classic HH potassium and sodium channels, respectively. Rather than consider the case of time-varying transition rates, we restrict attention to the "voltage clamped" case. If the membrane potential is experimentally held constant for a given cell, the per capita transition rates remain constant and the fluctuating ion-channel population forms a stationary Markov process. In particular, our analysis approximates this stationary population process with a linear multidimensional Ornstein-Uhlenbeck process (see Appendix B); this approximation is reasonable given the large numbers of individual channels considered in Schmandt and Galán's simulations.
In general, the ion-channel state graphs for the potassium and sodium channels in the HH model have graph Laplacians L that are not symmetric. Therefore, we need to modify our definition of the edge importance R k (Eq. 10) in order to apply our results. When L is not symmetric, we will assume that L is nevertheless diagonalizable, i.e. that there are eigenvalues λ i and a biorthogonal system of vectors v i , w i (right and left eigenvectors) satisfying In this case the decomposition of L becomes L = i λ i v i w i , and the definition of R k is modified as follows:

Hodgkin-Huxley Potassium Channel
The potassium channel state graph in the Hodgkin-Huxley model is a 5-state chain with one conducting state. Following the tau-leaping construction (Appendix B) we consider a stationary OU process X(t) ∈ R 5 , with linear measurement functional M = [0, 0, 0, 0, 1] . See Fig. 9 for an illustration of this channel. The corresponding Fig. 9 Illustration of the Hodgkin-Huxley potassium channel state graph. This is a 5-state chain where state 5 is the conducting state. The eight reactions are labeled in blue and are used to define the edge importance values R k in the figures below. The reaction rates α n and β n are voltage-dependent as defined by Eqs. [47][48] (weighted) adjacency matrix A is which is evidently not symmetric. The voltage-dependent transition rates are given by Then the graph Laplacian L = (A − D) is voltage-dependent and is given by since the entries in the diagonal matrix D are the weighted out-degrees of each node for a given voltage V , i.e. D ii (V ) = 5 j =1 A ij (V ). The matrix B is also voltagedependent. Recall that the kth column of B corresponds to the kth reaction, and this can be written as σ k (V )ζ k . If r k is the per capita rate of reaction k (transition from node i(k) to j (k)), then σ k (V ) = r k (V )N i (V ) whereN i (V ) is the average number of channels at state i at equilibrium for voltage V . Hence, B is given by Figure 10 shows the edge importance R k as a function of voltage for each reaction k ∈ {1, . . . , 8} in the potassium channel state graph. Note that since the process is at steady state, and respects detailed balance, the mean flux due to the two reactions connecting the same pair of nodes will be equal and opposite. Thus, in this case, R 1 = R 2 , R 3 = R 4 , R 5 = R 6 , and R 7 = R 8 . The blue curve (R 7 = R 8 ) corresponds Physically, it is the current rather than the state occupancy that holds the greatest interest. The current through a population of potassium channels with net conductance g is I = g(V − V k ); here V k = −77 mV is the potassium reversal potential, and the conductance g = g o N o is the product of the unitary or single channel conductance g o with the total number of channels in the open state, N o . The variance of the current is therefore (g o (V − V k )) 2 times the variance of the occupancy number, meaning that near the reversal potential, the current can have low variance even if the channel state has high variance. For convenience we set g o = 1, which amounts to a change of nominal units for measuring the conductance. Figure 11 shows the variance of the nominal current, R k * (V − V k ) 2 as a function of voltage V for each reaction k for the potassium channel. In addition to having the highest edge importance curve, the blue curve R 7 = R 8 also has the highest variance (left panel). The right panel shows the probability of being in each state as a function of voltage.

Hodgkin-Huxley Sodium Channel
The sodium channel state graph in the Hodgkin-Huxley model consists of two linked 4-state chains, for a total of eight states, including one conducting state, and 20 reactions. Again following the tau-leaping construction (Appendix B) we consider a stationary OU process X(t) ∈ R 8 , with linear measurement functional M = [0, 0, 0, 0, 0, 0, 0, 1] . See Fig. 12 for an illustration.  The adjacency matrix in this case is where the voltage-dependent entries are defined by The graph Laplacian from the adjacency matrix above (Eq. 50). The matrix B is also voltage-dependent and is given by the general expression in Eq. 49. Figure 13 shows the edge importance R k as a function of voltage for each reaction k ∈ {1, . . . , 20} for the sodium channel state graph. The sodium channel also satisfies detailed balance, so each pair of complementary reactions k i , k i+1 connecting the same pair of nodes will have equal edge importance values R k i = R k i+1 . The magenta curve corresponds to edges 11 and 12 and the yellow curve corresponds to edges 19 and 20, which are the transitions between state 7 and conducting state 8, and  Figure 14 shows the variance of the nominal current R k * (V − V k ) 2 as a function of voltage V for each reaction k where V k = 45 mV is the reversal potential for the sodium channel. Again, we choose units for conductance such that the unitary channel conductance equals 1. As before, we see that the edges with the highest edge importance have the largest variance (left panel). The switch between the dominant curves (magenta vs. yellow) agrees with the switch in Fig. 13 which occurs at −25 mV. The right panel in Fig. 14 shows the probability of being in each state and how that changes with voltage.
In summary, our analysis fully supports the accuracy of Schmandt and Galán's stochastic shielding algorithm for the Hodgkin-Huxley system, at least for the voltage clamped case that we consider. More significantly, our analysis allows one to calculate the relative importance of each transition in a network of first-order reactions, allowing a new quantitative basis for reduction of complexity of stochastic network models. In the case of a simple chain of states such as the Hodgkin-Huxley potassium channel, the rank ordering of transitions by importance R k is the same for all voltages. As shown in Fig. 13, however, for more complicated gating schemes, such as the Hodgkin-Huxley sodium channel, the rank ordering of transitions by importance can differ at different voltages.
For instance, the most important transition at subthreshold voltages (V −40 mV) is the transition connecting the [m = (1, 1, 0), h = 1] state (state 7 in Fig. 12) to the [m = (1, 1, 1), h = 1] state (state 8, the conducting state). This tran-sition corresponds biophysically to the nonconducting-to-conducting transition that occurs via activation or deactivation [22], that is, the opening (or closing) of the last of three m-activation gates in the ion channel. It is significant that this transition is the most "important" for subthreshold voltages, because the activation transition is typically the last subthreshold event during spike generation.
On the other hand, at suprathreshold voltages the most important transition is that connecting the [m = (1, 1, 1), h = 1] state (state 8) with the [m = (1, 1, 1), h = 0] state (state 4). Biophysically, this transition corresponds to inactivation and deinactivation, or the closing (and opening) of the h-inactivation gate. During action potential generation this transition plays an essential role in terminating the voltage spike upstroke, and it is significant that it should be most "important" at suprathreshold voltages.
For more general channel schemes, and more elaborate stochastic processes in general, the identification of the relative quantitative importance of different transitions or edges to the observable behavior of the system is a powerful new tool for principled complexity reduction.

Discussion
In the ongoing race between growth of empirical data sets and growth of available computing power, conceptual understanding of complex dynamical systems can get left behind. Finding efficient lower-dimensional representations of high-dimensional systems, that accurately capture relevant aspects of system behavior, not only takes better advantage of computational resources, but can provide insights into the essential components of a system. Hence, there has been a significant effort in recent years to develop principled complexity reduction techniques for naturally occurring complex networks.
Schmandt and Galán [14] developed a method for efficient simulation of stochastic ion-channel gating in the membrane of a neuron. The random gating of ion channels provides an important class of biological processes which are naturally represented as Markov chains on graphs [33,35]. The graphs in this case arise from the different configurations of ion-channel subunits or "gates". Typically each state carries one of two functional labels: open or closed. This coarse-grained representation of the ion channel corresponds to a linear measurement functional, in the sense that current flowing through open channels can be measured experimentally, and individual ion channels typically exhibit binary all-or-none conductance. Schmandt and Galán implemented a novel form of coarse graining technique that ignores fluctuations between indistinguishable transitions (open-to-open or closed-to-closed) while preserving fluctuations between distinguishable states. In order to gain a deeper understanding of why their "stochastic shielding approximation" works so well, we analyzed it in the context of a multidimensional Ornstein-Uhlenbeck process on a variety of networks. First, we showed that this form of model reduction can be represented as a mapping from a many-dimensional sample space to a lower-dimensional sample space, rather than as a mapping from a many-node network to a few-node network, and that one can formulate the problem as a search for the optimal such mapping. Second, we showed that for the specific 3-state example presented in Schmandt and Galán's paper, their approximation is indeed optimal in a specific sense. Third, we obtained a theoretical result showing that stochastic shielding works for an ensemble of random graphs with arbitrarily chosen binary measurement vectors, analogous to the identification of nodes as conducting versus nonconducting in ion-channel models. Finally, we evaluated the stochastic shielding approach for the graph representing the ionchannel states of the classical Hodgkin-Huxley model, and showed that this approach is optimal for a wide range of fixed voltages under "voltage clamped" conditions.

Relationship Between Different Levels of Modeling
The underlying description of Schmandt and Galán's model [14] is given by the population process described in Sect. 2.1, a more general framework than the Ornstein-Uhlenbeck process that we study. The OU process connects to the population process via a tau-leaping approximation, as described in Appendix B. The tau-leaping method involves two key assumptions. First, assuming that the transition propensities α ij (k) do not change dramatically in an interval of length τ , we can approximate the number of transitions in each interval by a collection of independent Poisson processes. This approach is closely related to the framework of Schmandt and Galán, except that they use a binomial distribution instead of a multinomial distribution (see Appendix A). Second, if the expected number of occurrences of each reaction is sufficiently large (i.e. 10 s or 100 s) in time τ , then it is reasonable to use a Gaussian approximation to the Poisson process. The resulting model comprises the standard chemical Langevin formulation, in which the size of the fluctuations associated with each transition is state dependent. These two constraints can always be satisfied by taking a sufficiently large number of individuals in the population. The Ornstein-Uhlenbeck process is obtained by linearizing about the mean field steady state distribution of the tau-leaping model (see Appendix B). The intensity of the noise terms is determined by the mean steady state occupancy of each state, resulting in a linear OU process. A technical obstacle to extending our results beyond the linear OUP setting is the lack of an explicit closed form expression for the stationary covariance of the population process analogous to Eq. 6. Although our analysis is limited to the OU process version of the system, it is reasonable to expect that stochastic shielding will apply more broadly. For example, in the full population process one can decompose the fluxes in the model into a sum of a mean component and a mean zero fluctuating component. In this case, stochastic shielding amounts to setting the fluctuating component to zero while preserving the mean for those transitions connecting observationally equivalent states.
Limiting the investigation to voltage clamped conditions facilitated a more thorough mathematical analysis of the stochastic shielding approximation, but also restricted the biological applicability of the results. By approximating the population process with a closely related Ornstein-Uhlenbeck process we effectively linearized the system about a fixed point given by the mean field behavior. Therefore our analysis does not address important nonlinear dynamical behaviors arising in many physical and biological systems, such as noise driven transport between multiple quasiequilibria, fluctuation induced spiking in excitable systems (including noise induced spiking in nerve cells), or limit cycle oscillations (including regular spiking in nerve cells). On the one hand, we anticipate that transitions in a state graph corresponding to directly observable state changes, such as between conducting and nonconducting ion-channel states, will remain "important" under more general measures accounting for global, nonlinear behaviors. On the other hand, it is certainly possible that additional transitions may also become important with respect to more general measures, if the linear measurement vectors employed here fail to capture their contribution to global dynamics.

Broader Applications
The stochastic shielding approximation can be directly applied to various biological networks, not just ion-channel models. For instance, Lu et al. [13] describe a signal transduction network in which the phosphorylation and transport events are arranged with a ladder topology. The two sides of the ladder denote molecules in the nucleus and in the cytoplasm, respectively. On each side, there are M + 1 species having different levels of phosphorylation (see Fig. 1 of [13] for an illustration). This is a more elaborate Markov process than a simple ion-channel state model, but it can still be described with a binary measurement vector. The readout is 1 if the system is both in the nucleus and in a specific phosphorylated state, and 0 otherwise. The application of stochastic shielding to such a system is quite natural.
Another broad class of examples includes calcium-induced calcium release Markov models. Nguyen, Mathias and Smith [36] studied a stochastic automata network description of instantaneously coupled intracellular calcium channels which they derived from Markov models of single channel gating that include calcium activation, inactivation, or both. This high-dimensional system involves a large number of functional transitions; the transition probabilities of one channel depend on the local calcium concentration which is typically influenced in turn by the state of other channels in the population. Such models can easily become very high dimensional. For example, DeRemigio et al. [37] considered a discrete state continuous time Markov model of coupled calcium channels, taking explicit channel position in to account, which yields up to 1.6 million distinct states. Similarly, in order to investigate the relationship between single-molecule stochastic events and whole-cell behavior, Skupin et al. [38] implemented a multi scale calcium signaling and spike generation model. Their model connects channel state transitions on a millisecond time scale with interspike interval fluctuations on the scale of tens of seconds, and involves a large number of chemical states. For systems of such complexity, any reduction of the complexity of the stochastic process by stochastic shielding will likely be advantageous, both for simulation and for analysis.
We have focused here on discrete state ion-channel models with binary measurement vectors. However, it is possible that some ion channels may have a richer than binary readout structure. For example, Catterall [39] provides structural evidence that activation of a bacterial sodium channel may possess multiple non-equivalent conducting states, raising the possibility that conductance could be graded rather than binary. As another example which could lead to graded measurement vectors, adaptive evolution can be represented as a random walk on a graph representing genomic variants connected by possible mutation routes [40,41]. While the stochastic process representing the evolution of a human pathogen such as influenza may have an enormous number of degrees of freedom [42,43], the dynamics of interest may comprise a smaller number of dimensions, such as a strain's virulence or fitness, which may naturally be graded rather than discrete quantities.
Stochastic shielding in a modified form would still apply even if the measurement functional were graded continuously. As an example, consider an Erdös-Rényi random graph on n nodes with edge probability p, with graded measurement vector M ∈ [0, 1] n instead of binary M ∈ {0, 1} n . The left panel of Fig. 15 shows the edge importance distribution for the case n = 50 and p = 0.5 where the components of M are chosen uniformly at random from the unit interval. The right panel of Fig. 15 illustrates the difference in measurement between nodes connected by edge k, x = |M ζ k |, versus the edge importance R k , and shows good agreement with the curve y ≈ x 2 /n for the case n = 50.
This empirical result (Fig. 15, right panel) suggests the following generalization of Theorem 2: for some q > 1 (e.g., q = 2 for the Gaussian unitary ensemble, and q ≈ 5/3 for the Erdös-Rényi ensemble, empirically). In the case of a binary measurement vector, M ∈ {0, 1}, this formula would revert to the result given in Theorem 2. A rigorous derivation of Eq. 53 is beyond the scope of the present paper. The behavior of stochastic processes arising in first-order reaction networks has been explored in broad generality by Cadgil, Lee and Othmer [44]. They used a spectral approach to analyze a general system of first-order reaction networks, and studied the effect of changes in the network topology on the distribution of the number of reactant molecules, as well as the difference between conversion and catalytic networks with the same topology. Exploring sample space reductions conditioned on a linear measurement functional for such general classes of networks would be of interest.

Different Levels of Model Simplification
Model simplification is an important goal for Markov chain models in many scientific contexts, and complexity reduction has been pursued through a corresponding variety of approaches. Newman and others have extensively developed techniques based on community structure, aggregating or lumping nodes together based on topological considerations [45,46]. When applied to a stochastic process on a graph, the aggregation of N n to n nodes is equivalent to a projection of the original process onto a subspace in which the process components on the aggregated fine-grained nodes are averaged. In most cases, the resulting coarsened process is no longer Markov, although in some cases exact dimension reduction to a lower-dimensional Markov processes can be accomplished [47][48][49]. Other aggregation schemes, such as spectral coarse graining [50][51][52], have been proposed based on the spectral properties of the graph Laplacian. Approaches based on topological or abstract spectral properties do not necessarily take into account functional properties of the system to be simplified. Because stochastic shielding simplifies the representation of a stochastic process taking into account the function of the system, namely by distinguishing conducting versus nonconducting ion-channel states, it may provide insights not afforded by graph aggregation based on modularity or graph spectra.
As another example of simplification based on functional properties, Bruno, Yang and Pearson [53] used independent open-closed transitions to describe a canonical form that can express all possible reaction schemes for binary ion channels.
Not all prior approaches to simplification of random processes on graphs proceed by aggregating nodes. For instance, Ullah, Bruno and Pearson [54] proposed model simplification by the elimination of nodes with low equilibrium occupancy probability using time scale separation arguments. The reduced system has fewer parameters, and the dynamics of the reduced system are identical to those of the original system except on very fast time scales. Other simplifications based on graph sparsification have been proposed by Koutis, Levin and Peng [55].
In this paper we have investigated a novel form of simplification of stochastic processes on graphs. Stochastic shielding is based on replacing a high-dimensional stochastic process defined on a graph with a lower-dimensional process on the same graph, rather than replacing a complex network with a simpler one. Specifically, we consider mappings from the original process to an approximate process defined on a significantly smaller sample space. In one sense, we can think of the full and a reduced system as two systems with partially shared stochastic input, and partially independent stochastic input of different magnitudes (magnitude zero, in one case). Structurally, this situation is analogous to the kind of mixed common-noise and independent-noise scenario studied in the context of neuronal synchronization [56][57][58]. In another sense, stochastic shielding can be seen as a different kind of projection, vs. that induced by lumping or pruning nodes. The latter methods simplify the graph, whereas stochastic shielding leaves the graph unchanged and simplifies the sample space on which the approximate process lives.

Appendix A: Stochastic Shielding Construction of Schmandt and Galán
In [14], Schmandt and Galán considered discrete time simulations approximating a continuous time, finite state Markov chain where N i (t) is the number of individuals in a population (of size N tot ) in state i at time t, andÑ ij (t) counts the number of i → j transitions that have occurred as of time t. The transition countsÑ ij (t) may be written using the random time change representation [17] asÑ By convention we take N ii (t) ≡ 0 and α ii (t) ≡ 0. The Y ij are independent unit rate Poisson processes driving the different state-to-state transitions. The transition from state i to state j occurs with per capita rate α ij . In a conductance-based model, such as a discrete stochastic version of the Hodgkin-Huxley equations, the vector (N 1 (t), . . . , N K (t)) would represent the number of ion channels in each of K distinct states, and the transition rates could vary with time, e.g. through dependence on membrane potential or second messenger concentration. Although Schmandt and Galán consider both the stationary and time-varying case, we restrict attention to the stationary case, which corresponds experimentally to a voltage clamped preparation.
One may (approximately) simulate trajectories of the Markov chain using a discrete time step approach. Following [14], we fix a time step h > 0 and define N ij as N ij (t) =Ñ ij (t + h) −Ñ ij (t), that is, the number of i → j transitions occurring in the interval (t, t + h]. The net increments in the state-occupancy numbers N i are then given by To obtain a practical algorithm, Schmandt and Galán set Since there is then a finite probability that N i (t + h) < 0, one must include an iterative resampling scheme to force N i (t + h) ≥ 0. As an alternative, we consider instead a multinomial representation of the destinations of all N i (t) individuals beginning the time step at node i. That is, for each i, 1 ≤ i ≤ K, we set The multinomial distribution produces an integer-valued random vector with mean and marginal distributions the same as that given by the binomial distribution; the only difference is that transitions emanating from a common node are not assumed to be independent. The first and second moments arising from the multinomial transition distribution are Here all expectations are conditioned on the current state of the system, The mean increment given the current distribution of the population,¯ i (t) ≡ E[ i (t)| N(t)], is written in terms of the mean transitions as The deviation of the actual number of i → j transitions from the expected number is where is the deviation of the number of i → j transitions from the expected number. The mean of δN ij (t) is zero for all i, j , and all t, by construction. The stochastic shielding approximation amounts to setting δN ij (t) to zero for selected i → j transitions, namely for those transitions between "unobservable states", or (equivalently) between any two states with the same value of the measurement observable, i.e. the conductance. Since As an example, in the three node case, for the multinomial transition model, the variances are given by and the covariances are given by Schmandt and Galán obtain similar expressions that agree up to order O(h); the difference between the binomial and multinomial expressions only appears in the O(h 2 ) terms. For example, they assert that E[δ 1 (t)δ 3 (t)| N(t)] ≡ 0, while under the multinomial model this covariance is equal to −N 2 (t)α 21 α 23 h 2 . Fortunately, this difference does not undermine the main argument. From this point, Schmandt and Galán obtain an expression for the stationary covariance matrix of the reduced process (compare Eq. (8) in [14] with our Lemma 1) and decompose the covariance into a sum over direct and indirect connections to a single conducting or observable state. This situation corresponds, in our analysis, to the case where the measurement vector M contains a single non-zero entry. Schmandt and Galán argue that suppressing the fluctuations associated with transitions not directly affecting the observable state decrease their contribution to the variance of the observable state occupancy, while increasing the contribution of the direct transitions to the same variance. In addition, they show through numerical comparisons that Hodgkin-Huxley equations with a full Markov process and the reduced process are practically indistinguishable both under voltage clamp (stationary transition rates) and current clamp (time-varying transition rates) conditions. evolution of the probability vector p(t) = [p 1 (t), . . . , p n (t)] is given by the following master equation where is the graph Laplacian which can be represented as the sum over all undirected edges (denoted by the set E * ) given in Eq. 75. Let π represent the steady state distribution, i.e. the row vector satisfying πL = 0 with entries such that n i=1 π i = 1. Suppose we represent N(t) as the deviation from its mean,N = πN tot , so that N(t) =N + X(t), where X(t) is a mean zero stochastic process. Then since N i (s) =N i (s) + X i (s) and α k andN i are constants. Now following standard tau-leaping results [59][60][61], which says that we can approximate Eq. 80 using an almost equivalent set of Poisson processesỸ k where eachỸ k at time t is approximately Gaussian distributed with mean and variance τ α k [N i + X i (t)]. Note that if X is a stationary irreducible Markov process on a finite state space, then the occupancy probability of state i, π i > 0. By choosing N tot 1/(min{π i }), we may guarantee that the mean populationX i for each i is as large as necessary for the Gaussian approximation to hold. Since we are assuming that |X i (t)| N i (uniformly in time), and since we want the noise amplitude to be independent of X, we further approximatẽ by dropping the dependence of the variance on X.
Dividing by τ and taking the limit as τ → ∞ yields the SDE Recalling that the kth reaction is from node i(k) to j (k), then the kth reaction in the first term in the RHS of Eq. 83 can be written as Keeping track of components, we sum over the source and destination nodes for each reaction. Then for the lth component of X we have which yields where Q is the generator matrix. Note that we changed notation slightly to illustrate that α ij is the transition rate from state i to j rather than indexing by reaction k.
The graph Laplacian we consider in Eq. 4 is actually L = Q so we have dX = L(N + X)dt. SinceN is proportional to the stationary distribution π , we have that LN = 0, and hence the first term in the SDE is dX = LXdt. Now the second term in the RHS of Eq. 83 can be written as Keeping track of components, here we sum over all m reactions to find where σ k = N i(k) α k in the definition of matrix B.
Therefore, putting the first and second terms together, we have derived the OU process dX = LXdt + BdW given in Eq. 4.

B.2 Tau-Leaping: 3-State Example
Here we will explicitly derive the OU process from the population process given in Sect. 2.1 by using the tau-leaping argument above for the 3-state example in Sect. 2.3. We have N(t) ∈ N 3 and by Eq. 73, following the notation given in Sect. 2.3, specifically the labeling of reactions given in Table 1. Note that α k could be time dependent α k (t).
The tau-leaping approximation above gives Equivalently, we could write these integral equations in differential form These equations are nonlinear since the noise intensity depends on X i . Note that for any t, X 1 (t) + X 2 (t) + X 3 (t) = N tot so that the total population is constant. The mean X satisfies where the matrix above is the generator Q, or our L . In the case where Q is fixed,X is proportional to the null left eigenvector of Q; biologically, this is the voltage clamp case. Let (X 1 , . . . ,X n ) be the corresponding stationary vector. Now we linearize Eqs. 96-98 around the stationary vector. Let V = X −X and assume that |V | Neglecting the O( |V | N tot ) terms gives us the multidimensional Ornstein-Uhlenbeck process of Eq. 4 for the 3-state example.

C.1 Stationary Covariance of a Multidimensional OU Process
The SDE for X(t) in Eq. 4 has the explicit solution (see [18], Chap. 4.5) Assuming the initial condition is either deterministic or Gaussian, then X(t) is Gaussian with mean and correlation function Cov X(t), X (s) = exp(Lt)E X(0), X (0) exp(Ls) where t ∧ s means the minimum of t and s. Setting s = t and taking the limit as t → ∞, we obtain the stationary covariance function We exploit the fact that not only does B decompose into the sum B = m k=1 B k , but in the case of a first-order reaction process, BB also decomposes into the following sum: and further, B k B k = σ 2 k ζ k ζ k for each edge (reaction) k ∈ E. Therefore, the stationary covariance of the full process decomposes into a sum of the contributions from the m different reaction processes: We note that the (left) eigenvector corresponding to the leading (0) eigenvalue of L has constant components, therefore it lies in the kernel of the matrix B k B k for each k, which guarantees finite covariance in Eq. 108.
C.2 Computation of Edge Importance R k and Proof of Lemma 1 Using the spectral properties of the graph Laplacian L, we can rewrite the stationary covariance of X(t) (Eq. 106) by replacing each expression involving a matrix exponential by the sum over the orthogonal eigendecomposition of L. Let v i be the ith eigenvector of L (written as a column vector), with eigenvalue λ i , i.e. Lv i = λ i v i . Summing over each eigenvalue, we can write L = n i=1 λ i v i v i . Note that this decomposition is only valid when L is symmetric; the non-symmetric case is discussed in Sect. 4. Then we have the following expression from Eq. 106: Using the decomposition of matrix B (Eqs. 5 and 107), it follows that The covariance of the full process X is therefore given by Cov X(t), X (t) By construction of the graph Laplacian, its leading eigenvalue λ 1 ≡ 0. The corresponding (right) eigenvector has constant components, v 1 = (1, . . . , 1) / √ n. Therefore, for each stoichiometry vector we have ζ k v 1 ≡ 0. Consequently the terms in the inner summation (114) with index i = 1 or j = 1 vanish, and may be omitted without changing the result. Taking the limit as t → ∞ of the covariance function gives us the stationary covariance Recall that we are interested in the linear measurement functional M ∈ R n projected onto X(t), i.e. the projection Y (t) = M X(t). For edges k ∈ E neglected in the approximationỸ = M X (t), we take the limit as t → ∞ of the mean squared error ofỸ (t) − Y (t) = M U(t) to get

C.3 Proof of Lemma 3
Suppose that assumptions A0-A5 given in Sect. 3.1 hold. We assume that M ∈ {0, 1} n is an arbitrary measurement vector consisting of n 1 > 0 ones and n 0 > 0 zeros such that n 1 + n 0 = n, and n 1 = O(1) as n → ∞. That is, we exclude the case where n 1 grows without bound as n grows. If we look at the corresponding measurement value of the l − th and l + th components of ζ k (see Eq. 17), we have three possible cases: 1 For each part of Lemma 3, we will prove the result for these three cases. If we let n * and we can consider all three cases at once using this notation where now n * 1 = O(1) as n → ∞, by our assumption on M.
Let a = v i (l − ), b = v i (l + ), and c = l∈1 M \{l ± } v i (l). By assumption A2, we have that E[a] = E[b] = E[c] = 0 and E[a 2 ] = E[b 2 ] = n −1 from the normalization of the eigenvectors, and it follows from assumption A3b that E[c 2 ] = (n * 1 )n −1 + O(n −3 ), as n → ∞. Assumption A3 gives conditions on second order terms. Assumptions A4 and A5 give conditions on fourth order moments and fourth order products of a, b, and c.

C.3.1 Proof of Part A
We will show that, as n → ∞, By definition since M v i = l∈1 M v i (l) and v i ζ k = v i (l + ) − v i (l − ). We compute this expectation for the three cases listed at the beginning of Sect. C.3. Using the notation introduced above, we note that this expectation has the form E (a + c)(b − a) for Case 3.
Case 1: l ± ∈ 1 M . Expanding the expected value yields , and each contains n * 1 terms with the following expectation as n → ∞: (171)

Appendix D: Disconnected Graphs
Our general results (Lemma 1 and Theorem 2) implicitly assume that zero is a simple eigenvalue of the graph Laplacian L, or, equivalently, that the graph is irreducible. If we consider a random graph ensemble for which the entries of the adjacency matrix are independent, there can be a strictly positive probability of drawing a disconnected graph. To address this case, suppose the graph G = (V, E) decomposes into G disconnected components, i.e.
where G g = (V g , E g ) and the gth component contains n g vertices. For each g ∈ {1, . . . , G} we have the corresponding graph Laplacian L g restricted to the gth component. If we neglect fluctuations associated with edges E = G g=1 E g , then the re-sulting error, Var[M (X − X)], depends on which component the initial condition X(0) = x 0 =X(0) belongs to. That is, Here the eigenpairs (λ g i , v g i ) refer to the eigenvalues of the gth Laplacian, and quantities such as ζ k (v g ) j and (v g ) j M are interpreted with vectors ζ k and M restricted to those components that lie in the appropriate subspace of R n .
For the random graph ensembles we consider, the probability of drawing a disconnected graph, P[¬C], decreases so rapidly that taking it into account does not affect our main result (Theorem 2). For example, consider the Erdös-Rényi ensemble with fixed edge probability p ∈ (0, 1) as n grows. It is well known that p n = ln(n)/n is a sharp threshold for connectedness as n → ∞. E.g. if p n ≥ 2 ln(n)/n, then P(C) → 1 as n → ∞. Here we show that, if p ∈ (0, 1) is fixed, then P[¬C] goes to zero faster than any power of p, as n → ∞.
Draw a graph G from the standard Erdös-Rényi ensemble with parameters n and p. We call a subgraph of G an isolated k-graph if it is a connected subgraph, with k components, that is disconnected from the rest of the graph. Let P k be the probability that G has an isolated k-graph, conditioned on G not having any isolated k -graph for k < k. Thus P 1 is the probability that G contains an isolated singleton, P 2 is the probability that G contains an isolated pair, given that it does not contain any isolated singletons, and so on. We set P 0 ≡ 0. If G is reducible, then it contains an isolated k-graph for some 1 ≤ k ≤ [n/2], where [·] denotes the integer part of its argument. The probability that G is disconnected is thus For any collection of k vertices to be disconnected from the remaining n − k vertices in the graph requires k independent events, each of which has probability (1 − p) (