 Research
 Open Access
 Published:
Measuring Edge Importance: A Quantitative Analysis of the Stochastic Shielding Approximation for Random Processes on Graphs
The Journal of Mathematical Neuroscience volume 4, Article number: 6 (2014)
Abstract
Mathematical models of cellular physiological mechanisms often involve random walks on graphs representing transitions within networks of functional states. Schmandt and Galán recently introduced a novel stochastic shielding approximation as a fast, accurate method for generating approximate sample paths from a finite state Markov process in which only a subset of states are observable. For example, in ionchannel models, such as the Hodgkin–Huxley or other conductancebased neural models, a nerve cell has a population of ion channels whose states comprise the nodes of a graph, only some of which allow a transmembrane current to pass. The stochastic shielding approximation consists of neglecting fluctuations in the dynamics associated with edges in the graph not directly affecting the observable states. We consider the problem of finding the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the choice of a particular linear measurement functional on the graph. The partitioning of ionchannel states into conducting versus nonconducting states provides a case in point. In addition to establishing that Schmandt and Galán’s approximation is in fact optimal in a specific sense, we use recent results from random matrix theory to provide heuristic error estimates for the accuracy of the stochastic shielding approximation for an ensemble of random graphs. Moreover, we provide a novel quantitative measure of the contribution of individual transitions within the reaction graph to the accuracy of the approximate process.
1 Introduction
Many biological systems exhibit a combination of stochastic (chance, random, noisy) and deterministic dynamics [1–3]. For example, mathematical models involving stochastic processes arise in physiology [4–7], ecology [8–10], and genetic regulatory systems [11–13]. Such mathematical models often originate as intrinsically complex, highdimensional systems with many degrees of freedom, and many sources of variability. This inherent complexity presents two related challenges. First, the essential dynamics of such systems may be hard to discern, and model reduction based on first principles for stochastic systems on complex networks is difficult. Second, in order to predict the behavior of such systems under normal, pathological or experimental conditions, one must usually resort to numerical simulation studies. Even with the tremendous progress in computing power over the last decades, intrinsically highdimensional stochastic systems remain prohibitive to simulate exhaustively. Moreover, because of their dimensionality, the results of ensembles of stochastic simulations can be challenging to interpret. Therefore, there is demand for efficient dimension reduction methods, both to provide high quality approximate numerical solutions to the stochastic evolution equations arising in highdimensional systems, and to provide an efficient conceptual framework for interpretation of the behavior of such systems.
In [14], Schmandt and Galán introduced a stochastic shielding approximation as a fast, accurate method for generating sample paths from a finite state Markov process in which only a subset of states are observable. For example, in ionchannel models, such as the Hodgkin–Huxley or other conductancebased neural models, a nerve cell has a population of ion channels whose configurational states comprise the nodes of a graph, only some of which allow a transmembrane current to pass. That is, each vertex of the ionchannel state graph is labeled with a scalar “conductance”, which is either zero (nonconducting) or one (conducting). In a population of ion channels, the flux of individual channels making the transition from a state i to a state j is a stochastic process with mean rate, and it has fluctuations around the mean rate that depend on the population at state i. The stochastic shielding approximation consists of neglecting fluctuations associated with edges in the graph not directly affecting the observable states. Specifically, the random fluxes along edges connecting identically labeled states are replaced by the mean fluxes along those edges, while the random fluxes associated with edges connecting distinguishable states are left unchanged. This approximation is an example of complexity reduction, in the sense of reducing a stochastic process generated by K independent processes to a process on a smaller sample space, i.e. generated by ${K}^{\prime}<K$ processes. Schmandt and Galán observe that, remarkably, the variance of the observable state (the membrane conductance) is almost identical in the reduced and the unreduced system.^{Footnote 1} While the approximate process does not faithfully reproduce all aspects of the full process, it reproduces those features relevant to the neurophysiologist as well as to the larger biological system in which it is embedded.
Here we consider the problem of finding the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the choice of a particular linear measurement functional on the graph. The partitioning of ionchannel states into conducting versus nonconducting states provides a case in point. In this paper we establish that Schmandt and Galán’s approximation is in fact optimal in a specific sense. We derive a quantitative measure of the contributions of individual edges in the graph to the accuracy of the approximation, relative to the chosen measurement functional. This approach allows quantitative comparison of edge importance, and sheds light on the parametric dependence of relative edge importance, for instance in a voltagegated ion channel. In addition, we provide heuristic error estimates for the accuracy of the stochastic shielding approximation for an ensemble of symmetric random graphs.
Motivated by [14], we consider a multidimensional Ornstein–Uhlenbeck process on a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with n nodes and m edges (reactions), and a linear measurement functional $M\in {\mathbb{R}}^{n}$. We show that the stochastic shielding approximation is the most accurate dimension reduction possible among those neglecting fluctuations in the same number of underlying processes. Neglecting a set of reactions in the full stochastic process X creates an approximate process $\tilde{X}$ which matches the behavior of the full process in the mean but deviates from the full process in the fluctuations.
Extending this idea for an ensemble of symmetric directed graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$, we establish two main results. Lemma 1, our first main result, allows us to find the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the measurement M. Neglecting the fluctuations associated with a subset ${\mathcal{E}}^{\prime}$ of the edge set ℰ defines a new process $\tilde{X}(t)$ that deviates from the full process $X(t)$ by an amount that we call the deficiency, $U(t)=\tilde{X}(t)X(t)$. The observed error, given M, is then ${M}^{\u22ba}U$; its mean is zero by construction, and its variance is $R=E[{({M}^{\u22ba}U)}^{2}]$. In Lemma 1 we provide an exact formula for the contribution of the k th edge to this error. This formula, which arises from a spectral decomposition of the graph Laplacian associated with the full process, gives an explicit criterion for choosing the k most important edges in the graph, for any $0<k<m$.
Our second main result, Theorem 2, applies this criterion to networks generated from a broad class of random graph ensembles with a randomly chosen binary measurement vector M. We show that the importance measures of individual edges cluster tightly around one of two values. For moderately large graphs, these clusters correspond with very high accuracy to Schmandt and Galán’s stochastic shielding heuristic; an extremely accurate, reduced complexity approximation is obtained by neglecting fluctuations associated with edges connecting states that are indistinguishable under the measurement M. We illustrate this result with a sample from the Erdös–Rényi random graph ensemble in Sect. 3.3.
The analysis of Schmandt and Galán focused on an accurate, efficient approximation of Markov processes arising from ionchannel models. In Sect. 4 we apply our analysis to processes on two graphs arising from the classical Hodgkin–Huxley system of ion channels: the 5state model for the voltagegated potassium channel, and the 8state model for the voltagegated sodium channel. In a more general setting, the transition rates connecting adjacent states in these models are voltagedependent. Here we restrict attention to the stationary case, corresponding biologically to the behavior of the channels under “voltage clamped” conditions. For both the voltagegated potassium and voltagegated sodium channel state graphs we show that our ranking reproduces the Schmandt–Galán stochastic shielding heuristic over all physiologically relevant voltages. This example also demonstrates that our results apply to graphs with nonsymmetric adjacency matrices, as well as to the symmetric case.
In Sect. 5 we discuss possible extensions of our results to examples including signal transduction networks and calciuminduced calcium release models, as well as systems with graded rather than binary measurement functionals.
2 Model
2.1 Connection to the Population Process
We develop our results in the context of stationary Ornstein–Uhlenbeck processes. In contrast, Schmandt and Galán [14] introduced stochastic shielding in the broader context of density dependent random walks on a graph from which our OU process arises as a large population approximation. To set the stage before moving to the OU process framework, we briefly describe a population process on a graph of the type considered by Schmandt and Galán. In particular, we consider a stationary stochastic process on a directed graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ where $\mathcal{V}=n$ and $\mathcal{E}=m$, the number of nodes and edges in the graph, respectively. Each directed edge corresponds to one reaction in the system. The k th edge $ij(k)=(i(k),j(k))\in \mathcal{E}$ is defined to start at node $i(k)$ and end at node $j(k)$, so that the k th reaction effects a transition from state i to state j. Following [15, 16], we let ${\zeta}_{k}$ be the stoichiometry vector associated with edge $ij(k)\in \mathcal{E}$. That is, the i th component of ${\zeta}_{k}$ is −1, the j th component is 1, and all other components are zero.
Under stationary conditions, such as a population of ion channels under voltage clamp, the occupancy numbers of different states of a continuous time Markov process can be represented as the solution of the stochastic equation obtained from a random time change representation in terms of Poisson processes [17]. If ${\alpha}_{k}$ gives the instantaneous per capita transition rate from state $i(k)$ to state $j(k)$, then the full Markov process is specified by a collection of independent standard (unit rate) Poisson processes ${Y}_{k}$ each representing the occurrence of $i(k)\to j(k)$ transitions as follows. Letting $N(t)\in {\mathbb{N}}^{n}$ be the nonnegative integervalued vector representing the number of individuals in each of n states, we may write $N(t)$ as a sum of transitions occurring at random times specified by the collection of ${Y}_{k}$.
Because each transition preserves the total number of individuals (i.e. the components of ${\zeta}_{k}$ sum to zero for each k), we have ${\sum}_{i}{N}_{i}(t)={N}_{\mathrm{tot}}={\sum}_{i}{N}_{i}(0)$ for all $t>0$.
In Appendix B we show that, provided ${N}_{\mathrm{tot}}$ is sufficiently large, we can approximate the deviation of $N(t)$ from its mean $\overline{N}\in {\mathbb{R}}^{n}$ by a multidimensional, Gaussian, Ornstein–Uhlenbeck process $X(t)\in {\mathbb{R}}^{n}$, $X(t)\approx N(t)\overline{N}$ which satisfies a stochastic differential equation of the form given in Eq. 4 below. In particular, we show that $X(t)$ can be approximated by an SDE of the form
2.2 Multidimensional Ornstein–Uhlenbeck Process
To obtain our main mathematical result, we consider a multidimensional Ornstein–Uhlenbeck process $X\in {\mathbb{R}}^{n}$ on the directed graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ where $\mathcal{V}=n$ and $\mathcal{E}=m$. The state of the system at time t, $X(t)$, satisfies Eq. 3, which we write in the equivalent form
Here $L={(AD)}^{\u22ba}$ is the graph Laplacian (A is the weighted adjacency matrix of with entries ${A}_{ij}={\alpha}_{k}>0$ if there is an edge from node $i(k)$ to $j(k)$ and zero otherwise, and D is the diagonal matrix such that entry ${D}_{ii}={\sum}_{j}{A}_{ij}$ is the outdegree of node i). B is an $n\times m$ matrix, and $W\in {\mathbb{R}}^{m}$ is an mdimensional Brownian motion, i.e. each component $d{W}_{k}$ represents the increment of an independent standard Brownian motion capturing the fluctuations of the k th reaction about its mean.^{Footnote 2} Matrix B decomposes into a sum over the m reactions
such that the k th column of matrix ${B}_{k}={\sigma}_{k}{\zeta}_{k}$ and all other columns of ${B}_{k}$ are zero.
The stochastic shielding approximation for a system of the form given in Eq. 4 amounts to preserving the mean, but neglecting the fluctuations, for the processes driving a subset of the reactions, i.e. replacing B with an alternative matrix $\tilde{B}$ obtained by replacing a subset of columns in B with null vectors. The trajectories of the resulting SDE, $\tilde{X}(t)$ (see Eq. 7), are approximations of the trajectories of the full system.
In order to compare different complexity reduction choices, we define the deficiency of an approximation to be the difference between the true and approximate trajectories, $U(t)=\tilde{X}(t)X(t)$, when projected onto the measurement functional of interest M. As suggested by Schmandt and Galán, the stationary variance of the projection of the deficiency on the measurement vector provides an appropriate measure for comparing the quality of alternative reductions. That is, we use $R=Var[{M}^{\u22ba}U]=Var[{M}^{\u22ba}(\tilde{X}X)]$ as our error measure. We focus on reductions that preserve the behavior of the system (Eq. 4) relative to a given linear measurement functional $M\in {\mathbb{R}}^{n}$. In the case of ion channels, $M\in {\{0,1\}}^{n}$ represents the conductance of each channel state. We consider the case of graded rather than binary measurements in Sect. 5. Whether binary or graded, the measurement vector identifies the stochastic process of interest as the projection $Y(t)={M}^{\u22ba}X(t)$.
Formally, we consider two processes $X(t)$ (full process) and $\tilde{X}(t)$ (reduced process) defined on a common probability space $(\Omega ,{\mathcal{F}}_{t},P)$. The sample space $\Omega =C{[0,\mathrm{\infty})}^{n}$, filtration ${\mathcal{F}}_{t}$, and Wiener measure P are those associated with m independent copies of the standard Brownian process. The approximate process $\tilde{X}(t)$ has the same sample space Ω and is measurable with respect to the same filtration ${\mathcal{F}}_{t}$, but also with respect to a smaller filtration ${\tilde{\mathcal{F}}}_{t}\subset {\mathcal{F}}_{t}$ generated by the Wiener processes associated with a subset of edges of the graph. The covariance of the deficiency, then, is well defined in terms of the underlying measure P on the full probability space.
In Appendix C.1 we show the standard result [18] that the stationary covariance matrix of the full process decomposes into a sum of the contributions from the m different reaction processes:
Similarly, the variance of the projection $Y(t)={M}^{\u22ba}X(t)$ also decomposes into a sum, because $Var[Y]={M}^{\u22ba}Cov[X]M$.
Because the (left) eigenvector corresponding to the leading (0) eigenvalue of L has constant components, it is orthogonal to ${\zeta}_{k}$ for each k. (If L is symmetric, the right and left eigenvectors are interchangeable.) Therefore the corresponding eigenspace is contained in the kernel of the matrix ${B}_{k}{B}_{k}^{\u22ba}$, for each k, which guarantees that the limit on the RHS of Eq. 6 remains finite.
Neglecting a set of reactions ${\mathcal{E}}^{\prime}\subset \mathcal{E}$ creates an approximate processes, $\tilde{X}(t)$, which matches the behavior of the full process in the mean, but deviates from the full process in the fluctuations. This reduced process satisfies the following SDE
where $\tilde{B}={\sum}_{k\in \mathcal{E}\mathrm{\setminus}{\mathcal{E}}^{\prime}}{B}_{k}$ sums over the edges we keep. Given the linear measurement functional $M\in {\mathbb{R}}^{n}$ above, we define the approximate projection $\tilde{Y}(t)={M}^{\u22ba}\tilde{X}(t)$. Note that in the case of an ionchannel system, M is binary so Y and $\tilde{Y}$ just pull out the observable (i.e., conducting) states of each system. In Sect. 2.3, for instance, we consider a 3state chain with one observable state (state 3) as a simple model of an ionchannel system. In that case, $M={[0,0,1]}^{\u22ba}$ and $Y(t)={M}^{\u22ba}X(t)={X}_{3}(t)$.
Neglecting a subset of reactions also introduces an error in the representation of the measurement $Y(t)$ versus $\tilde{Y}(t)$ due to the difference between $X(t)$ and $\tilde{X}(t)$. Recall that $U(t)=\tilde{X}(t)X(t)$ is the deficiency of the reduced model compared to the full model. Then $\tilde{Y}(t)Y(t)={M}^{\u22ba}U(t)$, and $U(t)$ satisfies the SDE
It is important to note that the noise source dW that appears in Eqs. 4 and 7 refers to the same noise process W in both cases. The deficiency of the approximation relative to the full process is given by taking the limit of the mean squared error (MSE) of $\tilde{Y}Y$ (equivalent to the stationary variance of $\tilde{Y}Y$), which, as shown in the proof of Lemma 1, is an expression of the sum over all neglected reactions.
Lemma 1 For an irreducible graph with a symmetric Laplacian L, let X and $\tilde{X}$ be the full and reduced processes defined by Eqs. 4 and 7, respectively, and let $M\in {\mathbb{R}}^{n}$. Let ${\mathcal{E}}^{\prime}\subset \mathcal{E}$ be the subset of edges neglected in the definition of $\tilde{X}$. Let L be diagonalizable with eigenpairs ${\{({\lambda}_{i},{v}_{i})\}}_{i=1}^{n}$ listed with eigenvalues ${\lambda}_{i}$ in order of decreasing real part and ${\parallel {v}_{i}\parallel}_{2}=1$. Then the stationary variance of the discrepancy $\tilde{Y}Y={M}^{\u22ba}(\tilde{X}X)$ satisfies
where
We can rank the error terms ${R}_{k}$ in descending order, thereby ordering the corresponding reactions in terms of their “importance”. The most important reaction is the one with the largest value of ${R}_{k}$; if neglected, it would introduce the largest error. See Appendix C.2 for the proof of Lemma 1. Note that an individual term in the sum (10) will be zero if either ${\zeta}_{k}\perp {v}_{i}$ or if $M\perp {v}_{i}$ for a given eigenvector ${v}_{i}$. Typically, however, these vectors will not be orthogonal. Therefore, it is of interest to know how the values of ${R}_{k}$ are distributed for different examples: graphs of actual ionchannel states such as those in the classical Hodgkin–Huxley model, and more generally, ensembles of random graphs. In Sect. 4, we compute the distribution of ${R}_{k}$ for the graphs of the potassium and sodium channel states in the Hodgkin–Huxley model. In Sect. 3, we consider an ensemble of random graphs such as the Erdös–Rényi ensemble with randomly assigned binary measurement vector M and prove our main result, which is a statement about the expected value of ${R}_{k}$. Should our random graph ensemble produce a graph that does not consist of a single connected component, then we may apply Lemma 1 to each isolated component of the graph separately. However, for the random graph ensembles we consider, the probability of drawing a disconnected graph decays very rapidly as $n\to \mathrm{\infty}$. We discuss this point further in Appendix D.
For a random graph ensemble, the eigenvectors of the graph Laplacian are distributed randomly on the unit sphere [19, 20]. Hence, they are unlikely to be exactly orthogonal to either ${\zeta}_{k}$ or M. Given a series of assumptions (see Sect. 3.1) that are true for naturally occurring random ensembles such as the symmetric Gaussian and Erdös–Rényi ensembles, we state our main result.
Theorem 2 Given an ensemble of symmetric directed graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with n nodes satisfying assumptions A0–A5 (see Sect. 3.1), a binary measurement vector $M\in {\{0,1\}}^{n}$ satisfying $0<{\sum}_{i}{M}_{i}\sim O(1)$ as $n\to \mathrm{\infty}$, and a stoichiometry vector ${\zeta}_{k}$ corresponding to the kth reaction, the mean squared error ${R}_{k}$ resulting from neglecting the kth reaction has expected value
where the constant C depends on the mean edge weight.
This result shows that the edges in the graph naturally decompose into two classes, distinguished by their asymptotic behavior for large n. The first class of edges represents connections between differently labeled nodes, in terms of the measurement vector M. The first class comprises the “important” edges in the graph, in the sense that these edges have mean ${R}_{k}$ values that scale as order $1/n$. The second class of edges connects identically labeled nodes. These edges have mean ${R}_{k}$ values of order less than ${n}^{q}$, where $q>1$ is driven by the fourth moment of the eigenvector components (see assumption A4a in Sect. 3.1 for details). As n increases, these edges become relatively “unimportant” and, hence, can be neglected under the stochastic shielding approximation with minimal loss of accuracy. For the case of the Gaussian ensemble, $q=2$. Empirically, for the Erdös–Rényi random graph ensemble, $q\approx 5/3$ (see discussion in Sect. 3.3 and also Fig. 4). The proof of Theorem 2 is given in Sect. 3.2. Before discussing more complicated examples, we illustrate the decomposition of the full process into approximate subprocesses for a simple 3state example in the next subsection.
2.3 3State Example
We illustrate Schmandt and Galán’s [14] stochastic shielding heuristic with the following simple example they considered. Figure 1 shows a 3state chain which has adjacency matrix entries ${A}_{ij}={\alpha}_{k}=1$ if there is an edge from $i(k)$ to $j(k)$ and zero otherwise. State 3 is designated as the only observable state. We think of this as the conducting state in an ionchannel model. Table 1 illustrates the notation introduced in Eq. 1 for this case.
In this case, we suppose ${\sigma}_{k}=1$ in the matrix B and use the linear measurement functional $M={[0,0,1]}^{\u22ba}$ to pull out the third component of $X(t)$, yielding the projection $Y(t)={M}^{\u22ba}X(t)={X}_{3}(t)$. The vector $X(t)={({X}_{1}(t),{X}_{2}(t),{X}_{3}(t))}^{\u22ba}$ gives the occupancy of the system states at time t and satisfies the constant coefficient SDE given in Eq. 4 with
where the ${W}_{k}(t)$ are independent and identically distributed standard Brownian motions, and
Since we are assuming ${\sigma}_{k}=1$ for all k, the k th column of B is exactly the stoichiometry vector associated with the k th reaction, and in particular, ${B}_{k}{B}_{k}^{\u22ba}={\zeta}_{k}{\zeta}_{k}^{\u22ba}$.
The full process $X(t)$ has four stochastic transitions and a reduced process $\tilde{X}(t)$ is defined by keeping a subset of the four stochastic transitions. We use the notation $\tilde{X}={X}_{(i,j,k)}$ to explicitly define which columns of the full matrix B are neglected in the approximate process, i.e. which stochastic transitions are neglected. We are interested in the accuracy of the approximation of the trajectory itself.
Figure 2 illustrates the deficiency ${U}_{(i,j)}(t)={X}_{(i,j)}(t)X(t)$ between the full process and all possible two noise source reductions ${X}_{(i,j)}$ on the 3state chain, as projected onto each of the three components in the system. The “optimal complexity reduction” is not well defined in general because it is underspecified. For example, asking to reduce the norm of the deficiency U while eliminating two of the four noise sources gives no preference between the six possible reductions. Asking for the best reduction to preserve a specific component may give an answer: to preserve the trajectory as projected onto the first component, keep the two noise sources that directly affect it (transitions between edges 1 and 2); for the third component, keep the other two (transitions between edges 3 and 4); for the second component there is no preference since it is affected directly by all transitions. This gives an intuitive explanation of stochastic shielding consistent with Schmandt and Galán’s explanation.
If we fix a point in the underlying sample space (a choice of four Poisson processes ${Y}_{k}(t)$ in the system $N(t)$ or a choice of four white noise processes $d{W}_{k}(t)$ in the system $X(t)$) and then choose to neglect the fluctuations in two of the four, i.e. by replacing ${Y}_{k}(t)$ with $E[{Y}_{k}(t)]$ or $d{W}_{k}(t)$ with zero, respectively, then the question is: which choice leads to the most accurate representation of the process as seen by the measurement?
By Lemma 1, we have the following expression for the edge importance terms ${R}_{k}$:
Evaluating this expression for the measurement functional $M={[0,0,1]}^{\u22ba}$ yields
Table 2 shows the stationary variance of the discrepancy ${M}^{\u22ba}{U}_{(i,j,k)}={M}^{\u22ba}({X}_{(i,j,k)}X)$ for all possible reduced processes ${X}_{(i,j,k)}$. For instance, ${X}_{(1,2)}$ is the reduced process that neglects fluctuations in reactions 1 and 2 and the stationary variance of ${M}^{\u22ba}{U}_{(1,2)}$ is ${R}_{1}+{R}_{2}=0.0833$. Note that ${X}_{(1,2)}$ is the optimal reduced process in terms of the Schmandt and Galán stochastic shielding approximation (among all approximations neglecting exactly two edges) for the 3state chain.
Figure 3 shows the mean squared error as a function of time for ${M}^{\u22ba}{U}_{(i,j)}(t)$ corresponding to the three classes of reduced processes ${X}_{(i,j)}(t)$ on the 3state chain (i.e., the classes are ${X}_{(1,2)}$, ${X}_{(3,4)}$, and $\{{X}_{(1,3)},{X}_{(1,4)},{X}_{(2,3)},{X}_{(2,4)}\}$, corresponding to the three different ${M}^{\u22ba}{U}_{(i,j)}(t)$ values shown in Table 2 above). The error function is shown with the theoretical MSE (${\sum}_{k\in {\mathcal{E}}^{\prime}}{R}_{k}$) for each case. Therefore, since $M{\zeta}_{1}=M{\zeta}_{2}=0$, $M{\zeta}_{3}=1$, and $M{\zeta}_{4}=1$ we confirm the claim made by Schmandt and Galán [14] that reactions 3 and 4 are important whereas reactions 1 and 2 are unimportant in terms of stochastic shielding for this 3state example.
3 Analysis of Stochastic Shielding for a Random Graph Ensemble
For any particular Ornstein–Uhlenbeck process on a graph, Lemma 1 provides the edge importance values ${R}_{k}$ (Eq. 10), which may be used to compute explicitly the contribution to the deficiency made by neglecting any particular reaction, relative to a given measurement vector M. In order to make general observations about the stochastic shielding approximation, we now consider an ensemble of random graphs. The proof of our main result (Theorem 2, restated below) will rely on properties of the joint distribution of components of eigenvectors of L, the graph Laplacian. Previously, we used i and j to refer to the source and destination nodes in a reaction. In this section, we will adapt the notation so that edge k is a reaction from node ${l}_{}$ to node ${l}_{+}$, denoted by ${l}_{\pm}(k)\in \mathcal{E}$ (see Eq. 17). In this section, i and j will instead index eigenvectors of L.
Because our methods combine heuristic numerical evidence with probabilistic calculations, we use “≈” to represent “heuristic equality”. Where precise order estimates are available, we use “O” notation. For the reader’s convenience, we restate Theorem 2.
Theorem 2 Given an ensemble of symmetric directed graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with n nodes satisfying assumptions A0–A5 (see Sect. 3.1), a binary measurement vector $M\in {\{0,1\}}^{n}$ satisfying $0<{\sum}_{i}{M}_{i}\sim O(1)$ as $n\to \mathrm{\infty}$, and a stoichiometry vector ${\zeta}_{k}$ corresponding to the kth reaction, the mean squared error ${R}_{k}$ resulting from neglecting the kth reaction has expected value
where the constant C depends on the mean edge weight.
In other words, since
reactions connecting nodes with identical values of M have a small contribution to the error, so these reactions can be neglected under the stochastic shielding approximation. This result relies on a list of assumptions which are described in detail below. The proof of this theorem requires Lemma 3, which is stated after the assumptions and proved in Appendix C.3.
3.1 Assumptions on the Random Graph Ensemble
We state a sequence of assumptions on the random graph ensemble needed to establish our main result. Each assumption is reasonable for a broad class of graphs of interest, for reasons articulated in the Remarks following each assumption. In several instances we impose on our random graph ensemble, as assumptions, properties that are known to hold for broad classes of random matrices, such as the Wigner ensemble [19, 20]. The ensemble we consider is not equivalent to a generalized Wigner ensemble. Nevertheless, for the reasons detailed below, it appears reasonable, that certain aspects of the eigenvector and eigenvalue distribution may be similar in the two cases.
We consider an ensemble of symmetric directed graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with $\mathcal{V}=n$. Let ${\zeta}_{k}$ be the stoichiometry vector corresponding to the k th reaction (Eq. 17) and let $({\lambda}_{i},{v}_{i})$ denote the eigenpairs of the graph Laplacian $L={(AD)}^{\u22ba}$ listed with eigenvalues in descending order. We assume that the eigenvector components are ${l}_{2}$normalized with mean 0 and variance $1/n$, and we assume the following:
A0. (Following [21].) Let ${a}_{ij}\ge 0$, the entries of the adjacency matrix, be random variables defined on a common probability space, with $\{{a}_{ij},1\le i<j\le n\}$ independent (but not necessarily identically distributed), with ${a}_{ij}={a}_{ji}$, $E[{a}_{ij}]={\mu}_{A}$, $V[{a}_{ij}]={\sigma}_{A}^{2}>0$ for all $1\le i<j\le n$, and ${sup}_{1\le i<j\le n}E({a}_{ij}{\mu}_{A})/{\sigma}_{A}{}^{\kappa}<\mathrm{\infty}$ for some $\kappa >0$.
A1a. The graph is drawn from a random ensemble with the property that the eigenvalues ${\lambda}_{i}$ and eigenvectors ${v}_{i}$ of the associated graph Laplacian are nearly independent. That is, for any $i,j,k,l\in \{1,\dots ,n\}$ and arbitrary measurable functions $f:{\mathbb{R}}^{2}\to \mathbb{R}$ and $g:{\mathbb{R}}^{n}\times {\mathbb{R}}^{n}\to \mathbb{R}$
Remark 1a: Assumption A1 holds for the symmetric Gaussian ensemble as well as for the more general Wigner ensemble [19, 20]. Indeed for these ensembles the eigenvalues and eigenvectors are independent. The weaker assumption, that they are at most weakly correlated, appears reasonable for e.g. the ensemble of graph Laplacians obtained from the symmetric Erdös–Rényi random graph ensemble.
A1b. The graph is drawn from a random ensemble with the property that the joint (eigenvalue, eigenvector) distribution is nearly invariant under permutation of eigenvectors. That is, for any $i,j,k,l\in \{1,\dots ,n\}$
Remark 1b: The symmetric Gaussian and Wigner ensembles are fully invariant under permutation of eigenvectors, and the weaker assumption of near invariance appears reasonable for the Erdös–Rényi ensemble. In particular, the pair $(\frac{1}{{\lambda}_{i}+{\lambda}_{j}})$, $({M}^{\u22ba}{v}_{i}{v}_{i}^{\u22ba}{\zeta}_{k}{\zeta}_{k}^{\u22ba}{v}_{j}{v}_{j}^{\u22ba}M)$ appearing in the definition of ${R}_{k}$ (Lemma 1) are assumed to be approximately uncorrelated. This assumption is reasonable by virtue of the approximate rotational symmetry of the eigenvector distribution under our choice of random graph model, which we expect to be close (heuristically) to the eigenvector distribution of the symmetric Gaussian ensemble [19, 20].
A2. $E[{v}_{i}(l)]=0$ for any $i,l\in \{1,\dots ,n\}$ where ${v}_{i}(l)$ denotes the l th component of the i th eigenvector. Remark 2a: Note that $E[{v}_{i}{(l)}^{2}]=1/n$ by the ${l}_{2}$normalization of the eigenvectors because ${\parallel v\parallel}_{2}=\sqrt{{\sum}_{l=1}^{n}v{(l)}^{2}}=1$ for each eigenvector v. This normalization leaves a 2fold ambiguity in the choice of eigenvector v. Since +v and −v both have ${\parallel v\parallel}_{2}=1$, we choose randomly between them so that the first nonzero component is positive with probability $1/2$.^{Footnote 3} Remark 2b: By the symmetry of our random graph ensemble under the symmetric group acting on the change of labels, assumption A2 holds not just for the Gaussian and Wigner ensembles, but for any reasonable symmetric ensemble. In particular, it holds for the symmetric Erdös–Rényi random graph ensemble.
A3. For any $i,j\in \{2,\dots ,n\}$ and $l,{l}^{\prime}\in \{1,\dots ,n\}$,

a.
$$E[{v}_{i}(l){v}_{j}({l}^{\prime})]=O({n}^{3})$$
as $n\to \mathrm{\infty}$, for $i\ne j$.

b.
$$E[{v}_{i}(l){v}_{i}({l}^{\prime})]=O({n}^{2})$$
as $n\to \mathrm{\infty}$, for $l\ne {l}^{\prime}$.
Remark 3: Figure 4 provides numerical evidence for the plausibility of assumption A3 in the Erdös–Rényi case. As described in the figure, the empirical expectation of ${v}_{i}(l){v}_{i}({l}^{\prime})$ scales as $O({n}^{2})$ for $10\le n\le 1000$; over this range the empirical expectation of ${v}_{i}(l){v}_{j}({l}^{\prime})$, $i\ne j$, is within machine error (≤10^{−19}) of zero.
A4. For any $i\in \{2,\dots ,n\}$ and $l,{l}^{\prime}\in \{1,\dots ,n\}$,

a.
$$E[{v}_{i}{(l)}^{4}]=O({n}^{q})$$
as $n\to \mathrm{\infty}$, for some $q>1$.

b.
$$E[{v}_{i}{(l)}^{2}{v}_{i}{({l}^{\prime})}^{2}]=O({n}^{2})$$
as $n\to \mathrm{\infty}$, for $l\ne {l}^{\prime}$.
Remark 4: Assumption A4a holds for the Gaussian case for $q=2$. For the Erdös–Rényi case, empirically we see that assumption A4a holds for $q\approx 5/3$ as shown in Fig. 4. Specifically, empirical evidence suggests that $E[{v}_{i}{(l)}^{4}]\approx \sqrt{2}{n}^{5/3}$ in this case.
A5. Suppose that ${p}_{1}$, ${p}_{2}$, ${p}_{3}$, and ${p}_{4}$ are nonnegative integers with ${\sum}_{m=1}^{4}{p}_{m}=4$, at least three of which are nonzero. Then for any $i\in \{2,\dots ,n\}$ and for any distinct components $\{{l}_{1},{l}_{2},{l}_{3},{l}_{4}\}$
Remark 5: The reason for this assumption will become clear in the proof of Theorem 2. It is similar in spirit to the four moment theorem for eigenvector components of a Wigner or Gaussian random matrix, different versions have been established by Tao and Vu [20] and Knowles and Yin [19]. Figure 4 provides numerical evidence for the plausibility of assumption A5 in the Erdös–Rényi case.
In addition to assumptions A0–A5 on the random graph ensemble, the statement of Theorem 2 places an assumption on the measurement vector $M\in {\{0,1\}}^{n}$. This vector contains ${n}_{1}>0$ ones and ${n}_{0}>0$ zeros such that ${n}_{1}+{n}_{0}=n$. We assume ${n}_{1}=O(1)$ as $n\to \mathrm{\infty}$, that is, we exclude the case where ${n}_{1}$ grows without bound as n grows. (If M has the same value for all nodes, the output is constant and the error is identically zero. The expression in Theorem 2 holds trivially so we ignore this case.)
To motivate this assumption, Table 3 shows the total number of states (n) and the number of conducting states (${n}_{1}$) for representative ionchannel models. Model refinements driven by empirical evidence have tended to increase the total number of states relative to Hodgkin and Huxley’s original model, without significantly increasing the number of conducting states.
Although assuming that ${n}_{1}=O(1)$ is biologically plausible, we make this assumption mainly for technical reasons as indicated in the proof of Theorem 2. We note, however, that in the numerical example in Sect. 3.3, the conclusions of Theorem 2 appear to hold equally well when ${n}_{1}={n}_{2}=n/2$.
Lemma 3 If assumptions A0–A5 hold and $M\in {\{0,1\}}^{n}$ satisfies $0<{\sum}_{i}{M}_{i}\sim O(1)$ as $n\to \mathrm{\infty}$. Then as $n\to \mathrm{\infty}$,

A.
$$E[{M}^{\u22ba}{v}_{i}{v}_{i}^{\u22ba}{\zeta}_{k}]=E[{\sum}_{l\in {1}_{M}}{v}_{i}(l)({v}_{i}({l}_{+}){v}_{i}({l}_{}))]=\frac{1}{n}{M}^{\u22ba}{\zeta}_{k}+O({n}^{2})$$
.

B.
$$E{[{M}^{\u22ba}{v}_{i}{v}_{i}^{\u22ba}{\zeta}_{k}]}^{2}=E{[{\sum}_{l\in {1}_{M}}{v}_{i}(l)({v}_{i}({l}_{+}){v}_{i}({l}_{}))]}^{2}=\frac{1}{{n}^{2}}{M}^{\u22ba}{\zeta}_{k}+O({n}^{4})$$
.

C.
$$E[{({M}^{\u22ba}{v}_{i}{v}_{i}^{\u22ba}{\zeta}_{k})}^{2}]=E[{({\sum}_{l\in {1}_{M}}{v}_{i}(l))}^{2}{({v}_{i}({l}_{+}){v}_{i}({l}_{}))}^{2}]=O({n}^{q})$$
for some $q>1$.
Note that the exponent $q>1$ in part C is governed by the fourth moment of the eigenvector components of the graph Laplacian (see assumption A4a). The proof of Lemma 3 is given in Appendix C.3.
3.2 Proof of Main Theorem
Suppose assumptions A0–A5 hold and $M\in {\{0,1\}}^{n}$ satisfies $0<{\sum}_{i}{M}_{i}\sim O(1)$ as $n\to \mathrm{\infty}$. By Lemma 1, ${R}_{k}$ denotes the contribution of the k th reaction to the deficiency of the approximate process. Given the measurement vector M, we have (exactly)
This expectation is taken over the space of symmetric directed graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ where edge k is chosen at random from the set of $\left(\begin{array}{c}n\\ 2\end{array}\right)$ possible bidirectional edges. If ${l}_{\pm}(k)\notin \mathcal{E}$, then $E[{R}_{k}M]=0$.
If the graph Laplacian were drawn from a symmetric Gaussian ensemble (or Wigner ensemble; see [19, 20]), then the eigenvalues and the eigenvectors would be independent. For other ensembles we impose the weaker condition of near independence (assumption A1a), which in this case means that for each $i\ge 2$ and $j\ge 2$, we assume
Under assumption A1b, the joint distribution of eigenvalues and eigenvectors is approximately separable into the product of two measures, one for the eigenvalues and a second for the eigenvectors. In this case the expectation $E[(\frac{1}{{\lambda}_{i}+{\lambda}_{j}})]$ in the sum (23) can be replaced by its average,
to obtain
As shown in [21], assumption A0 implies that the empirical eigenvalue distribution for the graph Laplacian L,
converges weakly (with probability one) as $n\to \mathrm{\infty}$ to the free convolution γ of the semicircle law, ${\rho}_{\mathrm{sc}}(x)=\frac{1}{2\pi}\sqrt{4{x}^{2}}I(x\le 2)$, with the standard Gaussian, $g(x)=exp[{x}^{2}/2]/\sqrt{2\pi}$. The measure γ becomes concentrated around ${\lambda}_{i}\approx n{\mu}_{A}$ as n grows. In particular, most terms in the sum (Eq. 25) concentrate around $1/(2n{\mu}_{A})$, as $n\to \mathrm{\infty}$. Therefore, by imposing assumption A0 and setting $C=2{\mu}_{A}$, we have $E[S]\to 1/(nC)$, as $n\to \mathrm{\infty}$, yielding in the limit
For the Erdös–Rényi ensemble with n nodes and edge probability p, we have $E[S]\to 1/(nC)$ for $C=2p$. Figure 5 shows that the sample mean of S over 10 realizations (i.e. 10 different Erdös–Rényi random graph configurations with the same parameters) rapidly approaches $1/(2pn)$, as n increases, for values of p ranging from 0.3 to 0.9. As the factor of $1/n$ is common across all k, it does not affect the stochastic shielding argument.
To prove Theorem 2, we will show that
for some $q>1$, corresponding to the parameter q appearing in assumption A4. This dichotomy is the basis for neglecting the edges k such that ${M}^{\u22ba}{\zeta}_{k}=0$, as in the stochastic shielding approximation. To do this, we will use assumption A3a and Lemma 3 to show the following:
It suffices to show that the first term in Eq. 32 is
and the second term is
Starting with the first term in Eq. 31, it follows from assumption A3a that, as $n\to \mathrm{\infty}$,
which means
We can expand the left hand side of Eq. 34 by using the definitions ${M}^{\u22ba}{v}_{i}={\sum}_{l\in {1}_{M}}{v}_{i}(l)$ and ${v}_{i}^{\u22ba}{\zeta}_{k}={v}_{i}({l}_{+}){v}_{i}({l}_{})$, which yield
By Lemma 3 part B, we have that $E{[{\sum}_{l\in {1}_{M}}{v}_{i}(l)({v}_{i}({l}_{+}){v}_{i}({l}_{}))]}^{2}=\frac{1}{{n}^{2}}{M}^{\u22ba}{\zeta}_{k}+O({n}^{4})$, as $n\to \mathrm{\infty}$. Continuing Eq. 40 above we have
as $n\to \mathrm{\infty}$, which establishes the first term (Eq. 34).
We now focus on the second term in Eq. 32. In Lemma 3 part C, we establish that as $n\to \mathrm{\infty}$
Hence, $(n1)E[{({\sum}_{l\in {1}_{M}}{v}_{i}(l))}^{2}{({v}_{i}({l}_{+}){v}_{i}({l}_{}))}^{2}]=O({n}^{1q})$ as $n\to \mathrm{\infty}$, which establishes the second term (Eq. 35). Therefore, we have established Theorem 2.
3.3 Symmetric Erdös–Rényi Random Graph Ensemble
Many varieties of random graphs have been used to describe biological systems [26, 27]. Here, we restrict attention to an ensemble of symmetric Erdös–Rényi random graphs $\mathcal{G}(n,p)$ on n nodes, for which each of $({n}^{2}n)/2$ possible bidirectional edges occurs independently with probability p [28, 29]. Consider a graph drawn from the Erdös–Rényi ensemble for $n=50$ and $p=0.5$. See Fig. 6 for an example. Take A to be the unweighted adjacency matrix (${\alpha}_{k}\in \{0,1\}$) and let ${\sigma}_{k}=1$ for all reactions k so that the k th column of the matrix B is exactly the stoichiometry vector for reaction k. Specifying any measurement vector $M\in {\{0,1\}}^{50}$ induces a partition of edges into “important” (type 0–1) or “unimportant” (types 0–0 or 1–1) classes. Let ${\mathcal{E}}_{I}$ be the set of important edges and ${\mathcal{E}}_{U}$ be the set of unimportant edges. Clearly, $\mathcal{E}={\mathcal{E}}_{I}\cup {\mathcal{E}}_{U}$. In the following example, we consider a vector M such that half the entries are 1 and other half are 0.
Theorem 2 says that if the matrix of eigenvector components of the Erdös–Rényi graph Laplacian is sufficiently similar to a random matrix drawn from the Gaussian ensemble (in terms of assumptions A0–A5) then one would expect the partitioning of the ${R}_{k}$ into two clusters. One cluster, containing the important edges, will be centered at $1/n$. A second cluster, containing the unimportant edges, will have smaller ${R}_{k}$ values ($O({n}^{q})$ where $q>1$ is governed by the fourth moment; see assumption A4a in Sect. 3.1). To the extent to which this similarity to the Gaussian ensemble holds, our calculation of ${R}_{k}$ involves projecting the measurement vector M and the vectors ${\zeta}_{k}$ onto randomly chosen subspaces of ${\mathbb{R}}^{n}$.
As shown in Fig. 4, assumptions A0–A5 appear to be satisfied for the symmetric Erdös–Rényi random graph ensemble. In particular, the fourth moment of the eigenvector components (assumption A4a) appears to hold empirically for $q\approx 5/3$; in particular, we find that, empirically, $E[{v}_{i}{(l)}^{4}]\approx \sqrt{2}{n}^{5/3}$. This behavior suggests that the unimportant edges should have a mean ${R}_{k}$ value $\lesssim \sqrt{2}{n}^{5/3}$. Setting $n=50$, for example, we would expect one cluster of ${R}_{k}$ values centered at $1/50=0.02$ for $k\in {\mathcal{E}}_{I}$ and another cluster close to $\sqrt{2}\cdot {50}^{5/3}\approx 0.0021$ for $k\in {\mathcal{E}}_{U}$. Figure 7 shows the rank order of edge importance values ${R}_{k}$ corresponding to the m reactions in the Erdös–Rényi random graph. The top cluster is centered at 0.02 (upper horizontal red line) and the bottom cluster is bounded above by 0.0021 (lower horizontal red line) consistent with Theorem 2 for the Erdös–Rényi random graph ensemble with 50 nodes and edge probability $p=0.5$. Since the measurement functional M is binary, we see a significant gap between the two clusters, as expected. If the components of M are graded, i.e. drawn uniformly from the unit interval, then this curve appears to be smooth (see discussion in Sect. 5).
Figure 8 illustrates the distribution of eigenvector components of the Erdös–Rényi graph Laplacian in comparison with a Gaussian random matrix (i.e., each entry has mean 0 and variance $1/n$). The quantile–quantile plots show good agreement within one standard deviation and begin to deviate in the second standard deviation. This is consistent with the observation that the fourth moment in the Erdös–Rényi case deviates from the Gaussian case ($q\approx 5/3$ for Erdös–Rényi and $q=2$ for Gaussian). Nevertheless, Theorem 2 predicts that there will be two clusters of ${R}_{k}$ values as described above and shown in Fig. 7 for the Erdös–Rényi case with $n=50$ and $p=0.5$.
4 Application: Stochastic Shielding of Hodgkin–Huxley Channels Under Voltage Clamp
Hodgkin and Huxley’s (HH) model for the generation and propagation of action potentials along the giant axon of the squid Loligo lies at the foundations of modern neuroscience [22, 30]. In the classic HH model, action potentials are generated through the interaction of a leak current and two voltagegated ionic currents, carried by a sodium ion specific channel and a potassium ion specific channel. The potassium channel comprises four identical subunits that open and close independently with voltagedependent rates. The channel carries a current when all four subunits are in the open state. At the molecular level, a single channel can be represented as a continuous time Markov jump process on a chain of five states, the fifth of which has nonzero conductance. Of the eight transitions connecting states along this chain, only the last two connect states with different conductances, therefore the stochastic shielding approximation would preserve the fluctuations of these transitions and not the other six.
The sodium channel involves two types of subunits, an activation subunit (“m”) present in three identical copies, and an inactivation subunit (“h”) present in a single copy.^{Footnote 4} The resulting graph has eight distinct states connected by 20 different transitions, each occurring with a voltagedependent rate [31–33]. Four of these 20 transitions connect states with differing conductance values (zero versus nonzero); the fluctuations of the remaining 16 transitions are ignored under the stochastic shielding approximation.
Schmandt and Galán compared simulations of a system comprising 5000 individual potassium channels and 25000 individual sodium channels, both with and without the stochastic shielding approximation. It is possible to construct an exact simulation scheme, analogous to Gillespie’s stochastic simulation algorithm [34], that takes into account the nonstationarity of the transition rates (propensities) arising from their voltage dependence [35]. However, Schmandt and Galán used a discrete time approximation to this process. Appendix A discusses Schmandt and Galán’s approach in more detail. Here we apply our analysis to evaluate the edge importance ${R}_{k}$ of each transition in the graph for the classic HH potassium and sodium channels, respectively. Rather than consider the case of timevarying transition rates, we restrict attention to the “voltage clamped” case. If the membrane potential is experimentally held constant for a given cell, the per capita transition rates remain constant and the fluctuating ionchannel population forms a stationary Markov process. In particular, our analysis approximates this stationary population process with a linear multidimensional Ornstein–Uhlenbeck process (see Appendix B); this approximation is reasonable given the large numbers of individual channels considered in Schmandt and Galán’s simulations.
In general, the ionchannel state graphs for the potassium and sodium channels in the HH model have graph Laplacians L that are not symmetric. Therefore, we need to modify our definition of the edge importance ${R}_{k}$ (Eq. 10) in order to apply our results. When L is not symmetric, we will assume that L is nevertheless diagonalizable, i.e. that there are eigenvalues ${\lambda}_{i}$ and a biorthogonal system of vectors ${v}_{i}$, ${w}_{i}$ (right and left eigenvectors) satisfying
In this case the decomposition of L becomes $L={\sum}_{i}{\lambda}_{i}{v}_{i}{w}_{i}^{\u22ba}$, and the definition of ${R}_{k}$ is modified as follows:
4.1 Hodgkin–Huxley Potassium Channel
The potassium channel state graph in the Hodgkin–Huxley model is a 5state chain with one conducting state. Following the tauleaping construction (Appendix B) we consider a stationary OU process $X(t)\in {\mathbb{R}}^{5}$, with linear measurement functional $M={[0,0,0,0,1]}^{\u22ba}$. See Fig. 9 for an illustration of this channel. The corresponding (weighted) adjacency matrix A is
which is evidently not symmetric. The voltagedependent transition rates are given by
Then the graph Laplacian $L={(AD)}^{\u22ba}$ is voltagedependent and is given by
since the entries in the diagonal matrix D are the weighted outdegrees of each node for a given voltage V, i.e. ${D}_{ii}(V)={\sum}_{j=1}^{5}{A}_{ij}(V)$. The matrix B is also voltagedependent. Recall that the k th column of B corresponds to the k th reaction, and this can be written as ${\sigma}_{k}(V){\zeta}_{k}$. If ${r}_{k}$ is the per capita rate of reaction k (transition from node $i(k)$ to $j(k)$), then ${\sigma}_{k}(V)=\sqrt{{r}_{k}(V){\overline{N}}_{i}(V)}$ where ${\overline{N}}_{i}(V)$ is the average number of channels at state i at equilibrium for voltage V. Hence, B is given by
Figure 10 shows the edge importance ${R}_{k}$ as a function of voltage for each reaction $k\in \{1,\dots ,8\}$ in the potassium channel state graph. Note that since the process is at steady state, and respects detailed balance, the mean flux due to the two reactions connecting the same pair of nodes will be equal and opposite. Thus, in this case, ${R}_{1}={R}_{2}$, ${R}_{3}={R}_{4}$, ${R}_{5}={R}_{6}$, and ${R}_{7}={R}_{8}$. The blue curve (${R}_{7}={R}_{8}$) corresponds to edges 7 and 8, the transitions between state 4 and conducting state 5, and has the largest edge importance value in the voltage range $[100,100]\text{mV}$. This says that if either or both of these reactions are neglected, they would have the highest contribution to the error.
Physically, it is the current rather than the state occupancy that holds the greatest interest. The current through a population of potassium channels with net conductance g is $I=g(V{V}_{k})$; here ${V}_{k}=77\text{mV}$ is the potassium reversal potential, and the conductance $g={g}^{o}{N}_{o}$ is the product of the unitary or single channel conductance ${g}^{o}$ with the total number of channels in the open state, ${N}_{o}$. The variance of the current is therefore ${({g}^{o}(V{V}_{k}))}^{2}$ times the variance of the occupancy number, meaning that near the reversal potential, the current can have low variance even if the channel state has high variance. For convenience we set ${g}^{o}=1$, which amounts to a change of nominal units for measuring the conductance.
Figure 11 shows the variance of the nominal current, ${R}_{k}\ast {(V{V}_{k})}^{2}$ as a function of voltage V for each reaction k for the potassium channel. In addition to having the highest edge importance curve, the blue curve ${R}_{7}={R}_{8}$ also has the highest variance (left panel). The right panel shows the probability of being in each state as a function of voltage.
4.2 Hodgkin–Huxley Sodium Channel
The sodium channel state graph in the Hodgkin–Huxley model consists of two linked 4state chains, for a total of eight states, including one conducting state, and 20 reactions. Again following the tauleaping construction (Appendix B) we consider a stationary OU process $X(t)\in {\mathbb{R}}^{8}$, with linear measurement functional $M={[0,0,0,0,0,0,0,1]}^{\u22ba}$. See Fig. 12 for an illustration.
The adjacency matrix in this case is
where the voltagedependent entries are defined by
The graph Laplacian $L={(AD)}^{\u22ba}$ is
where ${D}_{ii}(V)={\sum}_{j=1}^{8}{A}_{ij}(V)$ from the adjacency matrix above (Eq. 50). The matrix B is also voltagedependent and is given by the general expression in Eq. 49.
Figure 13 shows the edge importance ${R}_{k}$ as a function of voltage for each reaction $k\in \{1,\dots ,20\}$ for the sodium channel state graph. The sodium channel also satisfies detailed balance, so each pair of complementary reactions ${k}_{i}$, ${k}_{i+1}$ connecting the same pair of nodes will have equal edge importance values ${R}_{{k}_{i}}={R}_{{k}_{i+1}}$. The magenta curve corresponds to edges 11 and 12 and the yellow curve corresponds to edges 19 and 20, which are the transitions between state 7 and conducting state 8, and the transitions between state 4 and conducting state 8, respectively. Note that ${R}_{11}={R}_{12}>{R}_{k}$ (magenta) for all other reactions k in the voltage range $[100,25]\text{mV}$ and then it switches so that ${R}_{19}={R}_{20}>{R}_{k}$ (yellow) for all other reactions k in the range $[25,100]\text{mV}$. This means that if any of these four reactions are neglected, they would have the highest contribution to the error.
Figure 14 shows the variance of the nominal current ${R}_{k}\ast {(V{V}_{k})}^{2}$ as a function of voltage V for each reaction k where ${V}_{k}=45\text{mV}$ is the reversal potential for the sodium channel. Again, we choose units for conductance such that the unitary channel conductance equals 1. As before, we see that the edges with the highest edge importance have the largest variance (left panel). The switch between the dominant curves (magenta vs. yellow) agrees with the switch in Fig. 13 which occurs at −25 mV. The right panel in Fig. 14 shows the probability of being in each state and how that changes with voltage.
In summary, our analysis fully supports the accuracy of Schmandt and Galán’s stochastic shielding algorithm for the Hodgkin–Huxley system, at least for the voltage clamped case that we consider. More significantly, our analysis allows one to calculate the relative importance of each transition in a network of firstorder reactions, allowing a new quantitative basis for reduction of complexity of stochastic network models. In the case of a simple chain of states such as the Hodgkin–Huxley potassium channel, the rank ordering of transitions by importance ${R}_{k}$ is the same for all voltages. As shown in Fig. 13, however, for more complicated gating schemes, such as the Hodgkin–Huxley sodium channel, the rank ordering of transitions by importance can differ at different voltages.
For instance, the most important transition at subthreshold voltages ($V\lesssim 40\text{mV}$) is the transition connecting the $[m=(1,1,0),h=1]$ state (state 7 in Fig. 12) to the $[m=(1,1,1),h=1]$ state (state 8, the conducting state). This transition corresponds biophysically to the nonconductingtoconducting transition that occurs via activation or deactivation [22], that is, the opening (or closing) of the last of three mactivation gates in the ion channel. It is significant that this transition is the most “important” for subthreshold voltages, because the activation transition is typically the last subthreshold event during spike generation.
On the other hand, at suprathreshold voltages the most important transition is that connecting the $[m=(1,1,1),h=1]$ state (state 8) with the $[m=(1,1,1),h=0]$ state (state 4). Biophysically, this transition corresponds to inactivation and deinactivation, or the closing (and opening) of the hinactivation gate. During action potential generation this transition plays an essential role in terminating the voltage spike upstroke, and it is significant that it should be most “important” at suprathreshold voltages.
For more general channel schemes, and more elaborate stochastic processes in general, the identification of the relative quantitative importance of different transitions or edges to the observable behavior of the system is a powerful new tool for principled complexity reduction.
5 Discussion
In the ongoing race between growth of empirical data sets and growth of available computing power, conceptual understanding of complex dynamical systems can get left behind. Finding efficient lowerdimensional representations of highdimensional systems, that accurately capture relevant aspects of system behavior, not only takes better advantage of computational resources, but can provide insights into the essential components of a system. Hence, there has been a significant effort in recent years to develop principled complexity reduction techniques for naturally occurring complex networks.
Schmandt and Galán [14] developed a method for efficient simulation of stochastic ionchannel gating in the membrane of a neuron. The random gating of ion channels provides an important class of biological processes which are naturally represented as Markov chains on graphs [33, 35]. The graphs in this case arise from the different configurations of ionchannel subunits or “gates”. Typically each state carries one of two functional labels: open or closed. This coarsegrained representation of the ion channel corresponds to a linear measurement functional, in the sense that current flowing through open channels can be measured experimentally, and individual ion channels typically exhibit binary allornone conductance. Schmandt and Galán implemented a novel form of coarse graining technique that ignores fluctuations between indistinguishable transitions (opentoopen or closedtoclosed) while preserving fluctuations between distinguishable states. In order to gain a deeper understanding of why their “stochastic shielding approximation” works so well, we analyzed it in the context of a multidimensional Ornstein–Uhlenbeck process on a variety of networks. First, we showed that this form of model reduction can be represented as a mapping from a manydimensional sample space to a lowerdimensional sample space, rather than as a mapping from a manynode network to a fewnode network, and that one can formulate the problem as a search for the optimal such mapping. Second, we showed that for the specific 3state example presented in Schmandt and Galán’s paper, their approximation is indeed optimal in a specific sense. Third, we obtained a theoretical result showing that stochastic shielding works for an ensemble of random graphs with arbitrarily chosen binary measurement vectors, analogous to the identification of nodes as conducting versus nonconducting in ionchannel models. Finally, we evaluated the stochastic shielding approach for the graph representing the ionchannel states of the classical Hodgkin–Huxley model, and showed that this approach is optimal for a wide range of fixed voltages under “voltage clamped” conditions.
5.1 Relationship Between Different Levels of Modeling
The underlying description of Schmandt and Galán’s model [14] is given by the population process described in Sect. 2.1, a more general framework than the Ornstein–Uhlenbeck process that we study. The OU process connects to the population process via a tauleaping approximation, as described in Appendix B. The tauleaping method involves two key assumptions. First, assuming that the transition propensities ${\alpha}_{ij(k)}$ do not change dramatically in an interval of length τ, we can approximate the number of transitions in each interval by a collection of independent Poisson processes. This approach is closely related to the framework of Schmandt and Galán, except that they use a binomial distribution instead of a multinomial distribution (see Appendix A). Second, if the expected number of occurrences of each reaction is sufficiently large (i.e. 10 s or 100 s) in time τ, then it is reasonable to use a Gaussian approximation to the Poisson process. The resulting model comprises the standard chemical Langevin formulation, in which the size of the fluctuations associated with each transition is state dependent. These two constraints can always be satisfied by taking a sufficiently large number of individuals in the population. The Ornstein–Uhlenbeck process is obtained by linearizing about the mean field steady state distribution of the tauleaping model (see Appendix B). The intensity of the noise terms is determined by the mean steady state occupancy of each state, resulting in a linear OU process. A technical obstacle to extending our results beyond the linear OUP setting is the lack of an explicit closed form expression for the stationary covariance of the population process analogous to Eq. 6. Although our analysis is limited to the OU process version of the system, it is reasonable to expect that stochastic shielding will apply more broadly. For example, in the full population process one can decompose the fluxes in the model into a sum of a mean component and a mean zero fluctuating component. In this case, stochastic shielding amounts to setting the fluctuating component to zero while preserving the mean for those transitions connecting observationally equivalent states.
Limiting the investigation to voltage clamped conditions facilitated a more thorough mathematical analysis of the stochastic shielding approximation, but also restricted the biological applicability of the results. By approximating the population process with a closely related Ornstein–Uhlenbeck process we effectively linearized the system about a fixed point given by the mean field behavior. Therefore our analysis does not address important nonlinear dynamical behaviors arising in many physical and biological systems, such as noise driven transport between multiple quasiequilibria, fluctuation induced spiking in excitable systems (including noise induced spiking in nerve cells), or limit cycle oscillations (including regular spiking in nerve cells). On the one hand, we anticipate that transitions in a state graph corresponding to directly observable state changes, such as between conducting and nonconducting ionchannel states, will remain “important” under more general measures accounting for global, nonlinear behaviors. On the other hand, it is certainly possible that additional transitions may also become important with respect to more general measures, if the linear measurement vectors employed here fail to capture their contribution to global dynamics.
5.2 Broader Applications
The stochastic shielding approximation can be directly applied to various biological networks, not just ionchannel models. For instance, Lu et al. [13] describe a signal transduction network in which the phosphorylation and transport events are arranged with a ladder topology. The two sides of the ladder denote molecules in the nucleus and in the cytoplasm, respectively. On each side, there are $M+1$ species having different levels of phosphorylation (see Fig. 1 of [13] for an illustration). This is a more elaborate Markov process than a simple ionchannel state model, but it can still be described with a binary measurement vector. The readout is 1 if the system is both in the nucleus and in a specific phosphorylated state, and 0 otherwise. The application of stochastic shielding to such a system is quite natural.
Another broad class of examples includes calciuminduced calcium release Markov models. Nguyen, Mathias and Smith [36] studied a stochastic automata network description of instantaneously coupled intracellular calcium channels which they derived from Markov models of single channel gating that include calcium activation, inactivation, or both. This highdimensional system involves a large number of functional transitions; the transition probabilities of one channel depend on the local calcium concentration which is typically influenced in turn by the state of other channels in the population. Such models can easily become very high dimensional. For example, DeRemigio et al. [37] considered a discrete state continuous time Markov model of coupled calcium channels, taking explicit channel position in to account, which yields up to 1.6 million distinct states. Similarly, in order to investigate the relationship between singlemolecule stochastic events and wholecell behavior, Skupin et al. [38] implemented a multi scale calcium signaling and spike generation model. Their model connects channel state transitions on a millisecond time scale with interspike interval fluctuations on the scale of tens of seconds, and involves a large number of chemical states. For systems of such complexity, any reduction of the complexity of the stochastic process by stochastic shielding will likely be advantageous, both for simulation and for analysis.
We have focused here on discrete state ionchannel models with binary measurement vectors. However, it is possible that some ion channels may have a richer than binary readout structure. For example, Catterall [39] provides structural evidence that activation of a bacterial sodium channel may possess multiple nonequivalent conducting states, raising the possibility that conductance could be graded rather than binary. As another example which could lead to graded measurement vectors, adaptive evolution can be represented as a random walk on a graph representing genomic variants connected by possible mutation routes [40, 41]. While the stochastic process representing the evolution of a human pathogen such as influenza may have an enormous number of degrees of freedom [42, 43], the dynamics of interest may comprise a smaller number of dimensions, such as a strain’s virulence or fitness, which may naturally be graded rather than discrete quantities.
Stochastic shielding in a modified form would still apply even if the measurement functional were graded continuously. As an example, consider an Erdös–Rényi random graph on n nodes with edge probability p, with graded measurement vector $M\in {[0,1]}^{n}$ instead of binary $M\in {\{0,1\}}^{n}$. The left panel of Fig. 15 shows the edge importance distribution for the case $n=50$ and $p=0.5$ where the components of M are chosen uniformly at random from the unit interval. The right panel of Fig. 15 illustrates the difference in measurement between nodes connected by edge k, $x={M}^{\u22ba}{\zeta}_{k}$, versus the edge importance ${R}_{k}$, and shows good agreement with the curve $y\approx {x}^{2}/n$ for the case $n=50$.
This empirical result (Fig. 15, right panel) suggests the following generalization of Theorem 2:
for some $q>1$ (e.g., $q=2$ for the Gaussian unitary ensemble, and $q\approx 5/3$ for the Erdös–Rényi ensemble, empirically). In the case of a binary measurement vector, $M\in \{0,1\}$, this formula would revert to the result given in Theorem 2. A rigorous derivation of Eq. 53 is beyond the scope of the present paper.
The behavior of stochastic processes arising in firstorder reaction networks has been explored in broad generality by Cadgil, Lee and Othmer [44]. They used a spectral approach to analyze a general system of firstorder reaction networks, and studied the effect of changes in the network topology on the distribution of the number of reactant molecules, as well as the difference between conversion and catalytic networks with the same topology. Exploring sample space reductions conditioned on a linear measurement functional for such general classes of networks would be of interest.
5.3 Different Levels of Model Simplification
Model simplification is an important goal for Markov chain models in many scientific contexts, and complexity reduction has been pursued through a corresponding variety of approaches. Newman and others have extensively developed techniques based on community structure, aggregating or lumping nodes together based on topological considerations [45, 46]. When applied to a stochastic process on a graph, the aggregation of $N\gg n$ to n nodes is equivalent to a projection of the original process onto a subspace in which the process components on the aggregated finegrained nodes are averaged. In most cases, the resulting coarsened process is no longer Markov, although in some cases exact dimension reduction to a lowerdimensional Markov processes can be accomplished [47–49]. Other aggregation schemes, such as spectral coarse graining [50–52], have been proposed based on the spectral properties of the graph Laplacian. Approaches based on topological or abstract spectral properties do not necessarily take into account functional properties of the system to be simplified. Because stochastic shielding simplifies the representation of a stochastic process taking into account the function of the system, namely by distinguishing conducting versus nonconducting ionchannel states, it may provide insights not afforded by graph aggregation based on modularity or graph spectra.
As another example of simplification based on functional properties, Bruno, Yang and Pearson [53] used independent openclosed transitions to describe a canonical form that can express all possible reaction schemes for binary ion channels.
Not all prior approaches to simplification of random processes on graphs proceed by aggregating nodes. For instance, Ullah, Bruno and Pearson [54] proposed model simplification by the elimination of nodes with low equilibrium occupancy probability using time scale separation arguments. The reduced system has fewer parameters, and the dynamics of the reduced system are identical to those of the original system except on very fast time scales. Other simplifications based on graph sparsification have been proposed by Koutis, Levin and Peng [55].
In this paper we have investigated a novel form of simplification of stochastic processes on graphs. Stochastic shielding is based on replacing a highdimensional stochastic process defined on a graph with a lowerdimensional process on the same graph, rather than replacing a complex network with a simpler one. Specifically, we consider mappings from the original process to an approximate process defined on a significantly smaller sample space. In one sense, we can think of the full and a reduced system as two systems with partially shared stochastic input, and partially independent stochastic input of different magnitudes (magnitude zero, in one case). Structurally, this situation is analogous to the kind of mixed commonnoise and independentnoise scenario studied in the context of neuronal synchronization [56–58]. In another sense, stochastic shielding can be seen as a different kind of projection, vs. that induced by lumping or pruning nodes. The latter methods simplify the graph, whereas stochastic shielding leaves the graph unchanged and simplifies the sample space on which the approximate process lives.
Appendix A: Stochastic Shielding Construction of Schmandt and Galán
In [14], Schmandt and Galán considered discrete time simulations approximating a continuous time, finite state Markov chain
where ${N}_{i}(t)$ is the number of individuals in a population (of size ${N}_{\mathrm{tot}}$) in state i at time t, and ${\tilde{N}}_{ij}(t)$ counts the number of $i\to j$ transitions that have occurred as of time t. The transition counts ${\tilde{N}}_{ij}(t)$ may be written using the random time change representation [17] as
By convention we take ${N}_{ii}(t)\equiv 0$ and ${\alpha}_{ii}(t)\equiv 0$. The ${Y}_{ij}$ are independent unit rate Poisson processes driving the different statetostate transitions. The transition from state i to state j occurs with per capita rate ${\alpha}_{ij}$. In a conductancebased model, such as a discrete stochastic version of the Hodgkin–Huxley equations, the vector $({N}_{1}(t),\dots ,{N}_{K}(t))$ would represent the number of ion channels in each of K distinct states, and the transition rates could vary with time, e.g. through dependence on membrane potential or second messenger concentration. Although Schmandt and Galán consider both the stationary and timevarying case, we restrict attention to the stationary case, which corresponds experimentally to a voltage clamped preparation.
One may (approximately) simulate trajectories of the Markov chain using a discrete time step approach. Following [14], we fix a time step $h>0$ and define ${N}_{ij}$ as ${N}_{ij}(t)={\tilde{N}}_{ij}(t+h){\tilde{N}}_{ij}(t)$, that is, the number of $i\to j$ transitions occurring in the interval $(t,t+h]$. The net increments in the stateoccupancy numbers ${N}_{i}$ are then given by
To obtain a practical algorithm, Schmandt and Galán set ${N}_{ij}(t)\sim Binom[{N}_{i}(t),{\alpha}_{ij}(t)h]$. Since there is then a finite probability that ${N}_{i}(t+h)<0$, one must include an iterative resampling scheme to force ${N}_{i}(t+h)\ge 0$. As an alternative, we consider instead a multinomial representation of the destinations of all ${N}_{i}(t)$ individuals beginning the time step at node i. That is, for each i, $1\le i\le K$, we set
The multinomial distribution produces an integervalued random vector with mean and marginal distributions the same as that given by the binomial distribution; the only difference is that transitions emanating from a common node are not assumed to be independent.
The first and second moments arising from the multinomial transition distribution are
Here all expectations are conditioned on the current state of the system,
The mean increment given the current distribution of the population, ${\overline{\mathrm{\Delta}}}_{i}(t)\equiv E[{\mathrm{\Delta}}_{i}(t)\overrightarrow{N}(t)]$, is written in terms of the mean transitions as
The deviation of the actual number of $i\to j$ transitions from the expected number is
where $\delta {N}_{ij}(t)={N}_{ij}(t)E[{N}_{ij}(t)\overrightarrow{N}(t)]$ is the deviation of the number of $i\to j$ transitions from the expected number. The mean of $\delta {N}_{ij}(t)$ is zero for all i, j, and all t, by construction. The stochastic shielding approximation amounts to setting $\delta {N}_{ij}(t)$ to zero for selected $i\to j$ transitions, namely for those transitions between “unobservable states”, or (equivalently) between any two states with the same value of the measurement observable, i.e. the conductance. Since $E[\delta {N}_{ij}(t)\overrightarrow{N}(t)]=0$ already, the only error introduced by suppressing the fluctuations associated with the $i\to j$ transition comes from the propagation of the fluctuations through the network to the observable states. But the fluctuations in the transitions, ${N}_{ij}$, are only weakly correlated with the fluctuations in the occupancy numbers of observable states, ${N}_{k}(t)$, when i and j have the same conductance. To introduce this shielding effect, Schmandt and Galán calculate the second moments for the population increments $\delta {\mathrm{\Delta}}_{i}(t)$. As an example, in the three node case, for the multinomial transition model, the variances are given by
and the covariances are given by
Schmandt and Galán obtain similar expressions that agree up to order $O(h)$; the difference between the binomial and multinomial expressions only appears in the $O({h}^{2})$ terms. For example, they assert that $E[\delta {\mathrm{\Delta}}_{1}(t)\delta {\mathrm{\Delta}}_{3}(t)\overrightarrow{N}(t)]\equiv 0$, while under the multinomial model this covariance is equal to ${N}_{2}(t){\alpha}_{21}{\alpha}_{23}{h}^{2}$. Fortunately, this difference does not undermine the main argument.
From this point, Schmandt and Galán obtain an expression for the stationary covariance matrix of the reduced process (compare Eq. (8) in [14] with our Lemma 1) and decompose the covariance into a sum over direct and indirect connections to a single conducting or observable state. This situation corresponds, in our analysis, to the case where the measurement vector M contains a single nonzero entry. Schmandt and Galán argue that suppressing the fluctuations associated with transitions not directly affecting the observable state decrease their contribution to the variance of the observable state occupancy, while increasing the contribution of the direct transitions to the same variance. In addition, they show through numerical comparisons that Hodgkin–Huxley equations with a full Markov process and the reduced process are practically indistinguishable both under voltage clamp (stationary transition rates) and current clamp (timevarying transition rates) conditions.
Appendix B: Derivation of TauLeaping for an Arbitrary Finite Graph
B.1 TauLeaping: General Case
We will use standard tauleaping arguments [59–61] to derive the multidimensional Ornstein–Uhlenbeck process in Sect. 2.2 (Eq. 4). Given a symmetric directed graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with n nodes, let $N(t)\in {\mathbb{N}}^{n}$ be the population process (Markov jump process) representing the number of individuals in each of n states at time t. Let ${N}_{\mathrm{tot}}\ge 1$ be the total number of individuals in the system. Recall the random time change representation in terms of Poisson processes given in Eq. 2:
Each ${Y}_{k}$ is an independent unit rate Poisson process counting the occurrence of reaction k (transition from state $i(k)$ to $j(k)$); ${\alpha}_{k}$ is the per capita transition rate of reaction k; ${N}_{i(k)}(s)$ is the number of individuals at state $i(k)$ at time s, and ${\zeta}_{k}$ is the stoichiometry vector for reaction k. For simplicity, we will suppress “k” in our notation so that state i means state $i(k)$.
In the case ${N}_{\mathrm{tot}}=1$, let ${p}_{i}(t)=P(X(t)=i)$ be the probability that a single random walker occupies state i at time t. Clearly, ${\sum}_{i=1}^{n}{p}_{i}(t)=1$ for each t. The time evolution of the probability vector $p(t)=[{p}_{1}(t),\dots ,{p}_{n}(t)]$ is given by the following master equation
where
is the graph Laplacian which can be represented as the sum over all undirected edges (denoted by the set ${\mathcal{E}}^{\ast}$) given in Eq. 75.
Let π represent the steady state distribution, i.e. the row vector satisfying $\pi L=0$ with entries such that ${\sum}_{i=1}^{n}{\pi}_{i}=1$. Suppose we represent $N(t)$ as the deviation from its mean, $\overline{N}=\pi {N}_{\mathrm{tot}}$, so that $N(t)=\overline{N}+X(t)$, where $X(t)$ is a mean zero stochastic process. Then
since ${N}_{i}(s)=\overline{{N}_{i}}(s)+{X}_{i}(s)$ and ${\alpha}_{k}$ and $\overline{{N}_{i}}$ are constants.
Now following standard tauleaping results [59–61],
which says that we can approximate Eq. 80 using an almost equivalent set of Poisson processes ${\tilde{Y}}_{k}$ where each ${\tilde{Y}}_{k}$ at time t is approximately Gaussian distributed with mean and variance $\tau {\alpha}_{k}[\overline{{N}_{i}}+{X}_{i}(t)]$. Note that if X is a stationary irreducible Markov process on a finite state space, then the occupancy probability of state i, ${\pi}_{i}>0$. By choosing ${N}_{\mathrm{tot}}\gg 1/(min\{{\pi}_{i}\})$, we may guarantee that the mean population ${\overline{X}}_{i}$ for each i is as large as necessary for the Gaussian approximation to hold. Since we are assuming that ${X}_{i}(t)\ll {\overline{N}}_{i}$ (uniformly in time), and since we want the noise amplitude to be independent of X, we further approximate
by dropping the dependence of the variance on X.
Dividing by τ and taking the limit as $\tau \to \mathrm{\infty}$ yields the SDE
Recalling that the k th reaction is from node $i(k)$ to $j(k)$, then the k th reaction in the first term in the RHS of Eq. 83 can be written as
Keeping track of components, we sum over the source and destination nodes for each reaction. Then for the l th component of X we have
which yields
where Q is the generator matrix. Note that we changed notation slightly to illustrate that ${\alpha}_{ij}$ is the transition rate from state i to j rather than indexing by reaction k. The graph Laplacian we consider in Eq. 4 is actually $L={Q}^{\u22ba}$ so we have $dX=L(\overline{N}+X)dt$. Since $\overline{N}$ is proportional to the stationary distribution π, we have that $L\overline{N}=0$, and hence the first term in the SDE is $dX=LXdt$.
Now the second term in the RHS of Eq. 83 can be written as
Keeping track of components, here we sum over all m reactions to find
where ${\sigma}_{k}=\sqrt{{\overline{N}}_{i(k)}{\alpha}_{k}}$ in the definition of matrix B.
Therefore, putting the first and second terms together, we have derived the OU process $dX=LXdt+BdW$ given in Eq. 4.
B.2 TauLeaping: 3State Example
Here we will explicitly derive the OU process from the population process given in Sect. 2.1 by using the tauleaping argument above for the 3state example in Sect. 2.3. We have $N(t)\in {\mathbb{N}}^{3}$ and by Eq. 73,
following the notation given in Sect. 2.3, specifically the labeling of reactions given in Table 1. Note that ${\alpha}_{k}$ could be time dependent ${\alpha}_{k}(t)$.
The tauleaping approximation above gives
Equivalently, we could write these integral equations in differential form
These equations are nonlinear since the noise intensity depends on ${X}_{i}$. Note that for any t, ${X}_{1}(t)+{X}_{2}(t)+{X}_{3}(t)={N}_{\mathrm{tot}}$ so that the total population is constant. The mean $\overline{X}$ satisfies
where the matrix above is the generator Q, or our ${L}^{\u22ba}$. In the case where Q is fixed, $\overline{X}$ is proportional to the null left eigenvector of Q; biologically, this is the voltage clamp case. Let $({\overline{X}}_{1},\dots ,{\overline{X}}_{n})$ be the corresponding stationary vector. Now we linearize Eqs. 96–98 around the stationary vector.
Let $V=X\overline{X}$ and assume that $\frac{V}{\overline{X}}\ll 1$. Then since $\sqrt{{X}_{i}{\alpha}_{k}}=\sqrt{({\overline{X}}_{i}+{V}_{i}){\alpha}_{k}}=\sqrt{{\overline{X}}_{i}{\alpha}_{k}}+O(\frac{{V}_{i}}{{\overline{X}}_{i}})$, we have
Neglecting the $O(\frac{V}{{N}_{\mathrm{tot}}})$ terms gives us the multidimensional Ornstein–Uhlenbeck process of Eq. 4 for the 3state example.
Appendix C: Proofs and Calculations
C.1 Stationary Covariance of a Multidimensional OU Process
The SDE for $X(t)$ in Eq. 4 has the explicit solution (see [18], Chap. 4.5)
Assuming the initial condition is either deterministic or Gaussian, then $X(t)$ is Gaussian with mean
and correlation function
where $t\wedge s$ means the minimum of t and s. Setting $s=t$ and taking the limit as $t\to \mathrm{\infty}$, we obtain the stationary covariance function
We exploit the fact that not only does B decompose into the sum $B={\sum}_{k=1}^{m}{B}_{k}$, but in the case of a firstorder reaction process, $B{B}^{\u22ba}$ also decomposes into the following sum:
and further, ${B}_{k}{B}_{k}^{\u22ba}={\sigma}_{k}^{2}{\zeta}_{k}{\zeta}_{k}^{\u22ba}$ for each edge (reaction) $k\in \mathcal{E}$. Therefore, the stationary covariance of the full process decomposes into a sum of the contributions from the m different reaction processes:
We note that the (left) eigenvector corresponding to the leading (0) eigenvalue of L has constant components, therefore it lies in the kernel of the matrix ${B}_{k}{B}_{k}^{\u22ba}$ for each k, which guarantees finite covariance in Eq. 108.
C.2 Computation of Edge Importance ${R}_{k}$ and Proof of Lemma 1
Using the spectral properties of the graph Laplacian L, we can rewrite the stationary covariance of $X(t)$ (Eq. 106) by replacing each expression involving a matrix exponential by the sum over the orthogonal eigendecomposition of L. Let ${v}_{i}$ be the i th eigenvector of L (written as a column vector), with eigenvalue ${\lambda}_{i}$, i.e. $L{v}_{i}={\lambda}_{i}{v}_{i}$. Summing over each eigenvalue, we can write $L={\sum}_{i=1}^{n}{\lambda}_{i}{v}_{i}{v}_{i}^{\u22ba}$. Note that this decomposition is only valid when L is symmetric; the nonsymmetric case is discussed in Sect. 4. Then we have the following expression from Eq. 106:
Using the decomposition of matrix B (Eqs. 5 and 107), it follows that
The covariance of the full process X is therefore given by
By construction of the graph Laplacian, its leading eigenvalue ${\lambda}_{1}\equiv 0$. The corresponding (right) eigenvector has constant components, ${v}_{1}={(1,\dots ,1)}^{\u22ba}/\sqrt{n}$. Therefore, for each stoichiometry vector we have ${\zeta}_{k}^{\u22ba}{v}_{1}\equiv 0$. Consequently the terms in the inner summation (114) with index $i=1$ or $j=1$ vanish, and may be omitted without changing the result. Taking the limit as $t\to \mathrm{\infty}$ of the covariance function gives us the stationary covariance
Recall that we are interested in the linear measurement functional $M\in {\mathbb{R}}^{n}$ projected onto $X(t)$, i.e. the projection $Y(t)={M}^{\u22ba}X(t)$. For edges $k\in {\mathcal{E}}^{\prime}$ neglected in the approximation $\tilde{Y}={M}^{\u22ba}\tilde{X}(t)$, we take the limit as $t\to \mathrm{\infty}$ of the mean squared error of $\tilde{Y}(t)Y(t)={M}^{\u22ba}U(t)$ to get
C.3 Proof of Lemma 3
Suppose that assumptions A0–A5 given in Sect. 3.1 hold. We assume that $M\in {\{0,1\}}^{n}$ is an arbitrary measurement vector consisting of ${n}_{1}>0$ ones and ${n}_{0}>0$ zeros such that ${n}_{1}+{n}_{0}=n$, and ${n}_{1}=O(1)$ as $n\to \mathrm{\infty}$. That is, we exclude the case where ${n}_{1}$ grows without bound as n grows. If we look at the corresponding measurement value of the ${l}_{}$th and ${l}_{+}$th components of ${\zeta}_{k}$ (see Eq. 17), we have three possible cases:

1.
$${l}_{\pm}\in {1}_{M}$$
, i.e. $M({l}_{})=M({l}_{+})=1$;

2.
$${l}_{\pm}\notin {1}_{M}$$
, i.e. $M({l}_{})=M({l}_{+})=0$;

3.
$${l}_{}\in {1}_{M}$$
and ${l}_{+}\notin {1}_{M}$, i.e. $M({l}_{})=1$ and $M({l}_{+})=0$ (respectively, $M({l}_{})=0$ and $M({l}_{+}$