Measuring Edge Importance: A Quantitative Analysis of the Stochastic Shielding Approximation for Random Processes on Graphs
- Deena R Schmidt^{1, 2} and
- Peter J Thomas^{1, 2, 3}
https://doi.org/10.1186/2190-8567-4-6
© D.R. Schmidt, P.J. Thomas; licensee Springer 2014
Received: 23 August 2013
Accepted: 24 January 2014
Published: 17 April 2014
Abstract
Mathematical models of cellular physiological mechanisms often involve random walks on graphs representing transitions within networks of functional states. Schmandt and Galán recently introduced a novel stochastic shielding approximation as a fast, accurate method for generating approximate sample paths from a finite state Markov process in which only a subset of states are observable. For example, in ion-channel models, such as the Hodgkin–Huxley or other conductance-based neural models, a nerve cell has a population of ion channels whose states comprise the nodes of a graph, only some of which allow a transmembrane current to pass. The stochastic shielding approximation consists of neglecting fluctuations in the dynamics associated with edges in the graph not directly affecting the observable states. We consider the problem of finding the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the choice of a particular linear measurement functional on the graph. The partitioning of ion-channel states into conducting versus nonconducting states provides a case in point. In addition to establishing that Schmandt and Galán’s approximation is in fact optimal in a specific sense, we use recent results from random matrix theory to provide heuristic error estimates for the accuracy of the stochastic shielding approximation for an ensemble of random graphs. Moreover, we provide a novel quantitative measure of the contribution of individual transitions within the reaction graph to the accuracy of the approximate process.
1 Introduction
Many biological systems exhibit a combination of stochastic (chance, random, noisy) and deterministic dynamics [1–3]. For example, mathematical models involving stochastic processes arise in physiology [4–7], ecology [8–10], and genetic regulatory systems [11–13]. Such mathematical models often originate as intrinsically complex, high-dimensional systems with many degrees of freedom, and many sources of variability. This inherent complexity presents two related challenges. First, the essential dynamics of such systems may be hard to discern, and model reduction based on first principles for stochastic systems on complex networks is difficult. Second, in order to predict the behavior of such systems under normal, pathological or experimental conditions, one must usually resort to numerical simulation studies. Even with the tremendous progress in computing power over the last decades, intrinsically high-dimensional stochastic systems remain prohibitive to simulate exhaustively. Moreover, because of their dimensionality, the results of ensembles of stochastic simulations can be challenging to interpret. Therefore, there is demand for efficient dimension reduction methods, both to provide high quality approximate numerical solutions to the stochastic evolution equations arising in high-dimensional systems, and to provide an efficient conceptual framework for interpretation of the behavior of such systems.
In [14], Schmandt and Galán introduced a stochastic shielding approximation as a fast, accurate method for generating sample paths from a finite state Markov process in which only a subset of states are observable. For example, in ion-channel models, such as the Hodgkin–Huxley or other conductance-based neural models, a nerve cell has a population of ion channels whose configurational states comprise the nodes of a graph, only some of which allow a transmembrane current to pass. That is, each vertex of the ion-channel state graph is labeled with a scalar “conductance”, which is either zero (nonconducting) or one (conducting). In a population of ion channels, the flux of individual channels making the transition from a state i to a state j is a stochastic process with mean rate, and it has fluctuations around the mean rate that depend on the population at state i. The stochastic shielding approximation consists of neglecting fluctuations associated with edges in the graph not directly affecting the observable states. Specifically, the random fluxes along edges connecting identically labeled states are replaced by the mean fluxes along those edges, while the random fluxes associated with edges connecting distinguishable states are left unchanged. This approximation is an example of complexity reduction, in the sense of reducing a stochastic process generated by K independent processes to a process on a smaller sample space, i.e. generated by ${K}^{\prime}<K$ processes. Schmandt and Galán observe that, remarkably, the variance of the observable state (the membrane conductance) is almost identical in the reduced and the unreduced system.^{1} While the approximate process does not faithfully reproduce all aspects of the full process, it reproduces those features relevant to the neurophysiologist as well as to the larger biological system in which it is embedded.
Here we consider the problem of finding the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the choice of a particular linear measurement functional on the graph. The partitioning of ion-channel states into conducting versus nonconducting states provides a case in point. In this paper we establish that Schmandt and Galán’s approximation is in fact optimal in a specific sense. We derive a quantitative measure of the contributions of individual edges in the graph to the accuracy of the approximation, relative to the chosen measurement functional. This approach allows quantitative comparison of edge importance, and sheds light on the parametric dependence of relative edge importance, for instance in a voltage-gated ion channel. In addition, we provide heuristic error estimates for the accuracy of the stochastic shielding approximation for an ensemble of symmetric random graphs.
Motivated by [14], we consider a multidimensional Ornstein–Uhlenbeck process on a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with n nodes and m edges (reactions), and a linear measurement functional $M\in {\mathbb{R}}^{n}$. We show that the stochastic shielding approximation is the most accurate dimension reduction possible among those neglecting fluctuations in the same number of underlying processes. Neglecting a set of reactions in the full stochastic process X creates an approximate process $\tilde{X}$ which matches the behavior of the full process in the mean but deviates from the full process in the fluctuations.
Extending this idea for an ensemble of symmetric directed graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$, we establish two main results. Lemma 1, our first main result, allows us to find the optimal complexity reducing mapping from a stochastic process on a graph to an approximate process on a smaller sample space, as determined by the measurement M. Neglecting the fluctuations associated with a subset ${\mathcal{E}}^{\prime}$ of the edge set ℰ defines a new process $\tilde{X}(t)$ that deviates from the full process $X(t)$ by an amount that we call the deficiency, $U(t)=\tilde{X}(t)-X(t)$. The observed error, given M, is then ${M}^{\intercal}U$; its mean is zero by construction, and its variance is $R=E[{({M}^{\intercal}U)}^{2}]$. In Lemma 1 we provide an exact formula for the contribution of the $k$th edge to this error. This formula, which arises from a spectral decomposition of the graph Laplacian associated with the full process, gives an explicit criterion for choosing the k most important edges in the graph, for any $0<k<m$.
Our second main result, Theorem 2, applies this criterion to networks generated from a broad class of random graph ensembles with a randomly chosen binary measurement vector M. We show that the importance measures of individual edges cluster tightly around one of two values. For moderately large graphs, these clusters correspond with very high accuracy to Schmandt and Galán’s stochastic shielding heuristic; an extremely accurate, reduced complexity approximation is obtained by neglecting fluctuations associated with edges connecting states that are indistinguishable under the measurement M. We illustrate this result with a sample from the Erdös–Rényi random graph ensemble in Sect. 3.3.
The analysis of Schmandt and Galán focused on an accurate, efficient approximation of Markov processes arising from ion-channel models. In Sect. 4 we apply our analysis to processes on two graphs arising from the classical Hodgkin–Huxley system of ion channels: the 5-state model for the voltage-gated potassium channel, and the 8-state model for the voltage-gated sodium channel. In a more general setting, the transition rates connecting adjacent states in these models are voltage-dependent. Here we restrict attention to the stationary case, corresponding biologically to the behavior of the channels under “voltage clamped” conditions. For both the voltage-gated potassium and voltage-gated sodium channel state graphs we show that our ranking reproduces the Schmandt–Galán stochastic shielding heuristic over all physiologically relevant voltages. This example also demonstrates that our results apply to graphs with non-symmetric adjacency matrices, as well as to the symmetric case.
In Sect. 5 we discuss possible extensions of our results to examples including signal transduction networks and calcium-induced calcium release models, as well as systems with graded rather than binary measurement functionals.
2 Model
2.1 Connection to the Population Process
Because each transition preserves the total number of individuals (i.e. the components of ${\zeta}_{k}$ sum to zero for each k), we have ${\sum}_{i}{N}_{i}(t)={N}_{\mathrm{tot}}={\sum}_{i}{N}_{i}(0)$ for all $t>0$.
2.2 Multidimensional Ornstein–Uhlenbeck Process
such that the $k$th column of the matrix ${B}_{k}$ is ${\sigma}_{k}{\zeta}_{k}$ and all other columns of ${B}_{k}$ are zero.
The stochastic shielding approximation for a system of the form given in Eq. 4 amounts to preserving the mean, but neglecting the fluctuations, for the processes driving a subset of the reactions, i.e. replacing B with an alternative matrix $\tilde{B}$ obtained by replacing a subset of columns in B with null vectors. The trajectories of the resulting SDE, $\tilde{X}(t)$ (see Eq. 7), are approximations of the trajectories of the full system.
In order to compare different complexity reduction choices, we define the deficiency of an approximation to be the difference between the true and approximate trajectories, $U(t)=\tilde{X}(t)-X(t)$, when projected onto the measurement functional of interest M. As suggested by Schmandt and Galán, the stationary variance of the projection of the deficiency on the measurement vector provides an appropriate measure for comparing the quality of alternative reductions. That is, we use $R=Var[{M}^{\intercal}U]=Var[{M}^{\intercal}(\tilde{X}-X)]$ as our error measure. We focus on reductions that preserve the behavior of the system (Eq. 4) relative to a given linear measurement functional $M\in {\mathbb{R}}^{n}$. In the case of ion channels, $M\in {\{0,1\}}^{n}$ represents the conductance of each channel state. We consider the case of graded rather than binary measurements in Sect. 5. Whether binary or graded, the measurement vector identifies the stochastic process of interest as the projection $Y(t)={M}^{\intercal}X(t)$.
Formally, we consider two processes $X(t)$ (full process) and $\tilde{X}(t)$ (reduced process) defined on a common probability space $(\Omega ,{\mathcal{F}}_{t},P)$. The sample space $\Omega =C{[0,\mathrm{\infty})}^{n}$, filtration ${\mathcal{F}}_{t}$, and Wiener measure P are those associated with m independent copies of the standard Brownian process. The approximate process $\tilde{X}(t)$ has the same sample space Ω and is measurable with respect to the same filtration ${\mathcal{F}}_{t}$, but also with respect to a smaller filtration ${\tilde{\mathcal{F}}}_{t}\subset {\mathcal{F}}_{t}$ generated by the Wiener processes associated with a subset of edges of the graph. The covariance of the deficiency, then, is well defined in terms of the underlying measure P on the full probability space.
Similarly, the variance of the projection $Y(t)={M}^{\intercal}X(t)$ also decomposes into a sum, because $Var[Y]={M}^{\intercal}Cov[X]M$.
Because the (left) eigenvector corresponding to the leading (0) eigenvalue of L has constant components, it is orthogonal to ${\zeta}_{k}$ for each k. (If L is symmetric, the right and left eigenvectors are interchangeable.) Therefore the corresponding eigenspace is contained in the kernel of the matrix ${B}_{k}{B}_{k}^{\intercal}$, for each k, which guarantees that the limit on the RHS of Eq. 6 remains finite.
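The decomposition can be written out explicitly. The following is a sketch, assuming that Eq. 4 has the linear form $dX=LX\,dt+B\,dW$ and that Eq. 6 is the usual integral representation of the stationary covariance of such a process; neither equation is reproduced in this section, so both forms are assumptions here.

```latex
% Sketch under the assumptions stated above: dX = L X dt + B dW,
% with B = \sum_k B_k and B_k carrying only the k-th column \sigma_k \zeta_k.
\begin{align*}
\operatorname{Cov}[X]
  &= \lim_{t\to\infty}\int_{0}^{t} e^{Ls}\,B B^{\intercal}\,e^{L^{\intercal}s}\,ds, \\
B B^{\intercal}
  &= \sum_{k=1}^{m} B_{k} B_{k}^{\intercal}
   = \sum_{k=1}^{m} \sigma_{k}^{2}\,\zeta_{k}\zeta_{k}^{\intercal}, \\
\operatorname{Var}[Y]
  &= M^{\intercal}\operatorname{Cov}[X]\,M
   = \sum_{k=1}^{m}\,\lim_{t\to\infty}\int_{0}^{t}
     \bigl(\sigma_{k}\,M^{\intercal}e^{Ls}\zeta_{k}\bigr)^{2}\,ds .
\end{align*}
```

Each summand depends on a single edge k, and it stays finite because $e^{Ls}{\zeta}_{k}$ has no component along the constant eigenvector and therefore decays as $s\to \mathrm{\infty}$.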
where $\tilde{B}={\sum}_{k\in \mathcal{E}\setminus {\mathcal{E}}^{\prime}}{B}_{k}$ sums over the edges we keep. Given the linear measurement functional $M\in {\mathbb{R}}^{n}$ above, we define the approximate projection $\tilde{Y}(t)={M}^{\intercal}\tilde{X}(t)$. Note that in the case of an ion-channel system, M is binary so Y and $\tilde{Y}$ just pull out the observable (i.e., conducting) states of each system. In Sect. 2.3, for instance, we consider a 3-state chain with one observable state (state 3) as a simple model of an ion-channel system. In that case, $M={[0,0,1]}^{\intercal}$ and $Y(t)={M}^{\intercal}X(t)={X}_{3}(t)$.
It is important to note that the noise source dW that appears in Eqs. 4 and 7 refers to the same noise process W in both cases. The deficiency of the approximation relative to the full process is given by taking the limit of the mean squared error (MSE) of $\tilde{Y}-Y$ (equivalent to the stationary variance of $\tilde{Y}-Y$), which, as shown in the proof of Lemma 1, is an expression of the sum over all neglected reactions.
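The role of the shared noise source can be illustrated with a short simulation. The sketch below is not the authors' code: it assumes the linear form $dX=LX\,dt+B\,dW$ for Eq. 4, uses the 3-state chain of Sect. 2.3 with unit rates and ${\sigma}_{k}=1$, and integrates both processes with a plain Euler–Maruyama scheme. The function name `deficiency_variance`, the step size, and the run length are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# 3-state chain 1 <-> 2 <-> 3 with unit rates and sigma_k = 1 (assumed).
L = np.array([[-1.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 1.0, -1.0]])
# Columns are the stoichiometry vectors zeta_1..zeta_4
# for the reactions 1->2, 2->1, 2->3, 3->2.
B = np.array([[-1.0, 1.0, 0.0, 0.0],
              [1.0, -1.0, -1.0, 1.0],
              [0.0, 0.0, 1.0, -1.0]])
M = np.array([0.0, 0.0, 1.0])  # only state 3 is observable


def deficiency_variance(neglected, steps=50_000, dt=0.01):
    """Estimate Var[M^T (X_tilde - X)] when both processes share one noise path."""
    idx = np.asarray(neglected, dtype=int)
    B_tilde = B.copy()
    B_tilde[:, idx] = 0.0  # drop the fluctuations of the neglected edges
    x = np.zeros(3)
    x_tilde = np.zeros(3)
    err = np.empty(steps)
    for t in range(steps):
        dW = rng.normal(scale=np.sqrt(dt), size=4)  # same increment for both
        x = x + L @ x * dt + B @ dW
        x_tilde = x_tilde + L @ x_tilde * dt + B_tilde @ dW
        err[t] = M @ (x_tilde - x)
    return err.var()


v_shielded = deficiency_variance([0, 1])  # neglect edges between states 1 and 2
v_exposed = deficiency_variance([2, 3])   # neglect edges touching state 3
print(v_shielded, v_exposed)
```

Because the increments `dW` are shared, the deficiency is driven only by the neglected columns of B. Neglecting the two edges that do not touch the observable state should give a markedly smaller error variance than neglecting the two that do, up to discretization and sampling error.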
We can rank the error terms ${R}_{k}$ in descending order, thereby ordering the corresponding reactions in terms of their “importance”. The most important reaction is the one with the largest value of ${R}_{k}$; if neglected, it would introduce the largest error. See Appendix C.2 for the proof of Lemma 1. Note that an individual term in the sum (10) will be zero if either ${\zeta}_{k}\perp {v}_{i}$ or if $M\perp {v}_{i}$ for a given eigenvector ${v}_{i}$. Typically, however, these vectors will not be orthogonal. Therefore, it is of interest to know how the values of ${R}_{k}$ are distributed for different examples: graphs of actual ion-channel states such as those in the classical Hodgkin–Huxley model, and more generally, ensembles of random graphs. In Sect. 4, we compute the distribution of ${R}_{k}$ for the graphs of the potassium and sodium channel states in the Hodgkin–Huxley model. In Sect. 3, we consider an ensemble of random graphs such as the Erdös–Rényi ensemble with randomly assigned binary measurement vector M and prove our main result, which is a statement about the expected value of ${R}_{k}$. Should our random graph ensemble produce a graph that does not consist of a single connected component, then we may apply Lemma 1 to each isolated component of the graph separately. However, for the random graph ensembles we consider, the probability of drawing a disconnected graph decays very rapidly as $n\to \mathrm{\infty}$. We discuss this point further in Appendix D.
For a random graph ensemble, the eigenvectors of the graph Laplacian are distributed randomly on the unit sphere [19, 20]. Hence, they are unlikely to be exactly orthogonal to either ${\zeta}_{k}$ or M. Given a series of assumptions (see Sect. 3.1) that are true for naturally occurring random ensembles such as the symmetric Gaussian and Erdös–Rényi ensembles, we state our main result.
where the constant C depends on the mean edge weight.
This result shows that the edges in the graph naturally decompose into two classes, distinguished by their asymptotic behavior for large n. The first class of edges represents connections between differently labeled nodes, in terms of the measurement vector M. The first class comprises the “important” edges in the graph, in the sense that these edges have mean ${R}_{k}$ values that scale as order $1/n$. The second class of edges connects identically labeled nodes. These edges have mean ${R}_{k}$ values of order less than ${n}^{-q}$, where $q>1$ is driven by the fourth moment of the eigenvector components (see assumption A4a in Sect. 3.1 for details). As n increases, these edges become relatively “unimportant” and, hence, can be neglected under the stochastic shielding approximation with minimal loss of accuracy. For the case of the Gaussian ensemble, $q=2$. Empirically, for the Erdös–Rényi random graph ensemble, $q\approx 5/3$ (see discussion in Sect. 3.3 and also Fig. 4). The proof of Theorem 2 is given in Sect. 3.2. Before discussing more complicated examples, we illustrate the decomposition of the full process into approximate subprocesses for a simple 3-state example in the next subsection.
2.3 3-State Example
Indexing of nodes and edges for the 3-state process, cf. Eq. 1 and Fig. 1. The first column gives the reaction number, the middle column gives the direction of the reaction, and the last column gives the contribution of the reaction to the measurement $Y={M}^{\intercal}X$
k | i(k)→j(k) | ${M}^{\intercal}{\zeta}_{k}$ |
---|---|---|
1 | 1→2 | 0 |
2 | 2→1 | 0 |
3 | 2→3 | +1 |
4 | 3→2 | −1 |
Since we are assuming ${\sigma}_{k}=1$ for all k, the $k$th column of B is exactly the stoichiometry vector associated with the $k$th reaction, and in particular, ${B}_{k}{B}_{k}^{\intercal}={\zeta}_{k}{\zeta}_{k}^{\intercal}$.
The full process $X(t)$ has four stochastic transitions and a reduced process $\tilde{X}(t)$ is defined by keeping a subset of the four stochastic transitions. We use the notation $\tilde{X}={X}_{(i,j,k)}$ to explicitly define which columns of the full matrix B are neglected in the approximate process, i.e. which stochastic transitions are neglected. We are interested in the accuracy of the approximation of the trajectory itself.
If we fix a point in the underlying sample space (a choice of four Poisson processes ${Y}_{k}(t)$ in the system $N(t)$ or a choice of four white noise processes $d{W}_{k}(t)$ in the system $X(t)$) and then choose to neglect the fluctuations in two of the four, i.e. by replacing ${Y}_{k}(t)$ with $E[{Y}_{k}(t)]$ or $d{W}_{k}(t)$ with zero, respectively, then the question is: which choice leads to the most accurate representation of the process as seen by the measurement?
Table of discrepancies ${M}^{\intercal}{U}_{(i,j,k)}={M}^{\intercal}({X}_{(i,j,k)}-X)$ for the 3-state Markov process. The discrepancy ${M}^{\intercal}{U}_{(1,2)}$ (marked by *) corresponds to reduced process ${X}_{(1,2)}$ projected onto the third component, which is the optimal two-edge-neglecting approximation of X for this example, in agreement with Schmandt and Galán [14]
${M}^{\intercal}{U}_{(i,j,k)}$ | $\sum {R}_{{k}^{\prime}}$ | Value |
---|---|---|
${M}^{\intercal}{U}_{(1)}$ | ${R}_{1}$ | 0.0417 |
${M}^{\intercal}{U}_{(2)}$ | ${R}_{2}$ | 0.0417 |
${M}^{\intercal}{U}_{(3)}$ | ${R}_{3}$ | 0.2917 |
${M}^{\intercal}{U}_{(4)}$ | ${R}_{4}$ | 0.2917 |
${M}^{\intercal}{U}_{(1,2)}$ | ${R}_{1}+{R}_{2}$ | 0.0833* |
${M}^{\intercal}{U}_{(3,4)}$ | ${R}_{3}+{R}_{4}$ | 0.5833 |
${M}^{\intercal}{U}_{(1,3)}$ | ${R}_{1}+{R}_{3}$ | 0.3333 |
${M}^{\intercal}{U}_{(1,4)}$ | ${R}_{1}+{R}_{4}$ | 0.3333 |
${M}^{\intercal}{U}_{(2,3)}$ | ${R}_{2}+{R}_{3}$ | 0.3333 |
${M}^{\intercal}{U}_{(2,4)}$ | ${R}_{2}+{R}_{4}$ | 0.3333 |
${M}^{\intercal}{U}_{(1,2,3)}$ | ${R}_{1}+{R}_{2}+{R}_{3}$ | 0.375 |
${M}^{\intercal}{U}_{(1,2,4)}$ | ${R}_{1}+{R}_{2}+{R}_{4}$ | 0.375 |
${M}^{\intercal}{U}_{(1,3,4)}$ | ${R}_{1}+{R}_{3}+{R}_{4}$ | 0.625 |
${M}^{\intercal}{U}_{(2,3,4)}$ | ${R}_{2}+{R}_{3}+{R}_{4}$ | 0.625 |
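The single-edge entries of this table can be checked numerically. The sketch below assumes unit transition rates on the chain 1↔2↔3, ${\sigma}_{k}=1$, and a symmetric Laplacian, and it evaluates ${R}_{k}$ as a double sum over the nonzero eigenmodes built from the two factors quoted in Remark 1b of Sect. 3.1. The function name `edge_importance` is hypothetical, and the code illustrates the spectral formula rather than reproducing the authors' implementation.

```python
import numpy as np


def edge_importance(L, zetas, M, tol=1e-10):
    """R_k for each stoichiometry vector, for a symmetric Laplacian L.

    R_k = sum_{i,j} -1/(lam_i + lam_j) * (M.v_i)(v_i.zeta_k)(zeta_k.v_j)(v_j.M),
    with the sum restricted to eigenpairs whose eigenvalue is nonzero.
    """
    lam, V = np.linalg.eigh(L)
    keep = np.abs(lam) > tol            # discard the constant (zero) mode
    lam, V = lam[keep], V[:, keep]
    m_proj = V.T @ M                    # M^T v_i for each retained mode
    denom = lam[:, None] + lam[None, :]
    R = []
    for zeta in zetas:
        a = m_proj * (V.T @ zeta)       # (M^T v_i)(v_i^T zeta_k)
        R.append(-np.sum(np.outer(a, a) / denom))
    return np.array(R)


# 3-state chain with unit rates (assumed); state 3 is the observable state.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = (A - np.diag(A.sum(axis=1))).T
zetas = np.array([[-1.0, 1.0, 0.0],    # reaction 1: 1 -> 2
                  [1.0, -1.0, 0.0],    # reaction 2: 2 -> 1
                  [0.0, -1.0, 1.0],    # reaction 3: 2 -> 3
                  [0.0, 1.0, -1.0]])   # reaction 4: 3 -> 2
M = np.array([0.0, 0.0, 1.0])

R = edge_importance(L, zetas, M)
print(np.round(R, 4))
```

Under these assumptions the values are ${R}_{1}={R}_{2}=1/24$ and ${R}_{3}={R}_{4}=7/24$, which round to the 0.0417 and 0.2917 entries above; the multi-edge rows are sums of these four numbers.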
3 Analysis of Stochastic Shielding for a Random Graph Ensemble
Because our methods combine heuristic numerical evidence with probabilistic calculations, we use “≈” to represent “heuristic equality”. Where precise order estimates are available, we use “O” notation. For the reader’s convenience, we restate Theorem 2.
where the constant C depends on the mean edge weight.
reactions connecting nodes with identical values of M have a small contribution to the error, so these reactions can be neglected under the stochastic shielding approximation. This result relies on a list of assumptions which are described in detail below. The proof of this theorem requires Lemma 3, which is stated after the assumptions and proved in Appendix C.3.
3.1 Assumptions on the Random Graph Ensemble
We state a sequence of assumptions on the random graph ensemble needed to establish our main result. Each assumption is reasonable for a broad class of graphs of interest, for reasons articulated in the Remarks following each assumption. In several instances we impose on our random graph ensemble, as assumptions, properties that are known to hold for broad classes of random matrices, such as the Wigner ensemble [19, 20]. The ensemble we consider is not equivalent to a generalized Wigner ensemble. Nevertheless, for the reasons detailed below, it appears reasonable that certain aspects of the eigenvector and eigenvalue distribution may be similar in the two cases.
We consider an ensemble of symmetric directed graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with $|\mathcal{V}|=n$. Let ${\zeta}_{k}$ be the stoichiometry vector corresponding to the $k$th reaction (Eq. 17) and let $({\lambda}_{i},{v}_{i})$ denote the eigenpairs of the graph Laplacian $L={(A-D)}^{\intercal}$ listed with eigenvalues in descending order. We assume that the eigenvector components are ${l}_{2}$-normalized with mean 0 and variance $1/n$, and we assume the following:
A0. (Following [21].) Let ${a}_{ij}\ge 0$, the entries of the adjacency matrix, be random variables defined on a common probability space, with $\{{a}_{ij},1\le i<j\le n\}$ independent (but not necessarily identically distributed), with ${a}_{ij}={a}_{ji}$, $E[{a}_{ij}]={\mu}_{A}$, $V[{a}_{ij}]={\sigma}_{A}^{2}>0$ for all $1\le i<j\le n$, and ${sup}_{1\le i<j\le n}E|({a}_{ij}-{\mu}_{A})/{\sigma}_{A}{|}^{\kappa}<\mathrm{\infty}$ for some $\kappa >0$.
Remark 1a: Assumption A1 holds for the symmetric Gaussian ensemble as well as for the more general Wigner ensemble [19, 20]. Indeed for these ensembles the eigenvalues and eigenvectors are independent. The weaker assumption, that they are at most weakly correlated, appears reasonable for e.g. the ensemble of graph Laplacians obtained from the symmetric Erdös–Rényi random graph ensemble.
Remark 1b: The symmetric Gaussian and Wigner ensembles are fully invariant under permutation of eigenvectors, and the weaker assumption of near invariance appears reasonable for the Erdös–Rényi ensemble. In particular, the pair $(\frac{-1}{{\lambda}_{i}+{\lambda}_{j}})$, $({M}^{\intercal}{v}_{i}{v}_{i}^{\intercal}{\zeta}_{k}{\zeta}_{k}^{\intercal}{v}_{j}{v}_{j}^{\intercal}M)$ appearing in the definition of ${R}_{k}$ (Lemma 1) are assumed to be approximately uncorrelated. This assumption is reasonable by virtue of the approximate rotational symmetry of the eigenvector distribution under our choice of random graph model, which we expect to be close (heuristically) to the eigenvector distribution of the symmetric Gaussian ensemble [19, 20].
A2. $E[{v}_{i}(l)]=0$ for any $i,l\in \{1,\dots ,n\}$, where ${v}_{i}(l)$ denotes the $l$th component of the $i$th eigenvector.
Remark 2a: Note that $E[{v}_{i}{(l)}^{2}]=1/n$ by the ${l}_{2}$-normalization of the eigenvectors because ${\parallel v\parallel}_{2}=\sqrt{{\sum}_{l=1}^{n}v{(l)}^{2}}=1$ for each eigenvector v. This normalization leaves a 2-fold ambiguity in the choice of eigenvector v. Since +v and −v both have ${\parallel v\parallel}_{2}=1$, we choose randomly between them so that the first non-zero component is positive with probability $1/2$.^{3}
Remark 2b: By the symmetry of our random graph ensemble under the symmetric group acting on the change of labels, assumption A2 holds not just for the Gaussian and Wigner ensembles, but for any reasonable symmetric ensemble. In particular, it holds for the symmetric Erdös–Rényi random graph ensemble.
A3.
- a. $E[{v}_{i}(l){v}_{j}({l}^{\prime})]=O({n}^{-3})$ as $n\to \mathrm{\infty}$, for $i\ne j$.
- b. $E[{v}_{i}(l){v}_{i}({l}^{\prime})]=O({n}^{-2})$ as $n\to \mathrm{\infty}$, for $l\ne {l}^{\prime}$.
A4.
- a. $E[{v}_{i}{(l)}^{4}]=O({n}^{-q})$ as $n\to \mathrm{\infty}$, for some $q>1$.
- b. $E[{v}_{i}{(l)}^{2}{v}_{i}{({l}^{\prime})}^{2}]=O({n}^{-2})$ as $n\to \mathrm{\infty}$, for $l\ne {l}^{\prime}$.
Remark 4: Assumption A4a holds for the Gaussian case for $q=2$. For the Erdös–Rényi case, empirically we see that assumption A4a holds for $q\approx 5/3$ as shown in Fig. 4. Specifically, empirical evidence suggests that $E[{v}_{i}{(l)}^{4}]\approx \sqrt{2}{n}^{-5/3}$ in this case.
Remark 5: The reason for this assumption will become clear in the proof of Theorem 2. It is similar in spirit to the four moment theorem for eigenvector components of a Wigner or Gaussian random matrix, different versions of which have been established by Tao and Vu [20] and Knowles and Yin [19]. Figure 4 provides numerical evidence for the plausibility of assumption A5 in the Erdös–Rényi case.
In addition to assumptions A0–A5 on the random graph ensemble, the statement of Theorem 2 places an assumption on the measurement vector $M\in {\{0,1\}}^{n}$. This vector contains ${n}_{1}>0$ ones and ${n}_{0}>0$ zeros such that ${n}_{1}+{n}_{0}=n$. We assume ${n}_{1}=O(1)$ as $n\to \mathrm{\infty}$, that is, we exclude the case where ${n}_{1}$ grows without bound as n grows. (If M has the same value for all nodes, the output is constant and the error is identically zero. The expression in Theorem 2 holds trivially so we ignore this case.)
Total number of states (n) and number of conducting states (${n}_{1}$) for different ion-channel models. Empirically based model refinements have led to increasing numbers of channel states, without dramatically increasing the number of conducting states
Although assuming that ${n}_{1}=O(1)$ is biologically plausible, we make this assumption mainly for technical reasons as indicated in the proof of Theorem 2. We note, however, that in the numerical example in Sect. 3.3, the conclusions of Theorem 2 appear to hold equally well when ${n}_{1}={n}_{0}=n/2$.
- A. $E[{M}^{\intercal}{v}_{i}{v}_{i}^{\intercal}{\zeta}_{k}]=E[{\sum}_{l\in {1}_{M}}{v}_{i}(l)({v}_{i}({l}_{+})-{v}_{i}({l}_{-}))]=\frac{1}{n}{M}^{\intercal}{\zeta}_{k}+O({n}^{-2})$.
- B. $E{[{M}^{\intercal}{v}_{i}{v}_{i}^{\intercal}{\zeta}_{k}]}^{2}=E{[{\sum}_{l\in {1}_{M}}{v}_{i}(l)({v}_{i}({l}_{+})-{v}_{i}({l}_{-}))]}^{2}=\frac{1}{{n}^{2}}|{M}^{\intercal}{\zeta}_{k}|+O({n}^{-4})$.
- C. $E[{({M}^{\intercal}{v}_{i}{v}_{i}^{\intercal}{\zeta}_{k})}^{2}]=E[{({\sum}_{l\in {1}_{M}}{v}_{i}(l))}^{2}{({v}_{i}({l}_{+})-{v}_{i}({l}_{-}))}^{2}]=O({n}^{-q})$ for some $q>1$.
Note that the exponent $q>1$ in part C is governed by the fourth moment of the eigenvector components of the graph Laplacian (see assumption A4a). The proof of Lemma 3 is given in Appendix C.3.
3.2 Proof of Main Theorem
This expectation is taken over the space of symmetric directed graphs $\mathcal{G}=(\mathcal{V},\mathcal{E})$ where edge k is chosen at random from the set of $\left(\begin{array}{c}n\\ 2\end{array}\right)$ possible bidirectional edges. If ${l}_{\pm}(k)\notin \mathcal{E}$, then $E[{R}_{k}|M]=0$.
as $n\to \mathrm{\infty}$, which establishes the first term (Eq. 34).
Hence, $(n-1)E[{({\sum}_{l\in {1}_{M}}{v}_{i}(l))}^{2}{({v}_{i}({l}_{+})-{v}_{i}({l}_{-}))}^{2}]=O({n}^{1-q})$ as $n\to \mathrm{\infty}$, which establishes the second term (Eq. 35). Therefore, we have established Theorem 2.
3.3 Symmetric Erdös–Rényi Random Graph Ensemble
Theorem 2 says that if the matrix of eigenvector components of the Erdös–Rényi graph Laplacian is sufficiently similar to a random matrix drawn from the Gaussian ensemble (in terms of assumptions A0–A5) then one would expect the partitioning of the ${R}_{k}$ into two clusters. One cluster, containing the important edges, will be centered at $1/n$. A second cluster, containing the unimportant edges, will have smaller ${R}_{k}$ values ($O({n}^{-q})$ where $q>1$ is governed by the fourth moment; see assumption A4a in Sect. 3.1). To the extent to which this similarity to the Gaussian ensemble holds, our calculation of ${R}_{k}$ involves projecting the measurement vector M and the vectors ${\zeta}_{k}$ onto randomly chosen subspaces of ${\mathbb{R}}^{n}$.
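The predicted clustering can be explored with a small numerical experiment. The sketch below is an illustration under stated assumptions, not the computation behind the paper's figures: it draws one symmetric Erdös–Rényi graph with unit edge weights, takes ${\sigma}_{k}=1$, assigns a random binary M with ${n}_{1}=n/2$, and evaluates ${R}_{k}$ for every edge present in the graph using the spectral double sum over nonzero eigenmodes. The values of `n`, `p`, and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 0.5  # illustrative sizes, not taken from the paper

# Symmetric Erdos-Renyi adjacency matrix without self-loops.
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(float)
L = (A - np.diag(A.sum(axis=1))).T

# Random binary measurement vector with n/2 ones.
M = np.zeros(n)
M[rng.choice(n, size=n // 2, replace=False)] = 1.0

# Eigen-decomposition; drop the zero mode (graph assumed connected).
lam, V = np.linalg.eigh(L)
keep = np.abs(lam) > 1e-8
lam, V = lam[keep], V[:, keep]
m_proj = V.T @ M
denom = lam[:, None] + lam[None, :]

important, unimportant = [], []
for i, j in zip(*np.nonzero(upper)):
    zeta = np.zeros(n)
    zeta[i], zeta[j] = -1.0, 1.0        # stoichiometry vector for i -> j
    a = m_proj * (V.T @ zeta)
    R_k = -np.sum(np.outer(a, a) / denom)
    # Edges joining differently labeled nodes are the "important" class.
    (important if M[i] != M[j] else unimportant).append(R_k)

print("edges joining different labels:", np.mean(important))
print("edges joining identical labels:", np.mean(unimportant))
```

Since L is symmetric here, the reverse reaction j→i has stoichiometry vector $-{\zeta}_{k}$ and the same ${R}_{k}$, so one direction per edge suffices. If the two-cluster picture holds for the sampled graph, the first mean should exceed the second by a wide margin.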
4 Application: Stochastic Shielding of Hodgkin–Huxley Channels Under Voltage Clamp
Hodgkin and Huxley’s (HH) model for the generation and propagation of action potentials along the giant axon of the squid Loligo lies at the foundations of modern neuroscience [22, 30]. In the classic HH model, action potentials are generated through the interaction of a leak current and two voltage-gated ionic currents, carried by a sodium ion specific channel and a potassium ion specific channel. The potassium channel comprises four identical subunits that open and close independently with voltage-dependent rates. The channel carries a current when all four subunits are in the open state. At the molecular level, a single channel can be represented as a continuous time Markov jump process on a chain of five states, the fifth of which has non-zero conductance. Of the eight transitions connecting states along this chain, only the last two connect states with different conductances; the stochastic shielding approximation would therefore preserve the fluctuations of these two transitions and neglect those of the other six.
The sodium channel involves two types of subunits, an activation subunit (“m”) present in three identical copies, and an inactivation subunit (“h”) present in a single copy.^{4} The resulting graph has eight distinct states connected by 20 different transitions, each occurring with a voltage-dependent rate [31–33]. Four of these 20 transitions connect states with differing conductance values (zero versus non-zero); the fluctuations of the remaining 16 transitions are ignored under the stochastic shielding approximation.
Schmandt and Galán compared simulations of a system comprising 5000 individual potassium channels and 25000 individual sodium channels, both with and without the stochastic shielding approximation. It is possible to construct an exact simulation scheme, analogous to Gillespie’s stochastic simulation algorithm [34], that takes into account the nonstationarity of the transition rates (propensities) arising from their voltage dependence [35]. However, Schmandt and Galán used a discrete time approximation to this process. Appendix A discusses Schmandt and Galán’s approach in more detail. Here we apply our analysis to evaluate the edge importance ${R}_{k}$ of each transition in the graph for the classic HH potassium and sodium channels, respectively. Rather than consider the case of time-varying transition rates, we restrict attention to the “voltage clamped” case. If the membrane potential is experimentally held constant for a given cell, the per capita transition rates remain constant and the fluctuating ion-channel population forms a stationary Markov process. In particular, our analysis approximates this stationary population process with a linear multidimensional Ornstein–Uhlenbeck process (see Appendix B); this approximation is reasonable given the large numbers of individual channels considered in Schmandt and Galán’s simulations.
4.1 Hodgkin–Huxley Potassium Channel
Physically, it is the current rather than the state occupancy that holds the greatest interest. The current through a population of potassium channels with net conductance g is $I=g(V-{V}_{k})$; here ${V}_{k}=-77\text{mV}$ is the potassium reversal potential, and the conductance $g={g}^{o}{N}_{o}$ is the product of the unitary or single channel conductance ${g}^{o}$ with the total number of channels in the open state, ${N}_{o}$. The variance of the current is therefore ${({g}^{o}(V-{V}_{k}))}^{2}$ times the variance of the occupancy number, meaning that near the reversal potential, the current can have low variance even if the channel state has high variance. For convenience we set ${g}^{o}=1$, which amounts to a change of nominal units for measuring the conductance.
4.2 Hodgkin–Huxley Sodium Channel
where ${D}_{ii}(V)={\sum}_{j=1}^{8}{A}_{ij}(V)$ from the adjacency matrix above (Eq. 50). The matrix B is also voltage-dependent and is given by the general expression in Eq. 49.
In summary, our analysis fully supports the accuracy of Schmandt and Galán’s stochastic shielding algorithm for the Hodgkin–Huxley system, at least for the voltage clamped case that we consider. More significantly, our analysis allows one to calculate the relative importance of each transition in a network of first-order reactions, allowing a new quantitative basis for reduction of complexity of stochastic network models. In the case of a simple chain of states such as the Hodgkin–Huxley potassium channel, the rank ordering of transitions by importance ${R}_{k}$ is the same for all voltages. As shown in Fig. 13, however, for more complicated gating schemes, such as the Hodgkin–Huxley sodium channel, the rank ordering of transitions by importance can differ at different voltages.
For instance, the most important transition at subthreshold voltages ($V\lesssim -40\text{mV}$) is the transition connecting the $[m=(1,1,0),h=1]$ state (state 7 in Fig. 12) to the $[m=(1,1,1),h=1]$ state (state 8, the conducting state). This transition corresponds biophysically to the nonconducting-to-conducting transition that occurs via activation or deactivation [22], that is, the opening (or closing) of the last of three m-activation gates in the ion channel. It is significant that this transition is the most “important” for subthreshold voltages, because the activation transition is typically the last subthreshold event during spike generation.
On the other hand, at suprathreshold voltages the most important transition is that connecting the $[m=(1,1,1),h=1]$ state (state 8) with the $[m=(1,1,1),h=0]$ state (state 4). Biophysically, this transition corresponds to inactivation and deinactivation, or the closing (and opening) of the h-inactivation gate. During action potential generation this transition plays an essential role in terminating the voltage spike upstroke, and it is significant that it should be most “important” at suprathreshold voltages.
For more general channel schemes, and more elaborate stochastic processes in general, the identification of the relative quantitative importance of different transitions or edges to the observable behavior of the system is a powerful new tool for principled complexity reduction.
5 Discussion
In the ongoing race between growth of empirical data sets and growth of available computing power, conceptual understanding of complex dynamical systems can get left behind. Finding efficient lower-dimensional representations of high-dimensional systems, that accurately capture relevant aspects of system behavior, not only takes better advantage of computational resources, but can provide insights into the essential components of a system. Hence, there has been a significant effort in recent years to develop principled complexity reduction techniques for naturally occurring complex networks.
Schmandt and Galán [14] developed a method for efficient simulation of stochastic ion-channel gating in the membrane of a neuron. The random gating of ion channels provides an important class of biological processes which are naturally represented as Markov chains on graphs [33, 35]. The graphs in this case arise from the different configurations of ion-channel subunits or “gates”. Typically each state carries one of two functional labels: open or closed. This coarse-grained representation of the ion channel corresponds to a linear measurement functional, in the sense that current flowing through open channels can be measured experimentally, and individual ion channels typically exhibit binary all-or-none conductance. Schmandt and Galán implemented a novel form of coarse graining technique that ignores fluctuations between indistinguishable transitions (open-to-open or closed-to-closed) while preserving fluctuations between distinguishable states. In order to gain a deeper understanding of why their “stochastic shielding approximation” works so well, we analyzed it in the context of a multidimensional Ornstein–Uhlenbeck process on a variety of networks. First, we showed that this form of model reduction can be represented as a mapping from a many-dimensional sample space to a lower-dimensional sample space, rather than as a mapping from a many-node network to a few-node network, and that one can formulate the problem as a search for the optimal such mapping. Second, we showed that for the specific 3-state example presented in Schmandt and Galán’s paper, their approximation is indeed optimal in a specific sense. Third, we obtained a theoretical result showing that stochastic shielding works for an ensemble of random graphs with arbitrarily chosen binary measurement vectors, analogous to the identification of nodes as conducting versus nonconducting in ion-channel models. 
Finally, we evaluated the stochastic shielding approach for the graph representing the ion-channel states of the classical Hodgkin–Huxley model, and showed that this approach is optimal for a wide range of fixed voltages under “voltage clamped” conditions.
5.1 Relationship Between Different Levels of Modeling
The underlying description of Schmandt and Galán’s model [14] is given by the population process described in Sect. 2.1, a more general framework than the Ornstein–Uhlenbeck process that we study. The OU process connects to the population process via a tau-leaping approximation, as described in Appendix B. The tau-leaping method involves two key assumptions. First, assuming that the transition propensities ${\alpha}_{ij(k)}$ do not change dramatically in an interval of length τ, we can approximate the number of transitions in each interval by a collection of independent Poisson processes. This approach is closely related to the framework of Schmandt and Galán, except that they use a binomial distribution instead of a multinomial distribution (see Appendix A). Second, if the expected number of occurrences of each reaction in time τ is sufficiently large (i.e., tens or hundreds), then it is reasonable to use a Gaussian approximation to the Poisson process. The resulting model comprises the standard chemical Langevin formulation, in which the size of the fluctuations associated with each transition is state dependent. These two constraints can always be satisfied by taking a sufficiently large number of individuals in the population. The Ornstein–Uhlenbeck process is obtained by linearizing about the mean field steady state distribution of the tau-leaping model (see Appendix B). The intensity of the noise terms is determined by the mean steady state occupancy of each state, resulting in a linear OU process. A technical obstacle to extending our results beyond the linear OUP setting is the lack of an explicit closed form expression for the stationary covariance of the population process analogous to Eq. 6. Although our analysis is limited to the OU process version of the system, it is reasonable to expect that stochastic shielding will apply more broadly. 
For example, in the full population process one can decompose the fluxes in the model into a sum of a mean component and a mean zero fluctuating component. In this case, stochastic shielding amounts to setting the fluctuating component to zero while preserving the mean for those transitions connecting observationally equivalent states.
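The decomposition just described can be sketched in a Gaussian (chemical Langevin) time-stepping form, in which each flux is its mean plus a mean-zero fluctuation, and shielding zeroes the fluctuation on selected edges. This is only an illustrative sketch: the three-state chain, the rates, the choice of state 2 as the sole observable state, and all function and variable names are assumptions made for the example.

```python
import math
import random


def langevin_step(counts, edges, dt, rng, shielded=frozenset()):
    """Advance state occupancies by one time step of length dt.

    edges is a list of (source, target, per_capita_rate). Each flux is split
    into its mean and a mean-zero Gaussian fluctuation; the fluctuation is
    set to zero for every edge index listed in `shielded`.
    """
    new_counts = list(counts)
    for k, (src, dst, rate) in enumerate(edges):
        mean_flux = rate * counts[src] * dt
        if k in shielded:
            fluctuation = 0.0  # shielded edge: keep the mean, drop the noise
        else:
            fluctuation = math.sqrt(max(mean_flux, 0.0)) * rng.gauss(0.0, 1.0)
        flux = mean_flux + fluctuation
        new_counts[src] -= flux
        new_counts[dst] += flux
    return new_counts


# Hypothetical chain 0 <-> 1 <-> 2 in which only state 2 is observable.
edges = [(0, 1, 1.0), (1, 0, 2.0), (1, 2, 0.5), (2, 1, 1.5)]
hidden_edges = frozenset({0, 1})  # edges between observationally equivalent states
rng = random.Random(0)
start = [100.0, 200.0, 50.0]
shielded_step = langevin_step(start, edges, 0.01, rng, shielded=hidden_edges)
mean_only_step = langevin_step(start, edges, 0.01, rng, shielded=frozenset(range(4)))
```

Shielding every edge recovers the deterministic mean-field update, while shielding only the hidden edges keeps the noise on the transitions into and out of the observable state. In both cases the total population is conserved.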
Limiting the investigation to voltage clamped conditions facilitated a more thorough mathematical analysis of the stochastic shielding approximation, but also restricted the biological applicability of the results. By approximating the population process with a closely related Ornstein–Uhlenbeck process we effectively linearized the system about a fixed point given by the mean field behavior. Therefore our analysis does not address important nonlinear dynamical behaviors arising in many physical and biological systems, such as noise driven transport between multiple quasiequilibria, fluctuation induced spiking in excitable systems (including noise induced spiking in nerve cells), or limit cycle oscillations (including regular spiking in nerve cells). On the one hand, we anticipate that transitions in a state graph corresponding to directly observable state changes, such as between conducting and nonconducting ion-channel states, will remain “important” under more general measures accounting for global, nonlinear behaviors. On the other hand, it is certainly possible that additional transitions may also become important with respect to more general measures, if the linear measurement vectors employed here fail to capture their contribution to global dynamics.
5.2 Broader Applications
The stochastic shielding approximation can be directly applied to various biological networks, not just ion-channel models. For instance, Lu et al. [13] describe a signal transduction network in which the phosphorylation and transport events are arranged with a ladder topology. The two sides of the ladder denote molecules in the nucleus and in the cytoplasm, respectively. On each side, there are $M+1$ species having different levels of phosphorylation (see Fig. 1 of [13] for an illustration). This is a more elaborate Markov process than a simple ion-channel state model, but it can still be described with a binary measurement vector. The readout is 1 if the system is both in the nucleus and in a specific phosphorylated state, and 0 otherwise. The application of stochastic shielding to such a system is quite natural.
Another broad class of examples includes calcium-induced calcium release Markov models. Nguyen, Mathias and Smith [36] studied a stochastic automata network description of instantaneously coupled intracellular calcium channels which they derived from Markov models of single channel gating that include calcium activation, inactivation, or both. This high-dimensional system involves a large number of functional transitions; the transition probabilities of one channel depend on the local calcium concentration which is typically influenced in turn by the state of other channels in the population. Such models can easily become very high dimensional. For example, DeRemigio et al. [37] considered a discrete state continuous time Markov model of coupled calcium channels, taking explicit channel position into account, which yields up to 1.6 million distinct states. Similarly, in order to investigate the relationship between single-molecule stochastic events and whole-cell behavior, Skupin et al. [38] implemented a multiscale calcium signaling and spike generation model. Their model connects channel state transitions on a millisecond time scale with interspike interval fluctuations on the scale of tens of seconds, and involves a large number of chemical states. For systems of such complexity, any reduction of the complexity of the stochastic process by stochastic shielding will likely be advantageous, both for simulation and for analysis.
We have focused here on discrete state ion-channel models with binary measurement vectors. However, it is possible that some ion channels may have a richer than binary readout structure. For example, Catterall [39] provides structural evidence that activation of a bacterial sodium channel may possess multiple non-equivalent conducting states, raising the possibility that conductance could be graded rather than binary. As another example which could lead to graded measurement vectors, adaptive evolution can be represented as a random walk on a graph representing genomic variants connected by possible mutation routes [40, 41]. While the stochastic process representing the evolution of a human pathogen such as influenza may have an enormous number of degrees of freedom [42, 43], the dynamics of interest may comprise a smaller number of dimensions, such as a strain’s virulence or fitness, which may naturally be graded rather than discrete quantities.
for some $q>1$ (e.g., $q=2$ for the Gaussian unitary ensemble, and $q\approx 5/3$ for the Erdős–Rényi ensemble, empirically). In the case of a binary measurement vector, $M\in \{0,1\}$, this formula would revert to the result given in Theorem 2. A rigorous derivation of Eq. 53 is beyond the scope of the present paper.
The behavior of stochastic processes arising in first-order reaction networks has been explored in broad generality by Gadgil, Lee and Othmer [44]. They used a spectral approach to analyze a general system of first-order reaction networks, and studied the effect of changes in the network topology on the distribution of the number of reactant molecules, as well as the difference between conversion and catalytic networks with the same topology. Exploring sample space reductions conditioned on a linear measurement functional for such general classes of networks would be of interest.
5.3 Different Levels of Model Simplification
Model simplification is an important goal for Markov chain models in many scientific contexts, and complexity reduction has been pursued through a corresponding variety of approaches. Newman and others have extensively developed techniques based on community structure, aggregating or lumping nodes together based on topological considerations [45, 46]. When applied to a stochastic process on a graph, the aggregation of $N\gg n$ to n nodes is equivalent to a projection of the original process onto a subspace in which the process components on the aggregated fine-grained nodes are averaged. In most cases, the resulting coarsened process is no longer Markov, although in some cases exact dimension reduction to a lower-dimensional Markov process can be accomplished [47–49]. Other aggregation schemes, such as spectral coarse graining [50–52], have been proposed based on the spectral properties of the graph Laplacian. Approaches based on topological or abstract spectral properties do not necessarily take into account functional properties of the system to be simplified. Because stochastic shielding simplifies the representation of a stochastic process taking into account the function of the system, namely by distinguishing conducting versus nonconducting ion-channel states, it may provide insights not afforded by graph aggregation based on modularity or graph spectra.
As another example of simplification based on functional properties, Bruno, Yang and Pearson [53] used independent open-closed transitions to describe a canonical form that can express all possible reaction schemes for binary ion channels.
Not all prior approaches to simplification of random processes on graphs proceed by aggregating nodes. For instance, Ullah, Bruno and Pearson [54] proposed model simplification by the elimination of nodes with low equilibrium occupancy probability using time scale separation arguments. The reduced system has fewer parameters, and the dynamics of the reduced system are identical to those of the original system except on very fast time scales. Other simplifications based on graph sparsification have been proposed by Koutis, Levin and Peng [55].
In this paper we have investigated a novel form of simplification of stochastic processes on graphs. Stochastic shielding is based on replacing a high-dimensional stochastic process defined on a graph with a lower-dimensional process on the same graph, rather than replacing a complex network with a simpler one. Specifically, we consider mappings from the original process to an approximate process defined on a significantly smaller sample space. In one sense, we can think of the full and a reduced system as two systems with partially shared stochastic input, and partially independent stochastic input of different magnitudes (magnitude zero, in one case). Structurally, this situation is analogous to the kind of mixed common-noise and independent-noise scenario studied in the context of neuronal synchronization [56–58]. In another sense, stochastic shielding can be seen as a different kind of projection, vs. that induced by lumping or pruning nodes. The latter methods simplify the graph, whereas stochastic shielding leaves the graph unchanged and simplifies the sample space on which the approximate process lives.
Appendix A: Stochastic Shielding Construction of Schmandt and Galán
By convention we take ${N}_{ii}(t)\equiv 0$ and ${\alpha}_{ii}(t)\equiv 0$. The ${Y}_{ij}$ are independent unit rate Poisson processes driving the different state-to-state transitions. The transition from state i to state j occurs with per capita rate ${\alpha}_{ij}$. In a conductance-based model, such as a discrete stochastic version of the Hodgkin–Huxley equations, the vector $({N}_{1}(t),\dots ,{N}_{K}(t))$ would represent the number of ion channels in each of K distinct states, and the transition rates could vary with time, e.g. through dependence on membrane potential or second messenger concentration. Although Schmandt and Galán consider both the stationary and time-varying case, we restrict attention to the stationary case, which corresponds experimentally to a voltage clamped preparation.
The multinomial distribution produces an integer-valued random vector with mean and marginal distributions the same as that given by the binomial distribution; the only difference is that transitions emanating from a common node are not assumed to be independent.
Schmandt and Galán obtain similar expressions that agree up to order $O(h)$; the difference between the binomial and multinomial expressions only appears in the $O({h}^{2})$ terms. For example, they assert that $E[\delta {\mathrm{\Delta}}_{1}(t)\delta {\mathrm{\Delta}}_{3}(t)|\overrightarrow{N}(t)]\equiv 0$, while under the multinomial model this covariance is equal to $-{N}_{2}(t){\alpha}_{21}{\alpha}_{23}{h}^{2}$. Fortunately, this difference does not undermine the main argument.
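The multinomial covariance quoted above can be checked by direct enumeration. The sketch below assumes that each of the ${N}_{2}$ individuals in state 2 independently moves to state 1 with probability ${\alpha}_{21}h$, moves to state 3 with probability ${\alpha}_{23}h$, or stays put; the numerical values and function name are hypothetical and chosen only for illustration.

```python
from math import comb


def multinomial_covariance(n, p_a, p_b):
    """Exact Cov(A, B) when n individuals each leave via route a (prob. p_a),
    route b (prob. p_b), or stay put, computed by direct enumeration."""
    p_stay = 1.0 - p_a - p_b
    mean_a = mean_b = mean_ab = 0.0
    for a in range(n + 1):
        for b in range(n + 1 - a):
            prob = (comb(n, a) * comb(n - a, b)
                    * p_a ** a * p_b ** b * p_stay ** (n - a - b))
            mean_a += a * prob
            mean_b += b * prob
            mean_ab += a * b * prob
    return mean_ab - mean_a * mean_b


# Hypothetical values: N_2 = 20 channels, alpha_21 = 2.0, alpha_23 = 0.5, h = 0.01.
n2, alpha21, alpha23, h = 20, 2.0, 0.5, 0.01
enumerated = multinomial_covariance(n2, alpha21 * h, alpha23 * h)
closed_form = -n2 * alpha21 * alpha23 * h ** 2
```

The enumerated covariance agrees with $-{N}_{2}{\alpha}_{21}{\alpha}_{23}{h}^{2}$ to rounding error, and it is of order ${h}^{2}$, consistent with the statement that the two models differ only beyond order $O(h)$.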
From this point, Schmandt and Galán obtain an expression for the stationary covariance matrix of the reduced process (compare Eq. (8) in [14] with our Lemma 1) and decompose the covariance into a sum over direct and indirect connections to a single conducting or observable state. This situation corresponds, in our analysis, to the case where the measurement vector M contains a single non-zero entry. Schmandt and Galán argue that suppressing the fluctuations associated with transitions not directly affecting the observable state decreases their contribution to the variance of the observable state occupancy, while increasing the contribution of the direct transitions to the same variance. In addition, they show through numerical comparisons that Hodgkin–Huxley equations with a full Markov process and the reduced process are practically indistinguishable both under voltage clamp (stationary transition rates) and current clamp (time-varying transition rates) conditions.
Appendix B: Derivation of Tau-Leaping for an Arbitrary Finite Graph
B.1 Tau-Leaping: General Case
Each ${Y}_{k}$ is an independent unit rate Poisson process counting the occurrence of reaction k (transition from state $i(k)$ to $j(k)$); ${\alpha}_{k}$ is the per capita transition rate of reaction k; ${N}_{i(k)}(s)$ is the number of individuals at state $i(k)$ at time s, and ${\zeta}_{k}$ is the stoichiometry vector for reaction k. For simplicity, we will suppress “k” in our notation so that state i means state $i(k)$.
is the graph Laplacian which can be represented as the sum over all undirected edges (denoted by the set ${\mathcal{E}}^{\ast}$) given in Eq. 75.
since ${N}_{i}(s)=\overline{{N}_{i}}(s)+{X}_{i}(s)$ and ${\alpha}_{k}$ and $\overline{{N}_{i}}$ are constants.
by dropping the dependence of the variance on X.
where Q is the generator matrix. Note that we changed notation slightly to illustrate that ${\alpha}_{ij}$ is the transition rate from state i to j rather than indexing by reaction k. The graph Laplacian we consider in Eq. 4 is actually $L={Q}^{\intercal}$ so we have $dX=L(\overline{N}+X)dt$. Since $\overline{N}$ is proportional to the stationary distribution π, we have that $L\overline{N}=0$, and hence the first term in the SDE is $dX=LXdt$.
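The identity $L\overline{N}=0$ can be verified numerically for a small example. The sketch below builds a generator Q from per capita rates, transposes it to obtain L, and checks that L annihilates a vector proportional to the stationary distribution. The three-state chain, its rates, the population size, and the helper names are all hypothetical. The stationary distribution is obtained from detailed balance, which holds for a linear chain but not for general graphs.

```python
def generator_matrix(rates, n):
    """Q[i][j] = alpha_ij for i != j, with each diagonal entry chosen so rows sum to zero."""
    q = [[0.0] * n for _ in range(n)]
    for (i, j), alpha in rates.items():
        q[i][j] += alpha
        q[i][i] -= alpha
    return q


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matvec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def chain_stationary_distribution(rates, n):
    """Stationary distribution of a linear chain 0 <-> 1 <-> ... <-> n-1 via detailed balance."""
    weights = [1.0]
    for i in range(n - 1):
        weights.append(weights[-1] * rates[(i, i + 1)] / rates[(i + 1, i)])
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical rates for a three-state chain and a population of 350 individuals.
rates = {(0, 1): 1.0, (1, 0): 2.0, (1, 2): 0.5, (2, 1): 1.5}
laplacian = transpose(generator_matrix(rates, 3))   # L = Q^T
pi = chain_stationary_distribution(rates, 3)
mean_counts = [350.0 * p for p in pi]               # N-bar, proportional to pi
residual = matvec(laplacian, mean_counts)           # L N-bar, expected to vanish
```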
where ${\sigma}_{k}=\sqrt{{\overline{N}}_{i(k)}{\alpha}_{k}}$ in the definition of matrix B.
Therefore, putting the first and second terms together, we have derived the OU process $dX=LXdt+BdW$ given in Eq. 4.
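For illustration, the resulting process $dX=LXdt+BdW$ can be integrated with a simple Euler–Maruyama scheme. The sketch below assumes that column k of B equals ${\sigma}_{k}{\zeta}_{k}$, with ${\zeta}_{k}$ equal to $-1$ at the source state and $+1$ at the target state of reaction k. The example chain, its rates, the step size, and the function names are hypothetical, and the scheme is a generic discretization rather than the simulation method of the original study.

```python
import math
import random


def build_drift_and_noise(reactions, mean_counts):
    """reactions: list of (source, target, alpha). Returns (L, B).

    L is the transposed generator; column k of B is sigma_k * zeta_k with
    sigma_k = sqrt(mean_counts[source] * alpha) and zeta_k the stoichiometry
    vector (-1 at the source, +1 at the target).
    """
    n = len(mean_counts)
    lap = [[0.0] * n for _ in range(n)]
    noise = [[0.0] * len(reactions) for _ in range(n)]
    for k, (src, dst, alpha) in enumerate(reactions):
        lap[dst][src] += alpha
        lap[src][src] -= alpha
        sigma = math.sqrt(mean_counts[src] * alpha)
        noise[src][k] = -sigma
        noise[dst][k] = sigma
    return lap, noise


def euler_maruyama_step(x, lap, noise, dt, rng):
    """One Euler-Maruyama step of dX = L X dt + B dW."""
    dw = [math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in noise[0]]
    return [
        x[i]
        + dt * sum(lap[i][j] * x[j] for j in range(len(x)))
        + sum(noise[i][k] * dw[k] for k in range(len(dw)))
        for i in range(len(x))
    ]


# Hypothetical three-state chain with stationary mean occupancies (210, 105, 35).
reactions = [(0, 1, 1.0), (1, 0, 2.0), (1, 2, 0.5), (2, 1, 1.5)]
lap, noise = build_drift_and_noise(reactions, [210.0, 105.0, 35.0])
rng = random.Random(1)
x = [0.0, 0.0, 0.0]
for _ in range(1000):
    x = euler_maruyama_step(x, lap, noise, 0.001, rng)
```

Because every column of both L and B sums to zero, the components of X sum to zero at every step, reflecting conservation of the total population.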
B.2 Tau-Leaping: 3-State Example
following the notation given in Sect. 2.3, specifically the labeling of reactions given in Table 1. Note that ${\alpha}_{k}$ could be time dependent ${\alpha}_{k}(t)$.
where the matrix above is the generator Q, or our ${L}^{\intercal}$. In the case where Q is fixed, $\overline{X}$ is proportional to the null left eigenvector of Q; biologically, this is the voltage clamp case. Let $({\overline{X}}_{1},\dots ,{\overline{X}}_{n})$ be the corresponding stationary vector. Now we linearize Eqs. 96–98 around the stationary vector.
Neglecting the $O(\frac{|V|}{{N}_{\mathrm{tot}}})$ terms gives us the multidimensional Ornstein–Uhlenbeck process of Eq. 4 for the 3-state example.
Appendix C: Proofs and Calculations
C.1 Stationary Covariance of a Multidimensional OU Process
We note that the (left) eigenvector corresponding to the leading (0) eigenvalue of L has constant components, therefore it lies in the kernel of the matrix ${B}_{k}{B}_{k}^{\intercal}$ for each k, which guarantees finite covariance in Eq. 108.
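The kernel property can be checked directly for a single transition. The sketch assumes that ${B}_{k}{B}_{k}^{\intercal}={\sigma}_{k}^{2}{\zeta}_{k}{\zeta}_{k}^{\intercal}$, where ${\zeta}_{k}$ has entry $-1$ at the source state and $+1$ at the target state; the state indices, the value of ${\sigma}_{k}$, and the function names are hypothetical.

```python
def single_edge_noise_covariance(n, source, target, sigma):
    """Return B_k B_k^T = sigma^2 * zeta_k zeta_k^T for one transition,
    where zeta_k is -1 at the source state and +1 at the target state."""
    zeta = [0.0] * n
    zeta[source] = -1.0
    zeta[target] = 1.0
    return [[sigma ** 2 * zi * zj for zj in zeta] for zi in zeta]


def left_multiply_by_ones(matrix):
    """Compute 1^T M, i.e. the vector of column sums."""
    return [sum(column) for column in zip(*matrix)]


# Hypothetical transition 1 -> 2 in a three-state chain with sigma_k = 3.
bbt = single_edge_noise_covariance(3, 1, 2, 3.0)
column_sums = left_multiply_by_ones(bbt)
```

Every column sum vanishes because the entries of ${\zeta}_{k}$ sum to zero, so the constant vector receives no noise input from any transition.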