 Research
 Open Access
Estimating Fisher discriminant error in a linear integrator model of neural population activity
The Journal of Mathematical Neuroscience volume 11, Article number: 6 (2021)
Abstract
Decoding approaches provide a useful means of estimating the information contained in neuronal circuits. In this work, we analyze the expected classification error of a decoder based on Fisher linear discriminant analysis. We provide expressions that relate decoding error to the specific parameters of a population model that performs linear integration of sensory input. Results show conditions that lead to beneficial and detrimental effects of noise correlation on decoding. Further, the proposed framework sheds light on the contribution of neuronal noise, highlighting cases where, counterintuitively, increased noise may lead to improved decoding performance. Finally, we examine the impact of dynamical parameters, including neuronal leak and integration time constant, on decoding. Overall, this work presents a fruitful approach to the study of decoding using a comprehensive theoretical framework that merges dynamical parameters with estimates of readout error.
Introduction
In recent years, neuronal decoding has emerged as a key aspect of understanding the neural code [1]. The aim of decoding algorithms is to read out the sensory-driven responses of a neuronal population and classify them following a given criterion. Popular criteria include Fisher information [2, 3], mutual information [4], and machine learning approaches [5, 6]. While many types of decoders exist [7], a linear readout of neural activity has often been employed to perform sensory classification [8, 9] and predict motor decisions [10, 11]. Further, different classes of linear readouts are amenable to mathematical analysis and capture biological learning rules such as Hebbian learning [12].
In this work, we formally analyze the optimal decoding error of a linear decoder based on Fisher linear discriminant analysis (LDA). Assuming discrete classes of stimuli, LDA provides an upper bound on linear decoding capacity [13]. In addition, LDA shows good agreement with decision-making tasks and offers a bridge between cortical activity and behavioral performance [14, 15].
Importantly, most theoretical approaches based on neural decoding are not concerned with how linear decoders would be influenced by specific dynamical parameters of modeled neural systems [16]. Here, we address this concern by providing expressions that relate decoding error to the adjustable parameters of a rate-based population model with a linear neural integrator [17, 18]. This model captures the average spiking activity of neuronal populations [19–21] and the quasi-linear responses of neurons found in many experimental contexts [22]. Preliminary results have been presented in previous work [13, 14], yet the full analytical solution had remained incomplete and limited to positive noise correlation; we now present the complete solution.
The framework relies on the simplifying assumption that signal and noise correlations originate from independent sources. While this assumption does not hold in biological circuits, where signal and noise are related [1], it allows us to systematically explore a wide range of scenarios that describe the impact of neuronal inputs, noise, correlations, and dynamical parameters on linear decoding, where the contribution of each parameter can be examined independently.
This paper begins by describing the neural integrator model and the LDA readout. Then, we provide expressions for LDA error that rely on the parameters of the integrator model. Finally, we consider the effect of correlation, noise, and dynamical parameters on neuronal decoding using both analytical expressions and numerical simulations.
Linear population model
As a starting point, we assume two independent neuronal populations, each projecting in a feedforward manner to a readout discriminating between two inputs, \(\nu _{1}\) and \(\nu _{2}\), that are constant over time (Fig. 1(A)). Each population’s mean firing rate in response to stimuli is conceptualized by a tuning curve where a stimulus feature, for instance visual orientation, generates a graded response. This scenario is analogous to analyses that examine population responses after performing a dimensionality reduction to generate a “population tuning curve” [23]. While a more complex model could account for a heterogeneity of responses within each population, we choose to limit our model to two homogeneous populations in order for the classification problem to remain tractable.
The activity of each population is described by a linear neural integrator
\[ \tau _{x} \frac{dx_{i}}{dt} = \nu _{ix} - \alpha _{x} x_{i} + \beta _{x} \xi ( t ), \qquad \tau _{y} \frac{dy_{i}}{dt} = \nu _{iy} - \alpha _{y} y_{i} + \beta _{y} \xi ( t ), \quad (1) \]
where \(x_{i}\) and \(y_{i}\) are the firing rates of each population in response to a given stimulus i, τ is a time constant, α is a leak term, \(\xi (t)\) is Gaussian white noise (\(\mathcal{N} (0,1)\)), and β is the gain of the noise. Network parameters τ, α, and β are bound to \(\mathbb{R}_{>0}\). We make no distinction between noise induced by stimuli and noise generated through intrinsic neural activity. While their effect on mean rate activity is similar [24], their impact on noise correlations differs [1]; in the model, we explicitly separate the effect of firing rate and noise correlation. This will be done by controlling noise correlation through a tunable parameter, as detailed in Sect. 4. An advantage of this formalism is that the effect of noise correlation can be systematically isolated from changes in firing rates and signal correlation that would be induced through reciprocal connections between the two populations. Further, depending on the choice of parameters, the addition of recurrent weights to Eq. (1) may prevent the system from reaching a stationary state, which is a fundamental assumption of LDA.
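As a concrete illustration (a minimal sketch, not code from the paper), the integrator can be simulated with the Euler–Maruyama scheme, assuming Eq. (1) takes the form \(\tau \,dx/dt= \nu - \alpha x+ \beta \xi ( t )\), consistent with the substitutions \(\theta = \alpha / \tau \) and \(\lambda = \beta / \tau \) used later in the derivation; all function and parameter names are illustrative:

```python
import numpy as np

def simulate_integrator(nu, tau=1.0, alpha=1.0, beta=1.0,
                        dt=1e-3, T=50.0, x0=0.0, seed=0):
    """Euler-Maruyama simulation of tau * dx/dt = nu - alpha*x + beta*xi(t)."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x = np.empty(n_steps)
    x[0] = x0
    for k in range(1, n_steps):
        drift = (nu - alpha * x[k - 1]) / tau
        diffusion = (beta / tau) * np.sqrt(dt) * rng.standard_normal()
        x[k] = x[k - 1] + drift * dt + diffusion
    return x

# the rate settles around the fixed point nu/alpha and fluctuates there
x = simulate_integrator(nu=11.0)
```

Because the drift pulls the rate back toward \(\nu / \alpha \), the trajectory relaxes to a noisy stationary state rather than integrating without bound.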
Fisher linear discriminant decoder
A linear decoder based on LDA reads out the activity of the population model in order to perform a binary discrimination (Fig. 1(B)). Discrimination error generated by LDA provides an estimate of the statistical confidence in distinguishing pairs of stimuli based on network activity. We focus on pairwise discrimination given that error rates obtained from more than two stimuli are well approximated by values obtained from all combinations of pairwise comparisons [25].
LDA assumes that neural activity is sampled from a multivariate Gaussian distribution with class covariance matrix \(\Sigma _{i}\) and class mean vector \(\boldsymbol{\mu }_{i}\). Further, LDA assumes equal class covariance, therefore \(\Sigma _{1} = \Sigma _{2} = \Sigma \). LDA attempts to find a projection line w, perpendicular to the decision boundary, onto which the input space is projected. The optimal projection line maximizes the Fisher criterion J(w), defined as the ratio of the projected between-class to within-class variance:
\[ J ( \mathbf{w} ) = \frac{\mathbf{w}^{T} \Sigma _{B} \mathbf{w}}{\mathbf{w}^{T} \Sigma _{W} \mathbf{w}}, \]
where \(\Sigma _{B} = ( \boldsymbol{\mu }_{1} - \boldsymbol{\mu }_{2} ) ( \boldsymbol{\mu }_{1} - \boldsymbol{\mu }_{2} )^{T}\) is the between-class scatter and \(\Sigma _{W}\) is the within-class scatter.
Given the assumption of equal class covariance, we set \(\Sigma _{W} =2 \Sigma \). By taking the derivative of J(w) with respect to w and setting it to zero, one finds the closed-form solution for the optimal projection line to be
\[ \mathbf{w} = \Sigma _{W}^{-1} ( \boldsymbol{\mu }_{1} - \boldsymbol{\mu }_{2} ). \quad (2) \]
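A short numerical check of this closed-form solution (with hypothetical class statistics; the names and values below are illustrative): the direction \(\Sigma _{W}^{-1} ( \boldsymbol{\mu }_{1} - \boldsymbol{\mu }_{2} )\) should score at least as high on the Fisher criterion as any other direction.

```python
import numpy as np

def fisher_criterion(w, mu1, mu2, Sigma_W):
    """J(w): ratio of projected between-class to within-class variance."""
    dmu = mu1 - mu2
    return float((w @ dmu) ** 2 / (w @ Sigma_W @ w))

# hypothetical class means and (equal) class covariance
mu1, mu2 = np.array([11.0, 11.0]), np.array([14.0, 14.0])
Sigma = np.array([[0.5, 0.2], [0.2, 0.5]])
Sigma_W = 2.0 * Sigma

# closed-form optimal projection: w = Sigma_W^{-1} (mu1 - mu2)
w_star = np.linalg.solve(Sigma_W, mu1 - mu2)
J_star = fisher_criterion(w_star, mu1, mu2, Sigma_W)
```

Comparing `J_star` against randomly drawn directions confirms that no direction exceeds the closed-form solution, as the generalized Rayleigh quotient guarantees.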
Formulating a model-based linear decoder
To analytically derive means (\(\boldsymbol{\mu }_{1}\) and \(\boldsymbol{\mu }_{2}\)) and covariance (Σ) from the neural population model, we rearrange Eq. (1) as follows, using population x as example:
\[ \frac{dx}{dt} = \frac{\nu _{ix} - \alpha _{x} x}{\tau _{x}} + \frac{\beta _{x}}{\tau _{x}} \xi ( t ). \quad (3) \]
Given that a white noise process is by definition the time derivative of a Wiener process, \(\xi ( t ) =d W_{t} /dt\), we can rewrite Eq. (3) as
\[ d x_{t} = \theta _{x} ( \mu _{ix} - x_{t} ) \,dt+ \lambda _{x} \,d W_{t}, \quad (4) \]
with \(\theta _{x} = \alpha _{x} / \tau _{x}\), \(\mu _{ix} = {\nu _{ix}}/{\alpha _{x}}\), and \(\lambda _{x} = \beta _{x} / \tau _{x}\). Equation (4) is an Ornstein–Uhlenbeck process with known solution
\[ x_{t} = \mu _{ix} + ( x_{0} - \mu _{ix} ) e^{- \theta _{x} t} + \lambda _{x} \int _{0}^{t} e^{- \theta _{x} ( t-s )} \,d W_{s}. \quad (5) \]
Equation (5) is a mean-reverting process whose stable state follows a Gaussian distribution. A full derivation of this process is found in Sections A.1–A.2. To summarize this derivation, the expected mean and variance are
\[ \mathbb{E} [ x_{t} ] = \mu _{ix} + ( x_{0} - \mu _{ix} ) e^{- \theta _{x} t}, \qquad \operatorname{Var} [ x_{t} ] = \frac{\lambda _{x}^{2}}{2 \theta _{x}} \bigl ( 1- e^{-2 \theta _{x} t} \bigr ). \]
The stationary mean and variance of Eq. (5) are
\[ \mathbb{E} [ x ] = \mu _{ix} = \frac{\nu _{ix}}{\alpha _{x}}, \qquad \sigma _{x}^{2} = \frac{\lambda _{x}^{2}}{2 \theta _{x}} = \frac{\beta _{x}^{2}}{2 \alpha _{x} \tau _{x}}. \quad (8) \]
With the assumption that the mean of x is much larger than its standard deviation, there is negligible probability that x would fall below zero. Imposing strictly positive values of x could be achieved by the addition of a constant and would not alter the results obtained from the linear classifier.
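These stationary moments can be verified by simulation. The script below assumes the standard Ornstein–Uhlenbeck results, mean \(\nu / \alpha \) and variance \(\beta ^{2} / ( 2 \alpha \tau )\), which follow from \(\theta = \alpha / \tau \) and \(\lambda = \beta / \tau \); parameter values are arbitrary:

```python
import numpy as np

tau, alpha, beta, nu = 1.0, 2.0, 1.5, 10.0
mean_exact = nu / alpha                      # stationary mean: nu/alpha
var_exact = beta**2 / (2 * alpha * tau)      # stationary variance: lambda^2/(2*theta)

# integrate many independent paths with Euler-Maruyama, starting at the fixed point
rng = np.random.default_rng(1)
dt, n_steps, n_paths = 1e-3, 20_000, 2_000
x = np.full(n_paths, mean_exact)
for _ in range(n_steps):
    x += dt * (nu - alpha * x) / tau \
         + (beta / tau) * np.sqrt(dt) * rng.standard_normal(n_paths)

# x.mean() and x.var() should now be close to mean_exact and var_exact
```

Since the paths are integrated well past the relaxation time \(\tau / \alpha \), the empirical moments match the analytical stationary values up to sampling error.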
The readout of neural activity depends on the following feature space:
\[ Z \sim \mathcal{N} ( \boldsymbol{\mu }_{i}, \Sigma ), \qquad \boldsymbol{\mu }_{i} = \begin{bmatrix} \mu _{ix} \\ \mu _{iy} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma _{x}^{2} & \rho \sigma _{x} \sigma _{y} \\ \rho \sigma _{x} \sigma _{y} & \sigma _{y}^{2} \end{bmatrix}, \]
where Z is obtained from the probability distribution of a multivariate Gaussian with mean \(\boldsymbol{\mu }_{i}\) and covariance Σ. Setting the parameter \(\rho =0\) would be equivalent to a so-called “diagonal decoder” where off-diagonal elements of the covariance matrix are neglected, thus ignoring noise correlations altogether [16].
The closed-form solution of LDA (Eq. (2)) can be expressed using the parameters of the population model (Eq. (1)) as follows. First, the total within-class scatter \(S_{w}\) is
\[ S_{w} =2 \Sigma =2 \begin{bmatrix} \sigma _{x}^{2} & \rho \sigma _{x} \sigma _{y} \\ \rho \sigma _{x} \sigma _{y} & \sigma _{y}^{2} \end{bmatrix}. \]
To alleviate the notation, we define \(\Delta \boldsymbol{\mu }^{T} = [ \Delta \mu _{x}, \Delta \mu _{y} ]^{T}= \boldsymbol{\mu }_{1} - \boldsymbol{\mu }_{2}\), where \(\Delta \mu _{u} = \Delta \nu _{u} / \alpha _{u}\), and \(\Delta \nu _{u}\) is the absolute difference between the two stimuli, given an index u that stands for either population x or y. In this way, Eq. (2) becomes
\[ \mathbf{w} = S_{w}^{-1} \Delta \boldsymbol{\mu } = \frac{1}{2 ( 1- \rho ^{2} )} \begin{bmatrix} \Delta \mu _{x} / \sigma _{x}^{2} - \rho \Delta \mu _{y} / ( \sigma _{x} \sigma _{y} ) \\ \Delta \mu _{y} / \sigma _{y}^{2} - \rho \Delta \mu _{x} / ( \sigma _{x} \sigma _{y} ) \end{bmatrix}. \]
From the law of total probability, the error rate of classification is given by
\[ P [ \text{error} ] = \sum _{i \neq j} P [ y=i \vert k=j ] P [ k=j ], \quad (6) \]
where \(P [ k=j ]\) is the probability that a randomly sampled point from any distribution belongs to class j and \(P [ y=i \vert k=j ]\) is the probability that a point is classified as belonging to class i when it belongs to class j. Given that the classifier is unbiased towards either of the two stimulus classes, \(P [ k=1 ] =P [ k=2 ] =0.5\). To calculate conditional probabilities \(P [ y=i \vert k=j ]\), one must define a threshold c that serves as a boundary between the two distributions. The value of c is chosen to be the midpoint between the means of the projected distributions.
We calculate \(P [ y=i \vert k=j ]\) as the area under the curve of the density function for j in the region where i is the correct class. As a first step, we shift the projected distributions by a factor c, so that the threshold becomes zero to simplify the integration. More specifically, the unshifted threshold c, the means of the shifted distributions \(\eta _{i}\), and their variance \(\zeta ^{2}\) are
\[ c= \frac{\mathbf{w}^{T} ( \boldsymbol{\mu }_{1} + \boldsymbol{\mu }_{2} )}{2} =-b, \qquad \eta _{i} = \mathbf{w}^{T} \boldsymbol{\mu }_{i} -c, \qquad \zeta ^{2} = \mathbf{w}^{T} \Sigma \mathbf{w}, \]
with bias term b. The error rate from Eq. (6) then becomes
\[ P [ \text{error} ] = \frac{1}{2} \Phi \left ( - \frac{\eta _{1}}{\zeta } \right ) + \frac{1}{2} \left [ 1- \Phi \left ( - \frac{\eta _{2}}{\zeta } \right ) \right ], \quad (7) \]
where Φ is the standard normal cumulative distribution function and class 1 is taken as the class with the larger projected mean.
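In code, the law of total probability with a midpoint threshold reads as follows (the projected means and variance are hypothetical values chosen purely for illustration):

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# hypothetical projected class means (class 1 above, class 2 below)
# and common standard deviation zeta
m1, m2, zeta = 4.0, 0.0, 1.0
c = 0.5 * (m1 + m2)                       # midpoint threshold

# law of total probability with P[k=1] = P[k=2] = 0.5:
# class-1 mass below c plus class-2 mass above c
err = 0.5 * Phi((c - m1) / zeta) + 0.5 * (1.0 - Phi((c - m2) / zeta))
```

With equal class variance, both conditional error terms are equal, so the total error reduces to a single Gaussian tail probability.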
Details of the full integration of error can be found in Section A.3. The final expression is
\[ P [ \text{error} ] = \Phi \left ( - \frac{\mathbf{w}^{T} \Delta \boldsymbol{\mu }}{2 \sqrt{\mathbf{w}^{T} \Sigma \mathbf{w}}} \right ), \]
where Φ denotes the standard normal cumulative distribution function. This expression is further simplified by introducing the squared Mahalanobis distance \(d^{2}\)
\[ P [ \text{error} ] = \Phi \left ( - \frac{d}{2} \right ), \]
where
\[ d^{2} = \Delta \boldsymbol{\mu }^{T} \Sigma ^{-1} \Delta \boldsymbol{\mu }. \]
Because of equal class covariance, the above expression has the property that
\[ d^{2} =2 \mathbf{w}^{T} \Delta \boldsymbol{\mu } =4 \mathbf{w}^{T} \Sigma \mathbf{w}. \]
Using Eq. (8), we rewrite \(d^{2}\) from the network parameters:
\[ d^{2} = \frac{1}{1- \rho ^{2}} \left [ \frac{\Delta \mu _{x}^{2}}{\sigma _{x}^{2}} - \frac{2 \rho \Delta \mu _{x} \Delta \mu _{y}}{\sigma _{x} \sigma _{y}} + \frac{\Delta \mu _{y}^{2}}{\sigma _{y}^{2}} \right ], \]
with \(\Delta \mu _{u} = \Delta \nu _{u} / \alpha _{u}\) and \(\sigma _{u}^{2} = \beta _{u}^{2} / ( 2 \alpha _{u} \tau _{u} )\).
As the ratio \(\Delta \mu _{u} / \sqrt{\sigma _{u}^{2}}\) appears often in the above solution, we simplify our notation by introducing
\[ r_{u} = \frac{\Delta \mu _{u}}{\sigma _{u}}. \]
This expression simplifies the Mahalanobis distance to
\[ d^{2} = \frac{r_{x}^{2} -2 \rho r_{x} r_{y} + r_{y}^{2}}{1- \rho ^{2}}. \quad (9) \]
The full derivation of expected error using Mahalanobis distance is found in Sections A.3–A.4. The above analysis provides a relationship between classification error and the network parameters of the population model. In the sections to follow, we explore the various links between these quantities.
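To make this relationship concrete, the full chain from network parameters to expected error can be collected into a few lines. This is a sketch under the assumptions used in this derivation: stationary variance \(\sigma _{u}^{2} = \beta _{u}^{2} / ( 2 \alpha _{u} \tau _{u} )\), the Mahalanobis form \(d^{2} = ( r_{x}^{2} -2 \rho r_{x} r_{y} + r_{y}^{2} ) / ( 1- \rho ^{2} )\), and the equal-covariance error \(\Phi ( -d/2 )\); for simplicity a single α, τ, β is shared by both populations, and signed Δν values are allowed so that opposite tuning can be encoded.

```python
from math import erf, sqrt

def decoding_error(dnu_x, dnu_y, rho, alpha=1.0, tau=1.0, beta=1.0):
    """Analytic LDA error from the model parameters (shared alpha, tau, beta)."""
    sigma = beta / sqrt(2.0 * alpha * tau)        # stationary standard deviation
    rx = (dnu_x / alpha) / sigma                  # r_u = dmu_u / sigma_u
    ry = (dnu_y / alpha) / sigma
    d2 = (rx**2 - 2.0 * rho * rx * ry + ry**2) / (1.0 - rho**2)
    return 0.5 * (1.0 - erf(sqrt(d2) / (2.0 * sqrt(2.0))))   # Phi(-d/2)
```

With identical tuning (equal Δν) the error grows with correlation, while opposite tuning (Δν of opposite sign) makes it shrink, matching the cases analyzed in the next section.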
Noise correlation
Neurons that are in close physical proximity exhibit correlations in their activity. An extensive body of work has examined the impact of these noise correlations on behavioral tasks [26] and the activity of brain circuits [27–35]. Noise correlations may be advantageous or detrimental to cognitive and sensory processing; however, the specific networklevel properties that give rise to these effects have not been fully elucidated.
In the proposed model, the effect of noise correlation on classification error is highly dependent upon the sensory inputs (\(\nu _{1}\) and \(\nu _{2}\)). We distinguish four main cases that lead to qualitatively different conclusions on the impact of noise correlations. Details of these analyses are provided in Sections A.5–A.6.
A first case arises when the tuning curves of populations x and y are identical in terms of both their orientation preference and their gain (Fig. 2(A)). In this case, \(r_{x} \rightarrow r_{y}\), leading to monotonically increasing error as a function of correlation. Intuitively, this happens because correlation forces the firing rate distributions to “stretch” towards each other. We verified the analytical solution by comparing it to numerical estimates of the error rate as a function of noise correlation. These numerical estimates were obtained with Eq. (1), where populations x and y both received inputs \(\nu _{1} =11\) and \(\nu _{2} =14\) in order for the model to mimic a scenario where the two populations have identical tuning properties. The goal here is not to capture the model’s response to a continuum of stimulus values along the tuning curves, but rather to illustrate the behavior of the model using discrete stimuli. We set \(\tau =1\), \(\beta =1\), and \(\alpha =1\) for both populations. We then numerically generated 5000 points per stimulus class. A subset of 80% of the total number of data points were randomly selected to train the LDA classifier. The proportion of misclassified points was calculated based on the remaining data points. We found good agreement between the numerical estimates and analytical solution (Fig. 2(A)).
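The numerical procedure just described can be reproduced with a short, numpy-only implementation (no external classifier); the sketch below follows the same recipe, 5000 points per class with an 80/20 train/test split, under the assumption that the stationary activity is Gaussian with per-population variance \(\beta ^{2} / ( 2 \alpha \tau ) =0.5\):

```python
import numpy as np

def empirical_lda_error(mu1, mu2, Sigma, n=5000, train_frac=0.8, seed=0):
    """Sample n points per class, fit Fisher LDA on 80%, report test error."""
    rng = np.random.default_rng(seed)
    X1 = rng.multivariate_normal(mu1, Sigma, n)
    X2 = rng.multivariate_normal(mu2, Sigma, n)
    k = int(train_frac * n)
    # fit on the training split: w = S_w^{-1} (m1 - m2), midpoint threshold
    m1, m2 = X1[:k].mean(axis=0), X2[:k].mean(axis=0)
    Sw = np.cov(X1[:k], rowvar=False) + np.cov(X2[:k], rowvar=False)
    w = np.linalg.solve(Sw, m1 - m2)
    c = 0.5 * w @ (m1 + m2)
    # test error: class 1 should project above c, class 2 below
    err1 = np.mean(X1[k:] @ w < c)
    err2 = np.mean(X2[k:] @ w > c)
    return 0.5 * (err1 + err2)

def cov(rho, s2=0.5):
    """Stationary covariance with noise correlation rho (variance 0.5 each)."""
    return np.array([[s2, rho * s2], [rho * s2, s2]])

# identical tuning, Fig. 2(A)-style: inputs 11 and 14 to both populations
mu1, mu2 = np.array([11.0, 11.0]), np.array([14.0, 14.0])
e0 = empirical_lda_error(mu1, mu2, cov(0.0))
e8 = empirical_lda_error(mu1, mu2, cov(0.8))
```

With identical tuning, the test error at a correlation of 0.8 exceeds the error at zero correlation, reproducing the monotonic trend of this first case.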
Note that the range of error may be increased by moving the firing rate distributions closer to each other without altering the overall shape of the function relating error and noise correlation. While the goal here was to show the distribution of readout error across a broad range of correlation values, we acknowledge that not all combinations of tuning curves and noise correlations are physiologically plausible. In fact, while noise correlations in cortex vary across experimental conditions, regions, and behavioral states, they are typically reported to be on the order of 0.1–0.3 for nearby cells [26]. Therefore, extreme values (both positive and negative) are unlikely to arise in living circuits.
In a second scenario, the two populations are offset in terms of their orientation preference (Fig. 2(B)). We examined classification error in this scenario by setting the input of population x to \(\nu _{1} =11\) and \(\nu _{2} =14\), while population y was set to \(\nu _{1} =14\) and \(\nu _{2} =11\). Analytically, this scenario leads to \(r_{x} \rightarrow  r_{y}\), resulting in a monotonically decreasing error as correlation increases from −1 to 1. Intuitively, this scenario arises because correlation stretches the distributions of responses along parallel lines, decreasing the overlap between them.
A third case arises when the tuning curve of one of the two populations yields the same response for two stimuli (Fig. 2(C)). This happens if the tuning curve of population x exhibits a broad region where firing rate remains constant despite changes in stimulus orientation. Analytically, this would lead to \(r_{x} =0\). We illustrate this scenario by setting \(\nu _{1} =11\) and \(\nu _{2} =11\) for population x, and \(\nu _{1} =11\) and \(\nu _{2} =14\) for population y. This case yields a “symmetrical” effect of correlation on readout error, where maximum error is found at \(\rho _{*} =0\) and error tends towards zero as ρ approaches either 1 or −1.
Finally, a fourth scenario occurs when the two populations have tuning curves that are aligned in terms of orientation preference, but where one population has higher response gain (Fig. 2(D)). This case is defined by \(\vert r_{x} \vert \neq \vert r_{y} \vert \). Error tends to zero as noise correlation (ρ) goes to either −1 or 1. The correlation associated with maximum error is found somewhere in between these extremes and is given by
\[ \rho _{*} = \frac{r_{x}}{r_{y}} \quad \text{for } \vert r_{x} \vert < \vert r_{y} \vert . \]
To illustrate this scenario, we set \(\nu _{1} =11\) and \(\nu _{2} =13\) for population x, and \(\nu _{1} =11\) and \(\nu _{2} =14\) for population y. Graphically, this scenario arises when noise correlation “stretches” the distribution of responses along parallel lines and their centroids do not align on either dimension. Starting from a correlation of zero, as correlation increases, the distributions will stretch towards each other, thus increasing overlap and error. After a maximum overlap defined by \(\rho _{*}\), further stretching of the distributions will force them to spread too thinly for them to overlap, until the extreme case of a correlation of one, where both distributions would appear as perfectly parallel lines, leading to zero error.
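The correlation of maximal error can also be located numerically: error is maximal where the Mahalanobis distance is minimal. The check below assumes the form \(d^{2} = ( r_{x}^{2} -2 \rho r_{x} r_{y} + r_{y}^{2} ) / ( 1- \rho ^{2} )\) and uses hypothetical values of \(r_{x}\) and \(r_{y}\):

```python
import numpy as np

rx, ry = 2.0, 3.0                        # hypothetical values with |rx| < |ry|
rho = np.linspace(-0.99, 0.99, 9901)
# d^2 as a function of rho; error is maximal where d^2 is minimal
d2 = (rx**2 - 2.0 * rho * rx * ry + ry**2) / (1.0 - rho**2)
rho_star = float(rho[np.argmin(d2)])
```

With \(\vert r_{x} \vert < \vert r_{y} \vert \), the grid minimum falls at \(r_{x} / r_{y}\), between the extremes of correlation where the error vanishes.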
A continuum of cases exists between the different scenarios illustrated in Fig. 2(A)–(D). For instance, the peak error (\(\rho _{*}\)) in Fig. 2(D) can shift to lower correlation values by offsetting one of the tuning curves, yielding a curve closer to Fig. 2(B).
In sum, the above results show that, depending upon the structure of the input delivered to the two neural populations, noise correlations produce widely different effects on classification error. While insights into these results can be obtained without the full formalism described here [34], such formalism becomes pivotal when examining the effect of specific network parameters, as described next.
Impact of noise gain on classification error
To explore the effect of network parameters on error, we first modify Eq. (9) as follows:
\[ d^{2} = \frac{r_{y}^{2}}{1- \rho ^{2}} \left [ \left ( \frac{r_{x}}{r_{y}} \right )^{2} -2 \rho \frac{r_{x}}{r_{y}} +1 \right ], \quad (10) \]
where the ratio \(r_{x} / r_{y}\) can be expressed using network parameters
We define a set containing all network parameters \(G_{u}=\{\alpha _{u}, \tau _{u}, \beta _{u}, \Delta \nu _{u}\}\). If g is a subset of these parameters, we can manipulate them using a function f (g) while setting the other parameters to a constant \(c_{g}\). In this way, we can rewrite Eq. (10) as
\[ d^{2} = \frac{r_{y}^{2}}{1- \rho ^{2}} \bigl [ ( f ( g ) c_{g} )^{2} -2 \rho f ( g ) c_{g} +1 \bigr ], \]
whose minimum over ρ, and hence the correlation of maximal error, is reached at \(\rho _{*} =f ( g ) c_{g}\) whenever \(\vert f ( g ) c_{g} \vert <1\).
We can investigate the effect of network parameters on \(\rho _{*} \). For example, the effect of noise gain (\(\beta _{x}\) and \(\beta _{y}\)) on \(\rho _{*}\) when keeping all other parameters constant except for the input is expressed as
\[ \rho _{*} = \frac{\Delta \nu _{x} \beta _{y}}{\Delta \nu _{y} \beta _{x}} \]
for \(\vert r_{x} \vert < \vert r_{y} \vert \).
For illustration purposes, we explored the scenario described in Fig. 2(A), where two populations have equivalent tuning properties. Keeping all parameters constant while altering both \(\beta _{x}\) and \(\beta _{y}\) simultaneously has no effect on \(\rho _{*}\) (Fig. 3(A)). The main impact is an increase in the amount of classification error (Fig. 3(B)). This result is not surprising: increasing the gain of the noise worsens readout performance.
However, markedly different results emerge in a scenario where tuning curves are offset (Fig. 2(B)) and \(\beta _{x}\) is altered while keeping \(\beta _{y}\) unchanged. In this case, \(\rho _{*} =f ( \beta _{x} ) c_{\beta _{x}}\) with \(c_{\beta _{x}}\) given by
and \(f ( \beta _{x} ) =1/ \beta _{x}\). Alterations in \(\beta _{x}\) impact \(\rho _{*}\) in a nonmonotonic fashion (Fig. 3(C)). A small increase from \(\beta _{x} =1\) to \(\beta _{x} =2\) shifts \(\rho _{*}\) towards a more negative value. However, further increasing to \(\beta _{x} =3\) and \(\beta _{x} =4\) increases \(\rho _{*}\) and alters the relationship between correlation and readout error (Fig. 3(D)).
Hidden in these results is a counterintuitive finding: under certain circumstances, increasing \(\beta _{x}\) leads to a decrease in classification error. This can be seen with \(\beta _{x} =10\) (Fig. 3(D), dashed line), leading to lower error than \(\beta _{x} =3\) (green line) and \(\beta _{x} =4\) (red line) for negative correlations. Intuitively, this can happen when increasing \(\beta _{x}\) stretches the distribution of activity for population x along a single dimension away from the classification boundary [13]. Similar findings are borne out of graphical explanations where noise covariance stretches the distribution of firing rates [36].
The benefits of noise gain are even more pronounced in a scenario where one population has higher gain than the other, as in Fig. 2(D). In this case, \(\beta _{x}\) monotonically shifts \(\rho _{*}\) towards decreasing values (Fig. 3(E)). For a broad range of positive correlation values, a high noise gain (\(\beta _{x} >1\)) leads to lower classification error (Fig. 3(F)).
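This counterintuitive effect can be probed directly by giving the two populations separate noise gains. The sketch assumes the stationary standard deviation \(\beta _{u} / \sqrt{2 \alpha _{u} \tau _{u}}\) and the error \(\Phi ( -d/2 )\); signed Δν values encode offset tuning, and all numbers are illustrative rather than taken from the figures:

```python
from math import erf, sqrt

def decoding_error_gain(dnu_x, dnu_y, beta_x, beta_y, rho,
                        alpha=1.0, tau=1.0):
    """LDA error with separate noise gains for the two populations."""
    rx = (dnu_x / alpha) * sqrt(2.0 * alpha * tau) / beta_x
    ry = (dnu_y / alpha) * sqrt(2.0 * alpha * tau) / beta_y
    d2 = (rx**2 - 2.0 * rho * rx * ry + ry**2) / (1.0 - rho**2)
    return 0.5 * (1.0 - erf(sqrt(d2) / (2.0 * sqrt(2.0))))   # Phi(-d/2)

# offset tuning at rho = -0.5: a tenfold noise gain in x yields LOWER error
# than a threefold gain, because the x distribution stretches away from
# the classification boundary
lo = decoding_error_gain(3.0, -3.0, beta_x=10.0, beta_y=1.0, rho=-0.5)
hi = decoding_error_gain(3.0, -3.0, beta_x=3.0, beta_y=1.0, rho=-0.5)
```

At zero correlation the usual intuition is restored: raising either noise gain only worsens the readout.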
Impact of dynamical parameters
The approach described in the previous section can be applied to study the impact of the model’s dynamical parameters on readout error. The two parameters of interest are the leak term (α) and the time constant (τ).
The effect of the time constants \(\tau _{x}\) and \(\tau _{y}\) on \(\rho _{*}\) can be expressed as
for \(\vert r_{x} \vert < \vert r_{y} \vert \). To study the effect of a single term (e.g., \(\tau _{x}\)), we set \(\rho _{*} =f ( \tau _{x} ) c_{ \tau _{x}}\) with \(\mathrm{c}_{\tau _{x}}\) given by
and \(f ( \tau _{x} ) = \tau _{x}\). Similarly, the role of leak terms \(\alpha _{x}\) and \(\alpha _{y}\) on \(\rho _{*}\) is
For a single term (\(\alpha _{x}\)), we have \(\rho _{*} =f ( \alpha _{x} ) c_{ \alpha _{x}}\) with
and \(f ( \alpha _{x} ) =1/ \alpha _{x}\). Taking one scenario as illustration, we examined the case where tuning curves are offset by a fixed orientation (\(r_{x} \rightarrow - r_{y}\)). In this case, the time constant affects the relation between noise correlation and readout error, with larger values of \(\tau _{x}\) shifting \(\rho _{*}\) towards smaller negative values of correlation (Fig. 4(A)). The reason for this shift follows from an earlier example (Fig. 2(D)), where an increased correlation resulted in greater overlap between the firing rate distributions, but only up to a point beyond which these distributions became too narrow to overlap. With larger values of \(\tau _{x}\), a given correlation does not create as much overlap as it would for smaller values of \(\tau _{x}\), thus leading to a shift in \(\rho _{*}\).
The overall impact of a larger time constant is a decrease in classification error (Fig. 4(B)): as \(\tau _{x}\) increases, there is less overlap between the distributions of firing rate across stimuli (Fig. 4(B), inset). By contrast, shifting the leak term \(\alpha _{x}\) towards higher values decreases \(\rho _{*}\) (Fig. 4(C)) and increases overall readout error (Fig. 4(D)). The impact of increasing \(\alpha _{x}\) on error is due to an increase in the overlap between firing rate distributions (Fig. 4(D), inset). The inverse effects of \(\tau _{x}\) and \(\alpha _{x}\) on these distributions explain their opposite impact on \(\rho _{*}\).
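A final sanity check on the direction of these effects (not their exact magnitude, which depends on the variance scaling assumed throughout this sketch, \(\sigma ^{2} = \beta ^{2} / ( 2 \alpha \tau )\), and may differ from the paper's full expressions):

```python
from math import erf, sqrt

def error_matched(dnu, alpha, tau, beta):
    """Error at rho = 0 for two matched populations (d^2 = 2 r^2),
    assuming the stationary variance beta^2 / (2*alpha*tau)."""
    sigma = beta / sqrt(2.0 * alpha * tau)
    r = (dnu / alpha) / sigma
    d = sqrt(2.0) * abs(r)
    return 0.5 * (1.0 - erf(d / (2.0 * sqrt(2.0))))   # Phi(-d/2)

base = error_matched(3.0, alpha=1.0, tau=1.0, beta=1.0)
# larger tau tightens the stationary distributions -> lower error;
# larger alpha both shrinks the mean response and widens r's denominator -> higher error
```

The assertions below confirm the directions stated in the text: the time constant and the leak term push classification error in opposite ways.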
More complex, nonmonotonic relations between \(\rho _{*}\) and values of \(\tau _{x}\) and \(\alpha _{x}\) are found in different scenarios where tuning curves of the two populations are aligned (Fig. 5(A)) or when the gain of one population is larger (Fig. 5(B)).
Together, these results show that the integration time constant and leak term of the population model mediate the impact of noise correlation on classification error by shifting the value \(\rho _{*}\) at which correlation reaches maximal error. The impact of network parameters on readout error is therefore not straightforward to describe but is brought to light using a framework that derives error estimates from the dynamical parameters of a population model.
Discussion
This work described an analytical framework for performing Fisher linear decoding in a ratebased neural model. With this formalism, we began by capturing welldocumented findings on the role of noise gain and correlations on discrimination error. Going further, the framework allowed us to analytically examine the mediating role of dynamical parameters (neuronal leak and time constant) on the relation between noise correlation and error. Overall, this framework suggests that linear decoding is highly sensitive to dynamical model parameters as well as the characteristics of the sensory input.
One surprising finding was the presence of conditions where increased neuronal noise led to reduced classification error. This result was especially prominent when the gain of the two population tuning curves was unmatched (Fig. 3(E)–(F)). Taken together, our findings cover all possible qualitative scenarios where noise correlations have either a beneficial, detrimental, or null effect on decoding [36].
A related approach termed the leaky competing accumulator model was proposed in order to account for perceptual decision making [37]. Some key differences exist between this model and ours. Firstly, our framework assumes a steady-state of neural activity that is characteristic of a decision point and does not capture the time course of deliberation. Our framework assumes an optimal bound on decision accuracy given a linear decoder, representing a ceiling in accuracy that would be associated with long response times (typically >500 ms in human subjects). Secondly, the accumulator model provides explicit connections, through lateral inhibition, that modulate correlations. These lateral connections, however, may also impact firing rates. By comparison, our framework isolates analytically the contribution of firing rates and correlations, and examines their relative role on perceptual discrimination.
It remains uncertain whether the analytical results provided here would generalize to other classes of neural network models, particularly those that include a nonlinear transfer function [38]. However, our work opens the door to such analyses by describing a framework for linking neuronal readout and dynamical modeling.
Limitations and future work. While the framework described here aimed to cover all possible scenarios involving firing rates, noise correlations, and network parameters, it is important to emphasize that not all such scenarios are plausible from a physiological standpoint. In particular, the framework treats firing rates and noise correlations as independent contributors to decoding error and allows for implausible cases where increases in firing rate would lead to an increase, a decrease, or no change in correlations. Interactions between stimulus and noise correlations are a crucial factor limiting the coding capacity of neural circuits [1, 23] and should be considered alongside the dynamical parameters discussed in this work.
Several future directions based on the proposed framework will be worth exploring. First, the assumption of equal class covariances in LDA is challenged by experimental work showing inputdependent neuronal variance [39]. This assumption could be relaxed by replacing LDA with quadratic discriminant analysis, albeit at the cost of a more complex solution when relating readout error to model parameters.
An extension of the current framework could consider the impact of pooling more than two neural populations, as well as more than two stimuli, when performing decoding. This extension would be helpful in examining the interactions between several populations of neurons, each with a unique tuning curve. Going further, one could examine decoding error at the limit of a large number of neurons with heterogeneous tuning curves that vary in both orientation preference and gain [2].
Conclusion. In summary, this work described a theoretical framework that merges Fisher linear decoding with a population model of sensory integration. This approach highlighted the role of correlation, neuronal noise, and network parameters, revealing a broad range of potential outcomes where different conditions generated either detrimental, beneficial, or null impacts on classification performance. These results motivate further developments in theoretical work that systematically link neural network models to optimal decoders in order to reveal the impact of key neurophysiological variables on sensory information processing.
Availability of data and materials
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Abbreviations
LDA: Linear discriminant analysis
References
 1.
Moreno-Bote R, Beck J, Kanitscheider I, Pitkow X, Latham P, Pouget A. Information-limiting correlations. Nat Neurosci. 2014;17:1410–7.
 2.
Ecker AS, Berens P, Tolias AS, Bethge M. The effect of noise correlations in populations of diversely tuned neurons. J Neurosci. 2011;31:14272–83.
 3.
Shamir M, Sompolinsky H. Implications of neuronal diversity on population coding. Neural Comput. 2006;18:1951–86.
 4.
Quian Quiroga R, Panzeri S. Extracting information from neuronal populations: information theory and decoding approaches. Nat Rev Neurosci. 2009;10:173–85.
 5.
Wen H, Shi J, Zhang Y, Lu KH, Cao J, Liu Z. Neural encoding and decoding with deep learning for dynamic natural vision. Cereb Cortex. 2018;28:4136–60.
 6.
Glaser JI, Benjamin AS, Chowdhury RH, Perich MG, Miller LE, Kording KP. Machine learning for neural decoding. eNeuro. 2020;7:1–16.
 7.
Kriegeskorte N, Douglas PK. Interpreting encoding and decoding models. Curr Opin Neurobiol. 2019;55:167–79.
 8.
Klampfl S, David SV, Yin P, Shamma SA, Maass W. A quantitative analysis of information about past and present stimuli encoded by spikes of A1 neurons. J Neurophysiol. 2012;108:1366–80.
 9.
Meyers EM, Freedman DJ, Kreiman G, Miller EK, Poggio T. Dynamic population coding of category information in inferior temporal and prefrontal cortex. J Neurophysiol. 2008;100:1407–19.
 10.
Nienborg H, Cumming B. Correlations between the activity of sensory neurons and behavior: how much do they tell us about a neuron’s causality? Curr Opin Neurobiol. 2010;20:376–81.
 11.
Shadlen MN, Britten KH, Newsome WT, Movshon JA. A computational analysis of the relationship between neuronal and behavioral responses to visual motion. J Neurosci. 1996;16:1486–510.
 12.
Buonomano DV, Maass W. State-dependent computations: spatiotemporal processing in cortical networks. Nat Rev Neurosci. 2009;10:113–25.
 13.
Calderini M, Zhang S, Berberian N, Thivierge JP. Optimal readout of correlated neural activity in a decisionmaking circuit. Neural Comput. 2018;30:1573–611.
 14.
Berberian N, MacPherson A, Giraud E, Richardson L, Thivierge JP. Neuronal pattern separation of motionrelevant input in LIP activity. J Neurophysiol. 2017;117:738–55.
 15.
Rich EL, Wallis JD. Decoding subjective decisions from orbitofrontal cortex. Nat Neurosci. 2016;19:973–80.
 16.
Averbeck BB, Lee D. Effects of noise correlations on information encoding and decoding. J Neurophysiol. 2006;95:3633–44.
 17.
Cain N, Barreiro AK, Shadlen M, Shea-Brown E. Neural integrators for decision making: a favorable tradeoff between robustness and sensitivity. J Neurophysiol. 2013;109:2542–59.
 18.
Goldman MS. Memory without feedback in a neural network. Neuron. 2009;61:621–34.
 19.
Ganguli S, Bisley JW, Roitman JD, Shadlen MN, Goldberg ME, Miller KD. One-dimensional dynamics of attention and decision making in LIP. Neuron. 2008;58:15–25.
 20.
Miri A, Daie K, Arrenberg AB, Baier H, Aksay E, Tank DW. Spatial gradients and multidimensional dynamics in a neural integrator circuit. Nat Neurosci. 2011;14:1150–9.
 21.
Murphy BK, Miller KD. Balanced amplification: a new mechanism of selective amplification of neural activity patterns. Neuron. 2009;61:635–48.
 22.
Chance FS, Abbott LF, Reyes AD. Gain modulation from background synaptic input. Neuron. 2002;35:773–82.
 23.
Rumyantsev OI, Lecoq JA, Hernandez O, Zhang Y, Savall J, Chrapkiewicz R, Li J, Zeng H, Ganguli S, Schnitzer MJ. Fundamental bounds on the fidelity of sensory cortical coding. Nature. 2020;580:100–5.
 24.
Burak Y, Fiete IR. Fundamental limits on persistent activity in networks of noisy neurons. Proc Natl Acad Sci. 2012;109:17645–50.
 25.
Cover TM, Thomas JA. Elements of information theory. 2nd ed. New York: Wiley; 2006.
 26.
Cohen MR, Kohn A. Measuring and interpreting neuronal correlations. Nat Neurosci. 2011;14:811–9.
 27.
Zohary E, Shadlen MN, Newsome WT. Correlated neuronal discharge rate and its implications for psychophysical performance. Nature. 1994;370:140–3.
 28.
Brunel N. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. J Comput Neurosci. 2000;8:183–208.
 29.
Bujan AF, Aertsen A, Kumar A. Role of input correlations in shaping the variability and noise correlations of evoked activity in the neocortex. J Neurosci. 2015;35:8611–25.
 30.
de la Rocha J, Doiron B, SheaBrown E, Josić K, Reyes A. Correlation between neural spike trains increases with firing rate. Nature. 2007;448:802–6.
 31.
Graupner M, Reyes AD. Synaptic input correlations leading to membrane potential decorrelation of spontaneous activity in cortex. J Neurosci. 2013;33:15075–85.
 32.
Renart A, de la Rocha J, Bartho P, Hollender L, Parga N, Reyes A, Harris KD. The asynchronous state in cortical circuits. Science. 2010;327:587–90.
 33.
Salinas E, Sejnowski TJ. Impact of correlated synaptic input on output firing rate and variability in simple neuronal models. J Neurosci. 2000;20:6193–209.
 34.
Hu Y, Zylberberg J, SheaBrown E. The sign rule and beyond: boundary effects, flexibility, and noise correlations in neural population codes. PLoS Comput Biol. 2014;10:e1003469.
 35.
Yim MY, Kumar A, Aertsen A, Rotter S. Impact of correlated inputs to neurons: modeling observations from in vivo intracellular recordings. J Comput Neurosci. 2014;37:293–304.
 36.
Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nat Rev Neurosci. 2006;7:358–66.
 37.
Usher M, McClelland JL. The time course of perceptual choice: the leaky, competing accumulator model. Psychol Rev. 2001;108:550–92.
 38.
Ostojic S, Brunel N. From spiking neuron models to linearnonlinear models. PLoS Comput Biol. 2011;7:e1001056.
 39.
Churchland AK, Kiani R, Chaudhuri R, Wang XJ, Pouget A, Shadlen MN. Variance as a signature of neural computations during decision making. Neuron. 2011;69:818–31.
Acknowledgements
This work benefited from discussions with Brent Doiron and Richard Naud.
Funding
This work was supported by a Discovery grant to J.P.T. from the Natural Sciences and Engineering Research Council of Canada (NSERC Grant No. 210977).
Author information
Contributions
JPT and MC conceptualized the study, designed the work, performed the analyses, interpreted the data, wrote software, drafted the work, and revised the final version of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Appendix
A.1 Solving the integrator model as a linear differential equation
To solve the integrator model, we began by dropping the unit indices to simplify the notation:
with \(p ( t ) =\alpha /\tau \) and \(r ( t ) =(\nu +\beta \xi ( t ) )/\tau \). We defined
Then,
Applying the chain rule,
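As a sketch of the steps above, assume the scalar dynamics reduce to the standard linear form \(dx/dt = -p(t)x + r(t)\) implied by the definitions of \(p(t)\) and \(r(t)\); the integrating factor \(F(t)\) introduced below is notation for illustration, not the paper's:

```latex
% Assumed scalar dynamics (unit indices dropped):
%   dx/dt = -p(t) x + r(t),   p(t) = \alpha/\tau,   r(t) = (\nu + \beta \xi(t))/\tau
\begin{align}
\frac{dx}{dt} + p(t)\,x &= r(t), \qquad
F(t) = \exp\!\Big(\int_{0}^{t} p(s)\,ds\Big) = e^{\alpha t/\tau},\\
\frac{d}{dt}\big(F(t)\,x(t)\big) &= F(t)\Big(\frac{dx}{dt} + p(t)\,x\Big) = F(t)\,r(t),\\
x(t) &= x(0)\,e^{-\alpha t/\tau}
 + \frac{1}{\tau}\int_{0}^{t} e^{-\alpha (t-s)/\tau}\,\big(\nu + \beta\,\xi(s)\big)\,ds.
\end{align}
```

The noise term in the final integral is interpreted as an Itô integral, consistent with the moment calculations of Appendix A.2.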
A.2 Expected value and variance
We sought to find the expected mean and variance of the random variable x such that
The expected mean is
Given the zeromean property of Ito integrals,
we have
The expected variance is
By Ito isometry,
Hence, the expected variance can be concisely written as
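The moment formulas above can be checked numerically. The sketch below assumes the linear form \(dx = (-\alpha x + \nu)/\tau \,dt + (\beta/\tau)\,dW\) and illustrative parameter values (all chosen here for demonstration, not taken from the paper); it integrates the exact moment ODEs by Euler's method and compares against the closed forms implied by the zero-mean property and the Itô isometry:

```python
import math

# Illustrative parameters (assumed values for this sketch only)
alpha, tau, nu, beta, x0 = 1.2, 0.5, 0.8, 0.4, 0.0
T, dt = 2.0, 1e-5

def mean_theory(t):
    """E[x(t)] = x(0) e^{-alpha t/tau} + (nu/alpha)(1 - e^{-alpha t/tau})."""
    return x0 * math.exp(-alpha * t / tau) + (nu / alpha) * (1 - math.exp(-alpha * t / tau))

def var_theory(t):
    """Var[x(t)] = beta^2/(2 alpha tau) (1 - e^{-2 alpha t/tau}), via Ito isometry."""
    return beta**2 / (2 * alpha * tau) * (1 - math.exp(-2 * alpha * t / tau))

# Euler integration of the exact moment ODEs:
#   dm/dt = (-alpha*m + nu)/tau,   dV/dt = -(2*alpha/tau)*V + (beta/tau)**2
m, V, t = x0, 0.0, 0.0
while t < T:
    m += dt * (-alpha * m + nu) / tau
    V += dt * (-(2 * alpha / tau) * V + (beta / tau) ** 2)
    t += dt

print(abs(m - mean_theory(T)))  # small discretization error
print(abs(V - var_theory(T)))
```

At long times the mean converges to \(\nu/\alpha\) and the variance to \(\beta^{2}/(2\alpha\tau)\), the stationary moments of this Ornstein–Uhlenbeck-type process.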
A.3 Classification error
The classification error as a function of neural activity is given by
Substituting the mean and variance from the previous section, this becomes
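For two equiprobable Gaussian classes sharing a covariance matrix, the optimal linear-discriminant error depends only on the Mahalanobis distance \(d\) between the class means. The sketch below implements this standard result, \(\varepsilon = \Phi(-d/2)\), written via the complementary error function; it is a generic formula, not a transcription of the paper's exact expression:

```python
import math

def lda_error(d):
    """Expected misclassification rate for two equiprobable Gaussian classes
    with common covariance and Mahalanobis distance d between the means:
    error = Phi(-d/2) = 0.5 * erfc(d / (2*sqrt(2)))."""
    return 0.5 * math.erfc(d / (2 * math.sqrt(2)))

print(lda_error(0.0))  # 0.5: fully overlapping classes perform at chance
print(lda_error(4.0))  # well-separated means give a small error
```

The error decreases monotonically in \(d\), which is why the derivations below track extrema of \(d^{2}\) as a function of the noise correlation.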
A.4 Mahalanobis distance
We began with the following definitions:
Expanding \(\eta _{i}\) yields
Given
we expanded W using the property \(u\boldsymbol{\cdot } v = u^{T} v \),
Hence, the squared Mahalanobis distance between means is
We can rewrite \(\eta _{i}\) as
Similarly, for the variance \(\varsigma ^{2}\),
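One reading of the quantities above, chosen here to be consistent with the ratios \(r_{x}\), \(r_{y}\) and the roots derived in Appendix A.6 (and thus an assumption of this sketch), is to define \(r_{x}\) and \(r_{y}\) as mean-difference-to-standard-deviation ratios of the two units. The squared Mahalanobis distance for a bivariate Gaussian with noise correlation ρ then takes the form:

```latex
% With r_x = (\mu_x^{(1)} - \mu_x^{(2)})/\sigma_x and r_y defined analogously:
\begin{equation}
d^{2}
 = \Delta\boldsymbol{\mu}^{T}\,\Sigma^{-1}\,\Delta\boldsymbol{\mu}
 = \frac{r_{x}^{2} - 2\rho\,r_{x} r_{y} + r_{y}^{2}}{1-\rho^{2}},
\qquad
\Sigma = \begin{pmatrix} \sigma_{x}^{2} & \rho\,\sigma_{x}\sigma_{y}\\
 \rho\,\sigma_{x}\sigma_{y} & \sigma_{y}^{2} \end{pmatrix}.
\end{equation}
```

Inverting Σ and expanding the quadratic form term by term recovers the stated expression, since each \(\Delta\mu/\sigma\) pair collapses into the corresponding ratio.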
A.5 Derivation of error
We analyzed the extrema of the error function with respect to noise correlation by taking its first derivative via the chain rule,
with
The first derivative is given by
The second derivative is
The third derivative is
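Assuming the bivariate form \(d^{2} = (r_{x}^{2} - 2\rho r_{x} r_{y} + r_{y}^{2})/(1-\rho^{2})\) (a reading consistent with the roots in Appendix A.6, not a quotation of the paper's equations), the derivative of \(d^{2}\) with respect to ρ follows from the quotient rule:

```latex
\begin{equation}
\frac{\partial d^{2}}{\partial \rho}
 = \frac{-2 r_{x} r_{y}\,(1-\rho^{2}) + 2\rho\,(r_{x}^{2} - 2\rho r_{x} r_{y} + r_{y}^{2})}{(1-\rho^{2})^{2}}
 = \frac{-2\big(\rho^{2} r_{x} r_{y} - \rho\,(r_{x}^{2}+r_{y}^{2}) + r_{x} r_{y}\big)}{(1-\rho^{2})^{2}}.
\end{equation}
```

The sign of the derivative is thus governed by the quadratic \(\rho^{2} r_{x} r_{y} - \rho (r_{x}^{2}+r_{y}^{2}) + r_{x} r_{y}\), whose zeros give the candidate extrema in ρ.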
A.6 Extrema of error
We evaluated the extrema of error by finding the points where Eqs. (14)–(16) are equal to zero,
We assumed that the ratios \(r_{x}\) and \(r_{y}\) are finite and that the Euclidean distance between the distribution means is finite and nonzero. In other words, if \(d^{2} \rightarrow \infty \), it is exclusively due to the correlation coefficient. Then,
We proceeded in a similar fashion for the second derivative (Eq. (15)):
The third derivative (Eq. (16)) is
Depending on network parameters, two cases are possible. One case arises if one of the ratios, either \(r_{x}\) or \(r_{y}\), is zero. This happens if the mean activity of one population is equal across inputs. If the mean activity of both units remained unchanged, the resulting multivariate distributions would overlap, thus breaking the basic assumptions justifying the choice of LDA. In this first case,
The second case occurs when neither \(r_{x}\) nor \(r_{y}\) is zero:
The last expression can be decomposed into four distinct cases. First, when \(r_{x}\rightarrow r_{y}\),
Second, when \(r_{x} \rightarrow - r_{y}\),
Third, when \(r_{x} \neq r_{y}\), we examined the positive and negative roots of ρ. The positive root is
Because \(\max ( r_{x}^{2}, r_{y}^{2} ) > \vert r_{x} r_{y} \vert \) whenever the two ratios are nonzero and unequal, it follows that \(\vert \rho _{+} \vert >1\ \forall r_{x}, r_{y}\). Since the correlation is bounded in the open interval \(\mathopen{]}-1,1[\), the positive root must be rejected. The negative root does not suffer from the same problem,
Fourth, when either \(r_{x} =0\) or \(r_{y} =0\), \(\rho =0\).
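Under the same bivariate assumption \(d^{2} = (r_{x}^{2} - 2\rho r_{x} r_{y} + r_{y}^{2})/(1-\rho^{2})\), the quadratic \(\rho^{2} r_{x} r_{y} - \rho (r_{x}^{2}+r_{y}^{2}) + r_{x} r_{y} = 0\) governing the extrema admits the closed-form roots (a reconstruction consistent with the four cases above):

```latex
\begin{equation}
\rho_{\pm}
 = \frac{(r_{x}^{2}+r_{y}^{2}) \pm \sqrt{(r_{x}^{2}+r_{y}^{2})^{2} - 4 r_{x}^{2} r_{y}^{2}}}{2\, r_{x} r_{y}}
 = \frac{(r_{x}^{2}+r_{y}^{2}) \pm \vert r_{x}^{2}-r_{y}^{2} \vert}{2\, r_{x} r_{y}}
 \;\Rightarrow\;
\rho_{+} = \frac{\max(r_{x}^{2}, r_{y}^{2})}{r_{x} r_{y}}, \quad
\rho_{-} = \frac{\min(r_{x}^{2}, r_{y}^{2})}{r_{x} r_{y}}.
\end{equation}
```

Since \(\max(r_{x}^{2}, r_{y}^{2}) \geq |r_{x} r_{y}|\), the root \(\rho_{+}\) always has magnitude at least one, while \(\rho_{-}\) lies within the admissible correlation range.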
A.7 Minima and maxima
We determined upward and downward trends of the error curve by calculating the sign of the derivative between the potential maxima (considering that they are mutually exclusive). Taking Eqs. (14)–(16) and substituting into Eq. (12),
For the condition where \(r_{x} \rightarrow r_{y}\),
For the condition where neither \(r_{x} =0\) nor \(r_{y} =0\), we have already found the zeros of \(\rho ^{2} r_{x} r_{y} - \rho ( r_{x}^{2} + r_{y}^{2} ) + r_{x} r_{y}\) to be \(\rho _{-}\) and \(\rho _{+}\). To determine \(\operatorname{sign} ( d\varepsilon /d\rho )\), we need to know whether the extremum of the parabola is a minimum or a maximum,
Given \(\rho _{+} >1\),
Regardless of the conditions for \(r_{x}\) and \(r_{y}\), following Eqs. (27)–(29), the error curve as a function of correlation increases from \(\rho =-1\) until its maximum, found at a value of \(\rho _{*} =0\) or \(\rho _{*} = \rho _{-}\), and then decreases until \(\rho =1\).
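This trend can be verified numerically. The sketch below assumes illustrative ratios \(r_{x}=1\), \(r_{y}=2\) (chosen for demonstration), the bivariate form \(d^{2} = (r_{x}^{2} - 2\rho r_{x} r_{y} + r_{y}^{2})/(1-\rho^{2})\), and the standard Gaussian error \(\varepsilon = \Phi(-d/2)\); for these ratios the predicted maximizer is \(\rho_{-} = \min(r_{x}^{2}, r_{y}^{2})/(r_{x} r_{y}) = 0.5\):

```python
import math

# Illustrative ratios (assumed values; any unequal nonzero pair works)
rx, ry = 1.0, 2.0

def error(rho):
    """Classification error as a function of noise correlation rho, using
    d^2 = (rx^2 - 2*rho*rx*ry + ry^2)/(1 - rho^2) and eps = Phi(-d/2)."""
    d2 = (rx**2 - 2 * rho * rx * ry + ry**2) / (1 - rho**2)
    return 0.5 * math.erfc(math.sqrt(d2) / (2 * math.sqrt(2)))

# Scan the open interval (-1, 1); the error should peak at
# rho_minus = min(rx^2, ry^2)/(rx*ry) = 0.5 for these ratios.
grid = [i / 1000 for i in range(-999, 1000)]
rho_star = max(grid, key=error)
print(rho_star)  # close to 0.5
```

The scan also confirms the stated shape: error rises monotonically from \(\rho = -1\) up to \(\rho_{-}\) and falls thereafter.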
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Calderini, M., Thivierge, J.P. Estimating Fisher discriminant error in a linear integrator model of neural population activity. J. Math. Neurosci. 11, 6 (2021). https://doi.org/10.1186/s13408-021-00104-4
Keywords
 Linear model
 Fisher linear discriminant analysis
 Noise correlation