Responses of Leaky Integrate-and-Fire Neurons to a Plurality of Stimuli in Their Receptive Fields
 Kang Li^{1, 2},
 Claus Bundesen^{2} and
 Susanne Ditlevsen^{1}
DOI: 10.1186/s13408-016-0040-2
© Li et al. 2016
Received: 21 November 2015
Accepted: 30 April 2016
Published: 23 May 2016
Abstract
A fundamental question concerning the way the visual world is represented in our brain is how a cortical cell responds when its classical receptive field contains a plurality of stimuli. Two opposing models have been proposed. In the response-averaging model, the neuron responds with a weighted average of all individual stimuli. By contrast, in the probability-mixing model, the cell responds to a plurality of stimuli as if only one of the stimuli were present. Here we apply the probability-mixing and the response-averaging model to leaky integrate-and-fire neurons, to describe neuronal behavior based on observed spike trains. We first estimate the parameters of either model using numerical methods, and then test which model is most likely to have generated the observed data. Results show that the parameters can be successfully estimated and the two models are distinguishable using model selection.
Keywords
Probability-mixing; Response-averaging; Parameter estimation; Model selection; Visual attention
1 Introduction
The receptive field of a neuron in the visual system can be defined as the spatial area in which stimulation changes the firing pattern of the neuron. In primary visual cortex, receptive fields are small, with typical values of, for example, 0.5–2 deg of visual angle near the fovea. Moving up the hierarchy of extrastriate visual areas along either the dorsal [1] or the temporal [2] pathway, receptive field sizes grow substantially [3, 4], reaching, for example, a value of about 30 deg in the inferotemporal cortex. A plausible explanation is that since these areas process more complex aspects of the visual environment, information has to be integrated over larger spatial areas, such as when encoding faces [5] in the ventral pathway or optic flow patterns [6] in the dorsal one. Receptive fields this large will typically contain a plurality of distinct stimulus objects rather than just a single stimulus object [7]. The way a cortical cell responds when its classical receptive field contains a plurality of stimuli is a basic question concerning the way the visual world is represented in our brain.
1.1 Probability-Mixing and Response-Averaging
In a pioneering study, Reynolds et al. [8] found that a typical cell in visual area V2 or V4 in monkeys responded to a pair of objects in its classical receptive field by adopting a rate of firing which, averaged across trials, equaled a weighted average of the responses to the individual objects when these were presented one at a time, with greater weight on an object the more attention was directed to the object. Reynolds et al. accounted for their data by proposing that on each individual trial, the firing rate of a cell to a plurality of stimulus objects equaled a weighted average of the firing rates to the individual objects when these were presented alone. Bundesen et al. [9, 10] proposed an alternative explanation of the data of Reynolds et al. by pointing out that the effects observed in firing rates that were averaged across trials could be explained by assuming that on each individual trial, when a plurality of objects were presented, the cell responded as if just one of the objects was presented alone, so that across trials, the response of the cell was a probability mixture of the responses to the individual objects when these were presented alone.
In the response-averaging model proposed by Reynolds et al. [8] (see also [11–18]), the neuron responds with a weighted average of the responses to single stimuli. By contrast, in the probability-mixing model proposed by Bundesen et al. [9], the neuron responds at any given time to only one of the single stimuli, with certain probabilities. Suppose the stimulus \(S(t)\) presented to the neuron consists of K separated single stimuli, denoted by \(S_{1}(t), \ldots, S_{K}(t)\). In the response-averaging model, the neuron responds with a weighted average of responses to single stimuli, \(\sum_{k} \beta_{k} I_{k}(t)\), with \(\beta_{k}\) being the weights and \(\sum_{k}\beta_{k}=1\). Here \(I_{k}(t)\) denotes the effect that \(S_{k}\) has on the spiking neuron model, which we set to be the stimulus current. In the probability-mixing model, the response of the neuron equals one of the responses the neuron would have had if only a single stimulus were presented, according to a probability mixture with probabilities \(\alpha_{1}, \ldots, \alpha_{K}\) and \(\sum_{k} \alpha_{k}=1\).
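The two rules can be stated compactly in code. The following is a minimal sketch, not from the paper, contrasting the two models for the input current on a single trial; the function names are illustrative, and the single-stimulus currents \(I_{k}(t)\) are assumed to be sampled on a common time grid.

```python
import numpy as np

def response_averaging_current(currents, beta):
    """Response averaging: weighted average of the K single-stimulus
    currents, sum_k beta_k * I_k(t), with the weights summing to 1."""
    return np.asarray(beta) @ np.asarray(currents)   # currents: (K, n_time)

def probability_mixing_current(currents, alpha, rng=None):
    """Probability mixing: on each trial the neuron follows exactly one
    single-stimulus current, drawn with probabilities alpha (summing to 1)."""
    if rng is None:
        rng = np.random.default_rng()
    k = rng.choice(len(currents), p=alpha)
    return np.asarray(currents)[k]
```

Averaging mixes the currents within a trial, whereas mixing picks one whole-trial current; across many trials the trial-averaged responses can nevertheless look similar, which is why model selection is needed.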
In our previous study [19], we compared the abilities of the probabilitymixing model and the responseaveraging model to account for spike trains (i.e., times of action potentials obtained from extracellular recordings) recorded from single cells in the middle temporal visual area (MT) of rhesus monkeys. Point processes were employed to model the spike trains. Results supported the probabilitymixing model.
In this article, we combine the probability-mixing and the response-averaging model with the leaky integrate-and-fire (LIF) model, to describe neuronal behavior based on observed spike trains. This is cast in a general setting, where the stimulus \(S(t)\) is represented as an input current to the neuron. The spike train data are simulated using the LIF model, responding either to a single stimulus or to a stimulus pair. In the case of a stimulus pair, both response averaging and probability mixing are used. The first goal of the paper is to estimate the parameters of either of the two models from spike train data. The second goal is to test which of the two models is most likely to have generated the observed data.
1.2 The Leaky Integrate-and-Fire Model
The LIF model has been extensively applied to model the membrane potential evolution in single neurons in computational neuroscience (for reviews, see [20, 21]). The model has some biophysical realism while still maintaining mathematical simplicity. The simplest LIF model is an Ornstein–Uhlenbeck (OU) process with constant conductance, leak potential, and diffusion coefficient. More biophysical realism can be obtained by allowing for post-spike currents generated by past spikes [22]. Here we use post-spike currents generated via three types of kernels [23, 24]: bursting, decaying, and delaying kernels, all modeled by the difference between two decaying exponentials, although any kernel could be used.
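As an illustration, the following sketch simulates such an LIF neuron by Euler–Maruyama with threshold/reset and an additive post-spike current built from a difference-of-exponentials kernel. Since Eq. (1) is not reproduced in this excerpt, the drift \(\gamma(\mu - x) + I(t) + H(t)\) is an assumed form, and all names and parameter values are illustrative.

```python
import numpy as np

def diff_of_exponentials(a1, tau1, a2, tau2):
    """Response kernel k_h(u) = a1*exp(-u/tau1) - a2*exp(-u/tau2); the
    bursting, decaying, and delaying kernels are all of this shape."""
    return lambda u: a1 * np.exp(-u / tau1) - a2 * np.exp(-u / tau2)

def simulate_lif(mu, gamma, sigma, current, dt, x0=0.4, x_th=1.0,
                 kernel=None, rng=None):
    """Euler-Maruyama simulation of an LIF neuron with threshold/reset and
    an additive post-spike current H(t).  The drift gamma*(mu - x) + I + H
    is an assumed form, since Eq. (1) is not shown in this excerpt."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(current)
    x = x0
    h = np.zeros(n)                          # accumulated post-spike current
    spikes = []
    for i in range(n):
        x += (gamma * (mu - x) + current[i] + h[i]) * dt \
             + sigma * np.sqrt(dt) * rng.standard_normal()
        if x >= x_th:                        # spike: record time and reset
            spikes.append(i * dt)
            x = x0
            if kernel is not None and i + 1 < n:
                lags = dt * np.arange(1, n - i)
                h[i + 1:] += kernel(lags)    # kernel acts on future steps
    return np.array(spikes)
```

With a suprathreshold constant current, the simulated neuron fires repeatedly; the kernel then shapes the ISI structure (e.g., bursting).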
1.3 Temporal Stimulus
Constant stimuli are simple to handle and are widely used in both experiments and modeling work. However, real-world stimuli are generally time varying. If, for example, they contain oscillatory components, the generated spike trains may also contain oscillations in the firing rates. Here we use three types of stimuli: oscillatory stimuli described by sinusoidal functions, pulsing stimuli modeled by piecewise constant functions, and stochastic stimuli described by OU processes.
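The three stimulus families can be generated as follows. This is an illustrative sketch; the parameterizations (amplitude, frequency, phase, baseline for the sinusoid; mean, time constant, noise intensity for the OU process) are assumptions and not necessarily the paper's exact conventions.

```python
import numpy as np

def sinusoidal_stimulus(t, a, f, phase, baseline):
    """Oscillatory stimulus S(t) = baseline + a*sin(2*pi*f*t + phase)."""
    return baseline + a * np.sin(2 * np.pi * f * t + phase)

def piecewise_constant_stimulus(t, levels, breakpoints):
    """Pulsing stimulus: levels[i] holds between consecutive breakpoints
    (len(levels) == len(breakpoints) + 1)."""
    idx = np.searchsorted(breakpoints, t, side="right")
    return np.asarray(levels)[idx]

def ou_stimulus(t, mean, tau, sigma, rng):
    """Stochastic stimulus: Euler-Maruyama path of an OU process with
    stationary mean `mean`, time constant `tau`, noise intensity `sigma`."""
    s = np.empty_like(t)
    s[0] = mean
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        s[i] = s[i - 1] + (mean - s[i - 1]) / tau * dt \
               + sigma * np.sqrt(dt) * rng.standard_normal()
    return s
```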
1.4 Method Summary
We combine the models describing neuronal responses to a plurality of stimuli, namely the probability-mixing model and the response-averaging model, with the LIF framework, for different types of stimuli and response kernels. Parameter estimation is done by maximum likelihood using first-passage time probabilities of diffusion processes [25]. We solve the first-passage time problem by numerically solving either a partial differential equation (PDE), the Fokker–Planck equation, or an integral equation (IE), the Volterra integral equation. Numerical solutions of these equations have been extensively explored and applied in computations of neuronal spike trains [26–28]. Inspired by these previous studies, we apply four numerical methods, including two Fokker–Planck related PDEs and two kinds of Volterra IEs, and compare their performance. We also describe and compare two alternative methods for maximizing the likelihood function of the probability-mixing model: direct maximization of the marginal likelihood and the expectation–maximization (EM) algorithm. Finally, we show that the probability-mixing model and the response-averaging model can be distinguished in the LIF framework, by comparing parameter estimates and through uniform residual tests.
2 Leaky IntegrateandFire Model with Stimuli Mixtures
The stimulus current \(I(t)\) is shaped from the external stimulus current through a stimulus kernel \(k_{s}(t)\) as \(I(t)=\int_{-\infty}^{t}k_{s}(t-s)S(s)\, ds\), where \(S(s)\) denotes the external current at time s. Similarly, the post-spike current arises from past spikes through a response kernel \(k_{h}(t)\) by \(H(t)=\int_{-\infty}^{t}k_{h}(t-s)\mathbb{I}(s)\, ds\). Here \(\mathbb{I}(s)=\sum_{\tau\in d}\delta(s-\tau)\) describes the spike train, where \(\delta(\cdot)\) denotes the Dirac delta function.
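Discretizing these two convolutions is straightforward; below is a sketch under the assumptions of a uniform time grid and signals that vanish before \(t=0\). Because \(\mathbb{I}(s)\) is a sum of Dirac deltas, the post-spike integral reduces to a plain sum of kernel evaluations over past spike times.

```python
import numpy as np

def filtered_current(kernel_samples, signal, dt):
    """Discrete approximation of I(t) = int_{-inf}^t k_s(t - s) S(s) ds,
    with the kernel sampled on the same grid and S(s) = 0 for s < 0."""
    full = np.convolve(signal, kernel_samples) * dt   # causal convolution
    return full[: len(signal)]

def postspike_current(kernel_fn, spike_times, t_grid):
    """H(t) = sum over past spikes tau of k_h(t - tau): the Dirac train
    turns the convolution integral into a sum over spike times."""
    h = np.zeros_like(t_grid)
    for tau in spike_times:
        mask = t_grid >= tau
        h[mask] += kernel_fn(t_grid[mask] - tau)
    return h
```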
2.1 Current from Stimulus Mixture
3 Maximum Likelihood Estimation Using First-Passage Time Probabilities
Our objective here is to estimate the parameters μ and σ from (1), the response kernel function \(k_{h}\) in (2) represented by the parameter vector η, and either the probability vector of the stimuli in the mixture, \(\alpha= (\alpha_{1}, \ldots, \alpha_{K})\), under the probability-mixing model, or the vector of weights in the average, \(\beta= (\beta_{1}, \ldots, \beta_{K})\), in the response-averaging model. The estimation of the decay rate γ is difficult when there is no access to the membrane potential and only spike times are observable, as discussed in [29, 30]. We therefore assume γ is known. The vector of all parameters in the model is thus θ, where \(\theta=(\mu, \sigma, \eta, \alpha)\) in the probability-mixing model and \(\theta =(\mu, \sigma, \eta, \beta)\) in the response-averaging model. The stimulus is assumed known, and the stimulus parameter vector s is therefore not estimated.
A similar LIF model with different stimulus and response kernels on single piecewise constant stimuli was used by Paninski et al. [24]. They showed that the parameters can be estimated by MLE by solving the Fokker–Planck equation, also covering non-white noise and interneuronal interactions. The model was later applied to experimental data collected from the retina of macaque monkeys [31]. Here we estimate parameters in the LIF model for various temporal stimuli and different response kernels, using four different numerical methods to calculate the likelihood function, within the framework of either the probability-mixing or the response-averaging model.
Suppose we observe N spike trains, \(D=(d_{1}, \ldots, d_{N})\), all responding to the same stimulus mixture, where the ith spike train consists of \(N_{i}\) spike times, \(d_{i}=(t_{1}^{i}, \ldots, t_{N_{i}}^{i})\). The jth ISI of the ith spike train is then given by \(t^{i}_{j+1}-t^{i}_{j}\). Assume that each measured spike train, i.e., each trial, is sufficiently short that, under the probability-mixing model, the neuron responds to only one stimulus within the stimulus mixture, not switching the response within the trial.
3.1 First-Passage Times and Probability Distributions
Modeling spike train data as threshold crossings of an underlying diffusion process representing the unobserved membrane potential leads to the so-called first-passage time problem [32, 33]. For models with no effects from past spikes, such that the ISIs are assumed i.i.d., one approach is to build loss functions using the Fortet equation [29, 30]; see also [34]. A more general method, which allows for the post-spike effects in model (1), is maximum likelihood estimation (MLE) based on numerical solutions of PDEs or IEs for the conditional distribution of the spike times or, equivalently, the ISIs.
The solution of the Fokker–Planck equation provides \(f(x,t)\) and \(F(x,t)\), and therefore also \(g(t)\). The solution of the Volterra integral equation directly provides \(g(t)\) [36]. Calculating \(g(t)\) enables us to do MLE, as explained in Sects. 3.5 and 3.6 below.
3.2 Fokker–Planck Equation
Both PDEs are solved numerically using the Crank–Nicolson finite difference method, together with the Thomas algorithm, which efficiently solves tridiagonal systems [37]. Whichever method we use, we can always obtain the PDF (CDF) from the CDF (PDF) by numerical differentiation (integration).
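The Thomas algorithm referred to here is the standard O(n) forward-elimination/back-substitution scheme for tridiagonal systems \(Ax = d\), such as those produced by a Crank–Nicolson step. A self-contained version:

```python
import numpy as np

def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system in O(n): a is the sub-diagonal (length
    n-1), b the main diagonal (length n), c the super-diagonal (length
    n-1), and d the right-hand side (length n)."""
    n = len(b)
    cp = np.empty(n - 1)        # modified super-diagonal
    dp = np.empty(n)            # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):       # forward elimination
        denom = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):   # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Each Crank–Nicolson time step then costs O(m) in the space discretization, consistent with the overall O(mn) complexity discussed in Sect. 3.4.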
3.3 Volterra Integral Equation
The initial condition for the IE is \(g(0)=0\). Using this, we can solve the equation recursively and obtain \(g(t)\).
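To make the recursion concrete, here is a sketch for the simplest tractable case, drifted Brownian motion, where the Gaussian transition probabilities entering a Fortet-type first-kind Volterra equation are available in closed form. The discretization (a right-rectangle rule with diagonal kernel value 1/2) is one simple choice, not necessarily the scheme used in the paper.

```python
import numpy as np
from math import erfc, sqrt

def fpt_density_volterra(x0, x_th, mu, sigma, T, n):
    """First-passage-time density g(t) of dX = mu dt + sigma dW through the
    level x_th > x0, from the first-kind Volterra (Fortet) equation
        P(X_t >= x_th | x0) = int_0^t g(s) P(X_t >= x_th | X_s = x_th) ds,
    solved forward on a uniform grid from the initial condition g(0) = 0."""
    dt = T / n
    t = dt * np.arange(1, n + 1)

    def surv(x_start, u):
        # P(X_u >= x_th | X_0 = x_start): Gaussian survival function
        z = (x_th - x_start - mu * u) / (sigma * sqrt(u))
        return 0.5 * erfc(z / sqrt(2.0))

    lhs = np.array([surv(x0, u) for u in t])
    kern = np.array([surv(x_th, u) for u in t])   # kernel depends only on the lag
    g = np.zeros(n)
    for j in range(n):
        acc = dt * np.dot(g[:j], kern[:j][::-1])  # contribution of earlier crossings
        g[j] = (lhs[j] - acc) / (0.5 * dt)        # diagonal kernel value is 1/2
    return t, g
```

For this case the exact answer is the inverse Gaussian density, which gives a direct accuracy check on the recursion.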
3.4 Computational Time Complexity
For both the Fokker–Planck PDE and the Volterra IE methods, the time complexity is directly related to the grid size of the numerical solution. Specifically, suppose that the grid size of the time discretization is n and the size of the space discretization is m. Then the Fokker–Planck method has complexity of order \(O(mn)\), and the Volterra method is of order \(O(n^{2})\) (a naive implementation requires \(O(n^{3})\), but techniques can be applied to reduce the complexity to \(O(n^{2})\); see [36]). Furthermore, the computation is largely affected by the response kernel used. A discretization is applied to approximate the nonlinear kernel by a piecewise constant function with sufficiently small segment length. The values of the constant segments are calculated and stored in a data vector when the parameters are updated. Then, inside an optimization loop, the kernel function is evaluated by referring to this data vector.
3.5 Marginal Likelihood of the Probability-Mixing Model
3.5.1 Optimizing the Likelihood Using the Expectation–Maximization Algorithm
As an alternative to directly optimizing the log-likelihood function (19), the EM algorithm [40] is well suited for optimization problems in mixture models and is simple to implement. The EM algorithm treats the unknown stimulus mixture component to which the neuron responds as unobserved data, or latent variables. We write \(Y=(y_{1}, \ldots, y_{N})\), where \(y_{i} \in\{1, 2, \ldots, K\}\), for the latent variables indicating which single stimulus each spike train is responding to. The full data then include both the observed spike trains D and the unobserved stimuli response Y.
The EM algorithm is an iterative procedure. In each iteration, the expectation of the full-data log-likelihood, conditional on the parameters from the previous iteration, is maximized to obtain the optimal parameters for the current iteration. The algorithm runs until convergence, i.e., until the difference in parameter estimates between two consecutive iterations is sufficiently small. We use the notation θ for the current parameter to estimate and \(\theta_{-1}\) for the parameter estimated in the previous iteration, and likewise for the components of the probability vector α, i.e., \(\alpha_{k}\) and \((\alpha_{k})_{-1}\).
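For intuition, the following sketch performs the E and M steps for the mixture probabilities α alone, taking as input a matrix of per-train, per-component log-likelihoods (which in the full algorithm come from the first-passage-time densities). The LIF parameters, which the actual M step updates jointly, are held fixed here; this is an illustrative reduction, not the paper's complete procedure.

```python
import numpy as np

def em_mixture_weights(loglik, n_iter=100, tol=1e-8):
    """EM updates for the mixture probabilities alpha, given
    loglik[i, k] = log-likelihood of spike train i under single stimulus k."""
    n, K = loglik.shape
    alpha = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step: posterior probability that train i responds to stimulus k
        logw = loglik + np.log(alpha)
        logw -= logw.max(axis=1, keepdims=True)   # guard against underflow
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M step: alpha_k is the average responsibility of component k
        new_alpha = w.mean(axis=0)
        converged = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha
```

Because the complete-data log-likelihood contains no logarithms of sums, each M step has this simple closed form, which is the stability advantage discussed in Sect. 5.4.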
3.6 Likelihood of the Response-Averaging Model
3.7 Model Checking: Uniformity Test
4 Simulation Study
To illustrate the approach, we first detail the simulation study of the bursting kernel and the sinusoidal stimulus. Then results using the other types of kernels and stimuli are briefly illustrated and summarized.
Table 1. Parameter values used in the simulation study

| Category | Parameter | Value | Explanation |
| --- | --- | --- | --- |
| Sinusoidal stimulus | \(s_{1}\) | (10, 12, 1, 50) | First stimulus |
| | \(s_{2}\) | (20, 8, 0, 50) | Second stimulus |
| Unknowns to estimate | η | (50, 25, 40, 15) | Bursting response kernel |
| | α | (0.4, 0.6) | Probability mixing |
| | β | (0.4, 0.6) | Response averaging |
| | μ | 0.5 | Reversal potential |
| | σ | 1 | Diffusion parameter |
| Numerical computation | Δt | 0.002 | Time discretization |
| | Δx | 0.02 | Space discretization |
| | \(x^{-}\) | 0 | Lower reflecting boundary |
| Neuronal characteristics | \(x_{0}\) | 0.4 | Reset potential |
| | \(x_{\mathrm{th}}\) | 1 | Spike threshold |
| | γ | 100 | Decay rate |
Parameter estimation was split in two, in agreement with how a typical experiment would be conducted. First we simulated spike trains responding to single stimuli. Note that in this case the probability-mixing and the response-averaging models are the same, and \(\alpha=\beta=1\) are one-dimensional. The data set contains 10 spike trains, with five attending the first single stimulus and the other five attending the second single stimulus. Using this data set, we estimated the parameters of the response kernel, η, and the parameters of the diffusion model, μ and σ.
Second, we simulated spike trains using a mixture of the two sinusoidal stimuli. Two data sets were simulated: one consisting of 10 spike trains following the probability-mixing model, and another consisting of 10 spike trains following the response-averaging model. To check whether the two models could be distinguished, we fitted both the probability-mixing model and the response-averaging model to both data sets, resulting in four combinations. During this stage, we fixed the response kernel parameters η to the values estimated in the first step, and estimated again μ and σ, as well as α or β, depending on the model. There are therefore two sets of estimates of μ and σ for each trial. The purpose is threefold: first, these parameters might drift slightly in a real neuron when changing the stimulus (even though we do not change them in the simulation); second, it is of interest to understand the statistical accuracy and uncertainty of these parameter estimates when inferred in the two experimental settings; and third, comparing estimates from both single stimuli and stimulus mixtures can serve as model control, as explained below. When fitting the probability-mixing model to the data generated from this same model, we used both the marginal MLE and the EM algorithm. The above simulation and estimation procedure was repeated 100 times, generating 100 sets of estimates.
The simulation study serves several purposes. First, the four numerical methods for obtaining the PDFs of the spike times, namely the first Volterra, second Volterra, Fokker–Planck PDF, and Fokker–Planck CDF methods, should be evaluated and compared. This is done on single stimulus spike train data. Second, the quality of the parameter estimates should be assessed, as well as how important it is to use the correct model for the estimation. This is conducted using spike trains simulated from stimulus mixtures. The performance of the marginal MLE and the EM algorithm should also be compared in the case of the probability-mixing model. Third, it should be evaluated whether it is possible to detect which of the two models generated the data. Results from these three analyses are presented in the following.
4.1 Numerical Solutions of the Partial Differential and Integral Equations
Figure 4(c) shows in the upper panels three examples of spike time PDFs, \(g(t)\), and in the lower panels a corresponding example trace for each, plotted on top of the time-evolving PDFs of \(X(t)\), \(f(x,t)\), shown as heat images. The three ISIs are taken from the left, middle left, and middle right panels of Fig. 3.
4.2 Results from Single Stimulus Trials
Table 2. Average ± standard deviation of 100 parameter estimates from single stimulus data

| Method | μ | σ |
| --- | --- | --- |
| True value | 0.5 | 1 |
| First Volterra | 0.4800 ± 0.01095 | 1.076 ± 0.06913 |
| Second Volterra | 0.5066 ± 0.01287 | 1.020 ± 0.07281 |
| Fokker–Planck PDF | 0.4981 ± 0.00730 | 1.060 ± 0.04567 |
| Fokker–Planck CDF | 0.4889 ± 0.00698 | 1.065 ± 0.04442 |
4.3 Distinguishing Between Response-Averaging and Probability-Mixing
Table 3. Average ± standard deviation of 100 parameter estimates using the response-averaging (RA) model and the probability-mixing (PM) model on data sets simulated according to the two models

| Fit / Data | μ | σ | \(\alpha_{1}\) (PM) / \(\beta_{1}\) (RA) |
| --- | --- | --- | --- |
| True value | 0.5 | 1 | 0.4 |
| RA on RA data | 0.4876 ± 0.00658 | 1.067 ± 0.04441 | 0.3888 ± 0.01564 |
| PM on RA data | 0.3553 ± 0.01087 | 2.077 ± 0.06482 | 0.0017 ± 0.00467 |
| RA on PM data | 0.3288 ± 0.01191 | 2.429 ± 0.09216 | 0.3098 ± 0.02161 |
| PM on PM data (Marginal) | 0.4891 ± 0.00844 | 1.062 ± 0.05609 | 0.4013 ± 0.01636 |
| PM on PM data (EM) | 0.4889 ± 0.00813 | 1.063 ± 0.05410 | 0.3988 ± 0.01012 |
Table 4. Rejection (p < 0.05) rate based on the Kolmogorov–Smirnov test for uniformity done on each repetition

| Method | Low accuracy^{*} | High accuracy^{**} |
| --- | --- | --- |
| RA on RA data | 32/100 | 1/20 |
| RA on PM data | 100/100 | 20/20 |
| PM on RA data | 100/100 | 20/20 |
| PM on PM data | 32/100 | 0/20 |
4.4 Probability-Mixing with EM
4.5 Generalizations
In this section we only apply the Fokker–Planck CDF method and analyze the model for different types of response kernels and stimuli.
Table 5. Parameter values for all response kernels and stimuli used in the single stimulus study for the generalized analysis

| Category | Type | Parameter value |
| --- | --- | --- |
| Stimulus, s | Sinusoidal | (10, 12, 1, 50) |
| | Piecewise constant | (50, 70, 50, 30, 50, 60, 0, 1.3, 1.7, 2.3, 2.7, 3.8, 5) |
| | OU process | (50, 20) |
| Response kernel, η | Bursting | (50, 25, 40, 15) |
| | Decay | (0, 0, 2, 0.5) |
| | Delay | (20, 8, 50, 15) |
Table 6. Rejection (p < 0.05) rate based on the Kolmogorov–Smirnov test for uniformity, using different response kernels with the mixture of stochastic stimuli

| Model | Kernel | RA on RA | RA on PM | PM on RA | PM on PM |
| --- | --- | --- | --- | --- | --- |
| OU | Burst | 22/100 | 99/100 | 100/100 | 19/100 |
| | Decay | 1/100 | 100/100 | 83/100 | 1/100 |
| | Delay | 30/100 | 77/100 | 97/100 | 34/100 |
| Feller | Burst | 23/100 | 100/100 | 95/100 | 22/100 |
| | Decay | 0/100 | 100/100 | 81/100 | 1/100 |
| | Delay | 30/100 | 84/100 | 100/100 | 37/100 |
4.6 Model Selection Accuracy
The results above show that the parameters can be inferred and the correct model can be determined for the specific parameter choices used in the simulations. Here we explore the model selection accuracy for varying parameter values, including the weight, the stimulus dissimilarity, the stimulus strength, and the number of spike trains. In the following analysis, we use the bursting response kernel, a mixture of two stochastic stimuli, and the Fokker–Planck CDF method. To introduce a stimulus dissimilarity, a sinusoidal perturbation is added to one of two identical OU processes, \(\tilde{S}(t) = S(t) + a\sin(10t)\), where t is measured in seconds and a is the perturbation size. To change the stimulus strength, the OU processes are linearly scaled using \(\tilde{S}(t) = bS(t)\), where b denotes the scaling size.
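The two manipulations follow directly from these formulas; a minimal sketch, where `perturb_and_scale` is an illustrative helper name and both paths are assumed to be sampled on the grid t:

```python
import numpy as np

def perturb_and_scale(S, t, a, b):
    """Build the stimulus pair for the model-selection study: the second
    stimulus is the first plus a sinusoidal perturbation of size a, and
    both are linearly scaled by b (t in seconds)."""
    S1 = b * S
    S2 = b * (S + a * np.sin(10 * t))
    return S1, S2
```

Setting a = 0 recovers two identical stimuli (no dissimilarity), and b = 1 leaves the stimulus strength unchanged.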
5 Discussion
5.1 Estimation of the Decay Rate
We have shown that parameter inference can be successfully conducted for the probability-mixing and the response-averaging model on corresponding data incorporating different response kernels for LIF neurons. The decay rate γ has been assumed known. We also attempted to estimate all parameters including γ (results not shown), but the optimization often finds local minima and leads to low accuracy. The estimation of γ seems to suffer from identifiability problems, due to only observing spike times and not the underlying membrane potential. Nevertheless, to estimate γ we may fix it at different values and run the optimization procedure for the rest of the parameters, and then compare the model fit for the different γ values. This is not pursued here.
5.2 Bias of the Numerical Methods
We found that the parameter estimates and the QQ plots from the four methods suffer from over- and underestimation issues. The MLE is based on the first-passage time probabilities, which we obtain using four numerical methods: Fokker–Planck PDF, Fokker–Planck CDF, first Volterra, and second Volterra. Because of the intrinsic differences between these methods, discretization leads to different biases in the calculated spike time PDFs. As seen from Fig. 3, when increasing the grid size, the first Volterra and the Fokker–Planck CDF methods tend to increase the PDF value in the beginning of the ISI, while the second Volterra tends to slightly decrease it. The low accuracy of the first Volterra method arises from a singularity of \(f^{*}(x,t\mid v,s)\) when \(v=x\) and \(t\to s\). The second Volterra method removes this singularity and is therefore more accurate for numerical computations.
5.3 Efficiency of Numerical Methods
We choose the Fokker–Planck CDF method for the estimation of mixtures, because it achieves a good balance between accuracy and computational burden. Table 2 also shows that this method has the smallest variance of the parameter estimates.
Although the first Volterra method is the computationally fastest, it has poor convergence, as seen from the number of loops in the bottom right panel in Fig. 5. Overall, the PDE methods tend to converge faster than the IE methods.
The performance is affected by the grid size. The estimates in Fig. 5 use \(\Delta t=0.002~\mbox{s}\) and \(\Delta x = 0.02\). This discretization generally achieves acceptable computation times and statistical accuracy, but as shown in Sect. 4.3, a finer grid is needed for model selection. One could tune the grid sizes separately for each of the four methods to obtain comparable efficiency and accuracy. However, considering that errors in real data come from many sources, such as measurement error and model approximation, the optimal discretization on simulated data is of limited importance. We therefore suggest the current setting as a generally good balance and do not investigate this further.
5.4 EM for Better Estimation of Mixture Probabilities
Figure 10 shows that the estimation of the mixture probability parameter α is slightly less stable for the marginal MLE than for the EM algorithm. The EM algorithm implicitly enlarges the data size by using latent variables for the mixture probability, referred to as data augmentation [45]. The complete-data log-likelihood function used in the M step does not contain logarithms of sums, making the estimation more stable. By iteratively updating the expectation in the E step and obtaining stable estimates in the M step, the EM algorithm improves the stability when inferring the probability-mixing model and, in general, mixture models.
Although the EM algorithm performs better, it is only slightly better for α and the improvement is negligible or nonexistent for μ and σ. This is because we only use two components in the mixture, which does not generate notable differences between the marginal MLE and the EM algorithm. A larger advantage of the EM algorithm can be expected under more complex stimulus mixtures. Furthermore, the response kernel is fixed, and the two methods use the same initial values for μ and σ (obtained from the single stimulus trials) in the optimization procedure, which also contributes to the similarity of results between the two methods.
5.5 Extension of Noise
In this paper a one-dimensional stochastic differential equation model for the membrane potential, driven by a Wiener process, has been considered. It arises as an approximation to Stein's model [46], leading to the OU model, or to the extended model including reversal potentials proposed by Tuckwell [41], leading to the Feller model [42]. The model does not take into account specific dynamics of synaptic input or ion channels, which affect the dynamics; see, e.g., [47–49], where the autocorrelation of the synaptic input is shown to be an important factor. This is partially accounted for in our model through the memory kernels. Incorporating autocorrelated synaptic input or ion channel dynamics would lead to a multidimensional model. In principle, the first-passage time probabilities could then be obtained by solving multidimensional Fokker–Planck equations [24]. However, the statistical problem is further complicated by the incomplete observations, since typically only the membrane potential is measured, as studied in [50]. In even more realistic models non-Gaussian noise can be included, for example combining the diffusion process with discrete stochastic synaptic stimulus arrivals, leading to a jump-diffusion process, whose Fokker–Planck equation generalizes to an integro-differential equation [51]. Solving multidimensional or generalized Fokker–Planck equations is significantly more expensive, and exact MLE becomes less appealing. This is not pursued here.
5.6 The ResponseAveraging Model
The response-averaging model used here differs slightly from the response-averaging model of Reynolds et al. [8]. In our model the average is taken over the currents for each stimulus, while in their model it is taken over the firing rates for each stimulus. The reason is as follows. A spiking neuron model like the LIF model describes the generation of each single spike rather than the firing rate. Whether in the probability-mixing model, the response-averaging model, or any other model, the spiking is affected by stimuli only through currents. Our model is formulated on this basis, using a unified spike-generating mechanism for both the probability-mixing and the response-averaging model. The firing rate over a time window resulting from a weighted average of single-stimulus currents will also be a weighted average of the single-stimulus firing rates, though with different weights. Our response-averaging model therefore has the same consequences in terms of firing rates as the model of Reynolds et al.
5.7 Model Selection of Probability-Mixing and Response-Averaging
We finish by addressing possible model selection methods for probability mixing and response averaging on real data. We have shown that the probability-mixing and the response-averaging models can be clearly distinguished when fitted to simulated data. However, real data will likely not follow either model exactly, although one of the models might give a better description of the data than the other. More sophisticated methods for model checking and model selection may be needed. Apart from conducting uniformity tests based on the uniform residuals from the transformation (23), such as the KS test used here, we can compare the Akaike information criterion (AIC) and Bayesian information criterion (BIC) between the two models. We have used a unified DIC method because the two models have an equal number of parameters, but AIC and BIC should be used if the models have differing numbers of parameters. Furthermore, the model can also be checked by evaluating the performance of prediction (of spikes) and decoding (of stimuli), using measures such as the root mean squared deviation (RMSD) between empirical and predicted values. See [19] for the use of these approaches to distinguish between the two models on experimental data from the middle temporal visual area of rhesus monkeys.
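The uniformity check via the KS test can be sketched as follows, assuming the residuals u have already been obtained from transformation (23) (the model CDF evaluated at each observed ISI, which is Uniform(0,1) under a correctly specified model). The asymptotic critical value 1.358 corresponds to the 5% level.

```python
import numpy as np

def ks_uniform_statistic(u):
    """Kolmogorov-Smirnov distance between the residuals u in [0, 1] and
    the Uniform(0, 1) distribution."""
    u = np.sort(u)
    n = len(u)
    grid = np.arange(1, n + 1) / n            # ECDF values just after each u_i
    return max(np.max(grid - u), np.max(u - (grid - 1.0 / n)))

def ks_reject(u, critical=1.358):
    """Asymptotic KS test at the 5% level: reject uniformity (and hence the
    fitted model) when sqrt(n) * D exceeds the critical value."""
    return np.sqrt(len(u)) * ks_uniform_statistic(u) > critical
```

Applying this per repetition and counting rejections gives rejection rates of the form reported in Tables 4 and 6.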
Abbreviations
LIF: Leaky integrate-and-fire
PDE: Partial differential equation
IE: Integral equation
EM: Expectation–maximization
ISI: Interspike interval
MLE: Maximum likelihood estimation/estimator
PDF: Probability density function
CDF: Cumulative distribution function
QQ: Quantile–quantile
KS: Kolmogorov–Smirnov
DIC: Deviance information criterion
AIC: Akaike information criterion
BIC: Bayesian information criterion
Declarations
Acknowledgements
The work is part of the Dynamical Systems Interdisciplinary Network, University of Copenhagen.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Gilmore RO, Hou C, Pettet MW, Norcia AM. Development of cortical responses to optic flow. Vis Neurosci. 2007;24:845–56.
 Kanwisher N, Yovel G. The fusiform face area: a cortical region specialized for the perception of faces. Philos Trans R Soc Lond B. 2006;361:2109–28.
 Smith AT, Singh KD, Williams AL, Greenlee MW. Estimating receptive field size from fMRI data in human striate and extrastriate visual cortex. Cereb Cortex. 2001;11:1182–90.
 Gattass R, Nascimento-Silva S, Soares JGM, Lima B, Jansen AK, Diogo ACM, Farias MF, Marcondes M, Botelho EP, Mariani OS, Azzi J, Fiorani M. Cortical visual areas in monkeys: location, topography, connections, columns, plasticity and cortical dynamics. Philos Trans R Soc Lond B. 2005;360:709–31.
 Kanwisher N, Yovel G. The fusiform face area: a cortical region specialized for the perception of faces. Philos Trans R Soc Lond B, Biol Sci. 2006;361(1476):2109–28.
 Gilmore RO, Hou C, Pettet MW, Norcia AM. Development of cortical responses to optic flow. Vis Neurosci. 2007;24(6):845–56.
 Freeman J, Simoncelli EP. Metamers of the ventral stream. Nat Neurosci. 2011;14(9):1195–201.
 Reynolds JH, Chelazzi L, Desimone R. Competitive mechanisms subserve attention in macaque areas V2 and V4. J Neurosci. 1999;19:1736–53.
 Bundesen C, Habekost T, Kyllingsbæk S. A neural theory of visual attention: bridging cognition and neurophysiology. Psychol Rev. 2005;112(2):291–328.
 Bundesen C, Habekost T. Principles of visual attention: linking mind and brain. Oxford: Oxford University Press; 2008.
 Reynolds JH, Heeger DJ. The normalization model of attention. Neuron. 2009;61(2):168–85.
 Zoccolan D, Cox DD, DiCarlo JJ. Multiple object response normalization in monkey inferotemporal cortex. J Neurosci. 2005;25(36):8150–64.
 Recanzone GH, Wurtz RH, Schwarz U. Responses of MT and MST neurons to one and two moving objects in the receptive field. J Neurophysiol. 1997;78(6):2904–15.
 Britten KH, Heuer HW. Spatial summation in the receptive fields of MT neurons. J Neurosci. 1999;19(12):5074–84.
 Nandy AS, Sharpee TO, Reynolds JH, Mitchell JF. The fine structure of shape tuning in area V4. Neuron. 2013;78(6):1102–15.
 Busse L, Wade AR, Carandini M. Representation of concurrent stimuli by population activity in visual cortex. Neuron. 2009;64(6):931–42.
 MacEvoy SP, Tucker TR, Fitzpatrick D. A precise form of divisive suppression supports population coding in the primary visual cortex. Nat Neurosci. 2009;12(5):637–45.
 Lee J, Maunsell JH. A normalization model of attentional regulation of single unit responses. PLoS ONE. 2009;4:e4651. View ArticleGoogle Scholar
 Li K, Kozyrev V, Kyllingsbæk S, Treue S, Ditlevsen S, Bundesen C. Neurons in primate visual cortex alternate between responses to multiple stimuli in their receptive field. Submitted. 2016.
 Burkitt AN. A review of the integrateandfire neuron model: I. Homogeneous synaptic input. Biol Cybern. 2006;95(1):1–19. MathSciNetView ArticleMATHGoogle Scholar
 Sacerdote L, Giraudo MT. Stochastic integrate and fire models: a review on mathematical methods and their applications. In: Bachar B, Batzel JJ, Ditlevsen S, editors. Stochastic biomathematical models with applications to neuronal modeling. New York: Springer; 2013. p. 99–148. (Lecture notes in mathematics, vol. 2058). View ArticleGoogle Scholar
 Gerstner W, Kistler WM. Spiking neuron models: single neurons, populations, plasticity. Cambridge: Cambridge University Press; 2002. View ArticleMATHGoogle Scholar
 Gerstner W, Van Hemmen JL, Cowan JD. What matters in neuronal locking? Neural Comput. 1996;8(8):1653–76. View ArticleGoogle Scholar
 Paninski L, Pillow JW, Simoncelli EP. Maximum likelihood estimation of a stochastic integrateandfire neural encoding model. Neural Comput. 2004;16(12):2533–61. View ArticleMATHGoogle Scholar
 Sirovich L, Knight B. Spiking neurons and the first passage problem. Neural Comput. 2011;23(7):1675–703. MathSciNetView ArticleMATHGoogle Scholar
 Russell A, Orchard G, Dong Y, Mihalas S, Niebur E, Tapson J, EtienneCummings R. Optimization methods for spiking neurons and networks. IEEE Trans Neural Netw. 2010;21(12):1950–62. View ArticleGoogle Scholar
 Iolov A, Ditlevsen S, Longtin A. Fokker–Planck and Fortet equationbased parameter estimation for a leaky integrateandfire model with sinusoidal and stochastic forcing. J Math Neurosci. 2014;4(1):4. MathSciNetView ArticleMATHGoogle Scholar
 Dong Y, Mihalas S, Russell A, EtienneCummings R, Niebur E. Parameter estimation of historydependent leaky integrateandfire neurons using maximumlikelihood methods. Neural Comput. 2011;23(11):2833–67. View ArticleMATHGoogle Scholar
 Ditlevsen S, Lansky P. Parameters of stochastic diffusion processes estimated from observations of firsthitting times: application to the leaky integrateandfire neuronal model. Phys Rev E. 2007;76(4):041906. View ArticleGoogle Scholar
 Ditlevsen S, Ditlevsen O. Parameter estimation from observations of firstpassage times of the Ornstein–Uhlenbeck process and the Feller process. Probab Eng Mech. 2008;23(2):170–9. View ArticleGoogle Scholar
 Pillow JW, Paninski L, Uzzell VJ, Simoncelli EP, Chichilnisky EJ. Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. J Neurosci. 2005;25(47):11003–13. View ArticleGoogle Scholar
 Redner S. A guide to firstpassage processes. Cambridge: Cambridge University Press; 2001. View ArticleMATHGoogle Scholar
 Karlin S, Taylor HM. A second course in stochastic processes. vol. 2. Houston: Gulf Pub; 1981. MATHGoogle Scholar
 Lansky P, Ditlevsen S. A review of the methods for signal estimation in stochastic diffusion leaky integrateandfire neuronal models. Biol Cybern. 2008;99:253–62. MathSciNetView ArticleMATHGoogle Scholar
 Hurn AS, Jeisman J, Lindsay K. ML estimation of the parameters of SDEs by numerical solution of the Fokker–Planck equation. In: MODSIM 2005: international congress on modelling and simulation: advances and applications for management and decision making. 2005. p. 849–55. Google Scholar
 Paninski L, Haith A, Szirtes G. Integral equation methods for computing likelihoods and their derivatives in the stochastic integrateandfire model. J Comput Neurosci. 2008;24(1):69–79. MathSciNetView ArticleGoogle Scholar
 Press WH. Numerical recipes: the art of scientific computing. 3rd ed. Cambridge: Cambridge University Press; 2007. MATHGoogle Scholar
 Ditlevsen S, Lansky P. Estimation of the input parameters in the Ornstein–Uhlenbeck neuronal model. Phys Rev E. 2005;71:011907. MathSciNetView ArticleGoogle Scholar
 Buonocore A, Nobile AG, Ricciardi LM. A new integral equation for the evaluation of firstpassagetime probability densities. Adv Appl Probab. 1987;19:784–800. MathSciNetView ArticleMATHGoogle Scholar
 Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc, Ser B, Methodol. 1977;39:1–38. MathSciNetMATHGoogle Scholar
 Tuckwell HC. Synaptic transmission in a model for neuronal activity. J Theor Biol. 1979;77:65–81. MathSciNetView ArticleGoogle Scholar
 Lansky P, Lanska V. Diffusion approximations of the neuronal model with synaptic reversal potentials. Biol Cybern. 1987;56:19–26. MathSciNetView ArticleMATHGoogle Scholar
 Ditlevsen S, Lansky P. Estimation of the input parameters in the Feller neuronal model. Phys Rev E. 2006;73:061910. MathSciNetView ArticleMATHGoogle Scholar
 Burnham KP, Anderson DR. Model selection and multimodel inference: a practical informationtheoretic approach. New York: Springer; 2003. MATHGoogle Scholar
 Hastie T, Tibshirani R, Friedman J, Hastie T, Friedman J, Tibshirani R. The elements of statistical learning. vol. 2. New York: Springer; 2009. View ArticleMATHGoogle Scholar
 Stein RB. A theoretical analysis of neuronal variability. Biophys J. 1965;5:173–95. View ArticleGoogle Scholar
 Brunel N, Sergi S. Firing frequency of leaky integrateandfire neurons with synaptic current dynamics. J Theor Biol. 1998;195(1):87–95. View ArticleGoogle Scholar
 Moreno R, de la Rocha J, Renart A, Parga N. Response of spiking neurons to correlated inputs. Phys Rev Lett. 2002;89:288101. View ArticleGoogle Scholar
 MorenoBote R, Parga N. Role of synaptic filtering on the firing response of simple model neurons. Phys Rev Lett. 2004;92:028102. View ArticleGoogle Scholar
 Ditlevsen S, Samson A. Estimation in the partially observed stochastic Morris–Lecar neuronal model with particle filter and stochastic approximation methods. Ann Appl Stat. 2014;8(2):674–702. MathSciNetView ArticleMATHGoogle Scholar
 Hanson FB. Applied stochastic processes and control for jumpdiffusions: modeling, analysis, and computation. vol. 13. Philadelphia: SIAM; 2007. View ArticleMATHGoogle Scholar