Responses of Leaky Integrate-and-Fire Neurons to a Plurality of Stimuli in Their Receptive Fields
The Journal of Mathematical Neuroscience, volume 6, Article number: 8 (2016)
Abstract
A fundamental question concerning the way the visual world is represented in our brain is how a cortical cell responds when its classical receptive field contains a plurality of stimuli. Two opposing models have been proposed. In the response-averaging model, the neuron responds with a weighted average of all individual stimuli. By contrast, in the probability-mixing model, the cell responds to a plurality of stimuli as if only one of the stimuli were present. Here we apply the probability-mixing and the response-averaging model to leaky integrate-and-fire neurons, to describe neuronal behavior based on observed spike trains. We first estimate the parameters of either model using numerical methods, and then test which model is most likely to have generated the observed data. Results show that the parameters can be successfully estimated and the two models are distinguishable using model selection.
Introduction
The receptive field of a neuron in the visual system can be defined as the spatial area in which stimulation changes the firing pattern of the neuron. In primary visual cortex, receptive fields are small, with typical values of, for example, 0.5–2 deg of visual angle near the fovea. Moving up the hierarchy of extrastriate visual areas along either the dorsal [1] or the temporal [2] pathway, receptive field sizes grow substantially [3, 4], reaching, for example, a value of about 30 deg in the inferotemporal cortex. A plausible explanation is that since these areas process more complex aspects of the visual environment, information has to be integrated over larger spatial areas, such as when encoding faces [5] in the ventral pathway or optic flow patterns [6] in the dorsal one. Typically, receptive fields that are so big will contain a plurality of distinct stimulus objects rather than just a single stimulus object [7]. The way a cortical cell responds when its classical receptive field contains a plurality of stimuli is a basic question concerning the way the visual world is represented in our brain.
Probability-Mixing and Response-Averaging
In a pioneering study, Reynolds et al. [8] found that a typical cell in visual area V2 or V4 in monkeys responded to a pair of objects in its classical receptive field by adopting a rate of firing which, averaged across trials, equaled a weighted average of the responses to the individual objects when these were presented one at a time, with greater weight on an object the more attention was directed to the object. Reynolds et al. accounted for their data by proposing that on each individual trial, the firing rate of a cell to a plurality of stimulus objects equaled a weighted average of the firing rates to the individual objects when these were presented alone. Bundesen et al. [9, 10] proposed an alternative explanation of the data of Reynolds et al. by pointing out that the effects observed in firing rates that were averaged across trials could be explained by assuming that on each individual trial, when a plurality of objects were presented, the cell responded as if just one of the objects was presented alone, so that across trials, the response of the cell was a probability mixture of the responses to the individual objects when these were presented alone.
In the response-averaging model proposed by Reynolds et al. [8] (see also [11–18]), the neuron responds with a weighted average of the responses to single stimuli. By contrast, in the probability-mixing model proposed by Bundesen et al. [9], the neuron responds at any given time to only one of the single stimuli with certain probabilities. Suppose the stimulus $S(t)$ presented to the neuron consists of K separated single stimuli, denoted by $S_{1}(t) , \ldots, S_{K}(t)$. In the response-averaging model, the neuron responds with a weighted average of responses to single stimuli, $\sum_{k} \beta_{k} I_{k}(t)$, with $\beta_{k}$ being the weights, and $\sum_{k}\beta_{k}=1$. Here $I_{k}(t)$ denotes the effect that $S_{k}$ has on the spiking neuron model, which we set to be the stimulus current. In the probability-mixing model, the response of the neuron equals one of the responses the neuron would have had if only a single stimulus was presented, according to a probability mixture with probabilities $\alpha_{1}, \ldots, \alpha_{K}$, and $\sum_{k} \alpha_{k}=1$.
In our previous study [19], we compared the abilities of the probability-mixing model and the response-averaging model to account for spike trains (i.e., times of action potentials obtained from extracellular recordings) recorded from single cells in the middle temporal visual area (MT) of rhesus monkeys. Point processes were employed to model the spike trains. Results supported the probability-mixing model.
In this article, we combine the probability-mixing and the response-averaging model with the leaky integrate-and-fire (LIF) model, to describe neuronal behavior based on observed spike trains. This is cast in a general setting, where the stimulus $S(t)$ is represented as an input current to the neuron. The spike train data are simulated using the LIF model, responding either to a single stimulus or to a stimulus pair. In the case of a stimulus pair, both response averaging and probability mixing are used. The first goal of the paper is to estimate parameters of either of the two models from spike train data. The second goal is to test which of the two models is most likely to have generated the observed data.
The Leaky Integrate-and-Fire Model
The LIF models have been extensively applied to model the membrane potential evolution in single neurons in computational neuroscience (for reviews, see [20, 21]). The model has some biophysical realism while still maintaining mathematical simplicity. The simplest LIF model is an Ornstein–Uhlenbeck (OU) process with constant conductance, leak potential, and diffusion coefficient. More biophysical realism can be obtained by allowing for post-spike currents generated by past spikes [22]. Here we use post-spike currents generated via three types of kernels [23, 24]: bursting, decaying, and delaying kernels, all modeled by the difference between two decaying exponentials, but any kernel could be used.
Temporal Stimulus
Constant stimuli are simple to handle and are widely used in both experiments and modeling work. However, real world stimuli are generally time varying. If they for example contain oscillatory components, the generated spike trains might also contain oscillations in the firing rates. Here we use three types of stimuli: oscillatory stimuli described by sinusoidal functions, pulsing stimuli modeled by piecewise constant functions, and stochastic stimuli described by OU processes.
Method Summary
We combine the models describing neuronal response to a plurality of stimuli, namely the probability-mixing model and the response-averaging model, with the LIF framework, for different types of stimuli and response kernels. Parameter estimation is done by maximum likelihood using first-passage time probabilities of diffusion processes [25]. We solve the first-passage time problem by numerically solving either a partial differential equation (PDE), the Fokker–Planck equation, or an integral equation (IE), the Volterra integral equation. Numerical solutions of these equations have been extensively explored and applied in the computations of neuronal spike trains [26–28]. Inspired by these previous studies, we apply four numerical methods, including two Fokker–Planck related PDEs and two kinds of Volterra IEs, and compare the performance of the four methods. We also describe and compare two alternative methods for maximizing the likelihood function of the probability-mixing model, which are direct maximization of the marginal likelihood and the expectation–maximization (EM) algorithm. Finally, we show that the probability-mixing model and the response-averaging model can be distinguished in the LIF framework, by comparing parameter estimates and through uniform residual tests.
Leaky Integrate-and-Fire Model with Stimuli Mixtures
The evolution of the membrane potential is described by the solution to the following stochastic differential equation:
$$dX(t) = b\bigl(X(t), t\bigr)\, dt + \sigma\, dW(t), \qquad b\bigl(X(t), t\bigr) = -\gamma\bigl(X(t)-\mu\bigr) + I(t) + H(t), \qquad X\bigl(t_{j}^{+}\bigr) = x_{0}, \tag{1}$$
where $t_{j}^{+}$ denotes the right limit taken at $t_{j}$. The drift term $b(\cdot)$ contains three currents: the leak current $-\gamma(X(t)-\mu)$, where γ is the decay rate and μ is the reversal potential, the stimulus-driven current $I(t)$, and the post-spike current $H(t)$. The potential $X(t)$ evolves until it reaches the threshold, $x_{\mathrm{th}}$, where it resets to $x_{0}$. Since only the spike times $d=(t_{1}, t_{2}, \ldots)$ are observed, and not the membrane potential $X(t)$, we can use any values for threshold and reset suitable for the numerical calculation. The noise is described by the standard Wiener process, $W(t)$, and the diffusion parameter, σ. The interspike intervals (ISIs) are defined by $t_{j+1}-t_{j}$.
The stimulus current $I(t)$ is shaped from the external stimulus current through a stimulus kernel $k_{s}(t)$ as $I(t)=\int_{-\infty}^{t}k_{s}(t-s)S(s)\, ds$, where $S(s)$ denotes the external current at time s. Similarly, the post-spike current arises from past spikes through a response kernel $k_{h}(t)$ by $H(t)=\int_{-\infty}^{t}k_{h}(t-s)\mathbb{I}(s)\, ds$. Here $\mathbb{I}(s)=\sum_{\tau\in d}\delta(s-\tau)$ describes the spike train, where $\delta(\cdot)$ denotes the Dirac delta function.
In this work, the stimulus kernel is assumed to be without memory, such that $k_{s}(t) = \delta(t)$. Then the stimulus current $I(t)$ is completely determined by the stimulus at time t, i.e., $I(t)=S(t)$. The response kernel is assumed to be the difference of two exponentials decaying over time,
$$k_{h}(t) = \eta_{1} e^{-\eta_{2} t} - \eta_{3} e^{-\eta_{4} t}, \tag{2}$$
with four positive parameters, $\eta= (\eta_{1},\eta_{2},\eta_{3},\eta_{4})$. By adjusting the parameters, different kernels are obtained. Note that in practice the four parameters are not identifiable, because different parameter sets can result in very similar kernels. Therefore, when we later verify parameter estimates we will not check each individual estimate, but only plot the estimated shape of the kernel function, which is the quantity of interest.
Three types of kernels are used, shown in the left panels of Fig. 1. The bursting kernel is characterized by being positive in the beginning, then turning negative, and finally converging toward 0, which happens when $\eta_{1}>\eta_{3}$ and $\eta_{2}>\eta_{4}$. It follows that the most recent spikes have excitatory effects on the current spike probability, but the accumulation of past spikes has inhibitory effects, resulting in rhythmic spiking with bursts. The decaying kernel has only one negative exponential, obtained by setting $\eta_{1} = 0$. The parameters $\eta_{3}$ and $\eta_{4}$ are small such that the inhibitory effects are small but long-lasting, making the firing rate decay slowly over time. The delaying kernel has parameters $\eta_{1} < \eta_{3}$ and $\eta_{2} < \eta_{4}$. It is negative in the beginning, then turns positive, and finally converges to 0. The most recent spikes have inhibitory effects, neutralized later on by the accumulation of excitatory effects. This delays the formation of a new spike immediately after a spike and prevents short ISIs, which models the refractory period. In the center panels, example spike trains for the different kernels and different stimuli are illustrated.
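The sign patterns of the three kernels can be checked numerically. The following is a minimal sketch, assuming the kernel has the difference-of-exponentials form $k_{h}(t)=\eta_{1}e^{-\eta_{2}t}-\eta_{3}e^{-\eta_{4}t}$; the function names and all numerical parameter values are hypothetical and chosen only to satisfy the stated inequalities, they are not the values used in the paper.

```python
import math

def response_kernel(t, eta):
    """Difference of two decaying exponentials (assumed kernel form)."""
    eta1, eta2, eta3, eta4 = eta
    return eta1 * math.exp(-eta2 * t) - eta3 * math.exp(-eta4 * t)

def post_spike_current(t, spike_times, eta):
    """H(t): sum of kernel contributions from all spikes strictly before t."""
    return sum(response_kernel(t - tau, eta) for tau in spike_times if tau < t)

# Hypothetical parameter sets, chosen only to match the inequalities in the text.
BURSTING = (40.0, 60.0, 20.0, 15.0)  # eta1 > eta3 and eta2 > eta4
DECAYING = (0.0, 1.0, 2.0, 1.0)      # eta1 = 0: a single negative exponential
DELAYING = (10.0, 10.0, 40.0, 50.0)  # eta1 < eta3 and eta2 < eta4

if __name__ == "__main__":
    for name, eta in [("bursting", BURSTING), ("decaying", DECAYING), ("delaying", DELAYING)]:
        values = [round(response_kernel(t, eta), 3) for t in (0.0, 0.1, 1.0)]
        print(name, values)
```

With these values the bursting kernel starts positive and turns negative, the delaying kernel does the opposite, and the decaying kernel stays negative, in line with the description above.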
Current from Stimulus Mixture
Suppose that inside the receptive field of the neuron there are at least two separated, non-overlapping stimuli, which we will call a stimulus mixture. According to the probability-mixing model [9], the neuron responds to only one stimulus at any given time with certain probabilities. Thus, for a total of K stimuli, the stimulus-driven current, $I(t)$, follows a probability mixture:
$$I(t) = I_{k}(t) \quad \text{with probability } \alpha_{k},$$
for $k=1,\ldots,K$ and $\sum_{k=1}^{K}\alpha_{k}=1$. Recall that the stimulus kernel is $k_{s}(t) = \delta(t)$ and thus the current caused by the kth stimulus is $I_{k}(t)=S_{k}(t)$. According to the response-averaging model [11], the current is a weighted average of all stimuli currents:
$$I(t) = \sum_{k=1}^{K} \beta_{k} I_{k}(t).$$
The leak current and the spike response current do not depend on the stimuli.
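The difference between the two ways of forming the stimulus-driven current can be sketched in a few lines. This is an illustration only: the function names are hypothetical, stimuli are represented as plain Python callables of time, and the attended stimulus is drawn once per trial, following the assumption made later in the text that the neuron does not switch within a trial.

```python
import random

def averaged_current(stimuli, beta, t):
    """Response averaging: weighted average of all stimulus currents at time t."""
    return sum(b * s(t) for b, s in zip(beta, stimuli))

def draw_attended_stimulus(stimuli, alpha, rng):
    """Probability mixing: pick one stimulus for the whole trial with probabilities alpha."""
    k = rng.choices(range(len(stimuli)), weights=alpha, k=1)[0]
    return k, stimuli[k]

if __name__ == "__main__":
    s1 = lambda t: 1.0   # hypothetical constant stimuli
    s2 = lambda t: 3.0
    rng = random.Random(1)
    print(averaged_current([s1, s2], [0.25, 0.75], 0.0))   # one blended current
    k, s = draw_attended_stimulus([s1, s2], [0.25, 0.75], rng)
    print(k, s(0.0))                                       # current of one stimulus only
```

Under response averaging every trial sees the same blended current, whereas under probability mixing each trial sees the current of exactly one stimulus, and only the proportions across trials reflect the probabilities.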
In the top panels of Fig. 1 three types of stimuli are illustrated. A sinusoidal stimulus is defined by
with four parameters $s_{\mathrm{sin}}=(s_{1},s_{2},s_{3},s_{4})$ describing the stimulus. Note that it also covers a constant stimulus for $s_{1}=0$. A piecewise constant stimulus is defined by
with parameters $s_{pw} = (s_{1}, s_{2}, \ldots, s_{n}, t_{1}, t_{2}, \ldots, t_{n+1}) $. A stochastic stimulus is given by an OU process described by the SDE:
with two parameters $s_{\mathrm{OU}} = (s_{1}, s_{2})$. We assume throughout that the stimuli currents are known. Spike patterns from combinations of different types of stimuli and response kernels are shown in Fig. 1. Clear bursting, decaying and delaying effects can be seen.
Two example spiking patterns together with their voltage traces, generated from either a sinusoidal or a constant stimulus together with a bursting post-spike kernel, are shown in Fig. 2. Even under a constant stimulus, bursts of spikes occur occasionally because of the bursting response kernel. A sinusoidal stimulus causes long bursts, and in addition, the bursting kernel causes a clear separation of small burst periods also within the long bursting period.
Maximum Likelihood Estimation Using First-Passage Time Probabilities
Our objective here is to estimate the parameters μ and σ from (1), the response kernel function $k_{h}$ in (2) represented by the parameter vector η, and either the probability vector of the stimuli in the mixture, $\alpha= (\alpha_{1}, \ldots, \alpha_{K})$, under the probability-mixing model, or the vector of weights in the average, $\beta= (\beta_{1}, \ldots, \beta_{K})$, in the response-averaging model. The estimation of the decay rate γ is difficult when there is no access to the membrane potential, but only spike times are observable, as discussed in [29, 30]. We therefore assume γ is known. The vector of all parameters in the model is thus θ, where $\theta=(\mu, \sigma, \eta, \alpha)$ in the probability-mixing model, and $\theta =(\mu, \sigma, \eta, \beta)$ in the response-averaging model. The stimulus is assumed known and the stimulus parameter vector s is therefore not estimated.
A similar LIF model with different stimulus and response kernels on single piecewise constant stimuli was used in Paninski et al. [24]. They showed that parameters can be estimated using MLE by solving the Fokker–Planck equation, and also discussed non-white noise and interneuronal interactions. The model was later applied to experimental data collected from the retina of macaque monkeys [31]. Here we estimate parameters in the LIF model for various temporal stimuli and different response kernels, using four different numerical methods to calculate the likelihood function, within the framework of either the probability-mixing or the response-averaging model.
Suppose we observe N spike trains, $D=(d_{1}, \ldots, d_{N})$, all responding to the same stimulus mixture, where the ith spike train consists of $N_{i}$ spike times, $d_{i}=(t_{1}^{i}, \ldots, t_{N_{i}}^{i})$. The jth ISI of the ith spike train is then given by $t^{i}_{j+1}-t^{i}_{j}$. Assume that each measured spike train, i.e., each trial, is sufficiently short, such that, under the probability-mixing model, the neuron is only responding to one stimulus within the stimulus mixture, not switching the response within the trial.
First-Passage Times and Probability Distributions
Modeling the spike train data as threshold crossings of the underlying diffusion process representing the unobserved membrane potential is an instance of the so-called first-passage time problem [32, 33]. For models with no effects from past spikes, such that ISIs are assumed i.i.d., one approach is to build loss functions using the Fortet equation [29, 30]; see also [34]. A more general method, which allows for the post-spike effects in model (1), is to use maximum likelihood estimation (MLE) from numerical solutions of PDEs or IEs for the conditional distribution of the spike times or, equivalently, the ISIs.
We use the following notation for the probability density functions (PDFs) and cumulative distribution functions (CDFs) of interest:
All the above distributions depend on the spike history up to time t, denoted by $\mathcal{H}_{t}$, the parameter vector θ and the stimulus $S(t)$. In the following, we sometimes suppress these dependencies in the notation for readability. We write $g_{k}(t;\theta)=g(t \mid \mathcal{H}_{t},\theta,S_{k}(t))$ for the probability density of the spike time when the neuron is only presented with the single stimulus k.
The probability that the neuron has not yet fired at time t, $1-G(t)$, is equal to the probability that the membrane potential has not yet reached $x_{\mathrm{th}}$, $F(x_{\mathrm{th}}, t)$. Thus, the probability density of a spike time is given by [24, 27, 35]
$$g(t) = -\frac{\partial}{\partial t} F(x_{\mathrm{th}}, t).$$
The solution of the Fokker–Planck equation provides $f(x,t)$ and $F(x,t)$, and therefore also $g(t)$. The solution of the Volterra integral equation directly provides $g(t)$ [36]. Calculating $g(t)$ enables us to do MLE, as explained in Sects. 3.5 and 3.6 below.
Fokker–Planck Equation
The PDF of $X_{t}$ in Eq. (1) with a resetting threshold, $f(x,t)$, solves the Fokker–Planck equation, defined by the following PDE [21, 27, 33]:
$$\partial_{t} f(x,t) = -\partial_{x}\bigl[b(x,t) f(x,t)\bigr] + \frac{\sigma^{2}}{2}\, \partial_{x}^{2} f(x,t),$$
with absorbing boundary condition $f(x_{\mathrm{th}}, t)=0$ and initial condition $f(x, 0)=\delta(x-x_{0})$. To solve the equation numerically we also impose a reflecting boundary condition at a small value $x=x^{-}$, where the flux equals 0: $J(x^{-}, t) = b(x^{-},t)f(x^{-},t) - \sigma^{2} \partial_{x}f(x^{-},t) / 2=0$. We call this method the Fokker–Planck PDF method.
Another approach is to formulate the PDE for the CDF, i.e., $F(x,t)$ [27, 35] (see Appendix A.2):
$$\partial_{t} F(x,t) = -b(x,t)\, \partial_{x} F(x,t) + \frac{\sigma^{2}}{2}\, \partial_{x}^{2} F(x,t),$$
with equivalent boundary conditions: $\partial_{x}F(x_{\mathrm{th}}, t)=0$, $F(x^{-}, t)=0$, and initial condition: $F(x, 0)=H(x-x_{0})$, where $H(\cdot)$ is the Heaviside step function. This is then called the Fokker–Planck CDF method.
Both PDEs are solved numerically using the Crank–Nicolson finite difference method, together with the Thomas algorithm, which efficiently solves tridiagonal systems [37]. Whichever method we use, we can always obtain the PDF (CDF) from the CDF (PDF) by numerical differentiation (integration).
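Each Crank–Nicolson time step leads to a tridiagonal linear system in the space grid values, which is where the Thomas algorithm enters. The following is a generic sketch of that algorithm, not the authors' implementation; the function name and the argument layout (three diagonals passed as separate lists) are assumptions made for illustration.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs in O(n) operations.

    lower[i] is A[i][i-1] (lower[0] is unused), diag[i] is A[i][i],
    upper[i] is A[i][i+1] (upper[-1] is unused). No pivoting is performed,
    so the matrix should be diagonally dominant or otherwise well behaved.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

if __name__ == "__main__":
    # A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]] and rhs chosen so that x = [1, 2, 3].
    print(thomas_solve([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [4.0, 8.0, 8.0]))
```

Because the cost per time step is linear in the number of space grid points, the overall cost of the PDE methods grows with the product of the time and space grid sizes.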
Volterra Integral Equation
The first-kind Volterra IE (Fortet equation) combines the first-passage time PDF $g(t)$ with the threshold-free membrane potential PDF $f^{*}(x, t \mid v, s)$ using the law of total probability [29, 30]:
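For the OU model (1), the threshold-free PDF $f^{*}(x, t \mid v, s)$ is Gaussian [33, 38]:
$$f^{*}(x, t \mid v, s) = \frac{1}{\sqrt{2\pi V(t \mid s)}} \exp\left(-\frac{\bigl(x - m(t \mid v, s)\bigr)^{2}}{2 V(t \mid s)}\right),$$
with mean
$$m(t \mid v, s) = v\, e^{-\gamma(t-s)} + \int_{s}^{t} e^{-\gamma(t-u)} I_{\mathrm{total}}(u)\, du$$
and variance
$$V(t \mid s) = \frac{\sigma^{2}}{2\gamma}\left(1 - e^{-2\gamma(t-s)}\right).$$
The total current is denoted by $I_{\mathrm{total}}(t) = \gamma\mu+I(t)+H(t)$.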
The initial condition for the IE is $g(0)=0$. Using this, we can solve the equation recursively and obtain $g(t)$.
The second-kind Volterra IE is defined by [39]
where
A modification of $\psi(x, t \mid v, s)$ is proposed to avoid a singularity when $t \to s$ [36, 39] (see Appendix A.3):
The second-kind Volterra IE can also be solved numerically. It requires more computation time than the first-kind, but has higher accuracy.
Computational Time Complexity
For both the Fokker–Planck PDE and the Volterra IE methods, the time complexity is directly related to the grid size for the numerical solution. Specifically, suppose that the grid size of the time discretization is n and the size of the space discretization is m. Then the Fokker–Planck method has complexity on the order of $O(m n)$ and the Volterra method is on the order of $O(n^{2})$ (a naive implementation requires $O(n^{3})$, but techniques are applied to reduce the complexity to $O(n^{2})$; see [36]). Furthermore, the computation time is strongly affected by the response kernel used. A discretization is applied to approximate the nonlinear kernel by a piecewise constant function with sufficiently small segment length. The values of the constant segments are calculated and stored in a data vector whenever the parameters are updated. Inside the optimization loop, the kernel function is then evaluated by looking up this data vector.
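The kernel caching described above can be sketched as a simple lookup table. This is an illustration under stated assumptions rather than the authors' code: the function names, the table layout, and the choice to return 0 beyond the tabulated range are all hypothetical.

```python
import math

def build_kernel_table(kernel, t_max, dt):
    """Tabulate the kernel on a regular grid; intended to run once per parameter update."""
    n = int(math.ceil(t_max / dt)) + 1
    return [kernel(i * dt) for i in range(n)]

def lookup_kernel(table, dt, t):
    """Piecewise constant approximation: value at the left end of the segment containing t."""
    if t < 0.0:
        return 0.0  # the kernel is causal: no effect before the spike
    i = int(t / dt)
    # Beyond the tabulated range the kernel is treated as having decayed to 0 (assumption).
    return table[i] if i < len(table) else 0.0

if __name__ == "__main__":
    table = build_kernel_table(lambda t: math.exp(-t), t_max=1.0, dt=0.125)
    print(len(table), lookup_kernel(table, 0.125, 0.3))
```

The exponentials are then evaluated only when the table is rebuilt, and every kernel evaluation inside the likelihood computation reduces to an index calculation.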
Marginal Likelihood of the Probability-Mixing Model
Under the probability-mixing model, the marginal likelihood function of the ith spike train $d_{i}=(t_{1}^{i}, \ldots, t_{N_{i}}^{i})$ for a mixture of K stimuli is given by
$$L(\theta; d_{i}) = \sum_{k=1}^{K} \alpha_{k} \prod_{j=1}^{N_{i}} g_{k}\bigl(t_{j}^{i}; \theta\bigr),$$
and thus the marginal log-likelihood of all N spike trains $D=(d_{1}, \ldots, d_{N})$ is
$$\log L(\theta; D) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \alpha_{k} \prod_{j=1}^{N_{i}} g_{k}\bigl(t_{j}^{i}; \theta\bigr). \tag{19}$$
Marginal refers to the observed data D; see Sect. 3.5.1 below for a definition of the full data. MLEs are then obtained by maximizing (19). The log-likelihood function consists of logarithms of sums, and the calculations are prone to numerical over- or underflow. To overcome this, we apply the log-sum-exp formula [37].
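A minimal sketch of the log-sum-exp idea applied to a mixture log-likelihood is given below. It assumes that the summed log-densities of each spike train under each single stimulus have already been computed by one of the numerical methods; the function names and the input layout `log_g[i][k]` are hypothetical, and all mixture probabilities are assumed strictly positive.

```python
import math

def log_sum_exp(values):
    """Stable evaluation of log(sum(exp(v))) by factoring out the maximum."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginal_log_likelihood(log_g, alpha):
    """Mixture log-likelihood over all spike trains.

    log_g[i][k] is the sum over spikes of log g_k for train i under stimulus k;
    alpha[k] > 0 are the mixture probabilities.
    """
    total = 0.0
    for per_stimulus in log_g:
        total += log_sum_exp([math.log(a) + lg for a, lg in zip(alpha, per_stimulus)])
    return total

if __name__ == "__main__":
    # exp(-1000) underflows to 0.0 in double precision, so the naive formula would fail here.
    print(marginal_log_likelihood([[-1000.0, -1002.0]], [0.5, 0.5]))
```

Working with summed log-densities instead of products of densities is what makes long spike trains tractable, since the product over many spikes would otherwise underflow.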
Optimizing the Likelihood Using the Expectation-Maximization Algorithm
As an alternative to directly optimizing the log-likelihood function (19), the EM algorithm [40] is well suited to solve optimization problems for mixture models and is simple to implement. The EM algorithm treats the unknown stimulus mixture component which the neuron responds to as unobserved data, or latent variables. We write $Y=(y_{1}, \ldots, y_{N})$ where $y_{i} \in\{1, 2, \ldots, K\}$, for the latent variables indicating which single stimulus each spike train is responding to. The full data then include both the observed spike trains D and the unobserved stimuli response Y.
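The EM algorithm is an iterative procedure. In each iteration, the expectation of the full data log-likelihood, conditional on the parameters from the previous iteration, is maximized to obtain the optimal parameters for the current iteration. The algorithm runs until convergence, i.e., until the difference of parameter estimates between two adjacent iterations is sufficiently small. We use the notation θ for the current parameter to estimate, and $\theta_{-1}$ for the parameter estimated in the previous iteration, and likewise for the components of the probability vector α, i.e., $\alpha_{k}$ and $(\alpha_{k})_{-1}$.
In each iteration, the conditional expectation of the full data log-likelihood is (see Appendix A.1 for the derivation),
$$Q(\theta \mid \theta_{-1}) = \sum_{i=1}^{N} \sum_{k=1}^{K} P(y_{i}=k \mid d_{i}, \theta_{-1}) \Biggl[\log \alpha_{k} + \sum_{j=1}^{N_{i}} \log g_{k}\bigl(t_{j}^{i}; \theta\bigr)\Biggr],$$
where the conditional probability is obtained using the Bayes formula:
$$P(y_{i}=k \mid d_{i}, \theta_{-1}) = \frac{(\alpha_{k})_{-1} \prod_{j=1}^{N_{i}} g_{k}\bigl(t_{j}^{i}; \theta_{-1}\bigr)}{\sum_{l=1}^{K} (\alpha_{l})_{-1} \prod_{j=1}^{N_{i}} g_{l}\bigl(t_{j}^{i}; \theta_{-1}\bigr)}.$$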
The EM algorithm requires the calculation of the likelihood of the spike train for all components in the mixture. Thus, the EM algorithm has (approximately) the same time complexity regarding the number of evaluations of density functions as the calculation of the marginal likelihood.
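The part of one EM iteration that concerns the mixture probabilities can be sketched as follows. This is an illustration, not the authors' implementation: it assumes the summed log-densities `log_g[i][k]` of each spike train under each single stimulus are available, it uses the standard closed-form update of mixture weights, and it leaves out the numerical maximization over the remaining parameters (μ, σ, η), which would have to be carried out in every iteration as well. All function names are hypothetical.

```python
import math

def e_step(log_g, alpha_prev):
    """Posterior probability that train i responds to stimulus k (Bayes formula)."""
    resp = []
    for per_stimulus in log_g:
        logs = [math.log(a) + lg for a, lg in zip(alpha_prev, per_stimulus)]
        m = max(logs)                       # subtract the maximum for stability
        w = [math.exp(v - m) for v in logs]
        s = sum(w)
        resp.append([x / s for x in w])
    return resp

def m_step_alpha(resp):
    """Closed-form update: average posterior probability of each component."""
    n = len(resp)
    k = len(resp[0])
    return [sum(r[j] for r in resp) / n for j in range(k)]

if __name__ == "__main__":
    # Four hypothetical trains: three clearly favor stimulus 1, one favors stimulus 2.
    log_g = [[0.0, -50.0], [0.0, -50.0], [-50.0, 0.0], [0.0, -50.0]]
    alpha = [0.5, 0.5]
    for _ in range(5):
        alpha = m_step_alpha(e_step(log_g, alpha))
    print(alpha)
```

When the single-stimulus densities separate the trials this clearly, the estimated probabilities settle near the fraction of trials assigned to each stimulus.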
Likelihood of the Response-Averaging Model
In the response-averaging model, the neuron responds to a weighted average of stimuli, and the model does not follow a probability mixture. The likelihood is given by
$$L(\theta; D) = \prod_{i=1}^{N} \prod_{j=1}^{N_{i}} g\bigl(t_{j}^{i}; \theta\bigr),$$
where $g(t)$ is now the probability density of spiking at time t when the neuron is responding to a weighted average of all K stimuli, $\sum_{k=1}^{K}\beta_{k}S_{k}(t)$.
Model Checking: Uniformity Test
The goodness-of-fit can be verified by uniformity tests using the CDF $G(t)$ for all spike times in D. If the model perfectly describes the data, then the residuals
$$z_{j}^{i} = G\bigl(t_{j}^{i}\bigr), \quad j=1,\ldots,N_{i}, \quad i=1,\ldots,N, \tag{23}$$
follow a standard uniform distribution, $z_{j}^{i} \sim\mathrm{U}(0,1)$. We then merge all the residuals for a specific model, and test the residuals against the uniform distribution. Quantile–quantile (QQ) plots and the Kolmogorov–Smirnov (KS) test can be employed to check for uniformity.
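The KS check on the pooled residuals can be sketched without external libraries. The code below assumes the residuals have already been computed from the fitted model; the function names are hypothetical, and the p-value uses the asymptotic Kolmogorov series, which is only an approximation and is most reliable for large samples.

```python
import math

def ks_statistic_uniform(z):
    """One-sample KS distance between the empirical CDF of z and the U(0, 1) CDF."""
    z = sorted(z)
    n = len(z)
    d = 0.0
    for i, v in enumerate(z):
        # Compare the uniform CDF value v with the empirical CDF just after and just before v.
        d = max(d, (i + 1) / n - v, v - i / n)
    return d

def ks_pvalue_asymptotic(d, n):
    """Asymptotic p-value P(K > sqrt(n) * d) from the Kolmogorov distribution."""
    x = math.sqrt(n) * d
    if x < 0.2:
        return 1.0  # the series converges slowly here and the p-value is essentially 1
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * s))

if __name__ == "__main__":
    residuals = [(i + 0.5) / 10 for i in range(10)]  # hypothetical, evenly spread residuals
    d = ks_statistic_uniform(residuals)
    print(d, ks_pvalue_asymptotic(d, len(residuals)))
```

A small p-value indicates that the pooled residuals deviate from uniformity, which in the present setting can stem either from a wrong model or from numerical error in the computed CDF.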
Simulation Study
To illustrate the approach, we first detail the simulation study of the bursting kernel and the sinusoidal stimulus. Then results using the other types of kernels and stimuli are briefly illustrated and summarized.
Traces from model (1), using the bursting response kernel shown in Fig. 2(a) and one of the two sinusoidal stimuli shown in Fig. 2(b) or a mixture thereof, were simulated according to the Euler–Maruyama scheme with a time step of 0.1 ms. The process was run until it reached the threshold $x_{\mathrm{th}}$, at which point the time was recorded. The process was then reset to $x_{0}$ and started again, while the stimulus continued without any interruption, and the previously recorded spike times entered into the calculation of the post-spike currents. This was continued until the spike train was 4 s long, containing around 60 to 70 spikes. Table 1 shows the values of the parameters used for simulation and numerical computation.
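The simulation loop described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and signature are hypothetical, time is measured in seconds so that the step of 0.1 ms becomes `dt=1e-4`, the drift is assumed to be $-\gamma(x-\mu)+I(t)+H(t)$, and the parameter values in the usage example are illustrative rather than those of Table 1.

```python
import math
import random

def simulate_lif(stimulus, gamma, mu, sigma, x0, x_th, t_end,
                 dt=1e-4, kernel=None, seed=0):
    """Euler-Maruyama simulation of a LIF neuron; returns the list of spike times."""
    rng = random.Random(seed)
    x = x0
    spikes = []
    sqrt_dt = math.sqrt(dt)
    for step in range(int(round(t_end / dt))):
        t = step * dt
        # Post-spike current: kernel contributions of all previously recorded spikes.
        h = sum(kernel(t - tau) for tau in spikes) if kernel else 0.0
        drift = -gamma * (x - mu) + stimulus(t) + h
        x += drift * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        if x >= x_th:
            spikes.append(t + dt)  # record the crossing time
            x = x0                 # reset; the stimulus clock keeps running
    return spikes

if __name__ == "__main__":
    # Hypothetical parameters: sinusoidal drive, no post-spike kernel.
    stim = lambda t: 20.0 + 5.0 * math.sin(2.0 * math.pi * 2.0 * t)
    spikes = simulate_lif(stim, gamma=10.0, mu=0.0, sigma=0.5,
                          x0=0.0, x_th=1.0, t_end=1.0)
    print(len(spikes), spikes[:3])
```

Summing the kernel over all past spikes at every step is simple but slow for long trains; a tabulated kernel or a truncated history would reduce the cost.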
Parameter estimation was split in two, in agreement with how a typical experiment would be conducted. First we simulated spike trains responding to single stimuli. Note that in this case the probability-mixing and the response-averaging models are the same, and $\alpha=\beta=1$ are one-dimensional. The data set contains 10 spike trains, with five attending the first single stimulus and the other five attending the second single stimulus. Using this data set, we estimated parameters of the response kernel, η, and parameters of the diffusion model, μ and σ.
Second, we simulated spike trains using a mixture of the two sinusoidal stimuli. Two data sets were simulated, one data set consisting of 10 spike trains following the probability-mixing model, and another data set consisting of 10 spike trains following the response-averaging model. To check if the two models could be distinguished, we fitted the data using the probability-mixing model and the response-averaging model on both data sets, resulting in four combinations. During this stage, we fixed the response kernel parameters η to values estimated in the first step, and estimated again μ, σ, as well as α or β, depending on the model. There are therefore two sets of estimates of μ and σ for each trial. The purpose is threefold: first, these parameters might slightly drift in a real neuron when changing the stimulus (even if we do not change them in the simulation); second, it is of interest to understand the statistical accuracy and uncertainty of these parameter estimates when inferred in the two experimental settings; and third, comparing estimates from both single stimulus and stimulus mixtures can serve as model control, as explained below. When fitting the probability-mixing model on the data generated from this same model, we used both the marginal MLE and the EM algorithm. The above simulation and estimation procedure was repeated 100 times, generating 100 sets of estimates.
The simulation study serves different purposes. First, the four numerical methods to obtain the PDFs of the spike times, namely the first Volterra, second Volterra, Fokker–Planck PDF, and Fokker–Planck CDF, should be evaluated and compared. This is done on single stimulus spike train data. Second, the quality of the parameter estimates should be assessed, as well as how important it is to use the correct model for the estimation. This is conducted using spike trains simulated from stimulus mixtures. Also the performance of the marginal MLE and the EM algorithm in the case of the probability-mixing model should be compared. Third, it should be evaluated if it is possible to detect which of the two models generated the data. Results from these three analyses are presented in the following.
Numerical Solutions of the Partial Differential and Integral Equations
Figure 3 shows the PDFs of four example ISIs, i.e., for four different histories of past spikes, calculated by the four numerical methods, first Volterra, second Volterra, Fokker–Planck PDF and Fokker–Planck CDF, under single stimulus trials. Time has been set to 0 at the last spike time. The examples are taken from a spike train attending to the single stimulus $s_{1}$. Each column shows one example ISI, with the spike history indicated above the column (with different time axes) and the corresponding sinusoidal stimulus (same time axes as the PDFs), for four different grid sizes. The four boxed panels in each column show the solutions of the PDEs and the IEs for the ISI on top. A reference dashed black line obtained with high accuracy has been added in all panels for comparison. The grid size is given by Δt for the time discretization, and Δx for the space discretization, and varies from row to row. As expected, for large grid sizes (small number of bins), the performance of the four methods differs (see the three lower rows of boxed panels), but the four results converge for decreasing grid sizes (see the upper row of boxed panels). We find that the first Volterra method is the most sensitive to the grid size, while the Fokker–Planck PDF method is the most robust. In the parameter estimation below, we use $\Delta t = 0.002~\mbox{s}$ and $\Delta x=0.02$, shown in the row indicated with a star.
Figure 4(a) and (b) show the time-evolving PDF and CDF of $X_{t}$ from the numerical solutions of the Fokker–Planck equation, for the ISI of the first column of Fig. 3. Time has been set to 0 at the last spike time. At 0, the PDF equals the (discretized) Dirac delta function, and the CDF equals the Heaviside step function, since at spike times, the voltage always resets to a fixed value, $x_{0}$. As time increases, the PDF shows how the probability flows out at the threshold, and the CDF at the voltage threshold illustrates the survival probability.
Figure 4(c) shows in the upper panels three examples of spike time PDFs, $g(t)$, and the lower panels show a corresponding example trace for each, plotted on top of their time-evolving PDFs of $X(t)$, $f(x,t)$, as heat images. The three ISIs are taken from the left, middle left, and middle right panels of Fig. 3.
Results from Single Stimulus Trials
Parameter estimates of μ and σ from the 100 repetitions are shown in Fig. 5 as boxplots. In the lower panels, the time elapsed and the number of loops for optimization are also plotted. The means and standard deviations of parameter estimates are given in Table 2. The first Volterra method is less stable and less accurate, which is expected due to the lower accuracy in solving the spike time PDFs shown in Fig. 3. The second Volterra method performs best for the estimation of σ, and the Fokker–Planck PDF method performs best for μ, while the Fokker–Planck CDF method does not perform as well as either of the two. On the other hand, the first Volterra and the Fokker–Planck CDF methods are less computationally expensive. The Fokker–Planck CDF method is used in the later analysis of stimulus mixtures, considering both accuracy and efficiency, though the Fokker–Planck PDF method with a finer grid is used when performing KS tests for model selection below. We also find that different methods result in different systematic estimation bias. When estimating μ some methods tend to overestimate and others tend to underestimate, whereas when estimating σ all methods have a tendency to overestimate.
In Fig. 6, the 100 estimated response kernels from the four methods are plotted together as colored lines. The parameters of the kernel are in practice not identifiable, so we evaluate by plotting the shape of the kernel function. All methods achieved good results, capturing the overall shape. The two PDE methods obtained slightly better results, whereas the IE methods are systematically biased.
In Fig. 7(a) are QQplots of the uniform residuals calculated using the transformation from Eq. (23) for the four methods. The uniform residuals are pooled together from all 100 repetitions. Again, all four methods are competitive but biased, with a different bias for PDE methods and for IE methods. This bias, arising from the numerical approximations, has to be taken into account when later testing which model generated the data, forcing us to use a finer and computationally more expensive grid size.
Distinguishing Between ResponseAveraging and ProbabilityMixing
The following results show that the two models can be distinguished for parameter values such that the two models are sufficiently different, which will be defined below in Sect. 4.6. Each model is fitted using the Fokker–Planck CDF method, both on data simulated according to the correct model as well as the wrong model. Figure 8 shows the estimation of μ, σ, and α or β, depending on the model, and Table 3 reports the means and standard deviations of estimates. Accurate estimation is achieved only if we apply the correct model to the corresponding data, the wrong model fitted to data generated by the other model clearly shows bad results. This implies that it is important to use the correct model for reliable inference, but we can also use this to distinguish the two models. If estimates of μ and σ change considerably from estimation on single stimulus data to estimation on stimulus mixture data, then one should suspect that the used model is wrong. This is illustrated in Fig. 9, where scatterplots of estimates from stimulus mixture data assuming a specific model is plotted against estimates from single stimulus data. The straight lines are identity lines. When the correct model is used, estimates are clustered around the identity line, but clearly separated away from the identity line if the model used for fitting is wrong. To formalize the model selection procedure, QQ plots of uniform residuals using Eq. (23) from all 100 repetitions are shown in Fig. 7(b), where points away from the identity line indicate the model is wrong. The lines for the wrong model selections are clearly worse than the correct models, but even the correct models show a significant deviation from the identity lines, which would turn out as also the correct model being rejected in a KStest. This is most probably due to the numerical approximations, as also seen in Fig. 7(a). 
To check this, we repeated the estimation procedure 20 times with the Fokker–Planck PDF method on a finer grid, $\Delta t = 0.0005~\mbox{s}$ and $\Delta x = 0.01$. Results are reported in Table 4, which shows that with the finer grid the KS test works as desired, with high power to detect deviations from the correct model. We suggest that a very fine grid is not needed for parameter estimation, whereas for model checking the numerical approximation of the spike time PDF has to be precise. To conclude, the two models are distinguishable for the parameter settings explored here.
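The residual check used in this section can be sketched in a few lines. The example is only an illustration under stated assumptions: Eq. (23) is taken to be the probability integral transform, which maps each ISI through the fitted spike time CDF, and exponential ISIs stand in for the LIF first-passage times so that the CDF is available in closed form. The function names, the rates and the asymptotic 5 % critical value $1.358/\sqrt{n}$ are choices of this sketch and are not taken from the analysis above.

```python
import math
import random


def ks_statistic_uniform(u):
    """Kolmogorov-Smirnov distance between the sample u and Uniform(0, 1)."""
    u = sorted(u)
    n = len(u)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    return max(d_plus, d_minus)


def ks_rejects(u, crit_coeff=1.358):
    """Asymptotic 5 % KS test: reject uniformity if D > 1.358 / sqrt(n)."""
    return ks_statistic_uniform(u) > crit_coeff / math.sqrt(len(u))


def residuals(isis, rate):
    """Uniform residuals: each ISI mapped through the assumed model CDF."""
    return [1.0 - math.exp(-rate * t) for t in isis]


# Stand-in data (assumption): exponential ISIs instead of LIF spike times.
rng = random.Random(0)
rate_true = 20.0
isis = [rng.expovariate(rate_true) for _ in range(2000)]

d_correct = ks_statistic_uniform(residuals(isis, rate_true))
d_wrong = ks_statistic_uniform(residuals(isis, 2.0 * rate_true))
print(f"KS distance, correct model: {d_correct:.3f}")
print(f"KS distance, wrong model:   {d_wrong:.3f}")
```

Under the correct CDF the residuals are close to uniform and the KS distance is small, whereas a misspecified CDF gives a large distance. A numerically biased CDF acts like a mild misspecification, which is consistent with the need for a finer grid when the test is used for model checking.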
Probability-Mixing with EM
In the previous section, the marginal MLE was used to fit the probability-mixing model. Here we compare the performance of the marginal MLE and the EM algorithm when the probability-mixing model is fitted to the corresponding data. Figure 10 shows scatterplots of the estimates obtained by the two methods, and the last two rows of Table 3 show the means and standard deviations. The two methods provide similar results with the same accuracy for all three parameters. However, the variance of the EM estimates is slightly smaller, particularly for α. The computational burden of one loop of the numerical optimization is approximately the same for the two methods.
Generalizations
In this section we only apply the Fokker–Planck CDF method and analyze the model for different types of response kernels and stimuli.
Single stimulus. We analyze nine combinations of response kernels and stimuli. For each combination we simulate 10 spike trains following one single stimulus. Figure 1 shows the combinations and the realizations of spike trains. On these spike trains parameters and response kernels are estimated. The simulations are then repeated 100 times. For the stochastic stimulus, we use a single realization so that the stimulus is identical in all repetitions and the statistical performance of the estimators can be assessed. The estimates of parameters and response kernels are shown in Fig. 11. The estimates using the delay kernel have larger variance, possibly due to our specific choice of kernel parameters that makes the spiking rate less sensitive to stimulus strength (see bottom panels of Fig. 1). The estimates of parameters and kernels for all combinations are acceptable. The parameters used for the response kernels and stimuli are shown in Table 5.
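The simulation step can be illustrated with a minimal Euler–Maruyama sketch. Since Eq. (1) is not reproduced in this section, the drift used below, $-\gamma X + \mu + I(t)$ with constant noise σ and reset to $x_0$ after each spike, is an assumed standard LIF form. The function name, the constant input current and all parameter values are hypothetical and only serve the illustration.

```python
import math
import random


def simulate_lif_spikes(current, dt, gamma, mu, sigma, threshold, rng, x0=0.0):
    """Euler-Maruyama scheme for dX = (-gamma*X + mu + I(t)) dt + sigma dW.

    The potential is reset to x0 whenever it reaches the threshold.
    Returns the list of spike times. The model form is an assumption.
    """
    x = x0
    spike_times = []
    for k, i_t in enumerate(current):
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += (-gamma * x + mu + i_t) * dt + noise
        if x >= threshold:
            spike_times.append((k + 1) * dt)
            x = x0
    return spike_times


# Hypothetical setting: 2 s of constant input current on a 1 ms grid.
rng = random.Random(0)
dt = 0.001
current = [5.0] * 2000
spike_times = simulate_lif_spikes(current, dt, gamma=1.0, mu=10.0,
                                  sigma=1.0, threshold=1.0, rng=rng)
print(len(spike_times), "spikes")
```

Repeating the call with the same stimulus and fresh noise gives the independent spike trains on which the parameters and the response kernel would then be estimated.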
Stimulus mixtures. We use two OU processes as stimuli and apply all three types of response kernels. The top panels of Fig. 12 show the two stochastic stimuli and their weighted average. The latter is what neurons respond to according to the response-averaging model. For each combination, we simulate 10 spike trains, using identical stimuli in each repetition. Results are shown in the left panels of Fig. 13, where both the probability-mixing (PM) model and the response-averaging (RA) model are fitted to data generated from both models. When fitting the probability-mixing model, only the EM algorithm is applied. We employ the same strategy as in the main analysis: we first estimate the parameters on data generated from single stochastic stimuli, and then fix the response kernel and estimate the other parameters on data generated from the stochastic stimulus mixture. The results for all three kernels on a stochastic stimulus mixture agree with the main analysis above, which used the bursting kernel and sinusoidal stimuli: we obtain accurate estimates of all parameters only if we apply the correct model to the corresponding data.
State-dependent noise. Finally, the diffusion term in the LIF model (1) was modified to include the square root of $X(t)$, as in the Feller model [41–43], so that the noise becomes state dependent.
The same analysis as in the previous section was repeated using two OU processes as stimuli and three types of response kernels. Results are shown in the right panels of Fig. 13, which are almost the same as the results using the original LIF model shown in the left panels.
Model selection. In the stimulus mixture analysis, model selection is conducted for both the OU-based and the Feller-based LIF models. In Fig. 14 we compare the deviance information criterion (DIC) between the correct and the incorrect model. The DIC difference equals −2 times the difference of the log-likelihoods, because the two models have the same number of parameters. The correct model is strongly supported in every case. Table 6 shows rejection ($p<0.05$) ratios using KS tests for all combinations in the stimulus mixture analysis. We also tried other pairs of stochastic stimulus mixtures (results not shown) and found that the more similar the two stimuli are, the lower the rejection ratios tend to be, whether the correct or the incorrect model is used; if the two stimuli are more different, all rejection ratios tend to increase, including rejections of the true model. Finally, as expected, the KS test rejection ratio is sensitive to data size: using a smaller number of spike trains reduces the rejection ratio. In particular, the rejection when fitting the PM model to RA data (PM-RA) with the decay kernel, and when fitting the RA model to PM data (RA-PM) with the delay kernel, is extremely sensitive to the similarity of the stimuli and to data size. This makes the KS tests less robust. Thus, we recommend using the KS tests together with other model selection methods for more reliable conclusions.
Model Selection Accuracy
The results above show that parameters can be inferred and the correct model can be determined for the specific parameter choices used in the simulations. Here we explore the model selection accuracy for varying parameter values including the weight, stimulus dissimilarity, stimulus strength and number of spike trains. In the following analysis, we use the bursting response kernel, a mixture of two stochastic stimuli and the Fokker–Planck CDF method. To introduce a stimulus dissimilarity, a sinusoidal perturbation is added to one of two identical OU processes, $\tilde{S}(t) = S(t) + a\sin(10t)$, where t is measured in seconds and a is the perturbation size. To change the stimulus strength, the OU processes are linearly scaled using $\tilde{S}(t) = bS(t)$ where b denotes the scaling size.
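The two stimulus manipulations can be written down directly. The sketch below generates one OU path with an Euler–Maruyama scheme and then applies the perturbation $\tilde{S}(t) = S(t) + a\sin(10t)$ and the scaling $\tilde{S}(t) = bS(t)$. The OU parameters (`theta`, `mean`, `sigma`), the grid and the function names are assumptions made for illustration only; the section does not state the values used in the simulations.

```python
import math
import random


def simulate_ou(n_steps, dt, theta, mean, sigma, rng):
    """Euler-Maruyama path of dS = theta*(mean - S) dt + sigma dW, S(0) = mean."""
    path = [mean]
    for _ in range(n_steps):
        prev = path[-1]
        step = theta * (mean - prev) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(prev + step)
    return path


def perturb(stimulus, dt, a):
    """Stimulus dissimilarity: add a*sin(10 t), with t in seconds."""
    return [s + a * math.sin(10.0 * i * dt) for i, s in enumerate(stimulus)]


def scale(stimulus, b):
    """Stimulus strength: multiply the whole path by b."""
    return [b * s for s in stimulus]


# Hypothetical OU parameters; only the two transformations follow the text.
rng = random.Random(1)
dt = 0.001
s = simulate_ou(1000, dt, theta=5.0, mean=60.0, sigma=10.0, rng=rng)
s_perturbed = perturb(s, dt, a=3.0)   # second stimulus of the mixture
s_scaled = scale(s, b=1.5)            # stronger version of the stimulus
```

Using the same underlying path for both stimuli means that `a` alone controls how different the two stimuli of the mixture are.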
We focus on model selection accuracy without reporting parameter estimates. Model selection is deemed successful if the DIC of the true model is smaller than that of the wrong model by more than 2. This is the value suggested in [44] to indicate substantial empirical support for the selected model over the other model. Figure 15 explores model selection results as a function of the parameter values and provides an overall picture of how these parameters affect model selection. The results confirm our intuition: model selection is more reliable if the stimuli are more different, the weights are more even, the stimuli are stronger or the sample size is larger (a larger number of spike trains). The first three make the responses of the two models more different, and the last provides more statistical power. Furthermore, the thresholds of these parameter values for successful model selection are surprisingly low. A weight of 0.2 and a perturbation size of around 6 (i.e., around 10 % of the stimulus strength) are sufficient to ensure decent selection. For a more even weight of 0.4, a perturbation size of only 3 (around 5 %) is enough to provide good model selection for both RA and PM data. Indeed, a 5 % perturbation of a stimulus is undetectable by simple graphical inspection of the spike trains (bottom panels in the figure), but the finer statistical analysis can detect the difference between the models. Even with a small weight and a small stimulus dissimilarity, model selection can be improved by using stronger stimuli or by enlarging the sample size with more spike trains. Note that these analyses are easily generalized to a given problem at hand by first estimating the response kernel of a given neuron under a given stimulus, and then simulating data with this response kernel and stimulus while varying the parameters of the two models. This will indicate for which parameter values the model selection can be trusted.
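The success criterion reduces to a comparison of maximized log-likelihoods. The sketch below assumes, as stated for the two models considered here, that both have the same number of parameters, so that the DIC difference is −2 times the log-likelihood difference. The function names and the numerical values are hypothetical.

```python
def dic_difference(loglik_a, loglik_b):
    """DIC(a) - DIC(b) for two models with the same number of parameters."""
    return -2.0 * (loglik_a - loglik_b)


def selection_successful(loglik_true, loglik_wrong, threshold=2.0):
    """True if the DIC of the true model is smaller by more than the threshold."""
    return dic_difference(loglik_true, loglik_wrong) < -threshold


# Hypothetical maximized log-likelihoods from fitting both models to one data set.
print(selection_successful(-100.0, -102.0))  # DIC difference is -4
print(selection_successful(-100.0, -100.5))  # DIC difference is -1
```

In a simulation study such as the one summarized in Fig. 15, the proportion of repetitions for which this function returns `True` gives the model selection accuracy for one parameter setting.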
Discussion
Estimation of the Decay Rate
We have shown that parameter inference can be successfully conducted for the probability-mixing and the response-averaging models on the corresponding data, incorporating different response kernels for LIF neurons. The decay rate γ has been assumed known. We also attempted to estimate all parameters including γ (results not shown), but the optimization often ends in local minima, which leads to low accuracy. The estimation of γ seems to suffer from identifiability problems because only spike times, and not the underlying membrane potential, are observed. Nevertheless, to estimate γ we may fix it at different values, run the optimization procedure for the remaining parameters, and then compare the model fit across the γ values. This is not pursued here.
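The grid strategy suggested for γ amounts to a profile likelihood. A minimal sketch is given below; `fit_remaining` is a hypothetical callable that, for a fixed γ, maximizes the likelihood over the remaining parameters and returns the maximized log-likelihood together with the estimates. The toy function used to exercise the code is a placeholder and has no relation to the LIF likelihood.

```python
def profile_over_gamma(gamma_grid, fit_remaining):
    """Fit the remaining parameters for each fixed gamma and keep the best fit.

    fit_remaining(gamma) must return (max_loglik, params).
    Returns the best (gamma, loglik, params) row and the whole profile.
    """
    profile = []
    for gamma in gamma_grid:
        loglik, params = fit_remaining(gamma)
        profile.append((gamma, loglik, params))
    best = max(profile, key=lambda row: row[1])
    return best, profile


def toy_fit(gamma):
    # Placeholder (assumption): a log-likelihood that peaks at gamma = 2.
    return -100.0 - (gamma - 2.0) ** 2, {"mu": 1.0, "sigma": 0.5}


best, profile = profile_over_gamma([0.5, 1.0, 2.0, 4.0], toy_fit)
print("selected gamma:", best[0])
```

A flat profile over the grid would be a sign of the identifiability problem mentioned above.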
Bias of the Numerical Methods
We found that the parameter estimates and the QQ plots from the four methods suffer from over- and underestimation issues. The MLE is based on the first-passage time probabilities, which we obtain using four numerical methods: Fokker–Planck PDF, Fokker–Planck CDF, first Volterra and second Volterra. Because of the intrinsic differences between these methods, discretization leads to different biases in the calculated spike time PDFs. As seen from Fig. 3, when the grid size is increased, the first Volterra and the Fokker–Planck CDF methods tend to increase the PDF value at the beginning of the ISI, while the second Volterra method tends to decrease it slightly. The low accuracy of the first Volterra method arises from a singularity of $f^{*}(x,t \mid v,s)$ when $v=x$ and $t\to s$. Because this singularity is removed in the second Volterra method, the latter is more accurate in numerical computations.
Efficiency of Numerical Methods
We choose the Fokker–Planck CDF method for estimation of mixtures, because it achieves a good balance between accuracy and computational burden. Table 2 also shows that this method has the smallest variance of the parameter estimates.
Although the first Volterra method is computationally the fastest, it has poor convergence, as seen from the number of loops in the bottom right panel of Fig. 5. Overall, the PDE methods tend to converge faster than the IE methods.
The performance is affected by the grid size. The estimates in Fig. 5 use $\Delta t=0.002~\mbox{s}$ and $\Delta x = 0.02$. This discretization generally achieves acceptable computation times and statistical accuracy, but as shown in Sect. 4.3, a finer grid is needed for model selection. One may tune the grid sizes separately for each of the four methods to obtain comparable efficiency and accuracy. However, since in real data the errors come from many sources, such as measurement errors and approximate modeling, the optimal discretization on simulated data is of less importance and interest. Thus, we suggest the current setting as a generally good balance and do not investigate this further.
EM for Better Estimation of Mixture Probabilities
Figure 10 shows that the estimation of the mixture probability parameter α is slightly less stable for the marginal MLE than for the EM algorithm. The EM algorithm implicitly enlarges the data size by using latent variables for the mixture probability, which is referred to as data augmentation [45]. The complete-data log-likelihood function used in the M step does not contain logarithms of sums, which makes the estimation more stable. By iteratively updating the expectation in the E step and obtaining stable estimates in the M step, the EM algorithm improves the stability of inference for the probability-mixing model and for mixture models in general.
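The remark about logarithms of sums can be made concrete for a two-component mixture. The notation below is an assumption of this sketch and need not match the notation of the appendix: $d_i$ denotes the observed data of spike train $i$, $p_1$ and $p_2$ are the likelihoods under stimulus 1 and stimulus 2 alone, $\theta'$ and $\alpha'$ are the values from the previous iteration, and $w_i$ is the posterior probability that spike train $i$ was generated under stimulus 1.

```latex
% Marginal log-likelihood: the logarithm acts on a sum of the two components.
\ell(\theta,\alpha) = \sum_{i} \log\bigl[\alpha\, p_{1}(d_i;\theta)
                      + (1-\alpha)\, p_{2}(d_i;\theta)\bigr]

% Expected complete-data log-likelihood: the logarithm acts on each component separately.
Q(\theta,\alpha \mid \theta',\alpha') = \sum_{i} \Bigl\{
      w_i \bigl[\log\alpha + \log p_{1}(d_i;\theta)\bigr]
    + (1-w_i)\bigl[\log(1-\alpha) + \log p_{2}(d_i;\theta)\bigr] \Bigr\},
\qquad
w_i = \frac{\alpha'\, p_{1}(d_i;\theta')}
           {\alpha'\, p_{1}(d_i;\theta') + (1-\alpha')\, p_{2}(d_i;\theta')}
```

In this form the terms in α separate from the terms in θ, and maximizing over α gives the closed-form update $\alpha = \frac{1}{n}\sum_i w_i$, where $n$ is the number of spike trains.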
Although the EM algorithm performs better, it is only slightly better for α and the improvement is negligible or nonexistent for μ and σ. This is because we only use two components in the mixture, which does not generate notable differences between the marginal MLE and the EM algorithm. A larger advantage of the EM algorithm can be expected under more complex stimulus mixtures. Furthermore, the response kernel is fixed, and the two methods use the same initial values for μ and σ (obtained from the single stimulus trials) in the optimization procedure, which also contributes to the similarity of results between the two methods.
Extension of Noise
In this paper we have considered a one-dimensional stochastic differential equation model for the membrane potential driven by a Wiener process. It arises as an approximation to Stein’s model [46], leading to the OU model, or to the extended model including reversal potentials proposed by Tuckwell [41], leading to the Feller model [42]. The model does not take into account the specific dynamics of synaptic input or ion channels, which affect the dynamics; see, e.g., [47–49], where the autocorrelation of the synaptic input is shown to be an important factor. This is partially accounted for in our model through the memory kernels. Incorporating autocorrelated synaptic input or ion channel dynamics would lead to a multidimensional model. In principle, the first-passage time probabilities could then be obtained by solving multidimensional Fokker–Planck equations [24]. However, the statistical problem is further complicated by the incomplete observations, since typically only the membrane potential is measured, as studied in [50]. In even more realistic models, non-Gaussian noise can be included, for example by combining the diffusion process with discrete stochastic synaptic stimulus arrivals. This leads to a jump-diffusion process, whose Fokker–Planck equation generalizes to an integro-differential equation [51]. Solving multidimensional or generalized Fokker–Planck equations is significantly more expensive, and exact MLE becomes less appealing. This is not pursued here.
The Response-Averaging Model
The response-averaging model used here is slightly different from the response-averaging model of Reynolds et al. [8]. In our model the average is calculated over the currents for each stimulus, while in their model the average is calculated over the firing rates for each stimulus. The reason is as follows. In a spiking neuron model like the LIF model, the generation of each single spike, rather than the firing rate, is modeled. Whether in the probability-mixing model, the response-averaging model or any other model, the spiking is affected by stimuli only through currents. Our model is formulated on this idea, using a unified spike-generating mechanism for both the probability-mixing and the response-averaging model. The firing rate that results from a weighted average of single stimuli, averaged over a time window, will also be a weighted average of the firing rates from the single stimuli, but with different weights. Our response-averaging model therefore has the same consequence in terms of firing rates as the model of Reynolds et al.
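The difference between the two models at the level of input currents can be sketched as follows. The convex combination with weight β and the random choice with probability α follow the roles of the two parameters described in the text, but the function names, the toy currents and the assumption that the random choice is made once per spike train are choices of this illustration.

```python
import random


def response_averaging_input(current_1, current_2, beta):
    """Input under response averaging: weighted average of the two currents."""
    return [beta * a + (1.0 - beta) * b for a, b in zip(current_1, current_2)]


def probability_mixing_input(current_1, current_2, alpha, rng):
    """Input under probability mixing: stimulus 1 alone with probability alpha,
    otherwise stimulus 2 alone (assumed here to be drawn once per spike train)."""
    return list(current_1) if rng.random() < alpha else list(current_2)


# Toy currents (hypothetical values).
i1 = [1.0, 2.0]
i2 = [5.0, 6.0]
rng = random.Random(0)

averaged = response_averaging_input(i1, i2, beta=0.25)
mixed = probability_mixing_input(i1, i2, alpha=0.25, rng=rng)
print(averaged)
print(mixed)
```

Either input is then passed through the same spike-generating mechanism, which is what makes the two models directly comparable.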
Model Selection of Probability-Mixing and Response-Averaging
We finish by addressing possible model selection methods for probability mixing and response averaging on real data. We have shown that the probability-mixing and the response-averaging models can be clearly distinguished when fitted to simulated data. However, real data will likely not follow exactly one of the two models, although one of the models might describe the data better than the other. We might need to design more sophisticated methods for model checking and model selection. Apart from conducting uniformity tests based on the uniform residuals from the transformation (23), such as the KS test as we have done, we can compare the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) between the two models. We have used a unified DIC method because the two models have an equal number of parameters, but AIC and BIC should be used if the two models have different numbers of parameters. Furthermore, the model can also be checked by evaluating the performance of prediction (of spikes) and decoding (of stimuli), using measures such as the root mean squared deviation (RMSD) between empirical and predicted values. See [19] for the use of these approaches to distinguish between the two models on experimental data from the middle temporal visual area of rhesus monkeys.
Abbreviations
LIF: Leaky integrate-and-fire
PDE: Partial differential equation
IE: Integral equation
EM: Expectation-maximization
ISI: Interspike interval
MLE: Maximum likelihood estimation/estimator
PDF: Probability density function
CDF: Cumulative distribution function
QQ: Quantile–quantile
KS: Kolmogorov–Smirnov
DIC: Deviance information criterion
AIC: Akaike information criterion
BIC: Bayesian information criterion
References
1. Gilmore RO, Hou C, Pettet MW, Norcia AM. Development of cortical responses to optic flow. Vis Neurosci. 2007;24:845–56.
2. Kanwisher N, Yovel G. The fusiform face area: a cortical region specialized for the perception of faces. Philos Trans R Soc Lond B. 2006;361:2109–28.
3. Smith AT, Singh KD, Williams AL, Greenlee MW. Estimating receptive field size from fMRI data in human striate and extrastriate visual cortex. Cereb Cortex. 2001;11:1182–90.
4. Gattass R, Nascimento-Silva S, Soares JGM, Lima B, Jansen AK, Diogo ACM, Farias MF, Marcondes M, Botelho EP, Mariani OS, Azzi J, Fiorani M. Cortical visual areas in monkeys: location, topography, connections, columns, plasticity and cortical dynamics. Philos Trans R Soc Lond B. 2005;360:709–31.
5. Kanwisher N, Yovel G. The fusiform face area: a cortical region specialized for the perception of faces. Philos Trans R Soc Lond B, Biol Sci. 2006;361(1476):2109–28.
6. Gilmore RO, Hou C, Pettet MW, Norcia AM. Development of cortical responses to optic flow. Vis Neurosci. 2007;24(6):845–56.
7. Freeman J, Simoncelli EP. Metamers of the ventral stream. Nat Neurosci. 2011;14(9):1195–201.
8. Reynolds JH, Chelazzi L, Desimone R. Competitive mechanisms subserve attention in macaque areas V2 and V4. J Neurosci. 1999;19:1736–53.
9. Bundesen C, Habekost T, Kyllingsbæk S. A neural theory of visual attention: bridging cognition and neurophysiology. Psychol Rev. 2005;112(2):291–328.
10. Bundesen C, Habekost T. Principles of visual attention: linking mind and brain. Oxford: Oxford University Press; 2008.
11. Reynolds JH, Heeger DJ. The normalization model of attention. Neuron. 2009;61(2):168–85.
12. Zoccolan D, Cox DD, DiCarlo JJ. Multiple object response normalization in monkey inferotemporal cortex. J Neurosci. 2005;25(36):8150–64.
13. Recanzone GH, Wurtz RH, Schwarz U. Responses of MT and MST neurons to one and two moving objects in the receptive field. J Neurophysiol. 1997;78(6):2904–15.
14. Britten KH, Heuer HW. Spatial summation in the receptive fields of MT neurons. J Neurosci. 1999;19(12):5074–84.
15. Nandy AS, Sharpee TO, Reynolds JH, Mitchell JF. The fine structure of shape tuning in area V4. Neuron. 2013;78(6):1102–15.
16. Busse L, Wade AR, Carandini M. Representation of concurrent stimuli by population activity in visual cortex. Neuron. 2009;64(6):931–42.
17. MacEvoy SP, Tucker TR, Fitzpatrick D. A precise form of divisive suppression supports population coding in the primary visual cortex. Nat Neurosci. 2009;12(5):637–45.
18. Lee J, Maunsell JH. A normalization model of attentional regulation of single unit responses. PLoS ONE. 2009;4:e4651.
19. Li K, Kozyrev V, Kyllingsbæk S, Treue S, Ditlevsen S, Bundesen C. Neurons in primate visual cortex alternate between responses to multiple stimuli in their receptive field. Submitted. 2016.
20. Burkitt AN. A review of the integrate-and-fire neuron model: I. Homogeneous synaptic input. Biol Cybern. 2006;95(1):1–19.
21. Sacerdote L, Giraudo MT. Stochastic integrate and fire models: a review on mathematical methods and their applications. In: Bachar B, Batzel JJ, Ditlevsen S, editors. Stochastic biomathematical models with applications to neuronal modeling. New York: Springer; 2013. p. 99–148. (Lecture notes in mathematics, vol. 2058).
22. Gerstner W, Kistler WM. Spiking neuron models: single neurons, populations, plasticity. Cambridge: Cambridge University Press; 2002.
23. Gerstner W, Van Hemmen JL, Cowan JD. What matters in neuronal locking? Neural Comput. 1996;8(8):1653–76.
24. Paninski L, Pillow JW, Simoncelli EP. Maximum likelihood estimation of a stochastic integrate-and-fire neural encoding model. Neural Comput. 2004;16(12):2533–61.
25. Sirovich L, Knight B. Spiking neurons and the first passage problem. Neural Comput. 2011;23(7):1675–703.
26. Russell A, Orchard G, Dong Y, Mihalas S, Niebur E, Tapson J, Etienne-Cummings R. Optimization methods for spiking neurons and networks. IEEE Trans Neural Netw. 2010;21(12):1950–62.
27. Iolov A, Ditlevsen S, Longtin A. Fokker–Planck and Fortet equation-based parameter estimation for a leaky integrate-and-fire model with sinusoidal and stochastic forcing. J Math Neurosci. 2014;4(1):4.
28. Dong Y, Mihalas S, Russell A, Etienne-Cummings R, Niebur E. Parameter estimation of history-dependent leaky integrate-and-fire neurons using maximum-likelihood methods. Neural Comput. 2011;23(11):2833–67.
29. Ditlevsen S, Lansky P. Parameters of stochastic diffusion processes estimated from observations of first-hitting times: application to the leaky integrate-and-fire neuronal model. Phys Rev E. 2007;76(4):041906.
30. Ditlevsen S, Ditlevsen O. Parameter estimation from observations of first-passage times of the Ornstein–Uhlenbeck process and the Feller process. Probab Eng Mech. 2008;23(2):170–9.
31. Pillow JW, Paninski L, Uzzell VJ, Simoncelli EP, Chichilnisky EJ. Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. J Neurosci. 2005;25(47):11003–13.
32. Redner S. A guide to first-passage processes. Cambridge: Cambridge University Press; 2001.
33. Karlin S, Taylor HM. A second course in stochastic processes. vol. 2. Houston: Gulf Pub; 1981.
34. Lansky P, Ditlevsen S. A review of the methods for signal estimation in stochastic diffusion leaky integrate-and-fire neuronal models. Biol Cybern. 2008;99:253–62.
35. Hurn AS, Jeisman J, Lindsay K. ML estimation of the parameters of SDEs by numerical solution of the Fokker–Planck equation. In: MODSIM 2005: international congress on modelling and simulation: advances and applications for management and decision making. 2005. p. 849–55.
36. Paninski L, Haith A, Szirtes G. Integral equation methods for computing likelihoods and their derivatives in the stochastic integrate-and-fire model. J Comput Neurosci. 2008;24(1):69–79.
37. Press WH. Numerical recipes: the art of scientific computing. 3rd ed. Cambridge: Cambridge University Press; 2007.
38. Ditlevsen S, Lansky P. Estimation of the input parameters in the Ornstein–Uhlenbeck neuronal model. Phys Rev E. 2005;71:011907.
39. Buonocore A, Nobile AG, Ricciardi LM. A new integral equation for the evaluation of first-passage-time probability densities. Adv Appl Probab. 1987;19:784–800.
40. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc, Ser B, Methodol. 1977;39:1–38.
41. Tuckwell HC. Synaptic transmission in a model for neuronal activity. J Theor Biol. 1979;77:65–81.
42. Lansky P, Lanska V. Diffusion approximations of the neuronal model with synaptic reversal potentials. Biol Cybern. 1987;56:19–26.
43. Ditlevsen S, Lansky P. Estimation of the input parameters in the Feller neuronal model. Phys Rev E. 2006;73:061910.
44. Burnham KP, Anderson DR. Model selection and multimodel inference: a practical information-theoretic approach. New York: Springer; 2003.
45. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. vol. 2. New York: Springer; 2009.
46. Stein RB. A theoretical analysis of neuronal variability. Biophys J. 1965;5:173–95.
47. Brunel N, Sergi S. Firing frequency of leaky integrate-and-fire neurons with synaptic current dynamics. J Theor Biol. 1998;195(1):87–95.
48. Moreno R, de la Rocha J, Renart A, Parga N. Response of spiking neurons to correlated inputs. Phys Rev Lett. 2002;89:288101.
49. Moreno-Bote R, Parga N. Role of synaptic filtering on the firing response of simple model neurons. Phys Rev Lett. 2004;92:028102.
50. Ditlevsen S, Samson A. Estimation in the partially observed stochastic Morris–Lecar neuronal model with particle filter and stochastic approximation methods. Ann Appl Stat. 2014;8(2):674–702.
51. Hanson FB. Applied stochastic processes and control for jump-diffusions: modeling, analysis, and computation. vol. 13. Philadelphia: SIAM; 2007.
Acknowledgements
The work is part of the Dynamical Systems Interdisciplinary Network, University of Copenhagen.
Additional information
Competing Interests
The authors declare that they have no competing interests.
Authors’ Contributions
KL, SD: Conceived and designed the research. KL: Performed all analyses, simulations and figures. All authors interpreted the results. All authors wrote the paper.
Appendix
A.1 The EM Algorithm for Stimulus Mixtures
The complete likelihood for the full data $(D, Y)$ is
A.1.1 Expectation Step
The expectation of the full data log-likelihood conditional on the previous parameters $\theta_{1}$ and the observed data D is
The conditional probability of the latent variable is obtained from Bayes’ formula:
A.1.2 Maximization Step
In the Maximization step, the new parameter θ is obtained by optimizing the conditional expectation $Q(\theta \mid \theta_{1})$. A new iteration is then initiated using θ as the previous parameter. The loops run until θ and $\theta_{1}$ are sufficiently close.
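The alternation of the two steps and the stopping rule can be illustrated with a deliberately reduced example. In the sketch below only the mixture weight α is unknown and the two component densities are fixed exponential densities; both are assumptions made so that the M step has a closed form. In the setting of this paper the component likelihoods also depend on the LIF parameters, and the M step would require numerical optimization. All names and values are hypothetical.

```python
import math
import random


def em_mixture_weight(data, pdf_1, pdf_2, alpha=0.5, tol=1e-8, max_iter=1000):
    """EM for the weight alpha in the mixture alpha*pdf_1 + (1 - alpha)*pdf_2."""
    for _ in range(max_iter):
        # E step: posterior probability that each observation came from component 1.
        resp = []
        for d in data:
            part_1 = alpha * pdf_1(d)
            part_2 = (1.0 - alpha) * pdf_2(d)
            resp.append(part_1 / (part_1 + part_2))
        # M step: closed-form maximizer of the expected complete-data log-likelihood.
        alpha_new = sum(resp) / len(resp)
        # Stop when the new and the previous parameter are sufficiently close.
        if abs(alpha_new - alpha) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha


def slow_pdf(t):
    return 5.0 * math.exp(-5.0 * t)


def fast_pdf(t):
    return 50.0 * math.exp(-50.0 * t)


# Stand-in data (assumption): 30 % of the observations come from the slow component.
rng = random.Random(0)
data = [rng.expovariate(5.0) if rng.random() < 0.3 else rng.expovariate(50.0)
        for _ in range(5000)]
alpha_hat = em_mixture_weight(data, slow_pdf, fast_pdf)
print(f"estimated mixture weight: {alpha_hat:.3f}")
```

The estimate should lie close to the weight of 0.3 used to generate the data.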
A.2 The Fokker–Planck CDF Method
Plugging $f(x,t) = \partial_{x}F(x,t)$ into the Fokker–Planck PDE
gives
Integrating both sides w.r.t. x yields
Recall the lower reflecting boundary at $x=x^{-}$, where $F(x^{-}, t) = 0$ and thus $\partial_{t}F(x,t) |_{x=x^{-}} = 0$. We also see that the flux equals 0, so
Thus, $C(t)=0$, and we obtain the PDE for $F(x,t)$:
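For orientation, the steps of this derivation can be written out in a generic form. The sketch assumes a drift $b(x,t)$ and a constant diffusion coefficient σ, which may differ from the exact notation of the model equations; the lower reflecting boundary is written $x^{-}$ and $C(t)$ is the integration constant.

```latex
% Fokker-Planck PDE for the density f(x,t) (assumed generic form)
\partial_{t} f(x,t) = -\partial_{x}\bigl[b(x,t)\, f(x,t)\bigr]
                      + \frac{\sigma^{2}}{2}\,\partial_{x}^{2} f(x,t)

% Substituting f = \partial_x F
\partial_{t}\partial_{x} F(x,t) = -\partial_{x}\bigl[b(x,t)\,\partial_{x}F(x,t)\bigr]
                      + \frac{\sigma^{2}}{2}\,\partial_{x}^{3} F(x,t)

% Integrating with respect to x
\partial_{t} F(x,t) = -b(x,t)\,\partial_{x}F(x,t)
                      + \frac{\sigma^{2}}{2}\,\partial_{x}^{2} F(x,t) + C(t)

% At the reflecting boundary: \partial_t F = 0 and zero flux
\Bigl[\, b(x,t)\,\partial_{x}F(x,t)
      - \frac{\sigma^{2}}{2}\,\partial_{x}^{2}F(x,t) \Bigr]_{x=x^{-}} = 0
\quad\Longrightarrow\quad C(t) = 0

% Resulting PDE for the CDF
\partial_{t} F(x,t) = -b(x,t)\,\partial_{x}F(x,t)
                      + \frac{\sigma^{2}}{2}\,\partial_{x}^{2} F(x,t)
```

For state-dependent noise, such as in the Feller model, the diffusion coefficient would stay inside the spatial derivatives and the resulting equation would change accordingly.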
A.3 Removing the Singularity in the Second-Kind Volterra Equation
The singularity arises because $f^{*}(x,t \mid v, s)$ diverges when $v=x$ and $t\to s$. This can be resolved by the method proposed in [39]. Note that replacing $\psi(x, t \mid v, s)$ in Eq. (15) by any function of the form
will also satisfy the second Volterra equation, since
where we have applied the first Volterra equation, Eq. (11).
We then set $\phi(x, t \mid v, s)$ to 0 as $t \to s$ by letting
Then we have
and the singularity will be removed when $v=x$ and $t\to s$.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Probability-mixing
 Response-averaging
 Parameter estimation
 Model selection
 Visual attention