Fokker–Planck and Fortet Equation-Based Parameter Estimation for a Leaky Integrate-and-Fire Model with Sinusoidal and Stochastic Forcing

Abstract Analysis of sinusoidal noisy leaky integrate-and-fire models and comparison with experimental data are important to understand the neural code and neural synchronization and rhythms. In this paper, we propose two methods to estimate input parameters using interspike interval data only. One is based on numerical solutions of the Fokker–Planck equation, and the other is based on an integral equation, which is fulfilled by the interspike interval probability density. This generalizes previous methods tailored to stationary data to the case of time-dependent input. The main contribution is a binning method to circumvent the problems of nonstationarity, and an easy-to-implement initializer for the numerical procedures. The methods are compared on simulated data. List of Abbreviations LIF: Leaky integrate-and-fire ISI: Interspike interval SDE: Stochastic differential equation PDE: Partial differential equation


Introduction
Information processing in the nervous system is carried out by spike timings in neurons. To study the neural code in such a complicated system, a first step is to understand signal processing and transmission in single neurons. Stochastic leaky integrate-and-fire (LIF) neuronal models are a good compromise between biophysical realism and mathematical tractability, and are commonly applied as theoretical tools to study properties of real neuronal systems. A central issue is then to perform statistical inference from experimental data and estimate model parameters. Many electrophysiological experiments on neurons, namely extra-cellular recordings, are only capable of detecting the time of the spike and not the detailed voltage trajectory leading up to the spike. Estimating the parameters of the LIF model from this type of data is equivalent to estimating the parameters of a stochastic model from the statistics of the first-passage times only. A common assumption is that the data are well described by a renewal process, thus basing the statistical inference on the interspike intervals (ISIs), assuming these are realizations of independent and identically distributed random variables. Since only partial information about the process is available, the statistical problem becomes more difficult, and no explicit expression for the likelihood is available.
Different methods have been proposed. In the seminal paper [1], a point process approach is proposed. The spike trains of a collection of neurons are represented as counting processes. Time is discretized and the point processes approximated by 0-1 time series. Then the probability of firing in the next time interval is modeled as a function of the spike history. In this way, maximum likelihood estimation is feasible. External stimuli are not considered. In [2], a numerically involved moment method is developed. It uses the first two moments of the first-passage times of the Ornstein-Uhlenbeck process to a constant threshold, which are given as series expressions, and equates them to their empirical counterparts. In [3,4], certain explicit moment relations derived from the Laplace transform of the first-passage time distribution are applied, but these are only valid under stimulation (supra-threshold regime). In [5], inference is based on numerical inversion of the Laplace transform. In [6], a functional of a three-dimensional Bessel bridge is applied to obtain a maximum likelihood estimator. None of these methods are feasible to extend to the time-inhomogeneous case, which is of our interest. In [7,8], an integral equation is used to derive an estimator in the time-homogeneous setting. This approach is readily extended to time varying input, which we will explore in this paper. Some of the above methods are compared in [9]. Finally, a review of estimation methods is provided in [10].
Many sensory stimuli, like sound, contain an oscillatory component [11,12]. Such inputs will cause oscillating membrane potentials in the neuron, generating rhythmic spiking patterns. The oscillation frequency determines the basic rhythm of spiking, and is considered to be significant for neuronal information processing. The dynamics of periodically forced neuron models have been extensively studied; see [13][14][15][16][17][18] and references therein. Even so, attempts to solve the estimation problem in these nonstationary settings have been rare. One problem is that the ISIs are no longer independent nor identically distributed. In [19], a more complicated model with linear filters is considered, allowing also for the spike history to influence the membrane potential dynamics. The estimation problem is solved through numerical solutions to the Fokker-Planck equation, and it is shown that the log-likelihood is concave, thus ensuring a global maximum; see also [20,21]. Because their model is more involved, some approximations to the solution of the Fokker-Planck equation are applied, to ensure acceptable computing times. We will apply the full Fokker-Planck equation to solve our estimation problem, since the computing time is always lower than 2 seconds for a sample size of 1000 spikes.
In this paper, we thus describe and discuss two methods to estimate parameters of LIF models with the added complexity of a time-varying input current. We assume that the time-varying current is a sinusoidal wave, but we believe that the approaches generalize to an arbitrary periodic forcing with known frequency. One approach relies on the Fortet integral equation, which is readily extended to the time-inhomogeneous case. An advantage of this approach is that if the transition density of the diffusion in the LIF model is known, as is the case for the Ornstein-Uhlenbeck and the Feller model, the computational burden is limited. A second approach involves numerical solution of the Fokker-Planck equation, where the time-dependence is explicitly accounted for. After a numerical differentiation, the likelihood function can be calculated providing the maximum likelihood estimator. Nevertheless, we chose an alternative loss function which seems marginally more robust, directly comparing the survival function provided by the solution of the Fokker-Planck equation with its empirical counterpart. The two approaches give similar results and they are more carefully compared in the supplementary online material. Both methods need sensible starting values for the optimization algorithms, and we provide an easy-to-implement initializer. The estimation procedures are compared on simulated data and we find that both algorithms are able to find estimates close to the true values for several different dynamical regimes. We find that for small sample sizes the Fokker-Planck algorithm can be considered marginally preferable, whereas for larger sample sizes the Fortet algorithm becomes marginally superior. Moreover, at high frequencies of the sinusoidal forcing, the Fortet is better at identifying the parameters, though in general there is less information in the data to distinguish between a constant input and the amplitude of the periodic forcing.

Model
The time evolution of the voltage of a spiking neuron is modelled by a stochastic process, V , given as solution to the following stochastic differential equation (SDE): Here, μ is a bias current acting on the cell, τ is the decay time, A and ω are the amplitude and (angular) frequency of the sinusoidal current acting on the cell, σ is the strength of the stochastic fluctuations, W = {W t } t≥0 is a standard Wiener process, and t + n denotes the right limit taken at t n . A spike occurs when the membrane voltage V (t) crosses a voltage threshold, v th , and then V (t) is instantaneously reset to the resting potential v 0 . The difference between subsequent spike times, J n = t n − t n−1 , is called the interspike interval (ISI).
We will assume that τ is known (but see Sect. 6 for a discussion of the alternative) and nondimensionalize Eq. (1) as follows: where we have defined I n = J n /τ . We can also write the dynamics between two spike times s n and s n+1 in terms of elapsed time since the last spike, s = s − s n , s < I n+1 , dX s = α − X s + γ sin Ω s + φ n ds + βdW s , s = s − s n , φ n = s n mod 2π Ω . ( This form of the dynamics highlights that this is not a renewal process since different trajectories between spikes have different phase shifts φ n = s n modulo 2π/Ω. This will be important in the following discussion. The shape of the ISI distribution depends on the model parameters, and it is natural to divide the parameter space in different regimes characterized by their qualitative behavior. Four distinct parameter regimes will be considered; supra-threshold, critical, subthreshold, and supersinusoidal. To understand the reasoning behind the regime names, observe that in the absence of noise, β = 0, the deterministic model will produce spikes if and only if see the discussion in [14], which can be directly inferred from the solution in Eq. (12) below. In both the supra-threshold and supersinusoidal regimes, α +γ / √ 1 + Ω 2 > 1. The difference between the two is that in the supra-threshold regime the constant bias current alone is sufficient for spikes to occur, also in absence of noise, that is, α > 1.
In the supersinusoidal regime, the sinusoidal current is necessary for spikes to occur in absence of noise, that is, α + γ / √ 1 + Ω 2 > 1 and α ≤ 1. In the critical regime, the sum of the two terms is just barely enough to guarantee deterministic spiking, that is α + γ / √ 1 + Ω 2 ≈ 1. Finally, in the subthreshold regime, there would be no spikes without the noise, α + γ / √ 1 + Ω 2 < 1. Table 1 Table 1. a supra-threshold, b supersinusoidal, c critical, d subthreshold. In the supra-threshold regime spikes occur regularly and often; in the supersinusoidal regime spikes cluster near the peak of the sine wave; in the critical regime they occur less often; and in the subthreshold regime, spikes occur rarely. For all regimes, Ω = 1 Fig. 2 The four different parameter regimes using the parameter values given in Table 1. Illustrated are the probability density functions, g φ m (t), for representative φ m = 2π/Ω × {0, 0.25, 0.5, 0.75}. Varying φ m has, for the most part, the effect of shifting the curves laterally, while varying α, β, γ changes their characteristic form. For all regimes, Ω = 1. a supra-threshold, b supersinusoidal, c critical, d subthreshold With regards to Figs. 2 and 3, it is worth noting explicitly that combinations of noise and sinusoidal forcing can cause firing patterns in which spikes are phase locked, but skip a certain number of cycles. This leads to multimodal ISI densities. There are many different dynamical mechanisms that can yield such patterns, and the  Table 1. Illustrated are the survivor distribution functions,Ḡ φ m (t), for representative φ m = 2π/Ω × {0, 0.25, 0.5, 0.75}. Varying φ m has, for the most part, the effect of shifting the curves laterally, while varying α, β, γ changes their characteristic form. a supra-threshold, b supersinusoidal, c critical, d subthreshold particular correlations between the ISIs will depend on the underlying voltage dynamics (which, in our case, we assume to be given by Eq. (1)); in particular, it may be difficult to distinguish whether the dynamics are subthreshold or supra-threshold, since both can show similar ISI densities; see [22].

Basic ISI Probability Density Functions
Here, we introduce the notation for the probability density, distribution and survival functions of I n , an ISI arising from a trajectory produced by Eq. (3), g φ (τ )dτ := P I n+1 ∈ [τ, τ + dτ )|φ n = φ (probability density), The subscript φ is to stress that g, G, andḠ depend on the value of φ n in Eq. (3). This is the formal statement that in a sinusoidally-driven neuron, the interspike intervals are not identically distributed, and are only independent conditioned on the sinusoidal phase at an interval's onset. Knowing these distributions would provide the likelihood function, offering estimation by the preferred method of choice, the maximum likelihood estimator. Unfortunately, explicit expressions for the ISI distribution are not available except for the special case of γ = 0 and α = 1; see [3]. Different representations of the likelihood function are available though, see [23], one of which we will use below.

Fokker-Planck Equation with Absorbing Boundaries
The Fokker-Planck equation is a partial differential equation (PDE) describing the evolution of the probability density, f (x, t), of X t . For the sinusoidally-forced Ornstein-Uhlenbeck process, Eq. (3), with the threshold x th = 1, the PDE is Due to the reset, we have that at time t = 0, X t = 0 and so for the initial conditions we can write where δ(·) is the Dirac delta function. The spike is represented as a zero boundary condition for f at The natural way of using the Fokker-Planck equation in first-hitting-times problems is as follows. Denote the integral of Equation (7) forms the basis of one of the methods below for estimating the structural parameters from the observed data.
Since Eq. (5) has to be solved numerically, we will need to truncate its domain from below. The most natural way to do this, given the dynamics, is to impose reflecting boundary conditions at some where the probability mass is very small. For the left (lower) limit of the computational domain, we use the formula This choice requires some explanation. In the t → ∞ limit, the distribution of X t in Eq. (3) without thresholding is Gaussian with mean given by Eq. (12) (below) and variance equal to β 2 /2. Thus, to truncate the computational domain for the thresholded process from below, we take the lowest value of the asymptotic mean, α − γ / √ 1 + Ω 2 , then from this we subtract two standard deviations, 2β/ √ 2 and set the result to be the lower bound, x − . Finally, if this value for x − happens to be larger than −0.25, we enforce that x − ≤ −0.25.
Numerical considerations lead us to solve for F , instead of f , since delta functions are difficult to represent in floating point, while the initial conditions for F , the Heaviside step function, H (x), faces no such difficulties [24]. The Heaviside step function is defined to be equal to 0 for x < 0 and to be equal to 1 for x ≥ 0. At this point, we need to derive the PDE for the distribution F , starting from the PDE for the density, f , Eq. (5).
First, at the lower boundary, it is intuitive that the distribution should be zero, Inside the domain, the PDE itself reformulates as Integrating with respect to x then gives where C(t) is a constant of integration depending on t. Now consider the lower boundary condition, x = x − . Here, F (x − , t) = 0 implies that ∂ t F = 0 and so The right-hand side in Eq. (8) is precisely the reflecting boundary condition on f once we recall that ∂ x F = f . Therefore, C(t) ≡ 0. Thus, the fully specified PDE for F , which we will be solving frequently in what follows, is Numerical solutions for Eq. (9) are shown in Fig. 4. We have used the standard Crank-Nicholson finite-difference algorithm (central-differences in space with equally weighted implicit-explicit terms in time, see [25]).

Fortet Equation
Consider a general form of Eq. (3), Note that this is the distribution of Y t in absence of a threshold, different from the distribution given in Eq. (7), which is the distribution of the process constrained to be below the threshold. Now consider an arbitrary time-dependent threshold v th (t). The Fortet equation (see [26]) convolves the first-hitting time probabilities, g(t), with the transition density of the process. Integrating over (−∞, v th (t)), we obtain The left-hand side is simply the probability of exceeding v th at time t starting at v 0 at time 0. This can also be written as the probability of hitting v th for the first time at time τ < t and then exceeding v th at time t starting at v th at time τ , integrated over all τ . The Fortet equation is particularly appealing to use when we have an analytical expression for Φ. For the problem at hand, Φ is complicated due to the time-dependent forcing. However, the following transformation yields a time-homogeneous Y for which Φ will be tractable, along with an associated moving threshold, v th (t). This makes feasible the use of the Fortet equation. To obtain this transformation, cf. [27], consider the deterministic version of the SDE in Eq. (3) Now take X t , the solution to Eq. (3) and v(t), Eq. (12), and let which has the time and parameter dependent threshold That is, X t hits the constant threshold v th if and only if Y t hits the moving threshold v th{α,γ ;φ} (t), where the subindex indicates the dependence on α, γ and φ. Therefore, the ISIs produced by X and Y are the same and so are their distributions. Thus, g φ (τ ) satisfies where is the conditional cumulative distribution function of Y t defined in Eq. (13).

Parameter Estimation Algorithms
The unknown parameters in Eq. (3) are α, β, and γ , while we assume Ω known. The reason why the amplitude, γ , is often unknown while the frequency, Ω, is known is that one can usually observe the sinusoidal input and thus its frequency. Further, the encoding of the input into neuronal firing patterns often involves phase locking to the sinusoidal component. However, the actual forcing amplitude at the level of the neuron is usually modified by various synaptic and other filtering processes, unless the cell receives direct sinusoidal current injection.
Our goal is to estimate the structural parameters (α, β, γ ) from a sample of spike time data, {i 1 , . . . , i N }. There are several algorithms for estimating the parameters for the simpler and more common case of γ = 0. One such algorithm relies on the Fortet equation (see [7,8]), which we extend to the presence of a time-varying current. A more basic approach is to directly solve the Fokker-Planck equation for the probability density of X t , [19][20][21], from which one can derive the survival distribution of I n , and use this to compare against the empirical survival distribution of I n obtained from data. An approximate maximum likelihood approach is also possible by numerical differentiation. The relation between Fokker-Planck equations and the first-passage time problem is discussed in most introductory books on stochastic analysis; see, for example, [28]. A recent review of this approach for the simple γ = 0 case in neuronal modeling can be found in [21], wherein the first passage problem is discussed at great lengths in the context of spiking neurons. We will use this in Sect. 2.2. A more elaborate approach using the Fokker-Planck equation to approximate the hitting time distribution is given in [29]. The techniques in [29] avoid the need to compute the Fokker-Planck PDE numerically, instead approximating it with analytically known solutions. This approach might offer significant computational savings, but since this would at most amount to a computational speed-up of our algorithm, we have left this unexplored for now.
The immediate problem in generalizing the aforementioned approaches to the case of γ = 0 is that the I n 's are no longer identically distributed since the phase φ n−1 of the nth interval I n depends on t n−1 , the time the previous spike occurred. The I n 's are also dependent, but conditionally independent given φ n−1 . So the trajectories in each interval are parameterized by the value of φ n−1 at the time of the last spike/reset. We overcome this obstacle by splitting the I n 's in groups, and approximating the I n 's within groups as coming from identically distributed trajectories in a sense to be specified below. This approximation which solves the challenge of dependent and non-identically distributed ISIs is the primary contribution of this paper.

φ-Binning
Before we can use Eq. (9) or (15), we need to deal with the fact that φ is not fixed, but instead each I n starts with a distinct φ n . Our approach is to partition the interval [0, 2π/Ω] into M bins, where M N , and represent each bin by the midpoint of the bin, φ m . Then we approximate the N observed φ n 's by the closest φ m and pretend that any observed I n was not produced by a trajectory of the form in Eq. (3) with φ = φ n , but with φ = φ m . Our hope is that for a judicious choice of M, we can balance the There is clearly much freedom in how one sets up these bins, but we will do the simplest thing and make them all of equal width, δφ = 2π/(ΩM). Each φ n will belong to one and only one of the bins [φ m − δφ/2, φ m + δφ/2) M m=1 , with center points φ m = δφ/2 + (m − 1)δφ, for m = 1, . . . , M. Thus, given an empirically observed I n with associated φ n , we will pretend that it was produced by the process This binning is illustrated in Fig. 5.
While we have no rigorous approach to determine the value of M, our limited experience suggests that given N = 1000 ISIs, M = 10, or M = 20 gives satisfactory results for very different parameter regimes. In general, choosing M is a balancing act. For M too high, the resulting bins will have too few data points to approximatē G(I ) accurately. Therefore, M is forced to be small when sample size is not large. For M too low, the approximation of the phase shifts will be poor, leading to a biased estimate ofḠ(I ). We illustrate the effect of increasing M in Fig. 6. Generally, as long as there are sufficient data points, as M increases, the approximation of using the survival distribution with φ m instead of φ n improves since φ m (n) → φ n as M → ∞. In the sequel, we will use M = 20 for sample sizes of N = 1000 and M = 8 for sample sizes of N = 100.

Fokker-Planck Algorithm
Within each bin it is clear how to apply Eq. (7). In the mth bin, for a given φ m , we approximateḠ φ (t) bŷ The weight N m is included so that bins with larger sample sizes have a larger influence on the estimates. To evaluate the supremum in Eq. (17), we spline interpolate the empirically dis-creteĜ for each φ m , sample at the time nodes of the PDE discretization and finally take the maximum amongst the sampled values. We then minimize L using an optimization algorithm (see below, Sect. 4) and take our estimatesα,β,γ to bê α,β,γ = arg min α,β,γ L(α, β, γ ).
Note that the relation between the spike time survival density,Ḡ φ and the transition distribution, F φ , in Eq. (7) could also allow for an approximate maximum likelihood estimator (MLE), based on maximizing where the derivative has to be approximated by finite differences. We can then again use binning to avoid having to compute the PDE separately for each (i n , φ n−1 ). Our experience with the MLE approach has been that the quality of the estimates provided are similar to those obtained by minimizing Eq. (17) and that the associated computing times are on the same order. Due to this similarity and in order to keep the paper concise, we include details of the MLE estimates only in the supplementary online material.

Fortet Algorithm
An alternative approach is to form a loss function from Eq. (15). This is similar to what is done in [7,8] for the simpler case of a constant threshold. Noting that where the expectation is taken with respect to the distribution of the random variable I , we can use the fact that the ISIs are approximately independent and invoke the law of large numbers to estimate the integral as {β} v th{α,γ ;φ} (t), t|v th{α,γ ;φ} (i n ), i n .
We divide each inner term by ω(φ m ; α, β, γ ) = sup s>0 |1 − Φ (φ m ) α,β,γ (v th (s), s|v 0 )|, following the suggestion in [8]. This scaling ensures that Eq. (15) divided by ω(α, β, γ ) will vary between 0 and 1 for all parameter values thus giving sense to the measure defined by the loss function. Since we can solve in closed form for Φ, we have all we need given an observed spike train of i n 's. We evaluate the sup by sampling at K = 500 uniformly spaced points in (0, I max + ] and taking the maximum of the sampled values.

Initialization of the Algorithms
The parameter search can be initialized in a simple way using the fact that the Fokker-Planck PDE is almost an "advection-diffusion" equation whose solution is almost a Gaussian. ThenḠ(t) can be approximated by the amount of probability mass of a Gaussian to the left of the threshold at time t. The idea is as follows. Suppose we are solving the following PDE: Its solution given an initial condition ρ(x, 0) = δ(x) will be a Gaussian bell moving to the right with speed U and standard deviation σ = β √ t. The survivor functionḠ(t) can be thought of as the amount of area that has passed the threshold (from the left moving to the right). We can then invert the information aboutḠ to estimate U and β. In particular, a Gaussian bell has ≈ 0.158 of its mass more than one standard deviation to the right of its mean. Thus, at time t 1 such that G(t 1 ) = 0.842, the right tail of more than one standard deviation of the Gaussian bell has crossed the threshold. The threshold is at x = 1 and we obtain the following equation: Similarly, at time t 2 such thatḠ(t 2 ) = 1 − 0.842, the Gaussian bell has crossed the threshold except for the left tail and we have If U and β were constant, then Eqs. (20) and (21) provide two equations in two unknowns. However, is not constant and we approximate U as i.e., we approximate the space-dependent term, x, with the mid-point between the reset value, v 0 = 0, and the threshold, v th = 1, and we approximate the time-dependent term, sin(Ω(τ + φ)), by its time-average value between 0 and t. If we use the 0th, 1st, and 2nd standard deviation points, we can form 5 equations in 3 unknowns as follows: with the time-average weighting function s(t) = (cos(Ωφ) − cos(Ω(t + φ)))/Ω. However, the approximation is best for earlier times, when the solution is closer to a Gaussian bell that is approaching the threshold, but less correct for later times, since it neglects the loss of probability mass and thus overestimates the backward probability current. Indeed, we have found it to be best to use only t 1 and t 2 . In the following we use only these equations: for the initializer. We can form these equations separately for each φ m bin, thus resulting in M × 2 equations for the unknowns α, β, and γ . Since we have more equations than unknowns, we use least-squares estimates in a regression to pick out unique α, β, and γ estimates. The proposed initialization procedure has two advantages. First, it is automatic, i.e., it requires only the data and no input or guidance from the user. Second, it is extremely fast. While the precise effect of the initializer is shown in Sect. 4, it is intuitively clear that it will work best in the supra-threshold parameter regime when the bell curve is truly moving past the threshold as a whole and less so for subthreshold regimes, when only the diffusive force serves to propel the process to reach v th . The behavior of the initializer in the different regimes is illustrated in Fig. 7. What we show in Fig. 7 is the following: First, we show the survival distribution for a given regime and φ m fixed. Then using data generated from such a regime and with φ n in the mth bin, the initializer tries to find the best approximation by the motion of a Gaussian bell which will fit these data, in the sense of solving for α, β, γ as previously described. Once this is done, we then show in red the amount of area under this Gaussian bell to the left of the threshold. Of course, the interpretation of the survival distribution for an ISI as a fraction of the area under a moving bell with conserved total area is wrong, but the assumption is useful in automatically generating initial values for the more appropriate approximations to start their work.

Method Comparison on Simulated Data
We will now use our algorithms on spike trains simulated from the four different regimes: the supra-threshold, the critical, the supersinusoidal and the subthreshold. We have used 100 sample spike trains per regime, with N = 100 as well as N = 1000 spikes per train. In order to perform the numerical minimization of Eqs. (17) and (18), we have used an implementation of the Nelder-Mead algorithm from the SciPy library [30]. The Nelder-Mead algorithm is a non-linear minimization routine which uses a bounding-polygon method to zero-in on the minimum and thus avoids the need to provide the gradient of the loss function. It is the standard non-gradient minimization algorithm.
The estimation results are shown in Figs. 8, 9, 10, 11, where we plot box plots for the estimates,α,β,γ in the four regimes. We also tabulate the average and the empirical 95 % confidence intervals of the estimates in Tables 2 and 3. Conclusions that can be drawn from these results are as follows. The initializer method is effective for the supra-threshold regime and gives reasonable ballpark estimates for all regimes, though the error can be substantial for the supersinusoidal regime. In general, both the Fortet and Fokker-Planck algorithm estimate the parameters well in Fig. 7 The blue curves are the numerically obtained survivor distributionsḠ φ for the exact parameters in the four regimes (as in Table 1) and Ω = 1. The red curves are obtained in the following manner: Simulations using the true parameters were used to generate sample spikes. Using these samples, the initializer algorithm was used to generate estimates for α, β, γ . Using these estimates, the bell curve discussed in Sect. 3.4 was formed and evolved in time. Thus, the red curve drawn in the figures measures the area under this bell that is to the left of the threshold at time t . a supra-threshold, b supersinusoidal, c critical, d subthreshold  upper plots (a, b, c) show estimates using N = 100 sample spikes per estimation, while the lower plots (d, e, f) use N = 1000. The dashed line indicates the true parameter value, while the red line inside the boxes indicates the median of the estimates. The boxes contain the central 50 % of the estimates. The bars indicate the range of the estimates, except for outliers given by the points outside the bars, and defined to be more than 1.5 times the interquantile range (the height of the box) from the box the supra-threshold, critical and supersinusoidal regimes. The estimators' variance is especially low in the supra-threshold regime, while it is higher for the critical and supersinusoidal regimes. In the supersinusoidal regime, the two algorithms give accurate estimates even though the initializer can be quite off. On the other hand, in the subthreshold regime, the initializer has a performance comparable to that of the two more involved methods. It seems that distinguishing between the constant bias and the sinusoidal current is difficult if their sum is not sufficient to generate spikes without noise.
The Fokker-Planck method has a larger bias but a smaller spread than the Fortet method for N = 100, Table 2. However for N = 1000, the two methods have comparable spreads, while the Fortet method retains a smaller bias, see Table 3. More precisely, for N = 1000, the Fokker-Planck method has a smaller spread in the sub- The bars indicate the range of the estimates, except for outliers given by the points outside the bars, and defined to be more than 1.5 times the interquantile range (the height of the box) from the box threshold regime, while the Fortet method has a smaller spread in the supersinusoidal regime. As such, at least in the supersinusoidal regime, the Fortet method seems superior. A detailed comparison between the Fortet and the Fokker-Planck estimators for each parameter in each regime can be seen in Fig. 12 for N = 100 and in Fig. 13 for N = 1000.
The two algorithms are numerically intensive. For N = 100 and N = 1000 spikes, we show the times taken for the estimation in Table 4. While we have done most of our numerical work in Python/SciPy [30], we have implemented the critical components of both algorithms in C. That is, we solve the inner part of Eq. (18) and the Fokker-Planck PDE, Eq. (9), in C using the GSL libraries [31]. From Table 4, we can verify that the computing time for the Fortet algorithm scales proportionally with the number of spikes. This is to be expected, since the Fortet equation has a term of   the form i n which in turn has N terms and this forms the bulk of the computing time for the Fortet equation. The Fokker-Planck algorithm, on the other hand, scales lessthan-linearly with N , since the dependency on N is in forming the approximation,Ĝ to the survivor function and that is not computationally intensive (solving the PDE is).

The Effect of Ω
So far, we have held Ω constant and equal to 1. We now investigate the effect of varying Ω on the quality of estimates. To narrow the scope, we focus on increasing Ω while keeping the parameters in the critical regime such that α + γ / √ 1 + Ω 2 = 1 and α = 0.5. This amounts to increasing γ with Ω. We do the estimations for four values of Ω = [1,5,10,20]. Similarly to the previous section, we use 100 sample spike trains per parameter set, with each spike train consisting of N = 1000 ISIs.
We show box plots of the estimates for each Ω in Fig. 14. We then directly compare the two algorithms, Fortet vs. Fokker-Planck, in Fig. 15. The immediate observation is that the Fokker-Planck algorithm fails to keep up at the higher frequencies and consistently underestimates γ . The Fortet algorithm does better, but still underestimates γ . In general, this underestimation of γ is accompanied by an overestimation of α. This is exacerbated at higher Ω. We illustrate the relation between estimates for α vs. γ in Fig. 16, where it is quite clear that an underestimation of γ is proportional to the overestimation of α.
For completeness, we also include the estimates' average and empirical 95 % confidence intervals in Table 5.

Discussion and Outlook
We have shown two methods to estimate parameters in Eq. (2) from ISI data. Our methods are based on binning the spikes into bins with representative phase shifts. We have devised a constructive procedure to automatically initialize the methods from the data.
Our computational results suggest that for low frequencies the Fortet algorithm is superior for large sample sizes, especially in the supersinusoidal regime, while the Fokker-Planck algorithm has a comparable accuracy and a lower variance for small The boxes contain the central 50 % of the estimates. The bars indicate the range of the estimates, except for outliers given by the points outside the bars, and defined to be more than 1.5 times the interquantile range (the height of the box) from the box sample sizes. Both algorithms find sensible estimates most of the time, although they seem less effective in the subthreshold regime. Their performance can be partially attributed to the ability of the initializer algorithm to supply good guesses for starting the optimization iterations.
The Fokker-Planck equation allows for approximate maximum likelihood estimation. We chose an alternative loss function, though, because it marginally appeared more robust, possibly because a numerical derivation step is avoided. This is further investigated by simulations in the supplementary online material. The simulations suggest that the distribution of the maximum likelihood estimates in the supersinusoidal regime appears bimodal, which is not the case for the alternative loss function, Eq. (17).
We have also made a preliminary exploration of the effect of Ω on the quality of the estimates. Our results show that an increase in Ω makes the parameters α and γ more difficult to estimate accurately and at high Ω, γ is underestimated, while α is over-estimated. We find that in this scenario, the Fortet algorithm does a markedly more accurate job then the Fokker-Planck algorithm.
We have assumed the time-constant τ of the leak term to be known. In most experiments that is not realistic, and it would be preferable to estimate τ alongside the other parameters. However, it is difficult to estimate [32]. When we tried to estimate it together with the other parameters, we usually obtained results which were not accurate. The obtained estimates resulted in ISIs that very well matched the data, Fig. 15 Estimates based on samples of N = 1000 spikes obtained from the Fokker-Planck algorithm against the Fortet algorithm for a parameter set in the critical regime, while varying Ω across [1,5,10,20] and holding γ / 1 + Ω 2 and α constant. a, b, c Ω = 1; d, e, f Ω = 5; g, h, i Ω = 10; j, k, l Ω = 20 no worse than the ISIs obtained from the true parameters. This leads us to believe that the simultaneous estimation of τ along with α, β, γ using only ISI data suffers from identifiability problems. In [5], they were able to estimate τ in the simpler nonsinusoidally-driven model, but concluded that adding τ as an unknown dramat- ically reduced the accuracy in the estimation of the other unknown parameters. The reason is that if τ is also estimated from a single dataset alongside the other parameters, then a reasonable fit can be found to the data for various combinations of α, β, γ , and τ , but the so-obtained parameter values can be far from the true values.
Our model is relatively simple and ignores neurophysiological realism, such as the fact that the spiking threshold is likely nonconstant, with a time-dependent functional form that would involve further unknown parameters. A recent paper attempting the parameter estimation in such a model, but without sinusoidal forcing, is [20]. Further- Table 5 Averages and empirical 95 % confidence intervals of estimates for N = 1000 spikes per train in the critical regime for varying Ω across [1,5,10,20]. Note that the upper subtable corresponds to the third subtable in Table 3; numbers differ slightly due to statistical fluctuations in the simulations more, intracellular recordings suggest that a hard threshold is a rough approximation and an exponential voltage-dependent spiking intensity is more realistic [33]. While our work has used a very specific form of the periodic forcing term, namely γ sin(Ωt), it is clear how to apply the approach to an arbitrary periodic function. This can be done as long as one knows where in the period of oscillation a spike has occurred. If that is the case, then the binning procedure can be applied and the estimation methods proposed can be attempted.