Data Assimilation Methods for Neuronal State and Parameter Estimation

This tutorial illustrates the use of data assimilation algorithms to estimate unobserved variables and unknown parameters of conductance-based neuronal models. Modern data assimilation (DA) techniques are widely used in climate science and weather prediction, but have only recently begun to be applied in neuroscience. The two main classes of DA techniques are sequential methods and variational methods. We provide computer code implementing basic versions of a method from each class, the Unscented Kalman Filter and 4D-Var, and demonstrate how to use these algorithms to infer several parameters of the Morris–Lecar model from a single voltage trace. Depending on parameters, the Morris–Lecar model exhibits qualitatively different types of neuronal excitability due to changes in the underlying bifurcation structure. We show that when presented with voltage traces from each of the various excitability regimes, the DA methods can identify parameter sets that produce the correct bifurcation structure even with initial parameter guesses that correspond to a different excitability regime. This demonstrates the ability of DA techniques to perform nonlinear state and parameter estimation and introduces the geometric structure of inferred models as a novel qualitative measure of estimation success. We conclude by discussing extensions of these DA algorithms that have appeared in the neuroscience literature.

Electronic Supplementary Material: The online version of this article (10.1186/s13408-018-0066-8) contains supplementary material.


The Parameter Estimation Problem
The goal of conductance-based modeling is to be able to reproduce, explain, and predict the electrical behavior of a neuron or networks of neurons. Conductance-based modeling of neuronal excitability began in the 1950s with the Hodgkin-Huxley model of action potential generation in the squid giant axon [1]. This modeling framework uses an equivalent circuit representation for the movement of ions across the cell membrane:

C \frac{dV}{dt} = -\sum I_{\text{ion}} + I_{\text{app}}, \qquad (1)

where V is membrane voltage, C is cell capacitance, I_ion are ionic currents, and I_app is an external current applied by the experimentalist. The ionic currents arise from channels in the membrane that are voltage- or calcium-gated and selective for particular ions, such as sodium (Na+) and potassium (K+). For example, consider the classical Hodgkin-Huxley currents:

I_{\text{Na}} = g_{\text{Na}} m^3 h (V - E_{\text{Na}}), \quad I_{\text{K}} = g_{\text{K}} n^4 (V - E_{\text{K}}), \quad I_{\text{L}} = g_{\text{L}} (V - E_{\text{L}}). \qquad (2)

The maximal conductance g_ion is a parameter that represents the density of channels in the membrane. The term (V − E_ion) is the driving force, where the equilibrium potential E_ion is the voltage at which the concentration of the ion inside and outside of the cell is at steady state. The gating variable m is the probability that one of three identical subunits of the sodium channel is "open", and the gating variable h is the probability that a fourth subunit is "inactivated". Similarly, the gating variable n is the probability that one of four identical subunits of the potassium channel is open. For current to flow through the channel, all subunits must be open and not inactivated. The rate at which subunits open, close, inactivate, and de-inactivate depends on the voltage. The dynamics of the gating variables are given by

\frac{dx}{dt} = \alpha_x(V)(1 - x) - \beta_x(V)\, x, \quad x \in \{m, h, n\}, \qquad (3)

where α_x(V) and β_x(V) are nonlinear functions of voltage with several parameters.
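As a concrete illustration of these gating kinetics, here is a minimal Python/NumPy sketch of the potassium current and its activation dynamics (the paper's supplementary code is MATLAB; the rate functions below use the classical Hodgkin-Huxley fits and are included only as an assumed example parameterization):

```python
import numpy as np

def alpha_n(V):
    # Classical Hodgkin-Huxley opening rate for a K+ activation subunit (V in mV)
    return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))

def beta_n(V):
    # Closing rate for a K+ activation subunit
    return 0.125 * np.exp(-(V + 65.0) / 80.0)

def dn_dt(V, n):
    # First-order kinetics: subunits open at rate alpha and close at rate beta
    return alpha_n(V) * (1.0 - n) - beta_n(V) * n

def I_K(V, n, g_K=36.0, E_K=-77.0):
    # All four identical subunits must be open for the channel to conduct
    return g_K * n**4 * (V - E_K)

# At the voltage-dependent steady state n_inf(V) = alpha/(alpha + beta), dn/dt vanishes
V = -65.0
n_inf = alpha_n(V) / (alpha_n(V) + beta_n(V))
```

At the reversal potential E_K the driving force, and hence the current, is zero regardless of the gating state.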
The parameters of conductance-based models are typically fit to voltage-clamp recordings. In these experiments, individual ionic currents are isolated using pharmacological blockers and one measures current traces in response to voltage pulses. However, many electrophysiological datasets consist of current-clamp rather than voltage-clamp recordings. In current-clamp, one records a voltage trace (e.g., a series of action potentials) in response to injected current. Fitting a conductance-based model to current-clamp data is challenging because the individual ionic currents have not been measured directly. In terms of the Hodgkin-Huxley model, only one state variable (V ) has been observed, and the other three state variables (m, h, and n) are unobserved. Conductance-based models of neurons often contain several ionic currents and, therefore, more unobserved gating variables and more unknown or poorly known parameters. For example, a model of HVC neurons in the zebra finch has 9 ionic currents, 12 state variables, and 72 parameters [2]. An additional difficulty in attempting to fit a model to a voltage trace is that if one performs a least-squares minimization between the data and model output, then small differences in the timing of action potentials in the data and the model can result in large error [3]. Data assimilation methods have the potential to overcome these challenges by performing state estimation (of both observed and unobserved states) and parameter estimation simultaneously.

Data Assimilation
Data assimilation can broadly be considered to be the optimal integration of observations from a system to improve estimates of a model output describing that system. Data assimilation (DA) is used across the geosciences, e.g., in studying land hydrology and ocean currents, as well as studies of climates of other planets [4][5][6]. An application of DA familiar to the general public is its use in numerical weather prediction [7]. In the earth sciences, the models are typically high-dimensional partial differential equations (PDEs) that incorporate dynamics of the many relevant governing processes, and the state system is a discretization of those PDEs across the spatial domain. These models are nonlinear and chaotic, with interactions of system components across temporal and spatial scales. The observations are sparse in time, contaminated by noise, and only partial with respect to the full state-space.
In neuroscience, models can also be highly nonlinear and potentially chaotic. When dealing with network dynamics or wave propagation, the state-space can be quite large, and there are certainly components of the system for which one would not have time course measurements [8]. As mentioned above, if one has a biophysical model of a single neuron and measurements from a current-clamp protocol, the only quantity in the model that is actually measured is the membrane voltage. The question then becomes: how does one obtain estimates of the full system state?
To begin, we assume we have a model to represent the system of interest and a way to relate observations we have of that system to the components of the model. Additionally, we allow, and naturally expect, there to be errors present in the model and measurements. Let us first consider a general model with linear dynamics and a set of discrete observations which depend linearly on the system components:

x_{k+1} = F x_k + \omega_{k+1}, \quad x_{k+1} \in \mathbb{R}^L, \qquad (5)

y_{k+1} = H x_{k+1} + \eta_{k+1}, \quad y_{k+1} \in \mathbb{R}^M. \qquad (6)

In this state-space representation, x_k is interpreted as the state of the system at some time t_k, and y_k are our observations. For applications in neuroscience, we can take M \ll L, as few state variables of the system are readily observed. F is our model, which maps states x_k between time points t_k and t_{k+1}. H is our observation operator, which describes how we connect our observations y_{k+1} to our state-space at t_{k+1}. The random variables ω_{k+1} and η_{k+1} represent model error and measurement error, respectively. A simplifying assumption is that our measurements are corrupted by Gaussian white noise, and that the error in the model can be approximated by Gaussian white noise as well. Then ω_k ∼ N(0, Q_k) and η_k ∼ N(0, R_k), where Q_k is our model error covariance matrix and R_k is our measurement error covariance matrix. We will assume these distributions for the error terms for the remainder of the paper.
We have now defined a stochastic dynamical system in which the evolution of the states and observations is characterized by the assumed error statistics. The goal is to use these transitions to construct methods that best estimate the state x over time. To approach this goal, it may be simpler to consider weighing background knowledge of the system against what we actually observe from a measuring device. Consider the following cost function [9]:

C(x) = \frac{1}{2} \|x - x^b\|^2_{P_b} + \frac{1}{2} \|y - Hx\|^2_{R}, \qquad (7)

where \|z\|^2_A = z^T A^{-1} z. P_b acts to give weight to certain background components x^b, and R acts in the same manner on the measurement terms. The model or background term acts to regularize the cost function. Specifically, trying to minimize \frac{1}{2}\|y - Hx\|^2_R alone is underdetermined with respect to the observations unless we can observe the full system, and the model term aims to inform the problem of the unobserved components. We are minimizing over state components x. In this way, we balance the influence of what we think we know about the system, such as from a model, against what we can actually observe. The cost function (7) is minimized by

x^a = x^b + P_b H^T (H P_b H^T + R)^{-1} (y - H x^b). \qquad (8)

This can be restructured as

x^a = x^b + K (y - H x^b), \qquad (9)

where

K = P_b H^T (H P_b H^T + R)^{-1}. \qquad (10)

The optimal Kalman gain matrix K acts as a weighting of the confidence in our observations relative to the confidence in our background information given by the model. If the background uncertainty is relatively high or the measurement uncertainty is relatively low, K is larger, which more heavily weights the innovation y − H x^b. The solution of (7) can be interpreted as the solution of a single time step in our state-space problem (5)-(6). In the DA literature, minimizing this cost function independent of time is referred to as 3D-Var. However, practically we are interested in problems resembling the following:

C(x_0, \ldots, x_N) = \frac{1}{2} \sum_{k=0}^{N-1} \|x_{k+1} - F x_k\|^2_{Q} + \frac{1}{2} \sum_{k=0}^{N} \|y_k - H x_k\|^2_{R}, \qquad (11)

where formally the background component x^b has now been replaced with our model. Now we are concerned with minimizing over an observation window with N + 1 time points.
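The 3D-Var analysis step, the background estimate corrected by the gain-weighted innovation, can be sketched in a few lines of NumPy; the two-state system and all numbers below are hypothetical, with only the first ("voltage-like") component observed:

```python
import numpy as np

# Toy 2-state system in which only the first component is observed,
# mirroring the voltage-only setting; all numbers are illustrative.
x_b = np.array([1.0, 0.5])              # background (model-based) state
P_b = np.array([[0.5, 0.1],
                [0.1, 0.4]])            # background error covariance
H = np.array([[1.0, 0.0]])              # observe the first state only
R = np.array([[0.05]])                  # measurement error covariance
y = np.array([1.4])                     # noisy observation

# Optimal gain K = P_b H^T (H P_b H^T + R)^{-1}
S = H @ P_b @ H.T + R
K = P_b @ H.T @ np.linalg.inv(S)

# Analysis: interpolate the background with the innovation y - H x_b
x_a = x_b + K @ (y - H @ x_b)
```

Note that the off-diagonal entry of P_b lets the innovation in the observed component also correct the unobserved one.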
Variational methods, specifically "weak 4D-Var", seek minima of (11) either by formulation of an adjoint problem [10] or directly via numerical optimization techniques. Alternatively, sequential data assimilation approaches, specifically filters, aim to use information from previous time points t_0, t_1, \ldots, t_k, and observations at the current time t_{k+1}, to optimally estimate the state at t_{k+1}. The classical Kalman filter utilizes the form of (10), which minimizes the trace of the posterior covariance matrix of the system at step k + 1, P^a_{k+1}, to update the state estimate and system uncertainty. The Kalman filtering algorithm takes the following form. Our analysis estimate \hat{x}^a_k from the previous iteration is mapped through the linear model operator F to obtain our forecast estimate:

\hat{x}^f_{k+1} = F \hat{x}^a_k. \qquad (12)

The observation operator H is applied to the forecast estimate to generate the measurement estimate:

\hat{y}^f_{k+1} = H \hat{x}^f_{k+1}. \qquad (13)

The forecast estimate covariance P^f_{k+1} is generated by propagating the analysis covariance through the model and adding the model error covariance Q_k:

P^f_{k+1} = F P^a_k F^T + Q_k. \qquad (14)

Similarly, we can construct the measurement covariance estimate by propagating the forecast covariance through our observation equation and adding the measurement error covariance R_k:

P^y_{k+1} = H P^f_{k+1} H^T + R_k. \qquad (15)

The Kalman gain is defined analogously to (10):

K_{k+1} = P^f_{k+1} H^T \left( P^y_{k+1} \right)^{-1}. \qquad (16)

The covariance and the mean estimate of the system are updated through a weighted sum with the Kalman gain:

P^a_{k+1} = (I - K_{k+1} H) P^f_{k+1}, \qquad (17)

\hat{x}^a_{k+1} = \hat{x}^f_{k+1} + K_{k+1} \left( y_{k+1} - \hat{y}^f_{k+1} \right). \qquad (18)

These equations can be interpreted as a predictor-corrector method, where the predictions of the state estimates are \hat{x}^f_{k+1} with corresponding uncertainties P^f_{k+1} in the forecast. The correction, or analysis, step linearly interpolates the forecast predictions with observational readings.
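One predict-correct cycle of the linear Kalman filter can be sketched as follows; this is a minimal NumPy version of the forecast and analysis steps above, and the fully observed toy example at the bottom is an assumption for illustration only:

```python
import numpy as np

def kalman_step(x_a, P_a, y, F, H, Q, R):
    """One predict-correct cycle of the classical (linear) Kalman filter."""
    # Forecast: push the previous analysis estimate through the model
    x_f = F @ x_a
    P_f = F @ P_a @ F.T + Q
    # Predicted measurement and its covariance
    y_f = H @ x_f
    P_yy = H @ P_f @ H.T + R
    # The gain weighs forecast confidence against measurement confidence
    K = P_f @ H.T @ np.linalg.inv(P_yy)
    # Analysis: correct the forecast with the innovation y - y_f
    x_new = x_f + K @ (y - y_f)
    P_new = (np.eye(len(x_f)) - K @ H) @ P_f
    return x_new, P_new

# Fully observed toy example: with near-zero measurement noise,
# the analysis is pulled almost exactly onto the observation.
F = np.eye(2); H = np.eye(2)
Q = 0.01 * np.eye(2); R = 1e-8 * np.eye(2)
x_new, P_new = kalman_step(np.zeros(2), np.eye(2), np.array([1.0, 2.0]), F, H, Q, R)
```

In the opposite limit of very large R, the gain vanishes and the analysis simply keeps the forecast.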
In this paper we only consider filters; however, smoothers, another form of sequential DA, also use observational data from future times t_{k+2}, \ldots, t_{k+l} to estimate the state at t_{k+1}.

Nonlinear Filtering
For nonlinear models, the Kalman equations need to be adapted to permit nonlinear mappings in the forward operator and the observation operator:

x_{k+1} = f(x_k) + \omega_{k+1}, \qquad (19)

y_{k+1} = h(x_{k+1}) + \eta_{k+1}. \qquad (20)

Our observation operator for voltage data remains linear:

h(x) = e_1^T x = V, \qquad (21)

where e_j is the j-th elementary basis vector, so h is a projection onto the voltage component of our system. Note that h(x) is an operator, not to be confused with the inactivation gate in (2). Our nonlinear model update, f(x) in (19), is taken as the forward integration of the dynamical equations between observation times. Multiple approaches for adapting the Kalman equations exist. The most straightforward is the extended Kalman filter (EKF), which uses local linearizations of the nonlinear operators in (19)-(20) and plugs these into the standard Kalman equations. By doing so, one preserves the Gaussianity of the state-space. Underlying the data assimilation framework is the goal of understanding the distribution, or statistics of the distribution, of the states of the system given the observations:

p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}.

The Gaussianity of the state-space forces the posterior conditional distribution p(x|y) to be normal, since the product of Gaussians is Gaussian, and the statistics of this distribution lead to the Kalman update equations [10]. However, the EKF is really only suitable when the dynamics are nearly linear between observations, and it can result in divergence of the estimates [11]. Rather than trying to linearize the transformation to preserve Gaussianity, an assumption that is not going to be valid for practical problems anyway, an alternative approach is to retain the nonlinear transformation and estimate the first two moments of the transformed state [11]. The Unscented Kalman Filter (UKF) approximates the first two moments of p(x_k | y_0 \ldots y_k) by calculating sample means and variances, which bypasses the need for Gaussian integral products.
The UKF uses an ensemble of deterministically selected points in the state-space whose collective mean and covariance match the state estimate and its associated covariance at time t_k. The forward operator f(x) is applied to each of these sigma points, and the mean and covariance of the transformed points can then be computed to estimate the nonlinearly transformed mean and covariance. Figure 1 depicts this "unscented" transformation. The sigma points precisely estimate the true statistics both initially (Fig. 1(A)) and after nonlinear mapping (Fig. 1(B)).
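The unscented transformation can be sketched as follows, assuming the standard symmetric sigma-point set with central weight λ/(N + λ); the Cholesky factor stands in for the matrix square root, and the function names are introduced only for illustration:

```python
import numpy as np

def sigma_points(x_hat, P, lam):
    """2N+1 deterministically chosen points whose sample mean and covariance
    match x_hat and P; lam spreads (lam > 0) or shrinks (-N < lam < 0) the ensemble."""
    N = len(x_hat)
    S = np.linalg.cholesky((N + lam) * P)        # a matrix square root of (N+lam)P
    X = [x_hat]
    for j in range(N):
        X.append(x_hat + S[:, j])
        X.append(x_hat - S[:, j])
    W = np.full(2 * N + 1, 1.0 / (2 * (N + lam)))
    W[0] = lam / (N + lam)                       # central weight; zero when lam = 0
    return np.array(X), W

def unscented_transform(X, W, f):
    """Propagate sigma points through f and recover the sample mean and covariance."""
    Y = np.array([f(x) for x in X])
    mean = W @ Y
    dev = Y - mean
    cov = (W[:, None] * dev).T @ dev
    return mean, cov

# For a linear map the transform is exact; for a nonlinear map it estimates
# the first two moments without any local linearization.
X, W = sigma_points(np.array([1.0, 2.0]), np.diag([0.3, 0.2]), lam=1.0)
mean, cov = unscented_transform(X, W, lambda v: v)
```

Applying the identity map recovers the original mean and covariance exactly, which is the defining property of the sigma-point set.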
In the UKF framework, as with all DA techniques, one is attempting to estimate the states of the system. The standard set of states in conductance-based models includes the voltage, the gating variables, and any intracellular ion concentrations not taken to be stationary. To incorporate parameter estimation, parameters θ to be estimated are promoted to states whose evolution is governed by the model error random variable:

\theta_{k+1} = \theta_k + \omega^{\theta}_{k+1}. \qquad (22)

This is referred to as an "artificial noise evolution model", as the random disturbances driving deviations in model parameters over time rob them of their time-invariant definition [12, 13]. We found this choice to be appropriate for convergence and as a tuning mechanism. An alternative is to zero out the entries of Q_k corresponding to the parameters in what is called a "persistence model", where θ_{k+1} = θ_k [14]. However, changes in parameters can still occur during the analysis stage. We declare our augmented state to be comprised of the states in the dynamical system as well as the parameters θ of interest:

x = (V, q, \theta)^T, \qquad (23)

where q represents the additional states of the system besides the voltage. The filter requires an initial guess of the state \hat{x}_0 and covariance P_{xx}. An implementation of this algorithm is provided as Supplementary Material, with the parent function UKFML.m and one time step of the algorithm computed in UKF_Step.m. An ensemble of sigma points is formed,

X_0 = \hat{x}^a_k, \quad X_j = \hat{x}^a_k \pm \left( \sqrt{(N + \lambda) P_{xx}} \right)_j, \quad j = 1, \ldots, N, \qquad (24)

with weights W_0 = \lambda / (N + \lambda) and W_j = 1 / (2(N + \lambda)), where N is the dimension of the augmented state. The positions and weights of the ensemble are determined by λ, which can be chosen to try to match higher moments of the system distribution [11]. Practically, this algorithmic parameter can be chosen to spread the ensemble for λ > 0, to shrink the ensemble for −N < λ < 0, or to remove the mean point from the ensemble entirely by setting λ to zero. The ensemble is formed on lines 80-82 of UKF_Step.m. The individual weights can be negative, but their cumulative sum is 1.
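The state augmentation described above can be sketched as a small wrapper around any state update; f_augmented and the one-state decay model below are hypothetical names introduced only for illustration:

```python
import numpy as np

def f_augmented(x_aug, f_states, n_params):
    """Map an augmented state forward: the dynamical states evolve under f_states,
    while promoted parameters are carried along unchanged (persistence model);
    any artificial parameter noise would enter through the corresponding Q entries."""
    states, theta = x_aug[:-n_params], x_aug[-n_params:]
    return np.concatenate([f_states(states, theta), theta])

# Toy example: one 'voltage-like' state decaying at a rate set by a promoted parameter
f_states = lambda s, th: s + 0.1 * (-th[0] * s)
x_next = f_augmented(np.array([1.0, 0.5]), f_states, n_params=1)
```

The deterministic map leaves the parameter untouched; in the filter, parameter estimates change only through the process noise and the analysis step.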
We form our background estimate \hat{x}^b_{k+1} by applying our map f(x) to each of the ensemble members,

\tilde{X}_j = f(X_j), \qquad (25)

and then computing the resulting mean:

\text{Background Mean:} \quad \hat{x}^b_{k+1} = \sum_j W_j \tilde{X}_j. \qquad (26)

We then propagate the transformed sigma points through the observation operator,

\tilde{Y}_j = h(\tilde{X}_j), \qquad (27)

and compute our predicted observation \hat{y}^b_{k+1} from the mapped ensemble:

\text{Predicted Observation:} \quad \hat{y}^b_{k+1} = \sum_j W_j \tilde{Y}_j. \qquad (28)

We compute the background covariance estimate by calculating the variance of the mapped ensemble and adding the process noise Q_k:

\text{Background Cov.:} \quad P^b_{xx} = \sum_j W_j (\tilde{X}_j - \hat{x}^b_{k+1})(\tilde{X}_j - \hat{x}^b_{k+1})^T + Q_k, \qquad (29)

and do the same for the predicted measurement covariance with the addition of R_k:

\text{Predicted Meas. Cov.:} \quad P_{yy} = \sum_j W_j (\tilde{Y}_j - \hat{y}^b_{k+1})(\tilde{Y}_j - \hat{y}^b_{k+1})^T + R_k. \qquad (30)

The Kalman gain is computed by matrix multiplication of the cross-covariance,

\text{Cross Cov.:} \quad P_{xy} = \sum_j W_j (\tilde{X}_j - \hat{x}^b_{k+1})(\tilde{Y}_j - \hat{y}^b_{k+1})^T, \qquad (31)

with the inverse of the predicted measurement covariance:

\text{Kalman Gain:} \quad K = P_{xy} P_{yy}^{-1}. \qquad (32)
When only observing voltage, this step is merely scalar multiplication of a vector. The gain is used in the analysis, or update, step to linearly interpolate our background statistics with measurement corrections. The update step for the covariance is

P^a_{k+1} = P^b_{xx} - K P_{yy} K^T, \qquad (33)

and the mean is updated to interpolate the background estimate with the deviation of the estimated measurement term from the observed data y_{k+1}:

\hat{x}^a_{k+1} = \hat{x}^b_{k+1} + K \left( y_{k+1} - \hat{y}^b_{k+1} \right). \qquad (34)

The analysis step is performed on line 124 of UKF_Step.m. Some implementations also include a redistribution of the sigma points about the forecast estimate using the background covariance prior to computing the cross-covariance P_xy or the predicted measurement covariance P_yy [15]. So, after (29), we redefine \tilde{X}_j, \tilde{Y}_j in (25) as follows:

\tilde{X}_j = \hat{x}^b_{k+1} \pm \left( \sqrt{(N + \lambda) P^b_{xx}} \right)_j, \quad \tilde{Y}_j = h(\tilde{X}_j).

The above is shown in lines 98-117 of UKF_Step.m. A particularly critical part of using a filter, or any DA method, is choosing the process covariance matrix Q_k and the measurement covariance matrix R_k. The measurement noise may be set intuitively based upon knowledge of one's measuring device, but the model error is practically impossible to know a priori. Work has been done to use previous innovations to simultaneously estimate Q and R during the course of the estimation cycle [16], but this becomes a challenge for systems with low observability (such as is the case when only observing voltage). Rather than estimating the states and parameters simultaneously with an augmented state-space, one can try to estimate the states and parameters separately. For example, [17] used a shooting method to estimate parameters and the UKF to estimate the states. This study also provided a systematic way to estimate an optimal covariance inflation Q_k. For high-dimensional systems where computational efficiency is a concern, an implementation which efficiently propagates the square root of the state covariance has been developed [18].

Fig. 2 Example of iterative estimation in the UKF. The red circles are the result of forward integration through the model using the previous best estimates. The green circles are the estimates after combining these with observational data. The blue stars depict the true system output (without any noise), and the magenta stars are the noisy observational data with noise generated by (48) and ε = 0.1

Variational Methods
In continuous time, variational methods aim to find minimizers of functionals which represent approximations to the probability distribution of a system conditioned on some observations. As our data is available only in discrete measurements, it is practical to work with a discrete form similar to (7) for nonlinear systems:

C(x) = \frac{1}{2} \sum_{k=0}^{N-1} \|x_{k+1} - f(x_k)\|^2_{Q} + \frac{1}{2} \sum_{k=0}^{N} \|y_k - h(x_k)\|^2_{R}. \qquad (35)

We assume that the states follow the state-space description in (19)-(20) with ω_k ∼ N(0, Q) and η_k ∼ N(0, R), where Q is our model error covariance matrix and R is our measurement error covariance matrix. As an approximation, we impose Q and R to be diagonal matrices, indicating that errors in different states are assumed to be uncorrelated. Namely, Q contains only the assumed model error variance for each state-space component, and R is just the measurement error variance of the voltage observations. These assumptions simplify the cost function to the following:

C(x) = \sum_{k=0}^{N} \frac{(V_k - y_k)^2}{2R} + \sum_{k=0}^{N-1} \sum_{i=1}^{L} \frac{(x_{i,k+1} - f_i(x_k))^2}{2 Q_i}, \qquad (36)

where V_k = x_{1,k}. For the current-clamp data problem in neuroscience, one seeks to minimize equation (36) in what is called the "weak 4D-Var" approach. An example implementation of weak 4D-Var is provided in w4DvarML.m in the Supplementary Material. An example of the cost function to minimize is given in the child function w4dvarobjfun.m. Each of the x_k is mapped by f(x) on line 108. Alternatively, "strong 4D-Var" forces the resulting estimates to be consistent with the model f(x). This can be considered the result of taking Q → 0, which yields the nonlinearly constrained problem

\min \; \frac{1}{2} \sum_{k=0}^{N} \|y_k - h(x_k)\|^2_{R} \quad \text{such that} \quad x_{k+1} = f(x_k). \qquad (37)

The rest of this paper will be focused on the weak case (36), where we can define the argument of the optimization as

x = \left( x_0^T, x_1^T, \ldots, x_N^T, \theta^T \right)^T, \qquad (38)

resulting in an ((N + 1)L + D)-dimensional estimation problem, where D is the number of parameters. An important aspect of the scalability of this problem is that the Hessian matrix is sparse. Namely, each state at each discrete time has dependencies based upon the model equations and the chosen numerical integration scheme.
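The simplified weak 4D-Var cost can be sketched directly in Python; w4dvar_cost and the linear toy model below are hypothetical stand-ins for illustration (the paper's implementation is the MATLAB function w4dvarobjfun.m):

```python
import numpy as np

def w4dvar_cost(z, y, f, Q_diag, R, L):
    """Weak 4D-Var cost: z stacks the states x_0 .. x_N (length (N+1)*L).
    Model residuals are penalized by the diagonal model error variances Q_diag,
    and the voltage (first-component) mismatch by the measurement variance R."""
    X = z.reshape(-1, L)
    model_term = 0.0
    for k in range(len(X) - 1):
        w = X[k + 1] - f(X[k])                       # model error residual
        model_term += 0.5 * np.sum(w**2 / Q_diag)
    meas_term = 0.5 * np.sum((X[:, 0] - y)**2) / R   # voltage-only observations
    return model_term + meas_term

# On an exact, noise-free model trajectory the cost vanishes,
# and any perturbation of the stacked state raises it.
f = lambda x: 0.9 * x
X = [np.array([1.0, 2.0])]
for _ in range(4):
    X.append(f(X[-1]))
X = np.array(X)
z = X.flatten()
cost0 = w4dvar_cost(z, X[:, 0], f, Q_diag=np.array([1e-7, 1e-7]), R=0.01, L=2)
```

In practice this function would be handed to a derivative-based optimizer over the stacked state (and parameter) vector.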
At the heart of many gradient-based optimization techniques lies a linear system, involving the Hessian ∇²C(x^n) and the gradient ∇C(x^n) of the objective function, that is used to solve for the next candidate point. Specifically, Newton's method for optimization is

x^{n+1} = x^n - \left[ \nabla^2 C(x^n) \right]^{-1} \nabla C(x^n).

Therefore, if (N + 1)L + D is large, then providing the sparsity pattern is advantageous when numerical derivative approximations, or functional representations of them, are being used to perform minimization with a derivative-based method. One can calculate these derivatives by hand, by symbolic differentiation, or by automatic differentiation.
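The Newton iteration can be illustrated on a hypothetical quadratic objective, where a single step reaches the minimizer exactly:

```python
import numpy as np

def newton_step(x, grad, hess):
    # Solve hess(x) dx = -grad(x) rather than forming the inverse explicitly
    return x + np.linalg.solve(hess(x), -grad(x))

# Hypothetical quadratic objective C(x) = 0.5 x^T A x - b^T x, minimized where A x = b;
# for a quadratic, the Hessian is constant and one Newton step solves the problem.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hess = lambda x: A
x_min = newton_step(np.zeros(2), grad, hess)
```

For the non-convex 4D-Var cost, the Hessian changes with x and the iteration only converges locally, which is why sparsity and good initial guesses matter.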
A feature of the most common derivative-based methods is assured convergence to local minima. However, our problem is non-convex due to the model term, which leads to the development of multiple local minima in the optimization surface as depicted in Fig. 3. For the results in this tutorial, we will only utilize local optimization tools, but see Sect. 5 for a brief discussion of some global optimization methods with stochastic search strategies.

Twin Experiments
Data assimilation is a framework for the incorporation of system observations into an estimation problem in a systematic fashion. Unfortunately, the methods themselves do not provide a great deal of insight into the tractability of unobserved system components of specific models. There may be a certain level of redundancy in the model equations and degeneracy in the parameter space leading to multiple potential solutions [19]. Also, it may be the case that certain parameters are non-identifiable if, for instance, a parameter can be completely scaled out [20]. Some further work on identifiability is ongoing [21,22].  Before applying a method to data from a real biological experiment, it is important to test it against simulated data where the ground truth is known. In these experiments, one creates simulated data from a model and then tries to recover the true states and parameters of that model from the simulated data alone.

Recovery of Bifurcation Structure
In conductance-based models, as well as in real neurons, slight changes in a parameter value can lead to drastically different model output or neuronal behavior. Sudden changes in the topological structure of a dynamical system upon smooth variation of a parameter are called bifurcations. Different types of bifurcations lead to different neuronal properties, such as the presence of bistability and subthreshold oscillations [23]. Thus, it is important for a neuronal model to accurately capture the bifurcation dynamics of the cell being modeled [24]. In this paper, we ask whether or not the models estimated through data assimilation match the bifurcation structure of the model that generated the data. This provides a qualitative measure of success or failure for the estimation algorithm. Since bifurcations are an inherently nonlinear phenomenon, our use of topological structure as an assay emphasizes how nonlinear estimation is a fundamentally distinct problem from estimation in linear systems.

Morris-Lecar Model
The Morris-Lecar model, first used to describe action potential generation in barnacle muscle fibers, has become a canonical model for studying neuronal excitability [25]. The model includes an inward voltage-dependent calcium current, an outward voltage-dependent potassium current, and a passive leak current. The activation gating variable for the potassium current has its own dynamics, whereas the calcium current activation gate is assumed to respond instantaneously to changes in voltage. The calcium current is also non-inactivating, resulting in a two-dimensional model. The model exhibits multiple mechanisms of excitability: for different choices of model parameters, different bifurcations from quiescence to repetitive spiking occur as the applied current is increased [23]. Three different bifurcation regimes, Hopf, saddle-node on an invariant circle (SNIC), and homoclinic, are depicted in Fig. 4 and correspond to the parameter sets in Table 1. For a given applied current in the region where a stable limit cycle (corresponding to repetitive spiking) exists, each regime displays a distinct firing frequency and action potential shape.

Fig. 4 Three different excitability regimes of the Morris-Lecar model. The bifurcation diagrams in the top row depict stable fixed points (red), unstable fixed points (black), stable limit cycles (blue), and unstable limit cycles (green). Gray dots indicate bifurcation points, and the dashed gray lines indicate the value of I_app corresponding to the traces shown for V (middle row) and n (bottom row). (A) As I_app is increased from 0 or decreased from 250 nA, the branches of stable fixed points lose stability through subcritical Hopf bifurcation, and unstable limit cycles are born. The branch of stable limit cycles that exists at I_app = 100 nA eventually collides with these unstable limit cycles and is destroyed in a saddle-node of periodic orbits (SNPO) bifurcation as I_app is increased or decreased from this value. (B) As I_app is increased from 0, a branch of stable fixed points is destroyed through saddle-node bifurcation with the branch of unstable fixed points. As I_app is decreased from 150 nA, a branch of stable fixed points loses stability through subcritical Hopf bifurcation, and unstable limit cycles are born. The branch of stable limit cycles that exists at I_app = 100 nA is destroyed through a SNPO bifurcation as I_app is increased and a SNIC bifurcation as I_app is decreased. (C) Same as (B), except that the stable limit cycles that exist at I_app = 36 nA are destroyed through a homoclinic orbit bifurcation as I_app is decreased
The equations for the Morris-Lecar model are as follows:

C \frac{dV}{dt} = I_{\text{app}} - g_L (V - E_L) - g_K n (V - E_K) - g_{Ca} m_\infty(V) (V - E_{Ca}), \qquad (42)

\frac{dn}{dt} = \phi \, \frac{n_\infty(V) - n}{\tau_n(V)}, \qquad (43)

with

m_\infty(V) = \frac{1}{2}\left[ 1 + \tanh\left( \frac{V - V_1}{V_2} \right) \right], \qquad (44)

n_\infty(V) = \frac{1}{2}\left[ 1 + \tanh\left( \frac{V - V_3}{V_4} \right) \right], \qquad (45)

\tau_n(V) = \frac{1}{\cosh\left( \frac{V - V_3}{2 V_4} \right)}. \qquad (46)

The eight parameters that we will attempt to estimate from data are g_L, g_K, g_Ca, φ, V_1, V_2, V_3, and V_4. We are interested in whether the estimated parameters yield a model with the desired mechanism of excitability. Specifically, we will conduct twin experiments where the observed data is produced by a model with parameters in a certain bifurcation regime, but the data assimilation algorithm is initialized with parameter guesses corresponding to a different bifurcation regime. We then assess whether or not a model with the set of estimated parameters undergoes the same bifurcations as the model that produced the observed data. This approach provides an additional qualitative measure of estimation accuracy, beyond simply comparing the values of the true and estimated parameters.
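A Python sketch of the Morris-Lecar right-hand side follows; the parameter values are an illustrative Hopf-regime-style set taken from the general literature, not the paper's Table 1:

```python
import numpy as np

def morris_lecar(x, I_app, p):
    """Right-hand side of the two-dimensional Morris-Lecar model."""
    V, n = x
    m_inf = 0.5 * (1.0 + np.tanh((V - p['V1']) / p['V2']))
    n_inf = 0.5 * (1.0 + np.tanh((V - p['V3']) / p['V4']))
    tau_n = 1.0 / np.cosh((V - p['V3']) / (2.0 * p['V4']))
    dV = (I_app
          - p['gL'] * (V - p['EL'])
          - p['gK'] * n * (V - p['EK'])
          - p['gCa'] * m_inf * (V - p['ECa'])) / p['C']
    dn = p['phi'] * (n_inf - n) / tau_n
    return np.array([dV, dn])

# Illustrative Hopf-regime-style parameter values (an assumption for this sketch)
p = dict(C=20.0, gL=2.0, gK=8.0, gCa=4.4,
         EL=-60.0, EK=-84.0, ECa=120.0,
         V1=-1.2, V2=18.0, V3=2.0, V4=30.0, phi=0.04)
rhs = morris_lecar(np.array([p['V3'], 0.5]), I_app=0.0, p=p)
```

At V = V_3 the steady-state activation n_inf equals 1/2 exactly, so the gating equation is at equilibrium there when n = 0.5.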

Results with UKF
The UKF was tested on the Morris-Lecar model in an effort to simultaneously estimate V and n along with the eight parameters in Table 1. Data was generated via a modified Euler scheme with observation points every 0.1 ms, where we take the stepsize Δt as 0.1 ms as well:

x_{k+1} = x_k + \frac{\Delta t}{2} \left[ f(t_k, x_k) + f\left(t_{k+1}, x_k + \Delta t\, f(t_k, x_k)\right) \right]. \qquad (47)

The UKF is a particularly powerful tool when a lot of data is available; the computational complexity in time is effectively the same as that of the numerical scheme of choice, whereas the additional operations at each time point are O((L + D)^3) [26]. f(x) in (19) is taken to be the Morris-Lecar equations (42)-(43), acting as f(t_k, x_k), integrated forward via modified Euler (47), and is given on line 126 of UKFML.m. The function fXaug.m, provided in the Supplementary Material, represents our augmented vector field. Our observation operator H is displayed on line 136 of UKFML.m. To reiterate, the states to be estimated in the Morris-Lecar model are the voltage and the potassium gating variable. The eight additional parameters are promoted to members of the state-space with trivial dynamics, resulting in a ten-dimensional estimation problem.
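The modified Euler (Heun) step can be sketched in a few lines; this is a minimal generic version of the scheme described above, with a scalar decay test problem assumed for illustration:

```python
import numpy as np

def modified_euler(f, t, x, dt):
    """Heun's (modified Euler) step: average the slope at the start of the
    interval with the slope at an Euler-predicted endpoint."""
    k1 = f(t, x)
    k2 = f(t + dt, x + dt * k1)
    return x + 0.5 * dt * (k1 + k2)

# One step of dx/dt = -x from x = 1 with dt = 0.1;
# the result should lie close to the exact solution exp(-0.1)
x1 = modified_euler(lambda t, x: -x, 0.0, 1.0, 0.1)
```

The same function works unchanged for vector-valued states such as (V, n).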
These examples were run using 20 seconds of data, which is 200,001 time points. During this time window, the Hopf, SNIC, and homoclinic models fire 220, 477, and 491 spikes, respectively. Such a computation for a ten-dimensional model takes only a few minutes on a laptop computer. R can be set to 0 when one believes the observed signal to be completely noiseless, but even then it is commonly left as a small number to mitigate the development of singularities in the predicted measurement covariance. We set our observed voltage to be the simulated output using modified Euler with additive white noise at each time point:

y_k = V(t_k) + \eta_k, \qquad (48)

where η_k ∼ N(0, (ε σ_true)²) is a normal random variable whose variance is equal to the square of the standard deviation of the signal scaled by a factor ε, which is kept fixed at 0.01 for these simulations. R is taken as the variance of η. The initial covariance of the system is α_I I, where I is the identity matrix and α_I is 0.001. The initial guess for n is taken to be 0. Q is fixed in time as a diagonal matrix with diagonal 10^{-7} [max(V_obs) − min(V_obs), 1, |θ_0|], where θ_0 represents our initial parameter guesses. We set λ = 5; however, this parameter was not especially influential for the results of these runs, as discussed further below. These initializations are displayed in the body of the parent function UKFML.m. Figure 5 shows the state estimation results when the observed voltage is from the SNIC regime, but the UKF is initialized with parameter guesses corresponding to the Hopf regime. Initially, there is great disparity between the state estimate for n and its true, unobserved dynamics. As the observations are assimilated over the estimation window, the states and model parameters adjust to produce estimates which better replicate the observed, and unobserved, system components. In this way, information from the observations is transferred to the model.
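The observation-noise setup just described can be sketched as follows; the sinusoidal V_true is a hypothetical stand-in for a simulated voltage trace:

```python
import numpy as np

rng = np.random.default_rng(0)
V_true = np.sin(np.linspace(0.0, 10.0, 1000))   # stand-in for a simulated voltage trace
eps = 0.01                                      # noise scale factor
sigma_true = np.std(V_true)                     # standard deviation of the signal
V_obs = V_true + rng.normal(0.0, eps * sigma_true, size=V_true.shape)
R = (eps * sigma_true) ** 2                     # measurement error variance fed to the filter
```

Matching R to the true noise variance is the idealization available in a twin experiment; with real data R must be chosen from knowledge of the recording apparatus.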
The evolution of the parameter estimates for this case is shown in the first column of Fig. 6, with φ, V_3, and V_4 all converging close to their true values after 10 seconds of observations. The only difference in parameter values between the SNIC and homoclinic regimes is the value of the parameter φ. The second column of Fig. 6 shows that when the observed data is from the homoclinic regime but the initial parameter guesses are from the SNIC regime, the estimates of V_3 and V_4 remain mostly constant near their original (and correct) values, whereas the estimate of φ quickly converges to its new true value. Finally, the third column of Fig. 6 shows that all three parameter estimates evolve to near their true values when the UKF is presented with data from the Hopf regime but initial parameter estimates from the homoclinic regime. Table 2 shows the parameter estimates at the end of the estimation window for all nine possible twin experiments. Promisingly, a common feature of the results is the near recovery of the true value of each of the parameters. However, the estimated parameter values alone do not necessarily tell us about the dynamics of the inferred model. To assess the inferred models, we generate bifurcation diagrams using the estimated parameters and compare them to the bifurcation diagrams for the parameters that produced the observed data. Figure 7 shows that the SNIC and homoclinic bifurcation diagrams were recovered almost exactly. The Hopf structure was consistently recovered, but with shifted regions of spiking and quiescence and minor differences in spike amplitude.
To check the consistency of our estimation, we set 100 initial guesses for n across its dynamical range as samples from U(0, 1), with all other parameters held fixed. Figure 8 shows that the state estimates for n across these initializations quickly approached very similar trajectories. We confirmed that after the estimation cycle was over, the parameter estimates for all 100 initializations were essentially identical to the values shown in Table 2. In this paper, we always initialized the UKF with initial parameter values corresponding to the various bifurcation regimes and did not explore its performance for randomly selected initial parameter guesses. For initial parameter guesses that are too far from the true values, it is possible that the filter would converge to incorrect parameter values or fail outright before reaching the end of the estimation window. Additionally, we investigated the choices of certain algorithmic parameters for the UKF, namely λ and α_I. Figure 9(A) shows suitable ranges of these parameters, with the color indicating the root mean squared error of the parameters at the end of the cycle compared to their true values. We found this behavior to be preserved across our nine twin experiment scenarios. Notably, this shows that our results in Table 2 were generated using an initial covariance α_I = 0.001 that was smaller than necessary. By increasing the initial variability, the estimated system can converge to the true dynamics more quickly, as shown for α_I = 0.1 in Fig. 9(B). The value of λ does not have a large impact on these results, except when α_I = 1. Here the filter fails before completing the estimation cycle, except for a few cases where λ is small enough to effectively shrink the ensemble spread and compensate for the large initial covariance. For example, with λ = −9, we have N + λ = 1 and, therefore, the ensemble spread in (24) is simply X_j = x̂_k^a ± √P_xx.
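The role of λ in setting the ensemble spread can be seen directly from the standard unscented sigma-point construction. Below is a Python sketch (the paper's code is MATLAB); the helper name is ours.

```python
import numpy as np

def sigma_points(x_mean, P, lam):
    """Unscented sigma points X_0 = x_mean, X_i = x_mean +/- col_i(sqrt((N+lam) P)).

    With N = len(x_mean) and lam = -(N - 1), the scale N + lam equals 1,
    so the spread reduces to x_mean +/- sqrt(P), as noted in the text for
    lam = -9 with a ten-dimensional augmented state.
    """
    N = len(x_mean)
    S = np.linalg.cholesky((N + lam) * P)   # a matrix square root of (N+lam)P
    pts = [x_mean]
    for i in range(N):
        pts.append(x_mean + S[:, i])
        pts.append(x_mean - S[:, i])
    return np.array(pts)                    # (2N+1, N) array
```

A large α_I inflates P and hence the spread; a sufficiently negative λ shrinks the factor (N + λ) and can partially compensate, which matches the behavior reported for α_I = 1.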
For even larger initial covariances (α_I > 1), the filter fails regardless of the value of λ. We noticed that in many of the failed cases, the parameter estimate for φ became negative (which is unrealistic for a rate) or quite large (φ > 1), and the state estimate for n went outside of its biophysical range of 0 to 1. When the gating variable extends outside of its dynamical range, it can skew the estimated statistics, and the filter may be unable to recover. The standard UKF framework does not provide a natural way of incorporating bounds on parameter estimates, and we do not apply any for the results presented here. However, we did find that we can modify our numerical integration scheme to prevent the filter from failing in many of these cases, as shown in Fig. 9(C). Specifically, if n becomes negative or exceeds 1 after the update step, then artificially setting n to 0 or 1 in the modified Euler method (47) before proceeding can enable the filter to reach the end of the estimation window and yield reasonable parameter estimates.
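The clipping modification amounts to projecting the gating variable back into [0, 1] inside each integration step. A minimal Python sketch, assuming the gating variable is component 1 of the state vector (the paper's implementation modifies its MATLAB modified Euler routine):

```python
import numpy as np

def modified_euler_step_clipped(f, x, dt):
    """One modified Euler (Heun) step that clips the gating variable n
    (assumed to be component 1 of the state) back into [0, 1], so that an
    out-of-range n cannot skew the filter's estimated statistics."""
    def clip(state):
        y = state.copy()
        y[1] = min(max(y[1], 0.0), 1.0)
        return y
    x = clip(x)
    k1 = f(x)                            # slope at the start of the step
    k2 = f(clip(x + dt * k1))            # slope at the clipped predictor
    return clip(x + 0.5 * dt * (k1 + k2))
```

The projection is applied after the predictor and after the final update, so every state handed back to the filter is biophysically admissible.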

Results with 4D-Var
The following results illustrate the use of weak 4D-Var. One can minimize the cost function (36) using any preferred optimization routine. For the following examples, we use a local optimizer: interior-point optimization with MATLAB's built-in solver fmincon. At the heart of the solver is a Newton step, which uses information about the Hessian, or a conjugate gradient step using gradient information [27][28][29]. The input we are optimizing over takes the form [V_0, V_1, . . . , V_N, n_0, n_1, . . . , n_N, θ_1, θ_2, . . . , θ_D], resulting in an ((N + 1)L + D)-dimensional estimation problem, where L = 2. There are computational limitations in memory storage and in the time required to solve the optimization problem to a tolerance tight enough for reasonable parameter estimates. Therefore, we cannot be as cavalier with the amount of data used for 4D-Var as we were with the UKF, as that would result in a (200,001)(2) + 8 = 400,010-dimensional problem. Using Newton's method (41) on this problem would involve inverting a Hessian matrix of size (400,010)^2, which by a rough calculation would require over 1 TB of RAM. Initialization of the optimization is shown on line 71 of w4DVarML.m. The estimated parameters are given in Table 3. These results were run using N = 2001 time points. To simplify the search space, the parameter estimates were constrained between the bounds listed in Table 4. These ranges were chosen to ensure that the maximal conductances, the rate φ, and the activation curve slope V_2 all remain positive. We found that running 4D-Var with even looser bounds (Table A1) yielded less accurate parameter estimates (Tables A2 and A3). The white noise perturbations for the 4D-Var trials were the same as those from the UKF examples. Initial guesses for the states at each time point are required.
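The layout of the decision vector can be made concrete with a pair of pack/unpack helpers. This is a Python sketch with our own function names; the paper's w4DVarML.m handles the equivalent bookkeeping in MATLAB.

```python
import numpy as np

def pack(V, n, theta):
    """Stack the 4D-Var decision vector [V_0..V_N, n_0..n_N, theta_1..theta_D]."""
    return np.concatenate([V, n, theta])

def unpack(z, N, D):
    """Invert pack() for N+1 time points (L = 2 states) and D parameters."""
    V = z[:N + 1]
    n = z[N + 1:2 * (N + 1)]
    theta = z[2 * (N + 1):]
    assert len(theta) == D          # total length is (N+1)*2 + D
    return V, n, theta
```

For N + 1 = 200,001 observation points and D = 8 parameters, len(pack(V, n, theta)) is exactly the 400,010 quoted above.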
For these trials, V is initialized as V_obs, and n is initialized as the result of integrating its dynamics forced with V_obs using the initial guesses for the parameters, i.e., ṅ = f_n(V_obs, n; θ_0). The initial guesses are generated beginning on line 38 of w4DVarML.m. We impose that Q^{-1} in (36) is a diagonal matrix with entries α_Q [1, 100^2] to balance the dynamical variance of V and n. The scaling factor α_Q represents the relative weight of the model term compared to the measurement term. Based on preliminary tuning experiments, we set α_Q = 100 for the results presented. Figure 10 depicts the states produced by integrating the model with the estimated parameters across different iterations within the interior-point optimization. Over iteration cycles, the geometry of the spikes as well as the spike time alignments eventually coincide with the noiseless data V_true. Figure 11 shows the evolution of the parameters across the entire estimation cycle (see Table 3). For the UKF, the "plateauing" of the parameter estimates seen in Fig. 6 indicates confidence that they are settling toward constant-in-time values. With 4D-Var, and in a limiting sense for the UKF, the plateauing indicates the parameters are settling into a local minimum of the cost function. In Fig. 12 we show the bifurcation diagrams of the estimated models from our 4D-Var trials. Notice that, as shown explicitly in Table 3, when initializing with the true parameters, the correct model parameters are recovered, as our optimization routine is confidently within the basin of attraction of the global minimum. In the UKF, by comparison, there is no sense of stopping at a local minimum. Parameter estimates may still fluctuate even when starting from their true values, unless the variances of the state components fall to very low values, though the covariance Q_k can be tuned to maintain a baseline variability in the system.
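A weak-constraint cost of the form (36), with the measurement misfit on V and the Q^{-1}-weighted model misfit on both state components, can be sketched in Python as follows. The step function, weights, and names here are illustrative, not the paper's MATLAB implementation.

```python
import numpy as np

def weak_4dvar_cost(V, n, theta, v_obs, step, R, Qinv_diag):
    """Sketch of a weak-constraint 4D-Var cost: measurement misfit on the
    observed voltage plus model misfit on both state components.

    step(V_k, n_k, theta) -> (V_{k+1}, n_{k+1}) is one forward map
    (e.g., one modified Euler step between observation times).
    Qinv_diag plays the role of alpha_Q * [1, 100**2], balancing the
    V and n residuals.
    """
    J = np.sum((v_obs - V) ** 2) / R              # measurement term
    for k in range(len(V) - 1):
        Vp, n_pred = step(V[k], n[k], theta)      # model prediction
        J += Qinv_diag[0] * (V[k + 1] - Vp) ** 2  # model misfit in V
        J += Qinv_diag[1] * (n[k + 1] - n_pred) ** 2  # model misfit in n
    return J
```

At a perfect fit both terms vanish; raising Qinv_diag shifts weight from fitting the data toward satisfying the model dynamics, which is exactly the trade-off α_Q controls.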
The parameter sets for the SNIC and homoclinic bifurcation regimes deviate only in the φ parameter, and so our optimization had great success estimating one from the other. The kinetic parameters (V_3 and V_4) for the Hopf regime deviate substantially from those of the SNIC and homoclinic regimes. Still, the bifurcation structures recovered from the estimated parameters in trials involving the Hopf regime remained consistent with the true structure.
A drawback of the results shown in Table 3 is that, for the default tolerances in fmincon, some runs took more than two days to complete on a dedicated core. Figure 11 shows that the optimal solution had essentially been found after 22,000 iterations; however, the optimizer kept running for several thousand more iterations before the convergence tolerances were met. Rather than attempting to speed up these computations by adjusting the algorithmic parameters of the solver for this specific problem, we decided to exploit the dynamic structure of the model equations using automatic differentiation (AD). AD deconstructs derivatives of the objective function into elementary functions and operations through the chain rule. We used the MATLAB AD tool ADiGator, which performs source transformation via operator overloading and has scripts available for simple integration with various optimization tools, including fmincon [30]. For the same problem scenario and algorithmic parameters, we additionally passed the generated gradient and Hessian functions to the solver. For this problem, the Hessian structure is shown in Fig. 13. Note that we are using a very simple scheme, the modified Euler method (47), to perform numerical integration between observation points, and the states at k + 1 depend only on those at k and on the parameters. Higher-order methods, including implicit methods, can be employed naturally since the whole system trajectory is estimated simultaneously. A tutorial specific to collocation methods for optimization has been developed [31].
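The payoff from exact derivative information can be illustrated with a toy Newton iteration in Python (this is not the ADiGator/fmincon pipeline used in the paper, just a sketch of why exact gradients and Hessians cut the iteration count so sharply):

```python
import numpy as np

def newton_minimize(grad, hess, z0, tol=1e-10, max_iter=50):
    """Plain Newton iteration using exact gradient and Hessian callables.
    On well-conditioned problems this converges in a handful of steps,
    versus the thousands of quasi-Newton iterations reported when the
    solver must build up curvature information on its own."""
    z = z0.astype(float)
    for it in range(max_iter):
        g = grad(z)
        if np.linalg.norm(g) < tol:
            break
        z = z - np.linalg.solve(hess(z), g)   # Newton step
    return z, it
```

On a quadratic objective, a single Newton step lands exactly on the minimizer, the idealized limit of the "few dozen iterations" behavior seen with supplied derivatives.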
The results are shown in Table A4. Each twin experiment scenario took, at most, a few minutes on a dedicated core. These trials converged to the optimal solution in far fewer iterations than the trials without the Hessian. Since convergence was achieved within a few dozen iterations, we decided to inspect how the bifurcation structure of the estimated model evolved throughout the process for the case of Hopf to SNIC. Figure 14 shows, in panel (A), the logarithm of the value of the cost function for a twin experiment initialized with parameters from the Hopf regime but observational data from the SNIC regime, with iterates generated by fmincon using the provided Hessian and gradient functions, and, in panel (B), bifurcation diagrams produced from parameter estimates at selected iterations (blue: the initial bifurcation structure; gray: the true structure for the parameters that generated the observed data; red: the structure of the iterates; green: the structure of the optimal point determined by fmincon). By Iteration 10, the objective function value has decreased greatly, and parameters that produce a qualitatively correct bifurcation structure have been found. The optimization continues for another 37 iterations and explores other parts of parameter space that do not yield the correct bifurcation structure before converging very close to the true parameter values.
Again, these results can at best reflect only locally optimal solutions on the cost surface. The 4D-Var framework has been applied in neuroscience using a more systematic approach to finding the global optimum. In [32], a population of initial states x is optimized in parallel within an outer loop that incorporates an annealing algorithm. The annealing parameter sets the relative weights of the two summations in (36), and the iteration proceeds by increasing the weight given to the model error relative to the measurement error.
We also wished to understand more about the sensitivity of this problem to initial conditions. We initialized the system with the voltage states set to the observations, the parameters set to those of the initializing bifurcation regime, and the gating variable [n_0, n_1, . . . , n_N] drawn i.i.d. from U(0, 1). The results confirm our suspicion that multiple local minima exist. For 100 different initializations of n, for the problem of going from SNIC to Hopf, 63 were found to fall into a deeper minimum, yielding better estimates and a smaller objective function value, while 16 fell into a shallower minimum, and the rest into three different, even shallower minima. While one cannot truly visualize high-dimensional surfaces, one can try to visualize a subset of the surface. Figure 3 shows the surface that arises from evaluating the objective function on linear combinations of the two deepest minima and an initial condition x_0 (which eventually landed in the shallower of the two minima), viewed as points in 4010-dimensional space.
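The surface-slicing idea can be sketched generically: evaluate the objective on the 2-D affine plane through the two minimizers and the extra point. This Python helper is illustrative (our own names), not the script that produced the figure.

```python
import numpy as np

def cost_on_plane(J, x_min1, x_min2, x0, n_grid=5):
    """Evaluate an objective J on the 2-D affine plane spanned by two
    minimizers and a third point, as a way to peek at a high-dimensional
    cost surface. Returns an (n_grid, n_grid) array of J evaluated at
    x_min1 + a*(x_min2 - x_min1) + b*(x0 - x_min1)."""
    a_vals = np.linspace(-0.5, 1.5, n_grid)
    b_vals = np.linspace(-0.5, 1.5, n_grid)
    surf = np.empty((n_grid, n_grid))
    for i, a in enumerate(a_vals):
        for j, b in enumerate(b_vals):
            x = x_min1 + a * (x_min2 - x_min1) + b * (x0 - x_min1)
            surf[i, j] = J(x)
    return surf
```

The grid extends slightly past the three anchor points so that the basins around both minima are visible in the slice.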

Application to Bursting Regimes of the Morris-Lecar Model
Many types of neurons display burst firing, consisting of groups of spikes separated by periods of quiescence. Bursting arises from the interplay of fast currents that generate spiking and slow currents that modulate the spiking activity. The Morris-Lecar model can be modified to exhibit bursting by including a calcium-gated potassium (K_Ca) current, I_KCa = g_KCa (Ca/(Ca + 1))(V − E_K), that depends on slow intracellular calcium dynamics, dCa/dt = ε(−μ I_Ca − Ca) [33]. Bursting can be analyzed mathematically by decomposing models into fast and slow subsystems and applying geometric singular perturbation theory. Several different types of bursters have been classified based on the bifurcation structure of the fast subsystem. In square-wave bursting, the active phase of the burst is initiated at a saddle-node bifurcation and terminates at a homoclinic bifurcation. In elliptic bursting, spiking begins at a Hopf bifurcation and terminates at a saddle-node of periodic orbits bifurcation. The voltage traces produced by these two types of bursting are quite distinct, as shown in Fig. 15.
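A right-hand side for the bursting model can be sketched as below. This Python sketch follows the standard Morris-Lecar bursting formulation with G_KCa = g_KCa Ca/(Ca + 1); the parameter dictionary keys are our own naming, and any values used with it should be treated as placeholders rather than the paper's settings.

```python
import numpy as np

def ml_burster_rhs(state, p):
    """Right-hand side of a Morris-Lecar burster: fast (V, n) subsystem
    plus a slow calcium equation gated by the small parameters eps, mu.
    Assumes the standard sigmoidal activation curves."""
    V, n, Ca = state
    m_inf = 0.5 * (1 + np.tanh((V - p['V1']) / p['V2']))
    n_inf = 0.5 * (1 + np.tanh((V - p['V3']) / p['V4']))
    I_Ca = p['gCa'] * m_inf * (V - p['ECa'])
    I_KCa = p['gKCa'] * (Ca / (Ca + 1)) * (V - p['EK'])   # Ca-gated K+ current
    dV = (p['Iapp'] - p['gL'] * (V - p['EL'])
          - p['gK'] * n * (V - p['EK']) - I_Ca - I_KCa) / p['C']
    dn = p['phi'] * np.cosh((V - p['V3']) / (2 * p['V4'])) * (n_inf - n)
    dCa = p['eps'] * (-p['mu'] * I_Ca - Ca)               # slow dynamics
    return np.array([dV, dn, dCa])
```

Because eps is small, Ca drifts slowly, sweeping the fast (V, n) subsystem back and forth through the bifurcations that start and stop each burst.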

Results with UKF
We conducted a set of twin experiments for the bursting model to address the same question as for the spiking model: from a voltage trace alone, can DA methods estimate parameters that yield the appropriate qualitative dynamical behavior? Specifically, we simulated data from the square-wave (elliptic) bursting regime, and then initialized the UKF with parameter guesses corresponding to elliptic (square-wave) bursting (these parameter values are shown in Table 5). As a control experiment, we also ran the UKF with initial parameter guesses corresponding to the same bursting regime as the observed data. The observed voltage trace included additive white noise generated following the same protocol as in the previous trials. We used 200,001 time points with observations every 1 ms. Between observations, the system was integrated forward using substeps of 0.025 ms. For the square-wave burster, this window contained 215 bursts with 4 spikes per burst; for the elliptic burster, 225 bursts with 2 spikes per burst. The small parameters ε and μ in the calcium dynamics equation were assumed to be known and were not estimated by the UKF. Thus, for the bursting model, we are estimating one additional state variable (Ca) and one additional parameter (g_KCa) compared to the spiking model. Table 6 shows the UKF parameter estimates after initialization with either the true parameters or the parameters producing the other type of bursting. The results for either case are quite consistent and fairly close to the true values for both types of bursting. Since small changes in parameter values can affect bursting dynamics, we also computed bifurcation diagrams for these estimated parameters and compared them to their true counterparts. Figure 16 shows that in all four cases, the estimated models have the same qualitative bifurcation structure as the models that produced the data.
The recovered parameter estimates were insensitive to the initial conditions for n and Ca, with 100 different initializations for these state variables sampled from U(0, 1) and U(0, 5), respectively. Figure 15 shows the bursting model bifurcation diagrams and trajectories. The bifurcation diagrams (top row) depict stable fixed points (red), unstable fixed points (black), stable limit cycles (blue), and unstable limit cycles (green) of the fast subsystem (V, n) with bifurcation parameter z. The gray curves are the projection of the 3-D burst trajectory (V, second row; n, third row; Ca, fourth row) onto the (V, z) plane, where z is a function of Ca. In panel (A), during the quiescent phase of the burst, Ca and therefore z are decreasing, and the trajectory slowly moves leftward along the lower stable branch of fixed points until reaching the saddle-node bifurcation or "knee", at which point spiking begins. During spiking, Ca and z are slowly increasing, and the trajectory oscillates while traveling rightward until the stable limit cycle is destroyed at a homoclinic bifurcation and spiking ceases. In panel (B), during the quiescent phase of the burst, z is decreasing, and the trajectory moves leftward along the branch of stable fixed points with small-amplitude decaying oscillations until reaching the Hopf bifurcation, at which point the oscillations grow into full spikes. During spiking, z is slowly increasing, and the trajectory oscillates while traveling rightward until the stable limit cycle is destroyed at a saddle-node of periodic orbits bifurcation and spiking ceases. Note that, most predominantly in the top right panel, the location of the bifurcations is relatively sensitive to small deviations in certain parameters, such as g_KCa. Estimating g_KCa is challenging due to the algebraic degeneracy of estimating both terms involved in the conductance G_KCa = g_KCa Ca/(Ca + 1), and the inherent time-scale disparity of the Ca dynamics compared to V and n.
If one had observations of calcium, or full knowledge of its dynamical equations, this degeneracy would be immediately alleviated. To address difficulties in the estimation of bursting models, an approach has been developed that separates the estimation problem into two stages based on timescales: first estimating the slow dynamics with the fast dynamics blocked, and then estimating the fast dynamics with the slow parameters held fixed [34].

Results with 4D-Var
We also investigated the utility of variational techniques to recover the mechanisms of bursting. For these runs, we took our observations to be coarsely sampled at 0.1 ms, and our forward mapping to be one step of modified Euler between observation times, as was the case for our previous 4D-Var Morris-Lecar results. We used 10,000 time points, which covers one burst for the square-wave burster, and one full burst plus another spike for the elliptic burster. We used the L-BFGS-B method [35], as we found it to perform faster for this problem than fmincon. This method approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton algorithm using a limited-memory (L) inverse Hessian approximation, with an extension to handle bound constraints (B). It is available for Windows through the OPTI toolbox [36] or, independently of operating system, through a MATLAB MEX wrapper [37]. We supplied the gradient of the objective function, but allowed the solver to build its limited-memory Hessian approximation for our 30,012-dimensional problem. The results are captured in Table 7. We performed the same tests while also providing the Hessian; however, there was no significant gain in accuracy or speed. The estimate of g_KCa when initializing with the square-wave parameters and estimating the elliptic parameters is quite far off, which reinforces our earlier point about the value of observing the calcium dynamics. Figure 17 shows that we are still successful in recovering the true bifurcation structure.
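The essential feature L-BFGS-B adds over plain BFGS is the handling of box constraints: every trial point is kept inside the bounds. A minimal Python stand-in that captures just the projection mechanism (not the limited-memory curvature updates, and not the solver used in the paper) is:

```python
import numpy as np

def projected_gradient_descent(grad, z0, lower, upper, step=0.1, iters=200):
    """Minimal stand-in for a bound-constrained solver such as L-BFGS-B:
    each gradient step is projected back into [lower, upper], which is how
    box constraints on the parameter estimates are enforced."""
    z = np.clip(z0.astype(float), lower, upper)
    for _ in range(iters):
        z = np.clip(z - step * grad(z), lower, upper)
    return z
```

When the unconstrained minimizer lies outside the box, the iterates settle on the nearest face of the feasible region, which is exactly how bounded parameter estimates pin to their limits.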

Discussion and Conclusions
Data assimilation is a framework by which one can optimally combine measurements with a model of a system. In neuroscience, depending on the neural system of interest, the available data may unveil only a small subset of the overall activity of the system. For the results presented here, we used simulated data from the Morris-Lecar model with distinct activity arising from different choices of model parameters. We assumed access only to the voltage and the input current, which corresponds to the data expected from a current-clamp recording. We showed the effectiveness of standard implementations of the Unscented Kalman Filter and weak 4D-Var in recovering spiking behavior and, in many circumstances, near-exact parameters of interest. We showed that the estimated models undergo the same bifurcations as the model that produced the observed data, even when the initial parameter guesses do not. Additionally, these methods provide estimates of the states and of the uncertainties associated with each state and parameter, though for the sake of brevity these values were not always displayed. The methods, while not insensitive to noise, have intrinsic weightings of measurement deviations to account for the noise of the observed signal; results were shown for mild additive noise. We also extended the Morris-Lecar model to exhibit bursting activity and demonstrated the ability to recover its model parameters using the UKF.
The UKF and 4D-Var approaches implemented here both attempt to optimally link a dynamic model of a system to observed data from that system, with error statistics assumed to be Gaussian. Furthermore, both approaches approximate the mean (and, for the UKF, also the variance) of the underlying system distributions, about which nothing else is assumed. The UKF is especially adept at estimating states over long time courses, and if the algorithmic parameters such as the model error can be tuned, then the parameters can be estimated simultaneously. Therefore, if one has access to a long time series of data, then the UKF (or an Unscented Kalman Smoother, which uses more of the data's history at each update step) is a great tool to have at one's disposal. However, sometimes one has only a small amount of time series data, or the tuning of the initial covariance, the spread parameter λ, and the process noise Q_k associated with the augmented state and parameter system becomes too daunting. The 4D-Var approach sets the states at each time point and the parameters as optimization variables, transforming the estimation process from one that iterates in time to one that iterates to a tolerance in a chosen optimization routine. The only tuning parameters are those associated with the chosen optimization routine and the weights Q^{-1}_{l,l}, l ∈ {1, . . . , L}, for the model uncertainty of the state components at each time. There are natural ways to impose parameter bounds in the 4D-Var framework, whereas this is not the case for the UKF. However, depending on the implementation choices and the dimension of the problem (which is extremely large for long time series), the optimization may take days of computing time to yield reasonable estimates. Fortunately, derivative information can be provided to the optimizer to speed up the 4D-Var procedure. Both the UKF and 4D-Var can provide estimates of the system uncertainty in addition to estimates of the system mean.
The UKF provides mean and variance estimates at each iteration during the analysis step. In 4D-Var, one seeks mean estimates by minimizing a cost function. It has been shown that for cost functions of the form (36), the system variance can be interpreted as the inverse of the Hessian evaluated at minima of (36), and scales roughly as Q for large Q^{-1} [32]. The pros and cons of implementing these two DA approaches are summarized in Table 8. The UKF and 4D-Var methodologies welcome the addition of any observables of the system, but current-clamp data may be all that is available. With such experimental data in mind, for a more complex system the number of variables increases while the total number of observables remains at unity. Therefore, it may be useful to assess a priori which parameters are structurally identifiable, and the sensitivity of the model to the parameters of interest, in order to reduce the estimation state space [38]. Additionally, one should consider what manner of applied current to use to aid in state and parameter estimation. In the results presented above, we used a constant applied current, but work has been done suggesting the use of complex time-varying currents that stimulate as many of the model's degrees of freedom as possible [39].
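The inverse-Hessian interpretation of the variance at a 4D-Var minimum is a one-liner in practice; a Python sketch (our own helper name):

```python
import numpy as np

def variance_from_hessian(hess_at_min):
    """Posterior variance estimate at a 4D-Var minimum: the inverse of
    the Hessian of the cost function evaluated there (cf. [32]). Returns
    the per-component variances, i.e., the diagonal of the inverse Hessian."""
    return np.diag(np.linalg.inv(hess_at_min))
```

Sharply curved directions of the cost surface (large Hessian eigenvalues) thus correspond to tightly constrained, low-variance components, and flat directions to poorly constrained ones.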
The results we presented are based on MATLAB implementations of the derived equations for the UKF and weak 4D-Var. Sample code is provided in the Supplementary Material. Additional data assimilation examples in MATLAB can be found in [40]. The UKF has been applied to other spiking neuron models such as the FitzHugh-Nagumo model [41]; sample code, along with further exploration of the UKF in estimating neural systems, can be found in [42]. The UKF has been used on real data from pyramidal neurons to track the states and externally applied current [43], to estimate the connectivity of cultured neuronal networks sampled by a microelectrode array [44], to assimilate seizure data from hippocampal OLM interneurons [15], and to reconstruct mammalian sleep dynamics [17]. A comparative study of the efficacy of the EKF and UKF on conductance-based models has been conducted [45].
The UKF is a particularly good framework at the state dimensions of a single-compartment conductance-based model, as the size of the ensemble is chosen to be 2(L + D) + 1. When considering larger state dimensions, as is the case for PDE models, a more general Ensemble Kalman Filter (EnKF) may be appropriate. An introduction to the EnKF can be found in [46, 47]. An adaptive methodology using past innovations to iteratively estimate the model and measurement covariances Q and R has been developed for use with ensemble filters [16]. The Local Ensemble Transform Kalman Filter (LETKF) [48] has been used to estimate the states associated with cardiac electrical wave dynamics [8]. Rather than estimating the mean and covariance through an ensemble, particle filters aim to fully construct the posterior density of the states conditioned on the observations. A particle filter approach has been applied to infer parameters of a stochastic Morris-Lecar model [49], to assimilate spike train data from rat layer V cortical neurons into a biophysical model [50], and to assimilate noisy, model-generated data for other states to motivate the use of imaging techniques when available [51].
An approach to the variational problem that tries to uncover the global minimum more systematically has been developed [32]. In this framework, comparing to (36), the diagonal entries of Q^{-1} are defined as Q^{-1}_{l,l} = Q^{-1}_{0} α^β for α > 1 and β ≥ 0. The model term is initialized as relatively small, and over the course of an annealing procedure, β is incremented, resulting in a steady increase of the model term's influence on the cost function. This annealing schedule is conducted in parallel for different initial guesses of the state space. The development of this variational approach can be found in [52], and it has been used to assimilate neuronal data from HVC neurons [34] as well as to calibrate a neuromorphic very large scale integrated (VLSI) circuit [53]. An alternative to the variational approach is to frame the assimilation problem from a probabilistic sampling perspective and use Markov chain Monte Carlo methods [54]. A closely associated variational technique, known as "nudging", augments the vector field with a control term. If we only have observations of the voltage, this manifests as dV/dt = f_V(x; θ) + u(V_obs(t) − V). The vector field with the observational coupling term is then passed into the strong 4D-Var constraints. The control parameter u may remain fixed or be estimated along with the states [55, 56]. More details on nudging can be found in [57]. A similar control framework has been applied to data from neurons of the stomatogastric ganglion [58]. Many other approaches outside the framework of data assimilation have been developed for parameter estimation of neuronal models; see [59] for a review. A problem often encountered when fitting models to a voltage trace is that phase shifts, or small differences in spike timing, between model output and the data can result in large root mean square error. This is less of an issue for data assimilation methods, especially sequential algorithms like the UKF.
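The nudging modification to the voltage equation can be sketched directly; this Python helper simply wraps an existing right-hand side with the observational coupling term (names are ours, not from the cited implementations):

```python
def nudged_voltage_rhs(f_V, u):
    """Return a nudged vector field for the voltage equation:
    dV/dt = f_V(x) + u * (V_obs(t) - V). The coupling u pulls the model
    voltage toward the observations; u may be held fixed or estimated
    along with the states."""
    def rhs(x, v_obs_t):
        return f_V(x) + u * (v_obs_t - x[0])   # x[0] is the voltage
    return rhs
```

With u = 0 the original dynamics are recovered; large u forces the model voltage to shadow the data closely, at the cost of masking model error.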
Other approaches avoid harshly penalizing spike timing errors in the cost function by considering spikes in the data and model-generated spikes that occur within a narrow time window of each other as coincident [60], or by minimizing error with respect to the dV/dt versus V phase-plane trajectory rather than V(t) itself [59]. Another way to avoid spike mismatch errors is to force the model with the voltage data and perform linear regression to estimate the linearly appearing parameters (the maximal conductances), perhaps coupling this with another optimization strategy to access the nonlinearly dependent gating parameters [3, 61, 62].
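The coincidence criterion can be made concrete with a small matching routine; this Python sketch (our own naming, with an illustrative window width) counts model spikes that land near a data spike, matching each data spike at most once:

```python
def coincident_spikes(t_data, t_model, window=2.0):
    """Count model spikes falling within +/- window (e.g., ms) of a data
    spike, in the spirit of the coincidence measure in [60]. Each data
    spike may be matched at most once."""
    matched, used = 0, set()
    for tm in t_model:
        for i, td in enumerate(t_data):
            if i not in used and abs(tm - td) <= window:
                matched += 1
                used.add(i)
                break
    return matched
```

A coincidence-based fitness rewards getting the right number of spikes at roughly the right times without punishing millisecond-scale jitter the way a pointwise voltage error does.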
A common optimization strategy is to construct an objective function that encapsulates important features derived from the voltage trace, and then use a genetic algorithm to stochastically search for optimal solutions. These algorithms proceed by forming a population of possible solutions and applying biologically inspired evolution strategies to gradually increase the fitness (defined with respect to the objective function) of the population across generations. Multi-objective optimization schemes will generate a "Pareto front" of optimal solutions that are considered equally good. A multi-objective non-dominated sorting genetic algorithm (NSGA-II) has recently been used to estimate parameters of the pacemaker PD neurons of the crab pyloric network [63,64].
In this paper, we compared the bifurcation structure of models estimated by DA algorithms to the bifurcation structure of the model that generated the data. We found that the estimated models exhibited the correct bifurcations even when the algorithms were initiated in a region of parameter space corresponding to a different bifurcation regime. This type of twin experiment is a useful addition to the field that specifically emphasizes the difficulty of nonlinear estimation and provides a qualitative measure of estimation success or failure. Prior literature on parameter estimation that has made use of geometric structure includes work on bursting respiratory neurons [65] and "inverse bifurcation analysis" of gene regulatory networks [66,67].
Looking forward, data assimilation can complement the growth of new recording technologies for collecting observational data from the brain. The joint collaboration of these automated algorithms with the painstaking work of experimentalists and model developers may help answer many remaining questions about neuronal dynamics.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.