Sparse identification of contrast gain control in the fruit fly photoreceptor and amacrine cell layer

The fruit fly’s natural visual environment is often characterized by light intensities ranging across several orders of magnitude and by rapidly varying contrast across space and time. Fruit fly photoreceptors robustly transduce and, in conjunction with amacrine cells, process visual scenes and provide the resulting signal to downstream targets. Here, we model the first step of visual processing in the photoreceptor-amacrine cell layer. We propose a novel divisive normalization processor (DNP) for modeling the computation taking place in the photoreceptor-amacrine cell layer. The DNP explicitly models the photoreceptor feedforward and temporal feedback processing paths and the spatio-temporal feedback path of the amacrine cells. We then formally characterize the contrast gain control of the DNP and provide sparse identification algorithms that can efficiently identify each the feedforward and feedback DNP components. The algorithms presented here are the first demonstration of tractable and robust identification of the components of a divisive normalization processor. The sparse identification algorithms can be readily employed in experimental settings, and their effectiveness is demonstrated with several examples.


Introduction
Sensory processing systems in the brain extract relevant information from stimuli whose amplitude can vary orders of magnitude [1][2][3][4]. Consequently, at each layer of processing, starting right from sensory transduction, neurons need to map their output into a range that can be effectively processed by subsequent neural circuits. As an example, photoreceptors [5][6][7][8] and olfactory receptor neurons [9,10] in both vertebrates and invertebrates respectively adapt to a large range of intensity/temporal contrast values of visual and odorant stimuli. Adaptation to mean and variance of the stimuli has been observed in the auditory system [11,12] as well. Further down the visual pathway, motion sensitive neurons in vertebrates and invertebrates have been shown to be robust at various brightness and contrast levels [13][14][15].
Early visual circuits, such as the photoreceptor/amacrine cell layer of the fruit fly brain, are believed to perform spatio-temporal intensity and contrast gain control for dynamic adaptation to visual stimuli whose intensity and contrast levels vary orders of magnitude both in space and time. The mechanism underlying temporal and spatio-temporal contrast gain control in the vertebrate retina [16][17][18][19][20] has often been characterized as a change of the receptive fields in response to changing stimulus statistics [21]. However, while linear receptive fields can be estimated at each transition between different gain values, characterizing the full dynamics of contrast gain control received little attention. Current theoretical methods for describing spatio-temporal gain control lack a systematic framework for characterizing its dynamics, and identification algorithms to estimate circuit components are generally not available.
One exception is the use of theory of dynamical systems for modeling nonlinear gain control. Examples include the Volterra series expansion [22,23] and the nonlinear autoregressive moving average model with exogenous inputs (NARMAX) [24,25]. Both formalisms exhibit extensive rigorous tools of functional identification. However, the former typically requires high-order Volterra kernels to model the highly nonlinear gain control, and its identification suffers from the curse of dimensionality. In practice, using the second-or third-order Volterra kernels is computationally not tractable with commodity hardware, while at the same time not fully capturing the dynamics of gain control. For the latter, an extension to the spatio-temporal domain is often out of reach.
Divisive normalization provides an alternative nonlinear operator to model gain control. Divisive normalization [26] has been proposed as a canonical circuit model of computation for many sensory processing circuits underlying adaptation and attention [4,[27][28][29][30][31]. Divisive normalization models in sensory systems are often associated with a population of neurons where each receives inputs from the pool [26,32]. Recent studies have shown that divisive normalization is a suitable candidate for describing the contrast gain control and its dynamics [33].
Existing modeling studies of divisive normalization in sensory systems mostly focus on establishing a connection between gain control and the statistical properties of natural sensory stimuli [28,[34][35][36]. There is a lack of general mathematical framework for identifying divisive normalization circuits from recorded data.
In this paper, we address the above issues by modeling the photoreceptor/amacrine cells layer of the fruit fly as a multi-input multi-output (MIMO) feedforward and feedback temporal and spatio-temporal divisive normalization processor (DNP). The MIMO DNPs are built upon temporal and spatio-temporal feedforward and feedback of divisive normalization operators constructed from low-dimensional Volterra operators. Combining with sparse identification methods [42], we provide efficient algorithms for identifying all the components of the temporal as well as spatio-temporal divisive normalization processors.
This manuscript is organized as follows. In Sect. 2 the overall architecture of the divisive normalization processor (DNP) is introduced and its power of modeling contrast gain control is demonstrated. We first describe in Sect. 2.1 the biological model of photoreceptoramacrine cell layer. In Sect. 2.2, we introduce a general model for divisive normalization in the time domain. The temporal DNP consists of the ratio of two nonlinear functionals acting on the input stimulus. In Sect. 2.3 we then extend the model to space-time domain to include models of lateral feedback from amacrine cells and demonstrate its processing power. In Sects. 3 and 4, we provide identification algorithms and show that the temporal and spatio-temporal DNPs can be efficiently identified. We demonstrate the effectiveness of the algorithms with several examples. We conclude the paper with a discussion in Sect. 5.

The architecture of divisive normalization processors
In Sect. 2.1 we start by motivating the present work. We then introduce the architecture of divisive normalization processors in the time domain (Sect. 2.2) and space-time domain (Sect. 2.3). Finally, in Appendix 1 we provide examples that characterize the I/O mapping of the class of temporal and spatio-temporal divisive normalization processors previously described.

Modeling the photoreceptors and amacrine cells layer
In what follows, we anchor the model description around the photoreceptor-amacrine cell layer of the fruit fly. The fly retina consists of ∼800 ommatidia, each of which hosts photoreceptors whose axons terminate in a secondary neuropil called lamina. There, they provide inputs to columnar large monopolar cells (LMCs) that project to the third visual neuropil, and to amacrine cells [43]. Amacrine cells are interneurons that innervate axon terminals of multiple photoreceptors. The photoreceptors, in turn, receive lateral feedback from the amacrine cells as well as feedback from LMCs such as L2 neurons [41,44,45].
A circuit diagram of the photoreceptor-amacrine cell layer is shown in Fig. 1. For the sake of clarity, we assume here that an ommatidium consists of a single photoreceptor. It has been shown that the outputs of photoreceptors exhibit rapid gain control through both the phototransduction process and the interaction with such feedback loops [25,[46][47][48][49].
In what follows we propose a model comprising nonlinear transformations combined with divisive normalization that can model such gain control and can account for diverse dynamics. We will also show that the model we propose can be systematically identified from observed input output pairs.

Divisive normalization processors in the time domain
In this section we present the modeling of temporal stimuli in 2.2.1 and introduce a class of temporal divisive normalization processors for modeling photoreceptors in Sect. 2.2.2.

Modeling temporal stimuli
We model the temporal varying stimuli u 1 = u 1 (t), t ∈ D ⊆ R, to be real-valued elements of the space of trigonometric polynomials [50]. The choice of the space of the trigonometric polynomials has, as we will see, substantial computational advantages. A temporal stimulus models the visual field arising at the input of a single photoreceptor.

Definition 1
The space of trigonometric polynomials H 1 is the Hilbert space of complexvalued functions Here, Ω denotes the bandwidth and L the order of the space. Stimuli u 1 ∈ H 1 are extended to be periodic over R with period S = 2π L Ω .

Temporal divisive normalization processors
We first consider single photoreceptors without feedback from the amacrine cells shown in Fig. 1. A schematic of the temporal divisive normalization processor (DNP) modeling the photoreceptor is shown in Fig. 2. For notational simplicity, we consider a single photoreceptor here. The input visual stimulus to the photoreceptor is denoted by u = u(t), t ∈ D, and the output electric current by v = v(t), t ∈ D.
Remark 1 Note that, in a single photoreceptor, photons are first absorbed by a large number of microvilli [47] (not shown). Microvilli generate "quantum bumps" in response to photons; the photoreceptor aggregates the bumps and in the process creates the transduction current. Calcium ion influx and calcium diffusion into the photoreceptor cell body may change the sensitivity of the transduction cascade. A high concentration of calcium (buffer) can result in a photon to be ineffective and may also affect the duration and the magnitude of quantum bumps [56].
The DNP consists of (1) a feedforward Volterra processor (VP) T 1 , (2) a feedforward normalization VP T 2 , and (3) a feedback normalization VP T 3 . The output of the photoreceptor amounts to where and Here, b l , l = 1, 2, 3, are the zeroth-order Volterra kernels (constants), h l 1 (t), l = 1, 2, 3, are first-order Volterra kernels (impulse responses of linear filters), and h l 2 (t, s), l = 1, 2, 3, are second-order Volterra kernels. As before, D denotes the domain of the input space, and Remark 2 For the sake of tractability, we limit each nonlinear functional in (6) and (7) to be composed of only first-and second-order Volterra kernels. The division in (5) allows us, however, to model nonlinear processing of much higher orders.
We note that v in (5) is invariant under scaling by the same factor of the numerator and denominator. Hence, without loss of generality, we will assume b 2 + b 3 = 1. We will also assume that the DNP is bounded-input bounded-output [57].

Modeling temporal DNP feedback filters
Here, we define the filter kernels in equations (6) and (7).

Definition 3
Let h l p ∈ L 1 (D p ), l = 1, 2, p = 1, 2, where L 1 denotes the space of Lebesgue integrable functions. The operator P 1 : L 1 (D) → H 1 given by is called the projection operator from L 1 (D) to H 1 . Similarly, the operator P 2 : L 1 (D 2 ) → H 2 given by is called the projection operator from L 1 (D 2 ) to H 2 .
Note that, for u n ∈ H 1 , P 1 u n = u n . Moreover, with u n 2 (t 1 , t 2 ) = u n (t 1 )u n (t 2 ), P 2 u n 2 = u n 2 . Thus, and by assuming that h l 1 ∈ H 1 , l = 1, 2, and h l 2 ∈ H 2 we recover the simple form of equation (6).
We model the output waveforms v o = v o (t), t ∈ D ⊆ R, to be real-valued elements of the space of trigonometric polynomials [50].

Definition 4 The space of trigonometric polynomials
Here, Ω o denotes the bandwidth and L o the order of the space. The output waveforms v o ∈ H o 1 are extended to be periodic over R with period H o 1 is a reproducing kernel Hilbert space (RKHS) [51] with reproducing kernel (RK) Note that dim(H o 2 ) = dim(H o 1 ) 2 = (2L o + 1) 2 .

Definition 6
Let h 3 p ∈ L 1 (D p ), p = 1, 2, where L 1 denotes the space of Lebesgue integrable functions. The operator P o 1 : L 1 (D) → H o 1 given by is called the projection operator from L 1 (D) to H 1 . Similarly, the operator P o 2 : is called the projection operator from L 1 (D 2 ) to H o 2 .
We note that If we now assume that h 3 3 2 , respectively, and the above equation is identical with equation (7) above.

Divisive normalization processors in the space-time domain
In Sect. 2.2, we described a temporal divisive normalization processor model. The normalization term was the sum of a processed version of the input and the output. However, many biological circuits are thought to exhibit lateral inhibition and gain control [26,[58][59][60][61]. An example is provided by the photoreceptor-amacrine cell layer shown in Fig. 1. In Fig. 3, we provide a model of the schematic in Fig. 1. This model is a 'circuit' extension of the one shown in Fig. 2 and accounts explicitly for lateral inhibition. We anchor the extension in the visual system where the input domain is spatio-temporal.
In Sect. 2.3.1 we model spatio-temporal stimuli and in Sect. 2.3.2 the spatio-temporal divisive normalization processors.

Modeling spatio-temporal stimuli
In this section we provide a model of the interaction between a group of photoreceptors and an amacrine cell. In each photoreceptor, the phototransduction process converts light into current and excites the membrane of the photoreceptor. The voltage signal is then propagated through its axon to the lamina. While photoreceptors provide inputs to amacrine cells, their axon terminals also receive amacrine cell input. Since an amacrine cell innervates multiple lamina cartridges, it provides spatial feedback to several photoreceptors in a small neighborhood.
We extend the temporal divisive normalization processor depicted in Fig. 2 to process spatio-temporal stimuli as shown in Fig. 3. Each spatially sampled point, or pixel, denoted as u i (t), i = 1, 2, . . . , N , is first processed by a temporal DNP. For simplicity rather than picking different Volterra kernels for each branch, T 1 , T 2 , and T 3 are shared across branches. In addition, we introduce a multi-input Volterra processor (MVP) to model the spatiotemporal feedback due to the amacrine cell. Each of the branches in Fig. 3, without the input of the MVP block, is equivalent to the model in Fig. 2.

Spatio-temporal divisive normalization processors
As shown in Fig. 3(b), MVPs are comprised of second-order filters acting on DNP output pairs in addition to linear filters independently acting on each DNP output. Thus, the inputs to the MVP are the DNP outputs v i (t), i = 1, 2, . . . , N , and the MVP output amounts to where b 4 is the zeros-th-order Volterra kernel (constant). Furthermore, h i4 1 ∈ H o 1 , i = 1, 2, . . . , N , are the first-order Volterra kernels whose inputs are v i , i = 1, 2, . . . , N , respectively, and . . , N , are the second-order Volterra kernels whose inputs are the pairs The full model in Fig. 3 thus consists of parallel channels as depicted in Fig. 2 with the added cross-channel feedback normalization/gain control provided by the MVP block.
The overall output of the spatio-temporal DNP can be expressed as W.L.O.G., we assume that b 2 + b 3 + b 4 = 1.

Spatio-temporal DNPs and contrast gain control
The

Sparse identification of temporal DNPs
In what follows we derive sparse identification algorithms for the components of spatiotemporal DNPs depicted in Fig. 2 and formally defined in equation (5). In what follows, we assume that during experimental trials, the single isolated photoreceptor n is presented with M test stimuli u nm = u nm (t), m = 1, 2, . . . , M, and for each trial the outputs v nm = v nm (t), m = 1, 2, . . . , M, are recorded. The objective is to identify the model components b 1 , h l 1 , h l 2 , l = 1, 2, 3, from the knowledge of the inputs and outputs.
Proof See Appendix 2.
Remark 3 From (21), it can be seen that the identification of the divisive normalization processor has been reformulated as a generalized sampling problem [62] Subsequently, the divisive normalization model can be identified by solving a system of linear equations.
In order to solve the system of linear equations in (21), we rewrite them first in matrix form. (21) can be expressed in matrix form as follows:

Sampling vectors
Proof See Appendix 3 for more notation and detailed proof.
A necessary condition on the number of trials and the number of measurements required for identifying the divisive normalization processor for solving the system of equations in Theorem 1 is that the number of trials M ≥ 3 + 2 · dim(H 1 ) and the number of It is easy to see that solving the system of equations above suffers from the curse of dimensionality. As the dimension of H 1 increases, the number of samples needed to identify components increases quadratically. Note that the second-order Volterra kernels h l 2 , l = 1, 2, 3, have unique symmetric forms with orthogonal expansions as follows [57]: where g kl 1 ∈ H 1 , k ∈ N, are orthogonal to each other. In what follows, we assume that the second-order Volterra kernels are sparse, i.e., λ kl = 0 for k > r l , where r l dim(H 1 ). Sparse kernels often arise in modeling sensory processing, e.g., in complex cells in the primary visual cortex [42]. By exploiting the sparse structure of the second-order kernels, the identification problem can be made tractable.
The sparsity of the kernels can be translated into a low-rank condition on the matrix representation of h l 2 , l = 1, 2, 3 (see also Appendix 3). Ideally, the optimization problem would be a rank minimization problem. But rank minimization being NP-hard, we use the surrogate of nuclear norm minimization instead, which is the convex envelope of the rank operator [63].
To perform sparse identification of the divisive normalization processor, we devised Algorithm 1. By optimizing over c 1 and C 2 and subsequently assigning the corresponding block entries according to (25), Algorithm 1 identifies b 1 , h i 1 , i = 1, 2, 3, and H i 2 , i = 1, 2, 3. As a surrogate of rank minimization, Algorithm 1 minimizes a linear combination of the nuclear norm of C 2 and the Euclidean norm of c 1 . The optimization constraints correspond (i) in (29) to the generalized sampling problem allowing certain amount of error, (ii) in (30) to zero mean slack variables, (iii) in (31) to the zeros in the two blocks in the top-right of C 2 , (iv) in (32) to the zeros in the block in the lower-left of C 2 , and (v) in (33), (34), and (35), respectively, H 1 2 , H 2 2 , and H 3 2 are Hermitian.
Algorithm 1 c 1 and C 2 are the solution to the following optimization problem: where · * denotes the nuclear norm defined as C 2 * = Tr((C H 2 C 2 ) 1 2 ), λ 1 , λ 2 are appropriately chosen hyperparameters, ε i ∈ R, i = 1, 2, . . . , MT , represent slack variables, 1 represents a vector of all ones, I p represents a p × p identity matrix, 0 p represents a p × p matrix of all zeros, 0 p×q represents a p × q matrix of all zeros, and 0 represents a matrix of all zeros with dimensions that make the equation consistent.

Theorem 1
The filters of the DNP are identified asb 1 =b 1 and Remark 4 By exploiting the structure of low-rank second-order Volterra kernels, Algorithm 1 provides a tractable solution to the identification of the components of the divisive normalization processor.

Examples of sparse identification of temporal DNPs
We provide here identification examples solved using Algorithm 1.
Example 1 Here, we identify a temporal divisive normalization processor in Fig. 2, where h 1 1 (t) = 2.472 × 10 10 t 3 e -100π t cos(36πt), h 1 2 (t, s) = 9.038 × 10 19 t 3 s 3 e -100π (t+s) cos(52πt) cos(52πs) We choose the input space H 1 to have L = 10, Ω = 100π . Thus S = 0.2s , dim(H 1 ) = 21, and dim(H 2 ) = 441. Note that all three second-order Volterra kernels exhibit low-rank structure. We presented the model with 25 stimuli from H 1 , whose coefficients were chosen to be i.i.d Gaussian variables. Then, a total of 425 measurements were used from the input and the observed output pairs to solve the identification problem using Algorithm 1. The results of the identification are shown in Fig. 4. As can be seen from the figure, Algorithm 1 was able to identify the model with high precision using only 425 measurements, much less than the 1387 measurements that would have been required to solve the generalized sampling problem directly. The factor of reduction in the required measurements is critical when the model needs to be identified in a much larger space, for example, a space of spatio-temporal stimuli as shown in the next example.
Example 2 Here, we identify a detailed biophysical model of Drosophila photoreceptors as a DNP with feedforward only (see Fig. 2). The detailed biophysical model consists of 30,000 microvilli [5,64]. The photon absorption in each microvillus is described by a Poisson process whose rate is proportional to the number of photons per microvillus incident on the ommatidium. Photon absorption leads to a transduction process governed by a cascade of chemical reactions. The entire transduction process is described by 13 successive differential equations and is given in [64]. The total number of equations of the photoreceptor model is 390, 000. For identifying the DNP, we used natural images from the van Hateren database [65] and simulated movements of a fly superimposed on the natural scenes. Each visual scene was captured by a previously developed retina model [64], endowed with realistic optics and the geometry of the fly retina. A 200-second stimulus was generated. We then divided the stimulus into several segments, each of which scaled to a different level of mean light intensity. The resulting stimulus is shown in Fig. 5(a). The bandwidth of the DNP input and output spaces was limited to 50 Hz.
The visual input was presented to the DNP photoreceptor model. We used 10 seconds of the stimulus for identifying the DNP filters. The identified filters are depicted in Fig. 6. The output of the photoreceptor DNP model when stimulated by the other 190 seconds of the input stimulus is shown in Fig. 5(b). We evaluated the identified photoreceptor DNP model by computing the SNR between the output of the detailed biophysical model and that of the identified DNP model. The SNR was 26.14 [dB].
We additionally trained a model without normalization (i.e., with only T 1 ). As shown in Fig. 5(b) the identified model without normalization does not match the output well across different light intensities. Compared with the output of the DNP model, the SNR was only 14.48 [dB].
The DNP filters were identified using naturalistic stimuli. To test if the identified filters can also match the output of other types of stimuli, we presented a Gaussian noise stimulus with a bandwidth of 50 Hz to the photoreceptor model. The resulting output was compared with the output of the detailed biophysical model. As shown in Fig. 7, the output of the DNP model closely follows the actual photoreceptor output, and the SNR was 15.04 [dB]. We note that since the input space is defined as an RKHS, the statistics of the input stimuli do not affect the quality of the DNP output. We also evaluated in Fig. 7 the identification model without normalization. The SNR of the output is 4.55 [dB], a significant decrease from the output of the DNP model.

Sparse identification of spatio-temporal DNPs
In what follows we derive sparse identification algorithms for the components of spatiotemporal DNPs. Given the spatio-temporal divisive normalization processor depicted in Fig. 3, we are interested in identifying all the filters from input and output observations. We formulate an optimization problem, which achieves such identification, with high fidelity and with a relatively small number of measurements.

Deriving the sparse identification algorithm for spatio-temporal DNPs
Here, we make the assumption that the filters h i4 (t), i = 1, 2, . . . , N , and h ij4 2 (t 1 , t 2 ), i, j = 1, 2, . . . , N , are followed by the LPF with bandwidth Ω o and that the output is sampled at a rate of 2f max , where f max = Ω 2π is the maximum of the bandwidth of the filters h i4 (t) and h ij4 2 (t 1 , t 2 ). By abuse of notation, we will use v i (t) to denote the low-passed version of the actual output. Note that, based upon the assumptions on the bandlimited nature of the feedback filters acting on the output, the responses of these filters to the low-passed outputs will be the same as the responses to the actual outputs.
We present M trials where input stimuli u nm are chosen to be elements of the space of trigonometric polynomials H 1 for n = 1, . . . , N , m = 1 We consider here the identification of the entire DNP circuit at once for two reasons. First, as all channels are connected in the spatial domain through the MVP, the inputs to the MVP are the outputs of the entire DNP circuit. Therefore, all outputs are required to identify the MVP. Second, since h i 1 , h i 2 , i = 1, 2, 3, are shared across all channels, fewer trials are needed to identify these filters. We present the following.
Proof See Appendix 4.
Remark 5 Theorem 3 suggests that identifying the lateral divisive normalization model is equivalent to solving a generalized sampling problem with noisy measurements. It also suggests that the output needs to be sampled at a high enough rate, and that the choice of the Hilbert space used to reconstruct the feedback filters is critical since incorrect choices for these parameters can negatively affect the identification by introducing 'noise' in the measurements.
We now present the following algorithm to identify the model that exploits the low-rank constraints imposed on the quadratic filters.
The constraints in Algorithm 2 are similar to those in Algorithm 1. Note that H ij4 are not constrained to be Hermitian. This follows from the assumption that the MVP block may perform asymmetric processing on any pair of inputs to the block.

Theorem 2
The identified spatio-temporal divisive normalization is specified asb 1 =b 1 and · e l 1 (t 1 )e l 2 (t 2 ), where and Remark 6 Assuming that all the second-order kernels in the lateral divisive normalization model have rank r, the expected number of measurements for Algorithm 2 to identify the model is of the order O(r · dim(H 1 ) + N 2 r · dim(H o 1 )). When N is large, the N 2 factor may become prohibitive in identifying the model. Additional assumptions on h ij4 2 may help mitigate this and maintain tractability of solving the identification problem. For example, with the assumption that h ij4 = h 14 1 if i = j and h ij4 2 = h 24 1 otherwise, the expected number of measurements required will be O(r · dim(H 1 ) + Nr · dim(H o 1 )).

An example of sparse identification of a spatio-temporal DNP
We now present an example of identification obtained by Algorithm 2. We demonstrate here that, in addition to the identification of the Volterra kernels operating within each channel, the MVP in the spatio-temporal DNP can be identified.
Example 3 Here, we choose the DNP in Fig. 3 with N = 4, and h ij4 2 (t 1 , t 2 ) = 5000e -1 4 (i-2) 2 e -1 4 (j-2) 2 · 25t 1 e -25t 1 25t 2 e -25t 2 , (68) and the other Volterra kernels are set to 0. Note that Yiyin, and h 1 1 , h 2 1 are shared across all channels. In addition, h ij4 = h ji4 in this case. We assumed knowledge of this symmetric structure of the model and adapted the identification algorithm accordingly and only identified six linear filters and ten quadratic filters.
We performed the identification in the Hilbert space of bandlimited functions with Ω = Ω o = 40π , and we chose the same space for the input stimuli and for both the feedforward filters and the feedback ones. We solved for the filters truncated to a period of 0.4 s, and thus there were 17 coefficients to be identified for the linear filters and 17 × 17 coefficients to be identified for each of the quadratic filters. A total of 1116 measurements (279 measurements from each cell) were used to perform the identification, and the results are depicted in Fig. 8. The average SNR of reconstruction across all filters was more than 150 [dB]. Note that solving the generalized sampling problem directly for the same problem would have required at least 3570 measurements.

Discussion
As already mentioned in the Introduction, the photoreceptor/amacrine cell layer of the early vision system of the fruit fly rapidly adapts to visual stimuli whose intensity and contrast vary orders of magnitude both in space and time.
In this paper we presented a spatio-temporal divisive normalization processor that models the transduction and the contrast gain control in the photoreceptor and amacrine cell layer of the fruit fly. It incorporates processing blocks that explicitly model the feedforward and the temporal feedback path of each photoreceptor and the spatio-temporal feedback from amacrine cells to photoreceptors. We demonstrated that with some simple choice of parameters, the DNP response maintains the contrast of the input visual field across a large range of average spatial luminance values.
We characterized the I/O of the spatio-temporal DNP and highlighted the highly nonlinear behavior of the DNP in contrast gain control. Despite the divisive nonlinearity, we provided an algorithm for the sparse identification of the entire DNP. We showed that the identification of the DNP components can be interpreted as a generalized sampling problem. More importantly, the sparse identification algorithm does not suffer from the curse of dimensionality that would otherwise require a large number of measurements that are quadratically related to the dimension of the input and output spaces.
Neural circuits are often noisy. Although the I/O characterization of MIMO DNPs provided in Sect. 2 is noise free, it can be easily extended to account for noise [52][53][54]. The identification of DNPs can be interpreted as a generalized sampling problem in an RKHS with noisy measurements. The sparse identification algorithms we provided in this paper include slack variables to account for these inaccurate measurements. Similar algorithms have been shown to be robust to noise [42].
Contrast gain control is an intrinsically nonlinear problem. Early approaches to modeling contrast gain control rely on analyzing the Volterra kernels, i.e., linear or nonlinear receptive fields identified when stimuli of different statistics are presented [19,20,23]. The objective of these studies is to observe and characterize the differences between the identified Volterra kernels. To fully capture the nonlinear effect in contrast gain control, these approaches would rely on identifying higher-order Volterra kernels that are too costly to compute and are simply ignored. The contrast gain control model advanced in this paper, however, results from the divisive normalization operation, albeit having Volterra kernels modeling internal filters in the DNP. The division largely expands upon DNP's ability to model higher-order nonlinearities in contrast gain control.
Biologically, divisive normalization can take place in feedforward or feedback circuits, or both [26]. Most divisive normalization models only consider the feedforward normalization circuit. Although normalization can be achieved by a multiplicative feedback mechanism, it mainly serves as a transformation leading to an equivalent form of feedforward divisive normalization when the input is a constant [27,67]. The MIMO DNP models described in this paper explicitly include divisive normalization with both feedforward and feedback paths that can be more readily employed in modeling underlying neuronal circuit mechanisms, such as those arising in the photoreceptor and amacrine cell layer.
Predictive coding often takes the form subtractive negative feedback [68]. The MIMO DNP models described in this paper represent an alternative form of predictive coding. Indeed, divisive input modulation has been proposed to implement predictive coding by calculating the residual errors using division rather than subtraction [69]. In flies, it has been suggested that inhibition in the lamina, where retina photoreceptors and amacrine cells interact, leads to predictive coding [70]. Our study advances a spatio-temporal MIMO model for predictive coding and provides an algorithm to identify the components of the MIMO DNP. The MIMO DNP model opens a new avenue for exploring and quantifying the highly nonlinear nature of sensory processing. The MIMO DNP in Fig. 3 can be further extended to allow the different transformations T i , i = 1, 2, 3, to incorporate spatio-temporal Volterra kernels, thereby making it more versatile for modeling other types of sensory processing, including (i) interactions between cones and horizontal cells in vertebrate retinas [58], (ii) channels/glomeruli in olfactory circuits and interactions between them through local neurons [9,71], and (iii) cross-suppression and gain control in the auditory [12,72], and visual cortices [73][74][75]. The sparse identification algorithms advanced here can be easily extended to identify the MIMO DNPs of these systems as well.

Appendix 1: Spatio-temporal DNPs and contrast gain control
Here, we characterize the I/O of simple DNPs stimulated with three different inputs. In the first example, we evaluate the response of a 1 × 4 DNP under different background light intensity levels. In the second example, contrast gain control exerted by the amacrine cells is demonstrated with the same DNP. In the third example, a DNP consisting of 16 × 16 DNPs tiling a 1536 × 1024 visual field is stimulated with a natural image taken at low, medium, and high luminance values. The DNP output is evaluated with and without the MVP block.
Example 4 Here, we consider a simple 1 × 4 DNP consisting of four photoreceptors and a single amacrine cell receiving inputs from and providing feedback to all four photoreceptors. The choices of the component filters of the DNP are as follows: for the transformations T 1 and T 2 , the kernels are for the transformation T 3 , the kernels are and for the transformation L 4 , the kernels are h i4 1 (t) = 10,000 Here, the bandwidth of H o 1 was chosen to be Ω o = 10 · 2π rad/s. The I/O of the fly's individual photoreceptors in steady state is described by a saturating nonlinearity similar in appearance to a sigmoid when plotted on a logarithmic scale [1]. Photoreceptors can deal with the orders of magnitude of the input stimuli while maintaining their output in a suitable range. In addition, the photoreceptors exhibit adaptation so that the sigmoidal curves shift to adjust to the mean local luminance value [26].
We examine here the steady state response of the above DNP under four different background light intensity levels. At the start of each trial, all four photoreceptors are first adapted to the same background light intensity level. One of the photoreceptors is then subjected to an additional flash of different light intensity with a duration of two seconds, while the inputs to the other three are kept at the same background level. We observe the value of the steady state response of the photoreceptor that receives the additional flash. Figure 9 depicts the relationship between the observed steady state response and the light intensity of the flash at the four background intensity levels. It demonstrates that the response of the DNP is background-dependent, and the overall slope and range are similar across different background levels. With the MVP block, the spatio-temporal DNP can reproduce responses of photoreceptors observed in experimental settings [26,46].
Since all the photoreceptors exhibit a sigmoid like nonlinearity, the output of the retina will be constrained in a suitable current range that can be processed by postsynaptic neurons in the lamina. However, without adaptation to mean local luminance, the saturating nature of the nonlinearities leads to a loss of spatial contrast. The spatial contrast is preserved by spatial gain control or adaptation modeled here with the MVP block.
Example 5 Here, the I/O of DNPs with and without the MVP block is evaluated. Using the same DNP as in the example above, we stimulated the DNP with 25 "images" with a resolution of 1 × 4 pixels. Each image has a different average luminance and root mean Figure 9 The spatio-temporal DNP model can exhibit adaptation to local luminance. A DNP with four photoreceptors and an amacrine cell (see Example 1 for details) is adapted to a background intensity level at the beginning of each trial. One of the photoreceptors is then provided a two-second long flash of light, while the inputs to the other photoreceptors are kept at the background level. The relationship between the steady state response of the photoreceptor and the light intensity of the flash is shown for each of the background levels square (RMS) contrast. The RMS contrast is defined as the standard deviation of pixel intensities normalized by the mean, i.e., and N = 4. These images are shown in the "input" block in Fig. 10, with each bar representing the input intensity to one photoreceptor. Note that the pixels are extended along the y-axis for a quick visual examination. In the "Photoreceptors" block in Fig. 10, the steady state responses of the DNP without the MVP block to the respective inputs are shown. Here, pure black and white represent a response of 0 and 1, respectively. This can be interpreted as a circuit in which the reciprocal connections between photoreceptors and amacrine cells are blocked.
In the "Photoreceptors + Am" block in Fig. 10, the steady state responses of the full DNP to their respective inputs are shown. Comparing the "Photoreceptors" and "Photoreceptors + Am" blocks, the responses of the DNP without feedback are washed out, or exhibit low contrast, particularly at three of the four corners of the 5 × 5 image array, i.e., when either the luminance or contrast is too high or too low. By contrast, the individual bars in the response of the full DNP model are more readily discernible across several orders of magnitude of luminance.  We note that a saturating nonlinearity can maintain its output in a constrained range even when the input varies by orders of magnitude. However, the nonlinearity may lead to a loss of spatial contrast. In contrast, the DNP constrains its output to a suitable range while maintaining the spatial contrast. This is demonstrated in Fig. 11, where the RMS contrast of the input images in Fig. 10 is plotted against the RMS contrasts of the responses for both cases-with and without MVP in the DNP. Spatial contrast, arguably, is an important feature of the image that should be preserved or even enhanced for subsequent stages of extraction of ethologically relevant information from the visual stimulus.
Example 6 Here, we apply a full-scale DNP model to a natural image. The image is taken in raw format so that its pixel values are proportional to the light intensity each pixel is exposed to [65]. The resolution of the image is 1536 × 1024. The image is first divided into 16 × 16 blocks with a four-pixel overlap in each direction. A DNP is assigned to each block, and the filters in the DNP are designed as follows: for the transformations T 1 and T 2 , the kernels are for the transformation T 3 , the kernels are and for the transformation L 4 , the kernels are where (x i , y i ) is the coordinate of pixel i, and (x 0 , y 0 ) is the coordinate of the center pixel in the 16 × 16 block. Here, the bandwidth of H o 1 was chosen to be Ω o = 40 · 2π rad/s. Figure 12 provides a graphical visualization of the feedback processing of the DNP. We note that the RHS of both equalities (83) and (84) can be separated into a temporal term and a spatial term. The temporal terms correspond to temporal Volterra kernels processing the output at each pixel (see cylinders in Fig. 12). The feedback term is then the weighted sum with a weight according to the spatial term. Each weight is the value of a Gaussian function evaluated at the distance between the location of the pixel and the center of the block. Due to the overlaps between blocks, there are pixels that belong to Figure 12 Graphical visualization of the feedback processor of the DNP used in Example 6. The value of the feedback is obtained by a weighted sum of the outputs at each pixel of a 16 × 16 block (only five pixels are shown in each direction) processed by a temporal Volterra operator (represented by a cylinder). Each weight is the value of a Gaussian function evaluated at the distance between the location of the pixel and the center of the block As can be seen from Fig. 13, the response of the DNP with MVP is robust across the different levels of luminance, showing the best quality for all three luminance levels. By contrast, the response of the DNP without MVP is either too dark for low contrast or saturated at high luminance. The contrast within each response is also significantly worse.
It is also not ideal to use only a logarithmic nonlinearity to process the images (second row), as the contrast of the images is not sharp enough.
To conclude we note that, with a simple choice of filters, the spatio-temporal DNP proposed in Sect. 2.3 operates in a range of luminance values spanning several orders of magnitude, much like the photoreceptor-amacrine cell layer. The contrast of the output of the DNP is not compromised while its range remains strongly bounded.