Fig. 2 | The Journal of Mathematical Neuroscience

Fig. 2

From: Gradient estimation in dendritic reinforcement learning

Learning to stay quiescent. (A) Learning curves for cell reinforcement (blue) and zone reinforcement (red) when the neuron should not respond with any somatic firing to one pattern that is presented repeatedly. Values shown are averages over 40 runs, each with different initial weights and a different input pattern. (B) Distributions of the performance after 1500 trials. (C) A bad run of the CR-rule, where performance drops dramatically after the 397th pattern presentation. The grey points show the Euclidean norm $\|\Delta W\|$ of the change in the neuron's weight matrix $W$, highlighting the excessively large synaptic update after trial 397. (D) Time course of the somatic potential during trial 397 (the straight line at $t = 219$ ms marks a somatic spike). As shown more clearly by the blow-up in the bottom row, an NMDA-spike occurring at $t^* = 232$ ms yields a value of $U$ which stays strongly positive for some 10 ms. ($U$ drops thereafter because an NMDA-spike in a different zone ends.) Improbably, however, the sustained elevated value of $U$ after $t^*$ does not lead to a somatic spike. Hence, the likelihood of the observed somatic response $Z$, given the activity $Y^\nu$ in the zone $\nu$ where the NMDA-spike at time $t^*$ occurred, is quite small: $P(Z_{[t^*, t^*+\Delta]} \mid Y^\nu) = P(Z_{[t^*, t^*+\Delta]} \mid Y^\nu \cup \{t^*\}) \approx 0.017$. Indeed, the actual somatic response would have been much more likely without the NMDA-spike: $P(Z_{[t_s, t_s+\Delta]} \mid Y^\nu \setminus \{t^*\}) \approx 0.72$. The discrepancy between the two probabilities yields a large value of $\exp(-\gamma_{Y^\nu}(t^*))$ in Equation 24, leading to the strong weight change. Error bars in the figure show 1 SEM.
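
As a rough illustration of how large this discrepancy is, the two probabilities quoted in the caption can be compared directly. The sketch below is not taken from the article: the variable names are hypothetical, and it only computes the ratio of the two quoted values and its logarithm. How this ratio enters $\exp(-\gamma_{Y^\nu}(t^*))$ is fixed by Equation 24 of the article, which is not reproduced here.

```python
import math

# Probabilities quoted in the caption for trial 397 (panel D).
# Variable names are illustrative only, not the article's notation.
p_with_nmda_spike = 0.017    # likelihood of the observed somatic response given the NMDA-spike at t*
p_without_nmda_spike = 0.72  # likelihood of the same response had the NMDA-spike not occurred

# Ratio of the two likelihoods: how much more probable the observed
# somatic response would have been without the NMDA-spike.
likelihood_ratio = p_without_nmda_spike / p_with_nmda_spike

# Natural logarithm of the ratio, i.e. the gap on a log-likelihood scale.
log_likelihood_ratio = math.log(likelihood_ratio)

print(f"likelihood ratio:     {likelihood_ratio:.1f}")
print(f"log-likelihood ratio: {log_likelihood_ratio:.2f}")
```

The observed response is roughly 42 times more likely without the NMDA-spike than with it, which gives a sense of why a single trial can produce an outsized update.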
