Fig. 3 | The Journal of Mathematical Neuroscience

Fig. 3

From: Gradient estimation in dendritic reinforcement learning

Balanced cell reinforcement (bCR, Equation 26) compared to zone reinforcement (ZR). (A) Average performance of bCR (green) and ZR (red) on the same task as in panel 6A. (B) Performance of bCR (green) and ZR (red) when learning stimulus-response associations for four different patterns; the x-axis uses a logarithmic scale. The inset shows the distribution of NMDA-spike durations after learning the task with bCR. The performance values in the figure are averages over 40 runs, and error bars show 1 SEM. (C) Development of the average reward signal $R(Z)$ for bCR (green) and ZR (red) when the task is to spike at the mid time of the single input pattern. Here $R(Z) = -\frac{2}{nT}\sum_{i=1}^{n} |t_i^{\mathrm{sp}} - t^{\mathrm{targ}}|$, where $t_i^{\mathrm{sp}} \in Z$, $i = 1,\dots,n$, is the $i$th of the $n$ output spike times, $t^{\mathrm{targ}} = 250$ ms is the target spike time, and $T = 500$ ms is the pattern duration; if there was no output spike within $[0,T)$, one was added at $T$, yielding $R(Z) = -1$. (D) Spike raster plot of the output spike times $Z$ under bCR, for which $R(Z)$ is shown in C. With ZR, the distribution of spike times after 3000 trials roughly corresponds to the one for bCR after 160 trials (vertical line at ∗), where the two performances coincide (see ∗ and black lines in C). The means and standard deviations of the spike times at the end of the learning process, averaged across the last 300 trials, were 251 ± 45 ms for bCR and 256 ± 121 ms for ZR.
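
The reward used in panel C can be sketched in a few lines of Python. This is only an illustration of the formula as stated in the caption, not the authors' code: the function name `reward`, its argument names, and the choice to ignore spikes outside [0, T) are assumptions made here.

```python
def reward(spike_times, t_targ=250.0, T=500.0):
    """Reward R(Z) for output spike times Z (in ms), per the caption of panel C.

    Hypothetical helper: the name, signature, and filtering step are
    assumptions for illustration only.
    """
    # Assumption: only spikes inside the pattern window [0, T) count.
    spikes = [t for t in spike_times if 0.0 <= t < T]
    # Caption rule: with no output spike in [0, T), add one spike at T.
    if not spikes:
        spikes = [T]
    n = len(spikes)
    # R(Z) = -2/(nT) * sum_i |t_i^sp - t^targ|
    return -2.0 * sum(abs(t - t_targ) for t in spikes) / (n * T)


# A single spike exactly at the target time gives the maximal reward of 0.
r_on_target = reward([250.0])
# No spike at all gives the minimal reward of -1.
r_silent = reward([])
# Two spikes, each 50 ms away from the target.
r_two = reward([200.0, 300.0])
```

With the default values from the caption, the reward lies in [-1, 0]: it is 0 only when every output spike falls on the target time and reaches -1 when the neuron stays silent or spikes only at the very start of the pattern.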
