Conversation

@AlexisWis
Added pole-balancing tutorial, which currently includes:
- Simulation renderer
- Physics simulation
- Classical Q-based agent (see the sketch below)
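
For orientation, here is a minimal sketch of what such a classical tabular Q-learning agent on a pole-balancing task could look like. The physics constants, discretisation bins, and hyperparameters are illustrative assumptions and are not taken from the tutorial itself.

```python
import numpy as np

# --- Minimal cart-pole physics (hypothetical stand-in for the tutorial's simulation) ---
GRAVITY, M_CART, M_POLE, L_POLE, DT = 9.81, 1.0, 0.1, 0.5, 0.02

def step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step under the given force."""
    x, x_dot, theta, theta_dot = state
    total_m = M_CART + M_POLE
    sin_t, cos_t = np.sin(theta), np.cos(theta)
    temp = (force + M_POLE * L_POLE * theta_dot**2 * sin_t) / total_m
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        L_POLE * (4.0 / 3.0 - M_POLE * cos_t**2 / total_m))
    x_acc = temp - M_POLE * L_POLE * theta_acc * cos_t / total_m
    return np.array([x + DT * x_dot, x_dot + DT * x_acc,
                     theta + DT * theta_dot, theta_dot + DT * theta_acc])

# --- Tabular Q-learning agent ---
BINS = (6, 6, 12, 12)                      # discretisation per state variable
LIMITS = np.array([2.4, 3.0, 0.21, 3.0])   # symmetric bounds used for binning
N_ACTIONS = 2                              # push left / push right
Q = np.zeros(BINS + (N_ACTIONS,))

def discretize(state):
    """Map the continuous state onto Q-table indices."""
    clipped = np.clip(state, -LIMITS, LIMITS)
    scaled = (clipped + LIMITS) / (2 * LIMITS)
    return tuple((scaled * (np.array(BINS) - 1)).astype(int))

alpha, gamma, epsilon = 0.1, 0.99, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    state = rng.uniform(-0.05, 0.05, size=4)   # start near upright
    for t in range(500):
        s = discretize(state)
        # epsilon-greedy action selection
        a = rng.integers(N_ACTIONS) if rng.random() < epsilon else int(np.argmax(Q[s]))
        next_state = step(state, 10.0 if a == 1 else -10.0)
        done = abs(next_state[0]) > 2.4 or abs(next_state[2]) > 0.21
        reward = 0.0 if done else 1.0
        s_next = discretize(next_state)
        # Q-learning update: move Q(s, a) towards the bootstrapped target
        target = reward + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s + (a,)] += alpha * (target - Q[s + (a,)])
        state = next_state
        if done:
            break
```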

@clinssen changed the title from "Added Polebalancing directory in doc/tutorials" to "Add pole balancing/reinforcement learning tutorial" (Dec 16, 2024)
@clinssen (Contributor)

See https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003024:

We would like to point out the similarity of the TD-LTP learning rule to a reward-modulated spike-timing-dependent plasticity rule we call R-STDP [6], [16], [30]–[32]. In R-STDP, the effects of classic STDP [33]–[36] are stored into an exponentially decaying, medium term (time constant τ_e), synapse-specific memory, called an eligibility trace. This trace is only imprinted into the actual synaptic weights when a global, neuromodulatory success signal is sent to the synapses. In R-STDP, the neuromodulatory signal is the reward minus a baseline, i.e., R − b. It was shown [32] that for R-STDP to maximize reward, the baseline must precisely match the mean (or expected) reward. In this sense, R − b is a reward prediction error signal; a system to compute this signal is needed. Since the TD error δ is also a reward prediction error signal, it seems natural to use δ instead of R − b. This turns the reward-modulated learning rule R-STDP into a TD error-modulated TD-STDP rule (Figure 2A, bottom). In this form, TD-STDP is very similar to TD-LTP. The major difference between the two is the influence of post-before-pre spike pairings on the learning rule: while these are ignored in TD-LTP, they cause a negative contribution to the coincidence detection in TD-STDP.
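
To make the relationship concrete, here is a schematic Python sketch (not taken from the paper or the tutorial) of how the two gating signals differ; `stdp_window`, the time constants, and the learning rate are illustrative assumptions only.

```python
import numpy as np

# Schematic per-synapse update contrasting R-STDP and TD-STDP, assuming a fixed
# simulation step DT and a pair-based STDP kernel (illustrative values throughout).
DT, TAU_E = 1e-3, 0.5   # time step [s] and eligibility-trace time constant [s]

def stdp_window(dt, a_plus=0.01, a_minus=0.012, tau=0.02):
    """Classic pair-based STDP: potentiation for pre-before-post pairings, depression otherwise."""
    return a_plus * np.exp(-dt / tau) if dt > 0 else -a_minus * np.exp(dt / tau)

def update_eligibility(e, pre_post_dt):
    """Low-pass the STDP result into an exponentially decaying eligibility trace."""
    stdp = stdp_window(pre_post_dt) if pre_post_dt is not None else 0.0
    return e + DT * (-e / TAU_E) + stdp

def r_stdp_weight_update(w, e, reward, baseline, lr=0.1):
    """R-STDP: the trace is gated by reward minus a baseline (a reward prediction error)."""
    return w + lr * (reward - baseline) * e

def td_stdp_weight_update(w, e, td_error, lr=0.1):
    """TD-STDP: same structure, but the gate is the TD error delta."""
    return w + lr * td_error * e
```

The only structural difference between the two updates is the modulating factor, R − b versus δ, matching the quoted paragraph; TD-LTP would additionally ignore post-before-pre pairings in the trace.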
