RPE hypothesis of dopamine

Original citation

Montague, Dayan, and Sejnowski. "A Framework for Mesencephalic Dopamine Systems Based on Predictive Hebbian Learning", The Journal of Neuroscience, 1996.


The reward prediction error (RPE) hypothesis of dopamine is grounded in three main observations about the activity of dopamine neurons in animals learning to associate a stimulus with a reward:

Dopamine neurons are initially activated by rewards, but this response shifts to an earlier reward-predicting stimulus over the course of learning.
After an association between stimulus and reward is established, omitting an expected reward causes a dip in dopamine neuron activity.
Unexpected positive stimuli, such as surprise rewards or unpredictable stimuli already associated with a future reward, reliably activate dopamine neurons at any point in learning.

To see how the RPE model captures these observations, use the buttons below to add and remove rewards in a simulation of TD learning.
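For readers without access to the interactive version, the same behavior can be approximated with a minimal TD(0) sketch. This is an illustration under stated assumptions, not the code behind the simulation: the trial timing (`N_STEPS`, `CS_TIME`, `REWARD_TIME`), the learning rate `alpha`, the discount `gamma`, and the function name `run_trial` are all hypothetical choices. The state is assumed to be the number of time steps since stimulus onset (one learned value per step), and the value before the stimulus is fixed at 0 because nothing predicts when the stimulus will arrive. The TD error `delta[t]` stands in for dopamine neuron activity.

```python
import numpy as np

# Hypothetical trial timing, in time steps (not taken from the source).
N_STEPS, CS_TIME, REWARD_TIME = 20, 5, 15


def run_trial(w, reward_time=REWARD_TIME, learn=True, alpha=0.2, gamma=1.0):
    """Run one trial and return the TD error (modeled dopamine signal) per step.

    w[k] is the learned value of the state k steps after stimulus onset.
    Pass reward_time=None to omit the reward entirely.
    """
    def value(t):
        # Assumption: no prediction exists before the stimulus, so value is 0.
        return w[t - CS_TIME] if t >= CS_TIME else 0.0

    delta = np.zeros(N_STEPS)
    for t in range(1, N_STEPS):
        r = 1.0 if t == reward_time else 0.0
        # TD error: reward now, plus discounted new prediction, minus old prediction.
        delta[t] = r + gamma * value(t) - value(t - 1)
        # Only post-stimulus states carry a learnable value.
        if learn and t - 1 >= CS_TIME:
            w[t - 1 - CS_TIME] += alpha * delta[t]
    return delta


w = np.zeros(N_STEPS - CS_TIME)
first = run_trial(w)                                   # first pairing
for _ in range(500):                                   # repeated pairings
    run_trial(w)
trained = run_trial(w, learn=False)                    # after learning
omitted = run_trial(w, reward_time=None, learn=False)  # expected reward withheld
surprise = run_trial(w, reward_time=2, learn=False)    # reward before the stimulus

for name, d in [("first", first), ("trained", trained),
                ("omitted", omitted), ("surprise", surprise)]:
    print(name, np.round(d, 2))
```

Under these assumptions, the error on the first trial appears only at the reward; after training it appears at stimulus onset and vanishes at the reward; withholding the reward gives a negative error at the expected reward time; and a reward delivered before the stimulus gives a positive error even after training. In the `surprise` trial the reward at the usual time is also missing, so that trial shows a dip there as well.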