Emerson Harkin

Interactive value prediction model

October 24, 2023

Last month I posted a new pre-print showing that the in vivo activity patterns of serotonin neurons during trace conditioning experiments are well explained by an adapting code for future reward, a theory I call value prediction. The model is really simple: its behaviour is controlled by just three parameters, namely the strength of adaptation, the timescale of adaptation, and the timescale over which future rewards are discounted. Even so, it can be hard to wrap your head around how these parameters interact and how they're affected by the design of the experiment itself. To get an intuition for value prediction, try playing with the parameters in the widget below!


Experiment parameters

Model parameters

View parameters

The black line represents a weighted average of discounted future reward, called value in reinforcement learning theory. Value increases suddenly at the start of a reward trial because the animal realizes it will soon experience an appetitive stimulus. At 5.0 seconds after the start of the trial, the reward begins and value starts to decrease as the reward is consumed (because less and less future reward is left to collect). Finally, at 6.0 seconds after the start of the trial, the reward ends and value returns to baseline as the animal awaits the next trial.
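The shape of this value trace can be sketched in a few lines of NumPy. The code below is a toy illustration, not the model from the pre-print: the trial length, time step, and discount timescale are illustrative choices, and value is computed directly as an exponentially discounted sum of future reward.

```python
import numpy as np

# Illustrative trial setup (parameter values are assumptions, not fitted).
dt = 0.01                     # time step (s)
t = np.arange(0.0, 15.0, dt)  # one trial, 15 s long
reward = ((t >= 5.0) & (t < 6.0)).astype(float)  # reward delivered 5.0-6.0 s

tau_discount = 2.0  # timescale over which future rewards are discounted (s)

# Value at time t: exponentially discounted sum of all future reward,
# V(t) = sum over s >= t of r(s) * exp(-(s - t) / tau_discount) * dt.
value = np.array([
    np.sum(reward[i:] * np.exp(-(t[i:] - t[i]) / tau_discount)) * dt
    for i in range(len(t))
])
```

Plotting `value` against `t` reproduces the qualitative shape described above: it ramps up toward reward onset at 5.0 s, falls while the reward is consumed, and sits at baseline once no future reward remains.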

The blue line represents the activity of a group of serotonin neurons, normalized so that the baseline aligns with value. The counter-intuitive increase in activity associated with adaptation is due to this normalization. To see a more intuitive effect of adaptation, try un-checking View parameters > Normalize value prediction.
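One simple way to picture adaptation is as the subtraction of a slow, running estimate of value. The sketch below uses a first-order low-pass filter for that estimate; the filter form, variable names, and parameter values are my assumptions for illustration, not the exact formulation or fitted values from the pre-print.

```python
import numpy as np

# Same illustrative value trace as before (parameter values are assumptions).
dt = 0.01
t = np.arange(0.0, 15.0, dt)
reward = ((t >= 5.0) & (t < 6.0)).astype(float)
tau_discount = 2.0
value = np.array([
    np.sum(reward[i:] * np.exp(-(t[i:] - t[i]) / tau_discount)) * dt
    for i in range(len(t))
])

adaptation_strength = 0.8  # fraction of the slow estimate subtracted
tau_adaptation = 3.0       # timescale of adaptation (s)

# Activity = value minus a low-pass-filtered copy of value.
adapt_state = 0.0
activity = np.empty_like(value)
for i, v in enumerate(value):
    activity[i] = v - adaptation_strength * adapt_state
    adapt_state += dt * (v - adapt_state) / tau_adaptation
```

With these settings, `activity` tracks `value` at first, sags below it as the slow estimate catches up, and undershoots below baseline after the reward ends, which is the un-normalized behaviour you can see by un-checking the normalization option in the widget.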

Read the full article on bioRxiv. View the widget without the blog post here.

Copyright authorization: Graphs shown in the preset buttons of the widget above are modified from Zhong et al., J Neurosci, 2017, Cohen et al., eLife, 2015, and Matias et al., eLife, 2017 under the Creative Commons Attribution License (CC-BY 4.0).