Rolling variance estimates
How fast should animals learn?
Simple models of reward learning, such as the Rescorla–Wagner model, typically assume that animals use an exponentially-weighted moving average (EMA) of past rewards to learn the appetitive values of environmental cues. The behaviour of EMA-based learning models is governed by a single parameter called the learning rate or learning timescale.
The learning rate is usually interpreted as a parameter that sets the speed of learning (learning rate), but it can also be interpreted as a way of setting the timeframe over which rewards are averaged to learn value (learning timescale). A high learning rate means that only very recent rewards are taken into consideration, while a low learning rate means that rewards are averaged over a long period.
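In update form, this kind of EMA value learning is just an error-correction (delta) rule. Here's a minimal Python sketch (the function name is illustrative, not from any particular model implementation):

```python
def update_value(value, reward, alpha):
    """Rescorla-Wagner-style update: nudge the value estimate toward the latest reward.

    alpha is the learning rate; 1 / alpha is the corresponding learning timescale.
    """
    return value + alpha * (reward - value)
```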
The choice of an appropriate learning rate is governed by a tradeoff between flexibility and precision: high learning rates allow animals to quickly re-learn associations when their environment changes, at the cost of noisy value estimates when the environment is stable (since only a few rewards are being averaged). The opposite is true of low learning rates.
Behavioural experiments suggest that animals adapt their learning rates to compensate for the instability and noisiness of their environments. This has inspired the development of reward learning models that attempt to learn not just value, but also the aspects of noise and stability relevant to setting the learning rate.
In this blog post, I’ll explore a couple of simple ways of modelling how animals might estimate reward variability based on past experience, focusing on rolling estimates of variance.
Boxcar variance
A particularly simple way to gauge reward variability is to keep a memory of the past few rewards received and calculate the sample variance

$$\hat{\sigma}^2_t = \frac{1}{N - 1} \sum_{i=0}^{N-1} \left( r_{t-i} - \bar{r}_t \right)^2,$$

where the subscript $t$ denotes time, $N$ is the number of rewards used for the calculation, and $\bar{r}_t$ is the sample mean reward. If we update the sample variance at each time step using the most recent $N$ rewards, then this procedure amounts to using a boxcar filter to estimate the variance.
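As a concrete (if inefficient) sketch, the boxcar estimate could be computed like this in Python; the NumPy-based implementation below is my own illustration:

```python
import numpy as np

def boxcar_variance(rewards, N):
    """Rolling sample variance over the most recent N rewards (a boxcar filter)."""
    rewards = np.asarray(rewards, dtype=float)
    estimates = np.full(rewards.shape, np.nan)  # undefined until N rewards have arrived
    for t in range(N - 1, len(rewards)):
        window = rewards[t - N + 1 : t + 1]     # the N most recent rewards
        estimates[t] = window.var(ddof=1)       # sample variance (divides by N - 1)
    return estimates
```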
Use the slider below to set the size of the boxcar window $N$, then click "Play animation" to see how the variance calculated from the rewards at top (circles) evolves over time. Notice how the estimate shown in blue bounces around the true variance shown in black.
Exponentially-weighted moving variance
An alternative way of measuring reward variability is to use an exponentially-weighted moving variance. This approach might be even simpler than the boxcar method above, since it can be performed using an error-correction rule that doesn't require keeping a memory of past rewards:

$$\hat{\sigma}^2_t = \hat{\sigma}^2_{t-1} + \alpha \left[ \left( r_t - \bar{r}_t \right)^2 - \hat{\sigma}^2_{t-1} \right],$$

where $\alpha$ is a learning rate ($0 < \alpha \leq 1$).
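In code, the error-correction rule might look like the sketch below; note that I also track the mean reward with its own EMA here, which is one reasonable choice among several:

```python
def ewm_variance(rewards, alpha, init_var=1.0):
    """Exponentially-weighted moving variance via an error-correction rule."""
    mean = 0.0          # running EMA estimate of the mean reward
    var = init_var      # the initial variance estimate to build on
    estimates = []
    for r in rewards:
        mean += alpha * (r - mean)               # update the mean estimate
        var += alpha * ((r - mean) ** 2 - var)   # delta-rule update of the variance
        estimates.append(var)
    return estimates
```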
Compared with the boxcar method, this approach has two important differences. First, there needs to be some initial estimate of variance to build on. Second, the effective window width is parameterized in terms of the learning rate $\alpha$, or the corresponding learning timescale $\tau = 1/\alpha$, instead of the explicit number of samples $N$. To see how these parameters affect the exponentially-weighted moving variance, try adjusting the sliders below. Notice how the weight given to each reward used to calculate the variance, represented by the intensity of the blue colour in the circles at top, gradually drops off instead of ending abruptly as with the boxcar method.
Comparison of boxcar and exponential weighting
Both of the above approaches yield variance estimates that dance around the true value of one, reflecting noise in the estimation of, well, noise. An ideal technique for estimating variance would cling tightly to the true value represented by the horizontal black lines in the widgets above.
Interestingly, the boxcar and exponential weighting approaches don't come equally close to this ideal at comparable window widths. The widget below shows the distribution of estimated variances using the boxcar technique with a window width $N$ or the exponential weighting approach using a matching learning timescale $\tau$. (Monte Carlo simulation, 5000 draws; the exponential weighting simulations included a long burn-in to wash out initialization effects.) Notice that the distribution of estimates using the exponential weighting approach more closely hugs the true value represented by the vertical black line.
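A rough, non-interactive version of that simulation, assuming unit-variance Gaussian rewards, a known zero mean, and $N = \tau = 10$, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
N = tau = 10
alpha = 1.0 / tau

boxcar_final, ewm_final = [], []
for _ in range(5000):                          # Monte Carlo draws
    r = rng.normal(0.0, 1.0, size=20 * tau)    # unit-variance rewards, long burn-in
    boxcar_final.append(r[-N:].var(ddof=1))    # boxcar estimate at the final step
    var = 1.0
    for x in r:
        var += alpha * (x ** 2 - var)          # mean is known to be zero here
    ewm_final.append(var)

print("spread of boxcar estimates:", np.std(boxcar_final))  # approx. 0.47
print("spread of EWM estimates:   ", np.std(ewm_final))     # approx. 0.32
```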
To understand why this happens, we can take a look at the variance of the estimated variance for each technique. For the boxcar method, the variance is

$$\mathrm{Var}\left[ \hat{\sigma}^2_{\text{boxcar}} \right] = \frac{1}{N} \, \mathrm{Var}\left[ (r - \bar{r})^2 \right],$$

where $\mathrm{Var}\left[ (r - \bar{r})^2 \right]$ represents the variance of the squared error for a single reward $r$. For the exponential weighting method, the variance is

$$\mathrm{Var}\left[ \hat{\sigma}^2_{\text{exp}} \right] = \frac{\alpha}{2 - \alpha} \, \mathrm{Var}\left[ (r - \bar{r})^2 \right].$$
Ignoring differences in how the mean $\bar{r}$ is estimated for the sake of simplicity, we can get a rough idea of how these two techniques compare by looking at the ratio term in each equation, a sort of variance scaling factor. The chart below shows the variance scaling factor for the boxcar method ($1/N$) in black and for the exponential weighting method ($\alpha / (2 - \alpha) = 1 / (2\tau - 1)$) in blue.
The fact that the blue line is lower in the chart above means that if $N$ and $\tau$ are set to the same value, the exponential weighting method will give a less noisy (i.e., lower variance) estimate of the variance.
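This inequality is easy to verify numerically; a quick sketch:

```python
import numpy as np

N = np.arange(2, 101)                     # boxcar window widths
tau = N.astype(float)                     # exponential timescales set to the same value
boxcar_scale = 1.0 / N                    # variance scaling factor for the boxcar
ewm_scale = 1.0 / (2.0 * tau - 1.0)       # alpha / (2 - alpha) with alpha = 1 / tau
assert np.all(ewm_scale < boxcar_scale)   # exponential weighting is always less noisy
```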
Another way of looking at this difference between the boxcar and exponential weighting methods is that any desired level of precision can be achieved using a lower value of the exponential weighting timescale $\tau$ than the boxcar window width $N$. How much lower? To find out, we can set the variance scaling factors of both methods equal to each other and solve for $\tau$ as follows:

$$\frac{1}{N} = \frac{1}{2\tau - 1} \quad \Rightarrow \quad \tau = \frac{N + 1}{2},$$

where we can approximate $\tau = (N + 1)/2$ as $\tau \approx N/2$ when $N \gtrsim 10$ or so. The graph of $\tau$ vs. $N$ from the above equation is shown below. The blue line represents the equation above and the gray line shows $\tau = N$ as a point of comparison.
If we're happy with the accuracy of an $N$-sample boxcar filter, we should be able to accomplish roughly the same thing using an exponentially-weighted moving variance with $\tau \approx N/2$. Not bad!
Adapting to changing noise
If animals calibrate their learning rates according to the level of noise in the environment, then a good noise estimate should not only be accurate in the long run, but should also be able to quickly adapt to changes in the level of noise in the environment. Since the analysis above suggests that we can get away with a smaller effective window width using the exponentially-weighted moving variance compared with the boxcar filter, the former may be able to adapt more quickly to changing noise levels while offering similar performance when the noise level is stable.
To see this effect, compare the evolution of the exponentially-weighted moving variance estimate to the equivalent boxcar filter estimate in the widget below. Notice how the exponential estimate initially adapts more quickly when the true level of noise changes.
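For a non-interactive sketch of the same effect, one could simulate Gaussian rewards whose standard deviation steps from 1 to 2 halfway through, with the timescale matched to the boxcar window via $\tau \approx N/2$ (all parameter values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20
tau = (N + 1) / 2                # matched precision, from the analysis above
alpha = 1.0 / tau

rewards = np.concatenate([rng.normal(0.0, 1.0, 500),    # true variance 1
                          rng.normal(0.0, 2.0, 500)])   # true variance jumps to 4

var, ewm = 1.0, []
for r in rewards:
    var += alpha * (r ** 2 - var)    # mean is known to be zero here
    ewm.append(var)

boxcar = [rewards[t - N + 1 : t + 1].var(ddof=1) for t in range(N - 1, len(rewards))]
# Compare how quickly each estimate climbs toward 4 after the change point at t = 500.
```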
Sum up
Estimating reward variability is difficult because measurements of noise are noisy themselves. In this post, I’ve explored and compared two simple approaches for online estimation of reward variance. Compared with the classic sample variance (boxcar filter), an exponentially-weighted moving variance estimate may be able to adapt more quickly to changing noise levels in the environment, providing a better basis for animals to adjust their learning rates.