Rolling variance estimates
How fast should animals learn?
Simple models of reward learning, such as the Rescorla–Wagner model, typically assume that animals use an exponentially-weighted moving average (EMA) of past rewards to learn the appetitive values of environmental cues. The behaviour of EMA-based learning models is governed by a single parameter called the learning rate or learning timescale.
The learning rate is usually interpreted as a parameter that sets the speed of learning (learning rate), but it can also be interpreted as a way of setting the timeframe over which rewards are averaged to learn value (learning timescale). A high learning rate means that only very recent rewards are taken into consideration, while a low learning rate means that rewards are averaged over a long period.
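In update form, this kind of EMA value learning is just an error-correction (delta) rule. Here's a minimal Python sketch (the function name is illustrative, not from any particular model implementation):

```python
def update_value(value, reward, alpha):
    """Rescorla-Wagner-style update: nudge the value estimate toward the latest reward.

    alpha is the learning rate; 1 / alpha is the corresponding learning timescale.
    """
    return value + alpha * (reward - value)
```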
The choice of an appropriate learning rate is governed by a tradeoff between flexibility and precision: high learning rates allow animals to quickly re-learn associations when their environment changes, at the cost of noisy value estimates when the environment is stable (since only a few rewards are being averaged). The opposite is true of low learning rates.
Behavioural experiments suggest that animals adapt their learning rates to compensate for the instability and noisiness of their environments. This has inspired the development of reward learning models that attempt to learn not just value, but also the aspects of noise and stability relevant to setting the learning rate.
In this blog post, I’ll explore a couple of simple ways of modelling how animals might estimate reward variability based on past experience, focusing on rolling estimates of variance.
Boxcar variance
A particularly simple way to gauge reward variability is to keep a memory of the past few rewards received and calculate the sample variance

$$\hat{\sigma}^2_t = \frac{1}{N - 1} \sum_{i=0}^{N-1} \left( r_{t-i} - \bar{r}_t \right)^2,$$

where the subscript $t$ denotes time, $N$ is the number of rewards used for the calculation, and $\bar{r}_t$ is the sample mean reward. If we update the sample variance at each time step using the most recent $N$ rewards, then this procedure amounts to using a boxcar filter to estimate the variance.
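As a concrete (if inefficient) sketch, the boxcar estimate could be computed like this in Python; the NumPy-based implementation below is my own illustration:

```python
import numpy as np

def boxcar_variance(rewards, N):
    """Rolling sample variance over the most recent N rewards (a boxcar filter)."""
    rewards = np.asarray(rewards, dtype=float)
    estimates = np.full(rewards.shape, np.nan)  # undefined until N rewards have arrived
    for t in range(N - 1, len(rewards)):
        window = rewards[t - N + 1 : t + 1]     # the N most recent rewards
        estimates[t] = window.var(ddof=1)       # sample variance (divides by N - 1)
    return estimates
```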
Use the slider below to set the size of the boxcar window $N$, then click "Play animation" to see how the variance calculated from the rewards at top (circles) evolves over time. Notice how the estimate shown in blue bounces around the true variance shown in black.
Exponentially-weighted moving variance
An alternative way of measuring reward variability is to use an exponentially-weighted moving variance. This approach might be even simpler than the boxcar method above, since it can be performed using an error-correction rule that doesn't require keeping a memory of past rewards:

$$\hat{\sigma}^2_t = \hat{\sigma}^2_{t-1} + \alpha \left[ \left( r_t - \bar{r}_t \right)^2 - \hat{\sigma}^2_{t-1} \right],$$

where $\alpha$ is a learning rate ($0 < \alpha \leq 1$).
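In code, the error-correction rule might look like the sketch below; note that I also track the mean reward with its own EMA here, which is one reasonable choice among several:

```python
def ewm_variance(rewards, alpha, init_var=1.0):
    """Exponentially-weighted moving variance via an error-correction rule."""
    mean = 0.0          # running EMA estimate of the mean reward
    var = init_var      # the initial variance estimate to build on
    estimates = []
    for r in rewards:
        mean += alpha * (r - mean)               # update the mean estimate
        var += alpha * ((r - mean) ** 2 - var)   # delta-rule update of the variance
        estimates.append(var)
    return estimates
```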
Compared with the boxcar method, this approach has two important differences. First, there needs to be some initial estimate of variance to build on. Second, the effective window width is parameterized in terms of the learning rate $\alpha$, or the corresponding learning timescale $\tau = 1/\alpha$, instead of the explicit number of samples $N$. To see how these parameters affect the exponentially-weighted moving variance, try adjusting the sliders below. Notice how the weight given to each reward used to calculate the variance, represented by the intensity of the blue colour in the circles at top, gradually drops off instead of ending abruptly as with the boxcar method.
Comparison of boxcar and exponential weighting
Both of the above approaches yield variance estimates that dance around the true value of one, reflecting noise in the estimation of, well, noise. An ideal technique for estimating variance would cling tightly to the true value represented by the horizontal black lines in the widgets above.
Interestingly, the boxcar and exponential weighting approaches don't come equally close to this ideal at comparable window widths. The widget below shows the distribution of estimated variances using the boxcar technique with a window width $N$ or the exponential weighting approach using a matching learning timescale $\tau$. (Monte Carlo simulation, 5000 draws; the exponential weighting simulations included a long burn-in to wash out initialization effects.) Notice that the distribution of estimates using the exponential weighting approach more closely hugs the true value represented by the vertical black line.
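A rough, non-interactive version of that simulation, assuming unit-variance Gaussian rewards, a known zero mean, and $N = \tau = 10$, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
N = tau = 10
alpha = 1.0 / tau

boxcar_final, ewm_final = [], []
for _ in range(5000):                          # Monte Carlo draws
    r = rng.normal(0.0, 1.0, size=20 * tau)    # unit-variance rewards, long burn-in
    boxcar_final.append(r[-N:].var(ddof=1))    # boxcar estimate at the final step
    var = 1.0
    for x in r:
        var += alpha * (x ** 2 - var)          # mean is known to be zero here
    ewm_final.append(var)

print("spread of boxcar estimates:", np.std(boxcar_final))  # approx. 0.47
print("spread of EWM estimates:   ", np.std(ewm_final))     # approx. 0.32
```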
To understand why this happens, we can take a look at the variance of the estimated variance for each technique. For the boxcar method, the variance is

$$\mathrm{Var}\left[ \hat{\sigma}^2_{\text{boxcar}} \right] = \frac{1}{N} \, \mathrm{Var}\left[ (r - \bar{r})^2 \right],$$

where $\mathrm{Var}\left[ (r - \bar{r})^2 \right]$ represents the variance of the squared error for a single reward $r$. For the exponential weighting method, the variance is

$$\mathrm{Var}\left[ \hat{\sigma}^2_{\text{exp}} \right] = \frac{\alpha}{2 - \alpha} \, \mathrm{Var}\left[ (r - \bar{r})^2 \right].$$
Ignoring differences in how the mean $\bar{r}$ is estimated for the sake of simplicity, we can get a rough idea of how these two techniques compare by looking at the ratio term in each equation, a sort of variance scaling factor. The chart below shows the variance scaling factor for the boxcar method ($1/N$) in black and for the exponential weighting method ($\alpha / (2 - \alpha) = 1 / (2\tau - 1)$) in blue.
The fact that the blue line is lower in the chart above means that if $N$ and $\tau$ are set to the same value, the exponential weighting method will give a less noisy (i.e., lower variance) estimate of the variance.
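This inequality is easy to verify numerically; a quick sketch:

```python
import numpy as np

N = np.arange(2, 101)                     # boxcar window widths
tau = N.astype(float)                     # exponential timescales set to the same value
boxcar_scale = 1.0 / N                    # variance scaling factor for the boxcar
ewm_scale = 1.0 / (2.0 * tau - 1.0)       # alpha / (2 - alpha) with alpha = 1 / tau
assert np.all(ewm_scale < boxcar_scale)   # exponential weighting is always less noisy
```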
Another way of looking at this difference between the boxcar and exponential weighting methods is that any desired level of precision can be achieved using a lower value of the exponential weighting timescale $\tau$ than the boxcar window width $N$. How much lower? To find out, we can set the variance scaling factors of both methods equal to each other and solve for $\tau$ as follows:

$$\frac{1}{N} = \frac{1}{2\tau - 1} \quad \Rightarrow \quad \tau = \frac{N + 1}{2},$$

where we can approximate $\tau = (N + 1)/2$ as $\tau \approx N/2$ when $N \gtrsim 10$ or so. The graph of $\tau$ vs. $N$ from the above equation is shown below. The blue line represents the equation above and the gray line shows $\tau = N$ as a point of comparison.
If we're happy with the accuracy of an $N$-sample boxcar filter, we should be able to accomplish roughly the same thing using an exponentially-weighted moving variance with $\tau \approx N/2$. Not bad!
Adapting to changing noise
If animals calibrate their learning rates according to the level of noise in the environment, then a good noise estimate should not only be accurate in the long run, but should also be able to quickly adapt to changes in the level of noise in the environment. Since the analysis above suggests that we can get away with a smaller effective window width using the exponentially-weighted moving variance compared with the boxcar filter, the former may be able to adapt more quickly to changing noise levels while offering similar performance when the noise level is stable.
To see this effect, compare the evolution of the exponentially-weighted moving variance estimate to the equivalent boxcar filter estimate in the widget below. Notice how the exponential estimate initially adapts more quickly when the true level of noise changes.
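For a non-interactive sketch of the same effect, one could simulate Gaussian rewards whose standard deviation steps from 1 to 2 halfway through, with the timescale matched to the boxcar window via $\tau \approx N/2$ (all parameter values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20
tau = (N + 1) / 2                # matched precision, from the analysis above
alpha = 1.0 / tau

rewards = np.concatenate([rng.normal(0.0, 1.0, 500),    # true variance 1
                          rng.normal(0.0, 2.0, 500)])   # true variance jumps to 4

var, ewm = 1.0, []
for r in rewards:
    var += alpha * (r ** 2 - var)    # mean is known to be zero here
    ewm.append(var)

boxcar = [rewards[t - N + 1 : t + 1].var(ddof=1) for t in range(N - 1, len(rewards))]
# Compare how quickly each estimate climbs toward 4 after the change point at t = 500.
```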
Sum up
Estimating reward variability is difficult because measurements of noise are noisy themselves. In this post, I’ve explored and compared two simple approaches for online estimation of reward variance. Compared with the classic sample variance (boxcar filter), an exponentially-weighted moving variance estimate may be able to adapt more quickly to changing noise levels in the environment, providing a better basis for animals to adjust their learning rates.