Emerson Harkin

Monte-Carlo? More like Monte-Car-slow! Navigation using RL

April 1, 2025

How do animals learn to navigate from point A to point B?

Yesterday at COSYNE I listened to an interesting talk by Ann Hermundstad on a keypoint-based strategy for quickly learning to navigate to a goal in a new environment. One of her slides showed an animation of a temporal difference learning agent slowly learning to cross an open area over the course of dozens or hundreds of trials — a stark contrast to real animals (and the Hermundstad lab’s new model), which can learn the same task in only a few trials.

The widget below is my version of that slide, minus the new model and plus an even slower Monte-Carlo agent. The heatmap shows the value function for a greedy policy. The white lines show the actual paths taken by the agent as it navigates from the starting point in the bottom left to the goal near the top right.

Monte-Carlo

TD(0)
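For readers curious why the Monte-Carlo agent is so much slower, here is a minimal tabular sketch of the two learning rules on a deterministic gridworld. This is not the widget's actual code: the grid size, reward scheme (1 at the goal, 0 elsewhere), and hyperparameters are my own assumptions. The key difference is visible in the update step: TD(0) bootstraps after every move, so value leaks backward from the goal one step per visit, while Monte-Carlo must wait for a complete episode before backing up the full return.

```python
import random

def run_episode(V, size, goal, eps, gamma, alpha, method, rng, max_steps=200):
    """One episode of epsilon-greedy navigation on a deterministic grid.

    method='td' applies a one-step TD(0) backup after every move;
    method='mc' applies a constant-alpha Monte-Carlo backup at episode end.
    """
    s = (0, 0)  # start in the corner opposite the goal
    trajectory = []  # (state, reward) pairs, kept for the Monte-Carlo update
    for _ in range(max_steps):
        # Enumerate in-bounds neighboring states (up, down, left, right).
        moves = [(s[0] + dr, s[1] + dc)
                 for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                 if 0 <= s[0] + dr < size and 0 <= s[1] + dc < size]
        if rng.random() < eps:
            nxt = rng.choice(moves)  # explore
        else:
            best = max(V[m] for m in moves)  # greedy on next-state value
            nxt = rng.choice([m for m in moves if V[m] == best])
        r = 1.0 if nxt == goal else 0.0
        if method == 'td':
            # TD(0): bootstrap toward r + gamma * V(s') immediately.
            V[s] += alpha * (r + gamma * V[nxt] - V[s])
        trajectory.append((s, r))
        s = nxt
        if s == goal:
            break
    if method == 'mc':
        # Monte-Carlo: only now, with the episode over, compute returns
        # backward and nudge each visited state toward its full return.
        G = 0.0
        for state, r in reversed(trajectory):
            G = r + gamma * G
            V[state] += alpha * (G - V[state])
    return V

def train(method, size=5, episodes=200, seed=0):
    rng = random.Random(seed)
    goal = (size - 1, size - 1)
    V = {(r, c): 0.0 for r in range(size) for c in range(size)}
    for _ in range(episodes):
        run_episode(V, size, goal, eps=0.1, gamma=0.9, alpha=0.1,
                    method=method, rng=rng)
    return V

V_td = train('td')
V_mc = train('mc')
```

Plotting either value table as a heatmap (high values near the goal, fading toward the start) reproduces the qualitative picture in the widget above; the Monte-Carlo table fills in more slowly because episodes that never stumble onto the goal contribute returns of zero everywhere.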