Raymond's Weblog

Teaching Machines To Dream: Oneirology and Deep Learning

In 1968, Philip K. Dick asked: Do Androids Dream Of Electric Sheep? In 2020, Erik Hoel answered: Yes, if they want to make generalised predictions.

There’s a neat thing that keeps happening where people try to get computers to do things that humans do, come up with weird hacky solutions, and accidentally reinvent fundamental components of the human experience. In this case, dreams. Which is to say, we’ve found some important tricks for making AI perform well, and the actual mechanical effects they have on a neural network seem to be pretty much the same as the effects dreams have on humans.

Overfitting

Modern deep learning basically amounts to giving an AI a bunch of training data and telling it to spot patterns. One of the risks here is ‘overfitting’: rather than discovering general rules, the AI just memorises a specific pattern that works really well for the data you’ve trained it on.

This is an image I stole from Google. Here, you give the AI some data points, and tell it to find a line that basically passes through all of them. If it can make a good line through those points, then it will be able to make good guesses for where other points would fall: in the case of this diagram, “what would the value be later in time?”

The problem is, if you let the AI train for long enough, it might be able to find a really weird line that goes perfectly through all the training points but actually totally fails on any points the AI didn’t get to train on. If you imagine the dots in the above picture stretching to the right, the ‘robust’ line would probably estimate their values pretty well, but the overfitted one would be way off.
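To make that concrete, here's a rough sketch in Python with NumPy. Everything in it is made up for illustration: the "true" rule (y = 2x + 1), the noise level, and the choice of a straight line versus a degree-9 polynomial standing in for the 'robust' and overfitted curves. It fits both to ten noisy points, then checks them on points further to the right that neither got to train on.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical underlying rule the data follows (an assumption for this demo).
    return 2.0 * x + 1.0

# Ten noisy training points on [0, 1].
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(0.0, 0.1, size=x_train.shape)

# Unseen points further to the right, i.e. "later in time".
x_test = np.linspace(1.1, 1.5, 5)
y_test = true_fn(x_test)

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with these coefficients.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# 'Robust' fit: a straight line. It cannot chase the noise.
robust = np.polyfit(x_train, y_train, deg=1)

# Overfitted fit: degree 9 has enough freedom to hit all ten points.
overfit = np.polyfit(x_train, y_train, deg=9)

print("train error  robust:", mse(robust, x_train, y_train))
print("train error overfit:", mse(overfit, x_train, y_train))
print("test error   robust:", mse(robust, x_test, y_test))
print("test error  overfit:", mse(overfit, x_test, y_test))
```

The overfitted curve should win on the training points and lose badly on the unseen ones, which is the whole problem in miniature.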

How do you prevent overfitting? Well, two neat tricks are:

Weight decay, where you scale down all the numbers in the network a little on every training step, making the model a bit less accurate on the training data but also stopping any of the weights from growing very large. That means fewer sharp curves. If you picture the above curves as polynomials, this would be a bit like forcing them not to have massive coefficients anywhere.
Noise injection, where you corrupt the inputs a little bit during training. In the above graph you could think about this as perturbing each x a bit, which is fine for a smooth-ish curve and terrible for a super jaggedy one.
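Here's a minimal sketch of both tricks, sticking with the polynomial picture rather than a real neural network. The `train` function, its parameters (`decay`, `noise_std`, `lr`, `steps`), and the toy data are all hypothetical names and values I've picked for illustration. It runs plain gradient descent on a degree-5 polynomial, optionally shrinking the weights a little on every step (weight decay) and optionally jittering the x values on every step (noise injection).

```python
import numpy as np

def features(x, degree):
    # Columns are 1, x, x^2, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

def train(x, y, degree=5, decay=0.0, noise_std=0.0, steps=2000, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(degree + 1)
    for _ in range(steps):
        # Noise injection: nudge the inputs a little on every step.
        x_step = x + rng.normal(0.0, noise_std, size=x.shape)
        X = features(x_step, degree)
        # Gradient of half the mean squared error with respect to w.
        grad = X.T @ (X @ w - y) / len(x)
        # Weight decay: scale every weight down slightly, then take the usual step.
        w = (1.0 - lr * decay) * w - lr * grad
    return w

# Toy data: a noisy straight line (an assumption for this demo).
data_rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
y = 2.0 * x + 1.0 + data_rng.normal(0.0, 0.1, size=x.shape)

w_plain = train(x, y)                  # neither trick
w_decay = train(x, y, decay=0.01)      # weight decay only
w_noisy = train(x, y, noise_std=0.02)  # noise injection only

print("weight norm, plain:", np.linalg.norm(w_plain))
print("weight norm, decay:", np.linalg.norm(w_decay))
print("weights with noisy inputs:", w_noisy)
```

With decay switched on, the weights end up with a smaller overall size than without it, which is the "no massive coefficients" effect. The noisy version never sees exactly the same inputs twice, so it can't rely on threading a curve through precise x positions.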

Dreams

Two fun facts about sleep:

Over the day, as you learn, there is an increase in the strength of synaptic connections across your brain. When you sleep, these all go down. “Thus, sleep is the price we have to pay for plasticity, and its goal is the homeostatic regulation of the total synaptic weight impinging on neurons”.
People tend to dream about things they are in the process of learning a lot about. The classic example here is The Tetris Effect.

The Claim

I think it would be premature to say that sleep and dreams are just the human equivalent of weight decay and noise injection, not least because there are other things that also happen when you sleep.

However, it does seem really surprising to me that evolution would converge on a process for brains which seems so similar to one we have empirically found to be useful for deep learning networks. To my mind, this is the most convincing explanation I’ve seen for why we dream, and a point in favour of brains and deep learning networks not being so different.