Why Reinforcement Learning Will Never be Anything More than a Small Footnote in the History of Computer Science

Industry, government, and academic research organizations are spending billions of dollars looking at reinforcement learning (RL). They are throwing their money away. Ten years from now RL will be just a small blip on the computer science roadmap.

RL works only in limited scenarios that aren’t useful (basically computer games like Pong and StarCraft). And even when RL does work, alternative techniques almost always work better. Put more bluntly, RL isn’t useful and it doesn’t work.

Researchers who are skeptical of RL never criticize RL openly. If your company or school or organization is flushing away huge amounts of money on RL, it’s not in your best interest to point out to the managers who are funding RL that they’re . . . well, not very smart. And nobody in research wants to be labeled as not having a grand vision for the future. The First Commandment of research is, “Thou shalt not ever criticize research.”


Left: RL can learn to play Atari Breakout — interesting but not useful. Center: RL research uses Greek letters so it must be profound and useful — not. Right: RL can train an animated puppet how to walk — impressive but ultimately not useful.


People are in love with RL for two main reasons. First, the idea that you can train a robot or system to do something human-like without a complex algorithm sounds like the way humans learn, and so if RL is transferable it could lead to general AI. Second, all the toy RL demos are very visual — RL playing a video game or RL teaching a puppet how to walk. Humans, even children and middle managers, understand visual.

RL has created a huge self-perpetuating ecosystem where researchers crank out papers that look at increasingly irrelevant details. Publishing papers is how a researcher is evaluated. Because RL is new, it’s very easy to spin up a research paper and get it published — the least publishable unit effect in all its glory. A continuous spew of meaningless RL papers leads to happy researchers, happy managers and school administrators, happy executives, happy conference organizers, happy publishers . . . but nothing productive.

The current state of RL is very much like the state of Game Theory in the 1960s and 70s. Looks great in principle, but doesn’t solve any useful real-world problems. Computers originated from specific problems like computing artillery trajectories. Rockets that now place communications satellites in orbit were created as weapons. A practical problem leads to a solution. Game Theory and RL were created as hypothetical solutions, but solutions don’t always have an associated practical problem.

There are many reasons why RL doesn’t work for real problems. Briefly, it’s impossible to construct a good reward function, and it’s impossible to create a rich enough simulation model to train an RL system that works reliably.
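To make the reward-function point concrete, here is a minimal sketch of tabular Q-learning on a hypothetical five-state corridor (the environment, reward scheme, and all parameter values are illustrative, not from any real system). Notice that everything the agent learns flows through the single reward signal `r` — the designer’s choice of reward function entirely determines the behavior that emerges.

```python
import random

# Hypothetical 1-D corridor: states 0..4, goal at state 4.
# Actions: 0 = move left, 1 = move right.
N_STATES = 5
GOAL = 4
ACTIONS = [0, 1]

def step(state, action):
    """Deterministic toy dynamics. The reward function here is a naive
    designer choice (+1 only on reaching the goal, 0 otherwise)."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[s][x])
            nxt, r = step(s, a)
            # Q-learning update: the agent's entire notion of "good"
            # comes from the scalar reward r.
            Q[s][a] += alpha * (r + gamma * max(Q[nxt]) - Q[s][a])
            s = nxt
    return Q

Q = train()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy[:4])  # the learned policy should move right toward the goal
```

On a toy problem like this, writing the reward function is trivial. For a real problem — say, a self-driving car — compressing “drive safely” into a single scalar is exactly the part that nobody knows how to do well.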

OK, now the truth. All the things I’ve written above are correct (even if a bit exaggerated) but RL may yet prove to be useful in unexpected ways. For example, number theory by itself wasn’t useful, but decades later parts of the theory became the basis of cryptography. And if quantum computing becomes a reality, orders of magnitude greater computing power might enable RL for a real problem.

For now RL is quite a bit over-hyped. But don’t expect anyone to tell the Emperor he has no clothes.


Left: 3D television. A solution in search of a problem. Center: Google Glasses. A solution in search of a problem, but it’s not ceiling inspection. Right: The Mokase coffee-making phone case. A problem in search of an even worse problem.

This entry was posted in Machine Learning.

5 Responses to Why Reinforcement Learning Will Never be Anything More than a Small Footnote in the History of Computer Science

  1. Thorsten Kleppe says:

On the other hand, you said that the breakthrough in your Zoltar prediction system was reached with RL. I am still wondering what you did to predict two out of three.

Alexander Amini from MIT recently posted about how cars can drive with RL.

Maybe you underestimate what you and others have already achieved in that field?

  2. Yes, one part of the Zoltar football prediction system uses RL but Zoltar’s RL isn’t exactly “real” RL. Self-driving cars are a good example of why RL, by itself, doesn’t work. You have to train RL using a simulation model, but then the behavior of the car is only as good as the model. And unexpected behaviors can be learned that wouldn’t appear until an unusual set of conditions occurs, and that could have deadly consequences. For example, if a bouncing green-and-yellow striped ball appeared, it’s possible that an RL-trained system would unexpectedly react by accelerating rather than by braking.

    RL will probably eventually gain some nice practical successes in industrial robots, or as a small part of a larger system.

  3. sau001 says:

    Interesting observations.

  4. Michael Prokofyev says:

So you start with “RL isn’t useful and it doesn’t work” and end with “RL may yet prove to be useful”. You don’t have to over-hype your message to tell us how over-hyped RL is.

Comments are closed.