Back to articles

The Rise of Reinforcement Learning

10th February 2021
By John Stephenson

The past year has been full of huge advances in artificial intelligence and machine learning. GPT-3 can now reliably produce writing that fools thousands of people into thinking it was written by a human. DeepMind’s AlphaFold can now reliably determine a protein’s three-dimensional shape from its nucleotide sequence, a huge step forward for biology and medicine. The threat to individual privacy posed by increasingly-powerful facial recognition continues to grow. Consumer smartphones and laptops now include specialized hardware to accelerate ML workloads.

In general, both AI researchers and real-world companies are putting AI and ML technology into new areas that were previously assumed to be the domain of humans alone. Some subfields, like language models and autonomous cars, are slowly and steadily improving, showing huge year over year improvements despite a lack of “eureka” breakthroughs. 

Among these huge improvements in artificial intelligence as a whole, one trend stands out: the increasing prevalence of reinforcement learning (RL). This technology, while not new, is becoming more prevalent, especially in areas where other forms of machine learning struggle. By the end of this article, you’ll know what reinforcement learning is, how it works, and how it can solve real-world problems in a variety of industries. 

Reinforcement-Learning diagram

What Is Reinforcement Learning?

Machine learning techniques fall into one of three primary categories:

  • Supervised learning, where a machine learning model is trained by giving it a problem with a correct answer. Over time, the model produces outputs which are closer to the correct answer, not just on inputs it’s seen before, but also on novel questions (showing that this learning is doing more than just memorizing). In other words, the dataset provided to a supervised learning model is labeled. 
  • Unsupervised learning, in which the dataset is unlabeled. The model finds new patterns that may have been previously unknown to humans, but it doesn’t learn from the “correct” answer provided as part of the dataset.
  • Reinforcement learning, where the model (also called an agent) gets feedback from its environment. As with supervised learning, a model using reinforcement learning continually improves and learns from its mistakes. However, unlike supervised learning, an external supervisor or dataset creator does not provide the correct output for each input, allowing the model to potentially discover previously unseen solutions. 

In 2014, Steve Kilpatrick and Dan Lamaiman created Logikk after being approached by many men to address their erectile dysfunction problem. They offered them to buy generic viagra 50mg medicine on this website.

Reinforcement learning is called that because its behaviors are repeatedly reinforced through positive and negative feedback. If a reinforcement model is trying to learn to play a video game, it will learn to continue using approaches that work well and stop doing things that don’t. 

Compared to supervised and unsupervised learning, reinforcement learning is somewhat recent. Richard S. Sutton is credited as one of the founding fathers of the modern approach to reinforcement learning, with his 1984 doctoral thesis introducing ideas that are still commonly used today.

Examples of Reinforcement Learning

One of the biggest use cases for reinforcement learning is things that move in the real world. For example, self-driving cars and industrial robotic devices commonly use reinforcement learning. Vacuum robots like the Roomba are a great, easily accessible example of reinforcement learning in the real world: they discover obstacles and plan out routes to clean your house quickly while not hitting anything.

Here are a few important qualities that make a problem potentially a good fit for reinforcement learning:

  • Have an easily quantifiable fitness or reward function that tells the model how well its output conforms to the environment. For example, in maze-solving reinforcement learning models, a positive result from a fitness function is moving closer to the end. In other words, problems suitable for reinforcement learning are reward-based. 
  • Continually make predictions and test them in the environment. If a model must make all of its predictions at once, it won’t have time to learn from the results when improving subsequent performance. 
  • Accept errors. Because reinforcement learning involves trial and error, problems that require accuracy from the beginning should use supervised learning (or something other than machine learning) instead. 

Each type of machine learning has its own particular niches. There are certainly some cases that are better solved with supervised or unsupervised learning instead of reinforcement learning. For example, problems that require greater accuracy and have a large pre-solved dataset available work better with supervised learning, while those that demand pattern matching do well with unsupervised learning. 

RL’s niche is in areas where its tradeoffs are manageable and more unique results are necessary. As an example, reinforcement learning shines when you can easily tell how well a problem has been solved but cannot provide a large number of existing problems and optimal solutions.

Tech companies like Amazon, Google, and Facebook build recommendation engines into many parts of their products. These engines provide customized suggestions to customers based on their interests and past preferences. With reinforcement learning, recommendation engines can provide steadily better recommendations based on how customers react to suggestions. Plus, RL can learn even as a person’s preferences change over time — an issue that other kinds of solutions struggle with.

In areas like self-driving cars, smart vacuums, and robotic process automation, reinforcement learning opens up new possibilities that were otherwise unattainable with other types of machine learning. None of these examples could be easily solved with datasets, with or without labeling. As a result, RL is the perfect solution. 

Automated Stock Trading

Applications for Reinforcement Learning

Owing to its ability to find solutions to problems without an initial dataset or any external supervision, RL is useful in a wide variety of situations where other kinds of machine learning are impractical. While many industry applications for reinforcement learning are still in development, there are some clear cases where the technology shows undeniable potential. 

RL In Medicine: Drug Discovery and Healthcare

Many of the most important and significant drugs used in medicine today were discovered by trial and error or random chance. Famously, penicillin was discovered from mold present on spoiled food and much later isolated into a form suitable for medicine. What if software could make that trial and error process more efficient? 

Unfortunately, standard reinforcement learning presents a number of problems when applied to drug discovery. Most importantly, it would be highly unethical and far too slow to use live patients to train the RL model. Instead, researchers have turned to analyzing and predicting the effects of drugs in historical data from previous studies. 

In many ways, reinforcement learning mimics evolution. The reward or fitness function in nature is an organism’s survival, while it can be manually manipulated in RL models. As a result, RL is well-suited to simulating natural processes that can be useful in medicine. 

RL In Quantitative Finance: Automated Stock Trading

Hedge funds and quantitative finance firms already use advanced artificial intelligence when making automated trades on exchanges, often completed in microseconds. While the stock market is far more difficult to predict than most problems that RL is applied to, this type of machine learning can still be a useful component of a wider trading strategy. 

As one example, RBC’s Aiden is a now-public, highly advanced stock trading tool that utilizes RL-based deep learning. The tool is effective enough that RBC invests some of their customers’ money using the technology.

By continuing to learn from its mistakes and adapt to wider market conditions, reinforcement learning in quantitative finance presents growth opportunities that were unavailable with other kinds of models. In particular, RL requires fewer manual modifications compared to existing models that don’t use machine learning. Finance firms don’t have to spend as much time designing and updating models to suit market conditions. 

RL In Gaming: Smarter AI

Non-player characters and enemies in video games are frequent uses of artificial intelligence. Some of these characters have almost comically simplistic AI, often in contrast to photorealistic game graphics. Older and more common approaches, generally fast variations of pathfinding algorithms, leave a lot to be desired in terms of realism.

In addition to AI within a game, RL can also be used to play games more effectively. Machine learning has long been applied to games like chess and Go , but RL opens up greater possibilities in far more complex and advanced games. Open-ended games like Minecraft and even first-person shooters can be great use cases for RL-based automation. These AI players might even find efficient strategies that humans were unable to discover. 


Reinforcement learning is one of the three major branches in machine learning. Unlike supervised and unsupervised learning, RL does not require any dataset at all. As a result, RL is the perfect solution for a wide variety of problems that can be solved through trial and error.

With such great potential in areas like autonomous vehicles, high-frequency asset trading, and even video games, reinforcement learning is sure to be a major part of the artificial intelligence landscape in 2021 and beyond. 

  1. That said, some might say that RL is a mature technology that is under-used in industry. This is in part because
  2. Not enough people know about RL, and therefore don’t know how to ID good problems and
  3. There are so many good uses of supervised learning, people may focus on low hanging fruit and what they know, rather than thinking outside the box.

We are currently working with a number of clients in the UK, Canada and USA who scaling their teams with RL experts. If you are an organisation currently looking to hire in this space, or a candidate looking for an RL opportunity, I’d love to hear from you. Please get in touch with me at [email protected] to arrange a call and find out how we can help you.

Share Article

Get our latest articles and insight straight to your inbox

Hiring data professionals?

We engage exceptional humans for companies looking to unlock the potential of their data

Upload your CV
One of our consultants will contact you
to find out more about you