Deep Reinforcement Learning Part 2: The Game of Stock Trading


The article was written by Hieu Nguyen, a Financial Analyst at I Know First.


  • What to look for in the game of stock trading
  • Several problems with Reinforcement Learning
  • How far can we go with Reinforcement Learning?

In the first article about Deep Reinforcement Learning (RL), we discussed the basic concept of RL and how to find an optimal policy using Q-learning. However, applying RL to real-world problems in general, and to stock trading in particular, is extremely difficult. It requires an understanding of trading theory as well as the effort to build an effective model for forecasting market movements. Today, we will discuss how effective RL is and how it can help solve problems in stock trading.

What to look for in the game of stock trading

Every day, millions of traders around the world try to make money by trading stocks. However, it has never been easy to be a good trader. There are many questions a trader needs to answer to maximize his or her profit. When to buy? When to sell? What is the target price? And how long should it take to reach that target? Moreover, since the market variables keep changing, the target price must be adjusted continuously. Suppose that you derive the target price of a stock from many inputs such as the interest rate, trading volume, and stock price. All of these are real-time variables that change every second. Hence your target price will also change every second.

Even if traders can answer these questions, they still need to decide what to do if the price does not reach the target in time. Should they keep waiting or sell the asset? What should they do if the price moves in the opposite direction? A trader will normally set a stop-loss price and exit the market if the price drops below it. However, a stop-loss price cannot solve the whole problem. Below are several scenarios, ranging from best to worst, for this strategy.

Best to worst scenarios for traders

  1. The price hit the target and then overshot by a little or dropped back: you exit with the target profit and are happy with your strategy.
  2. The price reached the target and overshot by a lot: you still exit because the stock reached your target price, but you miss the potential extra profit.
  3. The stock price stayed between the target and the stop loss: nothing really happens in this case, but you come back to the question of whether to stay in the market or exit.
  4. The price hit the stop loss and overshot by a lot: you exit with the pre-determined loss.
  5. The price hit the stop loss (you exited) and bounced back: because of the strategy, you suffered a loss instead of a potential gain.
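
To make these scenarios concrete, here is a minimal Python sketch of the target and stop-loss strategy. It assumes a long position and that you exit exactly at the target or stop-loss price. The function name simulate_exit and the sample price paths are hypothetical, chosen only for illustration.

```python
def simulate_exit(prices, target, stop_loss):
    """Walk a price path and exit at the first target or stop-loss hit.

    Assumes a long position and an exit exactly at the trigger price.
    Returns (outcome, exit_price), where outcome is "target",
    "stop_loss", or "open" if neither level was reached.
    """
    for price in prices:
        if price >= target:
            return "target", target
        if price <= stop_loss:
            return "stop_loss", stop_loss
    return "open", prices[-1]


# Scenario 2: the price runs far past the target after we exit.
overshoot = simulate_exit([100, 105, 111, 130], target=110, stop_loss=95)

# Scenario 3: the price stays between the two levels.
undecided = simulate_exit([100, 101, 99], target=110, stop_loss=95)

# Scenario 5: the stop loss is hit, then the price bounces back.
bounce = simulate_exit([100, 96, 94, 105, 112], target=110, stop_loss=95)

print(overshoot, undecided, bounce)
```

In the last path the strategy exits at 95 even though the price later climbs to 112, which is exactly the situation a fixed stop loss cannot handle.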

Because the stock trading problem is so complicated, it is necessary to create a live system that learns and re-evaluates the strategy continuously. However, whether reinforcement learning can create such a system is still a big question.

Several problems with Reinforcement Learning

Large number of states

As discussed in the previous article on Q-learning, the algorithm chooses an action in a particular state based on the expected Q-value. The underlying assumption of the whole process is that there is a limited number of states and that the algorithm has visited every state a significant number of times. In fact, Q-learning can only guarantee its result if every state (more precisely, every state-action pair) is visited an infinite number of times. If you visit a state only once and get a good reward, it does not mean you can expect the same reward every time.
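
The effect of too few visits can be seen in a small tabular Q-learning sketch. The state names, the rewards, and the choice of learning rate (1 divided by the visit count) are assumptions made for this illustration and are not taken from any particular trading system.

```python
from collections import defaultdict


def q_update(q, visits, state, action, reward, next_state, actions, gamma=0.9):
    """One tabular Q-learning update with a visit-count learning rate."""
    visits[(state, action)] += 1
    alpha = 1.0 / visits[(state, action)]  # assumed schedule: 1 / visit count
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)     # Q-values start at 0
visits = defaultdict(int)  # how often each (state, action) pair was updated
actions = ["buy", "hold", "sell"]

# A single lucky visit: the estimate equals the one reward we happened to see.
after_one_visit = q_update(q, visits, "s0", "buy", 10.0, "end", actions)

# Later visits to the same state and action pay much less (made-up rewards).
for reward in [-2.0, -2.0, -2.0]:
    after_four_visits = q_update(q, visits, "s0", "buy", reward, "end", actions)

print(after_one_visit, after_four_visits, visits[("s0", "buy")])
```

After one visit the estimate is 10, but after three more visits it drifts down toward the average reward. A single good outcome says very little about what to expect from that state.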

How to set a good reward

The reward is the second problem, and may be the most dangerous one. With a reward that is not accurate enough, RL algorithms can produce unexpected, sometimes ridiculous, results. An extreme example of the problem is the “paperclip maximizer”. Let’s say we try to build an RL program to make as many paperclips as possible. The reward seems very simple: the number of paperclips. The RL program will start to “explore” the world and find the best way to earn the highest reward. It may try to collect paperclips, then earn money to buy paperclips, and then manufacture paperclips. But things may start to get worse. The AI may realize that, since humans may try to turn it off, killing humans may be an “efficient” way to reach the goal. We can avoid this problem by setting specific heuristics, such as a huge penalty for harming human beings. But what about other risks that we have never thought about?

The above video is another example of the consequences of setting the wrong reward. In this case, the goal is to put the red brick on top of the blue one. However, if the reward is set to be the height of the bottom side of the red brick, the algorithm may find that simply flipping the red brick over is another way to earn the reward.

The same problems in stock trading

The same problems show up in stock trading. First of all, let’s talk about the problem of defining states. In stock trading, a state is a combination of two parts: your portfolio, and a set of input variables such as stock price, trading volume, or interest rate. The actions are to buy, hold, or sell each stock. In any state, your action changes the structure of the portfolio while the market inputs also change. As a result, you jump into a new state. However, since there are thousands of stocks and possible inputs, there is a huge number of states. Visiting all of them is a big problem for reinforcement learning.
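
A rough count shows how quickly the number of states grows. The sketch below assumes a heavily simplified setting: each stock is held in one of three positions, and each market input is cut into ten bins. The function count_states and all of its parameters are hypothetical simplifications for illustration.

```python
def count_states(n_stocks, positions_per_stock, n_inputs, bins_per_input):
    """Count the states of a toy discretized trading problem.

    A state is (portfolio, market inputs), so the total is the number of
    possible portfolios times the number of possible input combinations.
    """
    portfolio_states = positions_per_stock ** n_stocks
    market_states = bins_per_input ** n_inputs
    return portfolio_states * market_states


# Two stocks, three positions each, three inputs with ten bins each.
small = count_states(n_stocks=2, positions_per_stock=3, n_inputs=3, bins_per_input=10)

# Fifty stocks with the same three inputs.
large = count_states(n_stocks=50, positions_per_stock=3, n_inputs=3, bins_per_input=10)

print(small)
print(large > 10 ** 26)
```

Even in this toy setting, going from 2 stocks to 50 takes the count from 9,000 states to more than 10^26, far too many to visit even once.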

Second, the reward is also hard to define. A simple way to set up the reward is to use profit. This seems very reasonable, as all traders are chasing profits. However, it’s not just about profit; it’s also about risk. Consider Stock A and Stock B, which have the same expected return of 30%, but the standard deviation of Stock A is only 10% while that of Stock B is 40%. As a result, the probability of losing money is much higher for Stock B than for Stock A. But why does losing money matter? Losing money can cause huge trouble for any portfolio manager, as investors may start withdrawing money from the fund.
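
The gap between the two stocks can be put into numbers if we assume, purely for illustration, that returns are normally distributed. The probability of a loss is then the normal cumulative distribution evaluated at zero. The helper name prob_loss is hypothetical.

```python
import math


def prob_loss(mean, std):
    """P(return < 0) for a normally distributed return (an assumption)."""
    z = (0.0 - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_a = prob_loss(mean=0.30, std=0.10)  # Stock A: zero is 3 standard deviations below the mean
p_b = prob_loss(mean=0.30, std=0.40)  # Stock B: zero is 0.75 standard deviations below the mean

print(f"Stock A: {p_a:.4f}, Stock B: {p_b:.4f}")
```

Under this assumption the chance of losing money is roughly 0.1% for Stock A and roughly 23% for Stock B, even though both have the same expected return.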

To take this risk into account, you may set up a penalty for a loss. But how to choose this number is also a problem. Alternatively, many portfolio managers may use other measures as the reward for the algorithm, such as the Sharpe Ratio, Treynor Ratio, or Information Ratio. Hence, setting up a reward is subjective and depends on each portfolio manager.
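
Two of these reward choices can be sketched in a few lines of Python. The penalty factor of 2, the zero risk-free rate, and the sample return series are arbitrary assumptions, which is precisely the subjectivity described above. The function names are hypothetical.

```python
import statistics


def penalized_profit(pnl, loss_penalty=2.0):
    """Profit as reward, with losses scaled up by an assumed penalty factor."""
    return pnl if pnl >= 0 else loss_penalty * pnl


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        return 0.0  # assumed convention for a perfectly flat series
    return statistics.mean(excess) / spread


# Two made-up return series with the same average return of 2% per period.
steady = [0.02, 0.01, 0.03, 0.02]
wild = [0.30, -0.20, 0.25, -0.27]

print(penalized_profit(5.0), penalized_profit(-10.0))
print(sharpe_ratio(steady) > sharpe_ratio(wild))
```

A pure profit reward would rate the two series equally, while the Sharpe ratio prefers the steady one. An agent trained on one reward may therefore behave very differently from an agent trained on the other.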

How far can we go with Reinforcement Learning?

Real-life application of Reinforcement Learning

None of the above-mentioned drawbacks can stop reinforcement learning from earning a strong position in the AI industry in general and in financial markets in particular. In the real world, reinforcement learning has proved its effectiveness in many fields. The video below shows a model that applies reinforcement learning to a self-driving car. Before training, the agent knows nothing about the environment. The only things it knows are the basic rules, including the rewards and actions. After training, the agent can drive the car successfully.

Another example is DeepMind, an AI company that was acquired by Google for $660 million. DeepMind has successfully trained a reinforcement learning agent to outperform humans in many games, such as Video Pinball, Boxing, and Breakout. Of course, Google did not acquire DeepMind to play video games. Acquiring DeepMind was a huge move for Google in the AI industry. In fact, DeepMind has contributed a lot to Google’s virtual assistant and data centers, as well as to personalized app recommendations.

Applications of Reinforcement Learning in Stock Trading

In the stock market, I Know First is one of the very first examples of applying deep reinforcement learning to stock trading. In fact, I Know First’s algorithm is a complex combination of different AI methods. However, reinforcement learning has undoubtedly contributed to the success of the algorithm. Over 8 years, the algorithm has successfully beaten the market. Moving forward, the company’s algorithm keeps “exploring” the financial world and “exploiting” it to forecast the movement of stock markets.


It has never been easy to make money by trading stocks. It takes a lot of time and effort to answer many questions and build a good trading strategy. However, even with a good trading strategy, incorporating real-time data into the analysis is difficult. That is where reinforcement learning steps in. Despite some drawbacks in defining states and rewards, reinforcement learning has proved its success in many real-world fields. In the game of trading, I Know First has successfully applied reinforcement learning to stock forecasting.
