# Machine Learning Trading, Stock Market, and Chaos

*Tali **Soroker** is a Financial Analyst at I Know First.*


**Summary**

- There is a notable difference between chaos and randomness: chaotic systems are predictable, while random ones are not
- Modeling chaotic processes is possible using statistics, but it is extremely difficult
- Machine learning can be used to model chaotic processes more effectively
- I Know First has employed artificial intelligence and machine learning in order to make predictions in the stock market
- Definitions for underlined words can be found in the Glossary at the end of the article

**Chaos vs. Randomness**

Differences between the concepts of randomness and chaos are crucial to our ability to make predictions about a system with such properties. A random system is unpredictable, as a given outcome does not rely on any previous event. A coin that is tossed seven times in a row, landing on heads each time, can be tossed an eighth time, and the probability that it will land on heads again is still only 50%. Such __stationary processes__ show no change in statistical properties over time and, therefore, cannot be predicted.

Real world processes may seem random to the untrained eye, but upon closer examination, we see that such processes are in fact chaotic. Natural processes such as seismic events, population growth, and stock markets are all examples of such systems and can be predicted with reasonable accuracy. Chaotic processes are controlled by three competing paradigms: Stability, Memory, and Sudden and Drastic Change.

*Stability* is seen in the stock market as a stock trend either increases or decreases. While the share price changes over the given time period, the trend is unchanging. There is also a degree of *instability* here because of what is called a “tired trend.” As a stock rises and continues to rise, there comes a point when investors start to question how long the trend can continue. As people begin to lose confidence in the trend, its stability decreases. In this case, a small event that would normally have little effect can be substantial enough to reverse the trend entirely. This is referred to as the Sand Pile Avalanche Model, in which one grain of sand eventually causes the pile to collapse.

*Memory* is the influence that past events have on a current trend. A stock that has been known to rise will likely continue to do so.

*Drastic and unforeseen changes* can also occur, completely reversing a trend with little or no warning. Black Swan events, as they are referred to, are themselves unpredictable but are useful in making future predictions. The cycles of rising and falling trends that occur in chaotic processes have varying time periods; quiet periods can be followed by a large jump, or vice versa. Together, these properties of chaotic processes make it possible to make predictions about the system using probability.

**Chaos Modeling with Statistics**

Creating a model of chaotic systems using mathematics is difficult due, in part, to what is commonly referred to as the Butterfly Effect. Small changes in __parameters__ can cause drastic changes in the outcome, just as something as simple as a butterfly fluttering its wings can ultimately result in something as monumental as a world war.

However, the presence of gradual trends and the rarity of drastic events, such as we see in the stock market, can be modeled using the “1/f noise model.” The basic principle behind this model is that the magnitude of an event is inversely proportional to its frequency. In other words, the more frequently an event occurs, the smaller its impact on the system. 1/f noise is created by random shocks to the system, as well as by the combined effects of separate but interrelated processes. An example of this can be either independent news stories or a combination of news stories that all contribute to a common result. The exact cause-and-effect correlation is difficult to pinpoint, and there can be any number of arguments to explain how each factor is influenced by the others. We see 1/f noise in many natural and social processes, and while its source is not well understood, the combined effect of many interrelated processes may be the reason for its existence. As such, 1/f noise is an intermediate between random white noise and random walk noise, and in most real chaotic processes the 1/f noise is overlapped by random frequency-independent (white) noise.
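An approximate 1/f (“pink”) spectrum can be generated numerically. The sketch below uses the Voss-McCartney technique (a standard method, not one the article names): summing random sources that refresh at octave-spaced rates produces noise whose power falls off roughly as 1/f, sitting between white noise and a random walk.

```python
import random

def voss_pink_noise(n, num_sources=16):
    """Approximate 1/f noise: source j is resampled every 2**j steps,
    so slow sources add long-memory structure on top of fast ones."""
    sources = [random.gauss(0, 1) for _ in range(num_sources)]
    out = []
    for i in range(n):
        for j in range(num_sources):
            if i % (2 ** j) == 0:          # octave-spaced update rates
                sources[j] = random.gauss(0, 1)
        out.append(sum(sources))
    return out

random.seed(0)
series = voss_pink_noise(1024)
```

Each sample is a sum of independent sources, but the slowly updating ones persist across many steps, which is what gives the series its long-range correlations.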

In chaotic processes, past events influence current and future events. In Mathematics, this connection between a time series and its past and future values is called __autocorrelation__. While autocorrelation functions for random processes decay exponentially, for chaotic processes they have a certain degree of persistence which makes them useful for making predictions.

Looking at chaotic processes at different degrees of magnification shows that they retain a similar pattern regardless of scale. This __self-similarity__ introduces the subject of __fractals__ to our modeling. When we look at a relation such as:

f(x)=ax^{-k}

Scaling the argument x by a constant c simply causes a proportionate scaling of the original function. So scaling a power-law relation by a constant produces the *self-similarity* that we see both in chaotic systems and in fractals.
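This scaling property can be checked numerically: for f(x) = ax^(-k), replacing x with cx only multiplies the function by the constant factor c^(-k), leaving its shape unchanged. A minimal check (with arbitrary illustrative values of a, k, and c):

```python
def f(x, a=2.0, k=1.5):
    """Power-law relation f(x) = a * x**(-k)."""
    return a * x ** (-k)

c = 10.0   # rescaling constant
x = 3.0

# Scaling the argument only rescales the function by c**(-k):
lhs = f(c * x)
rhs = c ** (-1.5) * f(x)
print(abs(lhs - rhs) < 1e-12)  # True: the shape is preserved at every scale
```

Because the rescaled curve is just a multiple of the original, the relation looks the same at every magnification, which is the self-similarity the text describes.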

This property of self-similarity is crucial, as it allows us to examine the __linear relationship__ between the __logarithms__ of both f(x) and x on a log-log plot. The slope of the line on the rescaled range gives the *Hurst Exponent, H,* whose value can distinguish between fractal and random time series or reveal long-memory cycles.

There are three different groupings of the Hurst Exponent: H is equal to ½, H is less than ½, and H is greater than ½ and less than 1. When the Hurst Exponent is exactly equal to ½, it is indicative of a random walk, unpredictable Brownian motion with a __normal distribution__. For H less than ½, there is high (white) noise and a high __fractal dimension__, meaning there is a high level of complexity in the system’s values. Finally, for H between ½ and 1, there is less overlapping noise and a smaller, more manageable fractal dimension. This indicates a high level of persistence in the given data, leading to long-memory cycles. Ultimately, the Hurst Exponent is a measure of overall persistence in the system.
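The rescaled-range (R/S) procedure behind the Hurst Exponent can be sketched in a few lines: for each window size, take the range of cumulative deviations from the window mean, divide by the window’s standard deviation, and read H off the slope of log(R/S) against log(window size). This is a minimal illustration (small-sample R/S estimates carry a known upward bias, so treat the number as approximate):

```python
import math
import random
import statistics

def hurst_rs(series, window_sizes=(10, 20, 40, 80, 160)):
    """Estimate H as the log-log slope of rescaled range vs window size."""
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(series) - n + 1, n):
            window = series[start:start + n]
            mean = statistics.fmean(window)
            dev, cum = 0.0, []
            for x in window:                 # cumulative deviations
                dev += x - mean
                cum.append(dev)
            r = max(cum) - min(cum)          # range of cumulative deviations
            s = statistics.pstdev(window)    # rescale by the std deviation
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(math.log(n))
        log_rs.append(math.log(statistics.fmean(rs_vals)))
    # Least-squares slope of log(R/S) against log(n) is the Hurst estimate
    mx, my = statistics.fmean(log_n), statistics.fmean(log_rs)
    num = sum((a - mx) * (b - my) for a, b in zip(log_n, log_rs))
    den = sum((a - mx) ** 2 for a in log_n)
    return num / den

random.seed(1)
white = [random.gauss(0, 1) for _ in range(1600)]
print(hurst_rs(white))  # roughly 1/2 for a memoryless series
```

For an unpredictable white-noise series the slope comes out near ½, consistent with the random-walk case described above; a persistent series would push the estimate toward 1.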

Another exponent, the Maximal Lyapunov Exponent (MLE), has a strong correlation to the Hurst Exponent and is a measure of sensitivity to initial conditions. The MLE can be estimated by running the model with small changes in the input and then measuring the divergence of the outputs. This process is relatively simple for lower-dimensional models but becomes complicated as the number of variables increases. By taking the inverse of the Lyapunov exponent, 1/MLE, we obtain a measure of the model’s predictability. The larger the MLE, the faster the loss of predictive “power.”
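For a one-dimensional system the MLE has a simple form: it is the average of log|f′(x)| along an orbit. The sketch below uses the logistic map (a textbook chaotic system, not one the article discusses) at its fully chaotic parameter r = 4, where the exponent is known analytically to be ln 2 ≈ 0.693:

```python
import math

def logistic(x, r=4.0):
    """One step of the logistic map, a classic chaotic system."""
    return r * x * (1 - x)

def lyapunov(r=4.0, x0=0.2, steps=10000, burn=100):
    """Average log|f'(x)| along the orbit; f'(x) = r(1 - 2x)."""
    x = x0
    for _ in range(burn):          # discard transient behavior
        x = logistic(x, r)
    total = 0.0
    for _ in range(steps):
        total += math.log(abs(r * (1 - 2 * x)))
        x = logistic(x, r)
    return total / steps

mle = lyapunov()
print(mle)        # near ln 2 ≈ 0.693: strongly chaotic
print(1 / mle)    # its inverse, a rough horizon of predictability
```

A positive MLE means nearby trajectories diverge exponentially, and 1/MLE gives the rough timescale over which forecasts remain meaningful, exactly the “loss of predictive power” described above.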

Fractal time series are complex systems, but they can be used to find good approximations of chaotic processes because the two have such similar properties. For fractal fluctuations, we use a fat-tailed probability distribution, because the normal distribution requires a fixed mean and is not useful for quantifying self-similar data sets. With the fat-tailed probability distribution, the __variance__ represents local irregularity, characterized by the fractal dimension (D), while the __mean__ represents global persistence, characterized by the Hurst Exponent (H). The fat tail accounts for the probability of extreme events occurring in the natural and social worlds.

**Chaos Modeling Using Algorithms**

Due to the complicated nature of modeling chaos using statistics, scientists look to computers to solve these types of problems. __Artificial intelligence__ and __machine learning__ have proven to be incredibly successful in modeling chaotic structures and, ultimately, in making predictions about these systems.

The purpose of machine learning is to generalize. The machine can take in an inordinate amount of data, find laws within the data and then predict change based on the hidden laws that it finds. Artificial Intelligence has been created in different forms: Rules Based, Supervised Learning, Unsupervised Learning, and Deep Learning.

In the *rules-based* approach, man creates the rules and the machine follows them to get a result, but this is time-consuming and not very accurate. *Supervised learning* is example-based learning, with the examples being representative of the entire data set, while *unsupervised learning* uses clustering to find the hidden patterns within the data. *Deep learning* machines are able to model high-level abstractions in the data by using multiple processing layers with complex structures. These machines can automatically determine which data points to consider and then find the relationships between them on their own, with no human involvement. One step beyond this is “Ultra Deep Learning,” which combines all types of learning and is able not only to derive the rules but to detect when the rules change.

Machine learning works by first providing a framework with mathematical and programming tools. Then, the data must be converted to more-or-less stationary data, removing the cycles and trends; this reduces the uniqueness of each data point. The model can then be either parametric or nonparametric: a parametric model has a fixed number of parameters, while in a nonparametric model the number of parameters increases with the amount of training data. Next, examples are created for the machine to learn from: an input and, in some methods, an output.
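As a minimal sketch of the stationarity step, first differencing (one common transformation; the article does not specify which one is used) strips the trend from a price-like series, leaving increments that are much closer to stationary:

```python
# Illustrative, made-up prices with an upward trend
prices = [100, 102, 101, 105, 108, 107, 111, 115]

# First differences: the trend disappears, leaving roughly stationary increments
diffs = [b - a for a, b in zip(prices, prices[1:])]
print(diffs)  # [2, -1, 4, 3, -1, 4, 4]
```

The raw prices drift upward, so early and late samples look nothing alike; the differences fluctuate around a stable level, which is what a learning algorithm can generalize from.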

An __algorithm__ should be chosen based on factors such as the desired task, the time available, and the precision required to achieve relevant results. Local search algorithms use methods such as steepest descent, the best-first criterion, or stochastic search processes such as simulated annealing. Simulated annealing works by making a random move to alter the state, then comparing the new state to the previous state and determining whether to accept or reject the new solution. This repeats until an acceptable answer is found. Global search algorithms use processes such as stochastic optimization, uphill searching, and basin hopping to achieve desired results.
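The simulated annealing loop described above can be sketched generically: propose a random move, always accept improvements, and accept worse states with a probability that shrinks as a “temperature” cools. The toy cost function and parameter values below are illustrative choices, not anything from the article:

```python
import math
import random

def simulated_annealing(cost, state, neighbor, temp=10.0, cooling=0.99, steps=2000):
    """Random moves; worse states accepted with probability exp(-delta/temp),
    which falls toward zero as the temperature cools."""
    best = state
    for _ in range(steps):
        candidate = neighbor(state)
        delta = cost(candidate) - cost(state)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            state = candidate                  # accept the move
            if cost(state) < cost(best):
                best = state                   # remember the best state seen
        temp *= cooling
    return best

random.seed(0)
# Toy problem: minimise a bumpy one-dimensional function
cost = lambda x: (x - 3) ** 2 + math.sin(5 * x)
neighbor = lambda x: x + random.uniform(-0.5, 0.5)
x = simulated_annealing(cost, 0.0, neighbor)
print(x, cost(x))  # should land well below the starting cost
```

Early on, the high temperature lets the search escape local minima; as it cools, the process settles into the best basin it has found.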

Genetic algorithms, a form of local search algorithm, have also been created using techniques that parallel genetic processes. The algorithm improves the data, or gene pool, by utilizing combination, mutation, crossover, and selection. In *combination*, the algorithm combines two or more solutions in the hope of producing a better solution. *Mutation*, just as in genetics, involves modifying a solution in random places to achieve a different result. Parts of one solution can also be imported into a similar solution; this is called *crossover*. Ultimately, a selection is made using the principle of *“survival of the fittest”*: a suitable solution is selected, and otherwise the process of manipulation continues.
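A tiny genetic algorithm shows all four operations together. The problem here (evolving bitstrings toward all ones) and every parameter are illustrative stand-ins, not the I Know First algorithm:

```python
import random

random.seed(0)
GENE_LEN = 20

def fitness(genes):
    """Toy fitness: count of 1-bits (higher is fitter)."""
    return sum(genes)

def crossover(a, b):
    """Combine two parent solutions at a random cut point."""
    cut = random.randrange(1, GENE_LEN)
    return a[:cut] + b[cut:]

def mutate(genes, rate=0.05):
    """Flip bits in random places."""
    return [1 - g if random.random() < rate else g for g in genes]

# Initial random gene pool
pool = [[random.randint(0, 1) for _ in range(GENE_LEN)] for _ in range(30)]

for generation in range(40):
    # Selection: keep the fitter half ("survival of the fittest")
    pool.sort(key=fitness, reverse=True)
    survivors = pool[:15]
    # Refill the pool by combining and mutating survivors
    children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                for _ in range(15)]
    pool = survivors + children

best = max(pool, key=fitness)
print(fitness(best))  # approaches the maximum of 20 within a few dozen generations
```

Keeping the fittest survivors each generation (elitism) guarantees the best solution found is never lost, while crossover and mutation keep exploring new candidates.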

**The I Know First Predictive Algorithm**

Most financial time series exhibit classic chaotic behavior, so it is possible to make predictions about their future behavior using machine learning techniques. This artificial intelligence approach is at the root of the I Know First predictive algorithm.

I Know First’s genetic algorithm tracks current market data, adding it to a database of historical time series data. Then, based on our database of 15 years of stock share prices, the algorithm is able to make predictions over six different time horizons. With each additional data input, the algorithm is able to learn from its successes and failures and then improve subsequent results.

The I Know First algorithm identifies waves in the stock market to forecast its trajectory. Every day the algorithm analyzes raw data to generate an updated forecast for each market. Each forecast includes 2 indicators: *signal *and *predictability*.

__Signal__

The signal represents the predicted movement and direction, be it an increase or decrease, for each particular asset; not a percentage or specific target price. The signal strength indicates how much the current price deviates from what the system considers an equilibrium or “fair” price.

__Predictability__

The predictability is the __historical correlation__ between the past algorithmic predictions and the actual market movement for each particular asset. The algorithm then averages the results of all the historical predictions, while giving more weight to more recent performances.
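How I Know First weights its historical predictions is proprietary; purely as an illustration of the idea, the sketch below computes a correlation between past signals and realized moves in which recent pairs carry geometrically more weight than old ones. All names and data are made up:

```python
def weighted_corr(pred, actual, decay=0.9):
    """Correlation between predictions and outcomes, weighting recent pairs more.
    The newest pair gets weight 1; older pairs decay geometrically."""
    n = len(pred)
    w = [decay ** (n - 1 - i) for i in range(n)]   # newest weight = 1
    sw = sum(w)
    mp = sum(wi * p for wi, p in zip(w, pred)) / sw
    ma = sum(wi * a for wi, a in zip(w, actual)) / sw
    cov = sum(wi * (p - mp) * (a - ma) for wi, p, a in zip(w, pred, actual)) / sw
    vp = sum(wi * (p - mp) ** 2 for wi, p in zip(w, pred)) / sw
    va = sum(wi * (a - ma) ** 2 for wi, a in zip(w, actual)) / sw
    return cov / (vp ** 0.5 * va ** 0.5)

# Illustrative past signal directions and realised market moves
pred = [1, -1, 1, 1, -1, 1, 1, -1]
actual = [0.8, -0.5, 1.2, 0.3, 0.4, 0.9, 1.1, -0.7]
print(weighted_corr(pred, actual))  # positive: signals mostly agreed with moves
```

A value near 1 would indicate that the signal has historically tracked the asset well, with recent accuracy counting for more than accuracy long ago.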

By using this predictive algorithm, I Know First’s 2015 portfolio outperformed the S&P 500 picks by an impressive 96.4% margin.

The overall return in the period January 7th, 2016 – January 1st, 2017 ranged between 20.1% and 77.3%, while the S&P 500 increased by 12.5%.

**Conclusion**

There are many systems in this world that we can predict due to their chaotic nature, and we can benefit in many ways from our ability to do so. The stock market is just one example of these processes, with accurate predictions leading to financial gains. We make our predictions by first creating a model of the events in the system. We can do this using statistics or, to avoid the difficulty involved, using algorithms and artificial intelligence. I Know First has created an algorithm that is able to make accurate predictions of the stock market and has been able to use it to greatly increase the return on investments for its clients.


**Glossary**

- Stationary processes – A process with a fixed probability for each possible outcome (i.e. coin toss)
- Parameters – A numerical characteristic of a population
- Autocorrelation – Similarity between events as a function of the time lag between them
- Self-similarity – The property of an object that keeps the same shape regardless of scale
- Fractals – A natural phenomenon (or mathematical set) that has a repeating pattern at every scale
- Linear relationship – The relationship between two variables with direct proportionality (the graphical representation of this relationship is a straight line)
- Logarithms – The inverse of the exponential function
- Normal Distribution – The distribution of statistical probabilities for some scenario (more commonly known by its ‘bell curve’ representation)
- Fractal Dimension – The ratio comparing the detail in a fractal pattern with the scale at which it is measured
- Mean – The central tendency of the probability distribution (i.e. expected value)
- Variance – The measure of how far each number in the set is from the mean
- Artificial Intelligence – Intelligent behavior exhibited by machines or software
- Machine Learning – A subfield of computer science that explores the construction of algorithms that can learn from data and then make predictions on the data
- Algorithm – A procedure or formula designed for solving a problem