Deep Learning for Stock Price Prediction Explained
As part of my work on Quantized Classifier I have built a set of examples showing how to use Deep Learning to predict future stock prices. Tensorflow is a popular Deep Learning library provided by Google. This article explains my approach, some terminology and results from a set of examples used to predict future stock prices. The code is free so please download and experiment.
General Purpose Tensorflow CNN Classifier
The first step was building a general purpose utility that could read a wide variety of CSV input files and submit them to Tensorflow’s CNN for classification. The utility needed to work across a wide range of inputs, from classifying heart disease to predicting stock prices, without any changes to the code. The utility I built is called CNNClassify.py. When used for stock price prediction it reads two CSV files: a training file and a test file. It builds a deep learning CNN using the training data and then uses the CNN to classify rows from the test data file. It outputs the classification results to the console, showing how successful it was in the classification process.
You need to install Tensorflow, TFLearn and Python 3.5.2 to run these samples.
In supervised learning we look at a given row and assign a class. You can think of a class as a grouping of rows. Any classification project requires at least two classes, such as happy or sad, alive or dead. For these examples I use class=0 to indicate a BAR that failed to meet the goal and class=1 to indicate a BAR that did meet the goal.
We are seeking bars that will rise in price enough to meet a profit taker goal before they would encounter a stop loss or that rise at least 1/2 of the way to the goal before reaching a maximum hold time.
To determine the class we look ahead at future bars and determine whether the price for that symbol has moved in a way that would meet our goal. For these stock examples we have three factors for each goal.
- A Percentage rise in price that would allow exit with a profit taker order.
- A point where if the price drops by more than a given percentage it will exit the trade with stop limit.
- A maximum hold time where if the price has not risen to at least 1/2 of the way to goal before the hold time expires it is considered a failure.
Bars that satisfy these rules are assigned a class of 1 while bars that fail are assigned class 0. The classes are used by the learning algorithm to train the machine learning model. They are also used to verify the classification accuracy of the model when processing the test file. In a production trading system the predicted class is used to generate buy signals that could be executed either by a human or an automated trading system.
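The labelling rules above can be sketched in a few lines of Python. This is only a minimal illustration and not the code from stock-prep-sma.py. It assumes a plain list of closing prices, and the function name label_bars and its parameters are hypothetical. A real version might test the High and Low of each bar rather than the Close.

```python
def label_bars(closes, rise_pct, stop_pct, max_hold):
    """Assign class 1 (goal met) or class 0 (failed) to each bar.

    Assumption: only closing prices are used. Bars that do not have
    max_hold future bars available are labelled None because their
    outcome is not yet known.
    """
    labels = []
    for i, entry in enumerate(closes):
        if i + max_hold >= len(closes):
            labels.append(None)  # not enough future bars to decide
            continue
        target = entry * (1.0 + rise_pct / 100.0)     # profit taker price
        stop = entry * (1.0 - stop_pct / 100.0)       # stop loss price
        half_way = entry * (1.0 + rise_pct / 200.0)   # 1/2 of the way to goal
        label = None
        for future in closes[i + 1 : i + 1 + max_hold]:
            if future < stop:       # dropped by more than stop_pct
                label = 0
                break
            if future >= target:    # reached the profit taker
                label = 1
                break
        if label is None:
            # Hold time expired: success only if at least half way to goal.
            label = 1 if closes[i + max_hold] >= half_way else 0
        labels.append(label)
    return labels


# Example: 2% profit taker, 1% stop loss, maximum hold of 3 bars.
closes = [100, 101, 103, 99, 98, 98.5, 100.2, 100.0]
print(label_bars(closes, rise_pct=2.0, stop_pct=1.0, max_hold=3))
```

The last max_hold bars come back as None. You would normally drop those rows from the training file rather than guess their class.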
In our examples we seek to maximize precision and recall of class 1 to find successful bars. The premise is that we will eventually use the predicted class for current bars to generate buy signals.
In a more sophisticated system we could have dozens or hundreds of classes but in this instance we are only seeking a signal about whether it is a good time to buy a specific symbol using a specific goal set. It would be relatively easy to run the engine across hundreds of stocks so you always had something available to buy.
The utility we use to generate the training and test data files with the classes assigned is stock-prep-sma.py. You need to have downloaded the bar data before running it. You can use your own bar data, or yahoo-stock-download.py will download it from Yahoo. These utilities are only samples but they could be a good example of where to start building a more sophisticated system.
The class computation used in these tests is intentionally simple. Feel free to take these samples and extend them with your own creative enhancements. Unique combinations of different classification logic and different feature engineering can allow a single engine like Quantized classifier to produce millions of different trading signals, all customized to each specific user's trading preferences.
Base Probability and Lift
When evaluating any machine learning system there are several numbers we use to measure how effective the system is.
- Base Probability – Given an input test data set a certain portion of bars will be class 0 and a certain portion will be class 1. If 33 bars out of 100 met the goal then the base probability that any bar will be a member of class 1 is 33 / 100 = 0.33.
- Precision – When the system runs across test data it attempts to classify bars. How often the actual class of a given bar matches the predicted class is called precision. If the system predicted that 27 bars would be class 1 and only 22 of those bars actually were class 1 then the precision for class 1 is 22 / 27 = 0.8148. The precision can also be measured for all records in a system but in this context we care most about the precision of class 1 because we plan to use it to generate buy signals.
- Recall – When the system evaluates test data it will attempt to find all the bars that it should classify as a given class. In reality it will only find a fraction of the bars that are available. Recall is computed as the ratio of the records it classified correctly to the total number of records of that class. A general rule is that you can increase recall at the expense of precision or increase precision at the expense of recall. Better engines and improved feature engineering are used to increase both. Recall is typically computed on a class by class basis. We care most about recall for class 1 because higher recall will generate more buy signals. If there were 33 class 1 bars available and the system correctly classified 22 of them as class 1 then recall for class 1 would be 22 / 33 = 0.67.
- Lift – Lift is the measure of how much better precision is than base probability. Lift is important because it allows the relative improvement in prediction accuracy to be compared even when base probability changes. If Base probability is 0.33 and precision is 0.8148 then lift = 0.8148 – 0.33 = 0.4848. I use lift to help guide exploration of features. If I can increase lift without a significant reduction in recall then it is normally a good change.
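The four measures above can be computed from a list of actual classes and a list of predicted classes. The sketch below is only an illustration. The function name class_metrics is hypothetical, and lift is computed as precision minus base probability because that is how it is defined in this article. The sample data is constructed to reproduce the numbers used in the definitions.

```python
def class_metrics(actual, predicted, target_class=1):
    """Return base probability, precision, recall and lift for one class."""
    total = len(actual)
    actual_pos = sum(1 for a in actual if a == target_class)
    predicted_pos = sum(1 for p in predicted if p == target_class)
    true_pos = sum(
        1 for a, p in zip(actual, predicted)
        if a == target_class and p == target_class
    )
    base = actual_pos / total
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return {
        "base_probability": base,
        "precision": precision,
        "recall": recall,
        "lift": precision - base,  # the article's definition of lift
    }


# 100 bars, 33 of them class 1. The system predicts class 1 for 27 bars
# and 22 of those predictions are correct.
actual = [1] * 33 + [0] * 67
predicted = [1] * 22 + [0] * 11 + [1] * 5 + [0] * 62

for name, value in class_metrics(actual, predicted).items():
    print(name, round(value, 4))
```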
Understand Probabilities in context:
A common mistake is to look at a precision such as 55% out of context of the goal. It is incorrect to look at 55% precision and say it is poor odds unless you understand how much you win versus how much you would lose. If the wins are bigger than the losses then you can remain profitable with a lower percentage of winning bars.
If you tell any gambler you will give them 50% odds and that winning hands will earn 2 times as much as they lose on losing hands, they will gamble all week long.
The law of large numbers indicates that if a gambler has a 55% chance of winning and bets $100 each time, then after a large number of bets they will have won 55 times and lost 45 times on average per 100 bets. With each win paying twice the amount bet, they would have won 55 * $200 = $11,000 from the winning bets and lost 45 * $100 = $4,500 from the losing bets, giving them a net profit of $6,500. As long as the magnitude of the win versus loss stays the same and the probability of win versus loss stays at 55% they will continue making a profit. The law of large numbers also indicates they could have a long run of losses in the short term and still average 55% wins in the long term, so they need to manage the amount they bet using something like the Kelly criterion to avoid gambler's ruin. In this example even if the win % dropped a little below 50% while the magnitude of win versus loss remained the same it would still be a winning system.
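The arithmetic in the gambling example can be checked with a short sketch. The function names here are my own and purely illustrative. payoff_ratio is the amount won on a winning bet divided by the amount lost on a losing bet, which is 2 in the example. The Kelly function uses the standard formula f = p - (1 - p) / b for a bet with win probability p and payoff ratio b.

```python
def expected_net_profit(win_rate, bet, payoff_ratio, num_bets=100):
    """Average net profit over num_bets bets of a fixed size."""
    wins = win_rate * num_bets
    losses = (1.0 - win_rate) * num_bets
    return wins * bet * payoff_ratio - losses * bet


def break_even_win_rate(payoff_ratio):
    """Win rate at which the average profit is exactly zero."""
    return 1.0 / (1.0 + payoff_ratio)


def kelly_fraction(win_rate, payoff_ratio):
    """Fraction of the bankroll to bet per the Kelly criterion."""
    return win_rate - (1.0 - win_rate) / payoff_ratio


print(round(expected_net_profit(0.55, 100, 2.0), 2))  # net profit per 100 bets
print(round(break_even_win_rate(2.0), 4))             # lowest profitable win rate
print(round(kelly_fraction(0.55, 2.0), 4))            # suggested bet fraction
```

With a payoff ratio of 2 the break even win rate is about 33%, which is why a system with a win rate somewhat below 50% can still be profitable.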
I used samples where the amount won is larger than the amount lost because I found that forcing a larger win magnitude helps isolate the signal from the noise. This helps the prediction system deliver greater lift. Greater lift normally comes at the cost of recall so we have fewer trades but I would rather run dozens of diverse strategies that earn more profit per trade than accept more losses with smaller profits per trade. We can still find enough trades but it may consume a bit more compute power to evaluate dozens or hundreds of strategies simultaneously.
Feature engineering is where some of the most important work is done in machine learning. Many data sets such as bar data yield relatively low predictive value in their native form. It is only after this data has been transformed that it produces useful classification.
In the context of our stock trading samples we used a few basic indicators which are applied across variable time horizons to produce machine learning features. They are:
- Slope of the price change compared to a bar in the Past
- Percentage above minimum price within some # of bars in the past
- Percentage below the maximum price within some # of bars in the past
- Slope of percentage above minimum price some # of bars in the past
- Slope of percentage below maximum price some # of bars in the past.
Each of these may be applied to any column such as High, Low, Open or Close, or they can be applied against derived indicators such as an SMA.
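A rough sketch of how indicators like these could be computed is shown below. It is not the code from stock-prep-sma.py. The function names and the exact formulas, for example slope as price change per bar, are assumptions made for illustration. Each function returns None when there are not enough bars in the past.

```python
def slope_vs_past(values, i, lookback):
    """Average change per bar between bar i and the bar lookback bars earlier."""
    if i < lookback:
        return None
    return (values[i] - values[i - lookback]) / lookback


def pct_above_min(values, i, lookback):
    """Percentage that bar i sits above the minimum of the last lookback bars."""
    if i < lookback:
        return None
    lowest = min(values[i - lookback : i + 1])
    return (values[i] - lowest) / lowest * 100.0


def pct_below_max(values, i, lookback):
    """Percentage that bar i sits below the maximum of the last lookback bars."""
    if i < lookback:
        return None
    highest = max(values[i - lookback : i + 1])
    return (highest - values[i]) / highest * 100.0


def sma(values, i, period):
    """Simple moving average of the period bars ending at bar i."""
    if i + 1 < period:
        return None
    return sum(values[i - period + 1 : i + 1]) / period


closes = [10.0, 11.0, 12.0, 11.0, 13.0]
last = len(closes) - 1
feature_row = [
    slope_vs_past(closes, last, 2),
    pct_above_min(closes, last, 3),
    pct_below_max(closes, last, 3),
]
print(feature_row)
```

The slope of percentage above minimum and slope of percentage below maximum features can be built by first computing a percentage series for every bar and then passing that series to slope_vs_past. The same functions work on an SMA series in place of the raw Close column.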
The utility that produces test and training files containing both the machine learning features and classes is called stock-prep-sma.py. It is only intended as an example that you can modify to add your own creativity. I do not claim these are great features but they were good enough to demonstrate Quantized classifier and Deep Learning CNN delivering some lift and reasonable recall.
Feature engineering is an area with nearly infinite potential for creative thought. By using different combinations of features ML classifiers can produce radically different trading signals for different users. I encourage you to explore this area. There are hundreds of indicators explained across thousands of trading books, most of which can be converted into machine learning friendly features.
Deep Learning number of epochs explained
The Tensorflow Deep Learning CNN (Convolutional Neural Network) is a learning strategy based on how scientists think brains learn. The convolutional portion essentially adds multiple layers to the NN, which can allow it to perform better in areas where decision trees have previously dominated.
Just as biological systems learn best with repetition, the CNN needs to see data multiple times while it is building the internal memory model it uses to support the classification process. Each repetition where the training data is re-submitted to the CNN engine is considered one epoch.
I have generally found minimum acceptable results at 80 repetitions, while the CNN seems to perform best with at least 800 repetitions when working with stock data. Each of these repetitions is what the Tensorflow libraries call an epoch. For these samples I chose to use 800 epochs. For those that didn’t produce good results I increased the number of epochs to 2800.
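To make the idea of an epoch concrete, here is a tiny sketch in plain Python. It is not a CNN and does not use Tensorflow. It trains a single logistic neuron on toy data that I made up, and the function name train is hypothetical. The only point is the outer loop: one epoch is one full pass over the training rows, and the average loss normally falls as the number of epochs grows.

```python
import math


def train(rows, labels, epochs, learning_rate=0.1):
    """Train one logistic neuron and return (weights, bias, loss per epoch)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    history = []
    for _ in range(epochs):  # one epoch = one full pass over the training rows
        total_loss = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            total_loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            error = p - y
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
        history.append(total_loss / len(rows))
    return weights, bias, history


# Toy data: a single feature, class 1 when the feature is large.
rows = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]

_, _, history = train(rows, labels, epochs=200)
print("loss after first epoch:", round(history[0], 4))
print("loss after last epoch: ", round(history[-1], 4))
```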
Have fun and Experiment
Before you ask whether I have tried it with data set X or whether it has been hooked to broker X: the engine is free and the examples are free. You are free to take them and test them with any data or configurations you desire. Have fun and please let me know what you learn. If you want help then I sell consulting services.
CNNClassifyStockSPY1Up1DnMh3.bat OR python CNNClassify.py ../data/spy-1up1-1dn-mh3-close.train.csv ../data/spy-1up1-1dn-mh3-close.test.csv 800
CNN Related stock tests
Parm 0 = CNNClassify.py, the script Python will run. Parm 1 = Training file to use when building the internal memory model. Parm 2 = Test file to use when testing the classification engine. Parm 3 = Number of epochs to use for this test.
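For readers who want to write a similar script, the positional parameters above could be read roughly as follows. This is a hypothetical sketch and not the actual argument handling in CNNClassify.py. The function name parse_args is my own.

```python
def parse_args(argv):
    """Split a command line into (train_file, test_file, epochs).

    argv[0] is the script name, argv[1] the training file,
    argv[2] the test file and argv[3] the number of epochs.
    """
    if len(argv) != 4:
        raise ValueError("usage: python script.py TRAIN_CSV TEST_CSV NUM_EPOCH")
    train_file = argv[1]
    test_file = argv[2]
    epochs = int(argv[3])
    if epochs < 1:
        raise ValueError("NUM_EPOCH must be at least 1")
    return train_file, test_file, epochs


# In a real script you would call parse_args(sys.argv) after "import sys".
example = ["CNNClassify.py",
           "../data/spy-1up1-1dn-mh3-close.train.csv",
           "../data/spy-1up1-1dn-mh3-close.test.csv",
           "800"]
print(parse_args(example))
```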
CNNClassifyStock-SLV-1p5up0p3dnMh5.bat – SLV (Silver) Goal to rise by 1.5% before it drops by 0.3% with max hold of 5 days.
python CNNClassify.py ../data/slv-1p5up-0p3dn-mh10-close.train.csv ../data/slv-1p5up-0p3dn-mh10-close.test.csv 800
CNNClassifyStock-SPY-2p2Up1p1DnMh6.bat – SPY Goal to rise to exit with profit taker at 2.2% with a 1% stop limit. Max hold of 6 days.
python CNNClassify.py ../data/slv-1p5up-0p3dn-mh10-close.train.csv ../data/slv-1p5up-0p3dn-mh10-close.test.csv 800
CNNClassifyStock-SPY-6p0Up1p0DnMh45.bat – SPY Goal to rise to exit with profit taker at 6% gain with stop loss at 1% and max hold time of 45 days.
python CNNClassify.py ../data/spy-6up-1dn-mh10-smahigh90.train.csv ../data/spy-6up-1dn-mh10-smahigh90.test.csv 800
CNNClassifyStock-SPY-8Up4DnMh90.bat – SPY Goal to rise to exit with profit taker at 8% gain with stop loss at 4% and maximum hold time of 90 days.
python CNNClassify.py ../data/spy-8up1-4dn-mh90-close.train.csv ../data/spy-8up1-4dn-mh90-close.test.csv 800
CNNClassifyStock-SPY-1p0Up0p5DnMh4.bat – SPY Goal to rise to exit with profit taker at 1% gain before it encounters a stop loss of 0.5%. Max hold time 4 days.
CNNClassifyStock-CAT-1p7up1p2dnMh2.bat – CAT Goal to rise to exit with profit taker at 1.7% before it encounters a stop loss at 1.2% with max hold of 2 days.
CNNClassifyStock-CAT-6p0up1p0dnMh45.bat – CAT Goal to rise to exit with profit taker at 6% before it encounters a stop loss at 1% with max hold of 45 days.
CNNClassifyStock-CAT-7p8up1p2dnMh5.bat – CAT Goal to rise 7.8% to exit with profit taker before it encounters a stop loss at 1.2%. Max hold of 5 days.
Deep Learning Tensorflow disclaimer
Deep learning is a broad topic area. Tensorflow is a fairly large and complex product. I have used one configuration of a Tensorflow CNN for these examples. Tensorflow supports many other models and CNN can be configured in many ways including a different initialization and different layer configurations. There are most likely ways to configure Tensorflow to produce better results than I have shown in these samples. If you find them please let me know.
This work is only intended to provide a starting point from which you can easily branch out with your own discovery process. I do sell consulting services you can purchase if you would like to use my expertise to accelerate your own work.
I wrote these examples to test the Quantized classifier and for some of the samples it delivers better results than Tensorflow did. This may be due to selection bias where I used examples that performed well with the Quantized classifier. It may be possible to find a configuration of Tensorflow that would deliver superior results. I personally find that Quantized Classifier and Quantized filter make it easier to find profitable combinations. The Quantized library also seems to provide better support to help discover which features are adding predictive value. Having the engine help guide the feature selection is beneficial when there are millions of possible feature and indicator combinations available but only a small fraction of them will actually help predict future stock prices.
Please let me know if you would enjoy similar articles exploring the same examples using the Spark ML libraries or other popular ML libraries.
Thanks, Joe Ellsworth.