The strategy of statistical arbitrage on the US stock market

This article is aimed at those who are not yet familiar with statistical arbitrage (pair trading) but would like to try this strategy in practice. I focus especially on the practical side. In this article, I'll give you all the tools you need to quickly set up freely available software, start trading, evaluate the strategy, and decide whether it suits you. There is plenty of freely available material on the theory of statistical arbitrage, but by the time you have worked through correlation, cointegration, stationary time series, and highly specialized software, your enthusiasm may be gone. I would not like your interest to disappear, because the strategy is very interesting, especially in terms of the consistency of its profitable trades.

First of all, I would like to mention the disadvantages of the pair trading strategy. It requires significantly more capital than trading a single instrument: you need to open a long position in one stock and a short position in another, with a different volume on each side. To diversify risk, you need positions in several pairs. You also need a broker that lets you trade fractional lots, so that you can set the exact volume ratio within a pair. Even fewer brokers provide high leverage (4:1 or 5:1) for holding positions overnight.

In practice, some of the limitations mentioned above can be worked around. For example, you can select pairs of cheap stocks; there are quite a lot of them, especially if you accept pairs that are not perfect in terms of statistical indicators. You can pick pairs in which the share ratio is close to whole lots, or trade pairs intraday without holding positions overnight. You can also apply this strategy to futures and options, which require less capital. These solutions exist, but they narrow the strategy's possibilities.

The American stock market is interesting because several thousand stocks can yield several hundred near-perfect pairs. You can open several new positions every day and close several that have already made a profit. Theoretically, the number of possible combinations of stocks, ETFs, and indices runs into the tens of millions, but in practice, with the right selection approach, there turn out to be not so many good pairs. The simplest way to start is to look for correlated and cointegrated stock-stock and ETF-ETF pairs from the same industry. In this case, market risk and sector risk are largely offset.

Certainly, you can find pairs from different industries of the same sector, or even from different sectors, that look excellent both statistically and on the chart, but you need to understand that your risks increase. You can find two stocks from different sectors with excellent cointegration indicators and a good chart and start trading them, but do not be surprised if the pair suddenly breaks down for no apparent reason. At the very least, you need to understand how the companies are fundamentally connected, for example whether there are production chains between them. There must be some fundamental basis behind the statistics of a pair.

About 80% of success depends on selecting the right pairs. I would recommend first concentrating on finding stock-stock and ETF-ETF pairs in the financial and technology sectors. The best pairs consist of two stocks or two ETFs from the same industry. Later, you can experiment with stocks from different industries and sectors, as well as stock-ETF and stock-index pairs. I tried stock-index pairs to reduce the capital required, but I did not like them. You should not trade pairs from the biotechnology industry, and be wary of pairs from the oil and gas industry.

First, I would like to talk about the tools for selecting pairs, and then we will turn to the tools we need for trading. For selection, we will use the freely available thinkorswim platform. If, after a first acquaintance with the strategy, you want to dive deeper into the topic, you can build pair selection tools in, for example, MATLAB, and write indicators for your trading platform. They are not complicated; the main thing is to understand the approach to trading pairs.

To get started, use the finviz screener and select the sector and industry you are interested in. For example, the link selects the Financial sector and the Accident & Health Insurance industry. Copy the tickers and paste them into a text file, then open Excel and paste the tickers from the text file into a cell. We need to split the tickers into columns, and Excel does not handle data copied from finviz well. So, either in the text file or in Excel, manually replace the spaces between tickers with, for example, commas. After that, you can split the tickers into columns (Data tab, Text to Columns). Next, we need to turn the row of tickers into a column: select and copy the cells with tickers, right-click a free cell, choose Paste Special, and tick Transpose.

Then, to get pairs of tickers, we need a special Excel file that generates ticker combinations without permutations. This means that if there are AIV and AVB shares, the list will contain the AIV-AVB pair but not the AVB-AIV pair. So we copy the column with tickers into the first column of the Excel file Combinations without permutations. Unfortunately, the file accepts only 25 tickers. I did not find a better option online; perhaps you will be luckier, or you will build your own file for generating pairs. Then press the button, and from 25 tickers we get several hundred pairs. All that remains is to replace the space between the tickers with a minus sign (Home tab, Find & Select, Replace). Copy the resulting pairs to a text file. I have had to describe the Excel work in detail, because if you don't use it regularly, you will spend a lot of time searching for solutions to simple tasks that have nothing to do with trading.
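If the 25-ticker limit gets in the way, the same combinations without permutations can be generated in Python. This is a sketch with hypothetical function names, not part of the author's workflow. For n tickers it produces n(n-1)/2 pairs, so 25 tickers give 300 pairs, consistent with the several hundred mentioned above.

```python
from itertools import combinations


def parse_tickers(text):
    """Split pasted tickers separated by spaces, commas, or newlines."""
    return text.replace(",", " ").split()


def make_pairs(tickers):
    """Return 'A-B' pairs without permutations: AIV-AVB is kept, AVB-AIV is not."""
    # Normalize case and drop duplicates while preserving the original order
    unique = list(dict.fromkeys(t.strip().upper() for t in tickers if t.strip()))
    return [f"{a}-{b}" for a, b in combinations(unique, 2)]


if __name__ == "__main__":
    # Hypothetical input, e.g. tickers copied from a finviz screen
    raw = "AIV AVB EQR ESS"
    pairs = make_pairs(parse_tickers(raw))
    print(len(pairs))  # 4 tickers -> 6 pairs
    print("\n".join(pairs))
```

Since each output line already has the A-B form, the Excel replace step is not needed; the printed pairs can be saved straight to a text file.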

This gives us the pairs for the Accident & Health Insurance industry. In the same way, we go through all the industries we are interested in. I want to warn you that creating and selecting pairs is long and monotonous work, but afterwards you will learn to tell good pair charts from bad ones. Next, we load the pairs from the text file into thinkorswim by creating a new watchlist. We'll have to view the charts and statistics of thousands of pairs manually, but by processing one industry after another, the task is manageable.

I will not describe thinkorswim registration here; it is simple. At the time of writing, thinkorswim offers a two-month trial period. If that is not enough and you need to renew it, you can register again from a new computer and use the new login and password for the next two months.

To select pairs in thinkorswim, use the daily timeframe and a line chart. Also add a simple moving average with a period of 20 to the chart.

At the first stage of selection, I used two statistical indicators of a pair. The first was the Pearson correlation, which the code below calculates on one-day price changes over 250 bars.

The code for calculating the Pearson correlation for thinkorswim:

# Pearson correlation of one-bar price changes of the pair's two symbols
def period = 250;
plot corr = Correlation(close(GetSymbolPart(1)) - close(GetSymbolPart(1))[1], close(GetSymbolPart(2)) - close(GetSymbolPart(2))[1], period);

For this indicator's chart to display correctly, tick the Left axis option in its settings:

Setting Pearson Correlation

I selected pairs with a correlation of 0.7 or higher, but you can use stricter conditions, for example 0.8 or 0.85. The higher the threshold, the fewer pairs you will find. Moreover, even a pair with fairly good correlation may earn almost nothing.
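To double-check a pair outside thinkorswim, the same statistic can be reproduced in Python. This is a minimal sketch, assuming two lists of daily closes aligned by date (oldest first); the function names are hypothetical and nothing here comes from thinkorswim's API. Like the thinkScript above, it correlates one-bar price changes rather than raw prices, over the last 250 bars.

```python
import math


def price_changes(closes):
    """One-bar changes, close - close[1], as in the thinkScript indicator."""
    return [cur - prev for prev, cur in zip(closes, closes[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a series with no variation has no correlation")
    return cov / math.sqrt(var_x * var_y)


def pair_correlation(closes1, closes2, period=250):
    """Correlation of the last `period` daily price changes of both legs."""
    return pearson(price_changes(closes1)[-period:], price_changes(closes2)[-period:])
```

Under the author's first filter, a pair passes when the result is 0.7 or higher.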

The second indicator I used to select pairs was the level of cointegration of the two stocks in the pair. You can read about cointegration, for example, in this article, which is also where I took the thinkorswim code from. In this code, the level of cointegration is approximated by the average absolute difference between the 50-bar moving averages of the two stocks' prices, each expressed as a percentage change from its value 500 bars ago. The smaller this difference, the more closely the two stocks move together and the more stationary their spread.

The code for calculating the level of cointegration for thinkorswim:

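# Closes of both legs and their values 500 bars back
def Data1 = close(GetSymbolPart(1));
def Data2 = close(GetSymbolPart(2));
def kf = close(GetSymbolPart(1))[500];
def kff = close(GetSymbolPart(2))[500];
# Take the base value as seen from the last bar, so it is constant across the chart
def bn = BarNumber();
def hbn = HighestAll(bn);
def bn_diff = hbn-bn;
def kf1 = GetValue(kf, -bn_diff);
def kf2 = GetValue(kff, -bn_diff);
# Percentage change of each leg from its base
def Data11 = (Data1/kf1-1)*100;
def Data22 = (Data2/kf2-1)*100;
# 50-bar moving averages of the normalized series
def av1 = average(Data11,50);
def av2 = average(Data22,50);
# Average absolute gap between the two averages over 450 bars
def minus = sum(AbsValue(av1-av2),450)/450;
AddLabel(yes, minus, Color.CYAN);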

We check the pair's cointegration score on the daily chart. I selected pairs with a score of 10 or below.
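The score can also be recomputed offline. The sketch below assumes aligned lists of daily closes (oldest first) with at least 501 bars and mirrors the thinkScript step by step: both series are converted to percentage changes from the close 500 bars before the last bar, smoothed with 50-bar simple moving averages, and the absolute gap between the averages is averaged over the last 450 bars. The result is in percentage points. It is a distance measure, not a formal cointegration test, and the function names are hypothetical.

```python
def sma_at(values, length, end):
    """Simple moving average of the `length` values ending at index `end`."""
    return sum(values[end - length + 1:end + 1]) / length


def cointegration_score(closes1, closes2, base_offset=500, ma_len=50, window=450):
    """Mean |SMA50(pct1) - SMA50(pct2)| over the last 450 bars.

    pct = percentage change from the close `base_offset` bars before the last bar.
    """
    n = len(closes1)
    if len(closes2) != n:
        raise ValueError("series must be aligned and of equal length")
    if n < max(base_offset + 1, window + ma_len - 1):
        raise ValueError("not enough history")
    last = n - 1
    base1, base2 = closes1[last - base_offset], closes2[last - base_offset]
    pct1 = [(c / base1 - 1) * 100 for c in closes1]
    pct2 = [(c / base2 - 1) * 100 for c in closes2]
    gaps = [
        abs(sma_at(pct1, ma_len, t) - sma_at(pct2, ma_len, t))
        for t in range(last - window + 1, last + 1)
    ]
    return sum(gaps) / window
```

With the threshold above, a pair passes when `cointegration_score` returns 10 or less.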

Of course, there are more rigorous and more complex tests for the degree of cointegration between time series. From the point of view of mathematics and econometrics, our tools are rather primitive, but they work and help find good pairs for trading. To study this topic further, I would recommend, for example, the pairtradinglab service. It is worth studying the tests they use to select pairs, and some pair data is available there for free.

At the first stage, we go through the watchlist and select pairs by the formal criteria of correlation and cointegration. At the same stage, you can enter the volumes for each pair and immediately assess its chart visually. The volumes of the two legs of a pair differ because the prices and volatilities of the two stocks differ. The code below calculates the volumes from the amount of capital you specify. The default is $1,000; you can change it in the settings.

Code for calculating the volume of a pair for thinkorswim:

declare lower;
# Capital per pair, in dollars
input BP = 1000;
# Daily range of each leg in percent: (high / low - 1) * 100
def vola = ((high(GetSymbolPart(1))/low(GetSymbolPart(1))) - 1) * 100;
def volb = ((high(GetSymbolPart(2))/low(GetSymbolPart(2))) - 1) * 100;
# 50-bar average range
def volaav = SimpleMovingAvg(vola,50);
def volbav = SimpleMovingAvg(volb,50);
# Shares per leg: more volatile or more expensive legs get fewer shares
plot posa = (BP/volaav)/(close(GetSymbolPart(1)));
plot posb = (BP/volbav)/(close(GetSymbolPart(2)));
#plot pair = (close(GetSymbolPart(1))*posa) - (close(GetSymbolPart(2))*posb);

This code calculates the ratio between the two legs correctly, but, unfortunately, the absolute volumes it produces are not always correct.

On the chart, the tools listed above will look like this:

Cointegration and Correlation Indicator

The cointegration score is displayed in the upper left corner, and the correlation value and a chart of its change over time in the upper right. Below the chart, you can see the volumes for the pair based on the specified capital; in the example, the capital is set to $10,000. Once you have finished selecting pairs, you can remove all these indicators and keep only those needed for trading, which we discuss below.
For a pair in your list whose statistical indicators suit you, you can immediately enter volumes in the watchlist based on the capital you plan to allocate to one pair. For example, you change the AIV-AVB pair to AIV*122-AVB*29. Note that the volume ratios change over time, so review the watchlist periodically and adjust the volumes.
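The volume formula can be reproduced in Python to see what it does. In this sketch (function names are hypothetical; rounding to whole shares with Python's `round`, which rounds exact halves to even, is an assumption), each leg gets (BP / average range in %) / close shares, so the dollar value of a leg equals BP divided by its average daily range in percent. That is why the ratio between the legs is meaningful while the absolute volumes match the entered capital only when the average range is about 1%.

```python
def avg_range_pct(highs, lows, length=50):
    """Average of the daily range (high / low - 1) * 100 over the last `length` bars."""
    if len(highs) < length or len(lows) < length:
        raise ValueError("not enough bars")
    ranges = [(h / l - 1) * 100 for h, l in zip(highs[-length:], lows[-length:])]
    return sum(ranges) / length


def leg_shares(bp, range_pct, close):
    """Shares for one leg, as in the indicator: (BP / average range %) / close."""
    return (bp / range_pct) / close


def pair_expression(sym1, shares1, sym2, shares2):
    """Watchlist expression such as AIV*122-AVB*29, rounded to whole shares."""
    return f"{sym1}*{round(shares1)}-{sym2}*{round(shares2)}"
```

For example, with BP = 10,000, a 2% average range, and a $50 price, `leg_shares` gives 100 shares, i.e. $5,000 of exposure rather than $10,000.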

Once you enter the volumes, the pair's chart changes. At the same time, you can make a visual selection of pair charts. You will notice that despite good correlation and cointegration scores, many pairs cannot be traded: pairs with similar statistics can have very different charts.

Consider examples of good and bad charts. The essence of the strategy is that we trade the pair's divergence from its average value. We are interested in how confidently the pair returns to its average after a sharp deviation. Below are three screenshots of pairs with fairly decent charts. You can see that when the pair's chart (white line) deviates quickly and strongly from the simple moving average (yellow line), it returns to it after a while. You just need to define the deviation parameters on which you will base the decision to open positions.







I would also like to show an example of a pair chart that would be a bad choice for trading, for example BK*95-STT*53:

In the BK*95-STT*53 pair, the chart does not quickly return to the mean after a strong deviation. The divergence does not close; it either stalls or keeps widening.

So, you go through the watchlists with pairs from all the industries you are interested in and select pairs that look good both statistically and graphically. It is also useful to review the selected pairs manually by comparing the charts of the two stocks in each pair. Visual selection remains essential in this strategy; it is better not to rely on numbers alone.

As a result, you get a list of pairs. Now we need to set up indicators that help find good entry points. As I wrote above, we trade deviations from the mean, and I use several complementary indicators that track the size of this deviation: RSI, Distance from mean, and Z-Score. Their code is given below. I use RSI on pairs whose history often shows trending moves. I wait for the RSI line to go beyond 70 or 30 and then start returning into the range; RSI shows turning points well. For example, consider a pair of gold and silver futures on the hourly timeframe. The screenshot below shows the hourly chart of the pair 50*/GC-2562*/SI:

Pair GC-SI

Notice the sections of the chart where RSI reaches the upper line. The pair is interesting because it can be traded intraday with relatively small capital. Here I would like to make a small digression and show how positions in a pair are opened, using the gold-silver pair as an example. When the pair's chart is high, we open a short position in GC and a long position in SI. When the pair's chart is low, we buy GC and short SI.

The GC-SI pair is quite difficult to trade and far from ideal statistically, but the choice of instruments among futures is small. In this pair, you need to wait for good setups with strong deviations. This pair often shows trending moves, for example several rising peaks in a row, and RSI filters out such moments well.
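For reference, here is the standard RSI calculation with Wilder smoothing, plus a hypothetical helper for the rule described above (RSI leaves the 30-70 range and then returns into it). I assume thinkorswim's built-in RSI also uses Wilder smoothing by default; if your values differ, check the study's averaging setting. The period of 20 is the one the author uses for RSI, and the signal labels follow the GC-SI example: a high spread means short the first leg and buy the second.

```python
def _rsi_value(avg_gain, avg_loss):
    """RSI from smoothed average gain and loss."""
    if avg_loss == 0:
        return 100.0 if avg_gain > 0 else 50.0
    return 100 - 100 / (1 + avg_gain / avg_loss)


def wilder_rsi(closes, length=20):
    """RSI with Wilder smoothing; one value per bar from closes[length] onward."""
    if len(closes) <= length:
        raise ValueError("need more than `length` closes")
    changes = [cur - prev for prev, cur in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    # Seed with simple averages, then apply Wilder's recursive smoothing
    avg_gain = sum(gains[:length]) / length
    avg_loss = sum(losses[:length]) / length
    values = [_rsi_value(avg_gain, avg_loss)]
    for gain, loss in zip(gains[length:], losses[length:]):
        avg_gain = (avg_gain * (length - 1) + gain) / length
        avg_loss = (avg_loss * (length - 1) + loss) / length
        values.append(_rsi_value(avg_gain, avg_loss))
    return values


def rsi_reentry(values, upper=70.0, lower=30.0):
    """Signal when RSI returns into the lower-upper range after leaving it."""
    if len(values) < 2:
        return None
    prev, cur = values[-2], values[-1]
    if prev > upper >= cur:
        return "short spread"  # spread was high: short leg 1, long leg 2
    if prev < lower <= cur:
        return "long spread"  # spread was low: long leg 1, short leg 2
    return None
```

Applied to the pair's own price series (for example, hourly closes of the spread), `rsi_reentry(wilder_rsi(closes))` returns a signal only on the bar where RSI crosses back into the range.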

The Distance from mean indicator measures the size of the deviation from the mean. For example, if your pair consists of stocks, you see the distance in dollars from the current price to the mean, that is, the potential of the trade. You can also see in the history how far the pair usually deviates and compare that with the current deviation.

The Z-Score indicator shows how far the current price has deviated from the mean, in standard deviations. You can set the period over which the mean is calculated yourself. I set the period of the moving average, RSI, Distance from mean, and Z-Score to 20; you can experiment with other periods and choose what suits the pairs you trade. In Z-Score, I set the bands at plus and minus 3 standard deviations. We are interested in deviations of 2.5 to 3 standard deviations. There are many other indicators that show deviations from the mean, for example Bollinger Bands.

Pick the ones that suit you; you now understand what you are looking for on the chart. RSI is available in the list of standard thinkorswim studies, and you can sort your watchlist by RSI during the trading session. For the Distance from mean and Z-Score indicators, I used code found online.

Distance from mean indicator code for thinkorswim (taken here):

declare lower;
# SMA period; the author sets it to 20 in the settings
input Length=10;
def SMA = Average(Close, Length);
# Distance of the bar's open and close from the moving average
plot plot1 = (Open-SMA);
plot plot2 = (Close-SMA);
plot1.SetDefaultColor(GetColor(5));
plot2.SetDefaultColor(GetColor(2));
plot zeroline = 0;
zeroline.SetDefaultColor(GetColor(1));
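The Distance from mean calculation is easy to reproduce, which helps when comparing the current deviation with the pair's history. In this sketch the helper names are hypothetical, and `deviation_vs_history` (current absolute distance divided by the largest absolute distance in the data) is just one possible way to express how much the pair deviates now compared with usual; it is not part of the original indicator.

```python
def close_distances(closes, length=20):
    """close - SMA(close, length) for every bar where the SMA is defined (plot2)."""
    if len(closes) < length:
        raise ValueError("not enough bars")
    out = []
    for t in range(length - 1, len(closes)):
        sma = sum(closes[t - length + 1:t + 1]) / length
        out.append(closes[t] - sma)
    return out


def open_distance(opens, closes, length=20):
    """open - SMA(close, length) on the last bar (plot1)."""
    return opens[-1] - sum(closes[-length:]) / length


def deviation_vs_history(closes, length=20):
    """Current |distance| as a fraction of the largest |distance| in the data."""
    d = [abs(v) for v in close_distances(closes, length)]
    peak = max(d)
    return 0.0 if peak == 0 else d[-1] / peak
```

For a stock pair quoted in dollars, the values are dollar distances to the mean, and a `deviation_vs_history` result close to 1.0 means the pair is near its largest deviation in the loaded history.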

Z-Score indicator code for thinkorswim (taken here):

#Computes and plots the Zscore
#Provided courtesy of
#Feel free to share the indicator, but please provide a link back to

declare lower;

input price = close;
input length = 20;
input ZavgLength = 20;

#Initialize values
def oneSD = stdev(price,length);
def avgClose = simpleMovingAvg(price,length);
def ofoneSD = oneSD*price[1];
def Zscorevalue = ((price-avgClose)/oneSD);
def avgZv = average(Zscorevalue,20);

#Compute and plot Z-Score
plot Zscore = ((price-avgClose)/oneSD);
# Color constants were lost from the source; green/red is an assumed choice
Zscore.AssignValueColor(if Zscore > 0 then Color.GREEN else Color.RED);

plot avgZscore = average(Zscorevalue,ZavgLength);

#This is an optional plot that will display the momentum of the Z-Score average
#plot momZAvg = (avgZv-avgZv[5]);

#Plot zero line and extreme bands
plot zero = 0;
plot two = 3;
plot negtwo = -3;
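The same Z-Score can be computed in Python for checking pairs offline. This sketch assumes the thinkScript `stdev` is a population standard deviation (`statistics.pstdev`); if thinkorswim uses the sample version, the values will differ slightly. The `zscore_action` helper is hypothetical: it applies the 2.5 threshold mentioned above and the leg directions from the GC-SI example (spread high: short the first leg, long the second).

```python
import statistics


def zscores(prices, length=20):
    """(price - SMA) / stdev over a rolling `length`-bar window, one value per full window."""
    if len(prices) < length:
        raise ValueError("not enough bars")
    out = []
    for t in range(length - 1, len(prices)):
        window = prices[t - length + 1:t + 1]
        sd = statistics.pstdev(window)  # assumption: population standard deviation
        out.append(0.0 if sd == 0 else (prices[t] - statistics.fmean(window)) / sd)
    return out


def zscore_action(z, entry=2.5):
    """Map the latest Z-Score to the pair action."""
    if z >= entry:
        return "short leg 1, long leg 2"
    if z <= -entry:
        return "long leg 1, short leg 2"
    return "no trade"
```

Feeding the pair's price series (for example, closes of AIV*122-AVB*29) into `zscores` and passing the last value to `zscore_action` gives the direction to consider, not a complete entry rule.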

Finally, I would like to talk about some of the finer points of trading pairs. After the pair's chart deviates from the mean, wait until it starts turning back in the opposite direction. If you trade on the daily chart, you can use, for example, the hourly chart to confirm the turn.
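The turn rule can be sketched as a check on the last two values of a deviation measure, for example the Z-Score computed on hourly bars. This is a simplified, hypothetical version: it treats a single bar moving back toward the mean after reaching the 2.5 threshold as the turn, whereas in practice you judge the bend on the chart.

```python
def turned_back(z_values, entry=2.5):
    """True when the previous bar was beyond +/-entry and the latest bar moved back toward the mean."""
    if len(z_values) < 2:
        return False
    prev, cur = z_values[-2], z_values[-1]
    if prev >= entry and cur < prev:
        return True  # was stretched upward, now falling back
    if prev <= -entry and cur > prev:
        return True  # was stretched downward, now rising back
    return False
```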

During earnings season, check when each stock in a pair reports. If one stock has reported, its price has changed sharply, and the divergence in the pair has widened dramatically while the other stock's report is still far away, you can trade this divergence on the expectation that it will converge. Also, if one of the stocks in a pair reports tomorrow, close your positions in that pair.

There is also a subtlety in how to enter a pair and which orders to use. First, try to open the short leg with a limit order, and once it is filled, buy the other leg with a market order. The limit order saves on the bid-ask spread, and we open the short first to make sure the stock is actually available to short. You do not want to end up long the first stock of a pair without being able to short the second because, for some reason, it cannot be shorted at that moment.

In conclusion, I want to give you a list of pairs of financial sector stocks that I have compiled. Both stocks in each pair come from the same industry. I took stocks with an average trading volume of at least 400,000 shares per day and selected pairs with a Pearson correlation of at least 0.7 and a cointegration score below 10 (I also recommend checking pairs with the Dickey-Fuller test). The volumes in each pair were calculated for a capital of $10,000. Create a thinkorswim watchlist with these pairs and select those you think are most suitable for trading. I intentionally left some pairs with less attractive charts in the list. Select a few pairs that you understand, set up the indicators mentioned above, and start trading.
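For the Dickey-Fuller check recommended above, here is a minimal sketch of the basic (non-augmented) test with a constant, applied to the dollar spread of a pair with fixed volumes. It regresses the one-bar change of the spread on its previous value and returns the t-statistic of the slope; a value below roughly -2.86 (the approximate 5% asymptotic critical value for this case) suggests the spread is stationary. This is a simplification: it uses no lagged differences, and if you estimate the hedge ratio by regression instead of fixing it, the stricter Engle-Granger critical values apply. For real use, the `adfuller` function in statsmodels implements the augmented test with p-values. The function names are hypothetical, and the series in the example is synthetic, not market data.

```python
import math
import random


def pair_spread(closes1, closes2, shares1, shares2):
    """Dollar spread of a pair such as AIV*122-AVB*29."""
    return [a * shares1 - b * shares2 for a, b in zip(closes1, closes2)]


def dickey_fuller_t(spread):
    """t-statistic of gamma in d(s_t) = alpha + gamma * s_{t-1} + e_t (no lagged terms)."""
    x = spread[:-1]  # s_{t-1}
    y = [b - a for a, b in zip(spread, spread[1:])]  # s_t - s_{t-1}
    n = len(y)
    if n < 3:
        raise ValueError("spread is too short")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    if sxx == 0:
        raise ValueError("spread is constant")
    gamma = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - gamma * mx
    rss = sum((b - alpha - gamma * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit; t-statistic undefined")
    return gamma / se


# Approximate 5% asymptotic critical value for the Dickey-Fuller test with a constant
DF_CRITICAL_5PCT = -2.86


if __name__ == "__main__":
    rng = random.Random(1)
    s = [0.0]
    for _ in range(499):
        s.append(0.5 * s[-1] + rng.gauss(0, 1))  # synthetic mean-reverting spread
    t = dickey_fuller_t(s)
    print(round(t, 2), t < DF_CRITICAL_5PCT)
```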

