In this article we look at some of the issues and risks in identifying trading strategies suggested by the data.
There are two main ways of identifying trading strategies:
- Unique hypotheses
- Strategies suggested by data
Unique hypotheses are usually based on fundamentals. For example, if bond yields rise, then price of gold falls, could be part of such hypothesis. However, some hypotheses that are considered unique have essentially been deduced from the analysis of historical data. Therefore, it is hard to determine whether a hypothesis is unique or not.
In addition, even hypotheses based on price action can be classified as unique. For example, VIX falls on the average when the S&P 500 rises, could be considered a unique hypothesis but this could be the result of analyzing historical data. In general, in the era of big data, it is hard to formulate hypotheses that cannot be deduced from the analysis of price series. On the other hand, price series analysis may fail to identify some hypotheses that have economic value for trading due to lack of domain specific context. As it turns out, some of these hypotheses may have value in developing trading and tactical allocation strategies.
Due to the difficulties in identifying unique hypotheses, quant traders have resorted to data-mining and machine learning for hypotheses suggested by data. The hope is that by mining large amounts of data some hypotheses will be discovered that have economic value in trading and even investing. This has been the prevailing mode of operation in quant space in the last 20 years but in recent years due to availability of machine learning code and libraries this process has intensified. However, the effectiveness and robustness of strategies suggested by data are always highly questionable due to data-mining bias.
Data-mining bias is always present when some process is used to identify strategies suggested by the data. In Chapter 6 of my book Fooled by Technical Analysis: The perils of charting, backtesting and data-mining, I include three main sources of data-mining bias: curve-fitting, selection bias and data snooping.
Curve-fitting involves changing the parameters of a model until desired performance is realized, selection bias is due to rejecting failed models and accepting those that pass a set of criteria and data snooping is due to reuse of data.
Even if out-of-sample validation is used, due to multiple comparisons (testing multiple hypotheses) as the number of tests increases, the probability of finding a random strategy that passes out-of-sample tests but also any other validation tests converges to 1. In fact, as it turns out, frequent backtesting guarantees a strategy that passes all validation tests but it may be random. Is there a solution to these problems?
There may be a solution that depends on how unique is the application of the data-mining process in an analogous ways as when searching for unique hypotheses. It is highly unlikely there is an algo that can generate a winning strategy with a click of a button given the restrictions imposed from the inevitable data-mining bias.
This is why quant trading is hard. The edge in actually in finding unique ways of using data-mining. I have seen that working with people who have used data-mining software I have developed as far back as in early 2000s. There were two types of users: those who expected the program to find something automatically and those who found ways of using the program to discover things others could not. After all, all programs are just analysis tools unless a claim is made that they can generate automatically a usable strategy.
Below are some factors that limit the effectiveness of algos that discover strategies suggested by the data.
1.Some markets are too efficient
This is what we state in our website:
“[The program] will not find strategies that work in all markets especially if the markets are efficient and price action is dominated by noise.”
Some markets are too efficient and signal-to-noise ratio is close to 0. There is no way to discover any strategies suggested by the data and in that case any generated by the algos are over-fitted on noise.
2. Market prices are non-stationary
Non-stationarity can break trading strategies but it is also the main source of profits. If market conditions do not change much, the strategies may continue to work. But if market conditions change significantly, strategies usually fail. If prices were stationary, there would be no profit potential due to lack of tradable trends in any timeframe.
3. Validation becomes part of the process
Any validation scheme eventually becomes a part of the process and loses its effectiveness due to data snooping. Novel ways of validating strategies suggested by data must be used to minimize the effects of data snooping.
Below are some typical mistakes of quants trying to identify strategies suggested by data.
4. No algo will find gold where there is none
It is better to try to find markets that have inefficiencies rather than trying to force an algo on a market that is highly efficient, such as E-mini futures for example.
5. Searching for strategies with high win rate
Win rate must be high enough and usually more than 50% but there is a trade-off in the trading: the higher the win rate, the lower the sample size becomes. It is better to have a large sample size than a high win rate if the trade-off is present. In fact, quants need only a small edge to make money.
6. Underestimating the learning curve
Data-mining should not be used as primary source of strategies because the learning curve is steep and usually in the order of 2 – 5 years depending on time spent. With data-mining one is trying to identify strategies to add to a set of already used ones, not as a fundamental process. It is something quants would use as another source of ideas, for example a machine doing the data mining while they are trading. Some quant funds failed because they relied primarily on data mining. In fact, data mined strategy allocation should be small. The objective is to have a few that are uncorrelated to provide some diversification.
7. If there is no success, that is actually success
If a quant cannot find any strategies suggested by the data based on the validation schemes used, this is actually success, not failure. It may mean that the market is more suitable for buying and holding or some strategic allocation but not for market timing.
Finally, in the Trader Education section of the website there are several free articles under Perils of backtesting and machine learning that provide more details about this subject.
If you have any questions or comments contact us.