Trying to discover an edge by randomly backtesting ideas is like looking for a needle in a haystack. More than 25 years have passed since backtesting software became available to retail traders, yet finding an edge remains difficult because of data-mining bias and related process pitfalls.
The fundamental problem with backtesting for the purpose of finding an edge in the markets is that it introduces a dangerous form of data-mining bias caused by reusing the same data to test many different hypotheses. When one finally discovers what appears to be an edge that even validates on out-of-sample data, the result may be curve-fitting in the in-sample period combined with accidental good performance in the out-of-sample period.
Things become much worse when many combinations of indicators, exit strategies and performance metrics are used, because the data-mining bias increases rapidly as a function of their number. Even with a few combinations of indicators and exit strategies, the probability of finding an algo that passes all validation tests in both the in-sample and the out-of-sample after continuous backtesting is, for all practical purposes, near 1.
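The arithmetic behind this claim can be illustrated with a toy Monte Carlo sketch (the function names, the coin-flip return model and the naive "positive total return" pass criterion are my own simplifications, not anything from an actual backtesting product): if a single worthless strategy has some probability p of passing both an in-sample and an out-of-sample test by luck, then after N independent trials the probability that at least one passes is 1 − (1 − p)^N, which approaches 1 quickly.

```python
import random

random.seed(7)

def random_strategy_returns(n_days):
    # A "strategy" with no real edge: daily returns are pure coin flips.
    return [random.choice((-1.0, 1.0)) for _ in range(n_days)]

def passes(returns):
    # Naive validation: the strategy "passes" if its total return is positive.
    return sum(returns) > 0.0

def probability_of_false_discovery(n_trials, n_days=250, n_experiments=1000):
    """Estimate the chance that at least one of n_trials worthless
    strategies passes both an in-sample and an out-of-sample test."""
    hits = 0
    for _ in range(n_experiments):
        for _ in range(n_trials):
            in_sample = random_strategy_returns(n_days)
            out_of_sample = random_strategy_returns(n_days)
            if passes(in_sample) and passes(out_of_sample):
                hits += 1
                break  # one false discovery is enough
    return hits / n_experiments

for n in (1, 10, 100):
    print(n, "trials:", probability_of_false_discovery(n))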
System developers who use backtesting programs may think they are improving their chances of finding an edge by repeatedly trying new ideas on historical data, when in fact each subsequent trial increases the data-mining bias and diminishes their chances of true success. After a few years of using such programs, the probability that all systems found are worthless is, for all practical purposes, 1, unless there is a drastic paradigm shift in the way the user approaches the problem, combined with a solid understanding of the issues involved and of how to deal with them effectively.
The process described above, which basically involves a human developer and backtesting software, has in recent years been automated using genetic programming and various types of evolutionary algorithms. Essentially, this amounts to software that combines indicators with exit signals according to evolutionary principles and repeats the process until some performance metrics are optimized. Instead of taking years to accumulate data-mining bias, these programs achieve it in a few minutes, and the end result is the same: repeated use of historical data with many combinations of indicators, exit strategies and performance metrics guarantees that some system(s) will eventually pass all validation tests while reflecting spurious correlations that disappear when market conditions change. Such systems possess no intelligence in generating entry and exit signals; they are merely artifacts of curve-fitting on in-sample data that got lucky on the out-of-sample data.
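The essence of such automated searches can be sketched in a few lines. This is a deliberately crude stand-in, not a depiction of any actual product: I use a moving-average crossover grid as the "indicator combinations" and a synthetic random walk as the data, so any edge the search finds is spurious by construction. The point is that exhaustively searching parameters on the same in-sample data reliably produces an impressive-looking in-sample result on pure noise.

```python
import random

random.seed(3)

def random_walk(n):
    # Synthetic prices with no exploitable structure.
    prices = [100.0]
    for _ in range(n):
        prices.append(prices[-1] + random.gauss(0.0, 1.0))
    return prices

def sma(prices, length):
    # Simple moving average; None until enough data is available.
    head = [None] * (length - 1)
    tail = [sum(prices[i - length + 1:i + 1]) / length
            for i in range(length - 1, len(prices))]
    return head + tail

def backtest(prices, fast, slow):
    # Long one unit while fast SMA > slow SMA, flat otherwise.
    f, s = sma(prices, fast), sma(prices, slow)
    pnl = 0.0
    for i in range(1, len(prices)):
        if f[i - 1] is not None and s[i - 1] is not None and f[i - 1] > s[i - 1]:
            pnl += prices[i] - prices[i - 1]
    return pnl

# An "evolutionary" search reduced to its essence: score many parameter
# combinations on the same in-sample data and keep the best one.
in_sample = random_walk(500)
out_of_sample = random_walk(500)

candidates = [(fast, slow) for fast in range(2, 15) for slow in range(20, 50)]
best = max(candidates, key=lambda c: backtest(in_sample, *c))

print("best parameters      :", best)
print("best in-sample P&L   :", round(backtest(in_sample, *best), 1))
print("its out-of-sample P&L:", round(backtest(out_of_sample, *best), 1))
```

Running this typically shows a strong in-sample result whose out-of-sample performance is unrelated, since there was never anything to find; scale the grid up to millions of combinations and the in-sample numbers only look better.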
Some developers of quant software even claim that they can test trillions of system combinations until one meets the search objectives, while underestimating, or being ignorant of, the impact of data-mining bias. Traders who use such products without understanding these realities of repeated backtesting with many degrees of freedom become victims of their own ignorance and have no chance of ever finding a true edge unless they get to the bottom of the problem; but that requires another edge in the form of an understanding of what must be done to avoid the pitfalls of such processes.
In the mid-1990s, when I was investigating automated methods of searching for an edge, I was aware of some of these pitfalls and also of problems in the commercial backtesting programs of the time that probably ended up costing their users fortunes, mainly due to code limitations that produced fallacious results. Many users of such programs also mistook forward-looking algorithms applied to historical data for intelligent prediction systems. After exhaustive tests and many months of work, I decided that the system I would build to search for edges would deal only with pure price action, simple exit conditions and a few fundamental performance metrics. This is how Deep Learning Price Action Lab™ was born, with Occam’s razor as a guiding principle.