Results obtained from machine learning are in many cases fitted to the price series. It is often hard to differentiate the few potentially good systems from the very large number of random and curve-fitted ones. The data-mining bias introduced by repeatedly using the same in-sample data to search for profitable combinations of features and associated strategies almost guarantees that, eventually, some random system(s) will pass the out-of-sample test(s) by luck alone. When market conditions change, the performance of these system(s) deteriorates fast. Essentially, those who choose the best performer(s) out of a large number of candidates generated by repeated use of the in-sample data are often fooled by randomness and data-mining bias.
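The effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: every "system" is pure noise (zero-mean Gaussian per-bar returns), and the function name `count_lucky_systems`, the sample lengths, the 1% per-bar volatility and the 10% in-sample threshold are all hypothetical choices made for illustration, not values taken from any real search process.

```python
import random


def count_lucky_systems(n_systems=1000, n_in=250, n_out=250,
                        min_in_return=0.10, seed=42):
    """Count zero-edge systems that pass an in-sample filter and an out-of-sample check.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    passed_in = 0
    passed_both = 0
    for _ in range(n_systems):
        # Each "system" is pure noise: zero-mean per-bar returns, so it has no edge.
        # Returns are simply summed (no compounding) to keep the sketch short.
        in_return = sum(rng.gauss(0.0, 0.01) for _ in range(n_in))
        out_return = sum(rng.gauss(0.0, 0.01) for _ in range(n_out))
        if in_return >= min_in_return:   # selected during the in-sample search
            passed_in += 1
            if out_return > 0.0:         # "validated" out-of-sample by luck alone
                passed_both += 1
    return passed_in, passed_both


if __name__ == "__main__":
    selected, validated = count_lucky_systems()
    print(f"{selected} random systems passed in-sample; "
          f"{validated} of them were also profitable out-of-sample")
```

Because the out-of-sample returns are independent of the in-sample ones, roughly half of the selected noise systems should end up profitable out-of-sample. The more candidates the search produces, the more flukes survive the test.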
Below are some criteria one can use to minimize the possibility of ending up with a random system as a result of selection and data-mining biases:
(1) The underlying process that generated the equity curve must be deterministic. If randomness and stochasticity are involved, and the system with the best equity curve is different each time the process runs, then there is a high probability that the process is not reliable or is not based on sound principles. The justification for this is that a large number of edges can’t exist in a market, so most of those systems must be flukes. DLPAL employs a unique deterministic machine learning algorithm: each time it uses the same data with the same parameters, it generates the same output.
(2) The system must be profitable in an out-of-sample test with a small profit target and stop-loss placed just outside the 1-bar volatility range. If not, then the probability that the system possesses no intelligence in timing entries is very high. This is because a large class of exits, such as trailing stops, curve-fit performance to the price series. If market conditions change in the future, the system will fail.
(3) The system must not involve indicators with parameters that can be optimized to get the final equity curve. If there are such parameters, then the data-mining bias increases with the number of parameters involved, making it extremely unlikely that the system possesses any intelligence, because it is most probably fitted to the data.
(4) If the results of an out-of-sample test are used to reset the machine learning process and to start a fresh run, data-snooping bias is introduced. In this case, validation in an out-of-sample beyond the first run is useless because the out-of-sample has already become part of the in-sample. If an additional forward sample is used, then this reduces to the original in-sample design problem, with the possibility that the performance in the forward sample was obtained by chance.
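Criterion (2) can be sketched in code as follows. This is a minimal illustration, not DLPAL's method. It assumes that the "1-bar volatility range" is the average of (high - low) / close over a recent window, that bars are `(open, high, low, close)` tuples, and that the stop is counted first when both exit levels fall inside the same bar. The function names and the 1.1 multiplier in the usage comment are hypothetical.

```python
def one_bar_volatility(bars, lookback=20):
    """Average (high - low) / close over the last `lookback` bars.

    This definition of the 1-bar volatility range is an assumption.
    """
    recent = bars[-lookback:]
    return sum((high - low) / close for (_, high, low, close) in recent) / len(recent)


def simulate_long_trade(entry_price, future_bars, target_pct, stop_pct):
    """Fractional P&L of a long trade exited only by a fixed target or stop."""
    if not future_bars:
        return 0.0
    target = entry_price * (1 + target_pct)
    stop = entry_price * (1 - stop_pct)
    for (_, high, low, _close) in future_bars:
        # Conservative assumption: if both levels are touched in one bar,
        # the stop-loss is counted first.
        if low <= stop:
            return -stop_pct
        if high >= target:
            return target_pct
    # Neither level was hit: exit at the last available close.
    return future_bars[-1][3] / entry_price - 1


# Hypothetical usage: place both exits 10% outside the measured 1-bar range.
# vol = one_bar_volatility(history)
# pnl = simulate_long_trade(entry, future, 1.1 * vol, 1.1 * vol)
```

With exits this tight and symmetric, the outcome of each trade depends almost entirely on the entry. If a system is not profitable out-of-sample under such exits, its entries probably carry no timing information.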