AI's Risk Prediction: Was Its Trading Logic a Placebo?

A beginner-friendly summary of the verification: “AI’s Risk Prediction: Was Its Trading Logic a Placebo?”.

What’s the idea?

Most algorithmic trading strategies (Expert Advisors, or EAs) focus on predicting which way the market will go: up or down. But what if we shifted our focus from direction to risk? That was the core idea behind this experiment. Instead of trying to predict future price movements, we aimed to predict future volatility (how much prices are likely to swing). If we know how risky the market is likely to be, we can adjust our trade size, or “leverage,” accordingly. High predicted risk? We take a smaller position. Low predicted risk? Maybe we can safely increase our leverage. This is called “leverage optimization,” and the goal is to smooth out returns and improve risk-adjusted performance.

To do this, we turned to Machine Learning (ML), specifically LightGBM, an efficient gradient boosting framework. Think of it as a super-smart pattern recognition tool, designed to find hidden relationships in data.
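To make the sizing idea concrete, here is a minimal sketch of an inverse-volatility rule: leverage is a target volatility divided by the forecast, with a cap. The function name, the 10% target and the 3x cap are illustrative assumptions, not values from the experiment, and the article does not state which exact rule was used.

```python
def leverage_from_forecast(predicted_vol, target_vol=0.10, max_leverage=3.0):
    """Scale position size inversely to the forecast volatility.

    The parameter names and default values are illustrative assumptions,
    not settings taken from the experiment.
    """
    if predicted_vol <= 0:
        raise ValueError("predicted_vol must be positive")
    return min(target_vol / predicted_vol, max_leverage)


# High forecast risk -> smaller position; low forecast risk -> larger one.
calm = leverage_from_forecast(0.05)      # target / forecast = 0.10 / 0.05
stressed = leverage_from_forecast(0.20)  # target / forecast = 0.10 / 0.20
very_calm = leverage_from_forecast(0.01) # would be 10x, so the cap applies
```

The cap matters in practice: without it, a very low forecast would push leverage arbitrarily high.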

How I tested it

So, what kind of information did our LightGBM model use to predict future risk?

Features (Inputs): We fed the model several key indicators:
  Trailing Volatility (rv5-60): This measures how volatile the market has been over the past 5 to 60 days. It’s a fundamental measure of recent price swings.
  Vol-of-Vol: This looks at how much the volatility itself is changing. Is the market becoming more or less volatile?
  Momentum: The speed and strength of recent price changes.
  Drawdown: How far prices have fallen from recent highs, indicating periods of stress.
  US500 Distance: The relationship between the FX pair and the S&P 500 index, a broad market indicator, which can sometimes provide clues about risk appetite.
Target (What it predicted): The model’s goal was to predict the realized volatility (the actual volatility that occurred) over the next 1 to 20 periods (t+1 to t+20).
Methodology:
  Walk-Forward Optimization: We didn’t just train the model once and test it. Instead, we continuously re-trained it on new, incoming data and tested it on the very next, unseen data. This mimics how an EA would operate in real-time, making it a much more realistic backtest.
  Embargo and Leak Prevention: Crucially, we implemented strict measures to prevent “data leakage.” This means ensuring the model never accidentally “sees” future data during training, which would give it an unfair, unrealistic advantage. Imagine peeking at the answers before a test – it would make the results look great but wouldn’t reflect your actual knowledge!
  Initial Test Period: Our first run covered the period from 2019 to 2025.
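
The target and the walk-forward setup above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the experiment: the function names, the window sizes and the root-mean-square definition of realized volatility are all hypothetical choices. The key point is the embargo: because the label at row t looks ahead to t+20, the last 20 training rows before each test window have to be dropped.

```python
import math


def forward_realized_vol(returns, t, horizon=20):
    """Realized volatility over t+1 .. t+horizon (root mean squared return).

    This is one common definition; the article does not say which one it used.
    """
    window = returns[t + 1 : t + 1 + horizon]
    if len(window) < horizon:
        raise ValueError("not enough future data for a full horizon")
    return math.sqrt(sum(r * r for r in window) / horizon)


def walk_forward_splits(n_samples, train_size, test_size, horizon=20):
    """Yield (train_indices, test_indices) for a rolling walk-forward backtest.

    The label at row t uses data up to t + horizon, so a gap of `horizon`
    rows is left between the end of training and the start of testing.
    Without it, training labels would overlap the test period.
    """
    start = 0
    while start + train_size + horizon + test_size <= n_samples:
        train_end = start + train_size
        test_start = train_end + horizon  # embargo gap
        test_end = test_start + test_size
        yield range(start, train_end), range(test_start, test_end)
        start += test_size  # roll forward by one test window


# Hypothetical sizes: 1000 rows, 500-row training window, 100-row test window.
splits = list(walk_forward_splits(n_samples=1000, train_size=500, test_size=100))
first_train, first_test = splits[0]
```

In a real run, a model would be fitted on each training range and its predictions on the matching test range would be stitched together into one out-of-sample series.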

What happened?

Initial, “Too Good To Be True” Results

When we first ran the test for the 2019-2025 period, the results were incredibly exciting! The ML-driven risk sizing strategy achieved a monthly return of +3.12% while keeping drawdowns (peak-to-trough declines) under a 10% limit. To put that in perspective, a non-optimized, baseline strategy (like a simple buy-and-hold or a strategy without this dynamic leverage) during the same period only managed +1.16%. In other words, our ML risk sizing seemed to be delivering about 2.7 times better monthly returns! It looked almost too good to be true… and as it turns out, it was.

The Robustness Check: Placebo Effect Strikes!

This is where the real learning began. Whenever results seem “too good,” it’s time for rigorous validation. We subjected the strategy to two crucial robustness checks:

(a) Extended Test Period: First, we extended the test period one year further back, so that it ran from 2018 to 2025. The monthly return immediately shrank to +1.64%. Still positive, but a significant reduction from the initial +3.12%. This was the first red flag.

(b) The Placebo Test: This was the critical step. We introduced a “placebo” strategy. Instead of using the ML model’s actual risk predictions to adjust leverage, we simply randomly shuffled those predictions. The strategy still applied varying leverage, but the specific sizes chosen were based on zero useful information from the ML model. It was like giving a patient a sugar pill instead of real medicine: any perceived improvement would be due to other factors, not the medicine itself.

And here’s the shocking part: this placebo strategy, using completely random and meaningless predictions for sizing, achieved a monthly return of +1.44%! Think about that for a moment: a strategy relying on random noise for its leverage decisions performed almost identically to our sophisticated Machine Learning model (+1.64% vs. +1.44%).

This revealed a crucial insight: the vast majority of the “edge” we thought we had (whether the initial +3.12% or the later +1.64%) was merely a “side effect of the leverage distribution.” In other words, simply applying any varying leverage, even random leverage, within a similar distribution of sizes yielded most of the benefit. The actual improvement contributed by our ML model’s predictions was a tiny +0.20 percentage points per month (1.64% - 1.44%). That’s so small that it’s essentially noise, not a robust, tradable edge.
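
The placebo check described above can be sketched as follows. This is a simplified illustration, not the experiment’s code: it shuffles the leverage series directly (equivalent to shuffling the predictions when leverage is computed from each prediction independently), and it uses the plain average leveraged return as a stand-in for the drawdown-limited monthly return. The function names and the number of shuffles are assumptions.

```python
import random
import statistics


def mean_return(asset_returns, leverage):
    """Average per-period return of the leveraged strategy."""
    return statistics.fmean([l * r for l, r in zip(leverage, asset_returns)])


def placebo_test(asset_returns, leverage, n_shuffles=1000, seed=0):
    """Compare the real sizing against shuffled copies of the same sizes.

    Shuffling keeps the distribution of leverage values but destroys their
    timing, so whatever return survives does not come from the forecasts.
    Returns (real, placebo_mean, edge, p_value).
    """
    rng = random.Random(seed)
    real = mean_return(asset_returns, leverage)
    placebo = []
    for _ in range(n_shuffles):
        shuffled = list(leverage)
        rng.shuffle(shuffled)
        placebo.append(mean_return(asset_returns, shuffled))
    placebo_mean = statistics.fmean(placebo)
    # Share of shuffles that did at least as well as the real sizing.
    p_value = sum(p >= real for p in placebo) / n_shuffles
    return real, placebo_mean, real - placebo_mean, p_value


# Toy data in which the sizing is perfectly timed: big on up moves, small on down moves.
toy_returns = [0.01, -0.01] * 50
toy_leverage = [2.0, 0.5] * 50
real, placebo_mean, edge, p_value = placebo_test(toy_returns, toy_leverage, n_shuffles=200)
```

In this toy case the real sizing clearly beats its shuffled copies. In the experiment, the corresponding gap was only 0.20 percentage points, which is why the edge was judged not to be robust.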

What I learned

The ultimate conclusion from this experiment is clear: our ML risk sizing strategy, despite its initial promise, proved not to be robust. It “peeled away” (as we say in Japanese, 剥落) when subjected to stringent placebo testing.

Why did this happen? It seems that for predicting future volatility, simpler metrics like trailing volatility (past volatility) already capture most of the useful information. The complex LightGBM model, while powerful, didn’t add a significant new “edge” on top of what was already apparent from basic historical data. This mirrors findings from a previous study (Research #34), where we saw that ML models predicting direction often “rediscover” existing trends rather than finding genuinely new ones. This was essentially the sizing version of that same phenomenon: no big, new edge was found. Our existing v1.4.0 strategy, which doesn’t use this ML risk sizing, remains our most robust and best-performing approach.

The crucial lesson here is a powerful one for anyone building EAs: comparing monthly returns with a fixed drawdown limit (like our “DD10% month” metric) can be highly misleading, especially when you’re adding a “sizing overlay” that dynamically adjusts leverage. The very act of adjusting leverage can introduce enough noise to make a random strategy look profitable! Therefore, you must always, always, always test your strategies with a placebo and across multiple time periods. Without both checks, you can’t tell whether your strategy has a robust, genuine edge or whether you’re just chasing a mirage created by random chance and the mechanics of leverage.
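
For readers unfamiliar with drawdown-limited comparisons, here is a minimal sketch of one way a metric like “DD10% month” could be computed. The article does not define the metric precisely, so this is an assumed reading: find the constant leverage multiplier at which the maximum drawdown equals 10%, then report the average return at that multiplier. The function names are hypothetical, and the search assumes that drawdown grows as leverage grows, which holds for typical return series.

```python
def max_drawdown(period_returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in period_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def return_at_dd_limit(period_returns, dd_limit=0.10, iterations=60):
    """Return (multiplier, mean return) with max drawdown scaled to dd_limit.

    Assumed interpretation of a drawdown-limited return metric, not the
    article's exact definition.
    """
    if max_drawdown(period_returns) == 0.0:
        raise ValueError("series has no drawdown, so no finite multiplier exists")
    lo, hi = 0.0, 1.0
    # Grow the upper bound until the drawdown limit is reached or exceeded.
    while max_drawdown([hi * r for r in period_returns]) < dd_limit:
        hi *= 2.0
    # Bisect for the multiplier whose drawdown sits at the limit.
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if max_drawdown([mid * r for r in period_returns]) < dd_limit:
            lo = mid
        else:
            hi = mid
    scaled = [lo * r for r in period_returns]
    return lo, sum(scaled) / len(scaled)


# Hypothetical return series, purely for illustration.
example_returns = [0.02, -0.05, 0.03, -0.04, 0.05]
multiplier, mean_ret = return_at_dd_limit(example_returns)
```

Under this reading, the reported return depends on how much overall leverage the drawdown limit allows, so a placebo run through the same procedure is the fair comparison.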

How this connects

This verification builds on earlier ones: what failed before, what I tried this time, and how the approaches compare.

AI Unlocks FX Secrets: Machine Learning Discovers Hidden…