XGBoost ML
Machine learning predicts price direction using dozens of market features
Risk
High
Holding Period
5 days (rebalanced periodically)
Best For
Finding non-linear patterns across many features
How it works
Mathematical Foundation
P(up) = XGBoost(features_t)
signal = 1 if P(up) > 0.60, −1 if P(down) > 0.60
Signal Generation Logic
1. Extract 20+ technical features: RSI, MACD, Bollinger Band position, ATR, volume ratios, lagged returns (1d, 5d, 21d)
2. Label each bar with the 5-day forward return sign (+1 up, −1 down, 0 neutral)
3. Train XGBoost classifier on a 2-year rolling window using walk-forward cross-validation
4. Predict the probability of an up/down move over the next 5 days for the current bar
5. Enter long if P(up) > 0.60, short if P(down) > 0.60, flat otherwise
6. Re-train the model every 21 bars (monthly) on expanded data
7. Apply ATR-based stop loss on each position
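As a rough illustration of the labeling and entry steps above, the sketch below builds 5-day forward return labels and maps predicted probabilities to a position. The function names and the `neutral_band` parameter are hypothetical; the text does not say how wide the neutral class is, so the default treats only an exactly zero return as neutral.

```python
from typing import List, Optional


def forward_return_labels(closes: List[float], horizon: int = 5,
                          neutral_band: float = 0.0) -> List[Optional[int]]:
    """Label each bar with the sign of its forward return over `horizon` bars.

    Returns +1 (up), -1 (down), 0 (neutral), or None for the last `horizon`
    bars, whose forward return is not yet known.
    `neutral_band` is an assumed dead zone, not a value from the source.
    """
    labels: List[Optional[int]] = []
    for i, price in enumerate(closes):
        if i + horizon >= len(closes):
            labels.append(None)  # future price unavailable: do not train on it
            continue
        fwd_ret = closes[i + horizon] / price - 1.0
        if fwd_ret > neutral_band:
            labels.append(1)
        elif fwd_ret < -neutral_band:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def signal_from_probabilities(p_up: float, p_down: float,
                              threshold: float = 0.60) -> int:
    """Map class probabilities to a position: +1 long, -1 short, 0 flat."""
    if p_up > threshold:
        return 1
    if p_down > threshold:
        return -1
    return 0
```

Bars labeled `None` have no known outcome yet, so they would be excluded from any training set.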
Parameters Explained
train_window
Rolling training window. Shorter windows are more adaptive but noisier; longer windows are more stable but may include outdated patterns.
Default: 2 years
horizon
Forward return prediction horizon in days. 5 days balances noise reduction with responsiveness to new market conditions.
Default: 5
confidence_threshold
Minimum probability needed to trigger a trade. Higher threshold means fewer but higher-confidence signals. Below 0.55 is likely noise.
Default: 0.6
When It Works
When there are genuine non-linear relationships in the feature data that simpler models cannot capture. Works best in markets with repetitive technical patterns and sufficient historical data for training. The walk-forward retraining helps it adapt to changing market conditions.
When It Fails
When market microstructure changes fundamentally (e.g., new regulation, exchange rule changes, post-COVID structural shifts). Also prone to overfitting when training data is limited. The model cannot predict true black swan events because they have no historical precedent.
Risks & Limitations
- Overfitting: the model may learn noise in historical data rather than genuine patterns
- Feature leakage: improperly constructed features can introduce look-ahead bias
- Walk-forward training can still be biased if the test window is too short
- Model decay: performance degrades as market conditions drift away from training distribution
- Black box risk: hard to interpret why a trade was entered, making risk management difficult
- High computational cost for retraining every 21 bars on large datasets
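One common guard against the look-ahead risk listed above is to leave a gap between each training window and its test window. Because labels use a 5-day forward return, the last 5 training bars would otherwise carry labels computed from prices inside the test window. The sketch below is an assumption about how such a split could look, not a description of this strategy's actual validation code; the function name and window sizes are hypothetical.

```python
from typing import Iterator, List, Tuple


def purged_walk_forward_splits(n_bars: int, train_size: int, test_size: int,
                               horizon: int = 5
                               ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in time order.

    The last `horizon` bars before each test window are dropped from the
    training indices, because their forward-return labels depend on prices
    that fall inside the test window.
    """
    start = 0
    while start + train_size + test_size <= n_bars:
        test_start = start + train_size
        train_idx = list(range(start, test_start - horizon))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx
        start += test_size  # roll the window forward by one test block


for train_idx, test_idx in purged_walk_forward_splits(30, 15, 5, horizon=5):
    print(train_idx[0], train_idx[-1], "->", test_idx[0], test_idx[-1])
```

Every training label in a split is then fully determined before the first test bar.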
Implementation
Uses the XGBClassifier class from the xgboost Python library. Features are computed with the pandas TA (technical analysis) library. Walk-forward validation splits the training window into 5 folds for hyperparameter tuning. The model is retrained monthly (every 21 bars), and predictions are recalculated daily between retrains.
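The retrain-monthly, predict-daily loop described above might be structured as in the sketch below. It is a minimal illustration under stated assumptions: `walk_forward_probabilities` and `UpFrequencyModel` are hypothetical names, a 2-year window is taken as 504 daily bars, and the model is injected through a factory so the example runs without xgboost installed.

```python
from typing import Callable, List, Sequence


def walk_forward_probabilities(make_model: Callable[[], object],
                               X: Sequence[Sequence[float]],
                               y: Sequence[int],
                               train_window: int = 504,  # assumed ~2 years
                               retrain_every: int = 21,
                               horizon: int = 5) -> List[object]:
    """Return one predict_proba row per bar (None before the first fit)."""
    probs: List[object] = [None] * len(X)
    model = None
    for t in range(train_window, len(X)):
        if (t - train_window) % retrain_every == 0:
            # Train only on bars whose forward label is known at time t:
            # bar i needs the close at i + horizon, so i <= t - horizon.
            lo, hi = t - train_window, t - horizon + 1
            model = make_model()
            model.fit(X[lo:hi], y[lo:hi])
        probs[t] = model.predict_proba([X[t]])[0]
    return probs


class UpFrequencyModel:
    """Toy stand-in with the same fit/predict_proba shape as a classifier."""

    def fit(self, X, y):
        self.p_up = sum(1 for label in y if label == 1) / len(y)
        return self

    def predict_proba(self, X):
        return [[1.0 - self.p_up, self.p_up] for _ in X]


X = [[float(i)] for i in range(60)]
y = [1 if i % 2 == 0 else 0 for i in range(60)]
probs = walk_forward_probabilities(UpFrequencyModel, X, y,
                                   train_window=20, retrain_every=10)
```

With xgboost available, the factory could presumably be something like `lambda: XGBClassifier(n_estimators=200, max_depth=3)`, with `X` and `y` passed as NumPy arrays. Recent xgboost versions expect class labels encoded as 0, 1, 2, ..., so the −1/0/+1 labels would likely need remapping first.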
Model parameters
Training Window
Rolling walk-forward training set
Prediction Horizon
Forward return target
Probability Threshold
Minimum confidence to trade
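For illustration, the three parameters and their defaults could be grouped into one validated configuration object. The class name, the `retrain_every` field, the 504-bar translation of "2 years", and the validation bounds are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyConfig:
    train_window_bars: int = 504        # ~2 years, assuming 252 bars per year
    horizon: int = 5                    # forward return horizon in days
    confidence_threshold: float = 0.60  # minimum probability to trade
    retrain_every: int = 21             # bars between retrains (monthly)

    def __post_init__(self) -> None:
        # A threshold of at least 0.5 means P(up) and P(down) cannot both
        # exceed it, so long and short can never trigger on the same bar.
        if not 0.5 <= self.confidence_threshold < 1.0:
            raise ValueError("confidence_threshold must be in [0.5, 1.0)")
        if not 1 <= self.horizon < self.train_window_bars:
            raise ValueError("horizon must be >= 1 and < train_window_bars")
        if self.retrain_every < 1:
            raise ValueError("retrain_every must be >= 1")


config = StrategyConfig()
```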
Academic background
Academic Basis
Walk-forward methodology based on Lopez de Prado (2018), 'Advances in Financial Machine Learning'