Machine Learning · Risk: High

XGBoost ML

Machine learning predicts price direction using dozens of market features

Risk

High

Holding Period

5 days (rebalanced periodically)

Best For

Finding non-linear patterns across many features

How it works

Mathematical Foundation

P(up) = XGBoost(features_t)
signal = +1 if P(up) > 0.60; −1 if P(down) > 0.60; 0 otherwise
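The threshold rule above can be written as a small function. This is a minimal sketch: the function name `signal_from_probabilities` is hypothetical, and it assumes the classifier has already produced `p_up` and `p_down` for the current bar.

```python
def signal_from_probabilities(p_up: float, p_down: float, threshold: float = 0.60) -> int:
    """Map class probabilities to a position: +1 long, -1 short, 0 flat.

    With a threshold above 0.5 the two conditions cannot both hold,
    because p_up + p_down cannot exceed 1.
    """
    if p_up > threshold:
        return 1
    if p_down > threshold:
        return -1
    return 0


if __name__ == "__main__":
    print(signal_from_probabilities(0.72, 0.18))  # 1
    print(signal_from_probabilities(0.20, 0.65))  # -1
    print(signal_from_probabilities(0.55, 0.30))  # 0
```

The comparison is strict, so a probability of exactly 0.60 leaves the position flat.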

Signal Generation Logic

  1. Extract 20+ technical features: RSI, MACD, Bollinger Band position, ATR, volume ratios, lagged returns (1d, 5d, 21d)
  2. Label each bar with the 5-day forward return sign (+1 up, −1 down, 0 neutral)
  3. Train XGBoost classifier on a 2-year rolling window using walk-forward cross-validation
  4. Predict the probability of an up or down move over the next 5 days for the current bar
  5. Enter long if P(up) > 0.60, short if P(down) > 0.60, flat otherwise
  6. Re-train the model every 21 bars (monthly) on expanded data
  7. Apply ATR-based stop loss on each position
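Steps 3 and 6 above imply a retraining calendar, which can be sketched as index ranges. The sketch rests on assumptions the page does not state: 252 trading bars per year (so the 2-year window is 504 bars), a fixed-length rolling window, and a gap of `horizon` bars before each retrain so that every training label has already been observed. The function name `retrain_schedule` is hypothetical.

```python
def retrain_schedule(n_bars, train_window=504, retrain_every=21, horizon=5):
    """Return (train_start, train_end, predict_start, predict_end) tuples.

    All ranges are end-exclusive bar indices. Training rows stop `horizon`
    bars before the retrain bar, so each row's forward label is known.
    """
    schedule = []
    first_retrain = train_window + horizon
    for t in range(first_retrain, n_bars, retrain_every):
        train_end = t - horizon              # rows i < train_end have i + horizon < t
        train_start = train_end - train_window
        predict_end = min(t + retrain_every, n_bars)
        schedule.append((train_start, train_end, t, predict_end))
    return schedule


if __name__ == "__main__":
    for row in retrain_schedule(600):
        print(row)
    # First row: (0, 504, 509, 530); last row: (84, 588, 593, 600)
```

Each tuple describes one model: it is trained on the first range and used for daily predictions over the second, until the next retrain replaces it.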

Parameters Explained

train_window

Rolling training window. Shorter windows are more adaptive but noisier; longer windows are more stable but may include outdated patterns.

Default: 2 years

horizon

Forward return prediction horizon in days. A 5-day horizon balances noise reduction against responsiveness to new market conditions.

Default: 5

confidence_threshold

Minimum predicted probability required to trigger a trade. A higher threshold produces fewer but higher-confidence signals; thresholds below 0.55 are likely to pass mostly noise.

Default: 0.60
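To illustrate how the horizon parameter shapes the training target, here is a minimal labeling sketch. The page does not say how large a move must be before a bar counts as up or down rather than neutral, so the `neutral_band` parameter is an assumption, as is the function name `forward_return_labels`.

```python
def forward_return_labels(closes, horizon=5, neutral_band=0.0):
    """Label each bar by the sign of its forward return over `horizon` bars.

    Returns +1 (up), -1 (down), 0 (neutral), or None for the final bars
    whose forward price does not exist yet.
    """
    labels = []
    for i, price in enumerate(closes):
        if i + horizon >= len(closes):
            labels.append(None)              # label not observable yet
            continue
        fwd_return = closes[i + horizon] / price - 1.0
        if fwd_return > neutral_band:
            labels.append(1)
        elif fwd_return < -neutral_band:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


if __name__ == "__main__":
    closes = [100, 101, 102, 103, 104, 110, 95, 102]
    print(forward_return_labels(closes, horizon=5))
    # [1, -1, 0, None, None, None, None, None]
```

The trailing `None` values matter: the last `horizon` bars have no label and cannot be used for training.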

When It Works

When there are genuine non-linear relationships in the feature data that simpler models cannot capture. Works best in markets with repetitive technical patterns and sufficient historical data for training. The walk-forward retraining helps it adapt to changing market conditions.

When It Fails

When market microstructure changes fundamentally (e.g., new regulation, exchange rule changes, post-COVID structural shifts). Also prone to overfitting when training data is limited. The model cannot predict true black swan events because they have no historical precedent.

Risks & Limitations

  • Overfitting: the model may learn noise in historical data rather than genuine patterns
  • Feature leakage: improperly constructed features can introduce look-ahead bias
  • Walk-forward training can still be biased if the test window is too short
  • Model decay: performance degrades as market conditions drift away from training distribution
  • Black box risk: hard to interpret why a trade was entered, making risk management difficult
  • High computational cost for retraining every 21 bars on large datasets
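The feature-leakage risk listed above can be checked mechanically: a feature computed at bar t should not change when later bars are removed. The sketch below applies that check to lagged-return features (the 1d, 5d, and 21d lags come from the feature list) and to a deliberately leaky feature. The helper names `lagged_returns`, `leaky_feature`, and `has_lookahead` are hypothetical.

```python
def lagged_returns(closes, lags=(1, 5, 21)):
    """Per-bar returns over the past `lag` bars (None without enough history)."""
    rows = []
    for i in range(len(closes)):
        row = {}
        for lag in lags:
            row[f"ret_{lag}d"] = closes[i] / closes[i - lag] - 1.0 if i >= lag else None
        rows.append(row)
    return rows


def leaky_feature(closes):
    """Deliberately wrong: uses the NEXT bar's close, which is unknown at bar i."""
    return [
        {"next_ret": closes[i + 1] / closes[i] - 1.0 if i + 1 < len(closes) else None}
        for i in range(len(closes))
    ]


def has_lookahead(feature_fn, closes, check_at):
    """True if the feature row at `check_at` changes once later bars are dropped."""
    full = feature_fn(closes)[check_at]
    truncated = feature_fn(closes[: check_at + 1])[check_at]
    return full != truncated


if __name__ == "__main__":
    closes = [100.0 + i for i in range(30)]
    print(has_lookahead(lagged_returns, closes, check_at=25))  # False
    print(has_lookahead(leaky_feature, closes, check_at=25))   # True
```

A check like this catches only leakage through the feature function itself; it says nothing about leakage through labels or through tuning on the test window.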

Implementation

The implementation uses the xgboost Python library's XGBClassifier. Features are computed with the pandas TA (technical analysis) library. Walk-forward validation splits the training window into 5 folds for hyperparameter tuning. The model is retrained monthly (every 21 bars), and predictions are recomputed daily using the most recently trained model.
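The page does not say how the 5 folds are arranged. One common arrangement, similar in spirit to scikit-learn's `TimeSeriesSplit`, is an expanding window in which each fold trains on all earlier blocks and validates on the next one. The sketch below assumes that arrangement; `walk_forward_folds` is a hypothetical helper, and fitting the classifier on each fold is left out.

```python
def walk_forward_folds(n_samples, n_folds=5):
    """Split row indices into time-ordered (train, validation) range pairs.

    The data is cut into n_folds + 1 contiguous blocks. Fold k trains on the
    first k blocks and validates on block k + 1, so validation rows always
    come after every training row.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    folds = []
    for k in range(1, n_folds + 1):
        train_end = block * k
        # The last fold absorbs any remainder rows.
        val_end = block * (k + 1) if k < n_folds else n_samples
        folds.append((range(0, train_end), range(train_end, val_end)))
    return folds


if __name__ == "__main__":
    for train_idx, val_idx in walk_forward_folds(504):
        print(f"train {train_idx.start}-{train_idx.stop - 1}, "
              f"validate {val_idx.start}-{val_idx.stop - 1}")
```

With 504 rows (two years at an assumed 252 bars per year), each block holds 84 rows, so the first fold trains on rows 0-83 and validates on rows 84-167.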

Model parameters

Training Window: 2 years (rolling walk-forward training set)

Prediction Horizon: 5 days (forward return target)

Probability Threshold: 0.60 (minimum confidence to trade)

Academic background

The walk-forward methodology is based on Lopez de Prado (2018), 'Advances in Financial Machine Learning'.
