Research Note·2026-04-18·6 min read

MSR walk-forward? It depends.

We added an annual-refit mode to our Markov Switching strategy expecting it to dominate the 'fit once' version. It doesn't. On SPX it wins huge (+28% vs +1%). On gold it loses (+3% vs +33%). Engineering intuition betrayed us — here's what the 8-year data actually shows.

By Li Tan

Last week we shipped MSR (Markov Switching Regression) — a strategy that fits a different α and σ per market regime and trades based on filtered regime probabilities. The initial version used a one-shot fit: pass the full return history to statsmodels, get back parameters, use them for the entire backtest. Filtered probabilities keep things causal, but the PARAMETERS were estimated with hindsight.
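To make the "filtered probabilities are causal" point concrete, here is a minimal hand-rolled sketch of a Hamilton filter for a Gaussian regime-switching model. It is not the statsmodels implementation and not the MSR code. The parameter values, the two-regime setup, and the function names are assumptions chosen for illustration. What it shows is that the probability at bar t depends only on returns up to t, even though the parameters fed in could have been estimated on anything.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def filtered_probs(returns, means, stds, trans, init):
    """Hamilton filter: P(regime at t | returns up to and including t).

    trans[i][j] is P(regime j at t | regime i at t-1); init is the prior
    over regimes before the first bar.
    """
    k = len(means)
    prev = list(init)
    out = []
    for r in returns:
        # Predict: push the previous posterior through the transition matrix.
        pred = [sum(prev[i] * trans[i][j] for i in range(k)) for j in range(k)]
        # Update: weight each regime by the likelihood of the current return.
        joint = [pred[j] * normal_pdf(r, means[j], stds[j]) for j in range(k)]
        total = sum(joint)
        prev = [p / total for p in joint]
        out.append(prev)
    return out

# Hypothetical parameters: regime 0 is calm, regime 1 is volatile.
means = [0.0005, -0.001]
stds = [0.008, 0.02]
trans = [[0.98, 0.02], [0.05, 0.95]]
returns = [0.001, -0.002, 0.003, -0.045, 0.038, -0.05]

probs = filtered_probs(returns, means, stds, trans, init=[0.5, 0.5])
# Causality check: dropping later bars leaves the earlier rows unchanged.
probs_short = filtered_probs(returns[:3], means, stds, trans, init=[0.5, 0.5])
```

The hindsight in a one-shot fit enters through means, stds, and trans, not through the filter recursion itself.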

The obvious upgrade: walk-forward. Re-estimate parameters monthly using only data up to that date, so no information from the future leaks into anything. The catch is cost: statsmodels' EM converges in ~1-2 seconds per fit, so a 5-year monthly backtest means 60 fits × 2s = 2 minutes. Per asset. Per backtest run. Across a grid search of hundreds of runs, we'd be measuring in hours.
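The cost arithmetic above is easy to parameterize. The sketch below assumes 252 bars per year and 21 bars per month, and takes the 2-second upper bound as the per-fit cost. The function names are hypothetical.

```python
import math

def fit_count(n_bars, refit_every_bars):
    """Approximate number of fits in one backtest run."""
    if refit_every_bars is None:
        return 1  # once-fit: a single estimate
    return max(1, math.ceil(n_bars / refit_every_bars))

def fit_minutes(n_bars, refit_every_bars, seconds_per_fit=2.0):
    """Wall-clock minutes spent fitting, at an assumed cost per fit."""
    return fit_count(n_bars, refit_every_bars) * seconds_per_fit / 60.0

five_years = 5 * 252
monthly_fits = fit_count(five_years, 21)       # 60
annual_fits = fit_count(five_years, 252)       # 5
monthly_minutes = fit_minutes(five_years, 21)  # 2.0
```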

So we compromised on annual refit: re-estimate every 252 bars instead of every 21. In the 5-year example that is 5 fits instead of 60, so 5× the cost of once-fit instead of 60×, but (we assumed) you get most of the walk-forward benefit since regime parameters move slowly. Chan (2009) calls this 'annual rolling window' and it's standard practice in academia.
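A minimal sketch of the refit schedule, under stated assumptions: an expanding window (every fit sees all bars before the refit date), a hypothetical min_bars warm-up, and a trivial mean/std estimator standing in for the real regime fit. None of these names come from the MSR code. The point is only the scheduling logic: None means one fit on the full sample, an integer means refit every that many bars on past data only.

```python
from statistics import mean, stdev

def walk_forward_params(returns, fit_fn, refit_every_bars=None, min_bars=3):
    """Return the parameters in force at each bar.

    refit_every_bars=None reproduces once-fit: one estimate on the full
    sample, reused everywhere (this uses hindsight). Otherwise the fit is
    refreshed every refit_every_bars bars using only returns before bar t.
    """
    n = len(returns)
    if refit_every_bars is None:
        full = fit_fn(returns)
        return [full] * n
    params = [None] * n  # no parameters until the warm-up is over
    current = None
    last_fit = None
    for t in range(min_bars, n):
        if last_fit is None or t - last_fit >= refit_every_bars:
            current = fit_fn(returns[:t])  # bars strictly before t
            last_fit = t
        params[t] = current
    return params

def mean_std_fit(window):
    """Stand-in for the real regime fit: a mean and a standard deviation."""
    return (mean(window), stdev(window))

returns = [0.01, -0.02, 0.015, 0.0, 0.03, -0.01, 0.02, 0.005]
refit = walk_forward_params(returns, mean_std_fit, refit_every_bars=3)
once = walk_forward_params(returns, mean_std_fit)
```

With refit_every_bars=3 the parameters change at bars 3 and 6 and are held constant in between. With the default None, every bar sees the same full-sample estimate.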

The numbers don't agree

We ran both modes on 8 years of OANDA daily data (2018-07 through 2026-04) for three assets — S&P 500 CFD, EUR/USD, gold. Here's what happened:

Asset        Mode           Trades   Total return   Sharpe
SPX500_USD   once-fit           23          +1.3%    +0.06
SPX500_USD   annual refit        1         +28.1%    +0.72
EUR_USD      once-fit            8          −0.4%    −0.13
EUR_USD      annual refit        1          +0.7%    +0.11
XAU_USD      once-fit            8         +32.8%    +1.14
XAU_USD      annual refit        1          +3.0%    +0.63

On SPX the refit version wins by a factor of roughly 20 in total return (+28.1% vs +1.3%). On gold it loses by a factor of roughly 10 (+3.0% vs +32.8%). Neither version is 'strictly better': they're different strategies, producing different trade counts and different P&L profiles on the same data.
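Since trade count is one of the things that differs between the two modes, here is one way it could be measured from a signal series. This is a sketch under assumptions: positions take values in {-1, 0, 1}, a "trade" is any entry or direction flip, and the probability threshold rule is hypothetical rather than the actual MSR entry logic.

```python
def positions_from_probs(bull_probs, threshold=0.5):
    """Hypothetical rule: long (1) when P(bull regime) exceeds the threshold, else flat (0)."""
    return [1 if p > threshold else 0 for p in bull_probs]

def count_trades(positions):
    """Count entries: bars where the position becomes non-zero or flips direction."""
    trades = 0
    prev = 0
    for pos in positions:
        if pos != 0 and pos != prev:
            trades += 1
        prev = pos
    return trades

choppy = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]   # enters three separate times
steady = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]   # enters once and holds

choppy_trades = count_trades(choppy)
steady_trades = count_trades(steady)
from_probs = count_trades(positions_from_probs([0.2, 0.7, 0.8, 0.4, 0.6]))
```

Two signal series with the same time in the market can have very different trade counts, which is why the count is worth reporting next to return and Sharpe.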

What's actually going on

The key diagnostic is the trade count. Annual refit emits far fewer signals across the board — 1 trade vs 8-23 for once-fit. Why?

Once-fit uses parameters estimated on the FULL sample, which means the fitted regime means are 'calibrated' to the whole period. In a mean-reverting price series this produces a model that flips between regimes often: any short-term drift above or below the grand mean gets labeled as a regime change. Annual refit re-anchors the mean every year, so within each year the 'current level' IS the regime mean; drift looks smaller and regime transitions rarer.

Which is 'right' depends on the asset's underlying behavior:

SPX has persistent trends punctuated by distinct bear markets (2018 Q4, 2020 Q1, 2022). Annual refit captures 'we're in a bull regime, stay long' cleanly, while once-fit keeps flipping regimes on every correction.
Gold has a long secular uptrend from 2019 onward. Once-fit's frequent re-entries on pullbacks happen to work because the asset keeps grinding up — you're basically trading a persistent bull. Annual refit, by re-anchoring, sees the recent past as 'the new normal' and misses entries.
EUR/USD sits in a mean-reverting range for most of the sample. Neither mode makes meaningful money: regime models are the wrong tool for structurally range-bound assets.

What we shipped

Both modes are available. MSRStrategy takes a refit_every_bars parameter. None = once-fit (default, fastest). 252 = annual refit. Anything 100-300 is defensible — below 100 the EM can fail to converge on the smaller window.

The takeaway I didn't expect when I started this project: 'more causal' is not strictly better. Walk-forward is the gold standard for statistical honesty, but at the strategy level, the right parameters for gold are not the right parameters for the S&P 500, and forcing the model to use only recent data can strip away edge that the full-history fit happens to capture.

The asymmetry between SPX and gold is a reminder that regime models are sensitive to how the underlying asset moves. If you're running MSR on a new instrument, always run both modes and inspect the signals before picking one. Don't assume walk-forward is automatically safer.

If you want to see the numbers yourself: scripts/msr_walk_forward_compare.py in the repo reproduces this whole table. Run it against whichever universe you like.
