Why your backtest is lying to you · Runbook

Every trader with a working strategy has one thing in common: they were once wrong about a strategy that "backtested great." Here are four ways a backtest lies, ranked from most to least common.

1. No slippage, no commissions

A frictionless backtest is a backtest of a strategy that nobody can actually trade. On a market order in futures, you're paying a tick of slippage on the entry, a tick on the exit, and commissions on both sides. A scalper making 50 round-trips a day gives up 100 ticks per contract to slippage alone (100 × tick_value × contracts in dollars), before a single commission is counted.
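
To put numbers on that, here is a minimal sketch of the arithmetic. The function name, the $12.50 tick value, and the $1.00 per-side commission are illustrative assumptions, not Runbook or broker defaults; plug in the figures for your own contract.

```python
def daily_friction(round_trips, ticks_per_side, tick_value, commission_per_side, contracts=1):
    """Dollar cost of slippage plus commissions for one day of trading."""
    sides = round_trips * 2  # every round trip has an entry and an exit
    slippage = sides * ticks_per_side * tick_value * contracts
    commissions = sides * commission_per_side * contracts
    return slippage + commissions

# Hypothetical contract: $12.50 per tick, $1.00 commission per side, 1 contract.
cost = daily_friction(round_trips=50, ticks_per_side=1, tick_value=12.50, commission_per_side=1.00)
print(cost)  # 1350.0
```

If the strategy's gross daily profit per contract is not comfortably above that number, the edge is paying for friction and nothing else.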

Runbook applies realistic slippage and commission defaults to every backtest as of launch-prep:

Futures: 1 tick slippage on market orders, real TopStep commissions.
Equities: 2 bps slippage, commission-free.
Crypto: 5 bps slippage, no commissions.

If your backtest used to look glorious and now looks ordinary — that's the point. The ordinary version is what you would have actually made.
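
For the basis-point defaults above, one common way to model slippage is to worsen every fill by that fraction of the quoted price. The sketch below assumes that approach; it is not a description of Runbook's internals, and the names `SLIPPAGE_BPS` and `fill_price` are made up for illustration. Futures are left out because their slippage is quoted in ticks rather than basis points.

```python
# Slippage defaults from the list above, in basis points (1 bp = 0.01%).
SLIPPAGE_BPS = {"equities": 2.0, "crypto": 5.0}

def fill_price(quote, side, asset_class):
    """Worsen the quoted price by the slippage: buys fill higher, sells fill lower."""
    adjustment = quote * SLIPPAGE_BPS[asset_class] / 10_000
    return quote + adjustment if side == "buy" else quote - adjustment

# A frictionless 1.00-per-share winner on a 100.00 stock:
entry = fill_price(100.0, "buy", "equities")
exit_ = fill_price(101.0, "sell", "equities")
print(round(exit_ - entry, 4))  # roughly 0.96 per share instead of 1.00
```

Four cents on a dollar looks small until the average trade in the backtest is only a few cents wide.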

2. Overfit to in-sample data

If you tune a strategy's thresholds on the same data you measure it on, you're not measuring a strategy — you're measuring your ability to fit noise. The classic symptom: the equity curve looks like a straight line up, the Sharpe is over 2, and the live version immediately loses money.

Runbook's AI-requested backtests automatically hold out 25% of the data as an out-of-sample (OOS) split and compare in-sample (train) metrics against OOS metrics. When profitFactorRatio drops below 0.5, the backtest card flags an overfit signal. Don't argue with the flag. Either the edge survives the hold-out or it doesn't.
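
A rough sketch of what such a check can look like follows. Two things here are assumptions rather than documented behavior: that the hold-out is the final 25% of trades in time order, and that profitFactorRatio means OOS profit factor divided by train profit factor. Profit factor itself is the standard gross profit over gross loss.

```python
def profit_factor(pnls):
    """Gross profit divided by gross loss; infinite if there are no losing trades."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")

def overfit_check(pnls, oos_fraction=0.25, threshold=0.5):
    """Split trades chronologically, then compare hold-out profit factor to in-sample.

    Assumes the in-sample slice contains at least one losing trade.
    """
    cut = int(len(pnls) * (1 - oos_fraction))
    train, oos = pnls[:cut], pnls[cut:]
    ratio = profit_factor(oos) / profit_factor(train)
    return ratio, ratio < threshold

# Hypothetical per-trade PnL in time order: strong early, weak in the last quarter.
trades = [120, -40, 90, -30, 150, -50, 20, -60]
ratio, flagged = overfit_check(trades)
print(round(ratio, 3), flagged)  # 0.111 True
```

Here the in-sample profit factor is 3.0 and the hold-out is about 0.33, so the ratio lands far below 0.5 and the run would be flagged.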

3. Cherry-picked date range

"I backtested it on 2022" is not a backtest. Markets have regimes — the thing that printed money in 2020 is the thing that blew up in 2022. A real backtest covers multiple regimes: trending, choppy, vol-spike, vol-crush.

Rule of thumb: if your strategy only works in a single two-quarter window, you've found a regime, not a strategy. Regimes change. That's fine; just know what you have.
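
One quick way to apply that rule of thumb is to bucket realized PnL by calendar quarter and look at where the profit actually came from. This is a minimal sketch with made-up trades; the `(exit_date, pnl)` tuple format and both function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

def pnl_by_quarter(trades):
    """Sum PnL per calendar quarter; trades are (exit_date, pnl) pairs."""
    totals = defaultdict(float)
    for day, pnl in trades:
        quarter = (day.month - 1) // 3 + 1
        totals[(day.year, quarter)] += pnl
    return dict(sorted(totals.items()))

def profitable_quarters(trades):
    """List the (year, quarter) buckets that finished with positive PnL."""
    return [q for q, total in pnl_by_quarter(trades).items() if total > 0]

# Hypothetical trade log spanning three years.
trades = [
    (date(2020, 5, 4), 900.0), (date(2020, 8, 17), 700.0),
    (date(2021, 2, 9), -150.0), (date(2021, 11, 3), -200.0),
    (date(2022, 3, 22), -400.0), (date(2022, 9, 14), -100.0),
]
print(profitable_quarters(trades))  # [(2020, 2), (2020, 3)]
```

Every profitable quarter in this log sits inside one two-quarter stretch of 2020, which is exactly the pattern the rule of thumb warns about.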

4. Survivorship and look-ahead

Subtle, but deadly. Your strategy might be implicitly peeking at future data (e.g. a close-of-bar indicator used on the open of the same bar), or testing on symbols that exist today but didn't exist then (survivorship bias on delisted tickers).

Runbook's library enforces one rule hard: onBar fires at bar close, and signals execute at the next bar's open. If you see a backtest that looks too good, check the entry fill price vs the signal bar's close — that's where look-ahead usually hides.
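
That check is easy to automate on any backtest's trade log. The sketch below assumes pre-slippage fill prices and a simple bar format (dicts with "open" and "close"), both of which are illustrative choices rather than Runbook's data model. It flags any fill that matches the signal bar's close but not the next bar's open.

```python
def lookahead_suspects(bars, fills, tol=1e-9):
    """Return fills priced at the signal bar's close instead of the next bar's open.

    bars: list of dicts with "open" and "close".
    fills: list of (signal_bar_index, fill_price) pairs.
    """
    suspects = []
    for i, price in fills:
        at_signal_close = abs(price - bars[i]["close"]) <= tol
        at_next_open = i + 1 < len(bars) and abs(price - bars[i + 1]["open"]) <= tol
        if at_signal_close and not at_next_open:
            suspects.append((i, price))
    return suspects

bars = [
    {"open": 100.0, "close": 101.0},
    {"open": 101.5, "close": 102.0},
    {"open": 102.25, "close": 101.75},
]
# First fill waits for the next open; second fills at its own signal bar's close.
fills = [(0, 101.5), (1, 102.0)]
print(lookahead_suspects(bars, fills))  # [(1, 102.0)]
```

If slippage is already baked into the fill prices, widen `tol` to about the size of the slippage, otherwise clean fills will not match the next open exactly.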

The fix isn't more backtests — it's forward-testing

A backtest is a map. A forward-test is the territory. Paper-trading a strategy live for two weeks tells you more than six more months of historical runs. It catches the things history can't: slippage that's bigger than you thought, fills that don't behave like your model, a script that drifts off the data feed at 3am.

If your strategy survives both the backtest and the forward-test, ship it. If it doesn't, iterate.