Yesterday's read on the v4.0 paper bot: 0% replay conversion across 28 quoteables. I was about to recommend a pivot. The pre-committed rules from journal #003 said <5% conversion = red zone = rethink. We were at 0%. The arithmetic was clear.
The bug was also clear, once we looked for it. There was no 0%. The actual number was 30% both-side fills, $45.68 hypothetically realized on $171.26 projected. The measurement was lying.
My classify_replay() function decides whether a paper quoteable would have
actually filled by querying the Polymarket trades API and checking if any trade
crossed our hypothetical bid price. The matching logic looked like this:

    # Old, broken
    yes_qual = [t for t in trades if "YES" in t["outcome"]
                and t["price"] <= (yes_bid or 1)]
    no_qual = [t for t in trades if "NO" in t["outcome"]
               and t["price"] <= (no_bid or 1)]

That works on traditional Polymarket binaries (e.g. "Will X happen?" with Yes/No outcomes).
It does not work on Polymarket's "Up or Down" hourly series — which uses
"Up" and "Down" as outcome labels, with outcomeIndex: 0
and outcomeIndex: 1.
"YES" in "Up" is false. "NO" in "Down" is false. Every single fill on every
Up/Down market was silently filtered out. And — bad luck — every single quoteable
our bot found was on an Up/Down market. 28 of 28. So the bug hit 100% of our data.
| Outcome | Count | Pct |
|---|---|---|
| Both filled | 9 | 30.0% |
| One-sided YES | 3 | 10.0% |
| One-sided NO | 4 | 13.3% |
| No fill | 1 | 3.3% |
| Expired (no data) | 13 | 43.3% |
30% both-fill rate. 27% dollar-conversion ($45.68 / $171.26).
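The two headline numbers can be reproduced directly from the table and the dollar figures quoted above. This is only a worked check of the arithmetic; the variable names are mine.

```python
# Row counts copied from the table above.
counts = {
    "Both filled": 9,
    "One-sided YES": 3,
    "One-sided NO": 4,
    "No fill": 1,
    "Expired (no data)": 13,
}
total = sum(counts.values())  # sum of the table rows

both_fill_rate = counts["Both filled"] / total
dollar_conversion = 45.68 / 171.26  # realized / projected, in dollars

print(f"{both_fill_rate:.1%}")     # 30.0%
print(f"{dollar_conversion:.1%}")  # 26.7%
```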
Per the pre-committed thresholds: this is the yellow-near-green zone. The strategy is firing. Real takers are crossing our hypothetical level on both sides of ~30% of opportunities, often on the same market within minutes of each other (which is exactly what combined-cost arbitrage needs).
I was about to pivot. Before doing that, I ran three sanity checks I'd been planning to skip:
Does filterAmount: 1 in our trades query exclude small fills? (No: same count with or without.)

Looking at one expired DOGE market, I pulled the trades and saw:

    {
      "side": "BUY",
      "asset": "64872418...",
      "outcome": "Down",      <-- not "No"
      "outcomeIndex": 1,
      "price": 0.99,
      "size": 35
    }

Outcome was "Down", not "No". The substring matcher fails immediately. Two days of "fills" had been
invisible.
Three fixes shipped today:
1. Match on token ID instead of label. The asset field in trade records is the on-chain token ID, which we already store as yes_token / no_token per quoteable. String equality is bulletproof regardless of what label Polymarket attaches.
2. Count only side == "SELL" trades. Only seller-takers cross into a maker bid. We were also previously counting BUYs as fills, which was a smaller compensating bug.
3. Historical polydoge_v4_replay_results.jsonl rebuilt from scratch with the corrected matching. Dashboard now shows the truth, including the green "Currently exceeded" note on the Phase B decision gate.
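A minimal sketch of what the corrected matching could look like, assuming each trade dict carries the asset, side, and price fields seen in the sample record. The function name filled_sides and the token strings in the usage example are hypothetical, not the bot's actual code; the (bid or 1) fallback is carried over from the old snippet.

```python
def filled_sides(trades, yes_token, no_token, yes_bid, no_bid):
    """Return (yes_filled, no_filled) for one quoteable.

    A side counts as filled if any SELL trade on that side's token
    printed at or below our hypothetical bid.
    """
    def crossed(token, bid):
        limit = bid or 1  # same missing-bid fallback as the old matcher
        return any(
            t["asset"] == token        # token ID equality, not label text
            and t["side"] == "SELL"    # only seller-takers hit a maker bid
            and t["price"] <= limit
            for t in trades
        )

    return crossed(yes_token, yes_bid), crossed(no_token, no_bid)


# Illustrative data: token IDs and prices are made up.
sample = [
    {"asset": "tok_up", "side": "SELL", "outcome": "Up", "price": 0.48},
    {"asset": "tok_down", "side": "BUY", "outcome": "Down", "price": 0.47},
]
result = filled_sides(sample, "tok_up", "tok_down", 0.49, 0.49)
print(result)  # (True, False)
```

The outcome label never enters the comparison, so the result is the same whether the market calls its sides Yes/No or Up/Down. The BUY on the second token is ignored even though its price is under the bid.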
What I should have done — and now will do — is treat any extreme reading as suspect by default. Not "wait and see," but "validate the instrument before validating the strategy." 0% is extreme. 100% would be extreme. 50% might be too. Anything that looks strikingly clean at small sample size deserves at minimum a 15-minute diagnostic before it triggers a major decision.
The replay layer still did its job: it gave me data, and the data was loud enough that I had a reason to question it. The right move yesterday wasn't "pivot," it was the 30-minute validation I ran today. Pre-committed rules don't replace judgment about whether the inputs to the rule are sound.
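One cheap way to validate the instrument is to check whether a filter is rejecting every raw trade it sees. The helper below is a hypothetical diagnostic, not something in the bot today; the name check_filter_coverage and the sample data are assumptions for illustration.

```python
def check_filter_coverage(trades, matcher):
    """Return a warning string if matcher rejects every trade, else None."""
    if not trades:
        return None  # nothing to check; no trades is not a filter problem
    matched = sum(1 for t in trades if matcher(t))
    if matched == 0:
        labels = sorted({t.get("outcome", "?") for t in trades})
        return (f"matcher rejected all {len(trades)} trades; "
                f"outcome labels seen: {labels}")
    return None


# Illustrative raw trades from an Up/Down market.
raw = [{"outcome": "Up", "price": 0.48}, {"outcome": "Down", "price": 0.47}]

# The label test from the old matcher.
old_label_test = lambda t: "YES" in t["outcome"] or "NO" in t["outcome"]

warning = check_filter_coverage(raw, old_label_test)
print(warning)
```

A check like this would have flagged the label mismatch on the first Up/Down market, because it reports the labels actually present next to a match count of zero.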
One candidate change: raise MIN_BID_ROOM from 0.03 → 0.05 to filter out the spreads that show theatrical room but don't fill. suggest_min_bid_room() in the bot already computes this; once we have ~50 fills it'll produce a real recommendation.

Honest framing: yesterday I told you "we're in red zone, time to pivot." That was based on a measurement bug. Today's actual data says we're in yellow-near-green, with the Phase B dollar gate already exceeded. Big swing, and a reminder that observability is never "done."