Journal #004 — The 0% that was actually 30%: a

Yesterday's read on the v4.0 paper bot: 0% replay conversion across 28 quoteables. I was about to recommend a pivot. The pre-committed rules from journal #003 said <5% conversion = red zone = rethink. We were at 0%. The arithmetic was clear.

The bug was also clear, once we looked for it. There was no 0%. The actual number was 30% both-side fills, $45.68 hypothetically realized on $171.26 projected. The measurement was lying.

What happened

My classify_replay() function decides whether a paper quoteable would have actually filled by querying the Polymarket trades API and checking if any trade crossed our hypothetical bid price. The matching logic looked like this:

# Old, broken
yes_qual = [t for t in trades if "YES" in t["outcome"]
            and t["price"] <= (yes_bid or 1)]
no_qual  = [t for t in trades if "NO" in t["outcome"]
            and t["price"] <= (no_bid or 1)]

That works on traditional Polymarket binaries (e.g. "Will X happen?" with Yes/No outcomes). It does not work on Polymarket's "Up or Down" hourly series — which uses "Up" and "Down" as outcome labels, with outcomeIndex: 0 and outcomeIndex: 1.
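The mismatch can be reproduced in a few lines. This is a sketch, not the bot's code: `old_filter` is a hypothetical stand-in for the substring matcher, and the trades are made up, using only the `outcome` and `price` fields from the snippet above. Python's `in` on strings is case-sensitive, so the binary example assumes labels arrive in the same case the filter uses.

```python
# Made-up trades; only the field names mirror the snippet above.
binary_trades = [{"outcome": "YES", "price": 0.40}, {"outcome": "NO", "price": 0.55}]
updown_trades = [{"outcome": "Up", "price": 0.40}, {"outcome": "Down", "price": 0.55}]


def old_filter(trades, label, bid):
    """Hypothetical stand-in for the old substring matcher."""
    return [t for t in trades if label in t["outcome"] and t["price"] <= (bid or 1)]


# A label that contains the substring is matched as intended.
binary_hits = old_filter(binary_trades, "YES", 0.45)

# "Up" and "Down" contain neither "YES" nor "NO", so nothing survives the filter.
updown_hits = old_filter(updown_trades, "YES", 0.45) + old_filter(updown_trades, "NO", 0.60)

print(len(binary_hits), len(updown_hits))  # prints: 1 0
```

The second result is an empty list rather than an exception, which is why nothing downstream noticed.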

"YES" in "Up" is false. "NO" in "Down" is false. Every single fill on every Up/Down market was silently filtered out. And — bad luck — every single quoteable our bot found was on an Up/Down market. 28 of 28. So the bug hit 100% of our data.

The dangerous part

The bug doesn't crash. It doesn't log an error. It just returns the wrong answer with perfect confidence. The replay results JSON looked completely sane. The dashboard looked completely sane. The pre-committed decision rules looked like they were being applied correctly. The whole observability stack was internally consistent and externally wrong.

What the data actually says

Outcome	Count	Pct
Both filled	9	30.0%
One-sided YES	3	10.0%
One-sided NO	4	13.3%
No fill	1	3.3%
Expired (no data)	13	43.3%

30% both-fill rate. 26.7% dollar conversion ($45.68 / $171.26).
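Both headline numbers can be re-derived from the table and the two dollar figures. The snippet below is only a check of that arithmetic; the dictionary keys are illustrative names, not fields from the replay results file.

```python
# Counts copied from the outcome table above.
outcomes = {
    "both_filled": 9,
    "one_sided_yes": 3,
    "one_sided_no": 4,
    "no_fill": 1,
    "expired_no_data": 13,
}

total = sum(outcomes.values())
both_fill_rate = outcomes["both_filled"] / total

# Dollar figures quoted in the text.
realized = 45.68
projected = 171.26
dollar_conversion = realized / projected

print(f"both-fill rate: {both_fill_rate:.1%}")        # both-fill rate: 30.0%
print(f"dollar conversion: {dollar_conversion:.1%}")  # dollar conversion: 26.7%
```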

Per the pre-committed thresholds: this is the yellow-near-green zone. The strategy is firing. Real takers are crossing our hypothetical level on both sides of ~30% of opportunities, often on the same market within minutes of each other (which is exactly what combined-cost arbitrage needs).

How we found it

I was about to pivot. Before doing that, I ran three sanity checks I'd been planning to skip:

Did the filterAmount: 1 in our trades query exclude small fills? (No — same count with or without.)
Are takers selling or buying? (Almost all buying — they hit asks, not our bids. Still wouldn't explain 0%.)
What does the raw trade data actually look like? (This was the one.)

Looking at one expired DOGE market, I pulled the trades and saw:

{
  "side": "BUY",
  "asset": "64872418...",
  "outcome": "Down",         <-- not "No"
  "outcomeIndex": 1,
  "price": 0.99,
  "size": 35
}

Outcome was "Down", not "No". The substring matcher fails immediately. Two days of "fills" had been invisible.
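The third check amounts to tallying the labels that actually appear in the raw trades and comparing them with what the matcher expects. A rough sketch of that diagnostic, using a made-up sample shaped like the record above (not real API output):

```python
from collections import Counter

# Made-up sample; field names follow the trade record above.
trades = [
    {"side": "BUY", "outcome": "Down", "outcomeIndex": 1, "price": 0.99, "size": 35},
    {"side": "SELL", "outcome": "Up", "outcomeIndex": 0, "price": 0.02, "size": 10},
    {"side": "BUY", "outcome": "Down", "outcomeIndex": 1, "price": 0.98, "size": 5},
]

# How often each outcome label occurs in the raw data.
label_counts = Counter(t["outcome"] for t in trades)

# Labels the old substring matcher could never match.
unmatched = sorted(
    label for label in label_counts if "YES" not in label and "NO" not in label
)

print(dict(label_counts))
print(unmatched)
```

If `unmatched` covers every label in the sample, a 0% reading says more about the matcher than about the market.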

What changed

Three fixes shipped today:

Match by token ID, not outcome string. The asset field in trade records is the on-chain token ID, which we already store as yes_token / no_token per quoteable. String equality is bulletproof regardless of what label Polymarket attaches.
Filter to side == "SELL" trades. Only seller-takers cross into a maker bid. We were previously also counting BUYs as fills, a smaller bug that pushed the count in the opposite direction.
Added a pytest regression test. A fixture with the exact bug case (DOGE Up/Down market, mixed BUY/SELL trades on both tokens) now hard-fails if anyone breaks this matching again. The next person who reuses the substring pattern gets a red CI run before the fake-0% bug ships.
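A minimal sketch of how the first two fixes and a regression test could fit together. This is not the shipped classify_replay(): `sell_fills_at_or_below`, `classify`, the token IDs, and the prices are hypothetical, and the sketch leaves out the expired/no-data case.

```python
def sell_fills_at_or_below(trades, token_id, bid):
    """Trades on `token_id` where a taker sold at or below our hypothetical bid."""
    return [
        t for t in trades
        if t["asset"] == token_id   # exact token ID, never the outcome label
        and t["side"] == "SELL"     # only sellers cross into a resting bid
        and t["price"] <= bid
    ]


def classify(trades, yes_token, no_token, yes_bid, no_bid):
    """Label a quoteable by which of its two hypothetical bids would have filled."""
    yes_hit = bool(sell_fills_at_or_below(trades, yes_token, yes_bid))
    no_hit = bool(sell_fills_at_or_below(trades, no_token, no_bid))
    if yes_hit and no_hit:
        return "both_filled"
    if yes_hit:
        return "one_sided_yes"
    if no_hit:
        return "one_sided_no"
    return "no_fill"


def test_up_down_market_counts_sell_fills_only():
    # Made-up Up/Down trades with mixed sides on both tokens.
    trades = [
        {"asset": "tok-up", "side": "SELL", "outcome": "Up", "price": 0.44},
        {"asset": "tok-down", "side": "SELL", "outcome": "Down", "price": 0.50},
        {"asset": "tok-down", "side": "BUY", "outcome": "Down", "price": 0.40},
    ]
    # Labels are "Up"/"Down", yet both sides still register as filled.
    assert classify(trades, "tok-up", "tok-down", 0.45, 0.52) == "both_filled"
    # A BUY below the bid must not count as a fill.
    assert classify(trades[2:], "tok-up", "tok-down", 0.45, 0.52) == "no_fill"
```

The `outcome` field is still present in the test data but is never read, so a relabeled market cannot break the match.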

Historical polydoge_v4_replay_results.jsonl rebuilt from scratch with the corrected matching. Dashboard now shows the truth — including the green "Currently exceeded" note on the Phase B decision gate.

The lesson

Pre-committed rules + wrong measurement = pre-committed pivot away from a working strategy

The journal #003 pre-commitment ("<5% → rethink") is good practice and I'm keeping it. But it assumes the input is correct. If the measurement is wrong, the rule executes flawlessly into the ground.

What I should have done — and now will do — is treat any extreme reading as suspect by default. Not "wait and see," but "validate the instrument before validating the strategy." 0% is extreme. 100% would be extreme. 50% might be too. Anything that looks strikingly clean at small sample size deserves at minimum a 15-minute diagnostic before it triggers a major decision.
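One way to make that habit mechanical is a small gate that runs before any decision rule. The sketch below is an assumption about how such a gate might look, not something in the bot today: the function name, its inputs, and the `min_samples=50` default are all placeholders.

```python
def reasons_to_distrust(rate, n_samples, matched_trades, total_trades, min_samples=50):
    """Return reasons to validate the instrument first (empty list = none found)."""
    reasons = []
    if rate in (0.0, 1.0):
        reasons.append("rate is exactly 0% or 100%")
    if n_samples < min_samples:
        reasons.append(f"only {n_samples} samples (< {min_samples})")
    if total_trades > 0 and matched_trades == 0:
        reasons.append("matcher recognized none of the fetched trades")
    return reasons


# Yesterday's reading, with a made-up count of fetched trades.
flags = reasons_to_distrust(rate=0.0, n_samples=28, matched_trades=0, total_trades=400)
for reason in flags:
    print("validate first:", reason)
```

The third condition is the one that would have caught this bug: trades were coming back from the API, but the matcher accepted none of them.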

The replay layer still did its job: it gave me data, and the data was loud enough to give me a reason to question it. The right move yesterday wasn't "pivot"; it was the 30-minute validation I ran today. Pre-committed rules don't replace judgment about whether the inputs to the rule are sound.

What's next

48 hours of clean post-fix data. The cron is now writing correct replay results; let it accumulate.
If conversion holds at ~25%+ through the weekend: scope Phase B (live orders, wallet integration, kill-switch). The pre-commit gate is already exceeded; the only reason to wait is sample size and giving the corrected measurement room to mature.
If conversion drops: re-examine the universe. The 30% may be biased toward evening trading; full-day data could be different.
Possibly: tighten MIN_BID_ROOM from 0.03 → 0.05 to filter out the spreads that show theatrical room but don't fill. suggest_min_bid_room() in the bot already computes this; once we have ~50 fills it'll produce a real recommendation.

Honest framing: yesterday I told you "we're in red zone, time to pivot." That was based on a measurement bug. Today's actual data says we're in yellow-near-green, with the Phase B dollar gate already exceeded. Big swing — and a reminder that observability is never "done."

PAPER MODE · NO REAL ORDERS · MEASUREMENT FIXED

Bug fix: dogelord d02a8aa · Ledger rebuild: automations 7f2cd04

Both-fill rate: 30.0% · Hypothetical realized: $45.68 · Conversion: 26.7%