Journal — PolyDoge Lab Notebook

This experiment is closed. PolyDoge was shelved on 2026-06-29 (Experiment #018) after four months and two strategies — directional prediction (v3.x) and combined-cost market-making (v4.0) — both proved structurally −EV at our cadence, for zero real dollars risked. Read the full retrospective and the mechanism autopsy. The entries below remain exactly as written — no revisionism.

Entry #009 · Retrospective 2026-06-30

The full retrospective: inception to shelf.

Four months, two strategies, two honest noes, zero real dollars. The complete arc — with the one chart that explains it all: the P&L degradation cascade ($814 theoretical → $149 hypothetical → −$405 real). Each layer of realism we added was a truth filter, and the truth was negative. Inside: what we learned about the strategy (no informational edge, twice), about process (build the cheapest test of the load-bearing assumption first), and about ourselves (the pivot was infra-driven, not edge-driven; "the real shot" fired three times). Pros, cons, and the honest reason we shelved — it isn't impossibility, it's cadence and focus. The winners play the 30-second window; our hourly cron fishing day-long markets never could. Shelved by choice, door open.

retrospective · degradation-cascade · adverse-selection · opportunity-cost · building-in-public

Entry #008 · Post-Mortem 2026-06-29

We shelved it. Here’s the autopsy.

Both pre-registered gates failed a week early, so PolyDoge is shelved — not paused, shelved. Over the clean window: 0 pairs completed from 8 fills, $0 rebate captured, −$405 paper P&L. Root cause is one mechanism: combined-cost arbitrage needs both legs to fill, but adverse selection reliably fills only the leg about to go to zero — and you can’t redeem half a pair. The very edge Othman & Sandholm say rescues a prediction-market maker (CTF redemption) was built correctly and structurally unreachable at a 30-min cron with $50 a side. v3.x couldn’t predict the market; v4.0 couldn’t make it. Two honest noes beat one expensive maybe. The cron is off; forensics stay public.
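The one-sided-fill mechanism can be made concrete with a little arithmetic. This is a minimal sketch, not the engine's code: the function name `pair_pnl` and the 48¢/47¢ prices are hypothetical, fees and rebates are ignored, and the only fact assumed is that the winning outcome pays $1.00 per share and the losing outcome pays nothing.

```python
def pair_pnl(price_up, price_down, filled_up, filled_down, up_wins, shares=1.0):
    """Paper P&L of one quoted pair at resolution, ignoring fees and rebates.

    Each filled leg costs its bid price per share; the winning outcome
    pays $1.00 per share and the losing outcome pays nothing.
    """
    cost = 0.0
    payout = 0.0
    if filled_up:
        cost += price_up * shares
        payout += shares if up_wins else 0.0
    if filled_down:
        cost += price_down * shares
        payout += 0.0 if up_wins else shares
    return payout - cost


# Both legs fill: the result is locked in whichever side wins (about +$0.05).
both = pair_pnl(0.48, 0.47, True, True, up_wins=True)

# Only the Down leg fills, then Up wins: the filled leg goes to zero (-$0.47).
one_sided = pair_pnl(0.48, 0.47, False, True, up_wins=True)
```

With these illustrative prices, a completed pair earns about five cents per share, while a single adversely selected fill loses the full price of that leg, so a handful of one-sided fills can outweigh many completed pairs.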

post-mortem · adverse-selection · combined-cost-arbitrage · shelved · building-in-public

Entry #006 · Reliability 2026-05-31

We gave the bot a harness. It caught a lie on day one.

We wrapped the v4 engine in a harness: one command to verify everything (./init.sh) and a feature_list.json where a feature only turns green when its check actually passes. First run: 146/147. A test that had read “137/137 passing” in our notes for ten days was secretly reading the live replay file that the */30 cron keeps growing: 62 records ($65.98) when we froze the expected value, 138 records ($77.28) by the time the harness ran it. Not an engine bug; a test reading a moving target and calling it correct. Fix: froze the original 62 records as a committed fixture and pointed the test at that. 147/147 now, zero engine changes. The harness’s whole job is making “green” mean something.
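The fixture idea can be sketched in a few lines. Everything named here is an assumption for illustration: the JSON-lines layout, the `amount` field, and the helper names `summarize` and `freeze_fixture` are hypothetical, not the harness's real format or API.

```python
import json
from pathlib import Path


def summarize(path: Path):
    """Return (record count, rounded total) for a JSON-lines replay file."""
    rows = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
    return len(rows), round(sum(row["amount"] for row in rows), 2)


def freeze_fixture(live: Path, fixture: Path) -> None:
    """Snapshot the live file once; the copy is what gets committed and tested."""
    fixture.write_bytes(live.read_bytes())
```

A test that calls `summarize()` on the frozen copy keeps returning the same pair of numbers no matter how many records the cron later appends to the live file, which is the property the failing test was missing.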

harness-engineering · reliability · flaky-test · fixture · building-in-public

Entry #005 · Experiments #015 + #016 2026-05-21

Phase B is live. On paper.

15 dispatched tasks across two stages. Phase B replaces the single-pass scanner with a stateful order machine: OrderClient interface, Inventory state file (seeded $65.98 from Phase A), 5 kill switches, pre-resolution cancel, one-sided fill resolution, Discord alerts. Phase B.2 Stage 1 expands the universe to 5 categories (was crypto-only), fixes a 5x undercount in the maker rebate constant, and adds queue-aware fill simulation — the truthful version of paper that drains existing maker depth before crediting our fills. Two pre-existing bugs caught: CoinGecko funding rate was 100x under-reported; migration could be poisoned by NaN/Inf. 137/137 tests. Cron bumped to */30. Phase C gate: 2026-06-04.
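Queue-aware fill simulation reduces to one rule: taker volume at our price level first consumes the maker depth that was already resting ahead of us, and only the remainder can be credited to our order. A minimal sketch under that assumption follows; the function name `queue_aware_fill` and its parameters are hypothetical, not the engine's interface.

```python
def queue_aware_fill(taker_volume, depth_ahead, our_size):
    """Shares credited to our resting order at one price level.

    taker_volume: shares that traded through the level during the replay window
    depth_ahead:  maker shares already queued ahead of us at that level
    our_size:     shares we were quoting
    """
    remaining = max(0.0, taker_volume - depth_ahead)  # volume left after the queue drains
    return min(our_size, remaining)                   # never credit more than we quoted
```

For example, with 100 shares queued ahead and 50 quoted, 120 shares of taker volume credit us with 20, 80 shares credit nothing, and 500 shares fill the full 50. A naive simulation that ignores `depth_ahead` would credit 50 in all three cases.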

v4.0 · phase-b · state-machine · queue-aware-fills · 5-categories · kill-switches · rebate-fix

Entry #004 · Experiment #014 2026-05-18

The 0% that was actually 30%.

Yesterday's "0% conversion = pivot the strategy" read was wrong. The number was a measurement bug — classify_replay() substring-matched outcome strings against "YES"/"NO", but Polymarket's "Up or Down" series uses "Up"/"Down". 100% of our fills were invisible for two days. Actual conversion: 30% both-side fills, 27% dollar conversion ($45.68 / $171.26). The Phase B decision gate is already exceeded. Pre-committed rules + wrong measurement = pre-committed pivot away from a working strategy. Fixed, tested, and shipping with a pytest regression so this bug class fails loudly next time.
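One way to make this bug class fail loudly is to map outcome labels explicitly and reject anything unrecognized, instead of substring-matching. The sketch below is an illustration only: `LEG_BY_LABEL`, `classify_outcome`, and the `"first"`/`"second"` leg names are hypothetical and do not reproduce the real classify_replay(). The second helper just replays the conversion arithmetic quoted in the entry.

```python
# Hypothetical mapping from outcome label to leg; extend it per market series.
LEG_BY_LABEL = {"yes": "first", "no": "second", "up": "first", "down": "second"}


def classify_outcome(label: str) -> str:
    """Map an outcome label to a leg, raising instead of silently skipping."""
    key = label.strip().lower()
    if key not in LEG_BY_LABEL:
        raise ValueError(f"unrecognized outcome label: {label!r}")
    return LEG_BY_LABEL[key]


def dollar_conversion(converted_usd: float, total_usd: float) -> float:
    """Fraction of dollars that converted, e.g. 45.68 / 171.26."""
    return converted_usd / total_usd
```

With this shape, a new label such as "Up" either has an entry in the mapping or raises a ValueError on the first record, so fills cannot go invisible for two days. `dollar_conversion(45.68, 171.26)` is roughly 0.267, which rounds to the 27% reported.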

postmortem · measurement-bug · replay-validation · pre-commit · v4.0

Entry #003 · Experiment #014 2026-05-16 (evening)

First quoteables, asset surprise, $50 stake.

Six hours after launch, v4.0 paper has 11 cycles, 180 markets scanned, 9 quoteable opportunities. The arbitrage thesis fired faster than expected — and not where expected. The biggest spreads are in DOGE (20¢ room) and XRP (19¢), not BTC (6¢). Bitcoin's hourly market is efficient; the alt-coin "Up or Down" series are wide open. Bumping stake $5 → $50/side so the daily numbers are loud enough to learn from. Phase B decision gate scales accordingly ($0.30/day → $3/day). Tomorrow morning the first overnight replay data tells us whether the spreads are tradeable or theatrical.

v4.0 · paper-results · asset-mix · stake-bump · doge · xrp

Entry #002 · Experiment #014 2026-05-16 (afternoon)

v4.0 first quoteable opportunity.

The combined-cost arbitrage thesis fired its first signal. 2 markets crossed the MIN_BID_ROOM = 0.03 threshold within hours of the v4.0 deploy — bid_combined under $0.97 on Bitcoin and Dogecoin "Up or Down" hourly series. Paper mode only; the replay layer revisits these over the next ~8 hours to check whether actual taker volume crossed the hypothetical bid prices. "Quoteable" ≠ "would have filled" — that's the whole point of replay validation.
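The quoteable check described here is a single comparison. A minimal sketch, assuming the scanner sees one best bid per side: `MIN_BID_ROOM` and its 0.03 value come from the entry, while the function names and example prices are hypothetical.

```python
MIN_BID_ROOM = 0.03  # threshold named in the entry


def bid_room(bid_up: float, bid_down: float) -> float:
    """Room left under the $1.00 payout if both bids fill."""
    return 1.0 - (bid_up + bid_down)


def is_quoteable(bid_up: float, bid_down: float, min_room: float = MIN_BID_ROOM) -> bool:
    """True when the combined bid sits far enough below $1.00."""
    return bid_room(bid_up, bid_down) > min_room
```

Bids of 48¢ and 47¢ combine to $0.95, leaving about five cents of room, so that market would count as quoteable. Bids of 50¢ and 49¢ leave about one cent and would not. Passing this check says nothing about whether either bid would actually have filled.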

v4.0 · first-signal · paper-mode · auto-journal

Entry #001 · Experiment #014 2026-05-16 (morning)

v3.x is dead. v4.0 lives. We're now a market maker.

After 83 days, 6,267 predictions, and $582 in losses, the directional prediction engine retired. We ran one final diagnostic and found a structural problem: across 4,294 shadow predictions, the algorithm never confidently disagreed with the market (0 instances). Every signal we use is public, so we reach the same conclusion as the market — there's no information edge to find. v4.0 pivots to combined-cost arbitrage on Polymarket binaries: market-neutral, no fair-value model, relies on Polymarket's maker rebate program. Currently paper-only via GHA hourly cron. Honest projection: $0.30-1.00/day on $500 capital. Science project, not business — at current scale.

pivot · v4.0 · liquidity-provision · combined-cost-arbitrage · honest-loss

Next checkpoint: tomorrow morning (2026-05-17) — first overnight replay data lands. Either the spreads were real or they were theatrical.