Phase A (single-pass paper scanner) showed $65.98 hypothetical realized P&L over four days on the combined-cost CTF arbitrage thesis. Encouraging. Also not enough — paper that assumes our quote sat at the front of the queue forever doesn't tell us whether the strategy survives contact with real execution.
Today I shipped Phase B (12 tasks) and Phase B.2 Stage 1 (3 tasks) — 15 dispatched implementations, two stages of review per task, 137 tests, two pre-existing bug fixes, and one big strategic re-frame after some honest research on what other people running this strategy have actually learned. The wallet stays closed. The machinery is now real.
Phase A was a single-pass scan: every cron wakes up, looks for combined-cost < 0.97 opportunities, writes a "would-quote" row, exits. Stateless. Phase B is the real engine, running on paper:
- OrderClient interface: abstract base class with PaperOrderClient as the current impl and LiveOrderClient as a stub that raises NotImplementedError on every method. The seam is live-compatible: when we eventually wire py-clob-client + a funded wallet, the swap is one constructor change, not an architectural rewrite.
- Inventory state class: JSON file (polydoge_v4_inventory.json) holding cumulative_pnl_all_time, day_pnl, open_positions, paused_until. Atomic save via .tmp + os.replace. Day-roll at UTC midnight resets only the day counter and never touches the cumulative seed.
- cumulative_pnl_all_time = $65.98 across 62 replay records. Write-once protected. We carry forward our paper history into the new state machine; Phase B doesn't start from zero.
- polydoge_v4_killswitches.jsonl.
- main(): on each cron, reconcile fills against the trades API, close any resolved one-sided positions, run pre-resolution cancel on markets resolving within the next 35 minutes, check kill switches, cancel-and-replace any order whose target price drifted >2¢ since posting, and place new quotes via place_maker_gtc.
- */30 (was hourly): the new state machine needs sub-hour cadence to actually fire cancel-and-replace, pre-resolution cancel, and kill-switch reactivity. About 48 runs/day, well within free tier. Workflow also now commits the three new state files back to the repo so they survive across crons.

After Phase B shipped, the research changed how I think about the strategy. Three tasks:
- crypto (0.5-8hr), politics (24-168hr), sports (2-72hr), econ (12-336hr), geopolitics (24-720hr). Each market gets tagged with its category and that tag rides all the way through to the ledger, dashboard, and P&L attribution.
- MAKER_REBATE_RATE = 0.001 (0.10%): a guess. Polymarket's published maker-rebate program is 20-50% of taker fees, which given 1.0-1.8% taker fees translates to 0.20-0.90% rebates by category. I was undercounting rebates by 2-5x. The corrected dict: {finance: 0.005, politics/sports/econ/geo: 0.0025, crypto: 0.002}. Pair P&L on a [email protected] + [email protected] size=10 fill: politics now books $0.30425; crypto books $0.2994; finance $0.3285. Same trade, different real economics depending on what we were quoting on.
- analyze() already captures yes_queue_ahead and no_queue_ahead, the existing maker depth at our target price level. Phase B paper ignored those numbers entirely; if a taker SELL crossed our price, we credited a fill. That's not how real maker queues work. The new reconcile_fills drains queue_ahead_at_post from cumulative taker volume BEFORE crediting fills to our order. If 500 shares of qualifying SELL volume arrive but there were 800 shares of maker depth ahead of us, we get nothing.

I asked an agent to find every public Polymarket market-making algorithm and tell me what other people running this strategy have actually learned. Two things came back that mattered.
Polymarket introduced dynamic fees on 5-minute and 15-minute crypto markets in 2025
specifically to curb latency arbitrage. Those markets are dominated by sub-10ms Rust
bots running on Polymarket API co-location. Our 30-minute cron cannot win there. The empirical
data confirmed it: 5-min and 15-min crypto markets sat at bid_combined ≈ 0.99 — 1¢
of room — for our entire Phase A window. Our actual wins came from 4-hour alt markets
(XRP best fill at $12.80, DOGE and SOL in the $5-8 range) where bid_combined opened
up to 0.95.
The structural sweet spot for a 30-min cron is politics, sports, econ, geopolitics — zero taker fees on most, higher rebate tiers, slower price movement, HFTs ignore them below $10K depth. That's the entire premise behind U1.
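The rebate tiers are a large part of why category matters, and the arithmetic is easy to check. The sketch below uses the corrected per-category rates quoted earlier. Two inputs are assumptions, back-solved from the quoted P&L figures rather than stated in the post: a combined YES + NO cost of 0.97 per share, and a flat $0.02 per-pair cost (gas, for instance). The name `pair_pnl` and its parameters are hypothetical, and `geopolitics` stands in for the `geo` key.

```python
# Per-category maker rebate rates, as quoted in the post.
MAKER_REBATE_RATE = {
    "finance": 0.005,
    "politics": 0.0025,
    "sports": 0.0025,
    "econ": 0.0025,
    "geopolitics": 0.0025,
    "crypto": 0.002,
}

# ASSUMPTIONS: back-solved from the quoted figures, not stated in the post.
COMBINED_COST = 0.97        # YES price + NO price, per share
FIXED_COST_PER_PAIR = 0.02  # flat per-pair cost, e.g. gas


def pair_pnl(category, size, combined_cost=COMBINED_COST,
             fixed_cost=FIXED_COST_PER_PAIR):
    """P&L for one completed YES + NO pair of `size` shares (hypothetical helper)."""
    # A completed pair redeems for $1.00 per share, so the edge is 1 - cost.
    redemption_edge = (1.0 - combined_cost) * size
    # The rebate is earned on the notional we provided as a maker.
    rebate = MAKER_REBATE_RATE[category] * combined_cost * size
    return redemption_edge + rebate - fixed_cost


for category in ("politics", "crypto", "finance"):
    print(category, round(pair_pnl(category, size=10), 5))
```

Under those two assumptions the function reproduces the quoted $0.30425, $0.2994, and $0.3285 for politics, crypto, and finance. The gap between crypto and finance comes entirely from the rebate term.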
Othman & Sandholm (CMU 2010, 2012) prove that pure spread capture in prediction markets is negative-EV against informed flow — you lose to adverse selection every time. But CTF redemption is the structural edge that makes combined-cost arbitrage non-zero-EV even against informed takers. We're not running a generic MM strategy. We're running the one strategy on Polymarket the academic literature says can work without information edge, because of how the protocol settles.
Other published numbers: $20M+ in collective Polymarket MM profits in 2024. One profiled bot made $2.2M in two months — though that bot used an ensemble probability model + news signal, a different (directional) strategy than ours. The combined-cost arb cohort is smaller but real.
Two real bugs surfaced while building Phase B. Both pre-dated this session and would have silently broken parts of the system:
- binance.py correctly multiplied fundingRate by 100 to convert decimal-to-percentage. The CoinGecko fallback path had a comment saying "CoinGecko returns as percentage already", which is wrong; CoinGecko returns decimal too. The funding-spike kill switch was effectively disabled on any IP that hit the CoinGecko path. Fixed.
- _compute_cumulative_pnl summed every hypothetical_realized_pnl value through float(), which happily parses inputs like "nan" and "inf" into non-finite floats. A single corrupt row would have made cumulative_pnl_all_time a non-finite value that json.dump would write as a non-standard JSON token. Added math.isfinite guard with a stderr warning that skips the row.
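A minimal sketch of the second fix, assuming rows are dicts keyed by `hypothetical_realized_pnl`. The function name and the warning text here are illustrative, not the actual `_compute_cumulative_pnl`.

```python
import json
import math
import sys


def compute_cumulative_pnl(rows):
    """Sum hypothetical_realized_pnl over rows, skipping non-finite values."""
    total = 0.0
    for index, row in enumerate(rows):
        value = float(row["hypothetical_realized_pnl"])
        # float("nan") and float("inf") parse without error, so check explicitly.
        if not math.isfinite(value):
            print(f"warning: skipping row {index}: non-finite pnl {value!r}",
                  file=sys.stderr)
            continue
        total += value
    return total


rows = [
    {"hypothetical_realized_pnl": "1.50"},
    {"hypothetical_realized_pnl": "nan"},   # corrupt row, skipped
    {"hypothetical_realized_pnl": "inf"},   # corrupt row, skipped
    {"hypothetical_realized_pnl": 2.5},
]
total = compute_cumulative_pnl(rows)

# allow_nan=False makes json.dumps raise instead of emitting NaN/Infinity,
# a second line of defense when persisting state.
print(json.dumps({"cumulative_pnl_all_time": total}, allow_nan=False))
```

By default `json.dumps` writes `NaN` and `Infinity`, which strict JSON parsers reject. Passing `allow_nan=False` turns that silent corruption into a `ValueError` at write time.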
The dashboard JSON at /polydoge/data/v4-paper-stats.json gained four new keys
(all additive — old keys preserved): phase_b (current inventory state),
phase_b_pnl_history (cumulative line with Phase A→B switchover marker),
phase_b_fills_summary (24h + 7d fill counts, rebate sums, gas), and
killswitches_recent (last 10 trips). Schema version bumped to 2.
RUN_LOG rows now carry: pre_resolution_cancelled,
one_sided_resolved_count, day_rebates_earned, day_gas_paid,
fills_count, phase_b_active, posting_paused. The bot is
self-attesting about whether each step actually ran.
| Date | Check | Decision |
|---|---|---|
| 2026-05-28 | 1-week check | Funnel widening >60% of crons quoteable? Category mix? Realistic P&L direction? |
| 2026-06-04 | Phase C gate | Pot >$200 OR 14d from switchover. Ship live with $50 wallet, or shelve. |
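
Both gates depend on queue-aware paper fills, so it helps to pin down the crediting rule described earlier: taker volume first drains the maker depth that was ahead of us at post time, and only the remainder can fill our order. The sketch below illustrates that rule only. `credit_fill` and its parameters are hypothetical names, and the real reconcile_fills may differ in detail.

```python
def credit_fill(order_size, queue_ahead_at_post, cumulative_taker_volume,
                already_filled=0.0):
    """Return the shares newly credited to our order under queue-aware rules.

    order_size              -- shares we posted at the price level
    queue_ahead_at_post     -- maker depth ahead of us when we posted
    cumulative_taker_volume -- qualifying taker volume at our price since posting
    already_filled          -- shares credited on earlier reconcile passes
    """
    # Taker volume must consume the queue ahead of us before it reaches our order.
    volume_past_queue = max(0.0, cumulative_taker_volume - queue_ahead_at_post)
    # We can never be filled for more than we posted.
    total_fillable = min(order_size, volume_past_queue)
    # Credit only the increment since the last pass.
    return max(0.0, total_fillable - already_filled)


# The example from the post: 500 shares arrive, 800 were ahead of us.
print(credit_fill(order_size=10, queue_ahead_at_post=800,
                  cumulative_taker_volume=500))
```

Tracking cumulative volume and `already_filled` separately keeps the function idempotent across cron runs: calling it again with the same inputs credits nothing new.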
If queue-aware paper collapses the $16/day implied trajectory below $0.50/day after rest-of-May data lands, that's the honest answer that combined-cost arb at our latency tier was phantom all along — and the past five days of work was tuition I'd rather pay before funding a wallet than after.
If it survives, Phase B.2 Stage 2 ships next: UMA dispute screen (real existential risk per the research — UMA oracle was successfully attacked in March 2025), news-triggered quote withdrawal (the single biggest adverse-selection defense pro MMs use, not in any public bot), auto-redeem on pair completion, per-category dashboard breakdown.
The deeper synthesis is what surfaced the 5x rebate undercount, the wrong market universe, the opportunity-density bottleneck (68% of crons found zero quoteables — we're constrained by opportunity flow, not execution quality), and the realization that paper P&L is mostly machinery validation, not strategy validation. Without that pushback, the wallet might already be funded and bleeding.
Paper continues at */30 through the rest of May. The next decision lands June 4.