Entry #006 · Reliability
2026-05-31
We wrapped the v4 engine in a harness: one command to verify everything
(./init.sh) and a feature_list.json where a feature only turns green
when its check actually passes. First run: 146/147. One test from the suite our notes had
listed as “137/137 passing” for ten days was quietly reading the live replay
file the */30 cron keeps growing: 62 records when we froze the $65.98 figure, 138 records ($77.28) by the time the
harness ran it. Not an engine bug; a test reading a moving target and calling it correct.
Fix: froze the original 62 records as a committed fixture and pointed the test at that.
147/147 now, zero engine changes. The harness’s whole job is making “green” mean something.
harness-engineering reliability flaky-test fixture building-in-public
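A minimal sketch of the fixture fix from Entry #006, assuming the replay file is JSON Lines with one record per line. The file names, the `pnl` field, and the helper functions are hypothetical stand-ins, not the real harness. It shows why a total computed from the growing live file drifts while the same total computed from a frozen copy stays put.

```python
import json
import tempfile
from pathlib import Path


def load_records(path: Path) -> list[dict]:
    """Read one JSON record per non-empty line."""
    with path.open() as fh:
        return [json.loads(line) for line in fh if line.strip()]


def total_pnl(records: list[dict]) -> float:
    """Sum the (hypothetical) 'pnl' field, rounded to cents."""
    return round(sum(r["pnl"] for r in records), 2)


def check_fixture(fixture: Path, expected_count: int) -> bool:
    """The check passes only if the frozen fixture still has its frozen size."""
    return len(load_records(fixture)) == expected_count


def write_jsonl(path: Path, records: list[dict]) -> None:
    path.write_text("".join(json.dumps(r) + "\n" for r in records))


# Demo: a temporary directory stands in for the repository.
with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "replay_fixture.jsonl"  # committed once, never appended to
    live = Path(tmp) / "replay_live.jsonl"        # the file a cron job keeps growing

    frozen = [{"pnl": 1.0}, {"pnl": 2.5}]
    write_jsonl(fixture, frozen)
    write_jsonl(live, frozen + [{"pnl": 4.0}])    # the cron appended a record later

    fixture_total = total_pnl(load_records(fixture))
    live_total = total_pnl(load_records(live))
    fixture_ok = check_fixture(fixture, expected_count=2)
```

A test that asserts against `fixture_total` keeps passing no matter how far the live file grows; one that asserts against `live_total` breaks as soon as the cron appends a record.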
Entry #005 · Experiments #015 + #016
2026-05-21
15 dispatched tasks across two stages. Phase B replaces the single-pass scanner with a
stateful order machine: OrderClient interface, Inventory state file
(seeded $65.98 from Phase A), 5 kill switches, pre-resolution cancel, one-sided fill resolution,
Discord alerts. Phase B.2 Stage 1 expands the universe to 5 categories (was crypto-only),
fixes a 5x undercount in the maker rebate constant, and adds queue-aware fill simulation —
the truthful version of paper that drains existing maker depth before crediting our fills.
Two pre-existing bugs caught: CoinGecko funding rate was 100x under-reported; migration
could be poisoned by NaN/Inf. 137/137 tests. Cron bumped to */30. Phase C gate: 2026-06-04.
v4.0 phase-b state-machine queue-aware-fills 5-categories kill-switches rebate-fix
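The queue-aware fill idea from Entry #005 can be sketched as below. This is an illustration under stated assumptions, not the engine's code: the function names and the three inputs (our order size, maker depth resting ahead of us, taker volume at our price) are hypothetical. The `math.isfinite` guard is one way to reject the NaN/Inf values the entry mentions.

```python
import math


def queue_aware_fill(our_size: float, depth_ahead: float, taker_volume: float) -> float:
    """Return how much of our resting order fills.

    Taker volume first drains the maker depth already resting ahead of us;
    only the remainder is credited to our order, capped at our size.
    """
    inputs = (("our_size", our_size), ("depth_ahead", depth_ahead),
              ("taker_volume", taker_volume))
    for name, value in inputs:
        # Reject NaN, +/-Inf and negative sizes before they reach any state.
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{name} must be finite and non-negative, got {value!r}")
    remaining = max(0.0, taker_volume - depth_ahead)
    return min(our_size, remaining)


def naive_fill(our_size: float, taker_volume: float) -> float:
    """Optimistic paper fill that ignores the queue, for comparison."""
    return min(our_size, taker_volume)
```

With 120 units of depth ahead and 100 units of taker volume, `naive_fill(50, 100)` credits a full fill while `queue_aware_fill(50, 120, 100)` credits nothing, which is the gap between optimistic and truthful paper results.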
Entry #004 · Experiment #014
2026-05-18
Yesterday's "0% conversion = pivot the strategy" read was wrong. The number was a
measurement bug — classify_replay() substring-matched outcome strings against
"YES"/"NO", but Polymarket's "Up or Down" series uses "Up"/"Down".
100% of our fills were invisible for two days. Actual conversion: 30% both-side
fills, 27% dollar conversion ($45.68 / $171.26). The Phase B decision gate is
already exceeded. Pre-committed rules + wrong measurement = pre-committed pivot away from
a working strategy. Fixed, tested, and shipping with a pytest regression so this bug class
fails loudly next time.
postmortem measurement-bug replay-validation pre-commit v4.0
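One way to make the bug class in Entry #004 fail loudly is to match outcome labels exactly against a known table and raise on anything else. The sketch below is hypothetical: the real classify_replay() is not shown in this journal, and the table, the function names, and the "first"/"second" side names are assumptions made for illustration. The second helper reproduces the dollar-conversion arithmetic quoted in the entry.

```python
# Hypothetical label table; extend it when a new market series appears.
OUTCOME_SIDES = {
    "yes": "first",
    "no": "second",
    "up": "first",
    "down": "second",
}


def classify_outcome(label: str) -> str:
    """Map an outcome label to a side with an exact, case-insensitive match."""
    key = label.strip().lower()
    if key not in OUTCOME_SIDES:
        # Raise instead of silently counting the fill as "no match".
        raise ValueError(f"unrecognized outcome label: {label!r}")
    return OUTCOME_SIDES[key]


def dollar_conversion(converted_usd: float, total_usd: float) -> float:
    """Converted dollars as a fraction of total dollars."""
    if total_usd <= 0:
        raise ValueError("total_usd must be positive")
    return converted_usd / total_usd
```

`dollar_conversion(45.68, 171.26)` is about 0.267, which rounds to the 27% in the entry. An unknown label such as "Over" raises a `ValueError` rather than reporting 0% conversion.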
Entry #003 · Experiment #014
2026-05-16 (evening)
Six hours after launch, v4.0 paper has 11 cycles, 180 markets scanned, 9 quoteable opportunities.
The arbitrage thesis fired faster than expected — and not where expected.
The biggest spreads are in DOGE (20¢ room) and XRP (19¢), not BTC (6¢).
Bitcoin's hourly market is efficient; the alt-coin "Up or Down" series are wide open.
Bumping stake $5 → $50/side so the daily numbers are loud enough to learn from. Phase B
decision gate scales accordingly ($0.30/day → $3/day). Tomorrow morning the first overnight replay data
tells us whether the spreads are tradeable or theatrical.
v4.0 paper-results asset-mix stake-bump doge xrp
Entry #002 · Experiment #014
2026-05-16 (afternoon)
The combined-cost arbitrage thesis fired its first signal. 2 markets crossed the
MIN_BID_ROOM = 0.03 threshold within hours of the v4.0 deploy — bid_combined under $0.97
on Bitcoin and Dogecoin "Up or Down" hourly series. Paper mode only; the replay layer revisits these
over the next ~8 hours to check whether actual taker volume crossed the hypothetical bid prices.
"Quoteable" ≠ "would have filled" — that's the whole point of replay validation.
v4.0 first-signal paper-mode auto-journal
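The threshold check in Entry #002 can be sketched as follows, assuming a binary pair pays out $1.00 in total and that bid_combined is the sum of the best bids on both outcomes. Only MIN_BID_ROOM = 0.03 comes from the entry; the function names and the replay rule (a resting bid counts as crossed if any taker trade printed at or below it) are assumptions made for illustration. Decimal is used so that 0.97 compares exactly.

```python
from decimal import Decimal

MIN_BID_ROOM = Decimal("0.03")  # threshold quoted in the entry
PAYOUT = Decimal("1.00")        # assumed total payout of a binary pair


def bid_room(bid_up: Decimal, bid_down: Decimal) -> Decimal:
    """Room left under the $1.00 payout after bidding on both sides."""
    return PAYOUT - (bid_up + bid_down)


def is_quoteable(bid_up: Decimal, bid_down: Decimal) -> bool:
    """True when the combined bid is strictly under 1.00 - MIN_BID_ROOM."""
    return bid_room(bid_up, bid_down) > MIN_BID_ROOM


def taker_crossed_bid(bid: Decimal, taker_prices: list[Decimal]) -> bool:
    """Replay check (assumed rule): did any taker trade print at or below our bid?"""
    return any(price <= bid for price in taker_prices)
```

`is_quoteable` answers the paper question; `taker_crossed_bid` answers the replay question. A market can pass the first and fail the second, which is the "quoteable is not the same as filled" distinction the entry draws.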
Entry #001 · Experiment #014
2026-05-16 (morning)
After 83 days, 6,267 predictions, and $582 in losses, the directional prediction engine is retired.
We ran one final diagnostic and found a structural problem: across 4,294 shadow predictions, the
algorithm never confidently disagreed with the market (0 instances). Every signal
we use is public, so we reach the same conclusion as the market — there's no information edge to find.
v4.0 pivots to combined-cost arbitrage on Polymarket binaries: market-neutral, no fair-value model,
relies on Polymarket's maker rebate program. Currently paper-only via GHA hourly cron. Honest projection:
$0.30-1.00/day on $500 capital. Science project, not business — at current scale.
pivot v4.0 liquidity-provision combined-cost-arbitrage honest-loss
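The diagnostic in Entry #001 is a count of confident disagreements. The entry does not define "confidently disagreed", so the sketch below assumes one possible definition: the model and the market sit on opposite sides of 50%, and the model is at least `min_confidence` away from 50%. The function name, the threshold, and the (model, market) probability pairs are all hypothetical.

```python
def confident_disagreements(pairs, min_confidence=0.10):
    """Count (model_p, market_p) pairs where the model confidently picks the other side.

    Assumed definition: the two probabilities fall on opposite sides of 0.5
    and the model probability is at least min_confidence away from 0.5.
    """
    count = 0
    for model_p, market_p in pairs:
        opposite_sides = (model_p - 0.5) * (market_p - 0.5) < 0
        confident = abs(model_p - 0.5) >= min_confidence
        if opposite_sides and confident:
            count += 1
    return count
```

Under this definition, a model built only on public signals would tend to land on the market's side of 50% or disagree only weakly, so the count stays at or near zero, which matches the result the entry reports.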
Next checkpoint: tomorrow morning (2026-05-17) — first overnight replay data lands. Either the spreads were real or they were theatrical.