Examples

AIs make more informed decisions with Credence than without.

One of these has now resolved, and it is shown first. A single resolved call isn’t a track record; a live, audit-linked scoreboard will be the real proof, and it’s coming. Still, here is one that played out.

Resolved · 16 May 2026

Where should I host my Eurovision party?

A Denmark-themed party: commit to a premium venue (worth +€500 if Denmark finishes top-5, −€500 if not, relative to hosting at home), or host at home. The forecast below was pulled from the production API on 2026-05-12, four days before the final:

Claude + Internet → book the venue.

Polymarket’s 66.5% is above the 50% breakeven, so expected value looks positive (+€165).

Claude + Credence → host at home.

Credence’s calibrated mean was actually higher, at 72.7%, so on the point estimate it agreed: book. But with only n_effective ≈ 5, the posterior was wide; its 10th percentile sat at 45.9%, below breakeven. The robust call was to pass.

Why it mattered: Credence wasn’t “more right” about the odds; in fact it leaned more toward Denmark than the market did. The flip came from n_effective flagging that the evidence was too thin to bet on. The downside the wide posterior warned about, which even Credence’s own mean put at roughly 1-in-4 (27.3%), is the one that landed. The price said book; the confidence said don’t; the result said don’t.

What happened — 16 May 2026: Denmark finished 7th, outside the top five (Bulgaria, Israel, Romania, Australia, Italy). The venue bet would have lost €500. Claude + Credence’s call: €0.

Driving market: Denmark top-5 at Eurovision 2026. p_market 66.5% · p_calibrated 72.7% · n_effective 4.80, pulled from the production API 2026-05-12 — before the 16 May final. Outcome real (Denmark 7th); the venue payoff is an illustrative scenario. Not advice.

Claude’s math

EV(venue − home | p) = 1000p − 500; breakeven p = 50%.

Market mean 0.665 → +€165 (venue). Credence mean 0.727 → +€227 (still venue).

Posterior Beta(3.49, 1.31), n_effective 4.80 → P10 = 45.9% → EV at P10 = −€41.

Robustness rule (commit only if P10 ≥ breakeven) → home.


Assumption: 10th-percentile robustness rule.

Realized 2026-05-16: Denmark 7th, not top-5 → venue −€500, home €0.

The mean alone never flips this, and here the mean was on the wrong side. The evidence strength is what saved the call.
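To make the robustness rule concrete, here is a minimal Python sketch using only the standard library. It assumes the posterior really is the Beta(3.49, 1.31) quoted above and finds its 10th percentile by numerically integrating the density (Simpson’s rule) and bisecting on the CDF. The helper names (beta_pdf, beta_cdf, beta_quantile, ev_venue_minus_home) are illustrative, not part of any Credence API.

```python
import math


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density; assumes a > 1 and b > 1 so the density is 0 at the endpoints."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_norm)


def beta_cdf(x: float, a: float, b: float, steps: int = 2000) -> float:
    """P(p <= x) by composite Simpson's rule on [0, x]; steps must be even."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * beta_pdf(i * h, a, b)
    return total * h / 3


def beta_quantile(q: float, a: float, b: float) -> float:
    """Invert the CDF by bisection (the CDF is monotone on [0, 1])."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ev_venue_minus_home(p: float) -> float:
    # Venue is worth +500 EUR if Denmark finishes top-5 and -500 EUR otherwise, versus home.
    return 1000 * p - 500


a, b = 3.49, 1.31          # posterior from the example; n_effective = a + b = 4.80
breakeven = 0.5
p_mean = a / (a + b)
p10 = beta_quantile(0.10, a, b)

print(f"EV at market price  {ev_venue_minus_home(0.665):+.0f}")
print(f"EV at Credence mean {ev_venue_minus_home(p_mean):+.0f}")
print(f"P10 = {p10:.3f}, EV at P10 {ev_venue_minus_home(p10):+.0f}")

# Robustness rule: commit only if the 10th percentile still clears breakeven.
decision = "venue" if p10 >= breakeven else "home"
print("robust decision:", decision)
```

Both the market price and the posterior mean give a positive EV; only the lower tail drops below breakeven, which is what flips the call to hosting at home.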

Should I rent a tent for this weekend’s event?

An outdoor celebration spans two days. One wet day is fine; rain on both is a washout that forfeits $40,000 in deposits. A tent costs $6,000 and removes the risk.

Claude + Internet → skip the tent.

Treating the 35% daily-rain price as fixed, the washout chance is 12.3% → expected loss $4,900, cheaper than the tent.

Claude + Credence → rent the tent.

Same 35% mean, but a washout needs two bad days. Because both days share one uncertain rain rate, the posterior’s spread adds Var(p) to the washout chance, lifting it to 16.8% → expected loss $6,720, now above the tent’s cost.

Why it flipped: The calibrated probability is identical to the market’s. The decision flips purely on posterior width through a convex (compounding) payoff, which is exactly what a single price can’t capture.

Driving market: rain at the venue on an event-weekend day. p_market 35.0% · p_calibrated 35.0% · n_effective 4.0. Illustrative construction; not advice.

Claude’s math

Washout = rain both days = p². Tent $6,000; washout loss $40,000; breakeven E[p²] = 0.15.

Plug-in: 0.35² = 0.1225 → EV (skip) = −$4,900 → SKIP.

Posterior Beta(1.40, 2.60), n_effective 4.0, Var(p) = 0.0455 → E[p²] = 0.1225 + 0.0455 = 0.168 → EV (skip) = −$6,720 → RENT.

Same mean as the market; the convex payoff makes the spread matter.

Assumption: both days share one uncertain rain-rate p, conditionally independent given p — so uncertainty about the rate correlates the days. (Independent day-specific rates would give E[p₁p₂] = 0.1225, not 0.168.)
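A minimal Python sketch of the tent arithmetic, assuming the Beta(1.40, 2.60) posterior above. It computes the plug-in and shared-rate washout chances in closed form (E[p²] = E[p]² + Var(p)), then runs a seeded Monte Carlo check of the modelling assumption: one shared rate for both days versus a separate rate per day. Variable and function names are illustrative.

```python
import random

A, B = 1.40, 2.60                       # posterior for the daily rain rate; n_effective = A + B = 4.0
TENT_COST = 6_000
WASHOUT_LOSS = 40_000
BREAKEVEN = TENT_COST / WASHOUT_LOSS    # washout chance above which the tent pays for itself (0.15)

mean = A / (A + B)                              # 0.35, same as the market price
var = A * B / ((A + B) ** 2 * (A + B + 1))      # Beta variance, 0.0455

plug_in = mean ** 2              # treats the 35% price as a known, fixed rate
shared_rate = mean ** 2 + var    # E[p^2] when both days share one uncertain p


def decide(p_washout: float) -> str:
    return "RENT" if p_washout > BREAKEVEN else "SKIP"


for label, p_w in (("plug-in", plug_in), ("posterior", shared_rate)):
    print(f"{label:<10} P(washout) {p_w:.4f}  EV(skip) {-WASHOUT_LOSS * p_w:,.0f}  {decide(p_w)}")

# Monte Carlo check of the modelling assumption (seeded, so it is reproducible).
rng = random.Random(0)
n = 100_000
shared = independent = 0
for _ in range(n):
    p = rng.betavariate(A, B)                    # one uncertain rate drives both days
    if rng.random() < p and rng.random() < p:
        shared += 1
    p1, p2 = rng.betavariate(A, B), rng.betavariate(A, B)   # a separate rate per day
    if rng.random() < p1 and rng.random() < p2:
        independent += 1
print(f"simulated washout: shared rate {shared / n:.3f}, independent rates {independent / n:.3f}")
```

The simulation should land near 0.168 for the shared rate and near 0.1225 for independent rates, matching the assumption note: it is the shared uncertainty that correlates the two days.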

Should I book refundable or non-refundable travel?

A big trip abroad. Non-refundable bookings are cheaper but forfeit $20,000 if the trip is derailed; fully flexible booking costs $6,000 more. The trip is derailed if any of four unrelated risks hits — a strike, a storm, an entry-rule change, or a transport disruption.

Claude + Internet → book non-refundable.

The four market prices compose to a 27.7% chance of derailment — below the 30% where flexibility pays for itself.

Claude + Credence → book refundable.

Each risk is calibrated about 2 points higher; no single correction is decisive, but compounded across all four they lift the derailment chance to 33.8%, past breakeven.

Why it flipped: No single market moves the decision. Small calibration corrections across unrelated markets compound into a shift that does.

Driving markets: four independent risks (strike / storm / entry policy / transport). Joint derailment: 27.7% market vs 33.8% calibrated. Illustrative construction; not advice.

Claude’s math

Derailed = any of four = 1 − ∏(1 − pᵢ); breakeven 30%.

Market: 1 − (.92)(.88)(.95)(.94) = 27.7% → EV (rigid) −$5,541 vs flex −$6,000 → RIGID.

Credence: 1 − (.90)(.86)(.93)(.92) = 33.8% → EV (rigid) −$6,755 → FLEX.

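Correcting any one risk alone, with the other three left at market prices, lifts derailment only to between 29.2% and 29.4%, still under 30%; only the four corrections compounded cross it.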

Assumption: the four risks are modeled as independent.
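A minimal Python sketch of the travel calculation. The per-risk probabilities are read off the factors in the math above, and matching them to strike / storm / entry rule / transport in that order is an assumption; the dictionary keys are illustrative. The last loop checks the one-correction-at-a-time claim.

```python
from math import prod

# Per-risk probabilities read off the factors above (order assumed: strike, storm, entry, transport).
market = {"strike": 0.08, "storm": 0.12, "entry_rule": 0.05, "transport": 0.06}
calibrated = {"strike": 0.10, "storm": 0.14, "entry_rule": 0.07, "transport": 0.08}

LOSS = 20_000                     # forfeited if a non-refundable trip is derailed
FLEX_PREMIUM = 6_000              # extra cost of fully refundable booking
BREAKEVEN = FLEX_PREMIUM / LOSS   # 0.30


def p_derail(risks: dict) -> float:
    # Derailed if any independent risk hits: 1 - prod(1 - p_i).
    return 1 - prod(1 - p for p in risks.values())


def decide(p: float) -> str:
    return "FLEX" if LOSS * p > FLEX_PREMIUM else "RIGID"


for label, risks in (("market", market), ("calibrated", calibrated)):
    p = p_derail(risks)
    print(f"{label:<10} P(derail) {p:.3f}  EV(rigid) {-LOSS * p:,.0f}  {decide(p)}")

# Correct one risk at a time, keeping the other three at market prices.
one_at_a_time = {}
for name in market:
    mixed = {**market, name: calibrated[name]}
    one_at_a_time[name] = p_derail(mixed)
    print(f"only {name:<10} corrected: {one_at_a_time[name]:.3f}  {decide(one_at_a_time[name])}")
```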

Should I realize capital gains this year or next year?

A near-retiree holds a concentrated position with a large unrealized gain. Defer 18 months hoping a tax change repeals a surtax, or realize now and diversify.

Claude + Internet → defer.

At the market’s 46.5% for Democratic Senate control, expected value slightly favors waiting (+$277).

Claude + Credence → realize now.

With Democratic control calibrated up to 53.8%, the repeal path is less likely than the market implies; expected value flips to favor realizing (deferring is worth about −$200), and the posterior puts ~59% probability on realizing being the right call.

How to read it: repeal of the surtax runs through a Republican Senate, so a higher chance of Democratic control means a lower chance of repeal — Credence’s higher 53.8% therefore pushes toward realizing now, not deferring.

Why it flipped: Here it’s the calibrated mean that moves the decision, across a real economic threshold (breakeven ≈ 50.7% Democratic control) that the market price sat just below.

Driving market: Democratic Senate control after the 2026 midterms. p_market 46.5% · p_calibrated 53.8% · n_effective 12.86. Inputs pulled from the production API, 2026-05-12. Illustrative; not advice.

Claude’s math

P(repeal) = 0.45·(1 − p_dem) + 0.02·p_dem.

EV (defer − realize) = $15,200 · P(repeal) + $5,769 − $9,293.

Market p_dem 0.465 → P(repeal) 25.005% → EV = $15,200(.25005) + $5,769 − $9,293 ≈ +$277 → DEFER.

Credence p_dem 0.538 → P(repeal) 21.866% → EV = $15,200(.21866) + $5,769 − $9,293 ≈ −$200 → REALIZE.

Breakeven p_dem ≈ 50.7%; posterior Beta(6.92, 5.94) → P(p_dem > 50.7%) ≈ 59.1%.

Assumption: repeal is worth $15,200 if it occurs.
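A minimal Python sketch reproducing these numbers. The example does not break down the $5,769 and $9,293 terms, so they are taken as given constants here. The posterior tail probability is estimated by seeded Monte Carlo with random.betavariate rather than an exact Beta CDF, so it matches 59.1% only approximately. Names are illustrative.

```python
import random

REPEAL_VALUE = 15_200            # worth of the repeal if it occurs (the example's assumption)
OTHER_TERMS = 5_769 - 9_293      # remaining constant terms of EV(defer - realize), as given


def p_repeal(p_dem: float) -> float:
    # 45% chance of repeal under Republican control, 2% under Democratic control.
    return 0.45 * (1 - p_dem) + 0.02 * p_dem


def ev_defer_minus_realize(p_dem: float) -> float:
    return REPEAL_VALUE * p_repeal(p_dem) + OTHER_TERMS


# EV is linear in p_dem, so breakeven has a closed form:
# 0.45 - 0.43 * p = -OTHER_TERMS / REPEAL_VALUE
breakeven = (0.45 + OTHER_TERMS / REPEAL_VALUE) / 0.43

for label, p_dem in (("market", 0.465), ("calibrated", 0.538)):
    ev = ev_defer_minus_realize(p_dem)
    print(f"{label:<10} P(repeal) {p_repeal(p_dem):.5f}  EV {ev:+,.0f}  {'DEFER' if ev > 0 else 'REALIZE'}")

# Posterior Beta(6.92, 5.94) for Democratic control; n_effective = 12.86.
a, b = 6.92, 5.94
rng = random.Random(0)
n = 100_000
p_realize_right = sum(rng.betavariate(a, b) > breakeven for _ in range(n)) / n
print(f"breakeven p_dem {breakeven:.4f}; P(p_dem > breakeven) ~ {p_realize_right:.3f}")
```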

How the agent decides.

It reasons over Credence’s full posterior, not a single number. For an expected-value choice it takes the expectation across the whole distribution, which differs from plugging in the market price whenever the payoff isn’t linear in the probability. For a choice with a real downside, it adds a robustness rule: commit only if the posterior’s lower tail (here, the 10th percentile) still clears breakeven. A raw market price alone gives the agent none of that structure.
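The procedure described in the paragraph above can be sketched as a small, hypothetical decide helper in Python. This is not Credence’s or Claude’s actual implementation: it assumes the posterior is a Beta(a, b), samples it with random.betavariate, averages the payoff across the samples, and, when robust=True, applies the 10th-percentile rule, which as written assumes the payoff rises with p. The usage lines replay the Eurovision and tent numbers.

```python
import random
from statistics import fmean, quantiles


def decide(payoff, a, b, robust=False, n=50_000, seed=0):
    """Hypothetical helper: return ("commit" or "pass", expected payoff).

    payoff(p) is the gain of committing over the safe option if the true
    probability is p; the posterior over p is assumed to be Beta(a, b).
    """
    rng = random.Random(seed)
    samples = [rng.betavariate(a, b) for _ in range(n)]
    # Expectation over the whole posterior, not the payoff at a single price.
    expected = fmean(payoff(p) for p in samples)
    if expected <= 0:
        return "pass", expected
    if robust:
        # Robustness rule, assuming payoff rises with p: the 10th-percentile
        # probability must still clear breakeven (payoff >= 0).
        p10 = quantiles(samples, n=10)[0]
        if payoff(p10) < 0:
            return "pass", expected
    return "commit", expected


def venue(p):
    # Eurovision: +500 if Denmark is top-5, -500 otherwise, versus hosting at home.
    return 1000 * p - 500


def tent(p):
    # Tent: renting avoids an expected washout loss of 40,000 * p^2 and costs 6,000.
    return 40_000 * p ** 2 - 6_000


print(decide(venue, 3.49, 1.31))               # the mean alone says commit
print(decide(venue, 3.49, 1.31, robust=True))  # the lower tail says pass
print(tent(0.35))                              # plug-in at the market price: negative
print(decide(tent, 1.40, 2.60))                # posterior expectation: positive
```

Sampling keeps the helper generic over payoff shapes; a closed-form Beta quantile or moment would be exact but would have to be worked out per payoff.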

There are several ways the full probability object changes the call, all shown above:

  • the corrected mean crosses a decision threshold — on one market, or by compounding across several
  • a convex payoff turns the posterior’s spread into a different expected cost
  • a downside-bounded rule turns on the posterior’s tail, not the price

A market price is a single number. Your AI does the reasoning and Credence supplies the probability objects behind it; these examples cover only a few of the decision types such an object can change.

Illustrative demonstrations of how a probability object API changes a downstream decision — not tax, legal, investment, or event-planning advice. Figures marked ‘production API’ were pulled 2026-05-12; others are illustrative constructions. The Eurovision outcome (Denmark 7th, 16 May 2026) is a real resolved result and the forecast was a real production-API pull; the venue payoff is an illustrative scenario.