
Methodology

v1.0 (current)

The full specification. Every published number on this site is reproducible from this document plus the open dataset. Disputes via GitHub issue; errata appear in the next weekly newsletter.

Effective 2026-05-26 · sha256 fa87446ef6e023ee60092ffd

Methodology v1.0

Effective: 2026-05-26

ai-receipts converts heterogeneous public AI-provider incident feeds into a single citable measurement: a 0-100 User-Impact Score (UIS) per incident and an impact-weighted uptime percentage per provider. This document specifies every formula, threshold, and confirmation rule used to produce the numbers published on this site. Nothing is proprietary; nothing is hand-waved.

License: this document is CC-BY-4.0. The reference implementation is MIT. The public dataset is CC-BY-4.0. Anyone may reproduce every published number from those three artifacts. If your reproduction produces a different number, file an issue on the public repository — methodology errata are published prominently in the next weekly newsletter.


1. Executive summary

We measure AI provider reliability across 14 monitored providers by aggregating each provider's public incident feed, applying alternative-signal corroboration (provider X accounts, GitHub SDK issue surges, multi-region reachability disagreement, HN keyword polling, X mentions), and scoring each confirmed incident on a published 0-100 User-Impact Score. Uptime is derived from impact-weighted incident duration over a reporting window. Every published number is request-weighted in intent; until independent probes can count requests, it is computed as an impact-weighted proxy. Every number is also tagged with the methodology version that produced it. The revision protocol matches the S&P index restatement standard: changes that alter ≥0.1% of historical scores restate the prior 90 days; older data carries its prior methodology version forever.


2. Data sources

2.1 Per-provider feeds

Mistral added a Checkly-hosted status page in May 2026 (verified 2026-05-25). It does not expose RSS / Atom / JSON or webhooks — only email subscriptions — but the page is Nuxt-server-rendered and embeds full structured incident state in a __NUXT_DATA__ script tag. Our parser reads that payload. Only currently-unresolved incidents are present on this surface; historical incident records are not exposed and backfill via the status page is not possible.

DeepSeek hosts its status page on Flashcat (not Statuspage), but the /history.rss feed schema is identical to Statuspage's, so the existing parser handles it without modification. No JSON API is exposed, so backfill is RSS-only (~20 most recent items).

Pre-launch we evaluated xAI and OpenRouter for V0 and deferred both. xAI's status page (status.x.ai) is Cloudflare-bot-blocked to server-side fetchers; OpenRouter publishes only current state with no machine-readable incident history. Both will be revisited when a parseable feed becomes available.

2.2 Alternative signal sources

Five auxiliary signals feed the multi-source confirmation gate described in §6. Together with each provider's own status page, they form the six observation channels used to confirm incidents.

| Signal source | Tier | Mechanism |
|---|---|---|
| Provider's own status page | Strong | Polled per §2.1 |
| Provider's official X account (verified, posted ≤30 min ago) | Strong | Filtered search of the verified handle |
| GitHub SDK issue surge (≥3 issues in 30 min naming the provider, OR rate ≥2σ above baseline) | Medium | Rolling-rate monitor per official SDK repository |
| Multi-region reachability disagreement (≥1 region reports unreachable AND ≥1 other region reports reachable within 30 min) | Medium | Independent per-region probes from Vercel iad1 / sfo1 / fra1, every 15 min |
| HN front-page keyword match (provider or model named in front-page title within 30 min) | Weak | Algolia HN API keyword scan |
| ≥3 distinct X mentions of provider plus issue in a 15-min window | Weak | Distinct-author search via the X API |

Removed before the v1.0 launch: Downdetector. Pre-launch internal versions listed Downdetector's per-company RSS as a Medium source. In practice the endpoint sits behind Cloudflare bot protection that returns 403 to any server-side fetcher, regardless of headers or originating IP class (verified from Vercel and Supabase networks). Circumventing that protection would be adversarial to Downdetector's explicit access posture and would contradict this project's brand voice, so the source is removed. No historical incident in this dataset was ever confirmed by a Downdetector signal (zero rows with source = 'downdetector' ever landed in signal_observations), so no UIS or confirmation label is restated. If access becomes available later (an official API, redistribution via an X account, or otherwise), the source can be reinstated under a future version bump.

Added at the v1.0 launch: multi-region disagreement. Independent reachability probes from three Vercel-pinned regions (iad1 / sfo1 / fra1) run every 15 minutes against each provider's status-page URL. When at least one region reports unreachable AND at least one other region reports reachable in the same 30-minute window, this counts as a Medium-tier confirmation source. With this channel wired, the "2 distinct Medium sources" branch of the §6.2 gate is reachable: github_sdk and multi_region_disagreement co-firing produces a Reported label.
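The two Medium-tier trigger rules above can be sketched as follows. This is a minimal Python illustration, not the reference implementation. Several details are assumptions: the input shapes (issue timestamps in minutes, hypothetical historical issue counts per 30-minute window, and `(region, minute, reachable)` probe tuples), the use of the sample standard deviation, and the rule that the σ branch needs a non-zero spread.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

WINDOW_MIN = 30  # both rules look back 30 minutes


def github_sdk_surge(issue_minutes: Sequence[float], now: float,
                     baseline_counts: Sequence[int]) -> bool:
    """≥3 provider-naming issues in the trailing 30 min, OR a count ≥2σ above baseline.

    baseline_counts: hypothetical historical issue counts per 30-minute window.
    """
    recent = sum(1 for t in issue_minutes if now - WINDOW_MIN < t <= now)
    if recent >= 3:
        return True
    if len(baseline_counts) < 2:
        return False  # not enough history for a standard deviation
    mu, sigma = mean(baseline_counts), stdev(baseline_counts)
    # Assumption: a zero-spread baseline never fires the σ branch.
    return sigma > 0 and recent - mu >= 2 * sigma


def multi_region_disagreement(probes: Sequence[Tuple[str, float, bool]],
                              now: float) -> bool:
    """True when, inside the trailing 30 min, one region reported unreachable
    and a different region reported reachable."""
    recent = [(region, ok) for region, t, ok in probes if now - WINDOW_MIN < t <= now]
    down = {region for region, ok in recent if not ok}
    up = {region for region, ok in recent if ok}
    return any(d != u for d in down for u in up)
```

For example, probes `[("iad1", 10, False), ("sfo1", 12, True), ("fra1", 11, True)]` evaluated at minute 20 count as disagreement, because iad1 is down while sfo1 and fra1 are up. A single region flapping between states does not count, since the rule requires two different regions.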

2.3 Backfill ceilings

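| Source class | Available history |
|---|---|
| Statuspage RSS feeds | ~20 most recent items; deeper history via the Statuspage incidents API |
| DeepSeek (Flashcat, Statuspage-compatible RSS) | ~20 most recent items; no JSON API |
| Google Cloud incidents.json | Full dump |
| AWS Health JSON | Limited; per-region and per-service filtering required |
| Perplexity summary endpoint | Current state only |
| Mistral status page (Nuxt SSR payload) | Currently unresolved incidents only |
| OpenRouter | No public history |
| xAI status page | No machine-readable history |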

3. Parser inventory

Four parser shapes plus a small bespoke bucket cover every monitored provider. All parsers normalize to a single internal incident shape; the original provider payload is stored verbatim and never overwritten, which is what makes data versioning (§9) possible.

  1. Statuspage RSS — OpenAI, Anthropic, Cohere, DeepSeek, Groq, Cloudflare, Azure OpenAI, Replicate (DeepSeek is Flashcat-hosted but publishes a Statuspage-compatible RSS schema, so the same parser handles it)
  2. Better Stack RSS — Fireworks, Together AI (different vendor and item-field schema than Statuspage)
  3. Google Cloud JSON — Google AI / Vertex / Gemini
  4. AWS Health JSON — AWS Bedrock (region + service filter)
  5. Bespoke — Perplexity (current-state summary), Mistral (Nuxt SSR payload from status.mistral.ai)

4. User-Impact Score (UIS)

4.1 Formula

UIS = clamp(0, 100,
  base_severity
  × component_weight
  × scope_multiplier
  × duration_multiplier
  × tier_multiplier
)

The structure is modeled on CVSS v3.1/v4.0, the most widely deployed vulnerability-scoring system, for a reason: a multiplicative composite with published weights is auditable and holds up under outside scrutiny. All five factors below have explicit tables; nothing is proprietary.

4.2 base_severity (from the provider's own classification)

| Provider declares | base_severity |
|---|---|
| none / not classified | 10 |
| minor | 25 |
| major | 60 |
| critical | 90 |

Provider classifications are taken as given. They are never adjusted upward — substituting our judgment for the provider's would replace their evidence with our inference. The remaining four factors capture context the provider's own classification does not.

4.3 component_weight

| Component class | Weight |
|---|---|
| Non-customer-facing (internal admin, monitoring dashboard) | 0.5 |
| Auxiliary customer-facing (console UI, billing portal) | 0.8 |
| Secondary API (embeddings, moderation, fine-tuning) | 1.0 |
| Primary inference endpoint (chat / completions, generate) | 1.3 |
| Multi-component (≥3 distinct components, or "all components") | 1.5 |

Per-provider component classifications are maintained in the public provider registry. When a feed lists multiple affected components, the highest-weight one applies; the multi-component weight (1.5) applies when three or more components are listed.
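Here is a minimal sketch of the component_weight selection rule. It assumes each listed component has already been mapped to one of the §4.3 classes through the provider registry. The class keys are illustrative labels, not the registry's actual identifiers.

```python
from typing import Mapping

# §4.3 weights; keys are illustrative class labels (assumption).
CLASS_WEIGHTS = {
    "non_customer_facing": 0.5,
    "auxiliary_customer_facing": 0.8,
    "secondary_api": 1.0,
    "primary_inference": 1.3,
}
MULTI_COMPONENT_WEIGHT = 1.5


def component_weight(affected: Mapping[str, str], all_components: bool = False) -> float:
    """affected maps each distinct component name listed in the feed to its class."""
    if all_components or len(affected) >= 3:
        # "all components", or three or more distinct components
        return MULTI_COMPONENT_WEIGHT
    if not affected:
        raise ValueError("no affected components listed")
    # Otherwise the highest-weight listed component applies.
    return max(CLASS_WEIGHTS[cls] for cls in affected.values())
```

Keying the input by component name means the threshold counts distinct components, not distinct classes. Three secondary-API components therefore still reach 1.5.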

4.4 scope_multiplier

<a id="regional-scope"></a>

| Regional scope | Multiplier |
|---|---|
| Single region | 1.0 |
| Multi-region, same continent | 1.2 |
| Global / multi-continent | 1.4 |
| Region not specified by provider | 1.1 |

The multiplier is determined in priority order, with the source of the determination surfaced on every incident page, OG image, and LinkedIn alert template (the disclosure is non-negotiable per §10.2):

  1. multi_region_probes — when our independent regional probes (§6.1) measured reachability disagreement across regions, the unreachable regions define the scope empirically. This is the strongest source because it's a measurement we made ourselves.
  2. structured — when the provider's feed explicitly identifies regions (Google Cloud incidents.json, AWS Health JSON), we use those.
  3. text_inferred — when we can extract region codes (us-east-*, eu-west-*, ap-*) from the incident title, description, or component names.
  4. unknown — when no region signal is available, the multiplier defaults to 1.1 (the "unspecified" bucket). The incident page surfaces this as "scope unspecified — see methodology limitations" rather than implying a scope claim.
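The priority order can be sketched as a single resolver. This is for illustration only, and several pieces are assumptions. Probe regions use the Vercel codes named in §2.2. The continent lookup table and the region-code regex are hypothetical. Any region set spanning more than one continent is treated as global.

```python
import re
from typing import Sequence, Tuple

SCOPE_MULTIPLIER = {"single": 1.0, "same_continent": 1.2, "global": 1.4}
UNSPECIFIED_MULTIPLIER = 1.1

# Hypothetical continent lookup: Vercel probe regions first, then code prefixes.
VERCEL_CONTINENT = {"iad1": "north_america", "sfo1": "north_america", "fra1": "europe"}
PREFIX_CONTINENT = {"us": "north_america", "eu": "europe", "ap": "asia_pacific"}

# Hypothetical pattern for codes such as us-east-1, eu-west-2, ap-southeast-1.
REGION_CODE = re.compile(r"\b(?:us|eu|ap)-[a-z]+-?\d+\b")


def continent(region: str) -> str:
    if region in VERCEL_CONTINENT:
        return VERCEL_CONTINENT[region]
    prefix = region.split("-")[0]
    return PREFIX_CONTINENT.get(prefix, prefix)  # unknown prefixes stand in for themselves


def bucket(regions: Sequence[str]) -> str:
    distinct = set(regions)
    if len(distinct) == 1:
        return "single"
    continents = {continent(r) for r in distinct}
    return "same_continent" if len(continents) == 1 else "global"


def resolve_scope(unreachable_probe_regions: Sequence[str],
                  structured_regions: Sequence[str],
                  incident_text: str) -> Tuple[str, float]:
    """Return (source, scope_multiplier), checking sources in §4.4 priority order."""
    for source, regions in (
        ("multi_region_probes", list(unreachable_probe_regions)),
        ("structured", list(structured_regions)),
        ("text_inferred", REGION_CODE.findall(incident_text)),
    ):
        if regions:
            return source, SCOPE_MULTIPLIER[bucket(regions)]
    return "unknown", UNSPECIFIED_MULTIPLIER
```

The returned source string is what would be surfaced alongside the multiplier. Probe evidence wins even when the provider's feed names a different region.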

4.5 duration_multiplier

duration_multiplier = clamp(0.7, 1.5, 0.7 + 0.0067 × minutes_since_started)
| Duration | Multiplier |
|---|---|
| 0 min (formula floor) | 0.70 |
| 10 min | 0.77 |
| 30 min | 0.90 |
| 1 hr | 1.10 |
| ≥2 hr | 1.50 (cap engaged) |

The cap at 1.5 reflects that beyond roughly two hours, individual-incident severity saturates the other factors. Extended duration past that point is captured by the count of incidents in trend data, not by individual UIS growth. UIS is recomputed every five minutes for unresolved incidents.

4.6 tier_multiplier

Model tier of the affected endpointMultiplier
Deprecated / legacy (e.g., GPT-3.5, Claude 1)0.8
General-purpose API (default when unclear)1.0
Flagship inference (GPT-5, Claude Opus 4.7, Gemini Ultra, equivalents)1.2

The flagship list is maintained in the public provider registry and refreshed quarterly as providers release new models. Refreshing the list is a configuration change, not a methodology change, and does not trigger a version bump (§9.3).

4.7 Worked examples

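| Incident | base | comp | scope | dur | tier | Raw | UIS | Band |
|---|---|---|---|---|---|---|---|---|
| Minor admin dashboard glitch, 10 min, single region | 25 | 0.5 | 1.0 | 0.77 | 1.0 | 9.6 | 10 | Informational |
| Major sustained chat/completions outage, US only, 90 min, flagship | 60 | 1.3 | 1.0 | 1.30 | 1.2 | 122.0 | 100 | Critical (clamped) |
| Rolling multi-component flagship incident, ~1 hr | 60 | 1.5 | 1.2 | 1.10 | 1.2 | 142.8 | 100 | Critical (clamped) |
| Major embeddings degradation, EU only, 45 min | 60 | 1.0 | 1.0 | 1.00 | 1.0 | 60.1 | 60 | Significant |
| Minor flagship slowdown, US, 20 min | 25 | 1.3 | 1.0 | 0.83 | 1.2 | 32.5 | 33 | Degraded |

The dur column is rounded to two decimals. Raw uses the unrounded §4.5 value (for example, 0.834 at 20 min). UIS is Raw clamped to 0-100 and rounded to the nearest integer.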

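The clamp at 100 is intentional. A major or critical flagship incident that is both multi-component and global is always Critical: even at the 0.7 duration floor it scores 60 × 1.5 × 1.4 × 0.7 × 1.2 ≈ 105.8. Below the clamp, the spread across non-extreme cases is what gives the dashboard meaningful signal.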

4.8 UIS bands

| UIS | Label |
|---|---|
| 0-19 | Informational |
| 20-49 | Degraded |
| 50-79 | Significant |
| 80-100 | Critical |
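The §4.1 composite, the §4.5 duration curve, and the §4.8 bands fit in a few lines. This sketch takes the component, scope, and tier multipliers as already-resolved numbers. It also assumes the final score is rounded to the nearest integer, which is what the §4.7 rows show (9.6 becomes 10). It is an illustration, not the reference implementation.

```python
BASE_SEVERITY = {"none": 10, "minor": 25, "major": 60, "critical": 90}  # §4.2


def clamp(lo: float, hi: float, x: float) -> float:
    return max(lo, min(hi, x))


def duration_multiplier(minutes_since_started: float) -> float:
    return clamp(0.7, 1.5, 0.7 + 0.0067 * minutes_since_started)  # §4.5


def raw_uis(severity: str, component_weight: float, scope_multiplier: float,
            minutes: float, tier_multiplier: float) -> float:
    return (BASE_SEVERITY[severity] * component_weight * scope_multiplier
            * duration_multiplier(minutes) * tier_multiplier)


def uis(severity: str, component_weight: float, scope_multiplier: float,
        minutes: float, tier_multiplier: float) -> int:
    raw = raw_uis(severity, component_weight, scope_multiplier, minutes, tier_multiplier)
    # Clamp to 0-100, then round to an integer score (rounding is an assumption).
    return round(clamp(0, 100, raw))


def band(score: int) -> str:  # §4.8
    if score < 20:
        return "Informational"
    if score < 50:
        return "Degraded"
    if score < 80:
        return "Significant"
    return "Critical"
```

Take the last §4.7 row as an example. `uis("minor", 1.3, 1.0, 20, 1.2)` multiplies out to about 32.53 and returns 33, which lands in the Degraded band.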

4.9 What UIS does not capture

  • Quality drift. Model output regressions that do not surface as errors are invisible; detecting them requires canary prompt suites, planned for a future version.
  • Latency degradation. A slow API can be effectively unusable while UIS reads zero. Independent latency probes are collected as a parallel observation stream (§4.10), but they do not feed UIS until a future major version.
  • Customer-segment differences. Without first-party telemetry, we cannot weight by customer type.
  • Silent rate-limit changes. Providers who tighten limits without filing an incident are invisible to this pipeline.

4.10 Latency observation (v1.0, parallel run)

From v1.0, ai-receipts collects an independent latency observation stream against the primary inference endpoint of each probed provider. This is a quarantined parallel run for the first 30 days after a provider's first probe. During that period, latency observations are surfaced on the dashboard but do not feed UIS, do not trigger auto-posts, and are not used in any published uptime number. The quarantine exists so that a noisy onboarding period cannot contaminate published numbers. Integrating latency into UIS (or publishing it as a sibling metric) is a separate methodology bump that follows the §9 protocol.

Probe mechanics:

  • One small inference call per provider+model every 15 minutes (10-token input, 10-token output).
  • Round-trip latency measured wall-clock from the application layer (does not isolate network, framework, or provider time).
  • Success / failure recorded with an error class on failure (timeout / http_5xx / http_4xx / parse_error / rate_limited / auth_error).
  • A failed probe records latency_ms = null and does not count toward the latency baseline; it counts toward the per-provider error-rate observation only.

Baseline:

  • Per-provider+model, the baseline is the rolling 7-day distribution of per-15-min-bucket p95 latency, excluding the trailing 60 minutes (so a current degradation does not contaminate its own baseline).
  • A bucket with fewer than 2 successful probes is dropped from the baseline (insufficient sample).
  • Reported on the dashboard as baseline_p95 (a scalar — the median across baseline buckets) and baseline_sigma (standard deviation of those bucket-p95 values).

Drift flag:

  • A 15-min observation window is flagged when its p95 latency exceeds baseline_p95 + 2 × baseline_sigma AND baseline_n ≥ 50 buckets (about 12 hours of data; below this n the flag is suppressed as low-confidence).
  • A flagged window is rendered on the dashboard with a "latency drift" tag and the percentage delta. Flagged windows do not produce posts, do not change UIS, and do not change uptime numbers during the parallel-run period.
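Here is a sketch of the baseline and drift-flag rules. It assumes each 15-minute bucket arrives as a list of latency_ms values, with None for failed probes. It also assumes the caller has already trimmed the input to the trailing 7 days minus the last 60 minutes. The section fixes neither the percentile method nor whether sigma is a sample or population deviation; nearest-rank p95 and the sample standard deviation are assumptions here.

```python
import math
from dataclasses import dataclass
from statistics import median, stdev
from typing import Optional, Sequence

MIN_PROBES_PER_BUCKET = 2
MIN_BASELINE_BUCKETS = 50  # about 12 hours of 15-min buckets


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (assumed method)."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


@dataclass
class Baseline:
    baseline_p95: float    # median of the per-bucket p95 values
    baseline_sigma: float  # standard deviation of the per-bucket p95 values
    baseline_n: int        # buckets that survived the sample-size filter


def build_baseline(buckets: Sequence[Sequence[Optional[float]]]) -> Optional[Baseline]:
    bucket_p95s = []
    for bucket in buckets:
        ok = [ms for ms in bucket if ms is not None]  # failed probes never enter the baseline
        if len(ok) >= MIN_PROBES_PER_BUCKET:
            bucket_p95s.append(p95(ok))
    if len(bucket_p95s) < 2:
        return None  # stdev needs at least two values
    return Baseline(median(bucket_p95s), stdev(bucket_p95s), len(bucket_p95s))


def is_latency_drift(window: Sequence[Optional[float]], baseline: Optional[Baseline]) -> bool:
    if baseline is None or baseline.baseline_n < MIN_BASELINE_BUCKETS:
        return False  # suppressed as low-confidence
    ok = [ms for ms in window if ms is not None]
    if not ok:
        return False
    return p95(ok) > baseline.baseline_p95 + 2 * baseline.baseline_sigma
```

Returning False below 50 buckets mirrors the low-confidence suppression rule. A brand-new provider therefore never shows a drift tag until roughly half a day of usable buckets exists.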

What the parallel run is for:

The parallel run accumulates 30 days of probe data alongside the existing incident-based observation stream. With that data, the next methodology bump (v1.1 or v2.0) can specify a latency factor whose thresholds, baseline windows, and integration rules are calibrated against real observations rather than guesses. Until then, latency is a dashboard signal only.


5. Uptime definition

5.1 Working formula (impact-weighted incident duration)

uptime_pct(window) = max(0, 1 − Σ_incidents( (UIS_i / 100) × duration_i_minutes / window_minutes ))

Where:

  • window_minutes is the length of the reporting window (24h = 1440, 30d = 43200).
  • duration_i_minutes is the duration of incident i within the window, clipped to window edges for incidents that span the boundary.
  • UIS_i is the User-Impact Score of incident i at the time of computation.

Every published uptime number under this version is tagged "impact-weighted" to make clear that this is a proxy for true request-weighted uptime, not a direct measurement of it.

5.2 Worked examples

| Scenario | Calculation | Result |
|---|---|---|
| 30-day window, zero incidents | 1 − 0 | 100.00% |
| 30-day window, one UIS-50 incident, 60 min | 1 − (0.50 × 60 / 43200) | 99.93% |
| 30-day window, one UIS-100 incident, 4 hr | 1 − (1.0 × 240 / 43200) | 99.44% |
| 30-day window, continuous UIS-30 degradation for 2 days | 1 − (0.30 × 2880 / 43200) | 98.00% |
| 30-day window, rolling critical incident (UIS 100, ~3 hr) | 1 − (1.0 × 180 / 43200) | 99.58% |

Numbers are calibrated to be meaningful but not screaming-headline. A perfect month reads 100%; a bad month with a single critical outage drops roughly half a percentage point; persistent degradation costs whole percentage points.
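The §5.1 formula, including window clipping, can be sketched as below. This rests on a few assumptions. Incidents carry start and resolve timestamps. An unresolved incident is treated as ongoing through the end of the window. The UIS passed in is whatever score the incident has at computation time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class Incident:
    uis: float
    started: datetime
    resolved: Optional[datetime] = None  # None while the incident is unresolved


def impact_weighted_uptime(incidents: Sequence[Incident],
                           window_start: datetime, window_end: datetime) -> float:
    window_minutes = (window_end - window_start).total_seconds() / 60
    penalty = 0.0
    for inc in incidents:
        end = inc.resolved or window_end  # assumption: unresolved runs to window end
        # Clip the incident to the window edges.
        lo, hi = max(inc.started, window_start), min(end, window_end)
        if hi <= lo:
            continue  # entirely outside the window
        duration_minutes = (hi - lo).total_seconds() / 60
        penalty += (inc.uis / 100) * duration_minutes / window_minutes
    return max(0.0, 1.0 - penalty)


# Second row of the table above: one UIS-50 incident lasting 60 minutes.
start = datetime(2026, 5, 1)
outage = Incident(uis=50, started=start + timedelta(days=3),
                  resolved=start + timedelta(days=3, minutes=60))
print(f"{100 * impact_weighted_uptime([outage], start, start + timedelta(days=30)):.2f}%")  # 99.93%
```

An incident that began an hour before the window and ran three hours into it contributes only its three in-window hours. That clipped case reproduces the 99.58% row.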

5.3 Bridge to true request-weighted uptime

Once independent probes provide actual request counts, the formula switches to the AWS Bedrock / Google SRE standard:

availability_per_interval(5min) = successful_requests / total_requests
monthly_uptime = mean(availability_per_interval) across all intervals in month

Transition protocol:

  1. Run both formulas in parallel for 30 days on the same providers.
  2. Publish the correlation coefficient and any divergence as a v1.x note.
  3. At the v2.0 bump, switch the dashboard and reports to the request-weighted formula.
  4. Apply the restatement rule (§9.1): if the switch alters ≥0.1% of historical scores, restate the prior 90 days and tag both old and new values. Older data carries v1.0 forever.
  5. Announce v2.0 in the next newsletter with the divergence data and the document hash.

5.4 Zero-traffic interval policy

When request-weighted measurement is in place and an interval has zero requests — quiet hours, a dormant provider — that interval is counted as 100% available, with no penalty. This mirrors the AWS Bedrock SLA. The alternative of excluding the interval from the denominator is rejected because it makes the formula non-monotonic with traffic volume, which is unintuitive and disputable.
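Once request counts exist, the §5.3 formula and the §5.4 zero-traffic rule reduce to the sketch below. The input shape, one (successful, total) tuple per 5-minute interval, is an assumption.

```python
from typing import Sequence, Tuple


def interval_availability(successful: int, total: int) -> float:
    if total == 0:
        return 1.0  # §5.4: a zero-traffic interval counts as fully available
    return successful / total


def monthly_uptime(intervals: Sequence[Tuple[int, int]]) -> float:
    """Mean of per-interval availability; each item is (successful, total) for one 5-min interval."""
    if not intervals:
        raise ValueError("no intervals in the month")
    return sum(interval_availability(s, t) for s, t in intervals) / len(intervals)
```

Consider the intervals `[(100, 100), (0, 0), (50, 100), (99, 100)]`. The quiet second interval contributes 1.0, and the month averages to 0.8725. Excluding that interval from the denominator instead would give 0.83.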


6. Multi-source confirmation

A single source is not enough to call something an outage. Confirmation is tiered: stronger sources need fewer of them, weaker sources cannot confirm on their own, and sources that reflect the same underlying signal are discounted.

6.1 Source tiers

| Tier | Source | Why |
|---|---|---|
| Strong | Provider's own status page | Named, accountable, authoritative |
| Strong | Provider's verified X account (posted ≤30 min ago) | Official channel, real-time; often leads the status page by 5-15 min |
| Medium | GitHub SDK issue surge | Direct developer-impact signal; can occasionally conflate user error with provider fault |
| Medium | Multi-region reachability disagreement | Empirical signal (we measured it from three regions) but indirect (probes hit the status page, not the API endpoint). Source: multi_region_disagreement, derived from the region_probes table on every confirmation gate evaluation. |
| Weak | HN front-page keyword match | High-visibility but a pure social signal |
| Weak | ≥3 distinct X accounts naming the provider and the issue in a 15-min window | Social signal, prone to viral correlation |

6.2 Confirmation rule

An incident passes the confirmation gate, and only incidents that pass it can trigger an auto-post (§6.5), when either of the following is true:

  • 1 Strong source: Confirmed (high confidence).
  • 2 Medium sources from genuinely distinct channels within 30 min: Reported (moderate confidence). Two Medium channels are wired at launch, github_sdk and multi_region_disagreement. When both co-fire for a single incident in the same 30-minute window, the gate produces a Reported label.

The following combinations do not pass the gate:

  • 1 Medium + 1 Weak. Weak sources corroborate; they never confirm.
  • 2 Weak sources (see the correlation discount in §6.3).

Confirmation is a live-detection concept, not a historical labeling concept. Historical incidents loaded from provider archives are always stored as unconfirmed regardless of how strongly the archive itself agrees. Pre-launch we were not capturing X mentions, HN threads, or SDK issue surges; claiming retroactive multi-source confirmation would be fabricated provenance. Archival rows are instead surfaced with explicit provenance flags (see §8.1).

6.3 Correlation discount

Multiple sources are not independent evidence when they reflect the same underlying signal.

  • An HN front-page match and a wave of X mentions about the same news count as one Weak source. Same audience, same upstream.
  • Multiple X mentions from accounts that follow each other count as one Weak source. The heuristic is mutual-follow overlap above 30%.
  • The discount does not apply across different observation channels. User-report data, developer-error data, and provider-published data are fundamentally different channels, and each counts as its own source when more than one is present. The discount applies only within a channel type where the underlying signal is the same (e.g., HN + X mentions).

6.4 Time windows

| Confirmation type | Window |
|---|---|
| Strong + Strong | 15 min |
| Medium + Medium | 30 min |
| Weak corroboration | Within 30 min of the Strong or Medium source it corroborates |

6.5 Output labels

| Label | Trigger | Behavior |
|---|---|---|
| Confirmed | ≥1 Strong source | Auto-posts to X (UIS ≥ 50) and LinkedIn (UIS ≥ 70) |
| Reported | 2 Mediums but no Strong | Auto-posts to X if UIS ≥ 50, with a "reported by community signals" suffix. Does not auto-post to LinkedIn; LinkedIn requires Confirmed. |
| Unconfirmed | Single source, or only Weak corroboration | Shown on the dashboard, never auto-posted |
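The posting behavior in the table above reduces to a small decision function. The channel names and the return shape are illustrative. The suffix on Reported posts is noted in a comment rather than modeled.

```python
from typing import Set


def auto_post_targets(label: str, uis: int) -> Set[str]:
    """Channels an incident auto-posts to under §6.5."""
    targets: Set[str] = set()
    if label == "Confirmed":
        if uis >= 50:
            targets.add("x")
        if uis >= 70:
            targets.add("linkedin")
    elif label == "Reported" and uis >= 50:
        targets.add("x")  # posted with the "reported by community signals" suffix
    return targets  # Unconfirmed incidents are never auto-posted
```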

6.6 Worked examples

| Scenario | Sources observed | Result |
|---|---|---|
| OpenAI status page posts "investigating elevated errors" | 1 Strong | Confirmed |
| OpenAI X account tweets degradation; status page silent | 1 Strong | Confirmed |
| Single GitHub SDK issue surge with no Strong source | 1 Medium | Unconfirmed (single Medium does not confirm) |
| HN front-page "Claude is down" plus 8 X mentions | 1 Weak (correlation discount) | Unconfirmed |
| GitHub SDK surge plus HN thread plus 10 X mentions | 1 Medium + 1 Weak | Unconfirmed |
| GitHub SDK surge plus multi-region disagreement within 20 min | 2 distinct Mediums | Reported (the second Medium activates the §6.2 "2 Mediums" branch) |
| Multi-region disagreement alone with no other signal | 1 Medium | Unconfirmed |
| OpenAI status page plus 200 X mentions | 1 Strong | Confirmed |
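The §6.2 gate and the §6.3 discount can be checked against the rows above with a short sketch. Source names other than github_sdk and multi_region_disagreement are hypothetical. Timestamps are minutes on an arbitrary clock. The mutual-follow heuristic for X accounts is not modeled.

```python
from dataclasses import dataclass
from typing import Sequence

TIER = {
    "status_page": "strong",             # hypothetical name
    "provider_x": "strong",              # hypothetical name
    "github_sdk": "medium",
    "multi_region_disagreement": "medium",
    "hn": "weak",                        # hypothetical name
    "x_mentions": "weak",                # hypothetical name
}
# §6.3: HN and X mentions share an upstream, so they collapse into one channel.
CHANNEL = {"hn": "social", "x_mentions": "social"}
MEDIUM_WINDOW_MIN = 30


@dataclass(frozen=True)
class Observation:
    source: str
    minute: float


def weak_source_count(obs: Sequence[Observation]) -> int:
    """Number of Weak sources after the correlation discount."""
    return len({CHANNEL.get(o.source, o.source) for o in obs if TIER[o.source] == "weak"})


def gate_label(obs: Sequence[Observation]) -> str:
    if any(TIER[o.source] == "strong" for o in obs):
        return "Confirmed"
    mediums = [o for o in obs if TIER[o.source] == "medium"]
    for a in mediums:
        for b in mediums:
            # Two genuinely distinct Medium channels inside the 30-minute window.
            if a.source != b.source and abs(a.minute - b.minute) <= MEDIUM_WINDOW_MIN:
                return "Reported"
    # Weak sources corroborate but never change the label.
    return "Unconfirmed"
```

Weak sources only ever change weak_source_count; they never move the label. That is why the HN-plus-X-mentions row stays Unconfirmed no matter how many mentions arrive.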

7. Per-model claims

Per-model claims — "GPT-4o degraded 14:00-14:23 UTC" — are the most consequential and the most legally exposed category. The policy is deliberately conservative.

7.1 When a per-model claim is allowed

A per-model claim may be published only when one of the following is true:

  1. The provider's own status page or official X account names the model ("Issue affecting GPT-4o" appears in the post body or the component name). This is a Strong source naming the model.
  2. A GitHub SDK issue surge names the model: at least two issues filed within 30 min, each with an HTTP 5xx pattern and the model name (gpt-4o, claude-opus-4-7, gemini-2.5-pro, etc.) in the title or body. This is a Medium source naming the model.
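Gate 2 can be sketched as a sliding 30-minute window over matching issues. Two detection details are assumptions. First, a model-name boundary rule stops gpt-4o from matching inside gpt-4o-mini. Second, a bare three-digit regex stands in for "an HTTP 5xx pattern".

```python
import re
from dataclasses import dataclass
from typing import Sequence

HTTP_5XX = re.compile(r"\b5\d\d\b")  # hypothetical: any standalone 5xx status code


@dataclass(frozen=True)
class Issue:
    title: str
    body: str
    created_minute: float


def names_model_with_5xx(issue: Issue, model_id: str) -> bool:
    text = f"{issue.title}\n{issue.body}"
    # Reject matches embedded in a longer identifier, e.g. gpt-4o inside gpt-4o-mini.
    model = re.compile(rf"(?<![\w-]){re.escape(model_id)}(?![\w-])", re.IGNORECASE)
    return bool(model.search(text)) and bool(HTTP_5XX.search(text))


def per_model_gate_passes(issues: Sequence[Issue], model_id: str,
                          window_min: float = 30, min_issues: int = 2) -> bool:
    times = sorted(i.created_minute for i in issues if names_model_with_5xx(i, model_id))
    # Any min_issues consecutive matches that fit inside the window pass the gate.
    return any(times[k + min_issues - 1] - times[k] <= window_min
               for k in range(len(times) - min_issues + 1))
```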

7.2 When it is not allowed, and required framing

A per-model claim is not allowed when:

  • The model is implicated only by HN, X, or Reddit chatter.
  • The provider acknowledges an API-level issue without naming the model.
  • The only evidence is aggregate user reports, from any source (these have no per-model granularity).

When per-model degradation is suggested by these disallowed sources, the required framing is:

"Community reports of <model> degradation; not officially confirmed by <provider> as of <time UTC>. We are watching for stronger signal."

Never an assertion. Never a UIS score attached to a per-model claim that lacks a gate-passing source.

7.3 Why this is the right line

  • Defensibility. Both allowed gates put the assertion on the provider's own evidentiary surface: its status page or official X account, or its SDK issue tracker.
  • Reproducibility. Both gates are observable in raw public data; anyone can re-verify.
  • Citability. A journalist citing "ai-receipts confirmed GPT-4o degradation per the openai-python issue tracker" is on solid ground.
  • The cost. Some real per-model degradations that never reach a provider's official surfaces will be missed. That is acceptable.

7.4 Future evolution

Once independent probes are in place, a third gate opens: per-model probe results showing error-rate or latency degradation for a specific endpoint. That becomes the dominant confirmation source — first-party measurement — and is documented at the next major version bump.


8. Known limitations

Being explicit about what we cannot measure is the credibility move.

  • Per-model granularity. Provider status pages report at the API or service level ("API operational"), not per model. §7 closes part of this gap; independent probes will close it fully.
  • Regional granularity. Only AWS and Google Cloud expose per-region incidents in their feeds. Anthropic, Mistral, and most others have global endpoints with no per-region feed. We do independently measure per-region reachability from three Vercel-pinned regions (iad1, sfo1, fra1) every 15 minutes; when those probes disagree, we surface the scope empirically (§4.4, §6.1). What we cannot publish for providers without per-region feeds is per-region uptime, since uptime requires durational data the providers don't expose. Reachability disagreement at probe time is the closest substitute.
  • Severity sandbagging. Providers historically classify conservatively. We do not unilaterally adjust upward; UIS is computed from declared severity plus observable factors (duration, scope), never from inference about what the provider "should have" classified.
  • Backfill ceiling per provider. Some providers cap history at roughly 20 incidents. The full incidents API extends this but does not eliminate the cap. We document the per-provider history ceiling and never claim coverage beyond what the source provides. DeepSeek in particular publishes RSS only (no /api/v2/incidents.json), so backfill is capped at ~20 most recent items.
  • Providers evaluated and deferred from V0. xAI and OpenRouter were evaluated pre-launch and deferred for the reasons given in §2.1; both will be revisited when a parseable feed becomes available. We do not list either on the dashboard, since "no data" rows muddy the brand promise.
  • Bespoke feeds may drift. Provider schemas can change without notice; schema-drift alerts route to internal monitoring so the parser is updated.
  • Latency and quality drift do not affect UIS. A slow but error-free API has UIS 0 under this version. Latency probes run as a dashboard-only parallel stream (§4.10); output quality is not measured.
  • No customer-segment differentiation. Enterprise-tier outages and free-tier outages register the same.
  • Silent rate-limit changes are invisible. Providers who tighten rate limits without filing an incident are not visible to this pipeline.

8.1 Historical record provenance

Historical incidents loaded from provider archives are subject to the same UIS formula as live incidents, but they are surfaced with explicit provenance discipline so readers can tell apart "we measured this in real time with multi-source confirmation" from "we re-derived this from the provider's public archive after the fact."

| Provenance | Confirmation | Provenance tag | Confidence | Dashboard rendering |
|---|---|---|---|---|
| Live ingestion | per §6 gate | none | none | Standard rendering with Confirmed / Reported / Unconfirmed label |
| Historical from provider API | always unconfirmed | provider archive | high | Reduced opacity, "historical record" label |
| Historical from web archive | always unconfirmed | web archive | medium | Reduced opacity, "historical record" label |

Aggregates that combine live and historical data declare the mix in any published number (for example, "30-day impact-weighted uptime, 12 live-confirmed incidents and 3 historical records from the provider archive"). The dashboard offers a "live-only" filter so any aggregate can be replicated without historical data.


9. Versioning protocol

9.1 Semver and the restatement rule

  • Versions follow semver: v1.0, v1.1 (additive non-breaking refinement), v2.0 (formula change that may restate numbers).
  • Every version change is recorded with version, effective date, the sha256 hash of this document at that version, a changes summary, and a link to the newsletter post announcing it.
  • Restatement threshold: 0.1% of historical scores. Before a new version is merged, the new formula is run against the prior 90 days of incidents. If it changes the score of at least 0.1% of those incidents, the changed incidents are restated at the new version, and both old and new values are tagged in a public audit table. If fewer than 0.1% change, the new version applies only to incidents going forward.
  • Older data carries its prior version forever. No silent restatement.
  • Every version bump is announced — even non-restating ones — with a Methodology Notes section in the next weekly newsletter, including the document hash and a one-paragraph plain-English diff.
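The restatement decision can be sketched as a comparison of per-incident scores over the prior 90 days. The inputs, dictionaries mapping incident id to UIS under the old and new formulas, are an assumed shape. Integer arithmetic avoids float edge cases at exactly 0.1%.

```python
from typing import Dict, List


def incidents_to_restate(old: Dict[str, int], new: Dict[str, int]) -> List[str]:
    """Return the incident ids to restate, or [] when the new version applies going forward only."""
    changed = sorted(iid for iid, score in old.items() if new[iid] != score)
    # Restate when at least 0.1% (1 in 1,000) of scores change.
    if old and len(changed) * 1000 >= len(old):
        return changed
    return []
```

With 2,000 incidents in the window, one changed score stays below the threshold, but two reach exactly 0.1% and trigger restatement of those two incidents.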

9.2 What triggers a version bump

  • Adding a new factor to the UIS formula → major version.
  • Changing weights in an existing UIS factor → minor version if the expected impact is below 0.1% of historical scores, major otherwise.
  • Adding a new confirmation source → minor version.
  • Changing the confirmation rule itself → major version.
  • Adding a new per-model claim gate → minor version.

9.3 What does not trigger a version bump

  • Adding a new provider to the registry. This is configuration, not methodology.
  • Updating model-tier classifications as providers release new flagship models. Configuration, not methodology.
  • Fixing parser bugs. The original payload is preserved; re-derived values use the current methodology version.

9.4 Announcement protocol

Every version bump produces three artifacts:

  1. A Methodology Notes section in the next weekly newsletter with the version, effective date, document hash, plain-English diff, and (if restating) a link to the audit table for the restated incidents.
  2. A versioned methodology page showing the diff against the prior version.
  3. For restatements, an audit entry per restated incident showing old and new values.

10. Claims policy

10.1 What we publish

  • Per-provider UIS scores for any Confirmed or Reported incident.
  • Per-provider impact-weighted uptime over 24-hour, 7-day, 30-day, 90-day, and quarterly windows.
  • Cross-provider comparison tables with full methodology citation.
  • Per-model claims when the §7.1 gate passes.
  • Time-to-acknowledgment claims ("provider's status page acknowledged X minutes after our detection") backed by confirmation timestamps.
  • Multi-source confirmation timing as a data-credibility signal ("Confirmed by two sources within 14 minutes").
  • Quarterly trend analysis ("Provider X impact-weighted uptime improved from 99.41% in Q1 to 99.67% in Q2").
  • Cross-provider correlation observations ("Anthropic and Mistral both flagged inference issues in the same 6-hour window — possible shared dependency").
  • The open dataset.

10.2 What we do not publish

  • Latency or output-quality claims. Latency probe data stays dashboard-only during the §4.10 parallel run, and there is no quality probe data.
  • Per-region uptime for providers that do not expose per-region feeds. Per-region reachability is surfaced through the multi-region probe stream (§6.1), but reachability is not uptime (see §8).
  • Customer-segment differentiated impact — no data.
  • Root-cause attribution beyond what the provider stated. The sourcing standard requires the provider's own root-cause statement.
  • Predictions ("next outage likely within 30 days"). That is not what this is.
  • Comparative rankings in any post under 1,000 words of context.
  • Per-model claims that fail the §7.1 gate. The "community reports" framing is the only allowed alternative.
  • Methodology-version-mismatched aggregates. When a quarter spans a version bump, computations are performed separately per version, or restated per §9.1 if the change crossed the 0.1% threshold.

11. Replication

11.1 Download the dataset

CSV and JSON exports of the full dataset are available at /data. The exports include:

  • incidents.csv — every incident with the computed impact score, the methodology version that produced it, and the provenance flags described in §8.1.
  • incidents.json — the same rows plus the original provider payload, sufficient to re-derive every published number under any methodology version.

11.2 Re-derive a published UIS

The reference implementation is in the public repository under the methodology module. Each incident's UIS is the multiplicative composite defined in §4.1:

UIS = clamp(0, 100,
  base_severity            // §4.2
  × component_weight       // §4.3
  × scope_multiplier       // §4.4
  × duration_multiplier    // §4.5
  × tier_multiplier        // §4.6
)

The computed value must match the score in the dataset row for the same incident at the same methodology version. If it does not, file an issue.

11.3 Dispute a number

File an issue on the public repository with:

  1. The dataset row (incident id and methodology version).
  2. Your re-derivation steps.
  3. The number you computed versus the number we published.
  4. Optionally, the methodology section you believe is ambiguous.

Methodology errata are published prominently in the next weekly newsletter. We do not silently fix disputed numbers.


12. Version history

| Version | Effective | Changes |
|---|---|---|
| v1.0 | 2026-05-26 | First locked methodology, shipped at launch. UIS multiplicative formula with five published weights (base, component, scope, duration, tier), clamped 0-100. Impact-weighted uptime as a working proxy for request-weighted until full V1 active-region probing provides real request counts. Tiered Strong, Medium, Weak multi-source confirmation with correlation discount and a 30-minute detection window. Per-model claims gated on the provider naming the model on its own status page or X account, OR on ≥2 GitHub SDK issues filed within 30 minutes naming the model with an HTTP 5xx pattern. Restatement threshold 0.1%. Live confirmation gate excludes archival rows by design. Sources inventory: 2 Strong (provider status page, provider X account) + 2 Medium (GitHub SDK issue surge, multi-region reachability disagreement) + 2 Weak (HN keyword, X mentions). Multi-region passive reachability probes run every 15 minutes from three Vercel-pinned regions (iad1, sfo1, fra1) against each provider's status page; regional disagreement feeds both the §6.1 confirmation gate and the §4.4 scope_multiplier source-attribution layer. Includes the latency observation parallel run described in §4.10 as a dashboard signal only, not used in UIS or auto-posts. Mistral status page parsed from the Nuxt SSR payload on status.mistral.ai. AWS Bedrock fans out per-region RSS feeds as supplementary ingestion under one provider_id, deduped on the composite incident PK. Provider list adjusted pre-launch: xAI and OpenRouter deferred pending parseable feeds; DeepSeek added (Flashcat-hosted, Statuspage-compatible RSS at status.deepseek.com/history.rss). 14 active providers total at launch. |

13. License

  • This document: CC-BY-4.0.
  • Reference implementation: MIT.
  • Public dataset: CC-BY-4.0.

Reproduce any published number by combining the three. If your reproduction differs, file an issue.