Estoppl Score Methodology
Version: v0.1.0
Status: Published. Stub implementation; production weights are seeded estimates pending corpus validation.
Audience: Insurance carrier data science teams, enterprise CISO data teams, Estoppl engineering.
Scope: Defines how the Estoppl Score and three subscores are computed from telemetry, how identity propagates across agents, and how events decay.
1. Design principles
The Estoppl Score estimates how likely an AI agent is to be involved in a measurable incident over a forward-looking 90-day window. It is expressed on a 0-1000 scale on which higher scores indicate lower risk (see the score bands in §3). It is intentionally:
- Open and auditable. Every input, weight, and rule in this document is published. No proprietary scoring black box. Any party (CISO, insurer, deployer) can re-implement the verifier from this spec.
- AARM-conformant. Inputs are drawn from AARM v1.x receipt fields. Any AARM-conformant verifier can read the underlying telemetry.
- Anti-gaming by design. Several common pitfalls (rewarding low usage, rewarding identity rotation, rewarding long time-in-production) are deliberately structured out. See §8.
- Decaying, not permanent. No event affects the score forever. Maximum 10-year decay. See §5.
2. The three subscores
Each subscore is an integer in [0, 100]. The overall Estoppl Score is a weighted combination (§3).
2.1 Governance Discipline
Did the operator follow its own declared governance controls?
| Input | Symbol | Direction | v0.1.0 implemented? |
|---|---|---|---|
| HITL bypass rate, last 30d | hitl_bypass_rate_30d | Lower is better | partial — proxy for HITL volume only |
| Policy evaluation coverage, last 30d | policy_eval_coverage_30d | Higher is better | no |
| Evidence chain continuity (intact prev_hash linkage) | chain_continuity | Higher is better | no — assumed 1.0 |
| HITL response p95 latency, seconds | hitl_response_p95s_30d | Lower is better | no |
| Active policy version age, days | policy_version_age_days | Lower is better | no |
| Proxy uptime, last 30d (fraction of expected sync windows) | proxy_uptime_30d | Higher is better | no |
v0.1.0 stub formula (implemented in internal/api/standing.go):
hitl_rate = HumanRequiredEvents / TotalEvents
if hitl_rate < 0.001:
    governance_discipline = 70  # suspiciously low: HITL likely not configured
elif hitl_rate > 0.5:
    governance_discipline = 60  # suspiciously high: policy likely misconfigured
else:
    governance_discipline = 95
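The stub thresholds above translate directly into a small function. The sketch below is illustrative and is not a copy of internal/api/standing.go; the function name and the None return for zero events are assumptions (the spec only says that TotalEvents == 0 maps to the no_history band in §3).

```python
from typing import Optional


def governance_discipline_stub(human_required_events: int, total_events: int) -> Optional[int]:
    """v0.1.0 stub: bucket the HITL rate into one of three fixed subscores."""
    if total_events == 0:
        # Assumption: with no events there is nothing to score (no_history, §3).
        return None
    hitl_rate = human_required_events / total_events
    if hitl_rate < 0.001:
        return 70  # suspiciously low: HITL likely not configured
    if hitl_rate > 0.5:
        return 60  # suspiciously high: policy likely misconfigured
    return 95
```

Both thresholds are strict inequalities, so a rate of exactly 0.001 or exactly 0.5 still lands in the 95 bucket.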
v1.0 target formula:
governance_discipline = clamp(0, 100,
    100
    - 50 * hitl_bypass_rate_30d                            # bypass is the worst signal
    - 30 * (1 - policy_eval_coverage_30d)
    - 20 * (1 - chain_continuity)
    - 5 * sigmoid((hitl_response_p95s_30d - 600) / 600)    # > 10 min response
    - 5 * sigmoid((policy_version_age_days - 90) / 90)
    - 10 * (1 - proxy_uptime_30d)
)
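A minimal Python sketch of the target formula, assuming sigmoid is the standard logistic function and clamp takes its arguments in the order (lo, hi, value) as written above. The spec does not say how the result is rounded to the integer required by §2, so the sketch returns the unrounded float.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function (assumed; the spec does not define sigmoid)."""
    return 1.0 / (1.0 + math.exp(-x))


def clamp(lo: float, hi: float, value: float) -> float:
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def governance_discipline_v1(
    hitl_bypass_rate_30d: float,
    policy_eval_coverage_30d: float,
    chain_continuity: float,
    hitl_response_p95s_30d: float,
    policy_version_age_days: float,
    proxy_uptime_30d: float,
) -> float:
    raw = (
        100
        - 50 * hitl_bypass_rate_30d
        - 30 * (1 - policy_eval_coverage_30d)
        - 20 * (1 - chain_continuity)
        - 5 * sigmoid((hitl_response_p95s_30d - 600) / 600)
        - 5 * sigmoid((policy_version_age_days - 90) / 90)
        - 10 * (1 - proxy_uptime_30d)
    )
    return clamp(0, 100, raw)


# Best case: no bypasses, full coverage, intact chain, instant review,
# a policy activated today, and full proxy uptime.
best_case = governance_discipline_v1(0.0, 1.0, 1.0, 0, 0, 1.0)
```

Under this reading best_case comes out near 97.3 rather than 100, because each sigmoid term still contributes 5 * sigmoid(-1) ≈ 1.34 points when its input is zero.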
2.2 Scope Adherence
Did the agent's actual behavior match the operator's declared scope manifest?
| Input | Symbol | Direction | v0.1.0 implemented? |
|---|---|---|---|
| Scope drift events, last 30d (calls outside declared manifest) | scope_drift_count_30d | Lower is better | no — proxy is block_rate |
| State-transition anomaly count, last 30d (privilege-escalation sequences) | state_anomaly_count_30d | Lower is better | no |
| Operator-declared scope manifest age, days | manifest_age_days | Lower is better | no |
| Block rate, last 30d (fraction of calls blocked by policy) | block_rate_30d | Lower is better (above zero) | yes |
| Unauthorized credential use count, last 30d | unauth_credential_count_30d | Lower is better | no |
| Tool diversity outside declared manifest, last 30d | tools_outside_manifest_30d | Lower is better | no |
v0.1.0 stub formula:
block_rate = BlockedEvents / TotalEvents
if block_rate > 0.30:
    scope_adherence = 60
elif block_rate > 0.10:
    scope_adherence = 80
else:
    scope_adherence = 90
v1.0 target formula:
scope_adherence = clamp(0, 100,
    100
    - 15 * scope_drift_count_30d
    - 25 * state_anomaly_count_30d                       # privilege escalation is worst
    - 5 * sigmoid((manifest_age_days - 180) / 90)        # stale manifests are suspicious
    - 30 * sigmoid((block_rate_30d - 0.20) / 0.10)       # high blocks suggest persistent scope drift attempts
    - 20 * unauth_credential_count_30d
    - 10 * tools_outside_manifest_30d
)
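The same pattern applies to Scope Adherence. The sketch below again assumes a standard logistic sigmoid and leaves the result unrounded; the example values are invented for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function (assumed; the spec does not define sigmoid)."""
    return 1.0 / (1.0 + math.exp(-x))


def scope_adherence_v1(
    scope_drift_count_30d: int,
    state_anomaly_count_30d: int,
    manifest_age_days: float,
    block_rate_30d: float,
    unauth_credential_count_30d: int,
    tools_outside_manifest_30d: int,
) -> float:
    raw = (
        100
        - 15 * scope_drift_count_30d
        - 25 * state_anomaly_count_30d
        - 5 * sigmoid((manifest_age_days - 180) / 90)
        - 30 * sigmoid((block_rate_30d - 0.20) / 0.10)
        - 20 * unauth_credential_count_30d
        - 10 * tools_outside_manifest_30d
    )
    return max(0, min(100, raw))  # clamp(0, 100, raw)


clean = scope_adherence_v1(0, 0, 0, 0.0, 0, 0)
one_escalation = scope_adherence_v1(0, 1, 0, 0.0, 0, 0)
four_escalations = scope_adherence_v1(0, 4, 0, 0.0, 0, 0)
```

Here clean is roughly 95.8 and a single privilege-escalation sequence costs a flat 25 points. The count-based terms are linear and unbounded, so four such sequences alone are enough to clamp the subscore to 0.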
2.3 Anomaly Load
Statistical anomalies in agent behavior that don't fit the agent's own historical baseline.
| Input | Symbol | Direction | v0.1.0 implemented? |
|---|---|---|---|
| Decision volume, last 30d (used for normalization, NOT as a reward) | volume_30d | Neutral (denominator only) | yes |
| Volume z-score vs trailing-90d baseline | volume_z90 | Lower is better (above 2σ) | no |
| Tool diversity z-score vs trailing-90d baseline | tool_div_z90 | Lower is better | no |
| Time-of-day anomaly count, last 30d | tod_anomaly_count_30d | Lower is better | no |
| Lifetime incident count, decay-adjusted | incidents_lifetime_decayed | Lower is better | no |
| Upstream latency p95 anomaly count, last 30d | latency_anomaly_count_30d | Lower is better | no |
v0.1.0 stub formula:
anomaly_load = 90 # constant baseline — no anomaly detection in v0.1.0
v1.0 target formula:
anomaly_load = clamp(0, 100,
    100
    - 25 * sigmoid((volume_z90 - 2) / 1)
    - 15 * sigmoid((tool_div_z90 - 2) / 1)
    - 10 * sigmoid((tod_anomaly_count_30d - 5) / 5)
    - 30 * sigmoid((incidents_lifetime_decayed - 3) / 2)
    - 10 * sigmoid((latency_anomaly_count_30d - 5) / 5)
)
The volume_30d input is included as a denominator (anomalies are normalized per-volume) but never as a positive contributor. This deliberately blocks the "hide your usage to look clean" gaming strategy (see §8).
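To make the z-score inputs concrete, the sketch below computes a z-score against a list of baseline samples and feeds it into the target formula. How the trailing-90d baseline is sampled is not specified here, so the baseline list, the use of the population standard deviation, and the 0.0 result for a flat baseline are all assumptions.

```python
import math
import statistics
from typing import Sequence


def sigmoid(x: float) -> float:
    """Standard logistic function (assumed; the spec does not define sigmoid)."""
    return 1.0 / (1.0 + math.exp(-x))


def z_score(current: float, baseline: Sequence[float]) -> float:
    """z-score of current against baseline samples; 0.0 if the baseline is flat."""
    mean = statistics.mean(baseline)
    std = statistics.pstdev(baseline)
    if std == 0:
        return 0.0
    return (current - mean) / std


def anomaly_load_v1(
    volume_z90: float,
    tool_div_z90: float,
    tod_anomaly_count_30d: int,
    incidents_lifetime_decayed: float,
    latency_anomaly_count_30d: int,
) -> float:
    raw = (
        100
        - 25 * sigmoid((volume_z90 - 2) / 1)
        - 15 * sigmoid((tool_div_z90 - 2) / 1)
        - 10 * sigmoid((tod_anomaly_count_30d - 5) / 5)
        - 30 * sigmoid((incidents_lifetime_decayed - 3) / 2)
        - 10 * sigmoid((latency_anomaly_count_30d - 5) / 5)
    )
    return max(0, min(100, raw))  # clamp(0, 100, raw)


# Hypothetical baseline samples with mean 100 and population std 10.
spike_z = z_score(130, [90, 110])
quiet = anomaly_load_v1(0.0, 0.0, 0, 0.0, 0)
spike = anomaly_load_v1(spike_z, 0.0, 0, 0.0, 0)
```

With these inputs spike_z is 3.0, quiet is about 84.4, and spike is about 69.1. One property of the formula as written is worth checking during calibration: a negative volume_z90 shrinks the volume penalty slightly, so an implementation that wants volume to be strictly neutral below baseline might floor the z-score at 0.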
3. Overall score computation
The Estoppl Score is a fixed-weight linear combination of the three subscores, scaled to 0-1000.
overall_score = round(
    governance_discipline * 0.35 +
    scope_adherence * 0.35 +
    anomaly_load * 0.30
) * 10
Weights are immutable per methodology version. Changes require a version bump and 60-day insurer notice (§6).
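As a worked example of the combination, the sketch below uses integer arithmetic so that the result does not depend on floating-point rounding. The spec only says round(); rounding halves up is an assumption made here.

```python
def overall_score(governance_discipline: int, scope_adherence: int, anomaly_load: int) -> int:
    """Weighted 35/35/30 combination of integer subscores, scaled to 0-1000."""
    # Work in hundredths to avoid float drift: 0.35 -> 35, 0.30 -> 30.
    weighted_x100 = 35 * governance_discipline + 35 * scope_adherence + 30 * anomaly_load
    # Adding 50 before the floor division rounds halves up (assumed rounding mode).
    return ((weighted_x100 + 50) // 100) * 10


stub_default = overall_score(95, 90, 90)
```

With the v0.1.0 stub values of 95, 90 and 90, the weighted sum is 91.75, which rounds to 92 and gives an overall score of 920. Because rounding happens before the multiplication by 10, every overall score is a multiple of 10.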
Score bands (rendered in the certificate's score_band field):
| Range | Band | Recommended downstream action |
|---|---|---|
| 800-1000 | low_risk | Standard processing |
| 500-799 | medium_risk | Heightened review; consider additional controls |
| 0-499 | high_risk | Block or escalate |
| Any (TotalEvents == 0) | no_history | Conservative defaults; do not assume low_risk |
no_history is structurally distinct from high_risk. Both produce conservative downstream defaults, but for opposite reasons (insufficient data vs. evidence of problems). Insurance carriers should treat them differently in pricing.
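The band table can be read as a simple lookup in which the no_history check comes first. The function name and signature below are illustrative assumptions.

```python
def score_band(overall_score: int, total_events: int) -> str:
    """Map an overall score to its band; zero events always wins."""
    if total_events == 0:
        # Checked first so that a default stub score is never reported as low_risk.
        return "no_history"
    if overall_score >= 800:
        return "low_risk"
    if overall_score >= 500:
        return "medium_risk"
    return "high_risk"
```

For example, an agent whose stub subscores would combine to 920 is still reported as no_history if it has produced no events.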
4. Anti-Sybil identity propagation
A naive scoring system creates a "rotate the agent identity to reset the score" gaming opportunity. We block this with operator-level propagation.
4.1 Identity model
operator_id ──┬── agent_id_1 (current)
              ├── agent_id_2 (current)
              └── agent_id_3 (retired, but score history retained)
Every agent registers under an operator_id (an Estoppl-issued UUID derived from the operator's verified business identity at signup). The operator_id is persistent and cannot be self-rotated.
4.2 Penalty propagation
Adverse events on any agent under an operator propagate to the operator-level reputation:
operator_penalty_score = max(
    max(individual_agent_penalties),
    0.4 * sum(individual_agent_penalties)
)
The first term ensures a single bad agent's penalty fully applies. The second term ensures multiple bad agents under one operator compound (40% of their sum, to avoid double-counting tightly-correlated incidents).
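A small sketch of the propagation rule, reading individual_agent_penalties as the list of per-agent penalties and the first term as the largest of them. Returning 0.0 for an operator with no penalized agents is an assumption.

```python
from typing import Sequence


def operator_penalty_score(agent_penalties: Sequence[float]) -> float:
    """Operator-level penalty: the worst single agent, or 40% of the total if larger."""
    if not agent_penalties:
        return 0.0  # assumption: no penalized agents means no operator penalty
    return max(max(agent_penalties), 0.4 * sum(agent_penalties))


single = operator_penalty_score([30])
three_equal = operator_penalty_score([30, 30, 30])
one_dominant = operator_penalty_score([50, 5])
```

Here single is 30, three_equal is about 36 (40% of 90), and one_dominant stays at 50. The compounding term only takes over once the sum of penalties exceeds 2.5 times the largest single penalty.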
4.3 What this means in practice
- A new agent registered under a clean operator inherits the operator's full reputation (no zero-history penalty).
- A new agent registered under an operator with a recent incident inherits the propagated penalty until decay (§5) reduces it.
- A new operator (no prior identity) receives the no_history band, not low_risk. They have to earn the score, not get it for free.
4.4 Death certificate event
The single most punitive event is self-report falsification — the operator reports action A, Estoppl-attested telemetry shows action B. This:
- Triggers a hard score = 0 for the originating agent for 30 days.
- Sets operator_penalty_score += 50, decaying with a 3-year half-life over a 10-year window (slower than other events; see §5).
- Flags the operator's record with a falsification_event_count that surfaces in the evidence pack until it has decayed below 1.0.
There is no "permanent ban", but the recovery cost is high enough that honest disclosure of the underlying issue is the cheaper rational response. This mirrors how consumer credit bureaus handle confirmed fraud: severely punitive, but not eternal.
5. Decay rules
All score-affecting events decay over time.
5.1 Standard decay
For a generic adverse event with raw weight w₀:
w(t) = w₀ * exp(-ln(2) * t / 365) # 1-year half-life; t = event age in days
After ~10 years (~3650 days), w(t) ≈ w₀ / 1024 — effectively zero. Events older than 10 years are dropped from the input set entirely (computational simplification, not a methodology change).
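The standard decay rule, including the 10-year cutoff, can be sketched as follows. The constant and function names are illustrative assumptions.

```python
import math

HALF_LIFE_DAYS = 365.0   # 1-year half-life
MAX_AGE_DAYS = 3650.0    # events older than ~10 years are dropped


def decayed_weight(w0: float, age_days: float) -> float:
    """Weight of a standard adverse event after age_days of decay."""
    if age_days > MAX_AGE_DAYS:
        return 0.0
    return w0 * math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)


after_one_year = decayed_weight(10.0, 365)
at_cutoff = decayed_weight(1024.0, 3650)
```

An event with raw weight 10 is worth 5 after one year, and a raw weight of 1024 has decayed to about 1 by the time it is dropped.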
5.2 Falsification decay (slower)
For self-report falsification events (§4.4):
w(t) = w₀ * exp(-ln(2) * t / (3 * 365)) # 3-year half-life
As with all other events, falsification events are dropped after 10 years.
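The two schedules can be compared side by side by keying the half-life on the event type. In this sketch the event-type labels and the (weight, age, kind) tuple layout are assumptions; 0.5 ** (t / half_life) is algebraically the same as the exp(-ln(2) * t / half_life) form used above.

```python
from typing import Iterable, Tuple

HALF_LIFE_DAYS = {"standard": 365.0, "falsification": 3 * 365.0}
MAX_AGE_DAYS = 3650.0


def decayed_weight(w0: float, age_days: float, kind: str = "standard") -> float:
    """Decayed weight of one event; zero once it is older than the 10-year cap."""
    if age_days > MAX_AGE_DAYS:
        return 0.0
    return w0 * 0.5 ** (age_days / HALF_LIFE_DAYS[kind])


def decayed_total(events: Iterable[Tuple[float, float, str]]) -> float:
    """Sum of decayed weights over (w0, age_days, kind) tuples."""
    return sum(decayed_weight(w0, age, kind) for w0, age, kind in events)


falsification_3y = decayed_weight(50.0, 1095, "falsification")
falsification_10y = decayed_weight(50.0, 3650, "falsification")
standard_10y = decayed_weight(50.0, 3650, "standard")
```

A 50-point falsification penalty is still worth 25 points after three years and about 5 points at the 10-year mark, whereas a standard event of the same raw weight has decayed to roughly 0.05 by then. For falsification events the 10-year cutoff therefore removes a small but non-negligible residual.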
5.3 No event is permanent
Maximum decay window is 10 years for every event type, including falsification. This is a deliberate design choice to:
- Avoid a Sybil-incentive cliff (where the rational move is to spin up a new operator after some time anyway)
- Match consumer credit bureau practice (bankruptcies fall off credit reports after 7-10 years, depending on the chapter)
- Allow operators to genuinely improve over time
6. Versioning and update cadence
6.1 Versioning
Methodology versions follow vMAJOR.MINOR.PATCH:
- PATCH (v0.1.0 → v0.1.1): bug fixes, no weight changes, no input additions or removals. No notice required.
- MINOR (v0.1.x → v0.2.0): weight adjustments within existing inputs OR addition of new inputs (additive only). 60-day notice to insurer integrators.
- MAJOR (v0.x → v1.0): structural change (subscore restructuring, removal of inputs, formula change). 90-day notice + parallel-run period (old + new versions both queryable).
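An integrator could derive the required notice period from two version strings. The sketch below assumes full three-part version strings (so v1.0 would be written v1.0.0) and uses hypothetical function names.

```python
import re
from typing import Tuple

NOTICE_DAYS = {"PATCH": 0, "MINOR": 60, "MAJOR": 90}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'vMAJOR.MINOR.PATCH' into a tuple of integers."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Classify the change from old to new as PATCH, MINOR or MAJOR."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError("new version must be greater than old version")
    if new_v[0] != old_v[0]:
        return "MAJOR"
    if new_v[1] != old_v[1]:
        return "MINOR"
    return "PATCH"


def required_notice_days(old: str, new: str) -> int:
    return NOTICE_DAYS[bump_kind(old, new)]
```

For instance, required_notice_days("v0.1.3", "v0.2.0") gives 60, and a move from v0.3.0 to v1.0.0 gives 90.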
6.2 Insurer change-management
Active insurer integrations are notified via:
- Email to the integration's registered contact
- Banner in the queryable certificate response (methodology_change_notice field, set 60+ days before activation)
- Deprecation header in the API response (Deprecation: <RFC9745 date>)
The previous methodology version remains queryable via ?methodology_version=v0.1.0 for 12 months after a new MINOR ships, and 24 months after a MAJOR.
6.3 Annual re-weighting
Beginning with v1.0, weights are re-evaluated annually based on the prior 12 months of corpus data and observed incident outcomes. The re-evaluation produces a published validation backtest report.
7. How to verify / integrate
7.1 CISO independent verification
A CISO with a Standing Certificate JSON can verify the score is internally consistent without trusting Estoppl's cloud:
1. Fetch the public key for public_key_id from https://api.estoppl.ai/.well-known/jwks.json (TODO STD.4).
2. Verify the Ed25519 signature over the canonical JSON of all certificate fields except signature (TODO STD.4).
3. Walk the evidence chain (linked from evidence_url) using estoppl verify-certificate (TODO STD.4).
4. Re-compute the subscores from the evidence drill-down (TODO TRY.1) by applying the formulas in §2 to the published inputs.
If steps 1-3 succeed and step 4 produces the same subscore values as in the certificate, the score is internally consistent.
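Two of the steps above lend themselves to a short sketch: building the byte string that the signature covers, and comparing recomputed subscores against the certificate. The canonicalization rule is still marked TODO in this document, so the one used here (sorted keys, no insignificant whitespace, UTF-8) is purely an assumption, as are the function names and the shape of the subscores object. The signature check itself would be done by passing the payload to an Ed25519 library and is not shown.

```python
import json
from typing import Any, Dict


def signing_payload(certificate: Dict[str, Any]) -> bytes:
    """Serialize every certificate field except 'signature' in a fixed order."""
    unsigned = {key: value for key, value in certificate.items() if key != "signature"}
    # Assumed canonical form: sorted keys, compact separators, UTF-8 bytes.
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def subscores_match(certificate: Dict[str, Any], recomputed: Dict[str, int]) -> bool:
    """Step 4: the recomputed subscores must equal the ones in the certificate."""
    return certificate["subscores"] == recomputed


# Hypothetical certificate fragment, for illustration only.
example_certificate = {
    "score_band": "low_risk",
    "public_key_id": "key-1",
    "subscores": {"governance_discipline": 95, "scope_adherence": 90, "anomaly_load": 90},
    "signature": "base64-signature-goes-here",
}
payload = signing_payload(example_certificate)
```

Excluding the signature field before serializing matters: the signer cannot have signed bytes that include the signature itself.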
7.2 Insurer integration
Insurance carriers integrate the score as an underwriting input by:
- Querying GET /v1/standing/{deployer_id} at quote time and renewal.
- Mapping score_band to the carrier's pricing tiers.
- Optionally drilling into subscores to apply carrier-specific weights (e.g., a carrier may weight governance_discipline more heavily than the published 35%).
- Optionally querying GET /v1/standing/{deployer_id}/evidence (TODO TRY.1) for incident-level drill-down used in claims-handling.
The published subscore values are the carrier's contractual signal; raw inputs are advisory. A carrier may not apply a re-weighting that contradicts the published methodology without renegotiating their data-feed contract.
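On the carrier side, the first two integration steps might look like the sketch below. The base URL is assumed to be the same host that serves the JWKS document in §7.1, the tier names are hypothetical, and no network call is made; the methodology_version query parameter is the one described in §6.2.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.estoppl.ai"  # assumption: same host as the JWKS endpoint

# Hypothetical carrier pricing tiers, keyed by the published score bands.
PRICING_TIER_BY_BAND = {
    "low_risk": "standard",
    "medium_risk": "heightened_review",
    "high_risk": "escalate",
    "no_history": "conservative_default",
}


def standing_url(deployer_id: str, methodology_version: Optional[str] = None) -> str:
    """Build the standing-certificate URL, optionally pinned to a methodology version."""
    url = f"{BASE_URL}/v1/standing/{quote(deployer_id, safe='')}"
    if methodology_version is not None:
        url += "?" + urlencode({"methodology_version": methodology_version})
    return url


def pricing_tier(score_band: str) -> str:
    """Map a score band to a tier; unknown bands go to manual review, not the best tier."""
    return PRICING_TIER_BY_BAND.get(score_band, "manual_review")
```

Keeping no_history and low_risk on separate tiers reflects the distinction drawn in §3, and defaulting unknown bands to manual review avoids silently treating a future band as low risk.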
8. Anti-patterns we deliberately avoid
We have explicitly structured the methodology to prevent the following gaming strategies:
| Anti-pattern | How we block it |
|---|---|
| "Hide your usage to look clean": agent reduces tool calls to lower the chance of incidents | Volume is a denominator, never a positive contributor. Subscores are rates and z-scores, not raw counts. |
| "Rotate identities to reset the score": operator spins up a new agent_id after a bad incident | Operator-level propagation (§4). New agent inherits operator penalty. |
| "New operator gets a free perfect score": fresh operator registers, gets low_risk immediately | New operators get no_history band, not low_risk. Insurance carriers treat the two distinctly. |
| "Old agent in production with many incidents outscores young clean agent" | No formula contains a time_in_production divisor or a 1/incidents_per_year term. Decay (§5) reduces old penalties, but never rewards age in absolute terms. |
| "Self-report falsification to hide bad behavior" | Death certificate event (§4.4): hard zero for 30d + slow-decay operator penalty + flag in the evidence pack that persists until decayed below 1.0. |
| "Game the policy threshold": operator sets policy thresholds artificially low so blocks never trigger, looking compliant on paper | manifest_age_days and policy_eval_coverage_30d (v1.0 inputs) penalize stale or skipped policy evaluations. CISO drill-down surfaces thresholds. |
9. v0.1.0 stub vs v1.0 target — honest gap analysis
The v0.1.0 implementation in internal/api/standing.go is a publishable stub. It computes a real score from real telemetry, but uses a small subset of the v1.0 input set:
| Subscore | v0.1.0 inputs (count) | v1.0 target inputs (count) | Confidence |
|---|---|---|---|
| Governance Discipline | 1 (HITL rate proxy) | 6 | Low — directionally correct, magnitudes uncalibrated |
| Scope Adherence | 1 (block rate) | 6 | Low — same |
| Anomaly Load | 0 (constant 90) | 6 | None — placeholder until anomaly detection ships |
Decay rules (§5) and identity propagation (§4) are specification-only in v0.1.0 — the runtime does not yet apply them. They are documented now so insurer integrators can plan for them; the implementation roadmap is in §10.
What this means for early users:
- Insurance carriers in pre-revenue research mode can review the methodology and design-partner the integration. They should NOT use v0.1.0 scores as a contractual underwriting input.
- Deployers and their customers (CISOs) can use v0.1.0 scores as a directional signal in security review. The methodology_version field in the certificate is honest about the maturity.
- The aarm_conformance field is aligned_extended_review_pending in v0.1.0; formal AARM Extended conformance review (CSA) is in flight (TODO).
10. Roadmap
| Version | Target | Major changes |
|---|---|---|
| v0.1.x | NOW | Stub. Three subscores, simplified formulas, seeded weights. |
| v0.2.x | NEXT (months 3-6) | Add scope_drift_count_30d, state_anomaly_count_30d, manifest_age_days. Implement chain_continuity input. Publish first validation backtest against the four major 2026 incidents (Meta, McKinsey Lilli, Mercor/LiteLLM, Step Finance). |
| v0.3.x | THEN (months 6-9) | Implement operator-level identity propagation (§4). Implement decay rules (§5). |
| v1.0 | LATER (months 9-15) | Re-weight all subscores against accumulated 6-12 months of corpus data. Annual revision cycle begins. Vertical-specific subscore variants (FS / healthcare / federal) ship as v1.0+x. |
Appendix A: Field reference
Every input symbol used in this methodology, with its data source.
| Symbol | Type | Source | Aggregation window |
|---|---|---|---|
| hitl_bypass_rate_30d | float [0, 1] | events table where policy_decision='HUMAN_REQUIRED' AND review status='timeout_proceed' | 30 days |
| hitl_rate_30d (v0.1.0 proxy) | float [0, 1] | events table where policy_decision='HUMAN_REQUIRED' / total | 30 days |
| policy_eval_coverage_30d | float [0, 1] | events table where policy_decision IS NOT NULL / total ingested events | 30 days |
| chain_continuity | float [0, 1] | computed via internal/chain.WalkSegment | last sync |
| hitl_response_p95s_30d | int seconds | reviews table, decided_at - requested_at p95 | 30 days |
| policy_version_age_days | int days | policies table, now() - max(activated_at) | snapshot |
| proxy_uptime_30d | float [0, 1] | events table, fraction of expected 5-min sync windows present | 30 days |
| scope_drift_count_30d | int | events table where tool_name NOT IN declared manifest | 30 days |
| state_anomaly_count_30d | int | computed from state-transition graph (NEXT) | 30 days |
| manifest_age_days | int days | manifests table, now() - max(updated_at) | snapshot |
| block_rate_30d | float [0, 1] | events table where policy_decision='BLOCK' / total | 30 days |
| unauth_credential_count_30d | int | events table where authorizing_credential IS NULL AND tool requires credential | 30 days |
| tools_outside_manifest_30d | int | DISTINCT count of tool_name where tool NOT IN declared manifest | 30 days |
| volume_30d | int | total event count | 30 days |
| volume_z90 | float | z-score of volume_30d vs trailing 90d mean/std | 30 + 90 days |
| tool_div_z90 | float | z-score of unique tool count vs trailing 90d | 30 + 90 days |
| tod_anomaly_count_30d | int | events outside operator-declared operating hours | 30 days |
| incidents_lifetime_decayed | float | sum of all incident events with §5 decay applied | lifetime |
| latency_anomaly_count_30d | int | events with actual_latency_ms > p99(operator's baseline) | 30 days |
Appendix B: Change log
| Version | Date | Changes |
|---|---|---|
| v0.1.0 | 2026-05-10 | Initial publication. Stub implementation. |