Estoppl Score Methodology

Version: v0.1.0
Status: Published. Stub implementation; production weights are seeded estimates pending corpus validation.
Audience: Insurance carrier data science teams, enterprise CISO data teams, Estoppl engineering.
Scope: Defines how the Estoppl Score and three subscores are computed from telemetry, how identity propagates across agents, and how events decay.

1. Design principles

The Estoppl Score estimates the probability that an AI agent will be involved in a measurable incident over a forward-looking 90-day window, mapped onto a 0-1000 scale on which higher scores mean lower risk (see the bands in §3). It is intentionally:

Open and auditable. Every input, weight, and rule in this document is published. No proprietary scoring black box. Any party (CISO, insurer, deployer) can re-implement the verifier from this spec.
AARM-conformant. Inputs are drawn from AARM v1.x receipt fields. Any AARM-conformant verifier can read the underlying telemetry.
Anti-gaming by design. Several common pitfalls (rewarding low usage, rewarding identity rotation, rewarding long time-in-production) are deliberately designed out. See §8.
Decaying, not permanent. No event affects the score forever; every event is dropped after at most 10 years. See §5.

2. The three subscores

Each subscore is an integer in [0, 100]. The overall Estoppl Score is a weighted combination (§3).

2.1 Governance Discipline

Did the operator follow its own declared governance controls?

Input	Symbol	Direction	v0.1.0 implemented?
HITL bypass rate, last 30d	`hitl_bypass_rate_30d`	Lower is better	partial — proxy for HITL volume only
Policy evaluation coverage, last 30d	`policy_eval_coverage_30d`	Higher is better	no
Evidence chain continuity (intact prev_hash linkage)	`chain_continuity`	Higher is better	no — assumed 1.0
HITL response p95 latency, seconds	`hitl_response_p95s_30d`	Lower is better	no
Active policy version age, days	`policy_version_age_days`	Lower is better	no
Proxy uptime, last 30d (fraction of expected sync windows)	`proxy_uptime_30d`	Higher is better	no

v0.1.0 stub formula (implemented in internal/api/standing.go):

# Undefined when TotalEvents == 0; such agents receive the no_history band (§3).
hitl_rate = HumanRequiredEvents / TotalEvents

if hitl_rate < 0.001:
    governance_discipline = 70   # suspiciously low — HITL likely not configured
elif hitl_rate > 0.5:
    governance_discipline = 60   # suspiciously high — policy likely misconfigured
else:
    governance_discipline = 95

v1.0 target formula:

governance_discipline = clamp(0, 100,
    100
  - 50 * hitl_bypass_rate_30d                   # bypass is the worst signal
  - 30 * (1 - policy_eval_coverage_30d)
  - 20 * (1 - chain_continuity)
  -  5 * sigmoid((hitl_response_p95s_30d - 600) / 600)   # > 10 min response
  -  5 * sigmoid((policy_version_age_days - 90) / 90)
  - 10 * (1 - proxy_uptime_30d)
)
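
The target formula above can be sketched in Python for anyone re-implementing the verifier. This is a minimal illustration, not the shipped implementation: the helper names `sigmoid` and `clamp` and the function signature are assumptions, and because the document does not define `sigmoid`, the standard logistic function is assumed.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function (assumed; the spec does not define `sigmoid`)."""
    return 1.0 / (1.0 + math.exp(-x))


def clamp(lo: float, hi: float, value: float) -> float:
    """Restrict value to [lo, hi], using the spec's clamp(0, 100, ...) argument order."""
    return max(lo, min(hi, value))


def governance_discipline_v1(
    hitl_bypass_rate_30d: float,
    policy_eval_coverage_30d: float,
    chain_continuity: float,
    hitl_response_p95s_30d: float,
    policy_version_age_days: float,
    proxy_uptime_30d: float,
) -> float:
    """Unrounded v1.0 target Governance Discipline subscore."""
    raw = (
        100
        - 50 * hitl_bypass_rate_30d
        - 30 * (1 - policy_eval_coverage_30d)
        - 20 * (1 - chain_continuity)
        - 5 * sigmoid((hitl_response_p95s_30d - 600) / 600)
        - 5 * sigmoid((policy_version_age_days - 90) / 90)
        - 10 * (1 - proxy_uptime_30d)
    )
    return clamp(0, 100, raw)


# Best case: no bypasses, full coverage, intact chain, instant review,
# brand-new policy, full uptime.
best_case = governance_discipline_v1(0.0, 1.0, 1.0, 0.0, 0.0, 1.0)
```

Under the logistic assumption, both sigmoid terms are nonzero even at zero latency and zero policy age, so `best_case` comes out at about 97.3 rather than 100.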

2.2 Scope Adherence

Did the agent's actual behavior match the operator's declared scope manifest?

Input	Symbol	Direction	v0.1.0 implemented?
Scope drift events, last 30d (calls outside declared manifest)	`scope_drift_count_30d`	Lower is better	no — proxy is `block_rate`
State-transition anomaly count, last 30d (privilege-escalation sequences)	`state_anomaly_count_30d`	Lower is better	no
Operator-declared scope manifest age, days	`manifest_age_days`	Lower is better	no
Block rate, last 30d (fraction of calls blocked by policy)	`block_rate_30d`	Lower is better (above zero)	yes
Unauthorized credential use count, last 30d	`unauth_credential_count_30d`	Lower is better	no
Tool diversity outside declared manifest, last 30d	`tools_outside_manifest_30d`	Lower is better	no

v0.1.0 stub formula:

# Undefined when TotalEvents == 0; such agents receive the no_history band (§3).
block_rate = BlockedEvents / TotalEvents

if block_rate > 0.30:
    scope_adherence = 60
elif block_rate > 0.10:
    scope_adherence = 80
else:
    scope_adherence = 90

v1.0 target formula:

scope_adherence = clamp(0, 100,
    100
  - 15 * scope_drift_count_30d
  - 25 * state_anomaly_count_30d                 # privilege escalation is worst
  -  5 * sigmoid((manifest_age_days - 180) / 90) # stale manifests are suspicious
  - 30 * sigmoid((block_rate_30d - 0.20) / 0.10) # high blocks suggest persistent scope drift attempts
  - 20 * unauth_credential_count_30d
  - 10 * tools_outside_manifest_30d
)
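
A short Python sketch of the target formula above shows how the count-based terms behave. The function name and signature are illustrative assumptions, and `sigmoid` is again assumed to be the standard logistic function.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function (assumed; not defined in the spec)."""
    return 1.0 / (1.0 + math.exp(-x))


def scope_adherence_v1(
    scope_drift_count_30d: int,
    state_anomaly_count_30d: int,
    manifest_age_days: float,
    block_rate_30d: float,
    unauth_credential_count_30d: int,
    tools_outside_manifest_30d: int,
) -> float:
    """Unrounded v1.0 target Scope Adherence subscore."""
    raw = (
        100
        - 15 * scope_drift_count_30d
        - 25 * state_anomaly_count_30d
        - 5 * sigmoid((manifest_age_days - 180) / 90)
        - 30 * sigmoid((block_rate_30d - 0.20) / 0.10)
        - 20 * unauth_credential_count_30d
        - 10 * tools_outside_manifest_30d
    )
    return max(0.0, min(100.0, raw))


# A 30-day-old manifest and a 2% block rate, with and without one
# privilege-escalation sequence.
clean = scope_adherence_v1(0, 0, 30, 0.02, 0, 0)
one_escalation = scope_adherence_v1(0, 1, 30, 0.02, 0, 0)
```

The count terms are linear and unbounded, so a single state-transition anomaly costs exactly 25 points, and seven scope drift events alone (7 × 15 = 105) clamp the subscore to 0 regardless of the other inputs.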

2.3 Anomaly Load

Statistical anomalies in agent behavior that don't fit the agent's own historical baseline.

Input	Symbol	Direction	v0.1.0 implemented?
Decision volume, last 30d (used for normalization, NOT as a reward)	`volume_30d`	Neutral (denominator only)	yes
Volume z-score vs trailing-90d baseline	`volume_z90`	Lower is better (above 2σ)	no
Tool diversity z-score vs trailing-90d baseline	`tool_div_z90`	Lower is better	no
Time-of-day anomaly count, last 30d	`tod_anomaly_count_30d`	Lower is better	no
Lifetime incident count, decay-adjusted	`incidents_lifetime_decayed`	Lower is better	no
Upstream latency p95 anomaly count, last 30d	`latency_anomaly_count_30d`	Lower is better	no

v0.1.0 stub formula:

anomaly_load = 90   # constant baseline — no anomaly detection in v0.1.0

v1.0 target formula:

anomaly_load = clamp(0, 100,
    100
  - 25 * sigmoid((volume_z90 - 2) / 1)
  - 15 * sigmoid((tool_div_z90 - 2) / 1)
  - 10 * sigmoid((tod_anomaly_count_30d - 5) / 5)
  - 30 * sigmoid((incidents_lifetime_decayed - 3) / 2)
  - 10 * sigmoid((latency_anomaly_count_30d - 5) / 5)
)

The volume_30d input is included as a denominator (anomalies are normalized per-volume) but never as a positive contributor. This deliberately blocks the "hide your usage to look clean" gaming strategy (see §8).
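
The following Python sketch illustrates the z-score inputs and the target formula above. It rests on several assumptions: the baseline is passed in as a plain list of samples, the population standard deviation is used, `sigmoid` is the standard logistic function, and all function names are hypothetical. The per-volume normalization mentioned above is not attempted, because its exact form is not specified in this section.

```python
import math
import statistics


def sigmoid(x: float) -> float:
    """Standard logistic function (assumed; not defined in the spec)."""
    return 1.0 / (1.0 + math.exp(-x))


def z_score(current: float, baseline: list) -> float:
    """z-score of `current` against baseline samples; 0.0 if the baseline has no spread."""
    mean = statistics.fmean(baseline)
    std = statistics.pstdev(baseline)  # population std dev: an assumption
    return 0.0 if std == 0 else (current - mean) / std


def anomaly_load_v1(
    volume_z90: float,
    tool_div_z90: float,
    tod_anomaly_count_30d: float,
    incidents_lifetime_decayed: float,
    latency_anomaly_count_30d: float,
) -> float:
    """Unrounded v1.0 target Anomaly Load subscore."""
    raw = (
        100
        - 25 * sigmoid((volume_z90 - 2) / 1)
        - 15 * sigmoid((tool_div_z90 - 2) / 1)
        - 10 * sigmoid((tod_anomaly_count_30d - 5) / 5)
        - 30 * sigmoid((incidents_lifetime_decayed - 3) / 2)
        - 10 * sigmoid((latency_anomaly_count_30d - 5) / 5)
    )
    return max(0.0, min(100.0, raw))


# Hypothetical baseline volumes and a current-window spike.
spike_z = z_score(170, [100, 120, 80, 100])
quiet = anomaly_load_v1(0, 0, 0, 0, 0)
spiking = anomaly_load_v1(spike_z, 0, 0, 0, 0)
```

With every input at zero, the logistic terms still subtract about 15.6 points, so `quiet` is about 84.4; the volume spike in `spiking` lowers the subscore further.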

3. Overall score computation

The Estoppl Score is a fixed-weight linear combination of the three subscores, scaled to 0-1000.

overall_score = round(
    governance_discipline * 0.35 +
    scope_adherence       * 0.35 +
    anomaly_load          * 0.30
) * 10

Weights are immutable per methodology version. Changes require a version bump and 60-day insurer notice (§6).

Score bands (rendered in the certificate's score_band field):

Range	Band	Recommended downstream action
800-1000	`low_risk`	Standard processing
500-799	`medium_risk`	Heightened review; consider additional controls
0-499	`high_risk`	Block or escalate
Any (TotalEvents == 0)	`no_history`	Conservative defaults; do not assume `low_risk`

no_history is structurally distinct from high_risk. Both produce conservative downstream defaults, but for opposite reasons (insufficient data vs. evidence of problems). Insurance carriers should treat them differently in pricing.
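
The combination and banding rules above can be sketched in Python as follows. The function names and the `total_events` parameter are illustrative assumptions; the weights and band boundaries come from this section.

```python
def overall_score(
    governance_discipline: float,
    scope_adherence: float,
    anomaly_load: float,
) -> int:
    """Weighted combination of the three subscores, scaled to 0-1000."""
    combined = (
        governance_discipline * 0.35
        + scope_adherence * 0.35
        + anomaly_load * 0.30
    )
    # Python's round() breaks exact .5 ties toward the even integer; the spec
    # does not state a tie-breaking rule, so this is an assumption.
    return round(combined) * 10


def score_band(score: int, total_events: int) -> str:
    """Map a score to its band; zero history overrides the numeric range."""
    if total_events == 0:
        return "no_history"
    if score >= 800:
        return "low_risk"
    if score >= 500:
        return "medium_risk"
    return "high_risk"


# Best-case v0.1.0 stub subscores (95, 90, 90) and worst-case (60, 60, 90).
best_stub = overall_score(95, 90, 90)
worst_stub = overall_score(60, 60, 90)
```

Because rounding happens before the multiplication by 10, every score is a multiple of 10. With the v0.1.0 stub formulas the reachable range is 690 (`worst_stub`) to 920 (`best_stub`), so the stub alone never produces `high_risk`.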

4. Anti-Sybil identity propagation

A naive scoring system creates a "rotate the agent identity to reset the score" gaming opportunity. We block this with operator-level propagation.

4.1 Identity model

operator_id  ──┬── agent_id_1 (current)
               ├── agent_id_2 (current)
               └── agent_id_3 (retired, but score history retained)

Every agent registers under an operator_id (an Estoppl-issued UUID derived from the operator's verified business identity at signup). The operator_id is persistent and cannot be self-rotated.

4.2 Penalty propagation

Adverse events on any agent under an operator propagate to the operator-level reputation:

operator_penalty_score = max(
    max(individual_agent_penalties),
    sum(individual_agent_penalties) * 0.4
)

The first term ensures a single bad agent's penalty fully applies. The second term ensures multiple bad agents under one operator compound (40% of their sum, to avoid double-counting tightly-correlated incidents).
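
A minimal Python sketch of the propagation rule, reading the first term as the maximum over the agents' individual penalties. The function name and the zero result for an operator with no penalized agents are assumptions.

```python
def operator_penalty_score(individual_agent_penalties: list) -> float:
    """Operator-level penalty: the worst single agent or 40% of the sum, whichever is larger."""
    if not individual_agent_penalties:
        return 0.0  # assumption: no penalized agents means no operator penalty
    worst = max(individual_agent_penalties)
    compounded = 0.4 * sum(individual_agent_penalties)
    return max(worst, compounded)


single = operator_penalty_score([50])             # one bad agent
pair = operator_penalty_score([50, 10])           # 0.4 * 60 = 24, so the worst agent wins
several = operator_penalty_score([50, 40, 40])    # 0.4 * 130 = 52 exceeds the worst agent
```

The compounding term only takes over once the sum of penalties exceeds 2.5 times the worst single penalty, which in practice requires at least three agents with comparable penalties.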

4.3 What this means in practice

A new agent registered under a clean operator inherits the operator's full reputation (no zero-history penalty).
A new agent registered under an operator with a recent incident inherits the propagated penalty until decay (§5) reduces it.
A new operator (no prior identity) receives the no_history band — not low_risk. They have to earn the score, not get it for free.

4.4 Death certificate event

The single most punitive event is self-report falsification — the operator reports action A, Estoppl-attested telemetry shows action B. This:

Forces the originating agent's score to 0 for 30 days.
Adds 50 to operator_penalty_score. The penalty decays with a 3-year half-life (slower than other events; see §5) and is dropped at the end of the 10-year decay window.
Flags the operator's record with a falsification_event_count that surfaces in the evidence pack until it has decayed below 1.0.

There is no "permanent ban", but the recovery cost is high enough that honest disclosure of the underlying issue is the cheaper rational response. This mirrors how consumer credit bureaus handle confirmed fraud: severely punitive, but not eternal.

5. Decay rules

All score-affecting events decay over time.

5.1 Standard decay

For a generic adverse event with raw weight w₀:

w(t) = w₀ * exp(-ln(2) * t / 365)        # 1-year half-life, days

After ~10 years (~3650 days), w(t) ≈ w₀ / 1024 — effectively zero. Events older than 10 years are dropped from the input set entirely (computational simplification, not a methodology change).

5.2 Falsification decay (slower)

For self-report falsification events (§4.4):

w(t) = w₀ * exp(-ln(2) * t / (3 * 365))   # 3-year half-life

Cap at 10 years like all other events.
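
Both decay curves and the 10-year cutoff can be sketched in a few lines of Python. The function and constant names are illustrative assumptions; the half-lives and the cutoff come from §5.1 and §5.2.

```python
import math

MAX_AGE_DAYS = 3650  # events older than ~10 years are dropped from the input set


def decayed_weight(w0: float, age_days: float, half_life_days: float = 365.0) -> float:
    """Exponentially decayed weight of an event that is `age_days` old."""
    if age_days > MAX_AGE_DAYS:
        return 0.0
    return w0 * math.exp(-math.log(2) * age_days / half_life_days)


# Fraction of the raw weight remaining exactly at the 10-year cutoff.
standard_at_cap = decayed_weight(1.0, 3650)                  # 1-year half-life
falsification_at_cap = decayed_weight(1.0, 3650, 3 * 365.0)  # 3-year half-life
```

Under these definitions a standard event has decayed to about 0.1% of its raw weight by the cutoff, while a falsification event still carries about 10% (2^(-10/3) ≈ 0.099), or roughly 5 of the original 50 penalty points, at the moment it is dropped.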

5.3 No event is permanent

Maximum decay window is 10 years for every event type, including falsification. This is a deliberate design choice to:

Avoid a Sybil-incentive cliff (where the rational move is to spin up a new operator after some time anyway)
Match consumer credit bureau practice (in the US, a Chapter 7 bankruptcy drops off credit reports after 10 years and a Chapter 13 after 7)
Allow operators to genuinely improve over time

6. Versioning and update cadence

6.1 Versioning

Methodology versions follow vMAJOR.MINOR.PATCH:

PATCH (v0.1.0 → v0.1.1): bug fixes, no weight changes, no input additions or removals. No notice required.
MINOR (v0.1.x → v0.2.0): weight adjustments within existing inputs OR addition of new inputs (additive only). 60-day notice to insurer integrators.
MAJOR (v0.x → v1.0): structural change (subscore restructuring, removal of inputs, formula change). 90-day notice + parallel-run period (old + new versions both queryable).
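
For integrators who want to automate notice handling, the version labels can be turned into notice periods with a small helper. This is a hypothetical sketch: the function names are assumptions, and it only reads the bump type off the version numbers. Whether a change qualifies as PATCH, MINOR, or MAJOR is decided by the rules above, not by this code.

```python
NOTICE_DAYS = {"PATCH": 0, "MINOR": 60, "MAJOR": 90}


def parse_version(version: str) -> tuple:
    """Parse 'v0.1.0' (or a short form such as 'v1.0') into a 3-tuple of ints."""
    parts = [int(p) for p in version.removeprefix("v").split(".")]
    parts += [0] * (3 - len(parts))  # pad missing components with zeros
    return tuple(parts)


def bump_type(old: str, new: str) -> str:
    """Classify the bump between two methodology versions."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError("new version must be greater than old version")
    if n[0] != o[0]:
        return "MAJOR"
    if n[1] != o[1]:
        return "MINOR"
    return "PATCH"


notice = NOTICE_DAYS[bump_type("v0.1.0", "v0.2.0")]  # MINOR bump
```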

6.2 Insurer change-management

Active insurer integrations are notified via:

Email to the integration's registered contact
Banner in the queryable certificate response (methodology_change_notice field, set 60+ days before activation)
Deprecation header in the API response (Deprecation: <RFC9745 date>)

The previous methodology version remains queryable via ?methodology_version=v0.1.0 for 12 months after a new MINOR ships, and 24 months after a MAJOR.

6.3 Annual re-weighting

Beginning v1.0, weights are re-evaluated annually based on the prior 12 months of corpus data and observed incident outcomes. The re-evaluation produces a published validation backtest report.

7. How to verify / integrate

7.1 CISO independent verification

A CISO with a Standing Certificate JSON can verify the score is internally consistent without trusting Estoppl's cloud:

1. Fetch the public key for public_key_id from https://api.estoppl.ai/.well-known/jwks.json (TODO STD.4).
2. Verify the Ed25519 signature over the canonical JSON of all certificate fields except signature (TODO STD.4).
3. Walk the evidence chain (linked from evidence_url) using estoppl verify-certificate (TODO STD.4).
4. Re-compute the subscores from the evidence drill-down (TODO TRY.1) by applying the formulas in §2 to the published inputs.

If steps 1-3 succeed and step 4 produces the same subscore values as in the certificate, the score is internally consistent.
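
The re-computation in the last step can be sketched for v0.1.0 by applying the stub formulas from §2 to raw event counts. The certificate layout used here (a `subscores` object keyed by subscore name) and the function names are assumptions, since this document does not define the certificate schema. Signature and chain verification are not covered.

```python
def stub_subscores(total_events: int, human_required_events: int, blocked_events: int) -> dict:
    """Apply the v0.1.0 stub formulas from §2 to raw event counts."""
    if total_events == 0:
        raise ValueError("no events: the certificate should carry the no_history band")
    hitl_rate = human_required_events / total_events
    block_rate = blocked_events / total_events

    if hitl_rate < 0.001:
        governance = 70   # suspiciously low
    elif hitl_rate > 0.5:
        governance = 60   # suspiciously high
    else:
        governance = 95

    if block_rate > 0.30:
        scope = 60
    elif block_rate > 0.10:
        scope = 80
    else:
        scope = 90

    return {
        "governance_discipline": governance,
        "scope_adherence": scope,
        "anomaly_load": 90,  # constant in v0.1.0
    }


def is_consistent(certificate: dict, counts: dict) -> bool:
    """True if the certificate's subscores match the values recomputed from counts."""
    expected = stub_subscores(**counts)
    return all(certificate["subscores"].get(name) == value for name, value in expected.items())


counts = {"total_events": 10000, "human_required_events": 200, "blocked_events": 1500}
certificate = {"subscores": {"governance_discipline": 95, "scope_adherence": 80, "anomaly_load": 90}}
ok = is_consistent(certificate, counts)
```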

7.2 Insurer integration

Insurance carriers integrate the score as an underwriting input by:

Querying GET /v1/standing/{deployer_id} at quote time and renewal.
Mapping score_band to the carrier's pricing tiers.
Optionally drilling into subscores to apply carrier-specific weights (e.g., a carrier may weight governance_discipline more heavily than the published 35%).
Optionally querying GET /v1/standing/{deployer_id}/evidence (TODO TRY.1) for incident-level drill-down used in claims-handling.

The published subscore values are the carrier's contractual signal; raw inputs are advisory. A carrier may not apply a re-weighting that contradicts the published methodology without renegotiating their data-feed contract.
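
A carrier-side sketch of the first two integration steps, in Python. Several things here are assumptions: the standing endpoint is taken to live on the same host as the JWKS URL in §7.1, the tier names are hypothetical carrier-internal labels, and the helper names are illustrative. Only the endpoint path, the `methodology_version` query parameter, and the band names come from this document.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.estoppl.ai"  # assumed host, borrowed from the JWKS URL in §7.1

# Hypothetical carrier pricing tiers; only the band names are from §3.
TIER_BY_BAND = {
    "low_risk": "standard",
    "medium_risk": "heightened_review",
    "high_risk": "decline_or_escalate",
    "no_history": "manual_underwriting",
}


def standing_url(deployer_id: str, methodology_version: Optional[str] = None) -> str:
    """Build the standing query URL, optionally pinned to a methodology version."""
    url = f"{BASE_URL}/v1/standing/{quote(deployer_id, safe='')}"
    if methodology_version is not None:
        url += "?" + urlencode({"methodology_version": methodology_version})
    return url


def pricing_tier(score_band: str) -> str:
    """Map a band to a tier; unknown bands fall back to the most conservative tier."""
    return TIER_BY_BAND.get(score_band, "decline_or_escalate")


pinned = standing_url("deployer-123", "v0.1.0")
```

Keeping `no_history` on its own tier, rather than folding it into `high_risk`, follows the distinction drawn in §3.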

8. Anti-patterns we deliberately avoid

We have explicitly structured the methodology to prevent the following gaming strategies:

Anti-pattern	How we block it
"Hide your usage to look clean": agent reduces tool calls to lower the chance of incidents	Volume is a denominator, never a positive contributor. Subscores are rates and z-scores, not raw counts.
"Rotate identities to reset the score": operator spins up a new agent_id after a bad incident	Operator-level propagation (§4). New agent inherits operator penalty.
"New operator gets a free perfect score": fresh operator registers, gets `low_risk` immediately	New operators get `no_history` band, not `low_risk`. Insurance carriers treat the two distinctly.
"Old agent in production with many incidents outscores young clean agent"	No formula contains a `time_in_production` divisor or a `1/incidents_per_year` term. Decay (§5) reduces old penalties, but never rewards age in absolute terms.
"Self-report falsification to hide bad behavior"	Death certificate event (§4.4): hard zero for 30d + slow-decay operator penalty + flag in evidence pack that persists until decayed below 1.0.
"Game the policy threshold": operator sets policy thresholds artificially low so blocks never trigger, looking compliant on paper	`manifest_age_days` and `policy_eval_coverage_30d` (v1.0 inputs) penalize stale or skipped policy evaluations. CISO drill-down surfaces thresholds.

9. v0.1.0 stub vs v1.0 target — honest gap analysis

The v0.1.0 implementation in internal/api/standing.go is a publishable stub. It computes a real score from real telemetry, but uses a small subset of the v1.0 input set:

Subscore	v0.1.0 inputs (count)	v1.0 target inputs (count)	Confidence
Governance Discipline	1 (HITL rate proxy)	6	Low — directionally correct, magnitudes uncalibrated
Scope Adherence	1 (block rate)	6	Low — same
Anomaly Load	0 (constant 90)	6	None — placeholder until anomaly detection ships

Decay rules (§5) and identity propagation (§4) are specification-only in v0.1.0 — the runtime does not yet apply them. They are documented now so insurer integrators can plan for them; the implementation roadmap is in §10.

What this means for early users:

Insurance carriers in pre-revenue research mode can review the methodology and design-partner the integration. They should NOT use v0.1.0 scores as a contractual underwriting input.
Deployers and their customers (CISOs) can use v0.1.0 scores as a directional signal in security review. The methodology_version field in the certificate is honest about the maturity.
The aarm_conformance field is aligned_extended_review_pending in v0.1.0 — formal AARM Extended conformance review (CSA) is in flight (TODO).

10. Roadmap

Version	Target	Major changes
v0.1.x	NOW	Stub. Three subscores, simplified formulas, seeded weights.
v0.2.x	NEXT (months 3-6)	Add scope_drift_count, state_anomaly_count, manifest_age_days. Implement chain_continuity input. Publish first validation backtest against the four major 2026 incidents (Meta, McKinsey Lilli, Mercor/LiteLLM, Step Finance).
v0.3.x	THEN (months 6-9)	Implement operator-level identity propagation (§4). Implement decay rules (§5).
v1.0	LATER (months 9-15)	Re-weight all subscores against accumulated 6-12 months of corpus data. Annual revision cycle begins. Vertical-specific subscore variants (FS / healthcare / federal) ship as v1.0+x.

Appendix A: Field reference

Every input symbol used in this methodology, with its data source.

Symbol	Type	Source	Aggregation window
`hitl_bypass_rate_30d`	float [0, 1]	events table where `policy_decision='HUMAN_REQUIRED'` AND review status='timeout_proceed'	30 days
`hitl_rate_30d` (v0.1.0 proxy)	float [0, 1]	events table where `policy_decision='HUMAN_REQUIRED'` / total	30 days
`policy_eval_coverage_30d`	float [0, 1]	events table where `policy_decision IS NOT NULL` / total ingested events	30 days
`chain_continuity`	float [0, 1]	computed via internal/chain.WalkSegment	last sync
`hitl_response_p95s_30d`	int seconds	reviews table, `decided_at - requested_at` p95	30 days
`policy_version_age_days`	int days	policies table, `now() - max(activated_at)`	snapshot
`proxy_uptime_30d`	float [0, 1]	events table, fraction of expected 5-min sync windows present	30 days
`scope_drift_count_30d`	int	events table where tool_name NOT IN declared manifest	30 days
`state_anomaly_count_30d`	int	computed from state-transition graph (NEXT)	30 days
`manifest_age_days`	int days	manifests table, `now() - max(updated_at)`	snapshot
`block_rate_30d`	float [0, 1]	events table where `policy_decision='BLOCK'` / total	30 days
`unauth_credential_count_30d`	int	events table where `authorizing_credential IS NULL` AND tool requires credential	30 days
`tools_outside_manifest_30d`	int	DISTINCT count of tool_name where tool NOT IN declared manifest	30 days
`volume_30d`	int	total event count	30 days
`volume_z90`	float	z-score of volume_30d vs trailing 90d mean/std	30 + 90 days
`tool_div_z90`	float	z-score of unique tool count vs trailing 90d	30 + 90 days
`tod_anomaly_count_30d`	int	events outside operator-declared operating hours	30 days
`incidents_lifetime_decayed`	float	sum of all incident events with §5 decay applied	lifetime
`latency_anomaly_count_30d`	int	events with `actual_latency_ms` > p99(operator's baseline)	30 days
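
As an illustration of how the rate-type fields above could be derived, the Python sketch below computes three of them from in-memory event rows. The row layout is an assumption: `policy_decision` follows the column named in the table, while the `timestamp` key and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def rates_30d(events: list, now: datetime) -> dict:
    """Compute 30-day rate inputs from event rows; empty dict if the window has no events."""
    cutoff = now - timedelta(days=30)
    window = [e for e in events if e["timestamp"] >= cutoff]
    total = len(window)
    if total == 0:
        return {}  # no events in the window: rates are undefined

    def fraction(predicate) -> float:
        return sum(1 for e in window if predicate(e)) / total

    return {
        "hitl_rate_30d": fraction(lambda e: e["policy_decision"] == "HUMAN_REQUIRED"),
        "block_rate_30d": fraction(lambda e: e["policy_decision"] == "BLOCK"),
        "policy_eval_coverage_30d": fraction(lambda e: e["policy_decision"] is not None),
    }


now = datetime(2026, 5, 10, tzinfo=timezone.utc)
events = [
    {"timestamp": now - timedelta(days=1), "policy_decision": "BLOCK"},
    {"timestamp": now - timedelta(days=2), "policy_decision": "HUMAN_REQUIRED"},
    {"timestamp": now - timedelta(days=3), "policy_decision": "ALLOW"},  # "ALLOW" is a hypothetical value
    {"timestamp": now - timedelta(days=4), "policy_decision": None},     # event with no policy evaluation
    {"timestamp": now - timedelta(days=45), "policy_decision": "BLOCK"}, # outside the 30-day window
]
rates = rates_30d(events, now)
```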

Appendix B: Change log

Version	Date	Changes
v0.1.0	2026-05-10	Initial publication. Stub implementation.