Versioned History
Score Change Log
Every change to CostGuard Safety Score weights, thresholds, or term catalogs increments analysis_version. This log records what changed, when it changed, and the benchmark pass rate at release.
patch bump: Bug fix with no scoring behavior change
minor bump: New ambiguity term or volatility phrase added to catalogs
major bump: Weight or bucket threshold change; full benchmark review required
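The bump rules above can be sketched as a small helper. This is an illustrative sketch, not CostGuard's implementation: the names `Bump`, `classify_bump`, and `next_version` are hypothetical, and resetting the lower version components on a minor or major bump is an assumption borrowed from semantic versioning (the log itself only says that changes increment analysis_version).

```python
from enum import Enum


class Bump(Enum):
    PATCH = "patch"
    MINOR = "minor"
    MAJOR = "major"


def classify_bump(weights_or_thresholds_changed: bool,
                  catalog_terms_added: bool) -> Bump:
    """Pick the bump type; the most severe kind of change wins."""
    if weights_or_thresholds_changed:
        return Bump.MAJOR  # weight or bucket threshold change
    if catalog_terms_added:
        return Bump.MINOR  # new ambiguity term or volatility phrase
    return Bump.PATCH      # bug fix with no scoring behavior change


def next_version(current: str, bump: Bump) -> str:
    """Return the next analysis_version string, e.g. 'v1.0.0' -> 'v1.1.0'.

    Assumption: lower components reset to 0, as in semantic versioning.
    """
    digits = current[1:] if current.startswith("v") else current
    major, minor, patch = (int(part) for part in digits.split("."))
    if bump is Bump.MAJOR:
        return f"v{major + 1}.0.0"
    if bump is Bump.MINOR:
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```

For example, adding a term to the ambiguity catalog on top of v1.0.0 would be classified as a minor bump, and `next_version("v1.0.0", Bump.MINOR)` yields the next version string under these assumptions.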
v1.0.0
2026-03-09
Initial scoring release. Establishes CostGuard Safety Score (CSS) as a deterministic, heuristic-based measure of prompt security and operational reliability.
Fixtures: 5
Passed: 5
Scoring Components
- Prompt Injection (structural, 20%)
- System Override (structural, 20%)
- Jailbreak Behavior (ambiguity, 20%)
- Token Cost Explosion (length + context + volatility, 60% combined)
- Tool Abuse (ambiguity + structural, 40% combined)
Threat Intelligence Adjustments
Critical: +10 risk_score
High: +7 risk_score
Medium: +3 risk_score
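The severity adjustments can be read as a lookup added to risk_score. The sketch below is hedged: `SEVERITY_ADJUSTMENT` and `adjust_risk_score` are hypothetical names, the 0 to 100 clamp is an assumption (this log states that the adjustments are bounded but does not give the bounds), and treating unlisted severities such as Low as a zero adjustment is also an assumption.

```python
# Adjustment values taken from the v1.0.0 log entry.
SEVERITY_ADJUSTMENT = {"critical": 10, "high": 7, "medium": 3}


def adjust_risk_score(risk_score: int, severity: str, cap: int = 100) -> int:
    """Add the severity adjustment to risk_score and clamp the result.

    Assumptions: the result is clamped to [0, cap], and severities not
    listed in the log contribute no adjustment.
    """
    adjustment = SEVERITY_ADJUSTMENT.get(severity.lower(), 0)
    return max(0, min(cap, risk_score + adjustment))
```

Under these assumptions a Critical finding moves a risk_score of 50 to 60, while a High finding on a risk_score of 95 stops at the assumed cap of 100.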
Notes
- Score bands established: Hardened (91–100), Safe (71–90), Needs Hardening (41–70), Unsafe (0–40)
- Ambiguous term catalog (v1.0): improve, optimize, better, good, high quality, fast, efficient, robust, flexible, clean, scalable, advanced, modern
- Volatility phrase catalog (v1.0): write a detailed, comprehensive, in depth, as much as possible, thoroughly explain
- Threat intelligence CVE adjustments introduced and bounded
- All benchmark fixtures passing at 5/5 (100%)
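As a rough illustration of the notes above, the score bands and the two v1.0 catalogs can be expressed as plain data with two lookup helpers. The function names `score_band` and `catalog_hits` are hypothetical, and the matching rule (case-insensitive, whole-word or whole-phrase) is an assumption; the log lists the catalog entries but does not say how the scorer matches them or how hits feed into the score.

```python
import re

# Lower bound of each band, highest first (from the v1.0.0 notes).
BANDS = [(91, "Hardened"), (71, "Safe"), (41, "Needs Hardening"), (0, "Unsafe")]

AMBIGUOUS_TERMS = [
    "improve", "optimize", "better", "good", "high quality", "fast",
    "efficient", "robust", "flexible", "clean", "scalable", "advanced",
    "modern",
]

VOLATILITY_PHRASES = [
    "write a detailed", "comprehensive", "in depth",
    "as much as possible", "thoroughly explain",
]


def score_band(score: int) -> str:
    """Map an integer score in 0..100 to its band name."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for floor, name in BANDS:
        if score >= floor:
            return name
    return "Unsafe"  # not reached for valid scores; the last floor is 0


def catalog_hits(prompt: str, catalog: list[str]) -> list[str]:
    """Return catalog entries found in the prompt, in catalog order.

    Assumption: case-insensitive match on word boundaries.
    """
    text = prompt.lower()
    return [
        entry for entry in catalog
        if re.search(r"\b" + re.escape(entry) + r"\b", text)
    ]
```

With this sketch, a score of 90 falls in Safe and 91 in Hardened, and a prompt such as "Make it fast and clean" matches the ambiguous terms fast and clean.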