Calibration Report
Every value in topos/evaluation/policies/calibration.py is downstream of the evidence below. This report re-syncs the
live leaderboard corpus, shows the distribution it produces, and derives the calibration
values that distribution now recommends. Add more packages and languages and the picture
— and the thresholds it justifies — sharpen.
1. Experimental results. Every file in the cohort is evaluated by Topos and recorded in the leaderboard corpus.
2. Empirical distribution. Per-metric histograms and per-dimension ECDFs reveal where typical code ends and outliers begin.
3. Calibration values. Elbows and percentiles recommend the gates & floors committed to calibration.py.
4. Evaluations & medals. Topos applies those values; generator achievements meet on the lattice to award a medal.

1 · The experimental corpus
This snapshot drives everything downstream. It grows along two axes — more packages and more languages — and each addition tightens the distribution.
Language coverage: Python (1220 files). Packages include boto3, certifi, idna, packaging, requests, and urllib3.
2 · The distribution it produces
Topos thresholds are tuned on the top-downloaded PyPI packages: each file is evaluated with Topos probes, raw metric gates are set from distribution percentiles, and normalized score floors are chosen from ECDF elbows and pass-rate analysis.
- Composable: instability ∈ [0.3, 0.7], fan-in ≤ 15, fan-out ≤ 15
- Secure: dangerous_calls = 0, taint_flows = 0

3 · The calibration values it recommends
The distribution above is the input; these are the numbers it implies for
calibration.py. The committed column is parsed live from the policy module so drift is visible at a glance.
Committed values are read from the same release that ran the stats: topos v0.3.4, calibration.py.
Normalized score floors (SCORE_FLOORS)
| Generator | Recommended | Committed | Drift | Basis (from corpus) |
|---|---|---|---|---|
| Simple | 0.75 | 0.4 | ▲ +0.35 | ECDF elbow at 0.73 → rounded to 0.75. |
| Composable | 0.95 | 0.8 | ▲ +0.15 | ECDF elbow at 0.97 → rounded to 0.95. |
| Secure | 1 | 1 | in sync | Categorical security — held at 1.00 regardless of elbow. |
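
As a worked illustration of the Recommended and Drift columns, the sketch below rounds an ECDF elbow to a 0.05 grid and subtracts the committed floor. The 0.05 grid is inferred from the two rows above (0.73 becomes 0.75, 0.97 becomes 0.95) and is not confirmed by calibration.py; `round_to_grid` and `drift` are hypothetical names, not part of Topos.

```python
def round_to_grid(elbow: float, grid: float = 0.05) -> float:
    """Snap an elbow to the nearest multiple of `grid` (assumed to be 0.05)."""
    return round(round(elbow / grid) * grid, 2)


def drift(recommended: float, committed: float) -> float:
    """Signed gap between the recommended and the committed floor."""
    return round(recommended - committed, 2)


# Elbows and committed floors copied from the table above.
rows = {"Simple": (0.73, 0.4), "Composable": (0.97, 0.8)}
for generator, (elbow, committed) in rows.items():
    recommended = round_to_grid(elbow)
    print(generator, recommended, drift(recommended, committed))
```

Secure is left out on purpose: the table holds its floor at 1.00 regardless of the elbow, so no rounding applies to it.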
Raw-metric gates — grounded in corpus percentiles
| Gate | Committed bound | Corpus evidence | Why |
|---|---|---|---|
| McCabe cyclomatic complexity (SIMPLE.max_cyclomatic) | <= 15 | median=—, p75=38 | Upper bound near the corpus median; most files stay below it. |
| Max single-function complexity (SIMPLE.max_function_complexity) | <= 10 | median=—, p75=10 | Caps the worst function around the corpus 75th percentile. |
| Kolmogorov AST entropy band (SIMPLE.min_entropy / max_entropy) | [0.2, 0.8] | p5=0.19, p95=0.64 | Healthy band spans the bulk of the corpus (p5–p95). |
| Martin module instability band (COMPOSABLE.instability_low / instability_high) | [0.3, 0.7] | p25=0.25, p75=0.64 | Centred on the corpus median (≈0.5): balanced coupling. |
| Module fan-in (COMPOSABLE.max_fan_in) | <= 15 | p75=0, p95=0 | Bound sits near the corpus 95th percentile. |
| Module fan-out (COMPOSABLE.max_fan_out) | <= 15 | p75=0, p95=1 | Bound sits near the corpus 95th percentile. |
| Dangerous CPG calls (SECURE.max_dangerous_calls) | == 0 | median=—, p95=2 | Categorical: the corpus median is 0, so any call fails. |
| Active taint flows (SECURE.max_taint_flows) | == 0 | median=—, p95=0 | Categorical: zero-tolerance dataflow safety. |
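
The committed bounds in the table can be read as simple predicates over a file's raw metrics. The sketch below encodes them that way; the `FileMetrics` container and the three function names are hypothetical, and how Topos actually combines these gates with the normalized score floors is not shown in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetrics:
    """Hypothetical per-file record; field names are illustrative only."""
    cyclomatic: int
    max_function_complexity: int
    entropy: float
    instability: float
    fan_in: int
    fan_out: int
    dangerous_calls: int
    taint_flows: int


def passes_simple_gates(m: FileMetrics) -> bool:
    # Bounds copied from the SIMPLE rows of the table above.
    return (m.cyclomatic <= 15
            and m.max_function_complexity <= 10
            and 0.2 <= m.entropy <= 0.8)


def passes_composable_gates(m: FileMetrics) -> bool:
    # Bounds copied from the COMPOSABLE rows.
    return 0.3 <= m.instability <= 0.7 and m.fan_in <= 15 and m.fan_out <= 15


def passes_secure_gates(m: FileMetrics) -> bool:
    # Categorical gates: a single dangerous call or taint flow fails the file.
    return m.dangerous_calls == 0 and m.taint_flows == 0


example = FileMetrics(cyclomatic=12, max_function_complexity=7, entropy=0.41,
                      instability=0.5, fan_in=0, fan_out=1,
                      dangerous_calls=2, taint_flows=0)
print(passes_simple_gates(example),
      passes_composable_gates(example),
      passes_secure_gates(example))
```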
Recalibration trigger: Elbow = argmax of |d²P/ds²| on the file-level ECDF (per dimension). Re-pick a threshold when the elbow differs from the current floor by more than 0.05.
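
One way to make the trigger concrete is a discrete second difference of the ECDF on a uniform score grid. The sketch below is a minimal illustration, not the Topos implementation: the grid resolution (`steps=20`, i.e. 0.05 spacing), the function names, and the tie-breaking are all assumptions, and it expects a non-empty list of scores in [0, 1].

```python
from bisect import bisect_right


def ecdf_on_grid(scores, steps=20):
    """Return (grid, P) where P[i] is the fraction of scores <= grid[i]."""
    ordered = sorted(scores)
    grid = [i / steps for i in range(steps + 1)]
    return grid, [bisect_right(ordered, s) / len(ordered) for s in grid]


def elbow(scores, steps=20):
    """Grid point where the ECDF's discrete second difference is largest."""
    grid, p = ecdf_on_grid(scores, steps)
    # On a uniform grid, |P[i-1] - 2*P[i] + P[i+1]| is proportional to |d²P/ds²|.
    interior = range(1, len(grid) - 1)
    # max() keeps the first maximum, so ties resolve to the lowest score.
    best = max(interior, key=lambda i: abs(p[i - 1] - 2 * p[i] + p[i + 1]))
    return grid[best]


def needs_recalibration(elbow_value, floor, tolerance=0.05):
    """Apply the trigger: re-pick when elbow and floor differ by more than 0.05."""
    return abs(elbow_value - floor) > tolerance


# Elbows and committed floors from the score-floor table.
print(needs_recalibration(0.73, 0.4))   # Simple
print(needs_recalibration(0.97, 0.8))   # Composable
```

With the reported elbows, both Simple and Composable exceed the 0.05 tolerance. A real ECDF is noisy, so a production version would likely smooth the curve before differencing.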
4 · How those values award medals
Topos applies the calibrated gates and floors to each file. The three generator results — Simple, Composable, Secure — meet on an 8-element lattice; the lattice element maps to a medal.
| Lattice element | Medal | Files |
|---|---|---|
| IDEAL | 🥇 GOLD | 289 |
| COMPOSABLE_SECURE | 🥈 SILVER | 239 |
| SIMPLE_COMPOSABLE | 🥈 SILVER | 0 |
| SIMPLE_SECURE | 🥈 SILVER | 220 |
| COMPOSABLE | 🥉 BRONZE | 123 |
| SECURE | 🥉 BRONZE | 256 |
| SIMPLE | 🥉 BRONZE | 30 |
| SLOP | ❌ SLOP | 63 |
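
The table suggests that the lattice element is determined by which of the three generators pass, and the medal by how many pass. The sketch below encodes that reading; the mapping from pass combinations to element names is inferred from the names themselves, and `award` is a hypothetical helper, not a Topos function.

```python
# Keys are (simple, composable, secure) pass flags.
ELEMENTS = {
    (True, True, True): "IDEAL",
    (False, True, True): "COMPOSABLE_SECURE",
    (True, True, False): "SIMPLE_COMPOSABLE",
    (True, False, True): "SIMPLE_SECURE",
    (False, True, False): "COMPOSABLE",
    (False, False, True): "SECURE",
    (True, False, False): "SIMPLE",
    (False, False, False): "SLOP",
}

# Medal depends only on the number of generators that pass.
MEDAL_BY_PASSES = {3: "GOLD", 2: "SILVER", 1: "BRONZE", 0: "SLOP"}

# File counts copied from the table above.
FILES = {"IDEAL": 289, "COMPOSABLE_SECURE": 239, "SIMPLE_COMPOSABLE": 0,
         "SIMPLE_SECURE": 220, "COMPOSABLE": 123, "SECURE": 256,
         "SIMPLE": 30, "SLOP": 63}


def award(simple: bool, composable: bool, secure: bool):
    """Return (lattice element, medal) for one file's generator results."""
    key = (simple, composable, secure)
    return ELEMENTS[key], MEDAL_BY_PASSES[sum(key)]


print(award(True, False, True))
print(sum(FILES.values()))
```

The eight file counts sum to 1220, which matches the Python file count reported for the corpus.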