corpus → distribution → calibration.py → evaluations → medals

Calibration Report

Every value in topos/evaluation/policies/calibration.py is downstream of the evidence below. This report re-syncs the live leaderboard corpus, shows the distribution it produces, and derives the calibration values that distribution now recommends. As more packages and languages are added, the picture sharpens, and so do the thresholds it justifies.

Synced Jun 16, 2026 · 08:33 UTC · Topos v0.3.4 · leaderboard/data/raw/structural_scores_pypi.jsonl

1 · Experimental results: Every file in the cohort is evaluated by Topos and recorded in the leaderboard corpus.

2 · Empirical distribution: Per-metric histograms and per-dimension ECDFs reveal where typical code ends and outliers begin.

3 · Calibration values: Elbows and percentiles recommend the gates and floors committed to calibration.py.

4 · Evaluations & medals: Topos applies those values; generator achievements meet on the lattice to award a medal.

1 · The experimental corpus

This snapshot drives everything downstream. It grows along two axes — more packages and more languages — and each addition tightens the distribution.

6 packages · 1220 files evaluated · 1 language · 3 quality generators

Language coverage: Python (1220 files). Packages include boto3, certifi, idna, packaging, requests, and urllib3.

2 · The distribution it produces

Topos thresholds are tuned on the top-downloaded PyPI packages: each file is evaluated with Topos probes, raw metric gates are set from distribution percentiles, and normalized score floors are chosen from ECDF elbows and pass-rate analysis.
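The two statistics named above, distribution percentiles for raw gates and pass rates for candidate floors, can be sketched in a few lines. This is a minimal illustration, not the Topos implementation: the function names are hypothetical, the nearest-rank percentile method is an assumption (the report does not say which method is used), and the data is a toy sample rather than corpus values.

```python
import math
from bisect import bisect_left


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100] (assumed method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def pass_rate(scores, floor):
    """Fraction of files whose normalized score is >= floor."""
    ordered = sorted(scores)
    # bisect_left returns the index of the first score >= floor
    passing = len(ordered) - bisect_left(ordered, floor)
    return passing / len(ordered)


def floor_sweep(scores, candidates):
    """Pass rate for each candidate floor, as in a floor-sweep chart."""
    return {floor: pass_rate(scores, floor) for floor in candidates}


# Toy data, not taken from the corpus.
complexities = list(range(1, 11))
scores = [0.2, 0.4, 0.5, 0.75, 0.8, 0.9, 1.0, 1.0]

p75 = percentile(complexities, 75)              # 8
sweep = floor_sweep(scores, [0.4, 0.75, 1.0])   # {0.4: 0.875, 0.75: 0.625, 1.0: 0.25}
```

A sweep like this makes the trade-off explicit: raising a floor from 0.4 to 0.75 in the toy sample drops the pass rate from 87.5% to 62.5%.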

Simple

cyclomatic ≤ 15, max function complexity ≤ 10, entropy ∈ [0.2, 0.8]

median 47.5 · elbow 0.73 · ≥ floor 57.38% · committed floor 0.4

[Figure: Score ECDF; solid = committed floor, dashed = elbow, teal = recommended]

[Figure: Floor sweep; % of corpus files passing each candidate]

[Figures: per-metric histograms for cfg.cyclomatic, ast.max_function_complexity, ast.entropy]

Composable

instability ∈ [0.3, 0.7], fan-in ≤ 15, fan-out ≤ 15

median 97.5 · elbow 0.97 · ≥ floor 60% · committed floor 0.8

[Figure: Score ECDF; solid = committed floor, dashed = elbow, teal = recommended]

[Figure: Floor sweep; % of corpus files passing each candidate]

[Figures: per-metric histograms for mdg.instability, mdg.fan_in, mdg.fan_out]

Secure

dangerous_calls = 0, taint_flows = 0

median 100 · elbow 0.97 · ≥ floor 82.3% · committed floor 1

[Figure: Score ECDF; solid = committed floor, dashed = elbow, teal = recommended]

[Figure: Floor sweep; % of corpus files passing each candidate]

[Figures: per-metric histograms for cpg.dangerous_calls, cpg.taint_flows]

3 · The calibration values it recommends

The distribution above is the input; these are the numbers it implies for calibration.py. The committed column is parsed live from the policy module so drift is visible at a glance.

Committed values are read from the same release that ran the stats: Topos v0.3.4, calibration.py.

Normalized score floors (`SCORE_FLOORS`)

Generator	Recommended	Committed	Drift	Basis (from corpus)
Simple	0.75	0.4	▲ +0.35	ECDF elbow at 0.73 → rounded to 0.75.
Composable	0.95	0.8	▲ +0.15	ECDF elbow at 0.97 → rounded to 0.95.
Secure	1	1	in sync	Categorical security — held at 1.00 regardless of elbow.
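The Recommended and Drift columns can be reproduced with a short sketch. It assumes that "rounded" means rounding the elbow to the nearest 0.05, which matches both rows shown (0.73 → 0.75, 0.97 → 0.95) but is not stated explicitly in the report. The function names and the `categorical` flag are hypothetical.

```python
def recommend_floor(elbow, step=0.05):
    """Round an ECDF elbow to the nearest multiple of `step` (assumed rule)."""
    return round(round(elbow / step) * step, 2)


def drift_row(name, elbow, committed, categorical=False):
    """Return (generator, recommended, committed, drift label)."""
    # Categorical dimensions are held at 1.0 regardless of the elbow.
    recommended = 1.0 if categorical else recommend_floor(elbow)
    drift = round(recommended - committed, 2)
    label = "in sync" if drift == 0 else f"{drift:+.2f}"
    return name, recommended, committed, label


rows = [
    drift_row("Simple", 0.73, 0.4),
    drift_row("Composable", 0.97, 0.8),
    drift_row("Secure", 0.97, 1, categorical=True),
]
for row in rows:
    print(*row, sep="\t")
```

Run against the elbows and committed floors in the table, this yields +0.35 for Simple, +0.15 for Composable, and "in sync" for Secure.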

Raw-metric gates — grounded in corpus percentiles

Gate	Policy key	Committed bound	Corpus evidence	Why
McCabe cyclomatic complexity	SIMPLE.max_cyclomatic	<= 15	median=—, p75=38	Upper bound near the corpus median; most files stay below it.
Max single-function complexity	SIMPLE.max_function_complexity	<= 10	median=—, p75=10	Caps the worst function around the corpus 75th percentile.
Kolmogorov AST entropy band	SIMPLE.min_entropy / max_entropy	[0.2, 0.8]	p5=0.19, p95=0.64	Healthy band spans the bulk of the corpus (p5–p95).
Martin module instability band	COMPOSABLE.instability_low / instability_high	[0.3, 0.7]	p25=0.25, p75=0.64	Centred on the corpus median (≈0.5): balanced coupling.
Module fan-in	COMPOSABLE.max_fan_in	<= 15	p75=0, p95=0	Bound sits near the corpus 95th percentile.
Module fan-out	COMPOSABLE.max_fan_out	<= 15	p75=0, p95=1	Bound sits near the corpus 95th percentile.
Dangerous CPG calls	SECURE.max_dangerous_calls	== 0	median=—, p95=2	Categorical: the corpus median is 0, so any call fails.
Active taint flows	SECURE.max_taint_flows	== 0	median=—, p95=0	Categorical: zero-tolerance dataflow safety.
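As an illustration of how the committed bounds in this table might be checked against one file's raw metrics, here is a minimal sketch. The `GATES` layout and the `failed_gates` helper are hypothetical and do not reflect the actual structure of calibration.py. The metric keys reuse the probe labels from section 2, bands are treated as closed intervals, and a generator is assumed to require all of its gates to pass.

```python
import math

# Committed bounds from the table, expressed as closed (low, high) intervals.
GATES = {
    "simple": {
        "cfg.cyclomatic": (-math.inf, 15),
        "ast.max_function_complexity": (-math.inf, 10),
        "ast.entropy": (0.2, 0.8),
    },
    "composable": {
        "mdg.instability": (0.3, 0.7),
        "mdg.fan_in": (-math.inf, 15),
        "mdg.fan_out": (-math.inf, 15),
    },
    "secure": {
        "cpg.dangerous_calls": (0, 0),
        "cpg.taint_flows": (0, 0),
    },
}


def failed_gates(metrics, generator):
    """Names of the raw-metric gates this file violates for one generator."""
    return [
        name
        for name, (low, high) in GATES[generator].items()
        if not low <= metrics[name] <= high
    ]


# Hypothetical metrics for a single file.
metrics = {
    "cfg.cyclomatic": 12,
    "ast.max_function_complexity": 14,
    "ast.entropy": 0.45,
    "mdg.instability": 0.5,
    "mdg.fan_in": 0,
    "mdg.fan_out": 1,
    "cpg.dangerous_calls": 0,
    "cpg.taint_flows": 0,
}
report = {generator: failed_gates(metrics, generator) for generator in GATES}
```

For this sample file, only `ast.max_function_complexity` is out of bounds, so the Simple gates fail while the Composable and Secure gates pass.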

Recalibration trigger: for each dimension, the elbow is the score s that maximizes |d²P/ds²|, where P(s) is the file-level ECDF of normalized scores. Re-pick a threshold when the elbow differs from the current floor by more than 0.05.
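The elbow rule can be approximated numerically. Because an ECDF is a step function, its second derivative has to be estimated; the sketch below samples the ECDF on a uniform grid and uses a central second difference. The grid resolution, the lack of smoothing, and the function names are assumptions for illustration, not the method Topos necessarily uses.

```python
from bisect import bisect_right


def ecdf_on_grid(scores, steps=10):
    """Sample P(s) = fraction of scores <= s on a uniform grid over [0, 1]."""
    ordered = sorted(scores)
    grid = [i / steps for i in range(steps + 1)]
    return grid, [bisect_right(ordered, s) / len(ordered) for s in grid]


def elbow(scores, steps=10):
    """Interior grid point with the largest |second difference| of the ECDF."""
    grid, p = ecdf_on_grid(scores, steps)
    h = 1 / steps
    best = max(
        range(1, steps),
        key=lambda i: abs(p[i - 1] - 2 * p[i] + p[i + 1]) / h**2,
    )
    return grid[best]


def needs_recalibration(elbow_value, floor, tolerance=0.05):
    """True when the elbow has moved more than `tolerance` from the floor."""
    return abs(elbow_value - floor) > tolerance


# Toy scores: a steady ramp up to 0.65, then a cluster just below 0.8.
toy = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.72, 0.74, 0.76, 0.78, 0.79]
toy_elbow = elbow(toy)  # 0.8 on a 0.1 grid
```

Applied to the values reported above, `needs_recalibration(0.73, 0.4)` and `needs_recalibration(0.97, 0.8)` both return True, which matches the drift shown for Simple and Composable.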

4 · How those values award medals

Topos applies the calibrated gates and floors to each file. The three generator results — Simple, Composable, Secure — meet on an 8-element lattice; the lattice element maps to a medal.

Lattice element	Medal	Files
`IDEAL`	🥇 GOLD	289
`COMPOSABLE_SECURE`	🥈 SILVER	239
`SIMPLE_COMPOSABLE`	🥈 SILVER	0
`SIMPLE_SECURE`	🥈 SILVER	220
`COMPOSABLE`	🥉 BRONZE	123
`SECURE`	🥉 BRONZE	256
`SIMPLE`	🥉 BRONZE	30
`SLOP`	❌ SLOP	63
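The lattice in the table follows from three pass/fail results: 2³ = 8 combinations, with the medal determined by how many generators pass. The sketch below reproduces the element names and medals from the table; the function names are hypothetical, and the file counts are copied from the table only to check that they account for all 1220 files.

```python
MEDAL_BY_COUNT = {3: "GOLD", 2: "SILVER", 1: "BRONZE", 0: "SLOP"}


def lattice_element(simple, composable, secure):
    """Map three generator results to one of the 8 lattice elements."""
    passed = [
        name
        for name, ok in (
            ("SIMPLE", simple),
            ("COMPOSABLE", composable),
            ("SECURE", secure),
        )
        if ok
    ]
    if len(passed) == 3:
        return "IDEAL"
    if not passed:
        return "SLOP"
    return "_".join(passed)


def medal(simple, composable, secure):
    """Medal is determined by how many generators the file achieves."""
    return MEDAL_BY_COUNT[sum((simple, composable, secure))]


# File counts per lattice element, copied from the table above.
FILES = {
    "IDEAL": 289, "COMPOSABLE_SECURE": 239, "SIMPLE_COMPOSABLE": 0,
    "SIMPLE_SECURE": 220, "COMPOSABLE": 123, "SECURE": 256,
    "SIMPLE": 30, "SLOP": 63,
}
total_files = sum(FILES.values())  # 1220
```

For example, a file that passes Composable and Secure but not Simple lands on `COMPOSABLE_SECURE` and earns SILVER.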
