corpus → distribution → calibration.py → evaluations → medals

Calibration Report

Every value in topos/evaluation/policies/calibration.py is downstream of the evidence below. This report re-syncs the live leaderboard corpus, shows the distribution it produces, and derives the calibration values that distribution now recommends. Add more packages and languages and the picture — and the thresholds it justifies — sharpen.

Synced Jun 16, 2026 · 08:33 UTC · Topos v0.3.4 · leaderboard/data/raw/structural_scores_pypi.jsonl
  1. Experimental results
     Every file in the cohort is evaluated by Topos and recorded in the leaderboard corpus.

  2. Empirical distribution
     Per-metric histograms and per-dimension ECDFs reveal where typical code ends and outliers begin.

  3. Calibration values
     Elbows and percentiles recommend the gates & floors committed to calibration.py.

  4. Evaluations & medals
     Topos applies those values; generator achievements meet on the lattice to award a medal.

1 · The experimental corpus

This snapshot drives everything downstream. It grows along two axes — more packages and more languages — and each addition tightens the distribution.

6 packages · 1220 files evaluated · 1 language · 3 quality generators

Language coverage: python (1220). Packages such as boto3, certifi, idna, packaging, requests, urllib3.

2 · The distribution it produces

Topos thresholds are tuned on the top-downloaded PyPI packages: each file is evaluated with Topos probes, raw metric gates are set from distribution percentiles, and normalized score floors are chosen from ECDF elbows and pass-rate analysis.
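
The percentile step can be illustrated with a minimal sketch. It assumes the raw metric values have already been pulled out of the corpus into a plain list; the helper name `percentile` and the sample values are hypothetical and are not taken from Topos.

```python
import math

def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Fractional index into the sorted sample.
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    if lo == hi:
        return float(ordered[lo])
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

# Hypothetical per-file metric values, for illustration only.
sample = [1, 2, 3, 5, 8, 13, 21, 34, 55]
summary = {f"p{q}": percentile(sample, q) for q in (5, 25, 50, 75, 95)}
```

A raw-metric gate would then be chosen by comparing a candidate bound against entries such as `summary["p75"]` or `summary["p95"]`.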

Simple

cyclomatic ≤ 15, max function complexity ≤ 10, entropy ∈ [0.2, 0.8]
median: 47.5
elbow: 0.73
≥ floor: 57.38%
committed floor: 0.4
[Figure: Score ECDF over normalized score (0 to 1). Solid = committed floor (0.4), dashed = elbow (0.73), teal = recommended (0.75).]

Floor sweep (candidate floor → % of corpus files passing):
0.3 → 62.38%
0.35 → 60.33%
0.4 → 57.38%
0.45 → 53.77%
0.5 → 49.02%
0.55 → 40.66%
0.6 → 37.05%
Per-metric histograms (axis range):
cfg.cyclomatic: 1 to 241
ast.max_function_complexity: 0 to 44
ast.entropy: 0 to 1.4

Composable

instability ∈ [0.3, 0.7], fan-in ≤ 15, fan-out ≤ 15
median: 97.5
elbow: 0.97
≥ floor: 60%
committed floor: 0.8
[Figure: Score ECDF over normalized score (0 to 1). Solid = committed floor (0.8), dashed = elbow (0.97), teal = recommended (0.95).]

Floor sweep (candidate floor → % of corpus files passing):
0.5 → 65.98%
0.6 → 63.11%
0.7 → 60.08%
0.75 → 60%
0.8 → 60%
0.85 → 54.92%
0.9 → 54.92%
Per-metric histograms (axis range):
mdg.instability: 0 to 1
mdg.fan_in: 0 to 0
mdg.fan_out: 0 to 84

Secure

dangerous_calls = 0, taint_flows = 0
median: 100
elbow: 0.97
≥ floor: 82.3%
committed floor: 1
[Figure: Score ECDF over normalized score (0 to 1). Solid = committed floor (1), dashed = elbow (0.97).]

Floor sweep (candidate floor → % of corpus files passing):
0.8 → 82.3%
0.85 → 82.3%
0.9 → 82.3%
0.95 → 82.3%
1 → 82.3%
Per-metric histograms (axis range):
cpg.dangerous_calls: 0 to 20
cpg.taint_flows: 0 to 0
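
The floor sweeps shown for the three generators can be reproduced with a few lines. This is a sketch under the assumption that each file contributes one normalized score in [0, 1] per generator; the function name `floor_sweep` and the sample scores are hypothetical.

```python
def floor_sweep(scores, candidates):
    """Map each candidate floor to the % of scores at or above it."""
    if not scores:
        raise ValueError("scores must be non-empty")
    total = len(scores)
    return {
        floor: round(100.0 * sum(1 for s in scores if s >= floor) / total, 2)
        for floor in candidates
    }

# Hypothetical normalized scores, for illustration only.
scores = [0.1, 0.35, 0.4, 0.55, 0.6, 0.75, 0.8, 0.9, 1.0, 1.0]
sweep = floor_sweep(scores, [0.3, 0.4, 0.5, 0.6])
```

The pass rate can only stay flat or fall as the candidate floor rises, which matches the shape of every sweep in this section.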

3 · The calibration values it recommends

The distribution above is the input; these are the numbers it implies for calibration.py. The committed column is parsed live from the policy module so drift is visible at a glance.

Committed values are read from the same release that ran the stats: topos v0.3.4 · calibration.py.

Normalized score floors (SCORE_FLOORS)

Generator | Recommended | Committed | Drift | Basis (from corpus)
Simple | 0.75 | 0.4 | ▲ +0.35 | ECDF elbow at 0.73 → rounded to 0.75.
Composable | 0.95 | 0.8 | ▲ +0.15 | ECDF elbow at 0.97 → rounded to 0.95.
Secure | 1 | 1 | in sync | Categorical security, held at 1.00 regardless of elbow.
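
The recommended and drift columns can be recomputed from the elbows. The sketch below assumes the recommendation rounds the elbow to the nearest 0.05, which is inferred from the two rounded rows (0.73 → 0.75, 0.97 → 0.95) and not confirmed by the policy module; `recommend_floor` and `drift` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed rounding grid, inferred from 0.73 -> 0.75 and 0.97 -> 0.95.
STEP = Decimal("0.05")

def recommend_floor(elbow, categorical=False):
    """Round an ECDF elbow to the grid; categorical generators stay at 1.00."""
    if categorical:
        return Decimal("1.00")
    steps = (Decimal(str(elbow)) / STEP).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    return steps * STEP

def drift(recommended, committed):
    """Signed gap between the recommended and the committed floor."""
    return recommended - Decimal(str(committed))

simple_drift = drift(recommend_floor(0.73), 0.4)
composable_drift = drift(recommend_floor(0.97), 0.8)
secure_drift = drift(recommend_floor(0.97, categorical=True), 1)
```

`Decimal` is used so that values such as 0.35 and 0.15 come out exact instead of picking up binary floating-point noise.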

Raw-metric gates — grounded in corpus percentiles

Gate | Committed bound | Corpus evidence | Why
McCabe cyclomatic complexity (SIMPLE.max_cyclomatic) | <= 15 | median=n/a, p75=38 | Upper bound near the corpus median; most files stay below it.
Max single-function complexity (SIMPLE.max_function_complexity) | <= 10 | median=n/a, p75=10 | Caps the worst function around the corpus 75th percentile.
Kolmogorov AST entropy band (SIMPLE.min_entropy / max_entropy) | [0.2, 0.8] | p5=0.19, p95=0.64 | Healthy band spans the bulk of the corpus (p5–p95).
Martin module instability band (COMPOSABLE.instability_low / instability_high) | [0.3, 0.7] | p25=0.25, p75=0.64 | Centred on the corpus median (≈0.5); balanced coupling.
Module fan-in (COMPOSABLE.max_fan_in) | <= 15 | p75=0, p95=0 | Bound sits well above the corpus 95th percentile (0).
Module fan-out (COMPOSABLE.max_fan_out) | <= 15 | p75=0, p95=1 | Bound sits well above the corpus 95th percentile (1).
Dangerous CPG calls (SECURE.max_dangerous_calls) | == 0 | median=n/a, p95=2 | Categorical: the corpus median is 0; any call fails.
Active taint flows (SECURE.max_taint_flows) | == 0 | median=n/a, p95=0 | Categorical: zero-tolerance dataflow safety.
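
To make the gates concrete, here is a minimal sketch that applies the committed bounds from the table to one file's raw metrics. The dictionary layout, the `failed_gates` helper and the pairing of each gate with a metric key (for example SIMPLE.max_cyclomatic with cfg.cyclomatic) are assumptions for illustration and may not match how calibration.py is organised.

```python
# Committed bounds copied from the table; the structure is hypothetical.
GATES = {
    "Simple": {
        "cfg.cyclomatic": lambda v: v <= 15,
        "ast.max_function_complexity": lambda v: v <= 10,
        "ast.entropy": lambda v: 0.2 <= v <= 0.8,
    },
    "Composable": {
        "mdg.instability": lambda v: 0.3 <= v <= 0.7,
        "mdg.fan_in": lambda v: v <= 15,
        "mdg.fan_out": lambda v: v <= 15,
    },
    "Secure": {
        "cpg.dangerous_calls": lambda v: v == 0,
        "cpg.taint_flows": lambda v: v == 0,
    },
}

def failed_gates(metrics):
    """Return, per generator, the metric names that violate their gate."""
    return {
        generator: [
            name for name, passes in checks.items() if not passes(metrics[name])
        ]
        for generator, checks in GATES.items()
    }

# Hypothetical file sitting at the corpus p75 for complexity.
example = {
    "cfg.cyclomatic": 38, "ast.max_function_complexity": 10,
    "ast.entropy": 0.45, "mdg.instability": 0.5,
    "mdg.fan_in": 0, "mdg.fan_out": 1,
    "cpg.dangerous_calls": 2, "cpg.taint_flows": 0,
}
report = failed_gates(example)
```

An empty list for a generator means every raw gate passed; the normalized score floor would still be checked separately.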

Recalibration trigger: Elbow = argmax of |d²P/ds²| on the file-level ECDF (per dimension). Re-pick a threshold when the elbow differs from the current floor by more than 0.05.
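
The trigger can be sketched as follows. This version evaluates the ECDF on a uniform grid and takes the largest absolute second difference as a discrete stand-in for |d²P/ds²|. The grid step, the function names and the absence of smoothing are assumptions; a raw ECDF is a step function, so a production version would likely smooth it first.

```python
from bisect import bisect_right

def ecdf_on_grid(scores, step=0.01):
    """Evaluate the empirical CDF of scores on a uniform grid over [0, 1]."""
    ordered = sorted(scores)
    n = len(ordered)
    count = int(round(1 / step)) + 1
    grid = [round(i * step, 10) for i in range(count)]
    return grid, [bisect_right(ordered, s) / n for s in grid]

def elbow(scores, step=0.01):
    """Grid point where the discrete second derivative of the ECDF peaks."""
    grid, p = ecdf_on_grid(scores, step)
    best_i, best_curv = 1, -1.0
    for i in range(1, len(grid) - 1):
        curv = abs(p[i + 1] - 2 * p[i] + p[i - 1]) / step ** 2
        if curv > best_curv:
            best_i, best_curv = i, curv
    return grid[best_i]

def needs_recalibration(elbow_value, floor, tolerance=0.05):
    """True when the elbow and the committed floor differ by more than tolerance."""
    return abs(elbow_value - floor) > tolerance

# Elbows and committed floors as reported in this document.
simple_trigger = needs_recalibration(0.73, 0.4)
composable_trigger = needs_recalibration(0.97, 0.8)
secure_trigger = needs_recalibration(0.97, 1.0)
```

With the reported numbers, Simple and Composable exceed the 0.05 tolerance while Secure (gap 0.03) does not.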

4 · How those values award medals

Topos applies the calibrated gates and floors to each file. The three generator results — Simple, Composable, Secure — meet on an 8-element lattice; the lattice element maps to a medal.

Lattice element | Medal | Files
IDEAL | 🥇 GOLD | 289
COMPOSABLE_SECURE | 🥈 SILVER | 239
SIMPLE_COMPOSABLE | 🥈 SILVER | 0
SIMPLE_SECURE | 🥈 SILVER | 220
COMPOSABLE | 🥉 BRONZE | 123
SECURE | 🥉 BRONZE | 256
SIMPLE | 🥉 BRONZE | 30
SLOP | ❌ SLOP | 63
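
The mapping from three pass/fail results to a lattice element and medal can be sketched as below. The element names and medals come from the table; the function names and the rule that the medal depends only on how many generators pass are assumptions read off that table, not the Topos implementation.

```python
GENERATORS = ("SIMPLE", "COMPOSABLE", "SECURE")
MEDAL_BY_PASS_COUNT = {3: "GOLD", 2: "SILVER", 1: "BRONZE", 0: "SLOP"}

def lattice_element(simple, composable, secure):
    """Name the lattice element for a triple of generator pass/fail results."""
    passed = [
        name for name, ok in zip(GENERATORS, (simple, composable, secure)) if ok
    ]
    if len(passed) == len(GENERATORS):
        return "IDEAL"
    if not passed:
        return "SLOP"
    return "_".join(passed)

def medal(simple, composable, secure):
    """Medal implied by the number of generators that pass."""
    return MEDAL_BY_PASS_COUNT[sum(bool(x) for x in (simple, composable, secure))]

# File counts per lattice element, copied from the table.
counts = {
    "IDEAL": 289, "COMPOSABLE_SECURE": 239, "SIMPLE_COMPOSABLE": 0,
    "SIMPLE_SECURE": 220, "COMPOSABLE": 123, "SECURE": 256,
    "SIMPLE": 30, "SLOP": 63,
}
total_files = sum(counts.values())
secure_share = round(
    100 * sum(n for name, n in counts.items() if name == "IDEAL" or "SECURE" in name)
    / total_files,
    1,
)
```

The counts sum to the 1220 files in the corpus, and the share of elements that include Secure works out to 82.3%, the same figure reported for the Secure floor.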