Platform & handoff

Self-improvement snapshot — observation half of the loop.

What DiscoveryLab measures about itself between runs.


One snapshot answers where the platform is. The recursive loop below changes where it goes next.

Five self-improvement signals (biology versions, regression bench, calibration, active-learning convergence, and what changed since last time) used to live in separate artifacts. SelfImprovementSnapshot aggregates them into one typed Codable surface, exposed identically through a CLI (peptiter-self-improvement), the improvement/snapshot MCP tool, and an optional persistent ledger that keeps the history queryable over time. Pair it with the improvement/* proposal lifecycle (next section): the snapshot tells you what regressed, and the loop decides what to do about it.
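SelfImprovementSnapshot is a Swift Codable type whose fields this page does not list. To illustrate the "one typed surface" idea, here is a minimal Python sketch under stated assumptions: the names Snapshot, BiologyModule, biology_modules and the other fields are hypothetical, chosen to mirror the five signals and the Schema: 0.1 header in the sample output. Routing every surface through a single to_json/from_json pair is what would keep a CLI, an MCP tool, and a ledger emitting the same record.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BiologyModule:
    # Hypothetical shape: module name plus its TensorLang semantic hash.
    name: str
    semantic_hash: str


@dataclass
class Snapshot:
    # Hypothetical field names: one field per self-improvement signal.
    schema: str = "0.1"
    biology_modules: list = field(default_factory=list)
    regression_bench: dict = field(default_factory=dict)
    calibration: dict = field(default_factory=dict)
    active_learning: list = field(default_factory=list)
    improvement_events: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses; sort_keys keeps output stable.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Snapshot":
        raw = json.loads(text)
        # Rebuild the nested dataclasses that json.loads returned as dicts.
        raw["biology_modules"] = [BiologyModule(**m) for m in raw["biology_modules"]]
        return cls(**raw)
```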

Biology version coverage

Every registered biology module reports its TensorLang semantic hash. The biology_v1 → v2 → v3_signed → v4_signed lineage now sits beside biology_epi_v1 and biology_epi_v1_signed, forming a typed roster that the snapshot reads from PeptiterMCPCatalog.defaultLeanModules and the matching .reasoner.json manifests.

biology_v{1,2,3_signed,4_signed} + biology_epi_v1{,_signed}
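Because each module reports a semantic hash, comparing two rosters is enough to notice a module that was added, dropped, or rebuilt with a different hash. A minimal sketch, assuming the roster is available as a plain {module_name: hash} mapping (a hypothetical shape; the real roster is a typed Swift value):

```python
def diff_rosters(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {module_name: semantic_hash} rosters taken at different snapshots."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # A module present in both rosters whose hash moved has been rebuilt.
    rehashed = sorted(
        name for name in current.keys() & previous.keys()
        if current[name] != previous[name]
    )
    return {"added": added, "removed": removed, "rehashed": rehashed}
```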

Regression bench

eval_harness.py writes eval_badge.json on every PR, including the delta-vs-previous strings. The snapshot lifts that data out of the CI sticky comment so any in-process consumer (the operator CLI or an agent) can answer 'are we regressing?' without scraping GitHub.

tensorlang/eval/eval_badge.json + EvalBadge.swift
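The page does not document eval_badge.json's schema, so this sketch assumes a hypothetical flat mapping of higher-is-better metric names (for example f1 or safety_precision) to floats. It answers 'are we regressing?' by reporting every shared metric that dropped by more than a tolerance:

```python
def regressions(previous: dict[str, float], current: dict[str, float],
                tolerance: float = 0.0) -> list[str]:
    """Return a delta string for each shared higher-is-better metric that dropped."""
    findings = []
    for metric in sorted(previous.keys() & current.keys()):
        delta = current[metric] - previous[metric]
        if delta < -tolerance:
            # Signed, fixed-precision delta in an "old → new (±delta)" style.
            findings.append(
                f"{metric} {previous[metric]:.3f} → {current[metric]:.3f} ({delta:+.3f})"
            )
    return findings
```

A small positive tolerance keeps run-to-run noise from being reported as a regression; the right value depends on the bench and is not specified here.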

Calibration

The snapshot reports Brier score, ECE, AUROC, and AUPRC from the platform's currently committed calibration_report.json. Drift between snapshots flags subsystem-typed regressions: a rising Brier score or ECE hints at claim_threshold, and a falling AUROC hints at ranking_policy. Both map directly onto ImprovementSubsystem cases, so improvement/propose accepts them without translation.

tensorlang/eval/calibration_report.json + ScienceCalibrationEvaluation.swift
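For readers less familiar with the calibration metrics, this sketch shows what three of them compute on a holdout set of predicted probabilities and 0/1 outcomes. It is not the ScienceCalibrationEvaluation.swift implementation: the ECE here is one common binary variant (ten equal-width bins, mean predicted probability versus observed positive rate), and AUPRC and the isotonic curve fitting named in the sample report are omitted.

```python
def brier(probs, labels):
    # Mean squared error between predicted probability and the 0/1 outcome.
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def ece(probs, labels, n_bins=10):
    # Expected calibration error with equal-width bins over [0, 1]:
    # sum over bins of (bin weight) * |mean predicted prob - observed positive rate|.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    error = 0.0
    for members in bins:
        if members:
            conf = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            error += len(members) / total * abs(conf - rate)
    return error


def auroc(probs, labels):
    # Probability that a random positive outscores a random negative (ties count 1/2).
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Lower is better for Brier and ECE and higher is better for AUROC, which is why the drift rules above watch for Brier or ECE going up and AUROC going down.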

Active-learning history

Per-run Beta(α, β) convergence already lives inside V3LabLoopOrchestrator.Trace. The snapshot rolls it up across every cached lab_loop/*.json under --cache-dir, so an operator can ask: across the last N runs, did max variance shrink?

LabLoopMCPServer.CachedRun + ActiveLearningEngine
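A Beta(α, β) posterior has variance αβ / ((α + β)²(α + β + 1)), so the uniform prior Beta(1, 1) sits at 1/12 ≈ 0.0833, which is consistent with the unchanged max variance in the sample output further down. The sketch below rolls that statistic up per run. The input shape (first-step and last-step posteriors per run, with "max" taken across candidates) and the "shrank"/"grew" labels are assumptions; only "→ unchanged" appears in the real output.

```python
def beta_variance(alpha: float, beta: float) -> float:
    # Var[Beta(α, β)] = αβ / ((α + β)² (α + β + 1))
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1))


def max_variance(posteriors: list[tuple[float, float]]) -> float:
    # Largest posterior variance across the candidates at one step of a run.
    return max(beta_variance(a, b) for a, b in posteriors)


def rollup(runs: dict[str, tuple[list, list]]) -> list[str]:
    """runs maps a run id to (first-step posteriors, last-step posteriors)."""
    lines = []
    for run_id, (first, last) in sorted(runs.items()):
        before, after = max_variance(first), max_variance(last)
        if after < before:
            trend = "↓ shrank"
        elif after > before:
            trend = "↑ grew"
        else:
            trend = "→ unchanged"
        lines.append(f"{run_id} · max variance {before:.4f} → {after:.4f} ({trend})")
    return lines
```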

Improvement events (subsystem-typed)

Each event carries a subsystemHint that matches an ImprovementSubsystem case: claim_threshold, ranking_policy, mechanism_reasoning, safety_gate, or eval_harness. The snapshot's Markdown renders the hint as a literal improvement/propose call, so an operator (or agent) can copy-paste the next step. This is what closes the observation → intervention loop.

SelfImprovementSnapshot.detectImprovementEvents(...)
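detectImprovementEvents(...) is the real entry point, and its signature is not shown here. As a rough Python analogue, this sketch applies the drift rules from the Calibration paragraph (Brier or ECE up → claim_threshold, AUROC down → ranking_policy) to two metric dictionaries and renders each hit as a copy-pasteable improvement/propose line. The RULES table and dictionary shape are hypothetical, and the per-metric decimal places are inferred from the sample output.

```python
# Hypothetical rule table: metric -> (regressing direction, subsystem hint, decimals).
# +1 means "going up is bad", -1 means "going down is bad".
RULES = {
    "Brier": (+1, "claim_threshold", 4),
    "ECE": (+1, "claim_threshold", 4),
    "AUROC": (-1, "ranking_policy", 3),
}


def detect_events(assay: str, previous: dict, current: dict) -> list[str]:
    lines = []
    for metric, (bad_direction, subsystem, digits) in RULES.items():
        if metric not in previous or metric not in current:
            continue  # nothing to diff against for this metric
        delta = current[metric] - previous[metric]
        if delta * bad_direction > 0:  # moved in the regressing direction
            lines.append(
                f"- Calibration regression on {assay}: {metric} "
                f"{previous[metric]:.{digits}f} → {current[metric]:.{digits}f} "
                f"({delta:+.{digits}f}).\n"
                f"  → improvement/propose subsystem={subsystem}"
            )
    return lines
```

Fed the prior and current receptorFitCheck numbers from the sample (Brier 0.0100 → 0.0742, AUROC 0.999 → 0.936), this sketch produces the same two event entries shown there.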

Three surfaces, one library

CLI for operators. MCP tool for agents. Ledger for history.

The same SelfImprovementSnapshot.read(...) drives all three. The CLI is the operator entry point and the only thing that writes to the ledger; the MCP tool is for agents that want the same shape without spawning a child process; the ledger is what makes the diff-based improvement events possible across invocations.
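The page does not describe the ledger's on-disk format. One plausible shape, assumed here purely for illustration, is an append-only JSON Lines file (the snapshots.jsonl name is hypothetical) under --ledger-dir: the CLI would read the latest entry to diff against, then append the new snapshot.

```python
import json
from pathlib import Path


def append_snapshot(ledger_dir: Path, snapshot: dict) -> None:
    # One JSON object per line; appending never rewrites earlier history.
    ledger_dir.mkdir(parents=True, exist_ok=True)
    with open(ledger_dir / "snapshots.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot, sort_keys=True) + "\n")


def latest_snapshot(ledger_dir: Path):
    # Most recent snapshot, or None on the very first invocation.
    path = ledger_dir / "snapshots.jsonl"
    if not path.exists():
        return None
    last = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                last = json.loads(line)
    return last
```

Reading latest_snapshot before append_snapshot supplies the prior state that diff-based improvement events are computed against; an MCP call that skips the ledger simply has no prior state to diff.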

# Operator CLI: Markdown to stdout, append to a ledger
peptiter-self-improvement \
  --repo . \
  --cache-dir ~/.peptiter/cache \
  --ledger-dir ~/.peptiter/ledger

# Agent: same snapshot via an MCP (JSON-RPC 2.0) request
{"jsonrpc": "2.0", "id": 1,
 "method": "tools/call",
 "params": {"name": "improvement/snapshot",
            "arguments": {}}}

A real snapshot, today

What the platform actually looks like right now.

The live Markdown rollup the CLI prints for the current checkout: six registered biology modules (including the signed epigenome twin), the eval bench at F1 0.818, one cached active-learning run, calibration status, and the two improvement events the snapshot derived from the prior ledger state.

# Platform self-improvement snapshot

Schema: 0.1

## Biology version coverage (6 modules)

- biology_v1         → peptiter.biology_v1 · hash e065d18796a17c58…
- biology_v2         → peptiter.biology_v2 · hash 07a0604088ee8b6f… · with attribution
- biology_v3_signed  → peptiter.biology_v3 · hash 20331e6e81c29318… · with attribution
- biology_v4_signed  → peptiter.biology_v4 · hash d11eefaddf4b4851… · with attribution
- biology_epi_v1     → peptiter.biology_epi_v1 · hash 708129ac96b1cfe7…
- biology_epi_v1_signed → peptiter.biology_epi_v1 · hash c445ed3e3e922096… · with attribution

## Regression bench

- status: ok · headline model: v2 signed · 4 models evaluated
- F1: 0.818 · safety precision: 1.000 · efficacy recall: 1.000 · tests: 25

## Calibration

- assay: receptorFitCheck · model ReceptorFitHeuristic v1.0 · selected curve isotonic
- holdout: 36 observations (27 positive, 9 negative)
- Brier: 0.0742 · ECE: 0.0472 · AUROC: 0.936 · AUPRC: 0.967

## Active-learning history (1 cached run)

- v3-il23-axis · 3/3 approved · max variance 0.0833 → 0.0833 (→ unchanged)

## Improvement events

- Calibration regression on receptorFitCheck: Brier 0.0100 → 0.0742 (+0.0642).
  → improvement/propose subsystem=claim_threshold
- Calibration regression on receptorFitCheck: AUROC 0.999 → 0.936 (-0.063).
  → improvement/propose subsystem=ranking_policy

The takeaway. The snapshot is the observation half of a coherent self-improvement story. When an entry in improvementEvents warrants a change (a regression-bench delta moved the wrong way, a lab loop stopped sharpening, or a biology module rebuilt with a different hash), that is the signal to call improvement/propose and walk the proposal through evaluate → approve → promote (or rollback). The recursive discovery loop below is the intervention half.
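The proposal lifecycle itself belongs to the next section, but its ordering can be sketched as a small state machine that refuses shortcuts. The stage names are hypothetical, and the sketch assumes rollback is the move that undoes a promotion; the real improvement/* tools may model rollback differently.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names for the improvement/* proposal lifecycle.
    PROPOSED = "proposed"
    EVALUATED = "evaluated"
    APPROVED = "approved"
    PROMOTED = "promoted"
    ROLLED_BACK = "rolled_back"


# Allowed moves; assumption: rollback is only reachable from a promoted proposal.
TRANSITIONS = {
    Stage.PROPOSED: {Stage.EVALUATED},
    Stage.EVALUATED: {Stage.APPROVED},
    Stage.APPROVED: {Stage.PROMOTED},
    Stage.PROMOTED: {Stage.ROLLED_BACK},
    Stage.ROLLED_BACK: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    # Reject skipped steps, such as promoting a proposal that was never evaluated.
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```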
