
Operations — run, review, and inspect surfaces.

The operational surfaces used to run, review, and inspect DiscoveryLab in production.

Operational surface · production scaffolding

The MCP catalog, persistence layer, and CI bench that make TensFormer + AIScientist deployable.

Every typed surface above ships with the production scaffolding to run it: a stdio JSON-RPC binary external agents can plug into, an on-disk cache that survives restarts, recursive improvement proposal tools, smart short-circuits for repeated work, an offline inspector, and CI that posts the regression badge on every PR. The new improvement loop is designed to sit behind the same typed dispatcher and review gates as the existing MCP surfaces.

Eval badge: synced from tensorlang/eval/eval_badge.svg on every build.

01 · MCP catalog

Five first-party MCP servers behind one composite dispatcher.

PeptiterMCPCatalog.makeComposite(...) returns a CompositeMCPServer that routes each tool prefix to one server:

literature/* → LiteratureMCPServer (4 adapters: fixture, PubMed, Semantic Scholar, Europe PMC)
lean/* → LeanVerifierMCPServer (TensorLang manifest hash binding + lake build round-trip)
lab_loop/* → LabLoopMCPServer (V3LabLoopOrchestrator behind a typed MCP surface)
improvement/* → ImprovementMCPServer (recursive evidence improvement proposals)
model/* → PeptiterModelMCPServer (semantic_hash-pinned biology_v* introspection plus world-model source, pathway assertion, and body-twin model-card inspection)

The same code path drives the in-process app, the stdio binary, and the Swift tests, so they cannot drift.

Sources/PeptiterDiscovery/MCP/{PeptiterMCPCatalog,CompositeMCPServer,…}.swift
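
The prefix routing described above can be illustrated with a minimal sketch. Everything here (SketchServer, EchoServer, SketchCompositeServer, RoutingError) is a hypothetical stand-in, not the real CompositeMCPServer API; it only shows the idea of pre-registered prefixes with no fallback for unknown ones.

```swift
import Foundation

// Hypothetical stand-ins; the real types live in Sources/PeptiterDiscovery/MCP/.
protocol SketchServer {
    func handle(tool: String) -> String
}

struct EchoServer: SketchServer {
    let name: String
    func handle(tool: String) -> String { "\(name) handled \(tool)" }
}

enum RoutingError: Error, Equatable {
    case unregisteredPrefix(String)
}

struct SketchCompositeServer {
    // Prefixes are registered up front; nothing is discovered at call time.
    private let routes: [String: any SketchServer]

    init(routes: [String: any SketchServer]) { self.routes = routes }

    func dispatch(method: String) throws -> String {
        // "lean/verify" splits into prefix "lean" and tool "verify".
        let parts = method.split(separator: "/", maxSplits: 1).map(String.init)
        guard parts.count == 2, let server = routes[parts[0]] else {
            throw RoutingError.unregisteredPrefix(method)
        }
        return server.handle(tool: parts[1])
    }
}

let composite = SketchCompositeServer(routes: [
    "literature": EchoServer(name: "LiteratureMCPServer"),
    "lean": EchoServer(name: "LeanVerifierMCPServer"),
    "lab_loop": EchoServer(name: "LabLoopMCPServer"),
    "improvement": EchoServer(name: "ImprovementMCPServer"),
    "model": EchoServer(name: "PeptiterModelMCPServer"),
])

print((try? composite.dispatch(method: "lean/verify")) ?? "unrouted")
```

Because the route table is fixed at construction, a method with an unknown prefix fails loudly instead of reaching an unintended server.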

02 · peptiter-mcp-stdio

Newline-JSON-RPC stdio server, single binary, ready for Claude Code.

Drop-in for any MCP client that speaks newline-delimited JSON-RPC over stdio. .claude/mcp.example.json + docs/MCP_CONFIG.md show the wiring with three flavors (built-binary path, swift-run dev, concurrent dispatcher). Runs sequentially by default; --concurrent N opts into a TaskGroup-backed pool with an actor-serialized stdout writer so JSON lines never interleave at the byte level.

peptiter-mcp-stdio --probe                  # tools/list as JSON, exit
peptiter-mcp-stdio --tools-list-version     # SHA256 of canonical catalog
peptiter-mcp-stdio --concurrent 4           # batch-friendly dispatch
peptiter-mcp-stdio --cache-dir ~/peptiter   # persist trace + verifier receipts

Sources/PeptiterMCPStdioCLI/main.swift
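
As a rough illustration of the newline-delimited framing, the sketch below encodes one JSON-RPC request per line and splits a buffer back into messages. RPCRequest, encodeLine, and decodeLines are hypothetical names, and the request shape is the generic JSON-RPC 2.0 envelope rather than the server's actual types.

```swift
import Foundation

// Hypothetical minimal JSON-RPC 2.0 request envelope (no params, for brevity).
struct RPCRequest: Codable, Equatable {
    var jsonrpc = "2.0"
    let id: Int
    let method: String
}

let newline = UInt8(ascii: "\n")

// One message per line: compact JSON (never pretty-printed) plus a trailing newline.
func encodeLine(_ request: RPCRequest) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    var data = try encoder.encode(request)
    data.append(newline)
    return data
}

// Split a byte buffer on newlines and decode each line independently.
// Lines that fail to decode are skipped in this sketch.
func decodeLines(_ buffer: Data) -> [RPCRequest] {
    buffer.split(separator: newline).compactMap {
        try? JSONDecoder().decode(RPCRequest.self, from: Data($0))
    }
}

if let line = try? encodeLine(RPCRequest(id: 1, method: "tools/list")) {
    print(String(decoding: line, as: UTF8.self), terminator: "")
}
```

Compact encoding matters here: a pretty-printed message would contain newlines of its own and break the one-message-per-line contract.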

03 · Cached receipts + traces

Lake builds and orchestrator runs persist across restarts.

OnDiskCache<T: Codable & Sendable> generalizes the persistence pattern: an actor-backed in-memory dictionary, an optional <dir>/<name>.json file per entry, sanitized filenames, and a corruption-tolerant load.

lean/verify: caches receipts keyed by manifest semantic hash, so identical-artifact requests skip lake build (pass noCache: true to force a fresh verify).
lab_loop/run: caches Trace + fingerprint, so lab_loop/inspect can answer after a cold restart without re-dispatching the orchestrator.
improvement/*: keeps proposal state inside the MCP server instance; persistence can be added with the same cache pattern when promotion records need to survive process restarts.

--prune-cache 7d evicts at startup; peptiter-mcp-cache --evict <ns>/<key> evicts a single entry on demand.

Sources/PeptiterDiscovery/MCP/OnDiskCache.swift
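
The two less obvious properties above, sanitized filenames and corruption-tolerant load, might look roughly like this. SketchDiskCache is a hypothetical, synchronous simplification: the real OnDiskCache is actor-backed and also keeps an in-memory dictionary, and its exact sanitization rules are not shown in this document.

```swift
import Foundation

// Hypothetical simplification of the on-disk half of the cache.
struct SketchDiskCache<T: Codable> {
    let directory: URL

    // Assumed rule: anything outside [A-Za-z0-9._-] becomes "_",
    // so a key can never introduce a path separator.
    static func sanitize(_ key: String) -> String {
        let allowed = CharacterSet(charactersIn:
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-")
        return String(key.unicodeScalars.map { allowed.contains($0) ? Character($0) : "_" })
    }

    private func fileURL(for key: String) -> URL {
        directory.appendingPathComponent(Self.sanitize(key) + ".json")
    }

    func store(_ value: T, for key: String) throws {
        try FileManager.default.createDirectory(at: directory, withIntermediateDirectories: true)
        let data = try JSONEncoder().encode(value)
        // Atomic write: readers never observe a half-written file.
        try data.write(to: fileURL(for: key), options: .atomic)
    }

    // Corruption-tolerant: a missing, unreadable, or undecodable file is a cache miss.
    func load(_ key: String) -> T? {
        guard let data = try? Data(contentsOf: fileURL(for: key)) else { return nil }
        return try? JSONDecoder().decode(T.self, from: data)
    }
}
```

Treating a corrupt file as a miss means the worst case after a bad write is one repeated lake build or orchestrator run, not a failed startup.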

04 · Smart short-circuits

cachedHash and cachedFingerprint let agents skip identical work.

Every expensive tool accepts an optional cache hint from the caller. lean/verify takes cachedHash; lab_loop/run and lab_loop/inspect take cachedFingerprint (FNV-1a over attribution_hash + overrides). On a match, the tool returns a compact { unchanged: true, … } delta. On a mismatch, it returns the full receipt with a delta block flagging the change reason. The protocol is stateless on the wire: the caller owns the cache.

Sources/PeptiterDiscovery/MCP/{LeanVerifierMCPServer,LabLoopMCPServer}.swift
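
A sketch of the fingerprint comparison follows. The 64-bit FNV-1a function is the standard algorithm; the way attribution_hash and overrides are combined (sorted key=value pairs joined after a "|" separator), the RunResponse type, and the reason strings are all assumptions made for illustration, not the servers' actual wire format.

```swift
// Standard 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
func fnv1a64(_ text: String) -> String {
    var hash: UInt64 = 0xcbf29ce484222325          // FNV offset basis
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3               // FNV prime, wrapping multiply
    }
    let hex = String(hash, radix: 16)
    return String(repeating: "0", count: 16 - hex.count) + hex
}

// Assumed canonical form: sort override keys so equal inputs always hash equally.
func fingerprint(attributionHash: String, overrides: [String: String]) -> String {
    let canonical = overrides.keys.sorted()
        .map { "\($0)=\(overrides[$0]!)" }
        .joined(separator: "&")
    return fnv1a64(attributionHash + "|" + canonical)
}

// Hypothetical response shape mirroring the match / mismatch split.
enum RunResponse: Equatable {
    case unchanged(fingerprint: String)
    case full(fingerprint: String, reason: String)
}

func respond(current: String, cachedFingerprint: String?) -> RunResponse {
    guard let hint = cachedFingerprint else {
        return .full(fingerprint: current, reason: "no_hint")
    }
    return hint == current
        ? .unchanged(fingerprint: current)
        : .full(fingerprint: current, reason: "fingerprint_mismatch")
}
```

The server only compares the hint against what it computes for the current request, which is why it needs no per-client state.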

05 · Inspector CLI

peptiter-mcp-cache reads what's persisted without spinning up the server.

Inspection is read-only. The tool walks the cache directory, reports per-namespace counts and bytes, and lists each entry's key, size, age, and fingerprint. --json mode emits a stable schema; --details prints the first 160 characters of each payload (text) or the full decoded JSON. Removing an entry requires the explicit --evict flag. Together these let operators audit deployment state offline.

peptiter-mcp-cache --cache-dir ~/peptiter
  peptiter cache @ ~/peptiter
  ## lab_loop (1 entries, 11.3 KB)
    v3-il23-axis · 11538 bytes · 0s old · fingerprint 14ebfd30a3a9e488
  ## lean (3 entries, 4.7 KB)
    biology_v2__e065d18796a17c58… · 1820 bytes · 12m old

Sources/PeptiterMCPCacheCLI/main.swift
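
The per-namespace summary in the sample output could be computed along these lines. This is a sketch under an assumed layout of <cache-dir>/<namespace>/<key>.json; NamespaceSummary and summarize are hypothetical names, and the real inspector also reports age and fingerprint, which are omitted here.

```swift
import Foundation

struct NamespaceSummary: Equatable {
    var entries = 0
    var bytes = 0
}

// Read-only walk: one directory per namespace, one .json file per entry (assumed layout).
func summarize(cacheDir: URL) -> [String: NamespaceSummary] {
    let fm = FileManager.default
    var result: [String: NamespaceSummary] = [:]
    let namespaces = (try? fm.contentsOfDirectory(atPath: cacheDir.path)) ?? []
    for ns in namespaces {
        let nsURL = cacheDir.appendingPathComponent(ns)
        // Non-directories make this call throw, so they contribute no entries.
        let files = (try? fm.contentsOfDirectory(atPath: nsURL.path)) ?? []
        for file in files where file.hasSuffix(".json") {
            let path = nsURL.appendingPathComponent(file).path
            let attrs = try? fm.attributesOfItem(atPath: path)
            let size = (attrs?[.size] as? NSNumber)?.intValue ?? 0
            result[ns, default: NamespaceSummary()].entries += 1
            result[ns, default: NamespaceSummary()].bytes += size
        }
    }
    return result
}
```

Nothing in the walk opens a file for writing, so it can run against a live cache directory without coordinating with the server.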

06 · CI + regression bench

Every PR posts an EVAL.md badge; overlay drift fails the build.

.github/workflows/eval-badge.yml runs the transpiler Python tests, regenerates BiologyV2/V3.lean from the (potentially modified) overlay JSON, runs lake build, executes the eval harness, and posts a sticky PR comment with the headline F1. check_overlay_sync.py blocks PRs where BIOLOGY_OVERLAY.md drifts from build_biology_v3. EVAL.md gains a 'Diff vs base ref' section via --diff-against-base origin/main so reviewers see cumulative regression.

EVAL · v2 signed — F1 0.818 · safety precision 1.000 · 25 cells, 4 models

.github/workflows/eval-badge.yml + tensorlang/eval/eval_harness.py

Operational contracts

What the deployment surface promises an operator.

Seven guarantees enforced at the dispatcher level. Each is a line of code an operator can grep, not a policy memo.

Untrusted MCP

first-party servers only; tool prefixes pre-registered; no community MCP for sensitive data

Hash binding

TensorLang and Lean prove the same finite fragment under one semantic_hash

Audit ledger

every tool call logged through MCPDispatcher; every cache entry timestamped

Fail-loud-don't-fall-silent

a missing real-data download exits with code 2 and instructions; it never silently falls back to fixtures

Model-card boundaries

world-model MCP tools expose source licenses, context overlays, executable islands, VVUQ, and blocked claims

Stateless overrides

agents pass their own cache hints; servers don't track per-client state

L4 ceiling

no autonomous wet-lab loops until the human approval system is solid (per SCIENTIST.md §7)

Live-system numbers

The surface, in commits and tests.

tests pass: recursive loop test coverage added
tools/list-version: in-band JSON-RPC method · stable SHA256
executables: peptiter-mcp-stdio · peptiter-mcp-cache · peptiter-research-copilot · peptiter-calibration-import
model tools: list_world_model_sources · inspect_pathway_assertions · body_twin_model_card
eval badge: JSON · MD · SVG · sticky PR comment
CI workflows: eval-badge.yml · pathway-lean.yml

Setup, end-to-end. Two commands: swift build produces every executable; cp .claude/mcp.example.json ~/.claude/mcp.json wires Claude Code into the catalog. From there an agent can search literature, verify mechanisms, run closed-loop experiments, and inspect the cache, all behind one dispatcher.
