Platform & handoff

AI Scientist — research loop and next-action reasoning.

The AI Scientist closes the research loop and proposes the next informative action, with its reasoning visible.

AI Scientist Workbench · evidence-grounded loop

An Iron Man suit for scientists, not an autonomous scientist.

The AI Scientist Workbench is the operator surface above TensFormer. Six typed Swift surfaces shipped this milestone close the loop from a disease query to a queued, approval-gated wet-lab job — every artifact Codable, every transition audit-logged, no tool capable of touching the lab without an explicit approval. MCP underneath as the tool bus; TensorLang structures the biology; Lean issues the proof receipts; the wet lab decides what is true.

01 · Research copilot

Citation-grounded literature reasoning, not plausible paragraphs.

Four interchangeable LiteratureCopilot adapters: a deterministic local fixture for tests, plus PubMed E-utilities, Semantic Scholar Graph API, and Europe PMC REST. Every EvidenceCard carries provenance back to the originating LiteratureQuery; ClaimExtractor pulls typed (subject, predicate, object) triples out of titles and abstracts using the same predicate vocabulary the biology_v* TensorLang modules use, so claims project directly onto a BiologySpec without translation.

peptiter-research-copilot --disease "ulcerative colitis" --max 3
  rank 1: TNF      (score 1.575)
  rank 2: IL23R    (score 1.360)
  rank 3: IL17A    (score 0.525)

Sources/PeptiterDiscovery/AIScientist/{EvidenceCard,LiteratureCopilot,...}.swift

02 · Evidence graph + canvas bridge

Cards collapse into a typed graph; the graph collapses into a verifiable mechanism.

EvidenceGraph normalizes BiologicalEntities across cards (ontology+ID when present, canonical label otherwise) so PubMed and Europe PMC mentions of the same target collapse to one node. Edges aggregate confidence and uncertainty across supporting triples and carry counterevidence references. EvidenceGraphMechanismBridge converts the typed graph + framing into a PathwayMechanismHypothesis the existing PathwayMechanismVerifier accepts — closing the loop from research evidence to the Lean-verifiable mechanism path.

Sources/PeptiterDiscovery/AIScientist/EvidenceGraph.swift

03 · First-party MCP servers

JSON-RPC tools, untrusted by default, every invocation logged.

MCPProtocol gives JSON-RPC 2.0 envelopes, typed Tool / Resource / Prompt schemas, and an MCPDispatcher that routes calls to a server. LiteratureMCPServer exposes the four LiteratureCopilot adapters as MCP tools (literature/search, literature/list_adapters) plus a notebook resource. LeanVerifierMCPServer wraps the existing LeanExternalVerifier so external agents can drive the biology_v1/v2 contract through MCP without bypassing the typed hash-bound receipt.

mcp.dispatch tools/call name=literature/search
  arguments: { queryID, adapter: "fixture", disease: "ulcerative colitis" }
  → cards: [...], rankedHypotheses: [...], notebookEntry: {...}

Sources/PeptiterDiscovery/MCP/{MCPProtocol,LiteratureMCPServer,LeanVerifierMCPServer}.swift

04 · TensorLang ↔ Lean inside the loop

Per-path attribution + signed-mechanism theorems alongside hash binding.

Phase 4 closes the gap between the TensFormer spine and the AI Scientist surfaces above. emit_biology_v2.py enumerates 23 signed paths through biology_v2; BiologyV2.lean's 14 signed-sum + 28 path-count theorems all native_decide and lake build is green. The Swift LeanExternalVerifier.Receipt now carries the attribution report so reviewers see both protective and risk paths per peptide × endpoint, not just the summed sign.

Packages/PathwayLean/tensorlang/emit_biology_v2.py + .../Examples/BiologyV2.lean

05 · Assay planning

Approval gates encoded as code, not policy memos.

AssayPlan captures objective, materials, candidates, controls, dose schedule, plate map, accept/reject criteria, risk class, approval status — all Codable. AssayPlanDrafter takes a verified mechanism and emits a draft plan whose approval state is .readyForReview only when every gate passes: vehicle present, dose schedule spans 3 orders of magnitude, plate map has no double-assigned wells, risk screen returns no biosecurity or dual-use flags. Otherwise the plan lands in .rejected with the specific reason — a reviewer never sees an unsafe plan in the queue.

Sources/PeptiterDiscovery/AssayPlanning/{AssayPlan,AssayPlanDrafter}.swift

06 · Lab queue + active learning

Belief state flips on every assay outcome.

LabJob has a one-way state machine (drafted → approved → queued → running → returned → analyzed); out-of-order transitions throw rather than silently advancing. LabJobQueue persists Codably. CandidateBelief is a Beta(α, β) conjugate prior whose ingest() flips next-experiment selection on every outcome — positive bumps α, negative bumps β, inconclusive widens without bias; pickNextExperiment chooses the highest-posterior-variance candidate.

Sources/PeptiterDiscovery/LabQueue/{LabJob,LabJobQueue,CandidateBelief}.swift

Karpathy autonomy slider

Explicit levels, capped at L4 until the approval system is solid.

Following Karpathy's "do not jump to fully autonomous agents" guidance, the workbench exposes the autonomy level as a first-class control. Each surface declares the highest level it will operate at; the policy enforces the ceiling at the dispatcher.

Read-only

summarize papers, inspect data, answer questions

Suggest

propose hypotheses, candidates, assays

Dry-run

run computational tools and simulations

Draft

compose protocols, plate maps, lab manifests

Queue with approval

executable lab jobs, human approval required

Bounded loop

pre-approved experiment classes within budget + safety

Forbidden

unconstrained autonomous biological experimentation

Claim contract

What the workbench promises a reviewer.

Citations only

every EvidenceCard carries provenance back to its query

Hash binding

TensorLang and Lean prove the same finite fragment

Untrusted MCP

first-party servers; no community MCP for sensitive data

Approval gate

no wet-lab action queues without human sign-off

Audit ledger

every state transition records timestamp + actor

L4 ceiling

no autonomous wet-lab loops until approval system is solid

Test scoreboard. 137 tests pass, 3 live-network tests skipped without PEPTITER_LIVE_*=1. Every change to biology_v* regenerates EVAL.md so v2 can be compared against v1's regression baseline.

End-to-end pipeline

Disease query in, approval-gated wet-lab job out.

The six surfaces compose into one workflow. Each handoff is typed, each artifact is Codable, each transition is logged. MCP can drive any step from an external agent without bypassing the typed contract.

disease query
  ↓ (Phase 1 · LiteratureCopilot)
EvidenceCard[] with citations + provenance
  ↓ (Phase 2 · EvidenceGraph.ingest)
typed BiologicalEntity graph
  ↓ (Phase 2 · EvidenceGraphMechanismBridge)
PathwayMechanismHypothesis
  ↓ (Phase 4 · TensorLang biology_v* + LeanExternalVerifier)
verified mechanism + per-path attribution + Lean proof receipt
  ↓ (Phase 5 · AssayPlanDrafter)
AssayPlan with vehicle, controls, plate map, risk screen
  ↓ (human approval — L4 ceiling)
LabJob enqueued
  ↓ (Phase 6 · LabJobQueue + ActiveLearningEngine)
outcome ingested → CandidateBelief updated → next experiment chosen

Boundary check. The workbench is an interface for scientists, not a replacement for them. It can summarize, propose, simulate, draft, and queue under approval — it cannot order materials, run robots, or execute unapproved protocols. Karpathy's autonomy slider is the policy; the typed Swift surfaces are the enforcement.

source · open implementation

Sources/PeptiterDiscovery/AIScientist/

Sources/PeptiterDiscovery/MCP/

Sources/PeptiterDiscovery/AssayPlanning/

Sources/PeptiterDiscovery/LabQueue/

Sources/PeptiterResearchCopilotCLI/

Tests/PeptiterDiscoveryTests/

Six SCIENTIST.md tracking issues (#108–#113) closed in one milestone; eight commits in the AI Scientist + TensFormer integration block; full audit trail in the GitHub issue history.

Related on this track

Experiments

Validation and assay planning.