Peptiter / DiscoveryLab
Algorithm evolution

Use clinical-gold peptides to evolve discovery strategies.

This page describes the AlphaEvolve-style layer added to DiscoveryLab: typed discovery-strategy genomes, clinical replay benchmarks, genetic operators, correctness scoring, complexity pressure, and epigenome transfer.

Algorithm genome

Evolve the discovery policy, not just the peptide sequence.

The genome is a typed, executable strategy graph. Running it produces candidate peptides, controls, assays, budgets, mechanism claims, and a trace that can be benchmarked against known clinical successes.

Problem framing

Disease-first, mechanism-first, target-sequence-first, natural-analog-first, structure-first, or ensemble replay.

Target selection

Causal graph, receptor, enzyme, chromatin regulator, protein interface, hormone axis, or phenotype-first policy.

Scaffold source

Endogenous, venom, microbial, clinical analog, natural product, de novo, or hybrid peptide families.

Generation

Local mutagenesis, target-conditioned generation, structure-first design, Bayesian optimization, diffusion, or macrocycle enumeration.

Scoring

Binding, mechanism, stability, solubility, permeability, toxicity, synthesis, novelty, and calibration weights.

Validation

Parent, scrambled, near-miss, negative-target, known-positive, and known-negative controls, all of which stay in the denominator when results are scored.

Active learning

Expected improvement, uncertainty sampling, mechanism falsification, Pareto expansion, or calibration repair.

Budget + audit

Model calls, candidate count, docking calls, runtime, wet-lab cost, receipts, failure modes, and anti-hindsight guardrails.
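The eight gene groups above can be pictured as fields on a typed record. What follows is a minimal sketch, not DiscoveryLab's actual schema: the class names, field names, enum members, and default budget values are all hypothetical, and only two gene groups are modeled as enums to keep the example short.

```python
from dataclasses import dataclass, field
from enum import Enum


class Framing(Enum):  # hypothetical enum for the "Problem framing" gene
    DISEASE_FIRST = "disease-first"
    MECHANISM_FIRST = "mechanism-first"
    TARGET_SEQUENCE_FIRST = "target-sequence-first"
    NATURAL_ANALOG_FIRST = "natural-analog-first"
    STRUCTURE_FIRST = "structure-first"
    ENSEMBLE_REPLAY = "ensemble replay"


class ScaffoldSource(Enum):  # hypothetical enum for the "Scaffold source" gene
    ENDOGENOUS = "endogenous"
    VENOM = "venom"
    MICROBIAL = "microbial"
    CLINICAL_ANALOG = "clinical analog"
    NATURAL_PRODUCT = "natural product"
    DE_NOVO = "de novo"
    HYBRID = "hybrid"


@dataclass
class Budget:  # default limits are placeholders, not values from the source
    max_candidates: int = 200
    max_model_calls: int = 1000
    max_docking_calls: int = 100
    max_runtime_minutes: float = 60.0


@dataclass
class StrategyGenome:
    framing: Framing
    target_policy: str          # e.g. "chromatin regulator"
    scaffold_source: ScaffoldSource
    generation: str             # e.g. "macrocycle enumeration"
    scoring_weights: dict       # criterion name -> non-negative weight
    controls: tuple             # e.g. ("parent", "scrambled")
    active_learning: str        # e.g. "uncertainty sampling"
    budget: Budget = field(default_factory=Budget)

    def validate(self) -> None:
        """Reject genomes that are mistyped or that drop all controls."""
        if not isinstance(self.framing, Framing):
            raise TypeError("framing must be a Framing member")
        if not isinstance(self.scaffold_source, ScaffoldSource):
            raise TypeError("scaffold_source must be a ScaffoldSource member")
        if not self.controls:
            raise ValueError("at least one control is required")
        if any(w < 0 for w in self.scoring_weights.values()):
            raise ValueError("scoring weights must be non-negative")
        if sum(self.scoring_weights.values()) <= 0:
            raise ValueError("scoring weights must not all be zero")

    def normalized_weights(self) -> dict:
        """Return the scoring weights rescaled to sum to 1."""
        total = sum(self.scoring_weights.values())
        return {name: w / total for name, w in self.scoring_weights.items()}
```

Typing the genes this way means a mutation or crossover operator can only produce values the executor knows how to run, and a genome with no controls is rejected before any budget is spent.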

Clinical-gold replay

Known successful peptides become unit tests for discovery algorithms.

The benchmark suite scores whether an algorithm reconstructs the right kind of discovery path. Simply naming a famous drug earns no credit, and an anti-hindsight check penalizes any leakage of the historical answer.

Insulin-style

Physiological peptide replacement and purification.

GLP-1 analog-style

Endogenous hormone optimization for half-life, protease resistance, and receptor activity.

Exendin-style

Mining of naturally evolved analogs for stable receptor pharmacology.

Conotoxin-style

Constrained venom peptide mining for ion-channel modulation.

Fusion-inhibitor-style

Mechanism-derived peptide blockade of a transient protein interface.

Hirudin-style

Natural anticoagulant template redesign and simplification.

Romidepsin/FK228-style

Microbial constrained peptide-like natural-product logic for chromatin-state modulation.
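One way to picture the replay tasks above, together with the anti-hindsight check, is as small records that carry a list of names a strategy trace must never mention. This is a sketch under assumptions: the `ReplayTask` fields, the task IDs, the forbidden-term lists, and the all-or-nothing scoring rule are hypothetical, and only terms that already appear on this page are used.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayTask:
    task_id: str
    archetype: str            # the discovery path the task is meant to test
    forbidden_terms: tuple    # names whose appearance in a trace counts as leakage


# Hypothetical task records; the archetype text is taken from the list above.
TASKS = [
    ReplayTask(
        "exendin_style",
        "natural analog mining for stable receptor pharmacology",
        ("exendin",),
    ),
    ReplayTask(
        "hirudin_style",
        "natural anticoagulant template redesign and simplification",
        ("hirudin",),
    ),
    ReplayTask(
        "romidepsin_style",
        "microbial natural-product logic for chromatin-state modulation",
        ("romidepsin", "FK228"),
    ),
]


def leaked_terms(trace: str, task: ReplayTask) -> list:
    """Return the forbidden terms that appear in the trace as whole words."""
    hits = []
    for term in task.forbidden_terms:
        pattern = rf"\b{re.escape(term)}\b"
        if re.search(pattern, trace, flags=re.IGNORECASE):
            hits.append(term)
    return hits


def anti_hindsight_score(trace: str, task: ReplayTask) -> float:
    """Assumed rule: full credit for a clean trace, none if anything leaks."""
    return 0.0 if leaked_terms(trace, task) else 1.0
```

A plain word match like this only catches name leakage. Detecting a trace that recalls the exact historical answer without naming it would need a stronger check, which this sketch does not attempt.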

Evolution loop

Selection pressure is scientific correctness plus cost discipline.

Each generation is scored by private gold recovery, assay usefulness, anti-hindsight behavior, candidate count, model calls, docking calls, runtime, wet-lab cost, and strategy complexity.

01

Initialize

Seed a diverse population of typed discovery-strategy genomes.

02

Replay

Run each genome against clinical-gold historical tasks with gold answers hidden from the phenotype producer.

03

Score

Compare outputs to the private gold mechanism, target class, scaffold, developability solution, and validation assays.

04

Breed

Keep elites, cross over strong policies, mutate genes, and preserve strategy diversity.

05

Transfer

Apply the champion to epigenome discovery without deleting the existing peptide-generation workflow.
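Steps 01 to 04 above follow the shape of a standard elitist genetic algorithm, which can be sketched in a few lines. This is an illustration under assumptions, not the DiscoveryLab implementation: genomes are reduced to flat dictionaries over three genes, the `fitness` callable stands in for the whole replay-and-score stage, diversity is preserved only through mutation, and all function names and default parameters are hypothetical.

```python
import random

# Toy gene space: a subset of the options listed on this page.
GENE_CHOICES = {
    "framing": ["disease-first", "mechanism-first", "structure-first"],
    "scaffold": ["endogenous", "venom", "microbial", "de novo"],
    "generation": ["local mutagenesis", "diffusion", "macrocycle enumeration"],
}


def random_genome(rng):
    """Step 01: draw one option per gene."""
    return {gene: rng.choice(options) for gene, options in GENE_CHOICES.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is inherited from one parent at random."""
    return {gene: rng.choice((a[gene], b[gene])) for gene in GENE_CHOICES}


def mutate(genome, rng, rate=0.2):
    """Resample each gene independently with probability `rate`."""
    child = dict(genome)
    for gene, options in GENE_CHOICES.items():
        if rng.random() < rate:
            child[gene] = rng.choice(options)
    return child


def evolve(fitness, generations=10, pop_size=12, n_elite=2, seed=0):
    """Run the loop and return (champion, best fitness per generation)."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Steps 02 and 03: replay and score, collapsed into `fitness`.
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Step 04: keep elites unchanged, breed the rest from the top half.
        elites = ranked[:n_elite]
        parents = ranked[: max(2, pop_size // 2)]
        children = []
        while len(elites) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elites + children
    champion = max(population, key=fitness)
    return champion, history
```

Because the elites are copied forward unchanged, the best score in `history` never decreases from one generation to the next as long as `fitness` is deterministic. Step 05, transfer, would consume the returned champion.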

Fitness

Correctness is multi-part.

The evaluator compares the phenotype of a discovery strategy with a hidden historical gold standard, then subtracts penalties for compute, runtime, over-large batches, and complexity.

fitness = gold_recovery + mechanism + assay_quality + anti_hindsight - compute_cost - runtime - complexity

Now optimizes for verified-controller yield

The champion is no longer scored by clinical-gold recall alone. A controller-aware layer compiles each strategy into a real peptide-GA run, uses that run to design the peptide arm of a set of combination trials, and blends the benchmark fitness with the verified-controller yield: the fraction of those trials whose GA-designed arm completes a Lean-verifiable, PK-robust multi-modal controller. The champion maximizes the blend:

champion = argmax [(1 − w)·benchmark + w·verified_controller_yield]

The meta-search therefore optimizes for strategies that produce provably stable cross-domain combinations, and the champion strategy drives the live peptide GA.
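The blended selection rule can be worked through with a small example. The sketch below assumes that `benchmark` and the yield both lie in [0, 1], that `w` is a blending weight in the same range, and that each trial reduces to a single verified or not-verified flag. The function names, the dictionary layout, and the numbers are hypothetical.

```python
def verified_controller_yield(verified_flags):
    """Fraction of combination trials whose peptide arm passed verification."""
    flags = list(verified_flags)
    if not flags:
        return 0.0  # assumption: a strategy with no trials earns no yield
    return sum(bool(f) for f in flags) / len(flags)


def blended_fitness(benchmark, controller_yield, w):
    """(1 - w) * benchmark + w * verified_controller_yield."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * benchmark + w * controller_yield


def pick_champion(strategies, w=0.5):
    """Return the strategy that maximizes the blended fitness."""
    return max(
        strategies,
        key=lambda s: blended_fitness(
            s["benchmark"], verified_controller_yield(s["trials_verified"]), w
        ),
    )


# Made-up strategies: A replays the benchmark well, B verifies more often.
strategies = [
    {"name": "A", "benchmark": 0.9, "trials_verified": [True, False, False, False]},
    {"name": "B", "benchmark": 0.7, "trials_verified": [True, True, True, False]},
]
```

With `w = 0` the rule reduces to benchmark score alone and strategy A wins (0.9 against 0.7). With `w = 0.5` the blended scores are 0.575 for A and 0.725 for B, so B becomes champion even though its replay score is lower.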

Target recovery

Did the algorithm infer the right target class?

Mechanism recovery

Did it recover the intervention archetype rather than just a binding story?

Scaffold recovery

Did it select the historical success mode: endogenous, venom, microbial, natural product, hybrid, or de novo?

Developability recovery

Did it identify the real obstacles: half-life, permeability, toxicity, synthesis, selectivity, or delivery?

Assay quality

Did it propose the assays that would have falsified or supported the historical mechanism?

Efficiency

How many candidates, model calls, docking calls, runtime minutes, and wet-lab units were needed?

Anti-hindsight

Did the trace avoid drug-name leakage and exact historical answer recall?
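The components above can be combined in the shape of the fitness formula from this section. This is a sketch with several loudly stated assumptions: each recovery score is taken to lie in [0, 1], `gold_recovery` is taken to be the mean of target, scaffold, and developability recovery (the page does not define how it is aggregated), each cost is normalized by a placeholder budget, and gene count stands in for strategy complexity. None of the names or budget values come from DiscoveryLab.

```python
from dataclasses import dataclass


@dataclass
class ReplayScores:  # each field assumed to lie in [0, 1]
    target: float
    mechanism: float
    scaffold: float
    developability: float
    assay_quality: float
    anti_hindsight: float


@dataclass
class ReplayCosts:
    model_calls: int
    docking_calls: int
    runtime_minutes: float
    n_genes: int  # assumed proxy for strategy complexity


def fitness(scores, costs, call_budget=1000, runtime_budget=60.0, max_genes=8):
    """Reward terms minus normalized cost terms, mirroring the formula's signs."""
    # Assumption: gold_recovery averages the three recoveries that the
    # formula does not list as separate terms.
    gold_recovery = (scores.target + scores.scaffold + scores.developability) / 3
    reward = (
        gold_recovery
        + scores.mechanism
        + scores.assay_quality
        + scores.anti_hindsight
    )
    # Assumption: costs are expressed as fractions of a placeholder budget.
    compute_cost = (costs.model_calls + costs.docking_calls) / call_budget
    runtime = costs.runtime_minutes / runtime_budget
    complexity = costs.n_genes / max_genes
    return reward - compute_cost - runtime - complexity
```

Under these assumptions a strategy with perfect recovery and zero cost scores 4.0, and two strategies with identical recovery are separated only by how much compute, runtime, and complexity they needed.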

Epigenome transfer

The champion policy configures the epigenome run without replacing the old workflow.

The transfer layer maps the winning algorithm genome into generator lanes, chromatin targets, assay requirements, control policy, budget notes, calibration metadata, and failure-memory clauses.

Primary benchmark

Romidepsin/FK228-style replay is overweighted because it tests microbial constrained peptide-like natural-product logic for chromatin-state modulation.

Transfer lanes

Sequence-only binder, seed-local lead optimization, structure-first macrocycle, parent control, negative control, and near-miss calibration control lanes are retained as needed.

Run annotations

Candidates receive champion genome ID, policy digest, run ID, time-to-solution proxy, and controls-required metadata for audit and future calibration.
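A policy digest of the kind mentioned above could be a hash over a canonical serialization of the champion genome, so that two runs configured by the same policy carry the same fingerprint. The sketch below assumes SHA-256 over key-sorted JSON. The metadata key names, the `annotate` helper, and the example values are hypothetical.

```python
import hashlib
import json


def policy_digest(genome: dict) -> str:
    """SHA-256 over key-sorted JSON, so key order does not change the digest."""
    canonical = json.dumps(genome, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def annotate(candidate, genome, genome_id, run_id, controls_required,
             time_to_solution_proxy):
    """Return a copy of the candidate with audit metadata attached."""
    annotated = dict(candidate)  # leave the caller's record untouched
    annotated["audit"] = {
        "champion_genome_id": genome_id,
        "policy_digest": policy_digest(genome),
        "run_id": run_id,
        "time_to_solution_proxy": time_to_solution_proxy,
        "controls_required": list(controls_required),
    }
    return annotated


# Hypothetical usage with placeholder values.
genome = {"framing": "mechanism-first", "scaffold": "microbial"}
candidate = {"candidate_id": "cand-001", "lane": "structure-first macrocycle"}
record = annotate(
    candidate,
    genome,
    genome_id="genome-042",
    run_id="run-0001",
    controls_required=["parent", "negative", "near-miss"],
    time_to_solution_proxy=12.5,
)
```

Storing the digest rather than the full genome keeps candidate records small while still letting an auditor confirm which policy produced them.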

Guardrails

Retrospective recovery is not treated as biological proof. Wet-lab return data and mechanism claim boundaries remain mandatory.