Problem framing
Disease-first, mechanism-first, target-sequence-first, natural-analog-first, structure-first, or ensemble replay.
This page describes the AlphaEvolve-style layer added to DiscoveryLab: typed discovery-strategy genomes, clinical replay benchmarks, genetic operators, correctness scoring, complexity pressure, and epigenome transfer.
The genome is a typed, executable strategy graph. Running it produces candidate peptides, controls, assays, budgets, mechanism claims, and a trace that can be benchmarked against known clinical successes.
Causal graph, receptor, enzyme, chromatin regulator, protein interface, hormone axis, or phenotype-first policy.
Endogenous, venom, microbial, clinical analog, natural product, de novo, or hybrid peptide families.
Local mutagenesis, target-conditioned generation, structure-first design, Bayesian optimization, diffusion, or macrocycle enumeration.
Binding, mechanism, stability, solubility, permeability, toxicity, synthesis, novelty, and calibration weights.
Parent, scrambled, near-miss, negative-target, known-positive, and known-negative controls kept in the denominator.
Expected improvement, uncertainty sampling, mechanism falsification, Pareto expansion, or calibration repair.
Model calls, candidate count, docking calls, runtime, wet-lab cost, receipts, failure modes, and anti-hindsight guardrails.
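The gene families listed above can be pictured as fields of a typed record. The following is a minimal sketch, not DiscoveryLab's actual schema: the class names, field names, and validation rules are assumptions made for illustration, and only two gene families are spelled out as enums.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProblemFraming(Enum):
    DISEASE_FIRST = "disease-first"
    MECHANISM_FIRST = "mechanism-first"
    TARGET_SEQUENCE_FIRST = "target-sequence-first"
    NATURAL_ANALOG_FIRST = "natural-analog-first"
    STRUCTURE_FIRST = "structure-first"
    ENSEMBLE_REPLAY = "ensemble-replay"


class SourceFamily(Enum):
    ENDOGENOUS = "endogenous"
    VENOM = "venom"
    MICROBIAL = "microbial"
    CLINICAL_ANALOG = "clinical-analog"
    NATURAL_PRODUCT = "natural-product"
    DE_NOVO = "de-novo"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class StrategyGenome:
    """Hypothetical typed genome; field names are illustrative assumptions."""
    framing: ProblemFraming
    source_family: SourceFamily
    generator: str              # e.g. "local-mutagenesis", "diffusion"
    acquisition: str            # e.g. "expected-improvement"
    controls: tuple[str, ...]   # control lanes kept in the denominator
    scoring_weights: dict[str, float] = field(default_factory=dict)

    def __post_init__(self):
        # Assumed invariants: at least one control lane, no negative weights.
        if not self.controls:
            raise ValueError("a genome must declare at least one control lane")
        if any(w < 0 for w in self.scoring_weights.values()):
            raise ValueError("scoring weights must be non-negative")


genome = StrategyGenome(
    framing=ProblemFraming.NATURAL_ANALOG_FIRST,
    source_family=SourceFamily.VENOM,
    generator="local-mutagenesis",
    acquisition="expected-improvement",
    controls=("parent", "scrambled"),
    scoring_weights={"binding": 0.4, "stability": 0.3, "toxicity": 0.3},
)
```

Typing the framing and source genes as enums means a mutation operator can only produce values from the allowed set, and the constructor rejects a genome that drops its controls.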
The benchmark suite scores whether an algorithm reconstructs the right kind of discovery path. It does not reward simply naming a famous drug, and it penalizes leakage through an anti-hindsight check.
Physiological peptide replacement and purification.
Endogenous hormone optimization for half-life, protease resistance, and receptor activity.
Evolved natural analog mining for stable receptor pharmacology.
Constrained venom peptide mining for ion-channel modulation.
Mechanism-derived peptide blockade of a transient protein interface.
Natural anticoagulant template redesign and simplification.
Microbial constrained peptide-like natural-product logic for chromatin-state modulation.
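One way to read the anti-hindsight check is as a scan of the strategy trace for terms that would indicate recall of the historical answer. The sketch below assumes a simple whole-word match and a fixed penalty per leaked term; the `ReplayTask` fields, the penalty of 0.5, and the example trace strings are illustrative assumptions, not the platform's actual check.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayTask:
    task_id: str
    public_brief: str                  # what the strategy is allowed to see
    gold_archetype: str                # private: held by the evaluator only
    forbidden_terms: tuple[str, ...]   # private: names that would indicate recall


def leaked_terms(trace: str, task: ReplayTask) -> list[str]:
    """Return forbidden terms found in the trace as whole words, ignoring case."""
    hits = []
    for term in task.forbidden_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, trace, flags=re.IGNORECASE):
            hits.append(term)
    return hits


def anti_hindsight_score(trace: str, task: ReplayTask) -> float:
    """1.0 for a clean trace, minus an assumed 0.5 per leaked term, floored at 0.0."""
    return max(0.0, 1.0 - 0.5 * len(leaked_terms(trace, task)))


task = ReplayTask(
    task_id="chromatin-replay",  # hypothetical identifier
    public_brief="Modulate chromatin state with a constrained peptide-like agent.",
    gold_archetype="microbial constrained peptide-like natural product",
    forbidden_terms=("romidepsin", "FK228"),
)

clean = "Mine microbial constrained natural products for chromatin-state modulation."
leaky = "This resembles romidepsin (FK228)."
print(anti_hindsight_score(clean, task), anti_hindsight_score(leaky, task))
```

A string match like this only catches name leakage; it would not catch a trace that reproduces the exact historical answer without naming it.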
Each generation is scored by private gold recovery, assay usefulness, anti-hindsight behavior, candidate count, model calls, docking calls, runtime, wet-lab cost, and strategy complexity.
Seed a diverse population of typed discovery-strategy genomes.
Run each genome against clinical-gold historical tasks with gold answers hidden from the phenotype producer.
Compare outputs to the private gold mechanism, target class, scaffold, developability solution, and validation assays.
Keep elites, cross over strong policies, mutate genes, and preserve strategy diversity.
Apply the champion to epigenome discovery without deleting the existing peptide-generation workflow.
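The loop above can be sketched as a small genetic algorithm. This is an illustration under stated assumptions: genomes are reduced to plain dictionaries over three genes, the `fitness` callable stands in for the hidden-gold replay and comparison steps, and the uniform crossover, 0.2 mutation rate, and duplicate-rejection rule are choices made here rather than documented behavior.

```python
import random

# Reduced gene set for illustration; the real genome has more genes.
GENE_CHOICES = {
    "framing": ["disease-first", "mechanism-first", "structure-first"],
    "source": ["endogenous", "venom", "microbial", "de-novo"],
    "generator": ["local-mutagenesis", "diffusion", "macrocycle-enumeration"],
}


def random_genome(rng):
    return {gene: rng.choice(options) for gene, options in GENE_CHOICES.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is taken from one parent at random.
    return {gene: rng.choice([a[gene], b[gene]]) for gene in GENE_CHOICES}


def mutate(genome, rng, rate=0.2):
    # Each gene is resampled from its allowed values with probability `rate`.
    return {
        gene: rng.choice(GENE_CHOICES[gene]) if rng.random() < rate else value
        for gene, value in genome.items()
    }


def evolve(fitness, generations=10, pop_size=12, n_elite=3, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:n_elite]  # elites survive unchanged
        children, attempts = [], 0
        while len(elites) + len(children) < pop_size:
            a, b = rng.sample(elites, 2)
            child = mutate(crossover(a, b, rng), rng)
            attempts += 1
            # Diversity pressure: prefer genomes not already in the population,
            # but accept duplicates after many attempts so the loop terminates.
            is_new = child not in elites and child not in children
            if is_new or attempts > 50 * pop_size:
                children.append(child)
        population = elites + children
    return max(population, key=fitness)


def toy_fitness(genome):
    # Stand-in for benchmark scoring: counts genes matching an arbitrary target.
    target = {"framing": "mechanism-first", "source": "microbial", "generator": "diffusion"}
    return sum(genome[g] == target[g] for g in target)


champion = evolve(toy_fitness, seed=0)
```

Because the elites are carried over unchanged, the best fitness in the population never decreases from one generation to the next.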
The evaluator compares the phenotype of a discovery strategy with a hidden historical gold standard, then subtracts penalties for compute, runtime, over-large batches, and complexity.
fitness = gold_recovery + mechanism + assay_quality + anti_hindsight - compute_cost - runtime - complexity
The champion is no longer scored by clinical-gold recall alone. A controller-aware layer compiles each strategy into a real peptide-GA run, designs the peptide arm of a set of combination trials, and blends fitness with the verified-controller yield, the fraction of those trials whose GA-designed arm completes a Lean-verifiable, PK-robust multi-modal controller:
champion = argmax [(1 − w)·benchmark + w·verified_controller_yield]
So the meta-search optimizes for strategies that produce provably stable cross-domain combinations, and the champion strategy drives the live peptide GA.
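As a worked illustration of the two formulas above, the sketch below computes the fitness sum and the blended champion selection. It assumes every term is already normalized to a comparable scale and that w lies in [0, 1]; neither the value of w nor the example numbers come from the source.

```python
def fitness(gold_recovery, mechanism, assay_quality, anti_hindsight,
            compute_cost, runtime, complexity):
    # Rewards are added, cost and complexity pressures are subtracted.
    return (gold_recovery + mechanism + assay_quality + anti_hindsight
            - compute_cost - runtime - complexity)


def blended(benchmark, verified_controller_yield, w):
    # Convex blend of benchmark score and verified-controller yield.
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * benchmark + w * verified_controller_yield


def pick_champion(strategies, w):
    # strategies: mapping of name -> (benchmark, verified_controller_yield)
    return max(strategies, key=lambda name: blended(*strategies[name], w))


# Illustrative numbers only.
strategies = {"A": (0.9, 0.2), "B": (0.7, 0.8)}
print(pick_champion(strategies, w=0.0))  # benchmark alone decides
print(pick_champion(strategies, w=0.5))  # yield now contributes half the score
```

With w = 0 the strategy with the higher benchmark score wins; at w = 0.5 strategy B's blended score (0.75) overtakes strategy A's (0.55), which shows how the yield term can change the champion.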
Did the algorithm infer the right target class?
Did it recover the intervention archetype rather than just a binding story?
Did it select the historical success mode: endogenous, venom, microbial, natural product, hybrid, or de novo?
Did it identify the real obstacles: half-life, permeability, toxicity, synthesis, selectivity, delivery?
Did it propose the assays that would have falsified or supported the historical mechanism?
How many candidates, model calls, docking calls, runtime minutes, and wet-lab units were needed?
Did the trace avoid drug-name leakage and exact historical answer recall?
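The first five questions above lend themselves to a per-criterion rubric. The sketch below assumes exact matching for the categorical criteria and set recall for obstacles and assays; the dictionary keys and example values are hypothetical, and the cost and leakage questions are left out because they are scored from run metadata and the trace rather than from the phenotype.

```python
def recall(predicted, gold):
    """Fraction of gold items that the strategy also proposed."""
    gold = set(gold)
    return len(gold & set(predicted)) / len(gold) if gold else 1.0


def rubric(phenotype, gold):
    # One score in [0, 1] per criterion; aggregation is left to the caller.
    return {
        "target_class": float(phenotype["target_class"] == gold["target_class"]),
        "archetype": float(phenotype["archetype"] == gold["archetype"]),
        "source_mode": float(phenotype["source_mode"] == gold["source_mode"]),
        "obstacles": recall(phenotype["obstacles"], gold["obstacles"]),
        "assays": recall(phenotype["assays"], gold["assays"]),
    }


# Hypothetical gold record and strategy output, for illustration only.
gold = {
    "target_class": "receptor",
    "archetype": "endogenous hormone optimization",
    "source_mode": "endogenous",
    "obstacles": ["half-life", "permeability"],
    "assays": ["receptor activity", "protease resistance"],
}
phenotype = {
    "target_class": "receptor",
    "archetype": "endogenous hormone optimization",
    "source_mode": "venom",
    "obstacles": ["half-life", "toxicity"],
    "assays": ["receptor activity", "protease resistance", "solubility"],
}
scores = rubric(phenotype, gold)
```

Keeping the criteria separate, rather than collapsing them into one number immediately, makes it visible when a strategy names the right target class but picks the wrong success mode.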
The transfer layer maps the winning algorithm genome into generator lanes, chromatin targets, assay requirements, control policy, budget notes, calibration metadata, and failure-memory clauses.
Romidepsin/FK228-style replay is overweighted because it tests microbial constrained peptide-like natural-product logic for chromatin-state modulation.
Sequence-only binder, seed-local lead optimization, structure-first macrocycle, parent control, negative control, and near-miss calibration control lanes are retained as needed.
Candidates receive champion genome ID, policy digest, run ID, time-to-solution proxy, and controls-required metadata for audit and future calibration.
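A policy digest of this kind is commonly computed as a hash over a canonical serialization of the genome. The sketch below assumes SHA-256 over sorted-key JSON; the function names, metadata keys, and the placeholder sequence and identifiers are illustrative, not the platform's actual format.

```python
import hashlib
import json


def policy_digest(genome: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not change the digest."""
    canonical = json.dumps(genome, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def tag_candidate(sequence, genome, genome_id, run_id, time_to_solution, controls_required):
    # Attach audit metadata to one candidate; key names are assumptions.
    return {
        "sequence": sequence,
        "champion_genome_id": genome_id,
        "policy_digest": policy_digest(genome),
        "run_id": run_id,
        "time_to_solution_proxy": time_to_solution,
        "controls_required": sorted(controls_required),
    }


genome = {"framing": "mechanism-first", "source": "microbial", "generator": "diffusion"}
candidate = tag_candidate(
    sequence="ACDEFGHIK",        # placeholder sequence, not a real candidate
    genome=genome,
    genome_id="genome-0001",     # hypothetical ID
    run_id="run-0001",           # hypothetical ID
    time_to_solution=3,
    controls_required={"scrambled", "parent"},
)
```

Hashing a canonical encoding lets an auditor confirm later that two candidates came from the same policy, even if the genome was stored with its keys in a different order.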
Retrospective recovery is not treated as biological proof. Wet-lab return data and mechanism claim boundaries remain mandatory.