DiscoveryLab — Discovery Strategy

Peptiter · Technical specification of the eight-stage peptide discovery workflow, now extended with pathway-mechanism verification and Lean 4 audit receipts

DiscoveryLab is a condition-first peptide discovery application. This document summarises the workflow stages, the artifact each stage produces, the diagram elements shown in the product UI, and the underlying methods or data sources. The current implementation adds an explicit mechanism-verification layer: AI and graph methods propose pathway hypotheses, Swift checks finite graph properties, Lean 4 artifacts provide an audit path, and perturbation or wet-lab evidence tests the assumptions.

NEW Pathway intelligence and formal mechanism checks

Candidate peptides can now carry typed pathway mechanisms: biological nodes, causal edges, intervention-blocked nodes, therapeutic reachability goals, adverse-pathway blockade claims, protected-pathway safety claims, conservation laws, and perturbation evidence anchors.

Implemented artifacts

PathwayMechanismHypothesis: Structured candidate mechanism with proposal source, evidence anchors, causal graph, intervention context, reachability, blockade, safety, and conservation claims.
PathwayMechanismVerifier: Local graph-level verifier that checks reachability, blockade, protected-node safety, and reaction conservation before wet-lab handoff.
LeanVerificationArtifact: Generated Lean 4 module containing node types, baseline and intervention reachability, theorem names, and checksum-bound report identity.
peptiter-lean-verify: External CLI that writes Lean source, invokes Lean or dry-run mode, captures diagnostics, validates checksums, and emits JSON verification receipts.
PerturbationEvidenceRecord: Assay, omics, CRISPR, chemical perturbation, or partner wet-lab evidence attached to mechanism assumptions and scored for support, contradiction, or gaps.
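
The graph-level checks named above can be illustrated with a small sketch. It is written in Python for brevity, whereas the product performs these checks in Swift, and every name in it (reachable, verify_mechanism, the node labels) is hypothetical. It also assumes one particular reading of the claims: blockade means the adverse endpoint is reachable at baseline but not under intervention, and safety means each protected node that is reachable at baseline stays reachable under intervention. Conservation checks are left out.

```python
from collections import deque

def reachable(edges, start, blocked=frozenset()):
    """Return the nodes reachable from start without entering a blocked node."""
    if start in blocked:
        return set()
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def verify_mechanism(edges, source, blocked, therapeutic, adverse, protected):
    """Check three claims by comparing baseline and intervention reachability."""
    baseline = reachable(edges, source)
    treated = reachable(edges, source, frozenset(blocked))
    return {
        # Desired endpoint is still reachable under intervention.
        "reachability": therapeutic in treated,
        # Adverse endpoint was reachable and the intervention cuts it off.
        "blockade": adverse in baseline and adverse not in treated,
        # Assumed meaning: protected nodes reachable at baseline stay reachable.
        "safety": all(n in treated for n in protected if n in baseline),
    }

# Hypothetical toy pathway, not taken from any curated source.
edges = [
    ("signal", "receptor_a"), ("receptor_a", "benefit"),
    ("signal", "receptor_b"), ("receptor_b", "adverse"),
    ("signal", "housekeeping"),
]
report = verify_mechanism(
    edges, source="signal", blocked={"receptor_b"},
    therapeutic="benefit", adverse="adverse", protected=["housekeeping"],
)
print(report)
```

Because the graph is finite, each check is a breadth-first search that always terminates, which is what makes claims of this kind straightforward to re-state in a proof assistant.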

Current assays: mechanismVerification · perturbationEvidence · receptor fit · stability · solubility · aggregation · synthesis · off-target · assay readiness
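
A checksum-bound receipt can be pictured as a digest of the generated Lean source stored next to the verification status. The sketch below is an assumption about the general idea only: the field names, the choice of SHA-256, and both function names are hypothetical and are not the documented output format of peptiter-lean-verify.

```python
import hashlib
import json

def make_receipt(lean_source, report_id, status):
    """Bind a verification status to the exact Lean source that was checked."""
    digest = hashlib.sha256(lean_source.encode("utf-8")).hexdigest()
    # Field names are illustrative; the real receipt schema is not specified here.
    return json.dumps(
        {"report_id": report_id, "sha256": digest, "status": status},
        sort_keys=True,
    )

def receipt_matches(receipt_json, lean_source):
    """Return True only if the source hashes to the digest in the receipt."""
    receipt = json.loads(receipt_json)
    digest = hashlib.sha256(lean_source.encode("utf-8")).hexdigest()
    return receipt["sha256"] == digest

source = "theorem placeholder : True := trivial\n"
receipt = make_receipt(source, report_id="report-001", status="verified")
print(receipt_matches(receipt, source))              # unchanged source
print(receipt_matches(receipt, source + "-- edit"))  # modified source
```

The point of the binding is that a reviewer can detect any edit made to the Lean module after the receipt was issued.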

01 Select condition

Define the therapeutic intent or phenotype as the entry point. The condition is anchored to standard vocabularies so that downstream queries are reproducible.

Diagram elements

Condition node (E66 / M06 / G56): ICD-10 code with surrounding evidence ring; standard vocabulary anchor.
phenotype: Observable clinical traits (HPO terms) used to constrain the search.
intent: Agonism, antagonism, allosteric modulation, or biased signalling.
comorbid: Co-occurring conditions adjusting target prioritisation and selectivity requirements.
contra-ind.: Receptor or pathway interactions to avoid (off-target, safety liabilities).

References: ICD-10 · MeSH · MONDO · HPO

02 Map pathways & receptors

Build a directed graph from the condition through pathway layers to candidate receptors. Receptor nodes carry ligand-class metadata, and the graph carries candidate mechanism claims that can be verified later.

Diagram elements

Condition vertex: Entry point of the directed pathway graph.
Upstream / direct / downstream pathway: Signalling cascade reachable from the condition (e.g. incretin → GLP-1/cAMP → insulin secretion for obesity).
Receptor candidates (4 nodes): Druggable receptors retrieved with ligand-class metadata, e.g. GLP1R, GIPR, GCGR, Y2R.
Convergence node: Receptors with overlapping endogenous ligands.
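
One way to read the convergence node is as a pairwise set intersection over each receptor's endogenous ligands. The sketch below assumes that reading. The function name is hypothetical, and the ligand sets are written by hand for illustration rather than exported from IUPHAR / GtoPdb.

```python
from itertools import combinations

def convergent_pairs(receptor_ligands):
    """Return receptor pairs that share at least one endogenous ligand."""
    pairs = {}
    for first, second in combinations(sorted(receptor_ligands), 2):
        shared = set(receptor_ligands[first]) & set(receptor_ligands[second])
        if shared:
            pairs[(first, second)] = shared
    return pairs

# Hand-written illustrative ligand sets, not a database export.
receptor_ligands = {
    "GLP1R": {"GLP-1", "oxyntomodulin"},
    "GCGR": {"glucagon", "oxyntomodulin"},
    "GIPR": {"GIP"},
    "Y2R": {"PYY", "NPY"},
}
print(convergent_pairs(receptor_ligands))
```

With these inputs only the GCGR and GLP1R pair is reported, because oxyntomodulin appears in both sets.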

References: Reactome · KEGG · IUPHAR / GtoPdb · UniProt · Lean 4

03 BioScout source systems

Evidence-backed mimicry plans drawn from evolved peptide systems; converted into a curated motif library.

Diagram elements

Source organism rows: Program-specific lineages (e.g. Gila monster / amphibian / fish gut for obesity; cone snail / spider / scorpion for ion channels).
Motif library: Curated sequence motifs and pharmacophores indexed from APD3, DRAMP, ConoServer.

References: APD3 · DRAMP · ConoServer

04 Seeded evolution

Start from validated bioactive peptide seeds and evolve local analogs with full ancestry, parent IDs, and operator history.

Diagram elements

Seed node: Validated parent peptide with known bioactivity (e.g. exendin-4, magainin-2, ω-MVIIA).
Branch operator: Substitution / cyclisation / N-methylation under family constraints.
Analog leaves: Candidates with operator history and rationale for ranking.
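
Ancestry tracking of this kind can be sketched as an immutable record that each operator copies and extends. The Analog type, the substitute function, the identifier scheme, and the placeholder seed sequence below are all hypothetical. Only the substitution operator is shown, and family constraints are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Analog:
    analog_id: str
    sequence: str
    parent_id: Optional[str] = None  # None marks a seed
    history: tuple = ()              # ordered operator log from seed to this analog

def substitute(parent, position, residue, new_id):
    """Apply a point substitution (0-based position) and record its provenance."""
    if not 0 <= position < len(parent.sequence):
        raise IndexError("position outside sequence")
    old = parent.sequence[position]
    sequence = parent.sequence[:position] + residue + parent.sequence[position + 1:]
    step = f"sub:{old}{position + 1}{residue}"  # 1-based, e.g. sub:D3E
    return Analog(new_id, sequence, parent.analog_id, parent.history + (step,))

# Placeholder sequence, not a real seed peptide.
seed = Analog("SEED-1", "ACDEFGHIK")
child = substitute(seed, 2, "E", "AN-1")
grandchild = substitute(child, 0, "G", "AN-2")
print(grandchild)
```

Keeping the record frozen means a parent can never be altered by evolving its children, so the stored history stays a faithful path back to the seed.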

References: Pfam / InterPro · Hopp & Woods 1981

05 Visualize peptide–receptor fit

Structure-aware 3D review of binding orientation; RealityKit-based visualisation on macOS / visionOS.

Diagram elements

Receptor scaffold: Schematic of the target (Class A/B GPCR 7TM bundle, cytokine receptor, ion channel).
Peptide backbone: Cα trace of the candidate placed against the receptor pocket.
Cα atoms: Per-residue selectable nodes for side-chain inspection in the 3D view.
Key contact: Predicted polar / hydrophobic interaction with a pocket residue (distance < 4 Å).
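
The 4 Å criterion amounts to a distance filter over pairs of coordinates. The sketch below assumes labelled points with Cartesian coordinates in ångströms. Which atoms are compared, where the coordinates come from, and the function name are assumptions; polar and hydrophobic contacts are not distinguished.

```python
import math

def key_contacts(peptide_atoms, pocket_atoms, cutoff=4.0):
    """Return (peptide label, pocket label, distance) for pairs closer than cutoff."""
    contacts = []
    for p_label, p_xyz in peptide_atoms:
        for r_label, r_xyz in pocket_atoms:
            distance = math.dist(p_xyz, r_xyz)  # Euclidean distance in Å
            if distance < cutoff:
                contacts.append((p_label, r_label, round(distance, 2)))
    return contacts

# Made-up coordinates for illustration only.
peptide = [("pep:1", (0.0, 0.0, 0.0)), ("pep:2", (3.8, 0.0, 0.0))]
pocket = [("pocket:A", (0.0, 3.0, 0.0)), ("pocket:B", (10.0, 0.0, 0.0))]
print(key_contacts(peptide, pocket))
```

The all-pairs loop is adequate for a single peptide against a pocket; a spatial index would only matter for much larger structures.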

References: RCSB PDB · AlphaFold DB · PEP-FOLD

06 In-silico lab assessment + mechanism verification

Multi-criteria scoring with explicit rejection gates applied before wet-lab handoff. Each gate stays conservative until calibrated outcome data exists. Mechanism claims are also checked for pathway reachability, intervention blockade, protected-node safety, conservation, Lean auditability, and perturbation evidence support.
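
A rejection gate can be sketched as a per-criterion minimum that a candidate must meet. The sketch assumes scores normalised so that higher is better, and it treats a missing score as a failure, which is one way to keep gating conservative. The function name, the gate keys, and the threshold values are hypothetical and uncalibrated.

```python
def apply_gates(scores, thresholds):
    """Return (passed, failures); a candidate passes only if every gate is met."""
    failures = []
    for gate, minimum in thresholds.items():
        value = scores.get(gate)
        if value is None:
            # Conservative choice: no evidence counts as a failed gate.
            failures.append(f"{gate}: missing")
        elif value < minimum:
            failures.append(f"{gate}: {value} < {minimum}")
    return (not failures, failures)

# Illustrative thresholds only; real gates would need calibration.
thresholds = {"fit": 0.6, "stab": 0.5, "tox": 0.8}
passed, failures = apply_gates({"fit": 0.72, "stab": 0.41}, thresholds)
print(passed, failures)
```

Returning the list of failed gates rather than a single boolean keeps the rejection reason visible to reviewers.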

Diagram elements

fit: Composite score from docking pose quality and contact-residue agreement.
stab: Predicted resistance to proteolysis and conformational entropy penalty.
sol: CamSol / SolubiS-style intrinsic solubility proxy.
aggr: Zyggregator / Tango-style β-aggregation score.
synth: SPPS coupling-difficulty estimate plus length and modification penalties.
tox: ToxinPred-class classifier; failing candidates are gated out.
mech: Graph-level mechanism verification: desired endpoint reachable, adverse endpoint blocked, safety claim explicit.
Lean receipt: Checksum-bound verification artifact generated for CI or reviewer-facing audit.
perturb: Evidence coverage across wet-lab, omics, CRISPR, chemical perturbation, or partner assay records.
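
To make the mech and Lean receipt rows concrete, the following Lean 4 sketch shows the kind of statement a generated module might contain. It is hand-written for illustration and is not output from peptiter-lean-verify. The node names, the Edge and Reach definitions, and the theorem name are all hypothetical.

```lean
-- Hypothetical three-node pathway; the names are illustrative only.
inductive Node where
  | ligand | receptor | endpoint
  deriving DecidableEq, Repr

-- Encoded causal edges: these are the assumptions of the mechanism.
inductive Edge : Node → Node → Prop where
  | bind   : Edge Node.ligand Node.receptor
  | signal : Edge Node.receptor Node.endpoint

-- Reachability as the reflexive-transitive closure of Edge.
inductive Reach : Node → Node → Prop where
  | refl (a : Node) : Reach a a
  | step {a b c : Node} : Edge a b → Reach b c → Reach a c

-- The claim: the endpoint is reachable from the ligand.
theorem endpoint_reachable : Reach Node.ligand Node.endpoint :=
  Reach.step Edge.bind (Reach.step Edge.signal (Reach.refl Node.endpoint))
```

The theorem holds only relative to the two edges declared in Edge. If either edge is biologically wrong or a relevant edge is missing, the proof still checks, which is why perturbation evidence is scored separately.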

References: ATTRACT / AttractKit · Lean 4 · ToxinPred · CamSol · Tango / Zyggregator

07 Prepare wet-lab batch

Hand off through LabSpace to capability-matched partners using machine-readable batch manifests.

Diagram elements

Batch manifest: Sequences, modifications, and assay plan in a SiLA 2 / Allotrope ADF-compatible format.
Candidate IDs: Per-program identifier prefixes (e.g. DL-GLP-0421, DL-IL17-0438, DL-CAV-0455).
Vials: Synthesised quantity and assay readout; vial height encodes the activity proxy (EC50 / IC50).
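
A machine-readable manifest could be as simple as a validated JSON document. The sketch below uses plain JSON with hypothetical field names and a placeholder sequence; it does not implement SiLA 2 or the Allotrope ADF format, and the real manifest schema is not specified here.

```python
import json

def build_manifest(program, candidates, assays):
    """Serialise a batch to JSON after rejecting duplicate candidate IDs."""
    ids = [candidate["id"] for candidate in candidates]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate candidate id")
    manifest = {
        "program": program,
        "assays": sorted(assays),
        "candidates": candidates,
    }
    return json.dumps(manifest, indent=2, sort_keys=True)

# Placeholder sequence and field names, for illustration only.
candidates = [
    {"id": "DL-GLP-0421", "sequence": "ACDEFGHIK", "modifications": []},
]
manifest_json = build_manifest("obesity", candidates, ["stability", "binding"])
print(manifest_json)
```

Sorting keys and assay names makes the serialised manifest deterministic, so identical batches always produce identical text.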

References: SiLA 2 · Allotrope ADF

08 Receive results & refine

Wet-lab and perturbation feedback update the surrogate model and the encoded mechanism assumptions; re-ranking selects the next batch or the next assay under an acquisition function.

Diagram elements

Closed loop: Bayesian / active-learning loop between wet-lab and re-rank nodes.
Wet-lab node: Assay results (binding affinity, cytotoxicity, stability) returned through LabSpace.
Re-rank node: Surrogate model updated with new evidence; acquisition function selects next batch.
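
The re-rank step can be illustrated with an upper-confidence-bound acquisition function, one common choice in Bayesian optimisation. The document does not say which acquisition function the product uses, so this choice, the function name, the candidate labels, and the numbers are all assumptions. The surrogate model is represented only by its predicted mean and standard deviation per candidate.

```python
def select_next_batch(predictions, batch_size, kappa=1.0):
    """Rank candidates by mean + kappa * std and return the top batch_size IDs."""
    def ucb(item):
        _, (mean, std) = item
        return mean + kappa * std  # larger kappa favours uncertain candidates
    ranked = sorted(predictions.items(), key=ucb, reverse=True)
    return [candidate_id for candidate_id, _ in ranked[:batch_size]]

# Made-up surrogate outputs: candidate -> (predicted mean, predicted std).
predictions = {
    "cand-a": (0.70, 0.05),
    "cand-b": (0.60, 0.30),
    "cand-c": (0.65, 0.02),
}
print(select_next_batch(predictions, batch_size=2))             # explores cand-b
print(select_next_batch(predictions, batch_size=2, kappa=0.0))  # pure exploitation
```

With kappa set to 1 the uncertain cand-b ranks first, while kappa set to 0 falls back to ranking by predicted mean alone, which shows how the parameter trades exploration against exploitation.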

References: Shahriari et al. 2016 (Bayesian Opt.) · Settles 2009 (Active Learning)

Program variants

DiscoveryLab ships with three reference programs that swap condition codes, pathway labels, source organisms, seed peptides, gate thresholds, and candidate ID prefixes throughout the workflow.

Program | Condition | Primary target | Seed | Sources
Obesity | ICD-10 E66 | GLP1R · Class B GPCR | exendin-4 | Gila monster, amphibian, fish gut
Inflammation | ICD-10 M06 | IL-17RA · cytokine receptor | magainin-2 | amphibian, marine invert., human defensin
Neuropathic pain | ICD-10 G56 | Cav2.2 · N-type Ca²⁺ channel | ω-MVIIA | cone snail, spider venom, scorpion

Language and posture

The system uses constrained search, receptor-conditioned design, evidence-gated generation, candidate-family evolution, in-silico triage, pathway-mechanism verification, Lean audit receipts, perturbation evidence scoring, and a wet-lab feedback loop. It does not promise instant discovery, guaranteed binding, clinical efficacy, or fully automated drug discovery. Formal verification means the encoded claims follow from the encoded assumptions; it does not prove that the biology is complete or clinically true. Scoring is intentionally conservative until calibrated outcome data exists; at that point, specific gates are replaced by validated production packages and curated pathway importers.