Methodology · Disease Atlas by Euretos

A probabilistic translational reasoning framework.

Disease Atlas is not a single-score ranker. It is a map of disease biology that produces calibrated, uncertainty-aware predictions for every gene–disease pair across roughly 34,000 human diseases. Causal evidence, tractability, organ-resolved safety, and modality-specific design all run as separate pipelines, with disagreements between them surfaced rather than averaged away.

The page below explains the architecture visually in two minutes. The full methodology paper proves the system in thirty.

60 pages · published 2026 · references throughout
01 — Architecture

Causal inference at cell-type resolution. Not association counting.

Most target-prioritisation tools count gene–disease associations from the literature. Disease Atlas asks a causal question instead: would perturbing this gene change the disease? The platform begins with the disease as a structured biological object, resolved at organ, tissue, and cell-type level, and ranks the full protein-coding genome against that resolved biology.

  • 01 Disease. ~34,000 human diseases with structured anatomy
  • 02 Cell types · genetics · perturbation. 3,500+ cell-type profiles, GWAS, knockdown libraries
  • 03 Causal reasoning. Mendelian randomisation, network propagation, transfer learning
  • 04 Biology · Safety · Tractability · Modality. Four independent pipelines
  • 05 Calibrated integration. Probabilities and uncertainty propagated, never averaged
  • 06 Target assessment. One inspectable card per gene–disease pair
Each layer reads from primary biological data, not from literature claims about it. Every score is reproducible and traceable to source.
02 — Four pipelines

Four questions. Four scores. No single composite.

A target with strong causal biology can be untouchable in clinic if its class carries a defining safety pattern. A target with elegant small-molecule chemistry can have no biological reason to be in the disease. The pipelines run side by side so disagreements stay visible.

Causal ranking
Would perturbing this gene change the disease?
  • Tier 1 causal. Mendelian randomisation + perturbation concordance
  • Tier 2 observational. Network propagation + cell-type expression context
  • Tier 3 transfer. Foundation-model embeddings for gene biology
cis-eQTL instruments from eQTLGen and tissue-matched GTEx are combined with disease GWAS estimates from UK Biobank, FinnGen and the EBI catalog. Perturbation signatures come from LINCS L1000. Literature is never a ranking signal.
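
As a rough illustration of the Tier 1 logic, a Mendelian-randomisation estimate for one gene can be formed from per-variant Wald ratios (GWAS effect divided by eQTL effect) and combined by inverse-variance weighting. The sketch below is the textbook estimator with first-order standard errors, not the platform's implementation; the function names and the toy numbers are hypothetical.

```python
import math


def wald_ratio(beta_eqtl, beta_gwas, se_gwas):
    """Per-variant causal estimate and its first-order standard error."""
    estimate = beta_gwas / beta_eqtl
    se = abs(se_gwas / beta_eqtl)
    return estimate, se


def ivw_estimate(instruments):
    """Fixed-effect inverse-variance-weighted combination of Wald ratios.

    instruments: iterable of (beta_eqtl, beta_gwas, se_gwas) tuples,
    assumed to be independent cis-eQTL variants for a single gene.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for beta_eqtl, beta_gwas, se_gwas in instruments:
        estimate, se = wald_ratio(beta_eqtl, beta_gwas, se_gwas)
        weight = 1.0 / (se * se)
        weighted_sum += weight * estimate
        weight_total += weight
    return weighted_sum / weight_total, math.sqrt(1.0 / weight_total)


# Toy numbers, invented for illustration only.
toy_instruments = [(0.50, 0.10, 0.02), (0.25, 0.05, 0.02)]
effect, effect_se = ivw_estimate(toy_instruments)
z_score = effect / effect_se
```

A large positive or negative `z_score` would suggest that genetically predicted expression shifts disease risk; concordance with a perturbation signature is a separate check that this sketch does not attempt.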
Tractability
Can a molecule of a given class engage the target?
  • Antibody score. Surface, extracellular, post-translational features
  • Small-molecule score. ChEMBL binding, druggable pocket, affinity
  • Recovered population. 88.8% of approved antibody targets HIGH-rated
Built entirely from primary structural and binding-measurement data. Neither score uses literature mentions as an input. ERBB2 sits at rank 2 of 19,154 scored genes for antibody tractability. The score finds the bindable class without being told.
Safety
What adverse-event pattern does this target produce?
  • Per-organ AE prediction. Six organs, three evidence sources
  • Genetic essentiality. pLI, LOEUF, DepMap dependency
  • Expression context. Tau tissue-specificity + expression-risk score
Three axes reported separately. Per-organ scores integrate pharmacovigilance (FAERS, target-attributed), predictive toxicology (DILIrank, ToxCast), and mouse-knockout phenotyping (IMPC, MGI) across liver, heart, kidney, lung, CNS and immune.
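
The Tau tissue-specificity index mentioned above has a standard closed form: the mean, over tissues, of one minus each tissue's expression relative to the maximum. A minimal sketch follows, assuming non-negative expression values on a common scale; the function name is illustrative and the expression-risk score is not reproduced here.

```python
def tau_specificity(expression):
    """Tau index: 0 for uniform expression, 1 for a single-tissue gene.

    expression: list of non-negative expression values, one per tissue.
    """
    n_tissues = len(expression)
    x_max = max(expression)
    if n_tissues < 2 or x_max <= 0:
        raise ValueError("need at least two tissues and a positive maximum")
    return sum(1.0 - x / x_max for x in expression) / (n_tissues - 1)


broad = tau_specificity([5.0, 5.0, 5.0, 5.0])        # housekeeping-like
restricted = tau_specificity([10.0, 0.0, 0.0, 0.0])  # single-tissue
```

A target with Tau near 1 concentrates any on-target liability in few tissues, which is why specificity is read alongside, rather than instead of, the per-organ scores.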
Modality scoring
What kind of molecule is the right vehicle?
  • Antibody · Small molecule. Class-specific design constraints
  • ADC · Bispecific · T-cell engager. Independent design axes
  • Cross-modality discrimination. AUROC 0.880–0.963
ADC scoring uses internalisation kinetics; T-cell-engager scoring rewards the opposite. A generic druggability score cannot make that call. Per-modality scores correctly assign ERBB2 / EGFR / FOLR1 to ADC and TNFRSF17 / MS4A1 / DLL3 to TCE.
Disease Atlas separates translational questions instead of collapsing them into one score. The integration step preserves disagreements; the assessment card surfaces them.
03 — Probabilistic reasoning

Calibrated probability at three evidence boundaries.

Every gene–disease pair carries a probability, not a rank percentile, at three distinct evidence thresholds. Probabilities are constrained to remain hierarchically coherent: a pair cannot be more likely to be approved than to be in development.

P(causal support): does the genetic and perturbation evidence match approved targets?
AUC 0.947 · held-out ~95× enrichment over baseline
P(clinical advancement): does the broader profile resemble late-clinical targets?
AUC 0.917 · held-out ~1,100× enrichment over baseline
P(approval): does the integrated profile resemble approved targets?
AUC 0.885 · held-out ~560× enrichment over baseline

Three probabilities, one inspectable surface.

A target with high causal support and low clinical-advancement probability identifies biology the field has not yet developed. A target that scores well on all three sits in the evidence band where historically successful drug targets have lived.

The dashed band shows the uncertainty interval propagated from the per-disease confidence tier and evidence-source disagreement.

How calibration is measured

The platform models causal support, clinical advancement, and approval as three independent classifiers, each with its own calibration. Raw outputs pass through Platt scaling at the ranking layer; the translational layer applies branch-specific isotonic regression and tracks Brier and Expected Calibration Error after every training run.
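
For readers less familiar with these terms, the sketch below shows one conventional way to compute the Brier score and an equal-width-bin Expected Calibration Error, plus a pool-adjacent-violators fit of the kind isotonic regression performs. The bin count, function names, and toy inputs are assumptions; the source does not state the platform's binning or fitting details.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between confidence and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[index].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += (len(members) / len(probs)) * abs(mean_conf - frac_pos)
    return ece


def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: non-decreasing fitted values, in score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block holds [sum of labels, count]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            label_sum, count = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
    fitted = []
    for label_sum, count in blocks:
        fitted.extend([label_sum / count] * count)
    return fitted
```

Lower is better for both metrics. At the class prevalences reported for the clinical and approval thresholds, a binned ECE is dominated by the lowest bin, so it is best read together with the Brier score rather than alone.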

Threshold              Mean AUC   Mean AP   Class prevalence
Causal support         0.947      0.890     9.4 × 10⁻³
Clinical development   0.917      0.041     3.7 × 10⁻⁵
Approval               0.885      0.0073    1.3 × 10⁻⁵
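
The enrichment figures quoted for each threshold appear to be consistent with mean AP divided by class prevalence, since prevalence is the AP a random ranker would be expected to achieve. That reading is inferred from the numbers rather than stated in the source; the short check below reproduces the arithmetic.

```python
# (mean AP, class prevalence) per threshold, copied from the table above.
rows = {
    "Causal support": (0.890, 9.4e-3),
    "Clinical development": (0.041, 3.7e-5),
    "Approval": (0.0073, 1.3e-5),
}

# Enrichment over a random ranker, under the assumption described above.
enrichment = {name: ap / prevalence for name, (ap, prevalence) in rows.items()}

for name, value in enrichment.items():
    print(f"{name}: about {value:,.0f}x over baseline")
```

The ratios come out near 95, 1,100 and 560, matching the headline figures to the precision quoted.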

The cancer and non-cancer branches are calibrated separately because the underlying prevalence patterns differ systematically. Hierarchical coherence is enforced post-calibration.
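
One simple way to enforce hierarchical coherence after calibration is a running minimum along the evidence chain, so that a narrower outcome never exceeds a broader one. The sketch assumes the chain runs from causal support through clinical advancement to approval; the source states only the approval-versus-development constraint, and the actual enforcement method is not described.

```python
def enforce_coherence(probs_broad_to_narrow):
    """Cap each probability at the smallest value seen earlier in the chain."""
    coherent = []
    ceiling = 1.0
    for p in probs_broad_to_narrow:
        ceiling = min(ceiling, p)
        coherent.append(ceiling)
    return coherent


# Hypothetical calibrated outputs: causal support, clinical advancement, approval.
raw = [0.8, 0.3, 0.5]
adjusted = enforce_coherence(raw)
```

Here the approval probability is capped at the clinical-advancement value, so a pair cannot be more likely to be approved than to be in development. Capping is the most conservative choice; pooling the violating pair would be an alternative.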

04 — Uncertainty

Three sources of uncertainty. Reported separately.

A single confidence number tells the reader nothing about why a prediction is uncertain. Disease Atlas separates the three reasons a probability might sit in the middle of the scale, then exposes them on every assessment card.


Evidence ambiguity

Irreducible noise in the evidence itself. A pair whose calibrated probability sits near the middle carries more ambiguity than one near either tail. This is data uncertainty.

Evidence disagreement

The genetic, network, expression, and transfer layers disagreeing with each other. Low when they converge, high when they diverge. The marker a single composite would hide.

Prediction instability

Sensitivity to the specific training population. A prediction that holds across held-out folds is more trustworthy than one that swings. This is model uncertainty.
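
To make the three components concrete, the sketch below uses common stand-ins: binary entropy of the calibrated probability for ambiguity, the spread of per-layer scores for disagreement, and the spread of per-fold predictions for instability. These particular formulas, the function names, and the example layer scores are assumptions chosen for illustration; the source does not specify how each component is computed.

```python
import math
from statistics import pstdev


def evidence_ambiguity(p):
    """Binary entropy in bits: 0 at either tail, 1 at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def evidence_disagreement(layer_scores):
    """Population standard deviation across evidence-layer scores."""
    return pstdev(list(layer_scores.values()))


def prediction_instability(fold_predictions):
    """Population standard deviation of one pair's prediction across folds."""
    return pstdev(list(fold_predictions))


# Hypothetical per-layer scores for one gene-disease pair.
layers = {"genetic": 0.9, "network": 0.9, "expression": 0.9, "transfer": 0.9}
report = {
    "ambiguity": evidence_ambiguity(0.5),
    "disagreement": evidence_disagreement(layers),
    "instability": prediction_instability([0.2, 0.4]),
}
```

Keeping the three values as separate fields, rather than summing them, mirrors the point made above: a mid-scale probability with converging layers is a different situation from one where the layers diverge.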

05 — Evidence depth

Per-disease confidence reflects what the disease has, not what the model wants.

Roughly 15,000 diseases have rich genetic, perturbation, and single-cell evidence. The remaining long tail does not. The architecture is the same; the confidence with which any individual call should be taken is not.

  • Tier A · Direct causal genetics. Mendelian randomisation, fine-mapped GWAS loci, curated direct association. Probability supported by primary causal evidence and validated against held-out approved targets.
  • Tier B · Curated association + propagation. Direct associations from Open Targets sources plus ontology propagation from nearby Tier-A diseases. Probability supported by indirect causal evidence; calibration confirmed.
  • Tier C · Expression + literature seeds. HPO gene annotations, organ-level expression, and co-mention literature feeding the candidate set. Probability reflects an integrated profile, bounded by association depth.
  • Tier D · Cell-state fallback inference. Cell-type-resolved expression and organ–cell-type expression as the final fallback for the few percent of diseases where direct evidence does not yet exist. Useful as a hypothesis-generation prior, not a translational claim.
Confidence reflects evidence depth, not only model certainty. The tier is computed at seed time and carried through every subsequent layer.
A reader looking at a long-tail disease can see at a glance whether the call rests on primary genetic evidence or on a fallback in the seeding cascade. Tier C/D probabilities are deliberately conservative. A confident call against a thin evidence base is the failure mode the architecture is designed to avoid.
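
The seeding cascade can be pictured as a first-match rule over the evidence a disease actually has. The sketch below is a simplified reading of the four tiers; the flag names are hypothetical, and the real criteria (for example, what counts as a nearby Tier-A disease) are not given in the source.

```python
def assign_tier(evidence):
    """Return the first tier whose evidence is present for a disease.

    evidence: dict of boolean flags; all key names are hypothetical.
    """
    if evidence.get("direct_causal_genetics"):
        return "A"
    if evidence.get("curated_association_or_propagation"):
        return "B"
    if evidence.get("expression_or_literature_seed"):
        return "C"
    return "D"  # cell-state fallback when nothing else is available


def tag_prediction(probability, evidence):
    """Attach the seed-time tier so later layers can carry it forward."""
    return {"probability": probability, "tier": assign_tier(evidence)}


rich = tag_prediction(0.72, {"direct_causal_genetics": True})
sparse = tag_prediction(0.12, {})
```

Because the tier travels with the prediction, a downstream layer can widen the uncertainty interval for Tier C/D calls without re-deriving what evidence the disease had.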
06 — Cell-type resolution

Cell type is a ranking axis, not a filter.

Genes are ranked per cell type per disease, not shown alongside the cell types they happen to express in. The candidate that emerges at cell-type resolution is often not the candidate a literature-weighted ranking would put first.

Figure: the Disease Atlas cell-type score across the integrated single-cell atlases. Top hits are anatomically coherent with the underlying biology. The platform is not told this is a gut disease; the ranking reads the answer from the data.

07 — Validation

Top-decile targets advance to trial success at nearly twice the rate of bottom-decile targets.

Measured against the AACT clinical-trial database: 2,587 completed Phase 2 and Phase 3 trials, leakage-clean cohort, primary-endpoint success defined as p < 0.05 with effect direction matching the trial's stated hypothesis.

91.5% · Top decile of platform ranking
50.0% · Bottom decile of platform ranking
1.8× lift · DeLong paired-AUC p = 0.006 · across 2,587 trials
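
The decile comparison reduces to sorting trials by platform score and comparing success rates at the two ends. A minimal sketch, assuming one score and one binary primary-endpoint outcome per trial; the function name and toy data are illustrative, and the DeLong test itself is not reproduced.

```python
def decile_success_rates(scores, outcomes):
    """Return (top-decile rate, bottom-decile rate) of trial success."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 10)  # trials per decile
    top = [outcome for _, outcome in ranked[:k]]
    bottom = [outcome for _, outcome in ranked[-k:]]
    return sum(top) / k, sum(bottom) / k


# Lift implied by the two rates reported above.
reported_lift = 0.915 / 0.500

# Toy cohort of 20 trials, invented for illustration only.
toy_scores = list(range(20))
toy_outcomes = [1, 0] + [1] * 18
top_rate, bottom_rate = decile_success_rates(toy_scores, toy_outcomes)
```

With the reported rates the ratio is 1.83, which is the "nearly twice" quoted in the headline.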

What this measures.

The cohort is restricted to trials that started before 2021 and whose targets could be mapped cleanly via a five-stage drug-to-target resolution. Every per-pipeline score contributes a signal that discriminates outcomes at Mann-Whitney p < 10⁻³; the integrated four-pipeline combination outperforms the strongest single score at DeLong p = 0.006.

The signal concentrates where the biology says it should: AUC 0.58–0.68 in Phase 1, 0.54–0.64 in Phase 2, and near 0.5 by Phase 3+ where failures become more idiosyncratic.

99.4% · Approved targets in top 1% of per-disease ranking
0.994 · Mean within-disease pairwise AUC across diseases
88.8% · Approved antibody targets HIGH-rated by tractability
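
A within-disease pairwise AUC can be read as the fraction of (approved, non-approved) gene pairs that the ranking orders correctly inside one disease, then averaged over diseases. This is the Mann-Whitney U statistic divided by the number of such pairs. The sketch below assumes ties count as half and that diseases lacking either class are skipped; both choices, and the function names, are assumptions.

```python
def pairwise_auc(scores, labels):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        return None  # AUC is undefined without both classes
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_within_disease_auc(by_disease):
    """Average the per-disease AUC over diseases where it is defined.

    by_disease: dict mapping disease id -> (scores, labels).
    """
    aucs = [pairwise_auc(scores, labels) for scores, labels in by_disease.values()]
    defined = [auc for auc in aucs if auc is not None]
    return sum(defined) / len(defined)
```

Averaging per disease, instead of pooling all pairs, stops a handful of well-studied diseases with many approved targets from dominating the figure.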
Caveats and what this does not establish

The recovery test asks whether the model finds correct answers in a space where the correct answers are already known. It does not establish prospective novel-target discovery at statistical scale, which the field cannot run at the timescale of any methodology paper.

The cohort base success rate is 66.1%, considerably higher than the field-wide rate, because pharma already filters target candidates on tractability and biology before initiating trials. AUC magnitudes are therefore conservative against balanced-cohort benchmarks.

08 — Example assessment

What your scientists actually see.

One target. One disease. The four pipelines side by side, the disagreements surfaced, the confidence band exposed, and the cell-type context attached. Below is the assessment card for TNFSF15 in ulcerative colitis: the top-ranked target of 30,889 scored genes, currently in Phase 3 development.

TNFSF15 in ulcerative colitis · 30,889 scored genes · MONDO:0005101 · pipeline run 2026
Rank 1 · Overall priority 90.86 / 100
  • Causal biology: 99.94. Top 0.06% on biology axis · MR (genetic) + perturbation
  • Translational: 99.998. 3 indications in clinical development · strong GWAS support
  • Tractability: 77. Antibody-tractable extracellular ligand · 3 antibody programmes
  • Safety: 62.5. Moderate concerns · no critical flags · essentiality tolerable

Structured rationale

Strong causal evidence (top 0.06% for this disease). Strongest support from genetic association: multiple genome-wide-significant loci, replicated in independent cohorts, with Mendelian-randomisation evidence linking expression to disease risk. Network propagation and cell-type expression context corroborate without driving the score. Foundation-model embedding contributes additional support.

Direct causal inference (MR). Per-disease confidence tier: A.

Modality fit

  • Antibody: 84.3
  • Bispecific: 71.4
  • T-cell engager: 63.9
  • Small molecule: 31.8

Secreted TNF-superfamily ligand. Architecture lands on antibody as the design the biology supports; comparison auditable.

Combined confidence band: HIGH. Pair-level stability 0.998 · overlay completeness full · evidence-tier A.

TNFSF15 (the TL1A ligand) sits at rank 1 in both ulcerative colitis and Crohn's disease. The assessment is consistent with the front-running biology the development community is currently pursuing, including anti-TL1A programmes from multiple sponsors.

09 — Transparency & validation

Every score is reproducible, traceable to source, and produced without large-language-model reasoning at any stage.

Deterministic provenance

Every claim traces back to the primary record that produced it.

Held-out validation

Disease-stratified five-fold cross-validation with leave-seed-out masking.
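
One reading of disease-stratified folds is that every gene–disease pair for a given disease lands in the same fold, so no disease appears on both sides of a split. The sketch below assigns folds by hashing the disease identifier, which keeps the assignment deterministic across runs. The hashing scheme, function names, and placeholder identifiers are assumptions, and leave-seed-out masking is not shown.

```python
import hashlib


def fold_of(disease_id, n_folds=5):
    """Deterministic fold index derived from the disease identifier."""
    digest = hashlib.sha256(disease_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def split_by_disease(pairs, held_out_fold, n_folds=5):
    """Split (gene, disease) pairs so that diseases never straddle the split."""
    train, test = [], []
    for gene, disease in pairs:
        target = test if fold_of(disease, n_folds) == held_out_fold else train
        target.append((gene, disease))
    return train, test


# Placeholder pairs; only TNFSF15 and MONDO:0005101 come from this page.
pairs = [
    ("TNFSF15", "MONDO:0005101"),
    ("GENE_X", "MONDO:0005101"),
    ("GENE_Y", "DISEASE:PLACEHOLDER"),
]
train, test = split_by_disease(pairs, held_out_fold=fold_of("MONDO:0005101"))
```

Grouping by disease means the held-out score for a pair cannot benefit from other pairs of the same disease seen in training, which is the leakage route this kind of split is meant to close.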

Leakage controls

The training pipeline aborts on held-out divergence beyond a strict tolerance.

Calibrated probabilities

Platt and isotonic calibration; Brier and ECE tracked per training run.

Uncertainty propagated

The signal carries forward through every layer; nothing important is averaged away.

Read the methodology. Run a target through the platform.

The whitepaper covers every pipeline, every reference, and the limitations explicitly. If you would like to see what the assessment view produces for one target in your therapeutic area, send us the gene and we will walk through it across the diseases where it scores.
