Last Wednesday we drew the distinction between two meanings of "causal evidence" in target prioritisation: causal claims extracted from text, and quantitative causal inference run on primary biological data. Disease Atlas does the second.
On Friday we walked through the five evidence layers the platform integrates: Mendelian randomisation, perturbation biology, network propagation, cell-type expression, and foundation-model embedding. Each one runs on primary biological data. Each one answers a different question.
This post gives the deeper picture: the full architecture, the safety stack the LinkedIn posts did not have room for, the cell-type-resolution layer that everything else sits on, what we have calibrated, what we have not yet validated, and the architectural choices we made about what stays out of the inference path entirely.
The depth is set so a reviewer can decide for themselves whether the architecture holds up to the work they would put it to.
Disease Atlas is a computational platform that ranks the protein-coding genome against each of approximately 34,000 disease and disorder entries, with each per-disease ranking resolved across a layout of approximately 2,300 cell types. It is built from primary biological data (GWAS summary statistics, perturbation magnitudes, single-cell expression, protein-structure data, drug-target binding measurements), not from claims extracted from published papers.
It produces three outputs that researchers actually use: a ranked candidate list per disease, with evidence decomposed into biology, tractability, safety, and competition; a per-disease cell-type map showing which cell types carry the strongest causal signal; and a target-disease detail view that surfaces a complete, traceable rationale for any candidate-disease pair.
The Euretos AI Platform described in Vlietstra et al. (2018, J Biomed Semantics 9:23), and named in the J&J 2019 acne patent (US 12,051,491 B2), was Euretos's first-generation target-discovery pipeline. The current platform is its successor and carries three architectural decisions forward.
The integrated knowledge graph as foundation. The 250-plus source count of the first generation has expanded to roughly 275 public databases. Every claim in the graph still retains its predicate and provenance back to source, which is the original architectural decision that makes everything downstream auditable.
Cell-type expression deconvolution as a first-class capability. The first-generation cell-type expression library has scaled to 3,500-plus single-cell expression profiles across 110 sub-organs, anchored to the modern atlases (Tabula Sapiens, CellxGene, Human Cell Atlas).
Output structure of ranked candidates with mechanistic context. The patent's "list of targets with associated probabilities" is the direct ancestor of the current candidate-ranking view.
What follows is everything that was added.
The evidence architecture is the largest break from the first-generation pipeline. The first generation aggregated evidence from graph proximity, perturbation evidence, and literature-derived signal, then reported probability-style scores. The current platform replaces that with an explicit three-tier architecture in which literature does not appear as a ranking signal at any tier.
Tier 1 is causal evidence on primary biology. Two channels.
The first is Mendelian randomisation on GWAS effect sizes from UK Biobank, FinnGen, and the EBI GWAS Catalog, with LD clumping, Cochran's Q heterogeneity testing, Steiger directionality testing, and F-statistic instrument filtering. The exposure is gene expression, instrumented by eQTLs; the outcome is disease risk. Inherited genetic variation acts as the natural experiment that lets the inference approach a causal question rather than an associational one.
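To make the Tier 1 statistics concrete, here is a minimal sketch of an inverse-variance-weighted (IVW) MR estimate with an F-statistic instrument filter and Cochran's Q. It assumes the instruments have already been LD-clumped, uses the first-order Wald-ratio standard error, leaves out the Steiger test, and takes F ≥ 10 as the weak-instrument cut-off (a common convention, not a documented platform setting). `Instrument` and `ivw_mr` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Instrument:
    beta_exposure: float  # SNP effect on gene expression (eQTL)
    se_exposure: float
    beta_outcome: float   # SNP effect on disease risk (GWAS, e.g. log-odds)
    se_outcome: float

def ivw_mr(instruments, min_f=10.0):
    """IVW Mendelian-randomisation estimate with Cochran's Q.

    Assumes the instruments are independent (already LD-clumped)."""
    # Weak-instrument filter: per-SNP F approximated as (beta / se)^2 on the exposure
    strong = [g for g in instruments if (g.beta_exposure / g.se_exposure) ** 2 >= min_f]
    if len(strong) < 2:
        raise ValueError("need at least two strong instruments")
    # Wald ratio per SNP and its first-order standard error
    ratios = [g.beta_outcome / g.beta_exposure for g in strong]
    ses = [g.se_outcome / abs(g.beta_exposure) for g in strong]
    weights = [1.0 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled estimate (df = k - 1)
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return {"estimate": estimate, "se": se, "q": q,
            "df": len(strong) - 1, "n_instruments": len(strong)}
```

A Q that is large relative to its degrees of freedom signals disagreement between instruments, for example from pleiotropy, which is why it is reported next to the estimate rather than folded into it.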
The second is perturbation concordance against disease-specific transcriptomic signatures, drawn from LINCS L1000, DepMap, and curated perturbation atlases. The question this channel answers: when this gene is knocked down or activated in disease-relevant cells, does the resulting transcriptomic shift point toward or away from the disease state?
Tier 2 is observational evidence on network and expression context. Two channels.
The first is network propagation over the protein-protein interaction graph from a seed set of disease-implicated genes, producing a propagated score per gene that captures pathway proximity. The second is cell-type expression context derived from the single-cell atlases, which scores each gene by how strongly it is expressed in the cell types where the disease operates.
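Network propagation is often implemented as a random walk with restart. A minimal sketch, assuming an unweighted, undirected PPI edge list and a restart probability of 0.3 (the post gives neither the algorithm variant nor its parameters), looks like this. `propagate` is a hypothetical name.

```python
def propagate(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart over an undirected graph.

    At each step the walker either jumps back to a seed gene (probability
    `restart`) or moves to a uniformly chosen neighbour. Returns gene -> score."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seeds = {s for s in seeds if s in neighbours}
    if not seeds:
        raise ValueError("no seed genes present in the network")
    # Restart distribution: uniform over the seed genes
    start = {n: 0.0 for n in neighbours}
    for s in seeds:
        start[s] = 1.0 / len(seeds)
    p = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in neighbours}
        for n, nbrs in neighbours.items():
            # Each gene passes its non-restart mass evenly to its neighbours
            share = (1.0 - restart) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in neighbours)
        p = nxt
        if delta < tol:
            break
    return p
```

Scores decay with graph distance from the seeds, which is the pathway proximity this channel is meant to capture. In this plain form, highly connected hub genes tend to accumulate score, which is one reason degree-corrected variants are common.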
Tier 3 is transfer evidence for genes that lack direct signal. A single channel: foundation-model embeddings over protein sequence and biological-graph context, used to transfer evidence to genes for which Tier 1 and Tier 2 do not produce a usable signal.
The three tiers combine into a per-disease ranked list. The integration uses explicit causal weighting: Mendelian-randomisation evidence over observational expression, perturbation evidence over literature mention, convergent signal over single-source. Not a democratic average.
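A minimal sketch of what "explicit causal weighting" can mean in practice. The post states the ordering (causal over observational, convergent over single-source) but does not publish weights, so every number and name below (`CHANNEL_WEIGHTS`, `CONVERGENCE_BONUS`, `STRONG_SIGNAL`, `combine`) is a hypothetical placeholder.

```python
# Hypothetical weights: only the ordering reflects the post, not the values.
CHANNEL_WEIGHTS = {
    "mr": 3.0,            # Tier 1: Mendelian randomisation
    "perturbation": 2.0,  # Tier 1: perturbation concordance
    "network": 1.0,       # Tier 2: network propagation
    "cell_type": 1.0,     # Tier 2: cell-type expression context
    "embedding": 0.5,     # Tier 3: foundation-model transfer
}
CONVERGENCE_BONUS = 0.1  # hypothetical reward per additional strong channel
STRONG_SIGNAL = 0.5      # hypothetical threshold for a channel to count as strong

def combine(channel_scores):
    """Combine per-channel scores in [0, 1]; channels without signal are omitted."""
    # Divide by the weight of ALL channels, so missing causal evidence costs score
    total_weight = sum(CHANNEL_WEIGHTS.values())
    base = sum(CHANNEL_WEIGHTS[c] * s for c, s in channel_scores.items()) / total_weight
    # Convergence: reward agreement across independent strong channels
    n_strong = sum(1 for s in channel_scores.values() if s >= STRONG_SIGNAL)
    return base * (1.0 + CONVERGENCE_BONUS * max(n_strong - 1, 0))
```

Normalising by the total weight of every channel, rather than only the channels present, is what stops a gene with two strong observational signals from outranking one with a single strong causal signal; a plain average over available channels would score them the same.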
Literature is used only to seed the candidate gene set for under-characterised diseases, through an eight-tier seeding cascade that selects which genes enter the ranking. Seeding is independent of scoring: the same gene is scored identically whether it entered via a literature seed or a primary-data seed. The structural advantage that literature density used to confer on well-studied genes is no longer in the architecture.
The safety layer of the first-generation pipeline was a single composite score. The current platform runs three independent safety axes per target, computed and reported separately so a researcher (or a downstream model) can read each on its own terms.
Axis 1: per-organ adverse-event prediction. Six organ scores, one each for liver, heart, kidney, lung, central nervous system, and immune system. Each integrates three input categories: pharmacovigilance signals from FAERS with target attribution; predictive toxicology screens from in vitro panels, in vivo tox studies, and the DILIrank hepatotoxicity database; and mouse-knockout phenotyping from IMPC and MGI. Each per-organ score carries an explicit confidence and an evidence-level label that distinguishes observed clinical signal from model prediction.
Axis 2: genetic essentiality. Combines pLI and LOEUF from gnomAD (the population-genetics signal of loss-of-function tolerance in healthy humans) with DepMap cancer cell-line dependency (the cellular-fitness signal). Karczewski et al. (2020, Nature 581) is the anchor reference. A gene with high pLI and high DepMap dependency is structurally less safe to inhibit; a gene with low pLI and low DepMap dependency is structurally more dispensable.
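A minimal sketch of reading the two essentiality signals together. The cut-offs are assumptions, not platform settings: pLI ≥ 0.9 and LOEUF < 0.35 are constraint thresholds commonly used with gnomAD, while "DepMap gene effect ≤ -0.5 in at least half of cell lines" is a hypothetical dependency rule. `essentiality_concern` is a hypothetical name.

```python
def essentiality_concern(pli, loeuf, depmap_gene_effects,
                         dependency_cutoff=-0.5, min_dependent_fraction=0.5):
    """Combine gnomAD constraint with DepMap dependency into a concern level.

    depmap_gene_effects: per-cell-line gene-effect scores (more negative =
    stronger dependency). Returns "high", "intermediate", or "low"."""
    if not depmap_gene_effects:
        raise ValueError("need at least one cell-line gene-effect score")
    # Population-genetics signal: loss of function depleted in healthy humans
    constrained = pli >= 0.9 or loeuf < 0.35
    # Cellular-fitness signal: enough cell lines depend on the gene
    n_dependent = sum(e <= dependency_cutoff for e in depmap_gene_effects)
    dependent = n_dependent / len(depmap_gene_effects) >= min_dependent_fraction
    if constrained and dependent:
        return "high"
    if not constrained and not dependent:
        return "low"
    return "intermediate"
```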
Axis 3: expression context. Two scores. Tau tissue-specificity (Yanai et al. 2005), computed over GTEx and the CellxGene cell-type atlases, captures how narrowly or broadly a gene is expressed; a focally expressed gene is safer to drug than a constitutively expressed one. A separate expression-risk score adds the criticality of the tissues where expression is observed, weighting heart, central nervous system, and immune cells higher.
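Tau has a closed form (Yanai et al. 2005): for expression values x_1 … x_n across n tissues or cell types, tau = Σ(1 − x_i / max x) / (n − 1), which runs from 0 for uniform expression to 1 for expression in a single context. A sketch, assuming a log2(x + 1) transform first (a common preprocessing choice the post does not specify):

```python
import math

def tau(expression, log_transform=True):
    """Tissue-specificity index tau (Yanai et al. 2005).

    expression: non-negative values, one per tissue or cell type.
    Returns 0.0 for uniform expression, 1.0 for a single context,
    and None if the gene is not expressed anywhere."""
    if len(expression) < 2:
        raise ValueError("tau needs at least two tissues or cell types")
    values = [math.log2(x + 1.0) for x in expression] if log_transform else list(expression)
    peak = max(values)
    if peak <= 0:
        return None
    return sum(1.0 - v / peak for v in values) / (len(values) - 1)
```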
The three axes are computed independently. They are deliberately not collapsed into a single safety score because their failure modes are different and a researcher needs to see which axis is driving any concern. A composite is computed for display purposes but always sits alongside the three axis-level scores.
For every disease entry, the platform stores a layout of approximately 2,300 cell types, shaded by how strongly each cell type is implicated in that disease. The shading is computed from primary cell-type expression data combined with the disease-specific causal evidence above, producing a chance-adjusted score per cell type.
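The post does not say how the per-cell-type score is chance-adjusted. One common approach, sketched here purely as an assumption, compares the mean expression of the disease gene set in one cell type against random gene sets of the same size and reports a z-score. The disease-specific causal weighting described above is left out for brevity, and `chance_adjusted_score` is a hypothetical name.

```python
import random
import statistics

def chance_adjusted_score(cell_expr, disease_genes, n_perm=1000, seed=0):
    """Permutation z-score for one cell type.

    cell_expr: gene -> expression (or specificity) in this cell type.
    Compares the disease gene set's mean against size-matched random sets."""
    genes = [g for g in disease_genes if g in cell_expr]
    if not genes:
        raise ValueError("no disease genes measured in this cell type")
    observed = statistics.fmean(cell_expr[g] for g in genes)
    rng = random.Random(seed)  # fixed seed keeps the score reproducible
    universe = list(cell_expr)
    null = [statistics.fmean(cell_expr[g] for g in rng.sample(universe, len(genes)))
            for _ in range(n_perm)]
    sd = statistics.stdev(null)
    return (observed - statistics.fmean(null)) / sd if sd > 0 else 0.0
```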
This is a foundation, not a display layer. The same cell-type map drives three things.
Target ranking. Candidates are scored in the cell-type context where the disease operates, not in a tissue-averaged context. A gene that is irrelevant in the disease-driving cell type does not get a top rank just because it shows up in tissue-level bulk expression.
Indication expansion. Given a known target, the cell-type signature of where that target acts can be matched against other diseases whose cell-type maps overlap. The biology of where the target works becomes the bridge between indications, rather than literature counts of co-occurrence.
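As a sketch of the matching step, assuming each cell-type map is a vector of per-cell-type scores, cosine similarity is one simple overlap measure. Cosine similarity is our choice for illustration, not a documented platform method, and the function names and the cell-type and disease labels in any example are placeholders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse cell-type score vectors (dicts)."""
    dot = sum(score * v.get(cell, 0.0) for cell, score in u.items())
    nu = math.sqrt(sum(s * s for s in u.values()))
    nv = math.sqrt(sum(s * s for s in v.values()))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def expand_indications(target_map, disease_maps, exclude=()):
    """Rank diseases by how closely their cell-type map overlaps the target's."""
    scored = [(disease, cosine(target_map, cell_map))
              for disease, cell_map in disease_maps.items() if disease not in exclude]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Passing the target's current indication in `exclude` leaves only candidate new indications, ordered by cell-type overlap rather than by literature co-occurrence.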
Proprietary-data overlay. A researcher who uploads their own gene set (a screen-derived modifier list, a kinase shortlist, a differentially-expressed gene set from a custom dataset) gets that set scored against the full per-disease cell-type structure, with the same evidence decomposition that the platform's own ranked candidates receive.
The first-generation cell-type expression library predated the modern single-cell atlas era. The current platform anchors on Tabula Sapiens, CellxGene, and Human Cell Atlas, which is what makes the 3,500-plus cell-type-profile coverage possible.
What is calibrated. Per-threshold probability calibration on the ranked output: ordinal heads are Platt-calibrated; branched translational heads (cancer and non-cancer) are isotonically calibrated within branch. Brier scores and expected calibration error (ECE) are reported before and after calibration. Output probabilities are constrained so that the probability a candidate reaches Phase 2 is at least the probability it reaches Phase 3, which in turn is at least the probability it is approved; this prevents inversions in the ranking. A per-disease confidence indicator (four levels) tells a researcher whether they are looking at a high-confidence ranking on a deeply studied disease or a lower-confidence ranking on a long-tail disease where primary-data evidence is sparser.
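The calibration diagnostics named above are standard and short to state in code. The sketch below computes the Brier score and an equal-width-binned ECE, and shows one simple way to enforce the Phase 2 ≥ Phase 3 ≥ approval ordering: capping each later milestone at the one before it. That capping rule is our assumption; the post does not say how the constraint is implemented. The Platt and isotonic fits are not shown; scikit-learn, for example, offers both through `CalibratedClassifierCV` with `method="sigmoid"` or `method="isotonic"`.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Equal-width-bin ECE: bin-size-weighted |observed rate - mean prediction|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            ece += len(b) / len(probs) * abs(observed - mean_pred)
    return ece

def enforce_phase_order(p_phase2, p_phase3, p_approval):
    """Cap each later milestone at the earlier one: P(Ph2) >= P(Ph3) >= P(approval)."""
    p_phase3 = min(p_phase3, p_phase2)
    p_approval = min(p_approval, p_phase3)
    return p_phase2, p_phase3, p_approval
```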
What has been validated retrospectively. Cross-indication concordance: candidates that are known approved drug targets in one disease tend to surface in adjacent diseases where the underlying biology is shared. This is a sanity check, not a formal validation. A trial-outcome retrospective is in progress against roughly 2,000 completed Phase 2 and Phase 3 trials drawn from AACT (the public ClinicalTrials.gov database) with reported primary-endpoint p-values. The analysis tests whether trials whose target carries a high causal-evidence ranking succeed at a higher rate than trials whose target does not. The cohort definition, drug-to-target mapping, and per-subscore architecture being tested have been committed in writing ahead of computing the result.
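The comparison at the heart of the retrospective reduces to two success rates. A minimal sketch, assuming a one-sided two-proportion z-test; the post does not name the pre-committed test, so both the choice of test and the name `success_rate_test` are hypothetical.

```python
import math

def success_rate_test(successes_high, n_high, successes_low, n_low):
    """One-sided two-proportion z-test: do trials on high-ranked targets succeed more often?"""
    rate_high = successes_high / n_high
    rate_low = successes_low / n_low
    pooled = (successes_high + successes_low) / (n_high + n_low)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_high + 1.0 / n_low))
    if se == 0:
        raise ValueError("pooled success rate is 0 or 1; the z-test is undefined")
    z = (rate_high - rate_low) / se
    # Upper-tail standard-normal probability P(Z >= z)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return {"rate_high": rate_high, "rate_low": rate_low, "z": z, "p_one_sided": p_one_sided}
```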
What has not yet been validated. Prospective tracking: the platform's predictions for currently running trials have been snapshotted, but predictive performance against the actual trial readouts cannot be assessed until those trials report (the bulk of readouts arrive in 2027–2028). Novel-target discovery rate: there is no published comparison yet of the rate at which the platform surfaces genuinely novel targets that are subsequently validated by independent labs. This is the validation we want and do not yet have.
Three architectural choices worth surfacing for any methodology-focused reader.
No language model in the scoring or ranking layer. All scores are produced by deterministic numerical pipelines. Language models do not appear in the layer that decides which targets get ranked higher for which diseases.
Primary biological data as the input layer. GWAS effect sizes, single-cell expression matrices, perturbation magnitudes, protein-structure data, drug-target binding measurements. Published papers are used as a discovery layer (which datasets to ingest, which assay protocols matter, which findings to seed candidate sets from) but not as an extraction source for the platform's ranking signal.
Full provenance per score. Every score in the output carries a data_provenance block: the named, versioned data sources it draws from, the versioned model identifier that produced it, the prediction timestamp, and the feature coverage percentage. A pharma governance committee or external auditor can reconstruct any score from inputs and model version. This is the audit-trail commitment that the first-generation pipeline's predicate-and-provenance architecture made possible, carried forward and tightened.
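A minimal sketch of what such a record could look like, assuming JSON serialisation for the audit log. Only the `data_provenance` name comes from the post; the class names, field names, and validation rules are hypothetical.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class DataSource:
    name: str
    version: str

@dataclass(frozen=True)
class DataProvenance:
    data_sources: tuple          # DataSource entries the score draws on
    model_id: str                # versioned identifier of the producing model
    prediction_timestamp: str    # ISO 8601, UTC
    feature_coverage_pct: float  # share of model features with observed inputs

    def __post_init__(self):
        if not self.data_sources:
            raise ValueError("a score must name at least one data source")
        if not 0.0 <= self.feature_coverage_pct <= 100.0:
            raise ValueError("feature coverage must be a percentage")

@dataclass(frozen=True)
class ScoredPair:
    gene: str
    disease: str
    score: float
    data_provenance: DataProvenance

def to_json(record: ScoredPair) -> str:
    """Serialise a score together with its provenance block."""
    return json.dumps(asdict(record), sort_keys=True)
```

Freezing the dataclasses means a provenance block cannot be edited after the score is produced, which is the property an auditor reconstructing a score actually relies on.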
The disease ontology coverage is broader than that of most reference platforms but is not complete. The registry of 34,000-plus disease and disorder entries maps onto a union of EFO, Mondo, DOID, OMIM, MeSH, and HPO. Of those entries, 33,942 carry full target rankings; roughly 80 have insufficient information to score.
Mendelian-randomisation evidence is only available where suitable instruments exist. For genes without strong eQTL instruments, Tier 1 causal evidence rests on perturbation concordance alone, and where neither channel fires, the ranking falls back to Tier 2 observational evidence. The hierarchy is exposed in the per-gene scorecard so a researcher can see whether a high rank is supported by causal evidence or by observational evidence alone.
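The fallback can be expressed as a small labelling function. This is a sketch only: the function and its string labels are hypothetical, and treating "no Tier 1 or Tier 2 signal" as Tier 3 follows the tier description earlier in the post rather than a documented scorecard field.

```python
from typing import Optional

def evidence_basis(mr: Optional[float], perturbation: Optional[float],
                   network: Optional[float], cell_type: Optional[float]) -> str:
    """Label which tier supports a gene's rank; None means the channel did not fire."""
    causal = [name for name, s in (("MR", mr), ("perturbation", perturbation))
              if s is not None]
    if causal:
        return "Tier 1 causal: " + " + ".join(causal)
    if network is not None or cell_type is not None:
        return "Tier 2 observational only"
    return "Tier 3 transfer only"
```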
Cell-type coverage is denser in some compartments than others. Single-cell coverage is dense in immune tissues, epithelial tissues, and the major solid organs; it is thinner in connective tissue, peripheral nervous system, and developmental cell populations. The cell-type map is structurally accurate but the resolution varies.
Prospective tracking against currently running trials continues quietly until the bulk of readouts arrive in 2027–2028. Per-organ safety calibration on well-prescribed targets is in active rework.
Disease Atlas is a research-grade target-prioritisation platform with explicit causal weighting, full audit trail per score, and a multi-axis decision-aware output. We are confident in the architecture. A researcher running it against their own programmes can make their own call.
The complete methodology page (deeper than this post) is at www.euretos.com/disease-atlas-methodology. The first-generation peer-reviewed paper (Vlietstra et al. 2018) and the trial-retrospective protocol (when it publishes) are linked from there.
If you find something that does not hold up, write. The methodology survives external pressure or it does not, and we would rather know.
Disease Atlas is built and operated by Euretos. The current methodology was developed by the Euretos R&D team and reviewed against the first-generation pipeline (Vlietstra et al. 2018) and the academic literature on quantitative causal evidence for drug-target prioritisation. Comments, corrections, and pushback welcome at information@euretos.com.
Vlietstra W. et al. (2018). Using predicate and provenance information from a knowledge graph for drug efficacy screening. J Biomed Semantics 9, 23.
Karczewski K.J. et al. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443.
Yanai I. et al. (2005). Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21(5), 650–659.
Nelson M.R. et al. (2015). The support of human genetic evidence for approved drug indications. Nature Genetics 47, 856–860.
King E.A., Davis J.W., Degner J.F. (2019). Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval. PLOS Genetics 15(12), e1008489.
Tsepilov Y.A. et al. (2026). Open Targets Gentropy pleiotropic map. bioRxiv, DOI 10.1101/2026.04.28.721048.
US Patent 12,051,491 B2 (granted from US 2020/0265937 A1). First-generation Euretos pipeline named in the J&J 2019 acne work.