Knowledge graphs at scale: cell-type-resolved target discovery

Written by Euretos News | Feb 19, 2025 8:00:00 AM

A previous post on this blog described how the Euretos knowledge graph integrates heterogeneous biological evidence (gene-level, tissue-level, literature-level) under common ontologies, using asthma as a worked example. The point of that piece was simple: when evidence is spread across hundreds of databases with no shared vocabulary, integration is the bottleneck — not analysis.

This post extends that work in one direction: cell-type resolution.

The averaging problem in target discovery

Most disease research begins with bulk-tissue expression data. A scientist asks which genes are differentially expressed in diseased tissue, ranks the candidates, and walks down the list. The pattern is so familiar that the underlying assumption disappears from view: that "tissue" is a meaningful unit.

It often is not. A piece of human gut contains epithelial cells, smooth muscle, several fibroblast subtypes, multiple resident and infiltrating immune populations, vasculature, and enteric neurons. Each of those carries a distinct transcriptome. Each can be the disease driver, the bystander, or the response cell. Bulk RNA-seq returns a weighted average across them all, and ranking targets on that average can put a gene at the top because it is highly expressed in a cell type that has nothing to do with the disease.
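The averaging effect can be shown with a few lines of arithmetic. The sketch below is illustrative only: the cell-type fractions, the gene names, and the expression values are all made up, and the bulk signal is modelled simply as a composition-weighted mean of per-cell-type expression.

```python
# Illustrative numbers only: fractions, gene names, and expression values are invented.
fractions = {"epithelial": 0.60, "fibroblast": 0.30, "immune": 0.10}

# Per-cell-type expression for two hypothetical genes.
expression = {
    "GENE_A": {"epithelial": 100.0, "fibroblast": 5.0, "immune": 1.0},
    "GENE_B": {"epithelial": 2.0, "fibroblast": 3.0, "immune": 200.0},
}


def bulk_signal(gene):
    """Bulk expression modelled as the composition-weighted mean across cell types."""
    return sum(fractions[ct] * expression[gene][ct] for ct in fractions)


# Ranking on the bulk average versus ranking within the immune compartment alone.
bulk_ranking = sorted(expression, key=bulk_signal, reverse=True)
immune_ranking = sorted(expression, key=lambda g: expression[g]["immune"], reverse=True)

print(bulk_ranking)    # GENE_A first: abundant epithelial cells dominate the average
print(immune_ranking)  # GENE_B first: it is the immune-specific gene
```

If the disease is driven by the immune population, the bulk ranking puts the wrong gene on top, even though the underlying per-cell-type data are identical.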

Single-cell transcriptomics solved that on the data side. The problem then moved to integration: how does a researcher combine cell-type-specific expression with ontology-mapped genetic evidence, with literature-derived associations, with perturbation data — without writing the integration code themselves?

Cell-type expression as a Search filter

The Euretos AI Platform now treats cell type as a first-class filter inside the existing Search interface. The query proceeds in the same shape researchers already know:

Pick a gene category (kinase, GPCR, transcription factor, secreted protein).
Pick a tissue context (lung, gut mucosa, skin, brain region).
Restrict to a cell type or a set of cell types within that tissue.

The Search returns the candidate set with the same ranking machinery the platform already applies to disease-gene associations. The difference is that the candidate genes are now pre-filtered to those expressed in the cell types that matter for the disease — not the average lung, but alveolar type-2 cells; not the average gut, but tuft cells, goblet cells, or a specific dendritic-cell subset.
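The three-step query shape above can be sketched as a plain filter over expression records. This is not the Euretos API: the `ExpressionRecord` type, the `search` function, the gene names, and the expression threshold are all hypothetical, and sorting by expression stands in for the platform's own ranking machinery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionRecord:
    """Hypothetical record: one gene's expression in one cell type of one tissue."""
    gene: str
    category: str
    tissue: str
    cell_type: str
    expression: float


# Invented example data.
RECORDS = [
    ExpressionRecord("KINASE_1", "kinase", "lung", "alveolar type-2 cell", 40.0),
    ExpressionRecord("KINASE_2", "kinase", "lung", "fibroblast", 90.0),
    ExpressionRecord("GPCR_1", "GPCR", "lung", "alveolar type-2 cell", 55.0),
    ExpressionRecord("KINASE_3", "kinase", "lung", "alveolar type-2 cell", 0.2),
    ExpressionRecord("KINASE_4", "kinase", "gut mucosa", "tuft cell", 30.0),
]


def search(records, category, tissue, cell_types=None, min_expression=1.0):
    """Filter by gene category, tissue, and (optionally) a set of cell types.

    Sorting by expression is a placeholder for a real evidence-based ranking.
    """
    hits = [
        r for r in records
        if r.category == category
        and r.tissue == tissue
        and (cell_types is None or r.cell_type in cell_types)
        and r.expression >= min_expression
    ]
    return sorted(hits, key=lambda r: r.expression, reverse=True)


tissue_level = [r.gene for r in search(RECORDS, "kinase", "lung")]
cell_level = [r.gene for r in search(RECORDS, "kinase", "lung", {"alveolar type-2 cell"})]
print(tissue_level)
print(cell_level)
```

In this toy data the tissue-level query is led by a fibroblast-expressed kinase, while the cell-type-restricted query keeps only the kinase expressed above threshold in alveolar type-2 cells.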

Two results follow.

The first is shorter candidate lists. A typical bulk-tissue query for a respiratory indication returns several thousand candidates above the expression threshold. Adding a cell-type filter for a specific epithelial or immune subset typically reduces that to a few hundred. The cost in researcher time is a single click in the Search interface rather than days of bioinformatics work.

The second is more interpretable lists. When a candidate gene is at the top of a cell-type-resolved query, the researcher already knows where in the tissue the biology is. The next analytical step — looking at perturbation evidence in the same cell type, checking the genetic association strength, reading the literature anchor — proceeds inside the same knowledge-graph context.

Where the cell-type catalogue comes from

The platform's cell-type catalogue is built from publicly curated single-cell atlases: the Human Cell Atlas, Tabula Sapiens, and disease-specific atlases published over the past five years across major indications. The Euretos team mapped each cell type into the existing knowledge graph, using Cell Ontology (CL) identifiers as the bridge to disease and gene evidence.

That mapping is what allows the Search filter to work consistently. A query for "tuft cells in colon" returns the same candidate ranking irrespective of which atlas was used to define the cell type, because every atlas-specific label for that population resolves to the same CL term. This is the same ontology discipline that made cross-database integration work for genes and diseases in the first place; cell type is the next axis on the same map.
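A minimal sketch of that normalisation step follows, assuming a simple lookup table from atlas-specific labels to ontology identifiers. The atlas names, the labels, the gene names, and the `CL:PLACEHOLDER_*` identifiers are hypothetical stand-ins, not real Cell Ontology accessions or the platform's actual mapping.

```python
# Hypothetical mapping from (atlas, atlas-specific label) to a shared ontology term.
# The identifiers are placeholders, not real Cell Ontology accessions.
ATLAS_LABEL_TO_CL = {
    ("atlas_a", "Tuft"): "CL:PLACEHOLDER_TUFT",
    ("atlas_b", "tuft cell of colon"): "CL:PLACEHOLDER_TUFT",
    ("atlas_a", "Goblet"): "CL:PLACEHOLDER_GOBLET",
}

# Evidence is stored once per ontology term, not once per atlas label.
CANDIDATES_BY_CL = {
    "CL:PLACEHOLDER_TUFT": ["GENE_X", "GENE_Y"],
    "CL:PLACEHOLDER_GOBLET": ["GENE_Z"],
}


def to_cl_term(atlas, label):
    """Resolve an atlas-specific label to its shared term, or None if unmapped."""
    return ATLAS_LABEL_TO_CL.get((atlas, label))


def candidates_for(atlas, label):
    """Return the candidate list for a label, independent of the source atlas."""
    term = to_cl_term(atlas, label)
    return CANDIDATES_BY_CL.get(term, [])


from_a = candidates_for("atlas_a", "Tuft")
from_b = candidates_for("atlas_b", "tuft cell of colon")
print(from_a == from_b)  # the two labels share a term, so the results match
```

Keying the evidence on the ontology term rather than on the atlas label is what makes the result independent of the source atlas; an unmapped label simply returns no candidates.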

In pulmonary fibrosis, to take one published example, single-cell atlases have identified an aberrant basaloid cell population that sits at the edge of fibroblastic foci and co-expresses basal epithelial, mesenchymal, and senescence markers (Adams et al. 2020; reviewed in Justet, Zhao & Kaminski, Hum Genomics 2022). A target-discovery query restricted to that cell population is fundamentally different from one across "lung tissue", and the platform now lets a researcher run it without leaving Search.

What this enables

The pattern is not new. The methodological argument for cell-type-resolved target work has been in the literature for several years. What the AI Platform does is remove the integration effort: the cell-type catalogue, the ontology mapping, the cross-evidence ranking, and the candidate filtering all sit inside one query interface, against an integrated knowledge base of more than 275 public databases.

That changes who can run the analysis. A translational scientist studying a specific disease, without a dedicated bioinformatics team, can now ask a cell-type-specific target-discovery question and get a defensible answer in minutes rather than weeks. The knowledge graph remains the substrate; the cell-type filter is the new dimension.
