TRAILED

Topological Regularization and Integrity Learning for EHR Data

Warning

TRAILED is under active development. The current release provides the foundational ECT (Euler Characteristic Transform) implementation. Healthcare-specific methods — including density-aware descriptors, patient manifold construction, and clinical fidelity metrics — are in progress.

TRAILED is a topological representation learning library for Electronic Health Record (EHR) data. It provides methods for analyzing patient trajectories, validating synthetic data, and assessing clinical fidelity using topological techniques.

Why TRAILED?

Longitudinal EHR analysis and synthetic data generation face two persistent challenges that standard metrics fail to detect:

Mode Collapse

Rare but clinically significant phenotypes — pediatric rare diseases, underrepresented demographics — are often absent from synthetic datasets. Models trained on such data fail silently on these populations. Pairwise statistical metrics miss these coverage gaps because they cannot capture higher-order structure.

Pathological Interpolation

Generative models produce synthetic patient trajectories that pass through biologically implausible states: impossible lab value transitions, contradictory comorbidities, or clinically incoherent sequences. These failures create safety risks and degrade downstream model reliability.

TRAILED addresses these problems using topological methods that capture global structure in patient trajectory spaces — detecting patterns invisible to coordinate-based metrics.

Core Capabilities

  • Topological Descriptors: Representations that capture shape and structure in clinical latent spaces

  • Differentiable: Gradient support for training-time regularization of generative models (for PyTorch, currently through the upstream aidos-lab/dect package)

  • Patient Manifold Analysis: Characterize trajectory spaces and identify impossible state transitions

  • Fidelity Metrics: Quantify real-vs-synthetic alignment in coordinate-free topological space

Architecture

graph LR
   subgraph Input
      A[Patient Embeddings / Trajectories]
   end

   subgraph "TRAILED Core"
      B["Topological Analysis"]
      C["Manifold Construction"]
      D["Fidelity Scoring"]
   end

   subgraph "Applications"
      E["Training Regularizer"]
      F["Synthetic Data QA"]
      G["Trajectory Analysis"]
   end

   A --> B
   A --> C
   B --> D
   C --> D
   D --> E
   D --> F
   D --> G

   style Input fill:#f9f9f9,stroke:#999
   style B fill:#D9EDF7,stroke:#31708F,stroke-width:2px
   style D fill:#DFF0D8,stroke:#3C763D,stroke-width:2px
    

Example Usage

Computing Topological Descriptors

import numpy as np
from trailed import compute_ect_from_numpy

# Patient embeddings from EHR data
patient_embeddings = np.load("embeddings.npy")
descriptor = compute_ect_from_numpy(patient_embeddings, num_thetas=32, resolution=64)
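For intuition about what such a descriptor contains, the following is a minimal NumPy sketch of an ECT for a bare point cloud (vertices only, no edges or higher simplices), where the Euler characteristic of each sublevel set reduces to a point count. The function name `ect_point_cloud`, the random sampling of directions, and the `radius` and `seed` parameters are illustrative assumptions; this is not TRAILED's implementation and its output layout may differ from that of `compute_ect_from_numpy`.

```python
import numpy as np


def ect_point_cloud(points, num_thetas=32, resolution=64, radius=1.0, seed=0):
    """Hard-threshold ECT of a point cloud (vertices only), illustrative sketch.

    Returns an array of shape (resolution, num_thetas) whose entry [i, j] is the
    number of points whose height along direction j is <= threshold i.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)

    # Sample unit directions on the sphere by normalizing Gaussian vectors.
    directions = rng.normal(size=(points.shape[1], num_thetas))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)

    # Height of every point along every direction: shape (n_points, num_thetas).
    heights = points @ directions

    # Evenly spaced thresholds covering [-radius, radius].
    thresholds = np.linspace(-radius, radius, resolution)

    # For a vertex-only complex, the Euler characteristic of a sublevel set
    # is simply the number of vertices it contains.
    below = heights[None, :, :] <= thresholds[:, None, None]
    return below.sum(axis=1)


# Toy stand-in for patient embeddings, scaled into the unit ball.
cloud = np.random.default_rng(1).normal(size=(100, 8))
cloud /= np.linalg.norm(cloud, axis=1).max()
ect = ect_point_cloud(cloud, num_thetas=32, resolution=64)
```

Each column of `ect` is a non-decreasing curve that ends at the total number of points, so two cohorts that occupy different regions of the embedding space produce visibly different curves along at least some directions.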

Training Regularization (PyTorch)

For PyTorch neural network use cases, use the upstream aidos-lab/dect package:

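import torch
from dect.nn import ECTLayer, ECTConfig

ect_layer = ECTLayer(ECTConfig(num_thetas=32, resolution=32))

# real_batch and generated_batch are batches of real and generated embeddings,
# assumed to be in the input format ECTLayer expects (see the dect documentation).
# Regularize the generative model to preserve topological structure.
real_topo = ect_layer(real_batch)
synthetic_topo = ect_layer(generated_batch)
topo_loss = torch.nn.functional.mse_loss(synthetic_topo, real_topo)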
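A loss like the one above is only useful for training if the descriptor varies smoothly with the point coordinates. One common relaxation, sketched here in NumPy for a single direction, replaces the hard threshold indicator with a sigmoid. The function names and the `scale` parameter are assumptions for illustration; this describes the general technique and is not a statement about how `ECTLayer` is implemented internally.

```python
import numpy as np


def smooth_ect_curve(heights, thresholds, scale=50.0):
    """Sigmoid-relaxed Euler characteristic curve along one direction.

    Each point contributes sigmoid(scale * (t - h)) instead of the hard
    indicator [h <= t], so the curve varies smoothly with the heights.
    """
    z = scale * (thresholds[:, None] - heights[None, :])
    return (1.0 / (1.0 + np.exp(-z))).sum(axis=1)


def hard_ect_curve(heights, thresholds):
    """Exact vertex count below each threshold, for comparison."""
    return (heights[None, :] <= thresholds[:, None]).sum(axis=1).astype(float)


heights = np.array([-0.6, -0.1, 0.3, 0.7])   # projections of four points
thresholds = np.linspace(-1.0, 1.0, 9)       # -1.0, -0.75, ..., 1.0

soft = smooth_ect_curve(heights, thresholds, scale=200.0)
hard = hard_ect_curve(heights, thresholds)
```

With a large `scale` the smooth curve closely tracks the hard count, while a smaller `scale` trades accuracy for gradients that reach points further from each threshold.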

Synthetic Data Fidelity

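import numpy as np
from trailed import compute_ect_from_numpy

# real_embeddings and synthetic_embeddings: NumPy arrays of patient embeddings
# Compare topological structure of real vs synthetic cohorts
real_descriptor = compute_ect_from_numpy(real_embeddings, num_thetas=64, resolution=64)
synthetic_descriptor = compute_ect_from_numpy(synthetic_embeddings, num_thetas=64, resolution=64)

# Distance between descriptors: lower means closer topological structure
fidelity_score = np.linalg.norm(real_descriptor - synthetic_descriptor)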
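A raw norm depends on the descriptor size and, if the descriptor is not normalized, on the cohort size, which makes scores hard to compare across settings. A possible refinement, sketched under the assumption that both descriptors are NumPy arrays of equal shape, is to scale the distance by the norm of the real descriptor. The helper name `relative_topological_distance` and its `eps` parameter are hypothetical and not part of TRAILED.

```python
import numpy as np


def relative_topological_distance(real_descriptor, synthetic_descriptor, eps=1e-12):
    """Frobenius distance between two descriptors, scaled by the real one.

    0.0 means identical descriptors; larger values mean larger divergence.
    """
    real = np.asarray(real_descriptor, dtype=float)
    synthetic = np.asarray(synthetic_descriptor, dtype=float)
    if real.shape != synthetic.shape:
        raise ValueError(
            f"descriptor shapes differ: {real.shape} vs {synthetic.shape}; "
            "use the same num_thetas and resolution for both cohorts"
        )
    # eps guards against division by zero for an all-zero real descriptor.
    return float(np.linalg.norm(real - synthetic) / (np.linalg.norm(real) + eps))


# Toy descriptors standing in for ECT outputs of shape (resolution, num_thetas).
real = np.arange(12, dtype=float).reshape(3, 4)
identical_score = relative_topological_distance(real, real)
doubled_score = relative_topological_distance(real, 2.0 * real)
```

The explicit shape check is there because descriptors computed with different `num_thetas` or `resolution` values are not comparable entry by entry.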

Installation

We recommend installing with uv for fast, reliable dependency resolution:

uv pip install trailed

Or with pip:

pip install trailed

With optional dependencies:

uv pip install "trailed[sklearn]"     # sklearn integration
uv pip install "trailed[dataframe]"   # pandas/polars integration
uv pip install "trailed[all]"         # everything

For PyTorch use cases, install the upstream dect package:

pip install "dect @ git+https://github.com/aidos-lab/DECT/"

For development:

git clone https://github.com/Krv-Analytics/trailed.git
cd trailed
uv sync --extra dev --extra docs

Supports Python 3.10, 3.11, and 3.12.
