TRAILED

Topological Regularization and Integrity Learning for EHR Data

Warning

TRAILED is under active development. The current release provides the foundational ECT (Euler Characteristic Transform) implementation. Healthcare-specific methods — including density-aware descriptors, patient manifold construction, and clinical fidelity metrics — are in progress.

TRAILED is a topological representation learning library for Electronic Health Record (EHR) data. It provides methods for analyzing patient trajectories, validating synthetic data, and assessing clinical fidelity using topological techniques.

Quick Links

Quickstart - Compute topological descriptors from patient data in minutes.
User Guide - Installation, configuration, and clinical workflows.
API Reference - Full API documentation for TRAILED modules.
Benchmarks - Performance comparison: trailed vs. upstream dect.

Why TRAILED?

Longitudinal EHR analysis and synthetic data generation face two persistent challenges that standard metrics fail to detect:

Mode Collapse: Rare but clinically significant phenotypes, such as pediatric rare diseases or underrepresented demographic groups, are often absent from synthetic datasets. Models trained on such data then fail silently on these populations. Pairwise statistical metrics tend to miss these coverage gaps because they do not capture higher-order structure.
Pathological Interpolation: Generative models produce synthetic patient trajectories that pass through biologically implausible states: impossible lab value transitions, contradictory comorbidities, or clinically incoherent sequences. These failures create safety risks and degrade downstream model reliability.
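To make the second failure mode concrete, the sketch below flags steps in a single trajectory where any feature changes by more than a per-step bound. It is a plain rule-based check, not one of TRAILED's topological methods, and the feature choice and the bounds passed as `max_step_change` are hypothetical placeholders for clinician-specified limits.

```python
import numpy as np


def flag_implausible_transitions(trajectory, max_step_change):
    """Return indices t where the change from step t to t+1 exceeds the bound
    for at least one feature.

    trajectory: array of shape (T, n_features), one row per time step.
    max_step_change: array of shape (n_features,) with the largest plausible
        absolute change per step (hypothetical, clinician-specified bounds).
    """
    trajectory = np.asarray(trajectory, dtype=float)
    steps = np.abs(np.diff(trajectory, axis=0))        # shape (T-1, n_features)
    violations = steps > np.asarray(max_step_change)   # broadcast over time steps
    return np.flatnonzero(violations.any(axis=1))


# Toy trajectory with hypothetical features: [serum creatinine (mg/dL), heart rate (bpm)]
traj = np.array([
    [1.0, 80.0],
    [1.1, 82.0],
    [6.5, 85.0],  # creatinine jumps by 5.4 mg/dL in one step
    [1.2, 84.0],  # and drops back by 5.3 mg/dL
])
bounds = np.array([2.0, 40.0])  # hypothetical per-step limits

print(flag_implausible_transitions(traj, bounds))  # [1 2]
```

Fixed per-feature bounds only catch local jumps; contradictory comorbidities or incoherent sequences need checks on the global structure of the trajectory space, which is the gap TRAILED targets.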

TRAILED addresses these problems with topological methods that capture global structure in patient trajectory spaces, detecting patterns that coordinate-based metrics do not see.
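The simplest topological summary of a point cloud is its number of connected components at a chosen scale ε (the 0th Betti number of the Vietoris-Rips complex at that scale). As a minimal sketch, not TRAILED's implementation: join patient embeddings that lie within ε of each other and count components with union-find. A rare phenotype the generator dropped shows up as a missing component. The toy cohorts and the `eps` value are illustrative assumptions.

```python
import numpy as np


def count_components(points, eps):
    """Connected components of the eps-neighborhood graph, i.e. the 0th Betti
    number of the Vietoris-Rips complex at scale eps."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise Euclidean distances, shape (n, n)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Merge every pair (i < j) that lies within eps
    for i, j in zip(*np.nonzero(np.triu(dists <= eps, k=1))):
        root_i, root_j = find(int(i)), find(int(j))
        if root_i != root_j:
            parent[root_i] = root_j
    return len({find(i) for i in range(n)})


rng = np.random.default_rng(0)
common = rng.normal(0.0, 0.1, size=(200, 2))            # majority phenotype
rare = rng.normal(0.0, 0.1, size=(10, 2)) + [3.0, 3.0]  # small, distant rare phenotype
real = np.vstack([common, rare])
synthetic = rng.normal(0.0, 0.1, size=(210, 2))         # generator dropped the rare mode

print(count_components(real, eps=1.0), count_components(synthetic, eps=1.0))  # 2 1
```

Fixing a single ε is the weak point of this sketch; persistent homology instead tracks how the component count changes across all scales.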

Core Capabilities

Topological Descriptors: Representations that capture shape and structure in clinical latent spaces
Differentiable: Gradient support for training-time regularization of generative models, currently through the upstream dect package (see Training Regularization below)
Patient Manifold Analysis: Characterize trajectory spaces and identify impossible state transitions
Fidelity Metrics: Quantify real-vs-synthetic alignment in coordinate-free topological space
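The foundational descriptor in the current release is the Euler Characteristic Transform (ECT). A minimal NumPy sketch of the idea, not trailed's implementation: for each direction θ and threshold t, take the part of the shape with ⟨x, θ⟩ ≤ t and record its Euler characteristic. For a bare point cloud that is just the number of points at or below the threshold; for complexes with edges and faces it becomes V - E + F. Evenly spaced 2-D directions and thresholds on [-radius, radius] are choices made for this sketch; `num_thetas` and `resolution` mirror the parameter names used in the examples on this page.

```python
import numpy as np


def ect_point_cloud(points, num_thetas=8, resolution=16, radius=1.0):
    """Euler Characteristic Transform of a 2-D point cloud (vertices only).

    Returns an integer array of shape (resolution, num_thetas) whose entry
    [r, k] counts the points x with <x, theta_k> <= t_r.
    """
    points = np.asarray(points, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, num_thetas, endpoint=False)
    directions = np.stack([np.cos(angles), np.sin(angles)])  # (2, num_thetas) unit vectors
    heights = points @ directions                             # (n_points, num_thetas)
    thresholds = np.linspace(-radius, radius, resolution)     # (resolution,)
    # For vertices only, the Euler characteristic is the number of points below t
    return (heights[None, :, :] <= thresholds[:, None, None]).sum(axis=1)


pts = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.5]])
ect = ect_point_cloud(pts, num_thetas=4, resolution=5)
print(ect[:, 0])  # direction (1, 0), thresholds -1, -0.5, 0, 0.5, 1 -> [0 1 2 3 3]
```

Each column is a monotone step curve; stacking the curves for all directions gives the descriptor that can be compared across cohorts.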

Architecture

graph LR
   subgraph Input
      A[Patient Embeddings / Trajectories]
   end

   subgraph "TRAILED Core"
      B["Topological Analysis"]
      C["Manifold Construction"]
      D["Fidelity Scoring"]
   end

   subgraph "Applications"
      E["Training Regularizer"]
      F["Synthetic Data QA"]
      G["Trajectory Analysis"]
   end

   A --> B
   A --> C
   B --> D
   C --> D
   D --> E
   D --> F
   D --> G

   style Input fill:#f9f9f9,stroke:#999
   style B fill:#D9EDF7,stroke:#31708F,stroke-width:2px
   style D fill:#DFF0D8,stroke:#3C763D,stroke-width:2px
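Patient manifold construction is still in progress, so the following is only a hypothetical sketch of one common starting point for the "Manifold Construction" stage: a symmetric k-nearest-neighbor graph over patient embeddings, whose edges approximate local neighborhoods of the underlying manifold. `knn_graph` and `k` are illustrative names, not TRAILED API.

```python
import numpy as np


def knn_graph(embeddings, k=5):
    """Symmetric k-nearest-neighbor adjacency matrix (boolean, shape (n, n)).

    Assumes k < n. Brute-force distances, fine for small cohorts only.
    """
    x = np.asarray(embeddings, dtype=float)
    n = len(x)
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)               # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest points per row
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), neighbors.ravel()] = True
    return adj | adj.T                            # keep an edge if either endpoint chose it


points = np.array([[0.0], [1.0], [3.0], [10.0]])
adj = knn_graph(points, k=1)
print(adj.sum(axis=1))  # node degrees: [1 2 2 1]
```

Note that the outlier at 10 still receives an edge: a kNN graph always links each point to k neighbors, so edge lengths or weights are needed to tell genuinely isolated patients apart.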

Example Usage

Computing Topological Descriptors

import numpy as np
from trailed import compute_ect_from_numpy

# Patient embeddings from EHR data
patient_embeddings = np.load("embeddings.npy")
descriptor = compute_ect_from_numpy(patient_embeddings, num_thetas=32, resolution=64)
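ECT thresholds are typically swept over a fixed interval, so embeddings on very different scales can saturate that interval or use only a small part of it. Whether `compute_ect_from_numpy` rescales its input is not stated here; assuming it does not, this hypothetical helper centers and scales embeddings into the unit ball before they are passed in.

```python
import numpy as np


def normalize_to_unit_ball(embeddings, reference=None):
    """Center and scale embeddings so the reference cloud fits in the unit ball.

    reference defaults to the embeddings themselves; pass the real cohort here
    when normalizing a synthetic cohort so both share one coordinate frame.
    """
    x = np.asarray(embeddings, dtype=float)
    ref = x if reference is None else np.asarray(reference, dtype=float)
    center = ref.mean(axis=0)
    scale = np.linalg.norm(ref - center, axis=1).max()
    if scale == 0:
        scale = 1.0  # degenerate cloud (all points identical): only center it
    return (x - center) / scale


cohort = np.array([[2.0, 0.0], [4.0, 0.0]])
print(normalize_to_unit_ball(cohort))  # [[-1.  0.] [ 1.  0.]] (printed on two lines)
```

With real data this would read `patient_embeddings = normalize_to_unit_ball(np.load("embeddings.npy"))`. When comparing cohorts, normalize the synthetic one with `reference=real_embeddings`; normalizing each cohort separately can hide differences in spread.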

Training Regularization (PyTorch)

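For training-time use inside PyTorch models, use the upstream aidos-lab/dect package:

import torch
from dect.nn import ECTLayer, ECTConfig

ect_layer = ECTLayer(ECTConfig(num_thetas=32, resolution=32))

# real_batch / generated_batch: placeholders in the input format ECTLayer expects (see dect docs)
# Regularize the generative model to preserve the topological structure of real data
with torch.no_grad():
    real_topo = ect_layer(real_batch)  # fixed target; no gradients needed
synthetic_topo = ect_layer(generated_batch)  # gradients flow back to the generator
topo_loss = torch.nn.functional.mse_loss(synthetic_topo, real_topo)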
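The hard ECT counts points with an indicator function, which has zero gradient almost everywhere, so on its own it cannot drive a loss like `topo_loss`. Differentiable ECT implementations replace the indicator with a smooth step; as we understand it, the upstream DECT approach uses a sigmoid with a sharpness parameter. The NumPy sketch below illustrates that smoothing and its analytic gradient with respect to the point coordinates. It is not the dect code, and `smooth_ect`, `smooth_ect_grad_points`, and `scale` are hypothetical names.

```python
import numpy as np


def sigmoid(z):
    # Logistic function written via tanh, which avoids overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def smooth_ect(points, directions, thresholds, scale=50.0):
    """Sigmoid-smoothed ECT of a point cloud, shape (len(thresholds), n_directions).

    Entry [r, k] is sum_i sigmoid(scale * (t_r - <x_i, theta_k>)); as scale grows
    it approaches the hard count of points with <x_i, theta_k> <= t_r.
    points: (n, d); directions: (d, n_directions) unit vectors; thresholds: (r,).
    """
    heights = points @ directions                                  # (n, k)
    z = scale * (thresholds[:, None, None] - heights[None, :, :])  # (r, n, k)
    return sigmoid(z).sum(axis=1)


def smooth_ect_grad_points(points, directions, thresholds, upstream, scale=50.0):
    """Gradient of sum(upstream * smooth_ect(...)) with respect to points, shape (n, d).

    Uses sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) and dz/dx_i = -scale * theta_k.
    """
    heights = points @ directions
    s = sigmoid(scale * (thresholds[:, None, None] - heights[None, :, :]))  # (r, n, k)
    weights = (upstream[:, None, :] * s * (1.0 - s)).sum(axis=0)            # (n, k)
    return -scale * weights @ directions.T                                   # (n, d)


angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)])  # (2, 8) unit vectors
thresholds = np.linspace(-1.0, 1.0, 16)
points = np.array([[0.2, 0.1], [-0.3, 0.4]])
ect = smooth_ect(points, directions, thresholds)
grad = smooth_ect_grad_points(points, directions, thresholds, np.ones_like(ect))
```

In PyTorch, autograd derives this gradient automatically; writing it out only shows that points near a threshold receive a nonzero gradient, which is what lets a topological loss move generated samples.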

Synthetic Data Fidelity

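import numpy as np
from trailed import compute_ect_from_numpy

# Compare topological structure of real vs. synthetic cohorts
# (real_embeddings, synthetic_embeddings: arrays embedded in the same space)
real_descriptor = compute_ect_from_numpy(real_embeddings, num_thetas=64, resolution=64)
synthetic_descriptor = compute_ect_from_numpy(synthetic_embeddings, num_thetas=64, resolution=64)

# Euclidean distance between the descriptors: lower means closer topological alignment
fidelity_score = np.linalg.norm(real_descriptor - synthetic_descriptor)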
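A raw descriptor distance has no natural scale, so it is hard to say whether a given `fidelity_score` is good. One way to calibrate it, sketched here as an assumption rather than TRAILED's scoring method, is to compare it with the distance between two disjoint random halves of the real cohort. Because count-based descriptors such as the ECT grow with the number of points, the sketch compares equal-size subsamples throughout. `calibrated_distance` and `descriptor_fn` are hypothetical names.

```python
import numpy as np


def calibrated_distance(real, synthetic, descriptor_fn, n_splits=20, seed=0):
    """Mean real-vs-synthetic descriptor distance divided by a real-vs-real baseline.

    Every comparison uses subsamples of size len(real) // 2, so descriptors that
    scale with point count are compared on equal footing.
    Requires len(synthetic) >= len(real) // 2.
    """
    rng = np.random.default_rng(seed)
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    half = len(real) // 2
    observed, baseline = [], []
    for _ in range(n_splits):
        perm = rng.permutation(len(real))
        a = real[perm[:half]]              # one real half
        b = real[perm[half:2 * half]]      # the other, disjoint real half
        s = synthetic[rng.choice(len(synthetic), size=half, replace=False)]
        da = descriptor_fn(a)
        baseline.append(np.linalg.norm(da - descriptor_fn(b)))  # sampling noise alone
        observed.append(np.linalg.norm(da - descriptor_fn(s)))  # real vs. synthetic
    return np.mean(observed) / np.mean(baseline)
```

Ratios near 1 suggest the synthetic cohort sits about as close to the real data as another real sample would; much larger ratios point to structural differences. Any function mapping an (n, d) array to a fixed-shape NumPy array works as `descriptor_fn`; with trailed installed, `functools.partial(compute_ect_from_numpy, num_thetas=64, resolution=64)` would match the example above, assuming it returns a NumPy array.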

Installation

We recommend installing with uv for fast, reliable dependency resolution:

uv pip install trailed

Or with pip:

pip install trailed

With optional dependencies (quoted so shells such as zsh do not expand the brackets):

uv pip install "trailed[sklearn]"     # sklearn integration
uv pip install "trailed[dataframe]"   # pandas/polars integration
uv pip install "trailed[all]"         # everything

For PyTorch use cases, install the upstream dect package:

pip install "dect @ git+https://github.com/aidos-lab/DECT/"

For development:

git clone https://github.com/Krv-Analytics/trailed.git
cd trailed
uv sync --extra dev --extra docs

Supports Python 3.10, 3.11, 3.12.

Next Steps

Quickstart - Compute your first topological descriptor
Overview - Technical background and roadmap
User Guide - Installation and configuration
API Reference - Full class and function docs