TRAILED
Topological Regularization and Integrity Learning for EHR Data
Warning
TRAILED is under active development. The current release provides the foundational ECT (Euler Characteristic Transform) implementation. Healthcare-specific methods — including density-aware descriptors, patient manifold construction, and clinical fidelity metrics — are in progress.
TRAILED is a topological representation learning library for Electronic Health Record (EHR) data. It provides methods for analyzing patient trajectories, validating synthetic data, and assessing clinical fidelity using topological techniques.
Quick Links
- Compute topological descriptors from patient data in minutes.
- Installation, configuration, and clinical workflows.
- Full API documentation for TRAILED modules.
- Performance comparison: trailed vs. upstream dect.
Why TRAILED?
Longitudinal EHR analysis and synthetic data generation suffer from two persistent failure modes that standard metrics do not detect:
- Mode Collapse
Rare but clinically significant phenotypes — pediatric rare diseases, underrepresented demographics — are often absent from synthetic datasets. Models trained on such data fail silently on these populations. Pairwise statistical metrics miss these coverage gaps because they cannot capture higher-order structure.
- Pathological Interpolation
Generative models produce synthetic patient trajectories that pass through biologically implausible states: impossible lab value transitions, contradictory comorbidities, or clinically incoherent sequences. These failures create safety risks and degrade downstream model reliability.
TRAILED addresses these problems using topological methods that capture global structure in patient trajectory spaces — detecting patterns invisible to coordinate-based metrics.
Core Capabilities
Topological Descriptors: Representations that capture shape and structure in clinical latent spaces
Differentiable: Full gradient support for training-time regularization of generative models
Patient Manifold Analysis: Characterize trajectory spaces and identify impossible state transitions
Fidelity Metrics: Quantify real-vs-synthetic alignment in coordinate-free topological space
Architecture
graph LR
subgraph Input
A[Patient Embeddings / Trajectories]
end
subgraph "TRAILED Core"
B["Topological Analysis"]
C["Manifold Construction"]
D["Fidelity Scoring"]
end
subgraph "Applications"
E["Training Regularizer"]
F["Synthetic Data QA"]
G["Trajectory Analysis"]
end
A --> B
A --> C
B --> D
C --> D
D --> E
D --> F
D --> G
style Input fill:#f9f9f9,stroke:#999
style B fill:#D9EDF7,stroke:#31708F,stroke-width:2px
style D fill:#DFF0D8,stroke:#3C763D,stroke-width:2px
Example Usage
Computing Topological Descriptors
import numpy as np
from trailed import compute_ect_from_numpy
# Patient embeddings from EHR data
patient_embeddings = np.load("embeddings.npy")
descriptor = compute_ect_from_numpy(patient_embeddings, num_thetas=32, resolution=64)
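To make concrete what a descriptor of this kind encodes, here is a minimal NumPy sketch of an Euler Characteristic Transform for a bare point cloud. It is not TRAILED's implementation: the function name `point_cloud_ect`, the random choice of directions, and the threshold range are all assumptions made for illustration. For a point cloud with no edges or higher simplices, the Euler characteristic of each sublevel set is simply the number of points in it, so each entry is a count.

```python
import numpy as np


def point_cloud_ect(points, num_thetas=32, resolution=64, seed=0):
    """Illustrative ECT of a point cloud (vertices only, no edges or faces).

    Returns an integer array of shape (num_thetas, resolution) where entry
    [i, j] counts the points whose height along direction i is <= threshold j.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)

    # Random unit directions (an assumption; evenly spaced or learned
    # directions are equally valid choices).
    directions = rng.normal(size=(num_thetas, points.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Thresholds span [-r, r], with r the largest point norm, so the last
    # threshold in every direction includes all points.
    radius = np.linalg.norm(points, axis=1).max()
    thresholds = np.linspace(-radius, radius, resolution)

    # heights[i, k] is the projection of point k onto direction i.
    heights = directions @ points.T
    below = heights[:, None, :] <= thresholds[None, :, None]
    return below.sum(axis=2)
```

Each row is a non-decreasing curve that ends at the total number of points. Comparing such curves across many directions is what lets the descriptor summarize the shape of the whole cloud rather than individual coordinates.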
Training Regularization (PyTorch)
For PyTorch neural network use cases, use the upstream aidos-lab/dect package:
import torch
from dect.nn import ECTLayer, ECTConfig
ect_layer = ECTLayer(ECTConfig(num_thetas=32, resolution=32))
# Regularize generative model to preserve topological structure
real_topo = ect_layer(real_batch)
synthetic_topo = ect_layer(generated_batch)
topo_loss = torch.nn.functional.mse_loss(synthetic_topo, real_topo)
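A hard count of points below a threshold has zero gradient almost everywhere, so it cannot drive training directly. A common way to make an ECT usable as a regularizer is to replace the hard step with a sigmoid. The sketch below shows that smoothing in plain NumPy, without autograd, purely to illustrate the idea. The names `smooth_ect`, `topo_mse`, and the `scale` parameter are hypothetical and do not describe the dect or TRAILED APIs.

```python
import numpy as np


def smooth_ect(points, directions, thresholds, scale=50.0):
    """Sigmoid-smoothed point-cloud ECT, shape (num_directions, num_thresholds)."""
    # heights[i, k] is the projection of point k onto direction i.
    heights = directions @ points.T
    z = scale * (thresholds[None, :, None] - heights[:, None, :])
    # 0.5 * (1 + tanh(z / 2)) equals sigmoid(z) and avoids overflow in exp.
    # It approaches the hard indicator [height <= threshold] as scale grows.
    return (0.5 * (1.0 + np.tanh(z / 2.0))).sum(axis=2)


def topo_mse(real_points, synthetic_points, directions, thresholds, scale=50.0):
    """Mean squared difference between two smoothed ECTs on a shared grid."""
    real = smooth_ect(real_points, directions, thresholds, scale)
    synthetic = smooth_ect(synthetic_points, directions, thresholds, scale)
    return float(np.mean((real - synthetic) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.uniform(-1.0, 1.0, size=(64, 2))
    shifted = real + 0.5  # stand-in for a poorly matched synthetic batch

    # Eight evenly spaced directions in the plane and a shared threshold grid.
    angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    thresholds = np.linspace(-2.5, 2.5, 32)

    print("identical batches:", topo_mse(real, real, directions, thresholds))
    print("shifted batch:    ", topo_mse(real, shifted, directions, thresholds))
```

Both batches must be evaluated on the same directions and thresholds for the loss to be meaningful. Because the values are sums over points, batches of different sizes would also need normalizing before comparison.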
Synthetic Data Fidelity
# Compare topological structure of real vs synthetic cohorts
real_descriptor = compute_ect_from_numpy(real_embeddings, num_thetas=64, resolution=64)
synthetic_descriptor = compute_ect_from_numpy(synthetic_embeddings, num_thetas=64, resolution=64)
# Distance between descriptors: 0 means identical, larger means less similar
fidelity_score = np.linalg.norm(real_descriptor - synthetic_descriptor)
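A raw norm has no natural scale, which makes values hard to compare across cohorts or descriptor resolutions. One option, sketched below, is to divide by the norm of the real descriptor. The helper `relative_descriptor_distance` is hypothetical and not part of TRAILED. It assumes both descriptors are arrays of the same shape and were computed with the same directions and thresholds, which the snippet above does not guarantee on its own.

```python
import numpy as np


def relative_descriptor_distance(real_descriptor, synthetic_descriptor):
    """Distance between two descriptors, scaled by the real descriptor's norm.

    0.0 means identical descriptors; larger values mean a larger mismatch.
    """
    real = np.asarray(real_descriptor, dtype=float)
    synthetic = np.asarray(synthetic_descriptor, dtype=float)
    if real.shape != synthetic.shape:
        raise ValueError("descriptors must have the same shape")

    # np.linalg.norm with no `ord` is the Frobenius norm for 2-D input.
    denom = np.linalg.norm(real)
    if denom == 0.0:
        raise ValueError("real descriptor is all zeros; relative distance is undefined")
    return float(np.linalg.norm(real - synthetic) / denom)
```

A further calibration step would be to compute the same quantity between two random halves of the real cohort and treat that value as the noise floor for the real-vs-synthetic comparison.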
Installation
We recommend installing with uv for fast, reliable dependency resolution:
uv pip install trailed
With optional dependencies:
uv pip install "trailed[sklearn]" # sklearn integration
uv pip install "trailed[dataframe]" # pandas/polars integration
uv pip install "trailed[all]" # everything
For PyTorch use cases, install the upstream dect package (quote the requirement so the shell passes it to pip as a single argument):
pip install "dect @ git+https://github.com/aidos-lab/DECT/"
To install TRAILED itself with pip instead of uv:
pip install trailed
For development:
git clone https://github.com/Krv-Analytics/trailed.git
cd trailed
uv sync --extra dev --extra docs
Supports Python 3.10, 3.11, and 3.12.
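After installing, a quick sanity check of the environment can save debugging time. The following sketch uses only the standard library. The helper names are hypothetical, and the distribution names `trailed` and `dect` are taken from the install commands above.

```python
import sys
from importlib import metadata

# Versions listed in the installation notes.
SUPPORTED_PYTHONS = {(3, 10), (3, 11), (3, 12)}


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter's major.minor version is in the supported set."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHONS


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    for name in ("trailed", "dect"):
        print(name, "->", installed_version(name) or "not installed")
```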
Next Steps
Quickstart - Compute your first topological descriptor
Overview - Technical background and roadmap
User Guide - Installation and configuration
API Reference - Full class and function docs