.. _quickstart: ========== Quickstart ========== This guide gets you computing topological descriptors in under 5 minutes. Prerequisites ------------- - Python 3.10+ - TRAILED installed (``uv pip install trailed`` or ``pip install trailed``) What is ECT? ------------ The **Euler Characteristic Transform (ECT)** is a topological descriptor that captures shape information. It works by: 1. Projecting points onto multiple directions 2. Computing the Euler characteristic at different filtration levels 3. Returning a vector that uniquely characterizes the shape For EHR applications, ECT descriptors can capture higher-order structure in patient embedding spaces — detecting patterns that coordinate-based metrics miss. Basic Usage with NumPy ---------------------- **Step 1: Prepare your data** TRAILED operates on point clouds or embeddings. For EHR data, this is typically patient embeddings from a representation learning model. .. code-block:: python import numpy as np # Example: patient embeddings (100 patients, 64-dimensional) patient_embeddings = np.random.randn(100, 64) print(f"Embedding shape: {patient_embeddings.shape}") **Step 2: Compute the ECT descriptor** .. code-block:: python from trailed import compute_ect_from_numpy descriptor = compute_ect_from_numpy( patient_embeddings, num_thetas=32, # Number of directions resolution=64, # Filtration resolution ) print(f"Descriptor shape: {descriptor.shape}") The result is a 1D vector capturing the topological structure of your patient cohort. Using pandas DataFrames ----------------------- TRAILED works directly with pandas: .. code-block:: python import pandas as pd from trailed import compute_ect_from_pandas # Patient features as DataFrame df = pd.DataFrame(np.random.randn(100, 10), columns=[f"feat_{i}" for i in range(10)]) descriptor = compute_ect_from_pandas(df, num_thetas=32, resolution=64) Comparing Cohorts ----------------- A key use case is comparing topological structure between cohorts — for example, real vs. synthetic patient data: .. code-block:: python from trailed import compute_ect_from_numpy import numpy as np # Real patient embeddings real_embeddings = np.load("real_embeddings.npy") # Synthetic patient embeddings synthetic_embeddings = np.load("synthetic_embeddings.npy") # Compute topological descriptors real_ect = compute_ect_from_numpy(real_embeddings, num_thetas=32, resolution=64) synthetic_ect = compute_ect_from_numpy(synthetic_embeddings, num_thetas=32, resolution=64) # Topological distance distance = np.linalg.norm(real_ect - synthetic_ect) print(f"Topological distance: {distance:.4f}") sklearn Integration ------------------- Use ECT features in machine learning pipelines: .. code-block:: python from sklearn.pipeline import Pipeline from sklearn.svm import SVC from trailed.plugins.sklearn import EctTransformer pipe = Pipeline([ ("ect", EctTransformer(num_thetas=32, resolution=64)), ("clf", SVC()), ]) # X is a list of point clouds, y is labels pipe.fit(X_train, y_train) accuracy = pipe.score(X_test, y_test) .. note:: sklearn integration requires ``uv pip install trailed[sklearn]``. PyTorch Integration ------------------- For differentiable ECT in PyTorch deep learning workflows, use the upstream `aidos-lab/dect `_ package: .. code-block:: bash pip install dect @ git+https://github.com/aidos-lab/DECT/ .. code-block:: python import torch from dect.nn import ECTLayer, ECTConfig ect_layer = ECTLayer(ECTConfig(num_thetas=32, resolution=64)) # Input: batch of point clouds [B, N, D] point_clouds = torch.randn(8, 100, 64, requires_grad=True) descriptors = ect_layer(point_clouds) # Gradients flow through loss = descriptors.sum() loss.backward() **Example: Topological Regularization** .. code-block:: python # In a generative model training loop real_batch = ... # Real patient embeddings generated_batch = ... # Generated patient embeddings real_ect = ect_layer(real_batch) generated_ect = ect_layer(generated_batch) # Topological loss encourages structural similarity topo_loss = torch.nn.functional.mse_loss(generated_ect, real_ect) total_loss = reconstruction_loss + lambda_topo * topo_loss Tuning Parameters ----------------- Two key parameters control the descriptor: - **num_thetas**: Number of directions to sample. More directions = finer angular resolution. Typical values: 16-64. - **resolution**: Filtration granularity. Higher = more detail. Typical values: 32-128. .. code-block:: python # High resolution for detailed analysis desc_detailed = compute_ect_from_numpy(embeddings, num_thetas=64, resolution=128) # Low resolution for fast computation desc_fast = compute_ect_from_numpy(embeddings, num_thetas=16, resolution=32) Next Steps ---------- - :doc:`programmatic` - Advanced API usage - :doc:`intermediate` - Direction sampling strategies - :doc:`advanced` - Custom filtrations and optimization - :ref:`Integrations ` - sklearn and tabular integration details