Trailed API

TRAILED - Topological Regularization and Integrity Learning for EHR Data

A fast Rust-backed implementation of the Euler Characteristic Transform (ECT) for sklearn pipelines and tabular/DataFrame workflows, focused on EHR data.

For PyTorch neural network use cases, use the upstream aidos-lab/dect package:

    pip install "dect @ git+https://github.com/aidos-lab/DECT/"

class trailed.DataFrameEctTransformer(coord_columns: List[str], group_column: str | None = None, channel_column: str | None = None, num_thetas: int = 64, resolution: int = 64, radius: float = 1.0, scale: float = 500.0, sampling_method: str = 'uniform', seed: int = 42, normalized: bool = False, parallel: bool = True, output_format: Literal['numpy', 'pandas', 'polars'] = 'numpy')

Bases: object

DataFrame-native ECT transformer.

This class provides a consistent interface for computing ECT from DataFrames, supporting both pandas and polars.

Parameters:

coord_columns (list of str) – Column names containing point coordinates.
group_column (str, optional) – Column name for group/batch IDs.
channel_column (str, optional) – Column name for channel IDs.
num_thetas (int, default=64) – Number of directions.
resolution (int, default=64) – Number of threshold steps.
radius (float, default=1.0) – Radius of threshold interval.
scale (float, default=500.0) – Scale factor for sigmoid.
sampling_method (str, default="uniform") – Method for generating directions.
seed (int, default=42) – Random seed.
normalized (bool, default=False) – Whether to normalize the ECT.
parallel (bool, default=True) – Whether to use parallel computation.
output_format (str, default="numpy") – Output format: “numpy”, “pandas”, or “polars”.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from trailed.tabular import DataFrameEctTransformer
>>> df = pd.DataFrame({
...     "x": np.random.randn(100),
...     "y": np.random.randn(100),
...     "z": np.random.randn(100),
...     "group": np.repeat(range(10), 10),
... })
>>> transformer = DataFrameEctTransformer(
...     coord_columns=["x", "y", "z"],
...     group_column="group",
...     num_thetas=32,
...     resolution=32,
... )
>>> ect = transformer.transform(df)

fit(df: pd.DataFrame | pl.DataFrame) → DataFrameEctTransformer

Fit the transformer by generating directions.

Parameters:

df (DataFrame) – Sample DataFrame to infer dimensions from.

Return type:

self

fit_transform(df: pd.DataFrame | pl.DataFrame) → ndarray[tuple[Any, ...], dtype[_ScalarT]] | pd.DataFrame | pl.DataFrame

Fit and transform in one step.

transform(df: pd.DataFrame | pl.DataFrame) → ndarray[tuple[Any, ...], dtype[_ScalarT]] | pd.DataFrame | pl.DataFrame

Transform DataFrame to ECT features.

Parameters:

df (DataFrame) – DataFrame containing point cloud data.

Returns:

result – ECT features in the specified output format.

Return type:

ndarray or DataFrame

class trailed.EctChannelTransformer(num_thetas: int = 64, resolution: int = 64, radius: float = 1.0, scale: float = 500.0, max_channels: int | None = None, sampling_method: Literal['uniform', 'structured_2d', 'multiview', 'spherical_grid'] = 'uniform', flatten: bool = True, normalized: bool = False, seed: int = 42)

Bases: object

ECT transformer with channel support for categorical features.

This transformer computes separate ECTs for each categorical channel in the point cloud, useful for molecules with different atom types or other categorically-labeled point clouds.

Parameters:

num_thetas (int, default=64) – Number of directions to sample.
resolution (int, default=64) – Number of threshold steps.
radius (float, default=1.0) – Radius of the threshold interval.
scale (float, default=500.0) – Scale factor for sigmoid approximation.
max_channels (int or None, default=None) – Maximum number of channels. If None, inferred from data.
sampling_method (str, default="uniform") – Method for generating directions.
flatten (bool, default=True) – If True, flatten the ECT to a 1D feature vector.
normalized (bool, default=False) – If True, normalize each ECT to [0, 1].
seed (int, default=42) – Random seed for direction generation.

Examples

>>> from trailed.plugins.sklearn import EctChannelTransformer
>>> import numpy as np
>>> # Point clouds with channel labels
>>> X = np.random.randn(10, 50, 3).astype(np.float32)
>>> channels = np.random.randint(0, 3, size=(10, 50))  # 3 channels
>>> transformer = EctChannelTransformer(max_channels=3)
>>> features = transformer.fit_transform(X, channels=channels)

fit(X: ArrayLike, y=None, channels: ArrayLike | None = None) → EctChannelTransformer

Fit the transformer.

Parameters:

X (array-like of shape (n_samples, n_points, n_dims)) – Training point clouds.
y (None) – Ignored.
channels (array-like of shape (n_samples, n_points), optional) – Channel indices for each point.

fit_transform(X: ArrayLike, y=None, channels: ArrayLike | None = None) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Fit and transform in one step.

get_params(deep: bool = True) → dict

Get parameters for this estimator.

set_params(**params) → EctChannelTransformer

Set parameters for this estimator.

transform(X: ArrayLike, channels: ArrayLike | None = None) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Transform point clouds to ECT features.

Parameters:

X (array-like of shape (n_samples, n_points, n_dims)) – Point clouds to transform.
channels (array-like of shape (n_samples, n_points), optional) – Channel indices for each point.

Returns:

features – ECT features with shape depending on flatten parameter.

Return type:

ndarray
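The per-channel idea described for this class can be illustrated with a small NumPy sketch. It is not the library's Rust implementation: `channel_ect_sketch` is a hypothetical helper, it uses hard threshold counts rather than the sigmoid approximation, and the axis order (num_thetas, resolution, n_channels) is assumed from the per-group layout documented for compute_ect_from_numpy.

```python
import numpy as np


def channel_ect_sketch(points, channels, directions, lin, num_channels):
    """Hypothetical helper: one hard-threshold ECT per channel for a single cloud."""
    heights = points @ directions  # (n_points, num_thetas)
    # below[p, t, r] is 1 when point p is at or below threshold lin[r] in direction t.
    below = (heights[:, :, None] <= lin[None, None, :]).astype(np.int64)
    # One-hot channel membership, shape (n_points, num_channels).
    onehot = (channels[:, None] == np.arange(num_channels)[None, :]).astype(np.int64)
    # Sum over points separately per channel: (num_thetas, resolution, num_channels).
    return np.einsum("ptr,pc->trc", below, onehot)


rng = np.random.default_rng(0)
points = rng.uniform(-0.5, 0.5, size=(50, 3))  # every point has norm below 1
channels = rng.integers(0, 3, size=50)         # 3 channels
directions = rng.standard_normal((3, 8))
directions /= np.linalg.norm(directions, axis=0, keepdims=True)
lin = np.linspace(-1.0, 1.0, 16)

ect = channel_ect_sketch(points, channels, directions, lin, num_channels=3)
print(ect.shape)  # (8, 16, 3)
```

Each slice `ect[:, :, c]` only counts points labelled with channel `c`, so the channels can be inspected or flattened independently.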

class trailed.EctTransformer(num_thetas: int = 64, resolution: int = 64, radius: float = 1.0, scale: float = 500.0, sampling_method: Literal['uniform', 'structured_2d', 'multiview', 'spherical_grid'] = 'uniform', flatten: bool = True, normalized: bool = False, parallel: bool = True, seed: int = 42)

Bases: object

Sklearn-compatible transformer for computing ECT features.

This transformer computes the Euler Characteristic Transform for batches of point clouds, producing fixed-size feature vectors suitable for machine learning classifiers.

Parameters:

num_thetas (int, default=64) – Number of directions to sample.
resolution (int, default=64) – Number of threshold steps.
radius (float, default=1.0) – Radius of the threshold interval [-radius, radius].
scale (float, default=500.0) – Scale factor for sigmoid approximation.
sampling_method (str, default="uniform") – Method for generating directions. One of “uniform”, “structured_2d”, “multiview”, “spherical_grid”.
flatten (bool, default=True) – If True, flatten the ECT to a 1D feature vector.
normalized (bool, default=False) – If True, normalize each ECT to [0, 1].
parallel (bool, default=True) – If True, use parallel computation.
seed (int, default=42) – Random seed for direction generation.

directions_

The direction vectors used for ECT computation.

Type:

ndarray of shape (ambient_dim, num_thetas)

ambient_dim_

Inferred ambient dimension from training data.

Type:

int

Examples

>>> from trailed.plugins.sklearn import EctTransformer
>>> import numpy as np
>>> # Create sample point clouds: 10 samples, 50 points each, 3D
>>> X = np.random.randn(10, 50, 3).astype(np.float32)
>>> transformer = EctTransformer(num_thetas=32, resolution=32)
>>> features = transformer.fit_transform(X)
>>> features.shape
(10, 1024)  # 32 * 32 = 1024 features per sample

fit(X: ArrayLike, y=None) → EctTransformer

Fit the transformer by generating directions.

Parameters:

X (array-like of shape (n_samples, n_points, n_dims)) – Training point clouds.
y (None) – Ignored.

Returns:

The fitted transformer.

Return type:

self

fit_transform(X: ArrayLike, y=None) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Fit and transform in one step.

Parameters:

X (array-like of shape (n_samples, n_points, n_dims)) – Point clouds to fit and transform.
y (None) – Ignored.

Returns:

features – ECT features.

Return type:

ndarray

get_params(deep: bool = True) → dict

Get parameters for this estimator.

set_params(**params) → EctTransformer

Set parameters for this estimator.

transform(X: ArrayLike) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Transform point clouds to ECT features.

Parameters:

X (array-like of shape (n_samples, n_points, n_dims)) – Point clouds to transform.

Returns:

features – ECT features. Shape is (n_samples, resolution * num_thetas) if flatten=True, else (n_samples, resolution, num_thetas).

Return type:

ndarray
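To make the roles of radius, resolution, and scale concrete, here is a minimal NumPy sketch of a sigmoid-smoothed ECT for one point cloud. It assumes the common formulation in which each point contributes sigmoid(scale * (threshold - height)) per direction; `smooth_ect_sketch` is a hypothetical name, and the library's Rust kernel may differ in detail.

```python
import numpy as np


def smooth_ect_sketch(points, directions, radius=1.0, resolution=64, scale=500.0):
    """Hypothetical helper: sigmoid-smoothed ECT of a single point cloud."""
    lin = np.linspace(-radius, radius, resolution)  # thresholds, (resolution,)
    heights = points @ directions                   # (n_points, num_thetas)
    # z[r, p, t] = scale * (lin[r] - heights[p, t])
    z = scale * (lin[:, None, None] - heights[None, :, :])
    # Logistic sigmoid written via tanh, which avoids overflow for large |z|.
    indicator = 0.5 * (1.0 + np.tanh(0.5 * z))
    return indicator.sum(axis=1)                    # (resolution, num_thetas)


rng = np.random.default_rng(0)
points = 0.3 * rng.standard_normal((50, 3))
directions = rng.standard_normal((3, 32))
directions /= np.linalg.norm(directions, axis=0, keepdims=True)

ect = smooth_ect_sketch(points, directions, resolution=32)
features = ect.reshape(-1)  # analogous to flatten=True
print(ect.shape, features.shape)  # (32, 32) (1024,)
```

A larger scale makes each sigmoid closer to a step function, so the smooth curve approaches a plain count of points below each threshold.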

class trailed.FastEctTransformer(num_thetas: int = 64, resolution: int = 64, sampling_method: Literal['uniform', 'structured_2d', 'multiview', 'spherical_grid'] = 'uniform', flatten: bool = True, parallel: bool = True, seed: int = 42)

Bases: object

Fast (non-differentiable) ECT transformer using bincount.

This transformer is optimized for speed using histogram-based ECT computation. It’s faster than EctTransformer but produces slightly different results (discrete vs smooth approximation).

Parameters:

num_thetas (int, default=64) – Number of directions to sample.
resolution (int, default=64) – Number of histogram bins.
sampling_method (str, default="uniform") – Method for generating directions.
flatten (bool, default=True) – If True, flatten the ECT to a 1D feature vector.
parallel (bool, default=True) – If True, use parallel computation.
seed (int, default=42) – Random seed for direction generation.

Examples

>>> from trailed.plugins.sklearn import FastEctTransformer
>>> import numpy as np
>>> X = np.random.randn(100, 50, 3).astype(np.float32)
>>> transformer = FastEctTransformer(num_thetas=64, resolution=64)
>>> features = transformer.fit_transform(X)

fit(X: ArrayLike, y=None) → FastEctTransformer

Fit the transformer.

fit_transform(X: ArrayLike, y=None) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Fit and transform in one step.

get_params(deep: bool = True) → dict

Get parameters for this estimator.

set_params(**params) → FastEctTransformer

Set parameters for this estimator.

transform(X: ArrayLike) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Transform point clouds to ECT features.
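The histogram-based approach can be sketched in NumPy with `np.bincount` followed by a cumulative sum. This is an illustration only: `histogram_ect_sketch` is a hypothetical helper, and its `radius` argument is an assumption of the sketch, since FastEctTransformer does not expose one and how the library chooses its bin range is not documented here.

```python
import numpy as np


def histogram_ect_sketch(points, directions, radius=1.0, resolution=64):
    """Hypothetical helper: discrete ECT of one point cloud via bincount."""
    heights = points @ directions  # (n_points, num_thetas)
    num_thetas = directions.shape[1]
    # Map heights in [-radius, radius] to bin indices in [0, resolution - 1].
    bins = np.floor((heights + radius) / (2.0 * radius) * resolution).astype(np.int64)
    bins = np.clip(bins, 0, resolution - 1)
    # Offset the bins of each direction so a single bincount covers all directions.
    flat = bins + np.arange(num_thetas) * resolution
    counts = np.bincount(flat.ravel(), minlength=num_thetas * resolution)
    counts = counts.reshape(num_thetas, resolution)
    # Cumulative counts give the number of points at or below each bin.
    return np.cumsum(counts, axis=1).T  # (resolution, num_thetas)


rng = np.random.default_rng(0)
points = rng.uniform(-0.5, 0.5, size=(50, 3))
directions = rng.standard_normal((3, 64))
directions /= np.linalg.norm(directions, axis=0, keepdims=True)

ect = histogram_ect_sketch(points, directions, resolution=64)
print(ect.shape)  # (64, 64)
```

Because the result is built from integer counts, it is piecewise constant in the threshold, which is the "discrete vs smooth" difference mentioned above.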

trailed.compute_ect_from_dataframe(df: pd.DataFrame | pl.DataFrame, coord_columns: List[str], group_column: str | None = None, channel_column: str | None = None, **kwargs) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Compute ECT from a pandas or polars DataFrame.

This is a convenience function that automatically detects the DataFrame type and calls the appropriate function.

Parameters:

df (pd.DataFrame or pl.DataFrame) – DataFrame containing point cloud data.
coord_columns (list of str) – Column names containing point coordinates.
group_column (str, optional) – Column name for group/batch IDs.
channel_column (str, optional) – Column name for channel IDs.
**kwargs – Additional arguments passed to compute_ect_from_numpy.

Returns:

ect – ECT features.

Return type:

ndarray

trailed.compute_ect_from_numpy(points: ndarray[tuple[Any, ...], dtype[_ScalarT]], group_ids: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None, channel_ids: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None, num_thetas: int = 64, resolution: int = 64, radius: float = 1.0, scale: float = 500.0, sampling_method: str = 'uniform', seed: int = 42, normalized: bool = False, parallel: bool = True, directions: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None, lin: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Compute ECT from numpy arrays.

Parameters:

points (ndarray of shape (n_points, n_dims)) – Point coordinates.
group_ids (ndarray of shape (n_points,), optional) – Group/batch indices for each point. Points with the same group_id belong to the same point cloud.
channel_ids (ndarray of shape (n_points,), optional) – Channel indices for each point (e.g., atom types).
num_thetas (int, default=64) – Number of directions.
resolution (int, default=64) – Number of threshold steps.
radius (float, default=1.0) – Radius of threshold interval.
scale (float, default=500.0) – Scale factor for sigmoid.
sampling_method (str, default="uniform") – Method for generating directions.
seed (int, default=42) – Random seed.
normalized (bool, default=False) – Whether to normalize the ECT.
parallel (bool, default=True) – Whether to use parallel computation.

Returns:

ect – ECT features. Shape depends on inputs:

- No groups, no channels: (resolution, num_thetas)
- With groups, no channels: (n_groups, resolution, num_thetas)
- With channels: (n_groups, num_thetas, resolution, n_channels)

Return type:

ndarray
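The long-format convention for group_ids (one row per point, with rows sharing an ID forming one point cloud) can be illustrated with plain NumPy. `split_by_group_sketch` is a hypothetical helper written for this example; it does not call the library.

```python
import numpy as np


def split_by_group_sketch(points, group_ids):
    """Hypothetical helper: split long-format points into one array per group."""
    labels, inverse = np.unique(group_ids, return_inverse=True)
    clouds = [points[inverse == g] for g in range(len(labels))]
    return labels, clouds


points = np.array([
    [0.1, 0.1, 0.0], [0.2, 0.3, 0.1], [0.3, 0.2, 0.2],
    [0.5, 0.4, 0.1], [0.6, 0.5, 0.2], [0.7, 0.6, 0.3],
])
group_ids = np.array([0, 0, 0, 1, 1, 1])

labels, clouds = split_by_group_sketch(points, group_ids)
print(labels.tolist(), [c.shape for c in clouds])  # [0, 1] [(3, 3), (3, 3)]
```

Computing one (resolution, num_thetas) array per cloud and stacking the results with `np.stack` would give the (n_groups, resolution, num_thetas) layout listed above.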

trailed.compute_ect_from_pandas(df: pd.DataFrame, coord_columns: List[str], group_column: str | None = None, channel_column: str | None = None, num_thetas: int = 64, resolution: int = 64, radius: float = 1.0, scale: float = 500.0, sampling_method: str = 'uniform', seed: int = 42, normalized: bool = False, parallel: bool = True, directions: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None, lin: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Compute ECT from a pandas DataFrame.

Parameters:

df (pd.DataFrame) – DataFrame containing point cloud data.
coord_columns (list of str) – Column names containing point coordinates (e.g., [“x”, “y”, “z”]).
group_column (str, optional) – Column name for group/batch IDs.
channel_column (str, optional) – Column name for channel IDs.
num_thetas (int, default=64) – Number of directions.
resolution (int, default=64) – Number of threshold steps.
radius (float, default=1.0) – Radius of threshold interval.
scale (float, default=500.0) – Scale factor for sigmoid.
sampling_method (str, default="uniform") – Method for generating directions.
seed (int, default=42) – Random seed.
normalized (bool, default=False) – Whether to normalize the ECT.
parallel (bool, default=True) – Whether to use parallel computation.

Returns:

ect – ECT features.

Return type:

ndarray

Examples

>>> import pandas as pd
>>> from trailed.tabular import compute_ect_from_pandas
>>> df = pd.DataFrame({
...     "x": [0.1, 0.2, 0.3, 0.5, 0.6, 0.7],
...     "y": [0.1, 0.3, 0.2, 0.4, 0.5, 0.6],
...     "z": [0.0, 0.1, 0.2, 0.1, 0.2, 0.3],
...     "molecule_id": [0, 0, 0, 1, 1, 1],
...     "atom_type": [0, 1, 0, 1, 1, 0],
... })
>>> ect = compute_ect_from_pandas(
...     df,
...     coord_columns=["x", "y", "z"],
...     group_column="molecule_id",
...     channel_column="atom_type",
... )

trailed.compute_ect_from_polars(df: pl.DataFrame, coord_columns: List[str], group_column: str | None = None, channel_column: str | None = None, num_thetas: int = 64, resolution: int = 64, radius: float = 1.0, scale: float = 500.0, sampling_method: str = 'uniform', seed: int = 42, normalized: bool = False, parallel: bool = True, directions: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None, lin: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Compute ECT from a polars DataFrame.

Parameters:

df (pl.DataFrame) – DataFrame containing point cloud data.
coord_columns (list of str) – Column names containing point coordinates.
group_column (str, optional) – Column name for group/batch IDs.
channel_column (str, optional) – Column name for channel IDs.
num_thetas (int, default=64) – Number of directions.
resolution (int, default=64) – Number of threshold steps.
radius (float, default=1.0) – Radius of threshold interval.
scale (float, default=500.0) – Scale factor for sigmoid.
sampling_method (str, default="uniform") – Method for generating directions.
seed (int, default=42) – Random seed.
normalized (bool, default=False) – Whether to normalize the ECT.
parallel (bool, default=True) – Whether to use parallel computation.

Returns:

ect – ECT features.

Return type:

ndarray

Examples

>>> import polars as pl
>>> from trailed.tabular import compute_ect_from_polars
>>> df = pl.DataFrame({
...     "x": [0.1, 0.2, 0.3, 0.5, 0.6, 0.7],
...     "y": [0.1, 0.3, 0.2, 0.4, 0.5, 0.6],
...     "z": [0.0, 0.1, 0.2, 0.1, 0.2, 0.3],
...     "molecule_id": [0, 0, 0, 1, 1, 1],
... })
>>> ect = compute_ect_from_polars(
...     df,
...     coord_columns=["x", "y", "z"],
...     group_column="molecule_id",
... )

trailed.compute_node_heights(x: ndarray[tuple[Any, ...], dtype[_ScalarT]], v: ndarray[tuple[Any, ...], dtype[_ScalarT]]) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Compute node heights (projections onto directions).

Parameters:

x (ndarray of shape (n_points, d)) – Point coordinates.
v (ndarray of shape (d, n_directions)) – Direction vectors.

Returns:

heights – Projection of each point onto each direction.

Return type:

ndarray of shape (n_points, n_directions)
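Given the documented shapes, the projection amounts to a single matrix product. The NumPy sketch below shows that relationship; `node_heights_sketch` is a hypothetical stand-in and not the library function itself.

```python
import numpy as np


def node_heights_sketch(x: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: project each point onto each direction."""
    # x: (n_points, d), v: (d, n_directions) -> (n_points, n_directions)
    return x @ v


x = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
v = np.array([[1.0, 0.0], [0.0, 1.0]])  # columns are the x-axis and y-axis directions
heights = node_heights_sketch(x, v)
print(heights.shape)  # (3, 2)
```

With the two coordinate axes as directions, each row of `heights` simply reproduces the point's own coordinates.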

trailed.ect_to_dataframe(ect: ndarray[tuple[Any, ...], dtype[_ScalarT]], group_ids: List | None = None, as_polars: bool = False) → pd.DataFrame | pl.DataFrame

Convert ECT array to a DataFrame.

Parameters:

ect (ndarray) – ECT array of shape (n_groups, resolution, num_thetas) or (n_groups, num_thetas, resolution, n_channels).
group_ids (list, optional) – Original group identifiers to use as index.
as_polars (bool, default=False) – If True, return a polars DataFrame instead of pandas.

Returns:

df – DataFrame with flattened ECT features.

Return type:

pd.DataFrame or pl.DataFrame
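As a rough illustration of what "flattened ECT features" means, the NumPy sketch below collapses every axis after the group axis into a single feature axis. The row-major flattening order and the `ect_{i}` column labels are assumptions made for this example; the names and ordering the library actually emits are not documented here.

```python
import numpy as np

# Toy ECT array with shape (n_groups, resolution, num_thetas) = (2, 4, 3).
ect = np.arange(2 * 4 * 3, dtype=np.float64).reshape(2, 4, 3)

# One row per group, one column per (threshold, direction) pair.
flat = ect.reshape(ect.shape[0], -1)

# Hypothetical column labels for illustration only.
columns = [f"ect_{i}" for i in range(flat.shape[1])]
print(flat.shape, columns[:3])  # (2, 12) ['ect_0', 'ect_1', 'ect_2']
```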

trailed.generate_2d_directions(num_thetas: int) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Generate structured directions along the 2D unit circle.

Divides the interval [0, 2*pi) into equal parts and returns the corresponding points on the unit circle.

Parameters:

num_thetas (int) – Number of directions to generate.

Returns:

directions – Unit vectors representing directions.

Return type:

ndarray of shape (2, num_thetas)

Examples

>>> from trailed.sampling import generate_2d_directions
>>> v = generate_2d_directions(8)
>>> v.shape
(2, 8)
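The construction described above (equal steps over [0, 2*pi), mapped to the unit circle) can be reproduced in a few lines of NumPy. `circle_directions_sketch` is a hypothetical name, and placing cosine in the first row and sine in the second is an assumption about the coordinate order.

```python
import numpy as np


def circle_directions_sketch(num_thetas: int) -> np.ndarray:
    """Hypothetical helper: evenly spaced unit vectors on the 2D circle."""
    # endpoint=False keeps 2*pi out, so the first direction is not repeated.
    angles = np.linspace(0.0, 2.0 * np.pi, num_thetas, endpoint=False)
    return np.stack([np.cos(angles), np.sin(angles)])  # (2, num_thetas)


v = circle_directions_sketch(8)
print(v.shape)  # (2, 8)
```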

trailed.generate_directions(num_thetas: int, ambient_dim: int, method: Literal['uniform', 'structured_2d', 'multiview', 'spherical_grid'] = 'uniform', seed: int = 42) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Generate direction vectors using the specified method.

This is a convenience function that dispatches to the appropriate direction generation function based on the method parameter.

Parameters:

num_thetas (int) – Number of directions to generate. For spherical_grid, this is used to estimate the grid dimensions.
ambient_dim (int) – Dimension of the ambient space.
method (str, default="uniform") – Direction sampling method:

- “uniform”: Random sampling on unit sphere
- “structured_2d”: Evenly spaced on 2D circle (requires ambient_dim=2)
- “multiview”: Structured sampling in coordinate planes
- “spherical_grid”: Lat/lon grid on sphere (requires ambient_dim=3)
seed (int, default=42) – Random seed (only used for “uniform” method).

Returns:

directions – Unit vectors representing directions.

Return type:

ndarray of shape (ambient_dim, num_directions)

Examples

>>> from trailed.sampling import generate_directions
>>> v = generate_directions(64, 3, method="uniform")
>>> v.shape
(3, 64)
>>> v = generate_directions(64, 2, method="structured_2d")
>>> v.shape
(2, 64)

trailed.generate_lin(radius: float, resolution: int) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Generate linear threshold values.

Parameters:

radius (float) – Radius of the interval [-radius, radius].
resolution (int) – Number of threshold steps.

Returns:

lin – Threshold values.

Return type:

ndarray of shape (resolution,)
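A plausible NumPy equivalent is an evenly spaced grid over [-radius, radius]. Whether the library includes both endpoints is an assumption of this sketch, and `lin_sketch` is a hypothetical name.

```python
import numpy as np


def lin_sketch(radius: float, resolution: int) -> np.ndarray:
    """Hypothetical helper: evenly spaced thresholds on [-radius, radius]."""
    return np.linspace(-radius, radius, resolution)


lin = lin_sketch(1.0, 5)
print(lin.shape)  # (5,)
```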

trailed.generate_multiview_directions(num_thetas: int, ambient_dim: int) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Generate structured directions in multiple 2D planes.

Generates directions by embedding the 2D unit circle in the d-dimensional space along each pair of coordinate axes. This produces (d choose 2) sets of structured directions.

Parameters:

num_thetas (int) – Total number of directions to generate.
ambient_dim (int) – Dimension of the ambient space.

Returns:

directions – Unit vectors representing directions.

Return type:

ndarray of shape (ambient_dim, num_thetas)

Examples

>>> from trailed.sampling import generate_multiview_directions
>>> v = generate_multiview_directions(64, 3)
>>> v.shape
(3, 64)
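The embedding of the 2D circle into each coordinate plane can be sketched as follows. This is an illustration under stated assumptions: `multiview_sketch` is a hypothetical helper that splits num_thetas evenly across the (d choose 2) planes and drops any remainder, which is why the example uses 60 directions. How the library distributes a total that is not divisible by the number of planes is not documented here.

```python
from itertools import combinations

import numpy as np


def multiview_sketch(num_thetas: int, ambient_dim: int) -> np.ndarray:
    """Hypothetical helper: unit circles embedded in each coordinate plane."""
    planes = list(combinations(range(ambient_dim), 2))  # (d choose 2) axis pairs
    per_plane = num_thetas // len(planes)               # assumption: even split
    angles = np.linspace(0.0, 2.0 * np.pi, per_plane, endpoint=False)
    blocks = []
    for i, j in planes:
        block = np.zeros((ambient_dim, per_plane))
        block[i] = np.cos(angles)  # circle lives in the (i, j) plane
        block[j] = np.sin(angles)
        blocks.append(block)
    return np.concatenate(blocks, axis=1)


v = multiview_sketch(60, 3)
print(v.shape)  # (3, 60)
```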

trailed.generate_spherical_grid_directions(num_thetas: int, num_phis: int) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Generate directions on a spherical grid (3D only).

Creates a grid of directions on the unit sphere using latitude-longitude parameterization. The polar angle theta ranges from 0 to pi, and the azimuthal angle phi ranges from 0 to 2*pi.

Parameters:

num_thetas (int) – Number of polar angle samples.
num_phis (int) – Number of azimuthal angle samples.

Returns:

directions – Unit vectors representing directions.

Return type:

ndarray of shape (3, num_thetas * num_phis)

Examples

>>> from trailed.sampling import generate_spherical_grid_directions
>>> v = generate_spherical_grid_directions(8, 16)
>>> v.shape
(3, 128)
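The latitude-longitude parameterization can be written out with the standard spherical-to-Cartesian formulas. In this sketch, `spherical_grid_sketch` is a hypothetical name, and both the endpoint handling of the two angle ranges and the column ordering are assumptions.

```python
import numpy as np


def spherical_grid_sketch(num_thetas: int, num_phis: int) -> np.ndarray:
    """Hypothetical helper: lat/lon grid of unit vectors on the 3D sphere."""
    theta = np.linspace(0.0, np.pi, num_thetas)                    # polar angle
    phi = np.linspace(0.0, 2.0 * np.pi, num_phis, endpoint=False)  # azimuth
    theta_grid, phi_grid = np.meshgrid(theta, phi, indexing="ij")
    v = np.stack([
        np.sin(theta_grid) * np.cos(phi_grid),
        np.sin(theta_grid) * np.sin(phi_grid),
        np.cos(theta_grid),
    ])
    return v.reshape(3, -1)  # (3, num_thetas * num_phis)


v = spherical_grid_sketch(8, 16)
print(v.shape)  # (3, 128)
```

Including theta = 0 and theta = pi, as this sketch does, repeats each pole once per azimuth sample, so a lat/lon grid is denser near the poles than at the equator.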

trailed.generate_uniform_directions(num_thetas: int, ambient_dim: int, seed: int = 42) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Generate randomly sampled directions from a sphere.

Samples points from a standard normal distribution and projects them onto the unit sphere to obtain uniformly distributed directions.

Parameters:

num_thetas (int) – Number of directions to generate.
ambient_dim (int) – Dimension of the ambient space.
seed (int, default=42) – Random seed for reproducibility.

Returns:

directions – Unit vectors representing directions.

Return type:

ndarray of shape (ambient_dim, num_thetas)

Examples

>>> import numpy as np
>>> from trailed.sampling import generate_uniform_directions
>>> v = generate_uniform_directions(64, 3)
>>> v.shape
(3, 64)
>>> np.allclose(np.linalg.norm(v, axis=0), 1.0)
True
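The sampling scheme described above (draw from a standard normal, then project onto the unit sphere) looks like this in NumPy. `uniform_directions_sketch` is a hypothetical helper; it uses NumPy's `default_rng`, so its output will not match the library's values for the same seed.

```python
import numpy as np


def uniform_directions_sketch(num_thetas: int, ambient_dim: int, seed: int = 42) -> np.ndarray:
    """Hypothetical helper: uniformly distributed unit vectors on the sphere."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((ambient_dim, num_thetas))
    # Dividing each column by its norm projects it onto the unit sphere.
    return v / np.linalg.norm(v, axis=0, keepdims=True)


v = uniform_directions_sketch(64, 3)
print(v.shape)  # (3, 64)
```

The normal distribution is rotationally symmetric, which is why normalizing its samples yields directions with no preferred orientation.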

trailed.normalize_directions(v: ndarray[tuple[Any, ...], dtype[_ScalarT]]) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Normalize direction vectors to unit length.

Parameters:

v (ndarray of shape (d, n)) – Direction vectors.

Returns:

normalized – Unit vectors.

Return type:

ndarray of shape (d, n)