Phil

Topological Imputation with Representative Selection

Phil generates multiple candidate imputations using configurable sklearn pipelines, scores them with ECT (Euler Characteristic Transform) descriptors, and selects the most representative imputation. Instead of picking a single imputation strategy, explore the space of possibilities and let topological methods guide the selection.

What is Phil?

Phil addresses a common challenge in data science: how do you choose the right imputation strategy? Different methods (mean, median, KNN, iterative) produce different completed datasets, each with distinct downstream effects on analysis.

Phil’s approach:

  1. Generate multiple candidate imputations across configurable sklearn grids

  2. Score each candidate using topological descriptors (ECT)

  3. Select as the representative the candidate whose descriptor lies closest to the mean descriptor profile

  4. Deploy directly or integrate via PhilTransformer in sklearn pipelines

Typical Workflow

graph LR
   subgraph Input
      A[DataFrame with missing values]
   end

   subgraph "Stage 1: Impute"
      B["Generate candidates"]
      C1["KNN imputation"]
      C2["Iterative imputation"]
      C3["Mean/Median imputation"]
   end

   subgraph "Stage 2: Score"
      D["ECT descriptors"]
      E["Distance matrix"]
   end

   subgraph "Stage 3: Select"
      F["Representative"]
   end

   A --> B
   B --> C1
   B --> C2
   B --> C3
   C1 --> D
   C2 --> D
   C3 --> D
   D --> E
   E --> F

   style Input fill:#f9f9f9,stroke:#999
   style B fill:#D9EDF7,stroke:#31708F,stroke-width:2px
   style D fill:#D9EDF7,stroke:#31708F,stroke-width:2px
   style F fill:#DFF0D8,stroke:#3C763D,stroke-width:2px
    

YAML-Driven (Recommended)

from phil import Phil

# df_with_missing: a pandas DataFrame that contains missing values
imputer = Phil(samples=25, random_state=42)
completed = imputer.fit(df_with_missing)

Pipeline Integration

from sklearn.pipeline import Pipeline
from phil.transformers import PhilTransformer

# YourModel is a placeholder: substitute any sklearn-compatible estimator
pipe = Pipeline([
    ("impute", PhilTransformer(samples=25)),
    ("model", YourModel()),
])

Key Features

Multi-Strategy Exploration

Generate imputations across mean, median, KNN, iterative, and custom methods.
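
As a rough illustration of what multi-strategy exploration can look like with plain scikit-learn, the sketch below fits several imputers over a small parameter grid and collects one completed array per grid point. It is not Phil's internal code: `CANDIDATE_GRID` and `generate_candidates` are hypothetical names introduced here, and the grid values are arbitrary.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the IterativeImputer import)
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: each entry pairs an imputer class with a parameter grid.
CANDIDATE_GRID = [
    (SimpleImputer, {"strategy": ["mean", "median"]}),
    (KNNImputer, {"n_neighbors": [2, 5]}),
    (IterativeImputer, {"max_iter": [5], "random_state": [0]}),
]


def generate_candidates(X):
    """Return a list of (label, completed_array) pairs, one per grid point."""
    candidates = []
    for imputer_cls, grid in CANDIDATE_GRID:
        for params in ParameterGrid(grid):
            completed = imputer_cls(**params).fit_transform(X)
            candidates.append((f"{imputer_cls.__name__}({params})", completed))
    return candidates


# Toy data: 40 rows, 3 columns, with roughly 15% of entries removed.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X[rng.random(X.shape) < 0.15] = np.nan

candidates = generate_candidates(X)
for label, completed in candidates:
    print(label, completed.shape)
```

Every candidate agrees on the observed entries and differs only in the filled-in ones, which is what makes comparing them meaningful.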

Topological Scoring

Use ECT descriptors to capture structural properties of each imputation candidate.
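
To give a feel for what an ECT descriptor measures, here is a deliberately simplified version for a point cloud, treating each row of a completed dataset as a point. For a set of isolated points, the Euler characteristic of a sublevel set is just the number of points at or below the threshold, so the descriptor reduces to counting. This is an assumption-laden sketch rather than Phil's scoring code: `point_cloud_ect`, its parameters, and the per-candidate normalisation are all choices made for this example.

```python
import numpy as np


def point_cloud_ect(X, n_directions=16, n_thresholds=32, seed=0):
    """Simplified ECT of a point cloud, flattened to a 1-D descriptor.

    For each direction v and threshold t, records the number of points x
    with <x, v> <= t (the Euler characteristic of a 0-dimensional complex).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Centre and rescale into the unit ball so thresholds can span [-1, 1].
    centred = X - X.mean(axis=0)
    radius = np.linalg.norm(centred, axis=1).max()
    if radius > 0:
        centred = centred / radius

    # Random unit directions; a fixed seed keeps them identical across candidates.
    directions = rng.normal(size=(n_directions, X.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    thresholds = np.linspace(-1.0, 1.0, n_thresholds)
    heights = centred @ directions.T  # shape: (n_points, n_directions)

    # counts[i, j] = number of points with height <= thresholds[j] along direction i
    counts = (heights[:, :, None] <= thresholds[None, None, :]).sum(axis=0)
    return counts.astype(float).ravel()


square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
descriptor = point_cloud_ect(square, n_directions=8, n_thresholds=10)
print(descriptor.shape)
```

Using the same seed for every candidate matters: descriptors are only comparable when they are computed along the same directions and thresholds.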

Representative Selection

Automatically choose the candidate closest to the ensemble’s central tendency.
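
The selection rule can be sketched in a few lines of NumPy: stack one descriptor per candidate, average them, and pick the candidate nearest to that average. The function name `select_representative` and the use of Euclidean distance are assumptions made for illustration; Phil's own metric may differ.

```python
import numpy as np


def select_representative(descriptors):
    """Return (index, distances) for the descriptor closest to the mean profile."""
    D = np.asarray(descriptors, dtype=float)
    mean_profile = D.mean(axis=0)                      # central tendency of the ensemble
    distances = np.linalg.norm(D - mean_profile, axis=1)
    return int(np.argmin(distances)), distances


# Four toy descriptors; the last one is an outlier that pulls the mean upward.
descriptors = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [2.0, 2.0],
    [9.0, 9.0],
])
best, distances = select_representative(descriptors)
print(best)  # → 2
```

Unlike the mean profile itself, the selected representative is always one of the actual candidates, so it corresponds to a concrete, reproducible imputation.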

sklearn Integration

Drop-in transformer for seamless pipeline compatibility.

Installation

pip install phil

For development:

git clone https://github.com/Krv-Analytics/phil.git
cd phil
uv sync --extra dev --extra docs

Supports Python 3.10, 3.11, and 3.12.
