Phil

Topological Imputation with Representative Selection

Phil generates multiple candidate imputations using configurable sklearn pipelines, scores them with ECT (Euler Characteristic Transform) descriptors, and selects the most representative imputation. Rather than committing to a single imputation strategy up front, you explore a range of strategies and let topological descriptors guide the choice.

Quick Links

Quickstart

Impute a dataset and select a representative in minutes.

User Guide

Installation, configuration, and advanced workflows.

API Reference

Full API documentation for phil.

MCP Server

Run Phil sweeps from Claude, Cursor, or any MCP-capable agent.

What is Phil?

Phil addresses a common challenge in data science: how do you choose the right imputation strategy? Different methods (mean, median, KNN, iterative) produce different completed datasets, each with distinct downstream effects on analysis.

Phil’s approach:

Generate multiple candidate imputations across configurable sklearn grids
Score each candidate using topological descriptors (ECT)
Select the representative closest to the mean descriptor profile
Deploy directly or integrate via PhilTransformer in sklearn pipelines
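
The steps above can be sketched with plain scikit-learn and NumPy. This is a minimal illustration, not Phil's implementation: the grid contents, the `describe` helper, and the use of column means and standard deviations as a descriptor are all assumptions made for the example, standing in for Phil's ECT-based scoring.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.model_selection import ParameterGrid

# Toy data: 60 rows, 4 features, roughly 15% of entries removed.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
X[rng.random(X.shape) < 0.15] = np.nan

# Step 1: one candidate imputation per configuration in the grid.
grid = ParameterGrid([
    {"imputer": [SimpleImputer], "strategy": ["mean", "median"]},
    {"imputer": [KNNImputer], "n_neighbors": [3, 5, 10]},
])
candidates = []
for params in grid:
    params = dict(params)
    imputer_cls = params.pop("imputer")
    candidates.append(imputer_cls(**params).fit_transform(X))

# Step 2: describe each candidate with a fixed-length vector.
# ASSUMPTION: a simple statistical stand-in, not an ECT descriptor.
def describe(candidate):
    return np.concatenate([candidate.mean(axis=0), candidate.std(axis=0)])

descriptors = np.stack([describe(c) for c in candidates])

# Step 3: pick the candidate whose descriptor is closest to the mean descriptor.
distances = np.linalg.norm(descriptors - descriptors.mean(axis=0), axis=1)
best = int(np.argmin(distances))
representative = candidates[best]
```

Replacing `describe` with a topological descriptor leaves the generation and selection steps unchanged, since both only rely on each candidate being mapped to a fixed-length vector.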

Typical Workflow

graph LR
   subgraph Input
      A[DataFrame with missing values]
   end

   subgraph "Stage 1: Impute"
      B["Generate candidates"]
      C1["KNN imputation"]
      C2["Iterative imputation"]
      C3["Mean/Median imputation"]
   end

   subgraph "Stage 2: Score"
      D["ECT descriptors"]
      E["Distance matrix"]
   end

   subgraph "Stage 3: Select"
      F["Representative"]
   end

   A --> B
   B --> C1
   B --> C2
   B --> C3
   C1 --> D
   C2 --> D
   C3 --> D
   D --> E
   E --> F

   style Input fill:#f9f9f9,stroke:#999
   style B fill:#D9EDF7,stroke:#31708F,stroke-width:2px
   style D fill:#D9EDF7,stroke:#31708F,stroke-width:2px
   style F fill:#DFF0D8,stroke:#3C763D,stroke-width:2px
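
Stage 2 of the diagram turns each candidate into a descriptor and then compares descriptors pairwise. The sketch below is one simplified way to do that, under stated assumptions: it treats each completed dataset as a point cloud with no edges, so the Euler characteristic of a sublevel set is just the number of points below the threshold. The function name `ect_descriptor`, the number of directions and thresholds, and the normalization are illustrative choices, not Phil's API or defaults.

```python
import numpy as np

def ect_descriptor(points, directions, thresholds):
    """Vertex-only ECT: per direction, count points with height <= threshold."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    heights = (centered / scale) @ directions.T  # shape (n_points, n_directions)
    # For isolated points, the Euler characteristic equals the point count.
    below = heights[:, :, None] <= thresholds[None, None, :]
    curves = below.sum(axis=0)  # shape (n_directions, n_thresholds)
    return curves.ravel() / len(points)

rng = np.random.default_rng(0)

# Random unit directions and evenly spaced height thresholds.
directions = rng.normal(size=(16, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
thresholds = np.linspace(-1.0, 1.0, 32)

# Stand-ins for imputed datasets: perturbed copies of one point cloud.
base = rng.normal(size=(100, 3))
candidates = [base + rng.normal(scale=s, size=base.shape) for s in (0.01, 0.05, 0.5)]

descriptors = np.stack([ect_descriptor(c, directions, thresholds) for c in candidates])

# Pairwise Euclidean distances between descriptors.
diff = descriptors[:, None, :] - descriptors[None, :, :]
distance_matrix = np.linalg.norm(diff, axis=-1)
```

Each descriptor is a stack of non-decreasing curves, one per direction, so candidates with similar geometric structure end up close together in the distance matrix.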

Basic Usage

from phil import Phil

# df_with_missing: a DataFrame that contains missing values
imputer = Phil(samples=25, random_state=42)
completed = imputer.fit(df_with_missing)

Pipeline Integration

from sklearn.pipeline import Pipeline
from phil.transformers import PhilTransformer

pipe = Pipeline([
    ("impute", PhilTransformer(samples=25)),  # imputation runs before the model
    ("model", YourModel()),  # placeholder: substitute any sklearn estimator
])

Key Features

Multi-Strategy Exploration: Generate imputations across mean, median, KNN, iterative, and custom methods.
Topological Scoring: Use ECT descriptors to capture structural properties of each imputation candidate.
Representative Selection: Automatically choose the candidate closest to the ensemble’s central tendency.
sklearn Integration: Drop-in transformer for seamless pipeline compatibility.
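
To make the "drop-in transformer" idea concrete, here is a hedged sketch of how a representative-selecting imputer can follow the scikit-learn estimator contract. `RepresentativeImputer` is a hypothetical class written for this example, not Phil's `PhilTransformer`, and its per-column standard deviation profile is an assumed stand-in for ECT scoring.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

class RepresentativeImputer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: fit several imputers, keep the most central one."""

    def __init__(self, imputers=None):
        self.imputers = imputers

    def fit(self, X, y=None):
        imputers = self.imputers or [
            SimpleImputer(strategy="mean"),
            SimpleImputer(strategy="median"),
            KNNImputer(n_neighbors=5),
        ]
        fitted = [clone(imputer).fit(X) for imputer in imputers]
        # ASSUMPTION: per-column std as a descriptor, in place of ECT.
        profiles = np.stack([f.transform(X).std(axis=0) for f in fitted])
        gaps = np.linalg.norm(profiles - profiles.mean(axis=0), axis=1)
        self.imputer_ = fitted[int(np.argmin(gaps))]
        return self

    def transform(self, X):
        # Reuse the imputer chosen during fit, so new data is handled consistently.
        return self.imputer_.transform(X)

# Toy classification data with about 10% of entries missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = (X[:, 0] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

pipe = Pipeline([
    ("impute", RepresentativeImputer()),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Because the selection happens in `fit` and the chosen imputer is stored on the estimator, the same choice is applied at prediction time, which is what lets such a transformer sit inside a `Pipeline` alongside cross-validation and model selection tools.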

Installation

pip install phil

For development:

git clone https://github.com/Krv-Analytics/phil.git
cd phil
uv sync --extra dev --extra docs

Supports Python 3.10, 3.11, and 3.12.

Next Steps

Quickstart - Impute your first dataset
User Guide - Complete installation and configuration
API Reference - Detailed class and function docs
Transformers - sklearn pipeline integration