Phil
Topological Imputation with Representative Selection
Phil generates multiple candidate imputations using configurable sklearn pipelines, scores them with ECT (Euler Characteristic Transform) descriptors, and selects the most representative imputation. Instead of picking a single imputation strategy, explore the space of possibilities and let topological methods guide the selection.
Quick Links
Quickstart - Impute a dataset and select a representative in minutes.
User Guide - Installation, configuration, and advanced workflows.
API Reference - Full API documentation for phil.
What is Phil?
Phil addresses a common challenge in data science: how do you choose the right imputation strategy? Different methods (mean, median, KNN, iterative) produce different completed datasets, each with distinct downstream effects on analysis.
Phil’s approach:
Generate multiple candidate imputations across configurable sklearn grids
Score each candidate using topological descriptors (ECT)
Select the representative closest to the mean descriptor profile
Deploy directly or integrate via PhilTransformer in sklearn pipelines
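The first step above, generating several candidate imputations, can be sketched with plain scikit-learn. This is a minimal illustration and not Phil's internals: the toy array `X`, the hand-written `imputers` dictionary, and the candidate names are assumptions made for the example, and Phil's actual grid configuration may look different.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy data with missing entries (NaN); stands in for a real DataFrame.
X = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, 6.0],
    [3.0, 6.0, 9.0],
    [np.nan, 8.0, 12.0],
    [5.0, 10.0, 15.0],
])

# A small hand-written "grid" of imputers (hypothetical; not Phil's grid format).
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "knn_k2": KNNImputer(n_neighbors=2),
    "knn_k3": KNNImputer(n_neighbors=3),
}

# Each imputer produces one completed copy of X: one candidate per strategy.
candidates = {name: imp.fit_transform(X) for name, imp in imputers.items()}

# The candidates agree on observed cells and differ only on the filled-in ones.
for name, completed in candidates.items():
    print(name, "fills X[0, 2] with", completed[0, 2])
```

Every candidate is a fully observed array of the same shape, which is what makes it possible to describe and compare them in the scoring stage.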
Typical Workflow
graph LR
subgraph Input
A[DataFrame with missing values]
end
subgraph "Stage 1: Impute"
B["Generate candidates"]
C1["KNN imputation"]
C2["Iterative imputation"]
C3["Mean/Median imputation"]
end
subgraph "Stage 2: Score"
D["ECT descriptors"]
E["Distance matrix"]
end
subgraph "Stage 3: Select"
F["Representative"]
end
A --> B
B --> C1
B --> C2
B --> C3
C1 --> D
C2 --> D
C3 --> D
D --> E
E --> F
style Input fill:#f9f9f9,stroke:#999
style B fill:#D9EDF7,stroke:#31708F,stroke-width:2px
style D fill:#D9EDF7,stroke:#31708F,stroke-width:2px
style F fill:#DFF0D8,stroke:#3C763D,stroke-width:2px
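Stages 2 and 3 of the diagram can be sketched with NumPy alone. The descriptor values below are made up for illustration, and the choice of Euclidean distance is an assumption; the page only states that the representative is the candidate closest to the mean descriptor profile. The medoid computed from the distance matrix is shown as one alternative reading of the "Distance matrix" box, not as Phil's confirmed behavior.

```python
import numpy as np

# Hypothetical descriptor vectors, one row per candidate imputation.
descriptors = np.array([
    [0.0, 1.0, 3.0, 5.0],  # candidate 0
    [0.0, 2.0, 3.0, 5.0],  # candidate 1
    [1.0, 2.0, 4.0, 6.0],  # candidate 2
    [0.0, 4.0, 5.0, 5.0],  # candidate 3
])

# Pairwise Euclidean distances between descriptors (the distance matrix).
diffs = descriptors[:, None, :] - descriptors[None, :, :]
distance_matrix = np.linalg.norm(diffs, axis=-1)

# Rule stated on this page: pick the candidate closest to the mean profile.
mean_profile = descriptors.mean(axis=0)
dist_to_mean = np.linalg.norm(descriptors - mean_profile, axis=1)
representative = int(np.argmin(dist_to_mean))

# Alternative (assumption): the medoid, i.e. smallest total distance to the others.
medoid = int(np.argmin(distance_matrix.sum(axis=1)))

print("closest to mean:", representative)
print("medoid:", medoid)
```

With these toy values both rules select candidate 1. On real descriptors the two can disagree, since the mean profile need not coincide with any actual candidate.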
YAML-Driven (Recommended)
from phil import Phil
# df_with_missing: your DataFrame with missing values
imputer = Phil(samples=25, random_state=42)
completed = imputer.fit(df_with_missing)
Pipeline Integration
from sklearn.pipeline import Pipeline
from phil.transformers import PhilTransformer
pipe = Pipeline([
    ("impute", PhilTransformer(samples=25)),
    ("model", YourModel()),  # placeholder: substitute any sklearn estimator
])
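To show where an imputation step sits in a pipeline end to end, here is a runnable sketch of the same two-step layout. It deliberately swaps in scikit-learn's `SimpleImputer` for `PhilTransformer` and `LogisticRegression` for `YourModel` so that it runs without Phil installed; the synthetic data and the 10% missingness rate are assumptions for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels computed before masking
X[rng.random(X.shape) < 0.1] = np.nan    # knock out roughly 10% of entries

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # stand-in for PhilTransformer
    ("model", LogisticRegression()),               # stand-in for YourModel
])

# The imputer is fit on the training data and reapplied automatically at predict time.
pipe.fit(X, y)
preds = pipe.predict(X)
print(preds[:10])
```

Because the imputation step is a pipeline stage, it is refit inside each fold when the pipeline is passed to cross-validation utilities, which avoids leaking information from held-out rows into the imputation.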
Key Features
- Multi-Strategy Exploration
Generate imputations across mean, median, KNN, iterative, and custom methods.
- Topological Scoring
Use ECT descriptors to capture structural properties of each imputation candidate.
- Representative Selection
Automatically choose the candidate closest to the ensemble’s central tendency.
- sklearn Integration
Drop-in transformer for seamless pipeline compatibility.
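For readers unfamiliar with the Euler Characteristic Transform, the sketch below computes it for a small 2D graph (vertices and edges only). For each direction and height threshold it counts the vertices at or below the threshold minus the edges lying entirely at or below it. The function name `ect`, its parameters, and the square example are assumptions for illustration; how Phil builds a complex from an imputed table and which ECT settings it uses are not specified on this page.

```python
import numpy as np

def ect(points, edges, n_directions=8, n_thresholds=16):
    """Euler characteristic curves of a 2D graph along evenly spaced directions."""
    points = np.asarray(points, dtype=float)
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)

    angles = np.linspace(0.0, 2.0 * np.pi, n_directions, endpoint=False)
    directions = np.column_stack([np.cos(angles), np.sin(angles)])

    # Pad the radius slightly so the last threshold includes every simplex.
    radius = 1.05 * np.linalg.norm(points, axis=1).max()
    thresholds = np.linspace(-radius, radius, n_thresholds)

    # Height of each vertex along each direction; an edge enters the
    # sublevel set once both of its endpoints have entered.
    heights = points @ directions.T            # (n_points, n_directions)
    edge_heights = heights[edges].max(axis=1)  # (n_edges, n_directions)

    v_counts = (heights[:, :, None] <= thresholds).sum(axis=0)
    e_counts = (edge_heights[:, :, None] <= thresholds).sum(axis=0)
    return v_counts - e_counts                 # (n_directions, n_thresholds)

# Boundary of a square: 4 vertices joined in a cycle by 4 edges.
square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
E = ect(square, cycle)
print(E.shape)
print(E[0])  # Euler characteristic curve along the first direction
```

Flattening the resulting array (for example with `E.ravel()`) gives a fixed-length vector, which is the kind of descriptor that can be compared across candidates.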
Installation
pip install phil
For development:
git clone https://github.com/Krv-Analytics/phil.git
cd phil
uv sync --extra dev --extra docs
Supports Python 3.10, 3.11, and 3.12.
Next Steps
Quickstart - Impute your first dataset
User Guide - Complete installation and configuration
API Reference - Detailed class and function docs
Transformers - sklearn pipeline integration