Phil Package API

Phil package.

class phil.CovariateDistributionImputer(n_neighbors: int = 5, missing_values=nan, random_state=None, threshold: float = 1.0, covariance_matrix=None)

Bases: BaseEstimator

Imputer that samples from the conditional distribution P(x_j | x_{-j}) approximated via k-nearest neighbors in the observed covariate space.
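The idea of approximating P(x_j | x_{-j}) with nearest neighbors can be illustrated with a minimal standard-library sketch. It is not the phil implementation: the helper name `knn_conditional_sample`, the Euclidean distance, and the rule that donors must have every comparison column observed are all assumptions made for illustration.

```python
import math
import random


def knn_conditional_sample(rows, target_row, j, n_neighbors=5, rng=None):
    """Draw a value for column j of target_row from its nearest donor rows.

    Hypothetical helper, not part of phil. Distances use only the columns
    that are observed in target_row (column j itself is excluded).
    """
    rng = rng or random.Random()
    # Columns of the target row that are observed and can define closeness.
    observed = [c for c, v in enumerate(target_row)
                if c != j and not math.isnan(v)]

    # Assumption: a donor must have column j and every comparison column observed.
    donors = [r for r in rows
              if not math.isnan(r[j])
              and all(not math.isnan(r[c]) for c in observed)]
    if not donors:
        raise ValueError("no donor rows with the required columns observed")

    def distance(r):
        return math.sqrt(sum((r[c] - target_row[c]) ** 2 for c in observed))

    # Column-j values of the k nearest donors stand in for P(x_j | x_{-j}).
    neighbors = sorted(donors, key=distance)[:n_neighbors]
    return rng.choice(neighbors)[j]


data = [
    [1.0, 10.0],
    [1.1, 11.0],
    [0.9, 9.0],
    [5.0, 50.0],
    [5.2, 52.0],
]
incomplete = [1.05, math.nan]
value = knn_conditional_sample(data, incomplete, j=1, n_neighbors=3,
                               rng=random.Random(0))
```

Because the draw is restricted to the three rows closest to `incomplete` in the first column, `value` comes from the low cluster (9.0, 10.0, or 11.0) and never from the distant rows near 50.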

fit(X, y) → CovariateDistributionImputer

predict(X) → ndarray

class phil.DistributionImputer(missing_values=nan, random_state=None, threshold=1.0)

Bases: BaseEstimator

Imputer that samples from empirical observed values.
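Sampling from empirical observed values can be pictured as follows. This is a minimal sketch under stated assumptions, not the phil implementation: `sample_from_observed` is a hypothetical helper, it works on one column at a time, and it draws uniformly with replacement. How `threshold` enters the real class is not documented here, so the sketch ignores it.

```python
import math
import random


def sample_from_observed(column, rng=None):
    """Replace NaN entries with random draws from the column's observed values.

    Hypothetical helper, not part of phil.
    """
    rng = rng or random.Random()
    observed = [v for v in column if not math.isnan(v)]
    if not observed:
        raise ValueError("column has no observed values to sample from")
    # Observed entries are kept; each missing entry gets an independent draw.
    return [rng.choice(observed) if math.isnan(v) else v for v in column]


column = [2.0, math.nan, 4.0, math.nan, 6.0]
filled = sample_from_observed(column, rng=random.Random(42))
```

Unlike mean imputation, this keeps imputed values inside the set of values that actually occur in the column, so the marginal distribution is roughly preserved.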

fit(X, y)

predict(X)

class phil.ECT(config: ECTConfig)

Bases: Magic

configure(**kwargs)

generate(X: List[ndarray]) → List[ndarray]

class phil.ECTConfig(*, num_thetas: int, radius: float, resolution: int, scale: int, normalize: bool = True, seed: int = 0)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

normalize: bool

num_thetas: int

radius: float

resolution: int

scale: int

seed: int

class phil.GridGallery

Bases: object

Collection of imputation grids optimized for specific domains.

Citations:
- Sampling/Multiverse: Wayland et al. (2025), https://www.nature.com/articles/s41560-025-01871-0
- Finance: Gu, Kelly, & Xiu (2020) on ML for asset pricing and robust ML portfolios.
- Healthcare: Stekhoven & Bühlmann (2011) on MissForest and Chen et al. (2023) on clinical imputation.
- Marketing: Anand & Mamidi (2020) / Zhang et al. (2025) on ML for consumer analytics.
- Engineering: Thomas & Rajabi (2021) and Idri et al. (2016) on systematic reviews of engineering data.

classmethod get(name: str) → ImputationConfig

class phil.ImputationConfig(*, methods: List[str], modules: List[str], grids: List[ParameterGrid], domain_knowledge: DomainKnowledge | None = None)

Bases: BaseModel

Configuration for imputation methods and parameter grids.

domain_knowledge: DomainKnowledge | None

grids: List[ParameterGrid]

methods: List[str]

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

modules: List[str]

class phil.Phil(samples: int = 30, param_grid: str = 'default', magic: str = 'ECT', config=None, random_state=None)

Bases: object

fit(df: DataFrame, max_iter: int = 5) → DataFrame

generate_descriptors() → List[ndarray]

impute(df: DataFrame, max_iter: int = 10) → List[ndarray]

plot_mds(**kwargs): Visualize the ECT descriptor space via MDS after fit().

transform(df: DataFrame, max_iter: int = 5) → DataFrame

class phil.PhilTransformer(samples: int = 30, param_grid: str | ImputationConfig = 'default', magic: str = 'ECT', config: dict | None = None, random_state: int | None = None, max_iter: int = 5)

Bases: BaseEstimator, TransformerMixin

fit(X: DataFrame, y: Any = None) → PhilTransformer

transform(X: DataFrame) → DataFrame

class phil.PreprocessingConfig(*, method: str, module: str = 'sklearn.preprocessing', params: Dict[str, Any] = <factory>)

Bases: BaseModel

Configuration for data preprocessing steps.

method: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

module: str

params: Dict[str, Any]

phil.plot_mds(descriptors: list[np.ndarray], selected_index: int, ax=None, figsize: tuple[int, int] = (8, 6), random_state: int | None = None) → tuple['Figure', np.ndarray]: Visualize the ECT descriptor space via multidimensional scaling (MDS).
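MDS places items in a low-dimensional space so that their pairwise dissimilarities are preserved as closely as possible. The sketch below covers only the first step, building the dissimilarity matrix between descriptors. It assumes Euclidean distance on flattened descriptors, which this reference does not confirm, and `pairwise_distances` is a hypothetical helper rather than a phil function.

```python
import math


def pairwise_distances(descriptors):
    """Return the symmetric Euclidean distance matrix between flat descriptors.

    Hypothetical helper, not part of phil. Each descriptor is a flat
    sequence of floats, and all descriptors have the same length.
    """
    n = len(descriptors)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            d = math.dist(descriptors[a], descriptors[b])
            # Fill both triangles so the matrix is symmetric.
            dist[a][b] = dist[b][a] = d
    return dist


descriptors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = pairwise_distances(descriptors)
```

A matrix like `dist` is what an MDS routine would then embed in two dimensions for plotting; descriptors of similar imputations end up close together in the resulting scatter plot.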

class phil.phil.Phil(samples: int = 30, param_grid: str = 'default', magic: str = 'ECT', config=None, random_state=None)

Bases: object

fit(df: DataFrame, max_iter: int = 5) → DataFrame

generate_descriptors() → List[ndarray]

impute(df: DataFrame, max_iter: int = 10) → List[ndarray]

plot_mds(**kwargs): Visualize the ECT descriptor space via MDS after fit().

transform(df: DataFrame, max_iter: int = 5) → DataFrame

Scikit-learn compatible transformers for Phil.

class phil.transformers.PhilTransformer(samples: int = 30, param_grid: str | ImputationConfig = 'default', magic: str = 'ECT', config: dict | None = None, random_state: int | None = None, max_iter: int = 5)

Bases: BaseEstimator, TransformerMixin

fit(X: DataFrame, y: Any = None) → PhilTransformer

transform(X: DataFrame) → DataFrame