Phil Package API
Phil package.
- class phil.CovariateDistributionImputer(n_neighbors: int = 5, missing_values=nan, random_state=None, threshold: float = 1.0, covariance_matrix=None)
Bases:
BaseEstimator
Imputer that samples from the conditional distribution P(x_j | x_{-j}), approximated via k-nearest neighbors in the observed covariate space.
- fit(X, y) → CovariateDistributionImputer
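The idea of approximating P(x_j | x_{-j}) with nearest neighbors can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the class's actual implementation: the function name `knn_conditional_impute` is hypothetical, distances are computed only on columns that both rows observe, and the `threshold` and `covariance_matrix` parameters are not modeled.

```python
import numpy as np


def knn_conditional_impute(X, n_neighbors=5, seed=None):
    """Fill NaNs by sampling from the k nearest rows that observe the column.

    Hypothetical helper for illustration; not part of the phil package.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)

    for i, j in zip(*np.nonzero(missing)):
        # Candidate donors: rows where column j is observed.
        donors = np.nonzero(~missing[:, j])[0]
        if donors.size == 0:
            continue  # nothing to sample from; leave the NaN in place

        # Compare row i with each donor on the columns both have observed.
        shared = ~missing[i] & ~missing[donors]  # shape (n_donors, n_features)
        diff = np.where(shared, X[donors] - X[i], 0.0)
        n_shared = shared.sum(axis=1)
        # Mean squared difference; donors sharing no column get infinite distance.
        with np.errstate(divide="ignore", invalid="ignore"):
            dist = np.where(n_shared > 0, (diff ** 2).sum(axis=1) / n_shared, np.inf)

        # Keep the k closest donors and draw one of their observed values.
        k = min(n_neighbors, donors.size)
        nearest = donors[np.argsort(dist, kind="stable")[:k]]
        filled[i, j] = rng.choice(X[nearest, j])

    return filled
```

Because the imputed value is drawn from the neighbors rather than averaged over them, repeated calls with different seeds can give different completions of the same data set.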
- class phil.DistributionImputer(missing_values=nan, random_state=None, threshold=1.0)
Bases:
BaseEstimator
Imputer that samples from empirical observed values.
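Sampling from empirical observed values can be illustrated with a few lines of NumPy. The sketch below is an assumption about the general technique, not the class's own code: `empirical_impute` is a hypothetical name, sampling is done per column with replacement, and the `threshold` parameter is not modeled.

```python
import numpy as np


def empirical_impute(X, seed=None):
    """Replace each NaN with a random draw from the observed values in its column.

    Hypothetical helper for illustration; not part of the phil package.
    """
    rng = np.random.default_rng(seed)
    filled = np.asarray(X, dtype=float).copy()

    for j in range(filled.shape[1]):
        column = filled[:, j]  # view into `filled`, so writes go through
        gaps = np.isnan(column)
        observed = column[~gaps]
        if gaps.any() and observed.size > 0:
            # Draw with replacement so imputed values follow the observed ones.
            column[gaps] = rng.choice(observed, size=gaps.sum(), replace=True)

    return filled
```

Columns with no observed values are left unchanged, since there is nothing to sample from.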
- class phil.ECTConfig(*, num_thetas: int, radius: float, resolution: int, scale: int, normalize: bool = True, seed: int = 0)
Bases:
BaseModel
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- normalize: bool
- num_thetas: int
- radius: float
- resolution: int
- scale: int
- seed: int
- class phil.GridGallery
Bases:
object
Collection of imputation grids optimized for specific domains.
Citations:
  - Sampling/Multiverse: Wayland et al. (2025) - https://www.nature.com/articles/s41560-025-01871-0
  - Finance: Gu, Kelly, & Xiu (2020) on ML for asset pricing and robust ML portfolios.
  - Healthcare: Stekhoven & Bühlmann (2011) on MissForest and Chen et al. (2023) on clinical imputation.
  - Marketing: Anand & Mamidi (2020) / Zhang et al. (2025) on ML for consumer analytics.
  - Engineering: Thomas & Rajabi (2021) and Idri et al. (2016) on systematic reviews of engineering data.
- classmethod get(name: str) → ImputationConfig
- class phil.ImputationConfig(*, methods: List[str], modules: List[str], grids: List[ParameterGrid], domain_knowledge: DomainKnowledge | None = None)
Bases:
BaseModel
Configuration for imputation methods and parameter grids.
- domain_knowledge: DomainKnowledge | None
- grids: List[ParameterGrid]
- methods: List[str]
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- modules: List[str]
- class phil.Phil(samples: int = 30, param_grid: str = 'default', magic: str = 'ECT', config=None, random_state=None)
Bases:
object
- class phil.PhilTransformer(samples: int = 30, param_grid: str | ImputationConfig = 'default', magic: str = 'ECT', config: dict | None = None, random_state: int | None = None, max_iter: int = 5)
Bases:
BaseEstimator, TransformerMixin
- fit(X: DataFrame, y: Any = None) → PhilTransformer
- class phil.PreprocessingConfig(*, method: str, module: str = 'sklearn.preprocessing', params: Dict[str, Any] = <factory>)
Bases:
BaseModel
Configuration for data preprocessing steps.
- method: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- module: str
- params: Dict[str, Any]
- phil.plot_mds(descriptors: list[np.ndarray], selected_index: int, ax=None, figsize: tuple[int, int] = (8, 6), random_state: int | None = None) → tuple['Figure', np.ndarray]
Visualize the ECT descriptor space via Multi-Dimensional Scaling (MDS).
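To make the embedding step concrete, the sketch below projects a list of descriptor arrays to two dimensions with classical (Torgerson) MDS using only NumPy. It is an illustration under assumptions, not the body of `plot_mds`: the function name `classical_mds` is hypothetical, descriptors are flattened and compared with Euclidean distance, and the library function may use a different MDS variant and also handles plotting.

```python
import numpy as np


def classical_mds(descriptors, n_components=2):
    """Embed descriptors in `n_components` dimensions with classical MDS.

    Hypothetical helper for illustration; not part of the phil package.
    """
    # Flatten each descriptor so every item is one row vector.
    flat = np.stack([np.asarray(d, dtype=float).ravel() for d in descriptors])

    # Squared Euclidean distances between all pairs of descriptors.
    sq_norms = (flat ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * flat @ flat.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against round-off below zero

    # Double-centre to obtain the Gram matrix B = -1/2 * J D^2 J.
    n = flat.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering

    # Top eigenpairs give the coordinates; clip tiny negative eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
```

The returned array has one row per descriptor, so the row at `selected_index` could be highlighted in a scatter plot of the two columns.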
- class phil.phil.Phil(samples: int = 30, param_grid: str = 'default', magic: str = 'ECT', config=None, random_state=None)
Bases:
object
Scikit-learn compatible transformers for Phil.
- class phil.transformers.PhilTransformer(samples: int = 30, param_grid: str | ImputationConfig = 'default', magic: str = 'ECT', config: dict | None = None, random_state: int | None = None, max_iter: int = 5)
Bases:
BaseEstimator, TransformerMixin
- fit(X: DataFrame, y: Any = None) → PhilTransformer