MCP Server

No Code. Just a DataFrame and a Question.

The Phil MCP (Model Context Protocol) server lets AI clients — Claude, Gemini, Cursor, and others — run topology-guided imputation sweeps on your pandas or polars dataframes without you writing any Python. Point the agent at a CSV or Parquet file, ask for the best imputation, and let it pick the candidate that best represents the ensemble.

This guide is for practitioners who know their data has missing values and want a reproducible imputation rather than a single hand-picked strategy.

Workflow Comparison

| Approach | You Do | AI Does | Speed |
|---|---|---|---|
| Programmatic (Python) | Write Python, configure grids | (nothing) | Depends on grid size |
| PhilTransformer (sklearn) | Wire into a Pipeline | (nothing) | Depends on grid size |
| MCP + Agent (recommended) | Hand AI a file path, ask question | Entire sweep workflow | ~5–60s (automated tuning) |

The Value Prop

Traditional imputation:

  • You guess a strategy (mean, KNN, iterative regressors)

  • You run it once and hope it’s “good enough”

  • You have no principled way to compare alternatives

Phil via the MCP:

  • The agent generates a grid of candidate imputations

  • Each candidate is scored with an ECT topological descriptor

  • The candidate closest to the ensemble centroid is selected

  • The agent can iterate on the grid if descriptor spread looks degenerate

  • You read a structured summary, not a chart of distance metrics
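The "closest to the ensemble centroid" rule can be pictured with a minimal sketch, assuming each candidate imputation has already been reduced to a fixed-length numeric descriptor. The toy vectors and the function name select_representative are hypothetical; Phil's actual ECT descriptors and selection code are not shown in this guide.

```python
import math
from statistics import fmean


def select_representative(descriptors):
    """Return (index, distances) for the descriptor nearest the centroid."""
    # Centroid = coordinate-wise mean over all candidate descriptors.
    centroid = [fmean(column) for column in zip(*descriptors)]
    # Euclidean (L2) distance from each candidate to that centroid.
    distances = [math.dist(vector, centroid) for vector in descriptors]
    best = min(range(len(descriptors)), key=distances.__getitem__)
    return best, distances


# Hypothetical 2-D descriptors for four candidate imputations.
candidates = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.25], [1.0, 3.0]]
index, dists = select_representative(candidates)
print(index)
```

The selected candidate is the most typical member of the ensemble rather than the best under any loss, which matches the framing used throughout this guide.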

Setup

Phil ships an MCP server entry point (phil-mcp) via the mcp extra of the published philler package. You do not need to clone the repo — uvx (or pipx) can launch it directly from PyPI.

Note

Phil works with any MCP-capable client, including Cursor and Gemini CLI, where you can add Phil as an MCP server/tool.

Open ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows) and add:

{
  "mcpServers": {
    "phil": {
      "command": "uvx",
      "args": ["--from", "philler[mcp]", "phil-mcp"]
    }
  }
}

Restart Claude Desktop. A hammer icon in new chats confirms the tools loaded.

Note

GUI-launched apps on macOS often don’t inherit your shell PATH. If Claude can’t find uvx, set "command" to the absolute path of uvx (find it with which uvx, e.g. /Users/yourname/.local/bin/uvx) and leave "args" unchanged.
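If you would rather script the config edit than do it by hand, the following is a minimal Python sketch, not something Phil ships. The helper names phil_server_entry and add_phil_server are hypothetical. It resolves uvx to an absolute path when it is on the current PATH and merges the Phil entry into an existing config without touching other servers. The demo writes to a temporary file rather than your real config.

```python
import json
import shutil
import tempfile
from pathlib import Path


def phil_server_entry(command=None):
    """Build the mcpServers entry for Phil, preferring an absolute uvx path."""
    # shutil.which returns None when uvx is not on this process's PATH.
    resolved = command or shutil.which("uvx") or "uvx"
    return {"command": resolved, "args": ["--from", "philler[mcp]", "phil-mcp"]}


def add_phil_server(config_path, entry):
    """Merge the Phil entry into a client config, keeping other servers."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})["phil"] = entry
    path.write_text(json.dumps(config, indent=2))
    return config


# Demo against a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
    merged = add_phil_server(demo_path, phil_server_entry("/Users/yourname/.local/bin/uvx"))

print(sorted(merged["mcpServers"]))
```

Back up the real config file before pointing a script like this at it.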

# Gemini CLI
gemini mcp add phil uv tool run --from "philler[mcp]" phil-mcp
# Claude Code
claude mcp add phil -- uv tool run --from "philler[mcp]" phil-mcp

In Cursor, open Settings → Features → MCP → Add new MCP server:

  • Name: phil

  • Type: command

  • Command: uv tool run --from "philler[mcp]" phil-mcp

Alternative install methods

If you prefer a persistent install over ephemeral uv invocations:

pipx install "philler[mcp]"      # then use command: phil-mcp
# or
pip install "philler[mcp]"       # in any venv

Developing against a local clone

Contributors working on Phil’s source can launch the server from a checkout instead:

uv sync --group mcp
uv run phil-mcp

Point your MCP client at uv run --group mcp phil-mcp (with cwd set to the clone) for live-edit development.

Workflow

Once connected, give the agent a goal rather than step-by-step instructions. It can work out the technical steps from the tool descriptions and, if it wants a plan, from get_workflow_guide.

The recommended prompt:

“I have a dataset with missing values at path/to/data.csv. Use Phil to run an imputation sweep and pick the most representative completed dataset. Export it to path/to/imputed.csv.”

Under the hood the agent will:

  1. Ingest — register the file as a stable dataset_id handle

  2. Characterize — summarize missingness, dtypes, and unique counts

  3. List grids — pick the named GridGallery strategy that fits the data (default, finance, healthcare, …)

  4. Create a config — generate canonical YAML with sensible defaults

  5. Validate — confirm the config parses and the dataset is reachable

  6. Run the sweep — fit each candidate imputation and score with ECT

  7. Diagnose — inspect descriptor spread and method counts; iterate if the grid collapsed

  8. Export — write the selected imputed DataFrame to disk
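To make step 2 (Characterize) concrete, here is a rough standard-library sketch of the kind of per-column summary it describes. It is an illustration only: the function name characterize, the output shape, and the rule that an empty cell counts as missing are assumptions, and dtype inference is left out. The four-row sample is hypothetical.

```python
import csv
import io

SAMPLE = """age,income,category
25,50000,A
30,,B
,75000,A
45,80000,
"""


def characterize(csv_text):
    """Summarize missing percent and unique counts per column of a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        # Assumption: an empty string marks a missing cell.
        present = [value for value in values if value != ""]
        summary[column] = {
            "n_unique": len(set(present)),
            "missing_pct": 100.0 * (len(values) - len(present)) / len(values),
        }
    return {"n_rows": len(rows), "n_columns": len(summary), "columns": summary}


report = characterize(SAMPLE)
print(report["columns"]["income"])
```

A summary like this is what lets the agent choose a grid before any imputation is fitted.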

Pandas and polars

Phil’s pipeline runs on pandas internally, but the MCP server accepts any file format pandas or pyarrow can read. To use a polars frame, write it to Parquet first and ingest the path:

import polars as pl

df = pl.read_csv("raw.csv")
df.write_parquet("for_phil.parquet")
# then in your agent chat:
#   "Run a Phil sweep on /abs/path/to/for_phil.parquet"

Available MCP Tools

The server exposes these tools to the AI client. The agent automatically chains them together:

| Tool | What It Does |
|---|---|
| get_workflow_guide | Returns the opinionated, phase-by-phase Phil workflow as markdown. Opt-in: agents that prefer their own plan can ignore it. |
| get_runtime_context | Reports the server cache directory, session id, and path-visibility guidance. Useful when the agent needs to ferry sandboxed files into a host-readable location. |
| ingest_dataset | Registers a CSV or Parquet path and returns a stable dataset_id handle. Pass that handle to every downstream tool. |
| begin_dataset_upload / append_dataset_chunk / finalize_dataset_upload | Chunked base64 upload pipeline for clients that cannot share a filesystem with the server. Use only when path-based ingest is impossible. |
| characterize_dataset | Sparse per-column schema: dtype, n_unique, missing percent, plus aggregate row/column counts. Cheap and safe to call on wide datasets. |
| probe_columns | Deep per-column inspection for up to 20 columns at a time: sample values, top frequencies, basic numeric statistics. |
| list_grids | Enumerates the named GridGallery entries (default, sampling, finance, healthcare, marketing, engineering) with method lists and intent blurbs. |
| create_config | Materializes a canonical YAML config tailored to a dataset_id and grid choice. Stores it on the session so subsequent refine_active_config / run_imputation_sweep calls can omit it. |
| validate_config | Validates and normalizes a config YAML, returning structured issues if anything is off. Rejects fenced Markdown blocks to prevent silent parse failures. |
| refine_config / refine_active_config | Apply dotted-path overrides (e.g. imputation.samples=50) to an explicit or session config. Unknown keys raise structured errors with the valid path list. |
| get_active_config | Returns the in-session config YAML, useful for inspection before running a sweep. |
| run_imputation_sweep | The headline tool: fits the candidate grid, scores each with the ECT magic method, selects the representative, and persists a RunRecord plus a markdown diff against the previous run. |
| diagnose_sweep | Inspects a saved run’s descriptor spread, selected index, and per-method candidate counts. Use to decide whether to broaden the grid or raise samples. |
| get_candidate_descriptors | Returns the top-k candidates ranked by closeness to the mean descriptor, including the selected index. |
| compare_sweeps | Side-by-side comparison of two persisted runs by config and descriptor statistics. |
| get_experiment_history | Markdown table of every sweep run in the current session. Handy for telling the story of how the agent iterated. |
| get_sweep_summary | Returns the full persisted RunRecord for a given run_id. |
| export_imputed_data | Writes the selected imputed DataFrame to disk; CSV / Parquet / Feather inferred from the extension. |
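The dotted-path overrides accepted by refine_config / refine_active_config can be illustrated with a small sketch over a nested dict. This is not Phil's implementation. The config shape, the "grid" key, and the helper names are hypothetical, and only imputation.samples appears in this guide. The sketch mirrors the described behavior of rejecting unknown keys while reporting the valid paths.

```python
import copy


def valid_paths(config, prefix=""):
    """List every dotted path that leads to a leaf value in a nested dict."""
    paths = []
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            paths.extend(valid_paths(value, dotted + "."))
        else:
            paths.append(dotted)
    return paths


def apply_override(config, dotted_path, value):
    """Return a copy of config with one leaf replaced; reject unknown paths."""
    known = valid_paths(config)
    if dotted_path not in known:
        raise KeyError(f"unknown path {dotted_path!r}; valid paths: {known}")
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = dotted_path.split(".")
    for key in parents:
        node = node[key]
    node[leaf] = value
    return updated


# Hypothetical config shape used only for this illustration.
base = {"imputation": {"samples": 20, "grid": "default"}}
refined = apply_override(base, "imputation.samples", 50)
print(refined["imputation"]["samples"], base["imputation"]["samples"])
```

Returning a copy rather than mutating in place keeps the earlier config available for comparison, in the same spirit as the run-to-run diffs described above.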

Example: A Mixed-Type Frame with Missing Values

Suppose demo.csv has 10 rows, two numeric columns, and one categorical column, with about 20% missingness:

age,income,category
25,50000,A
30,,B
,75000,A
45,80000,
...

A successful agent dialog looks like:

  1. ingest_dataset("/data/demo.csv") → dataset_id="ds_abc123"

  2. characterize_dataset("ds_abc123") → reports 4 missing in income, 1 missing in category, etc.

  3. list_grids() → agent picks default

  4. create_config("ds_abc123", grid="default", samples=20)

  5. run_imputation_sweep → returns selected_index=7, descriptor_stats.mean_pairwise_l2=0.14

  6. export_imputed_data("/data/demo_imputed.csv")

The resulting CSV is the candidate Phil considered most representative of the imputation ensemble — not the highest-likelihood, not the lowest-loss, but the one closest to the centroid of the ECT descriptor cloud.
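The descriptor_stats.mean_pairwise_l2 value in step 5 is not defined in this guide. Reading the name literally, it would be the average Euclidean distance over all pairs of candidate descriptors. The sketch below follows that assumption with hypothetical toy vectors. A value near zero would suggest that the grid collapsed to near-identical candidates.

```python
import math
from itertools import combinations


def mean_pairwise_l2(descriptors):
    """Average Euclidean distance over all unordered pairs of descriptors."""
    pairs = list(combinations(descriptors, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical descriptors: a spread-out ensemble and a collapsed one.
spread_out = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
print(mean_pairwise_l2(spread_out), mean_pairwise_l2(collapsed))
```

When the spread is close to zero, "closest to the centroid" carries little information, which is the situation the diagnose step is meant to catch.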

Bringing Your Own Data

  1. Ensure your CSV or Parquet file is accessible on the machine running the MCP server.

  2. Connect the server using the setup steps above.

  3. Ask: “Run a Phil imputation sweep on my_data.csv and export the chosen imputation. Use the finance grid.”

The agent handles missingness analysis, grid selection, descriptor scoring, and selection. Your job is to interpret the resulting imputed dataset using domain knowledge.
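When the file cannot be placed where the server can read it, the tool list describes a chunked base64 upload instead. The sketch below shows only the general idea of splitting bytes into base64 pieces and reassembling them. The chunk size, the per-chunk encoding, and the helper names are assumptions, and the actual arguments of the upload tools are not documented in this guide.

```python
import base64


def encode_chunks(payload, chunk_size=48 * 1024):
    """Split raw bytes into pieces and base64-encode each piece on its own."""
    return [
        base64.b64encode(payload[start:start + chunk_size]).decode("ascii")
        for start in range(0, len(payload), chunk_size)
    ]


def decode_chunks(chunks):
    """Reassemble the original bytes by decoding the chunks in order."""
    return b"".join(base64.b64decode(chunk) for chunk in chunks)


# Tiny payload and chunk size so the splitting is visible.
payload = b"age,income,category\n25,50000,A\n30,,B\n"
chunks = encode_chunks(payload, chunk_size=16)
print(len(chunks), decode_chunks(chunks) == payload)
```

In practice the agent drives the upload tools itself; the sketch is only meant to show why chunk order matters.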

Local Medical Demo

For a concrete local workflow (including medical CSV generation and MCP test prompts), see demos/medical/README.md in the repository root.

See also