Demos
Pulsar shines when you have real data and real questions. Below are five production demos that showcase different aspects of topological data analysis — from recovering hidden biology to revealing benchmark structure to analyzing clinical trajectories.
Each demo is self-contained and runnable in minutes. Pick one that matches your domain and see the insights Pulsar reveals.
—
1. Palmer Penguins: Recovering Biology Without Labels
The Data
The Palmer Penguins dataset contains 333 penguins from three species (Adelie, Chinstrap, Gentoo). Each record has 8 variables, of which four are morphological measurements (bill length, bill depth, flipper length, body mass); the rest include island and sex. It’s the ideal educational dataset: real biology, no missing structure, universally understood.
The Discovery
After dropping species labels entirely and letting Pulsar discover structure in the remaining 5-dimensional feature space:
The Gentoos: Completely isolated on Biscoe Island, then perfectly separated by sex. (They are chunky birds with distinctive morphology.)
The Adelies: Fragmented by island of origin — the structural variation within the species is as important as the species itself.
The Chinstraps: Indistinguishable from Dream Island Adelies. The two groups share the same morphological envelope, so the topology does not separate them.
Key Insight: Topology reveals that habitat and biological sex are as structurally important as species. Traditional clustering such as K-means with k = 3 would force the data into three roughly spherical clusters; topology shows the actual complexity.
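A minimal sketch of the label-free setup described above: drop the species column and z-score the remaining numeric columns so that no single unit (grams vs. millimetres) dominates distances. The column names are assumed from the public Palmer Penguins CSV, the three rows are illustrative values only, and `unlabeled_zscores` is a hypothetical helper rather than part of Pulsar. Which columns the demo keeps for its 5-dimensional space is not stated here, so the selection below is an assumption.

```python
import statistics

# Illustrative rows; column names assumed from the public Palmer Penguins CSV.
rows = [
    {"species": "Adelie", "bill_length_mm": 39.1, "bill_depth_mm": 18.7,
     "flipper_length_mm": 181.0, "body_mass_g": 3750.0},
    {"species": "Gentoo", "bill_length_mm": 46.1, "bill_depth_mm": 13.2,
     "flipper_length_mm": 211.0, "body_mass_g": 4500.0},
    {"species": "Chinstrap", "bill_length_mm": 46.5, "bill_depth_mm": 17.9,
     "flipper_length_mm": 192.0, "body_mass_g": 3500.0},
]


def unlabeled_zscores(rows, label="species"):
    """Drop the label column and standardize every remaining numeric column."""
    features = [key for key in rows[0] if key != label]
    stats = {}
    for name in features:
        column = [row[name] for row in rows]
        # Population standard deviation; assumes the column is not constant.
        stats[name] = (statistics.mean(column), statistics.pstdev(column))
    return [
        {name: (row[name] - stats[name][0]) / stats[name][1] for name in features}
        for row in rows
    ]


X = unlabeled_zscores(rows)
print(sorted(X[0]))  # feature names only; the species label is gone
```

Standardizing before any distance-based analysis is a common default; the demo's own preprocessing may differ.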
Try It Now
This is the fastest way to see Pulsar in action. No dataset to download.
# Option 1: Use Pulsar with Claude AI (recommended)
# Install the Pulsar MCP server in Claude Desktop (see the MCP Server guide)
# Then ask Claude: "Use Pulsar to analyze the penguin data at demos/penguins/penguins.csv"
# Option 2: Run directly with Python
cd /path/to/pulsar
uv sync
uv run maturin develop --release
uv run python -c "
from pulsar.pipeline import ThemaRS
config = {'run': {'name': 'penguins', 'data': 'demos/penguins/penguins.csv'}}
model = ThemaRS.from_dict(config)
model.fit()
print(f'Discovered {len(model.cosmic_graph.nodes())} nodes and {len(model.cosmic_graph.edges())} edges')
"
Deep Dive
Full walkthrough: demos/penguins/README.md
YAML configuration: demos/penguins/params.yaml
Notebook: The penguins example is also the starting point in the MCP Server guide
—
2. MMLU Benchmark Topology: 57 Subjects, 12 True Clusters
The Data
MMLU consists of ~14,000 test questions across 57 administrative subjects (professional medicine, history, chemistry, law, etc.). We embed all questions using bge-small-en-v1.5 (384-dimensional sentence embeddings) and run Pulsar’s topological sweep.
The Discovery
The geometric structure in embedding space reveals 12 distinct regions that cut across subject boundaries:
| Region | Theme | Top Subjects |
|---|---|---|
| 0 | Psychology / Behavioral | professional_psychology, hs_psychology |
| 1 | Medicine / Health | professional_medicine, nutrition, clinical_knowledge |
| 2 | Mathematics / Quantitative | elementary_math, hs_math, hs_statistics |
| 3 | Moral Reasoning | moral_scenarios (100% isolated) |
| 5 | Law | professional_law (87% of region) |
| 8 | History | hs_world_history, hs_us_history |
Key Insights:
moral_scenarios forms a completely isolated island — structurally alien to the rest of MMLU
professional_law is the tightest cluster (87% of Region 5)
Psychology splits: behavioral questions in Region 0, philosophical in Region 7
Leaderboard blind spot: Different LLMs have vastly different accuracy across regions. The single benchmark number hides this variation.
Random sampling needs 3x more questions than topology-aware sampling to cover all 12 regions
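To illustrate the sampling point, the sketch below compares a uniform random ordering with a region-aware ordering and counts how many questions each needs before every region has been seen. The region sizes are invented for illustration (they are not the demo's measured sizes), and `questions_to_cover` is a hypothetical helper, not part of Pulsar.

```python
import random


def questions_to_cover(region_of, order):
    """Count draws, in the given order, until every region has appeared once."""
    needed = set(region_of.values())
    seen = set()
    for n, question in enumerate(order, start=1):
        seen.add(region_of[question])
        if seen == needed:
            return n
    return None  # the ordering never covered every region


# Hypothetical sizes for 12 regions, deliberately uneven.
sizes = [400, 300, 250, 20, 200, 150, 100, 80, 60, 40, 30, 15]
region_of = {}
for region, size in enumerate(sizes):
    for _ in range(size):
        region_of[len(region_of)] = region

rng = random.Random(0)

# Uniform random order over all questions.
random_order = list(region_of)
rng.shuffle(random_order)

# Region-aware order: one randomly chosen question per region.
by_region = {}
for question, region in region_of.items():
    by_region.setdefault(region, []).append(question)
stratified_order = [rng.choice(questions) for questions in by_region.values()]

n_random = questions_to_cover(region_of, random_order)
n_stratified = questions_to_cover(region_of, stratified_order)
print("random:", n_random, "stratified:", n_stratified)
```

The stratified ordering covers all regions in exactly one question per region by construction; the random count depends on how small the rarest regions are, which is why small isolated regions drive the gap.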
Try It Now
Jupyter notebook with full analysis and per-model evaluation:
cd demos/mmlu
uv sync --group demos
uv run maturin develop --release
jupyter notebook mmlu_topology_demo.ipynb
First run downloads and embeds ~14k questions (~2 min on Apple Silicon). Subsequent runs use cached data.
Deep Dive
Full README with calibration details: demos/mmlu/README.md
Jupyter notebook: mmlu_topology_demo.ipynb
Configuration: mmlu_params.yaml
—
3. Clinical Trajectories: PhysioNet ICU Vitals Over Time
The Data
The demo simulates 500 ICU patients over 72 hours with 8 vital-sign and lab measurements: heart rate, systolic/diastolic BP, MAP, respiratory rate, temperature, SpO₂, lactate, glucose. Five distinct clinical archetypes are embedded in the synthetic trajectories (sepsis progression, recovery, decline, stable, recovery-plateau).
This demonstrates TemporalCosmicGraph — a 3D tensor approach (patient × feature × time) that captures patient-level temporal patterns, not just snapshots.
The Discovery
Patients cluster by trajectory type, not current state. A recovering patient and a declining patient may have identical vitals right now but opposite futures.
Multiple aggregation modes reveal different groupings:
- Persistence → stable vs. volatile patients
- Trend → improving vs. worsening trajectories
- Volatility → high-risk vs. stable
- Change point → when trajectory shifts occur
Early warning signals emerge from trajectory clustering, not from any single vital.
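The aggregation modes above can be pictured as reductions along the time axis of a (patient × feature × time) array. The NumPy sketch below is one assumed reading of "trend", "volatility", and "change point" (least-squares slope, standard deviation of hour-to-hour changes, and the split with the largest shift in mean). It does not use or reproduce the `TemporalCosmicGraph` API, the data are random, and persistence is left out because its definition is not given here.

```python
import numpy as np


def trend(X):
    """Least-squares slope along the time axis (last axis)."""
    tc = np.arange(X.shape[-1]) - (X.shape[-1] - 1) / 2.0  # centered time
    return (X * tc).sum(axis=-1) / (tc ** 2).sum()


def volatility(X):
    """Standard deviation of step-to-step changes."""
    return np.diff(X, axis=-1).std(axis=-1)


def change_point(X):
    """Split index k (1..T-1) maximizing |mean(X[k:]) - mean(X[:k])|."""
    T = X.shape[-1]
    k = np.arange(1, T)
    head = X.cumsum(axis=-1)[..., :-1]        # sums of the first k samples
    total = X.sum(axis=-1, keepdims=True)
    gap = np.abs((total - head) / (T - k) - head / k)
    return gap.argmax(axis=-1) + 1


# Synthetic tensor: noise plus a per-patient linear drift (illustrative only).
rng = np.random.default_rng(0)
n_patients, n_features, n_hours = 6, 8, 72
drift = np.linspace(-0.05, 0.05, n_patients)[:, None, None]
X = rng.normal(size=(n_patients, n_features, n_hours)) + drift * np.arange(n_hours)

# Each mode collapses the time axis to one number per (patient, feature).
patient_features = np.concatenate([trend(X), volatility(X)], axis=1)
print(patient_features.shape)  # (6, 16)
```

Two patients with the same final-hour vitals can still get opposite `trend` rows, which is the sense in which grouping follows trajectory rather than current state.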
Try It Now
With synthetic data (no real PHI):
cd /path/to/pulsar
uv sync
uv run maturin develop --release
uv run python demos/ehr/physionet.py --synthetic --n-patients 500
With real eICU data (if you have access via PhysioNet):
# First download eICU from https://physionet.org
uv run python demos/ehr/physionet.py --data /path/to/eicu.csv
Deep Dive
Script: demos/ehr/physionet.py
Configuration: physionet_params.yaml
—
4. ECG Arrhythmia Classification via Feature Extraction
The Data
ECG (electrocardiogram) signals from the PhysioNet Arrhythmia Database: 12-lead recordings at 500 Hz, 10-second windows = 5,000 samples per lead, per patient. The demo extracts ~80 summary features per ECG:
Statistical: mean, std, min, max, median, skewness, kurtosis
Frequency: FFT peaks, power spectral density
Morphological: zero crossings, rate of change statistics
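A rough sketch of per-lead feature extraction covering the three families listed above, using only NumPy. The feature names and the exact choices (excess kurtosis, a single dominant FFT peak, mean absolute slope) are assumptions for illustration; this is not the demo's feature set, and it yields 11 features per lead rather than the roughly 80 per ECG mentioned above.

```python
import numpy as np

FS = 500    # sampling rate in Hz, as stated above
N = 5000    # samples in one 10-second window


def lead_features(x, fs=FS):
    """Summary features for a single lead (illustrative names)."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd                               # assumes a non-constant signal
    power = np.abs(np.fft.rfft(x - mu)) ** 2        # unnormalized power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return {
        # Statistical
        "mean": mu, "std": sd, "min": x.min(), "max": x.max(),
        "median": np.median(x),
        "skewness": (z ** 3).mean(),
        "kurtosis": (z ** 4).mean() - 3.0,          # excess kurtosis
        # Frequency
        "peak_freq_hz": freqs[power.argmax()],
        "total_power": power.sum() / x.size,
        # Morphological
        "zero_crossings": int((np.diff(np.sign(x - mu)) != 0).sum()),
        "mean_abs_slope": np.abs(np.diff(x)).mean() * fs,
    }


# Synthetic 12-lead "ECG": 5 Hz sines with different phases (not real data).
t = np.arange(N) / FS
ecg = np.stack([np.sin(2 * np.pi * 5 * t + 0.3 + lead) for lead in range(12)])

# Concatenate the per-lead features into one vector per recording.
vector = np.array([v for lead in ecg for v in lead_features(lead).values()], dtype=float)
print(vector.shape, lead_features(ecg[0])["peak_freq_hz"])
```

Concatenating per-lead features keeps lead-specific information separate, at the cost of a wider vector that usually calls for a dimensionality reduction step afterwards.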
The Discovery
Topology reveals clusters that align with SNOMED-CT arrhythmia diagnoses better than K-means or other traditional clustering methods
Different leads emphasize different diagnostic features — combining all 12 leads captures the full arrhythmia signature
Trade-off: Feature extraction is computationally cheaper than true temporal modeling (TemporalCosmicGraph), with minimal loss in structure discovery
Try It Now
With synthetic ECG patterns:
uv run python demos/ehr/ecg_arrhythmia.py --synthetic
With real PhysioNet data:
# Download from https://physionet.org (requires registration)
uv run python demos/ehr/ecg_arrhythmia.py --data /path/to/ecg_data
Deep Dive
Script: demos/ehr/ecg_arrhythmia.py
Configuration: Hardcoded in the script; adjust PCA dimensions and epsilon range as needed
—
5. US Coal Plants: Production-Scale Grid Sweep
The Data
147 US coal power plants with features: latitude, longitude, capacity (MW), age, emissions (CO₂, NOx, SO₂), operational status, retirement year (if planned). The dataset is automatically downloaded from the retire project. Real-world, production-scale problem.
The Discovery
Plants cluster by operational region and capacity tier, not administrative ownership
Age and emissions profiles separate active vs. retiring cohorts
Geographic clustering aligns with grid topology and energy markets
The full sweep (4 PCA dims × 8 seeds × 50 epsilons = 1,600 ball maps) approximates the cosmic graph from the original Pulsar Nature paper
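The sweep is a Cartesian product over PCA dimensions, seeds, and epsilons, so enumerating it makes the number of ball maps explicit. In the sketch below only the counts (4, 8, 50) come from the text; the specific dimension values and the epsilon range are hypothetical placeholders, not the values in `coal_params.yaml`.

```python
from itertools import product

# Hypothetical grid values; only the counts (4, 8, 50) come from the text.
pca_dims = [2, 3, 4, 5]
seeds = range(8)
eps_lo, eps_hi, n_eps = 0.1, 2.0, 50
epsilons = [eps_lo + i * (eps_hi - eps_lo) / (n_eps - 1) for i in range(n_eps)]

# One ball map per (dimension, seed, epsilon) combination.
grid = list(product(pca_dims, seeds, epsilons))
print(len(grid))  # 4 * 8 * 50 = 1600
```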
Try It Now
Automatic dataset download, grid search, and timing report:
uv run python demos/energy/coal.py
The demo prints per-stage wall-clock timings (preprocessing, PCA, Ball Mapper, graph accumulation, thresholding) and the final graph size. On a modern machine, the full sweep takes roughly 2–5 seconds.
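Per-stage timing of this kind can be collected with a small context manager around each step. The sketch below is a generic pattern, not the code in `demos/energy/coal.py`: `stage` and `timings` are hypothetical names, and the loop body is placeholder work standing in for the real stages.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def stage(name):
    """Accumulate wall-clock time spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


STAGES = ("preprocessing", "pca", "ball_mapper", "graph_accumulation", "thresholding")

for name in STAGES:
    with stage(name):
        sum(i * i for i in range(10_000))  # placeholder for the real work

for name, seconds in timings.items():
    print(f"{name:>20s}: {seconds * 1e3:8.2f} ms")
```

Accumulating into a dictionary lets a stage that runs once per ball map report its total across the whole sweep.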
Deep Dive
Script: demos/energy/coal.py
Configuration: coal_params.yaml
Data: automatically downloaded on first run from the retire project
—
Choosing Your Demo
| Domain | Demo | Why Choose It |
|---|---|---|
| Education / Getting Started | Palmer Penguins | Fastest, most intuitive |
| Research / Benchmarks | MMLU | Reveals hidden structure |
| Healthcare / Trajectories | PhysioNet (Clinical) | Time-series aware |
| Healthcare / Signals | ECG Arrhythmia | Feature engineering |
| Infrastructure / Scale | Coal Plants | Real-world, production-ready |
—
Next Steps
Once you’ve explored a demo:
Use with Claude AI: Set up the MCP Server and point Claude at your own data. The AI will handle parameter tuning and generate statistical dossiers.
Adapt for your data: Copy the nearest demo’s YAML config and adjust for your feature scales and desired PCA dimensions.
Deep dive on parameters: See Tuning Guide for guidance on tuning epsilon ranges and dimension selection.
Deploy to production: The coal demo shows how to instrument timing and validation. See Tuning Guide for configuration and parameter guidance.