Demos

Pulsar shines when you have real data and real questions. Below are five production demos that showcase different aspects of topological data analysis — from recovering hidden biology to revealing benchmark structure to analyzing clinical trajectories.

Each demo is self-contained and runnable in minutes. Pick one that matches your domain and see the insights Pulsar reveals.

1. Palmer Penguins: Recovering Biology Without Labels

The Hook: Can topology rediscover penguin species without ever seeing the species labels, or even show that habitat and sex matter just as much structurally?

The Data

The Palmer Penguins dataset contains 333 penguins from three species (Adelie, Chinstrap, Gentoo) with 8 recorded variables, including the morphological measurements bill length, bill depth, flipper length, and body mass. It’s the ideal educational dataset: real biology, no missing structure, universally understood.

The Discovery

After dropping species labels entirely and letting Pulsar discover structure in the remaining 5-dimensional feature space:

  • The Gentoos: Completely isolated on Biscoe Island, then perfectly separated by sex. (They are chunky birds with distinctive morphology.)

  • The Adelies: Fragmented by island of origin — the structural variation within the species is as important as the species itself.

  • The Chinstraps: Indistinguishable from Dream Island Adelies. They share the same morphological envelope, so the math doesn’t lie.

Key Insight: Topology reveals that habitat and biological sex are as structurally important as species itself. Traditional clustering (K-means) would force three spheres; topology shows the actual complexity.

Try It Now

This is the fastest way to see Pulsar in action. No dataset to download.

# Option 1: Use Pulsar with Claude AI (recommended)
# Install the Pulsar MCP server in Claude Desktop (see the MCP guide)
# Then ask Claude: "Use Pulsar to analyze the penguin data at demos/penguins/penguins.csv"

# Option 2: Run directly with Python
cd /path/to/pulsar
uv sync
uv run maturin develop --release
uv run python -c "
from pulsar.pipeline import ThemaRS
config = {'run': {'name': 'penguins', 'data': 'demos/penguins/penguins.csv'}}
model = ThemaRS.from_dict(config)
model.fit()
print(f'Discovered {len(model.cosmic_graph.nodes())} nodes and {len(model.cosmic_graph.edges())} edges')
"


2. MMLU Benchmark Topology: 57 Subjects, 12 True Clusters

The Hook: MMLU is the standard LLM benchmark: 57 subjects, one leaderboard number. What if the real structure doesn't match those labels?

The Data

MMLU consists of ~14,000 test questions across 57 administrative subjects (professional medicine, history, chemistry, law, etc.). We embed all questions using bge-small-en-v1.5 (384-dimensional sentence embeddings) and run Pulsar’s topological sweep.

The Discovery

The geometric structure in embedding space reveals 12 distinct regions that cut across subject boundaries:

MMLU’s Hidden Structure

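======  ==========================  ====================================================
Region  Theme                       Top Subjects
======  ==========================  ====================================================
0       Psychology / Behavioral     professional_psychology, hs_psychology
1       Medicine / Health           professional_medicine, nutrition, clinical_knowledge
2       Mathematics / Quantitative  elementary_math, hs_math, hs_statistics
3       Moral Reasoning             moral_scenarios (100% isolated)
5       Law                         professional_law (87% of region)
8       History                     hs_world_history, hs_us_history
======  ==========================  ====================================================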
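Figures such as "87% of region" can be read as the share of a region's questions that come from its most common subject. The sketch below computes that share from (region, subject) pairs. The assignments are made up for illustration, and `region_composition` is a hypothetical helper, not part of Pulsar.

```python
from collections import Counter, defaultdict

# Hypothetical (region_id, subject) assignments. Real assignments would come
# from the fitted graph, which this sketch does not compute.
assignments = [
    (5, "professional_law"), (5, "professional_law"), (5, "professional_law"),
    (5, "jurisprudence"),
    (3, "moral_scenarios"), (3, "moral_scenarios"),
    (0, "professional_psychology"), (0, "hs_psychology"),
]

def region_composition(pairs):
    """Return {region: (most common subject, its share of the region)}."""
    by_region = defaultdict(Counter)
    for region, subject in pairs:
        by_region[region][subject] += 1
    summary = {}
    for region, counts in by_region.items():
        subject, n = counts.most_common(1)[0]
        summary[region] = (subject, n / sum(counts.values()))
    return summary

composition = region_composition(assignments)
for region in sorted(composition):
    subject, share = composition[region]
    print(f"Region {region}: {subject} ({share:.0%} of region)")
```

A share of 100% means every question in the region comes from one subject, which is how an isolated subject such as moral_scenarios would show up.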

Key Insights:

  • moral_scenarios forms a completely isolated island — structurally alien to the rest of MMLU

  • professional_law is the tightest cluster (87% of Region 5)

  • Psychology splits: behavioral questions in Region 0, philosophical in Region 7

  • Leaderboard blind spot: Different LLMs have vastly different accuracy across regions. The single benchmark number hides this variation.

  • Random sampling needs 3x more questions than topology-aware sampling to cover all 12 regions
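To see why uniform random sampling is slow to cover small regions, here is a minimal simulation. The region sizes are invented (they only sum to 14,000), and "topology-aware" is simplified to drawing one question per region, so this does not reproduce the 3x figure above. The ratio depends entirely on how uneven the real regions are.

```python
import random
from itertools import accumulate

# Hypothetical region sizes summing to 14,000. The real sizes come from the
# notebook and are not given in this page.
region_sizes = [4000, 3000, 2000, 1500, 1000, 800, 600, 400, 300, 200, 120, 80]
cumulative = list(accumulate(region_sizes))
regions = range(len(region_sizes))

def draws_until_covered(rng):
    """Draw questions uniformly (with replacement) until every region is hit."""
    seen = set()
    draws = 0
    while len(seen) < len(region_sizes):
        seen.add(rng.choices(regions, cum_weights=cumulative)[0])
        draws += 1
    return draws

rng = random.Random(0)
trials = [draws_until_covered(rng) for _ in range(500)]
mean_random = sum(trials) / len(trials)
stratified = len(region_sizes)  # one question per region covers all of them

print(f"uniform random: about {mean_random:.0f} draws on average")
print(f"one per region: {stratified} draws")
```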

Try It Now

Jupyter notebook with full analysis and per-model evaluation:

cd demos/mmlu
uv sync --group demos
uv run maturin develop --release
uv run jupyter notebook mmlu_topology_demo.ipynb

First run downloads and embeds ~14k questions (~2 min on Apple Silicon). Subsequent runs use cached data.


3. Clinical Trajectories: PhysioNet ICU Vitals Over Time

The Hook: Two patients with identical vital signs right now might have completely different futures. Can topology reveal their trajectory archetypes?

The Data

The demo simulates 500 ICU patients over 72 hours with 8 vital signs: heart rate, systolic/diastolic BP, MAP, respiratory rate, temperature, SpO₂, lactate, glucose. Five distinct clinical archetypes are embedded in the synthetic trajectories (sepsis progression, recovery, decline, stable, recovery-plateau).

This demonstrates TemporalCosmicGraph — a 3D tensor approach (patient × feature × time) that captures patient-level temporal patterns, not just snapshots.

The Discovery

  • Patients cluster by trajectory type, not current state. A recovering patient and a declining patient may have identical vitals right now but opposite futures.

  • Multiple aggregation modes reveal different groupings:

      - Persistence → stable vs. volatile patients
      - Trend → improving vs. worsening trajectories
      - Volatility → high-risk vs. stable
      - Change point → when trajectory shifts occur

  • Early warning signals emerge from trajectory clustering, not from any single vital.
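The trend, volatility, and change-point ideas above can be illustrated on a single vital sign. This is a stdlib-only sketch with simple textbook definitions (least-squares slope, standard deviation of hourly changes, best two-segment mean split). It is not how TemporalCosmicGraph aggregates internally, and the function names and patient series are hypothetical.

```python
from statistics import mean, pstdev

def trend(series):
    """Least-squares slope of the series against its time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def volatility(series):
    """Standard deviation of step-to-step changes."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return pstdev(diffs)

def change_point(series):
    """Index that splits the series into two segments with the largest mean gap."""
    best_idx, best_gap = None, -1.0
    for i in range(1, len(series)):
        gap = abs(mean(series[:i]) - mean(series[i:]))
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx

# Two made-up heart-rate series that end at the same value (90 bpm).
recovering = [120, 114, 108, 102, 96, 90]
declining = [60, 66, 72, 78, 84, 90]
shifted = [80, 80, 80, 80, 110, 110, 110, 110]

print("same current value:", recovering[-1] == declining[-1])
print("trend (recovering, declining):", trend(recovering), trend(declining))
print("change point in shifted series:", change_point(shifted))
```

The first two series are indistinguishable as snapshots of the last hour, but their slopes have opposite signs, which is the point of clustering on trajectories.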

Try It Now

With synthetic data (no real PHI):

cd /path/to/pulsar
uv sync
uv run maturin develop --release
uv run python demos/ehr/physionet.py --synthetic --n-patients 500

With real eICU data (if you have access via PhysioNet):

# First download eICU from https://physionet.org
uv run python demos/ehr/physionet.py --data /path/to/eicu.csv


4. ECG Arrhythmia Classification via Feature Extraction

The Hook: 60,000 raw ECG samples per patient. Can a compact feature vector capture enough to cluster arrhythmias?

The Data

ECG (electrocardiogram) signals from the PhysioNet Arrhythmia Database: 12-lead recordings at 500 Hz, 10-second windows = 5,000 samples per lead, per patient. The demo extracts ~80 summary features per ECG:

  • Statistical: mean, std, min, max, median, skewness, kurtosis

  • Frequency: FFT peaks, power spectral density

  • Morphological: zero crossings, rate of change statistics
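As a rough illustration of the statistical and morphological features listed above, the sketch below summarizes each lead of a synthetic 12-lead recording. The feature set, names, and sine-wave signal are assumptions for illustration. It does not reproduce the demo's ~80 features, and frequency features are left out because they would normally use an FFT library.

```python
import math
from statistics import median

SAMPLE_RATE_HZ = 500
WINDOW_SECONDS = 10
N_LEADS = 12

def lead_features(signal):
    """Summary features for one lead (hypothetical feature set)."""
    n = len(signal)
    mu = sum(signal) / n
    centered = [x - mu for x in signal]
    sigma = math.sqrt(sum(c * c for c in centered) / n)
    diffs = [abs(b - a) for a, b in zip(signal, signal[1:])]
    return {
        "mean": mu,
        "std": sigma,
        "min": min(signal),
        "max": max(signal),
        "median": median(signal),
        # Standardized third and fourth moments.
        "skewness": sum(c ** 3 for c in centered) / n / sigma ** 3,
        "kurtosis": sum(c ** 4 for c in centered) / n / sigma ** 4,
        # Sign changes of the mean-removed signal.
        "zero_crossings": sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_abs_diff": max(diffs),
    }

def synthetic_lead(phase):
    """A 1.2 Hz sine wave standing in for one lead of a 10-second window."""
    n = SAMPLE_RATE_HZ * WINDOW_SECONDS
    return [math.sin(2 * math.pi * 1.2 * t / SAMPLE_RATE_HZ + phase) for t in range(n)]

leads = [synthetic_lead(0.1 * (k + 1)) for k in range(N_LEADS)]
per_lead = [lead_features(sig) for sig in leads]
feature_vector = [value for feats in per_lead for value in feats.values()]
raw_samples = sum(len(sig) for sig in leads)

print(f"{raw_samples} raw samples reduced to {len(feature_vector)} features")
```

The raw sample count follows from the recording format: 12 leads × 500 Hz × 10 s = 60,000 samples.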

The Discovery

  • Topology reveals clusters that align with SNOMED-CT arrhythmia diagnoses better than K-means or traditional clustering

  • Different leads emphasize different diagnostic features — combining all 12 leads captures the full arrhythmia signature

  • Trade-off: Feature extraction is computationally cheaper than true temporal modeling (TemporalCosmicGraph), with minimal loss in structure discovery

Try It Now

With synthetic ECG patterns:

uv run python demos/ehr/ecg_arrhythmia.py --synthetic

With real PhysioNet data:

# Download from https://physionet.org (requires registration)
uv run python demos/ehr/ecg_arrhythmia.py --data /path/to/ecg_data


5. US Coal Plants: Production-Scale Grid Sweep

The Hook: Real infrastructure data at scale. How do operational coal plants cluster when you account for location, capacity, age, emissions, and status?

The Data

147 US coal power plants with features: latitude, longitude, capacity (MW), age, emissions (CO₂, NOx, SO₂), operational status, retire year (if planned). Dataset is automatically downloaded from the retire project. Real-world, production-scale problem.

The Discovery

  • Plants cluster by operational region and capacity tier, not administrative ownership

  • Age and emissions profiles separate active vs. retiring cohorts

  • Geographic clustering aligns with grid topology and energy markets

  • The full sweep (4 PCA dims × 8 seeds × 50 epsilons = 1,600 ball maps) approximates the cosmic graph from the original Pulsar Nature paper
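For readers new to ball maps, here is a minimal pure-Python sketch of the standard Ball Mapper construction for a single epsilon: greedily pick landmarks so every point lies within epsilon of one, collect the points in each ball, and connect landmarks whose balls overlap. It is a teaching sketch, not Pulsar's Rust implementation. The `ball_mapper` function, the toy points, and the way the seed is used are all assumptions, and the accumulation of many maps into a cosmic graph is not shown.

```python
import math
import random

def ball_mapper(points, epsilon, seed=0):
    """Return (landmarks, balls, edges) for one epsilon.

    landmarks: indices of points chosen as ball centers
    balls:     for each landmark, the set of point indices within epsilon
    edges:     pairs of landmark positions whose balls share a point
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # the seed changes which landmarks are picked

    # Greedy epsilon-net: a point becomes a landmark only if no existing
    # landmark already covers it.
    landmarks = []
    for idx in order:
        if all(math.dist(points[idx], points[l]) > epsilon for l in landmarks):
            landmarks.append(idx)

    balls = [
        {i for i, p in enumerate(points) if math.dist(p, points[l]) <= epsilon}
        for l in landmarks
    ]
    edges = {
        (a, b)
        for a in range(len(balls))
        for b in range(a + 1, len(balls))
        if balls[a] & balls[b]
    }
    return landmarks, balls, edges

# Two well-separated toy clusters (unit squares near the origin and near (10, 10)).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
landmarks, balls, edges = ball_mapper(points, epsilon=1.0, seed=0)
print(f"{len(landmarks)} landmarks, {len(edges)} edges")

# A sweep repeats this over a grid of seeds and epsilons.
sweep = {(s, eps): ball_mapper(points, eps, s) for s in range(2) for eps in (1.0, 1.5)}
```

No edge joins the two clusters because no ball of radius 1.0 reaches across the gap, so the graph has one connected component per cluster.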

Try It Now

Automatic dataset download, grid search, and timing report:

uv run python demos/energy/coal.py

The demo prints per-stage wall-clock timings (preprocessing, PCA, Ball Mapper, graph accumulation, thresholding) and the final graph size. On a modern machine: ~2–5 seconds for the full sweep.
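If you want the same kind of per-stage timing report in your own scripts, one common pattern is a small context manager around `time.perf_counter`. This is a generic sketch, not the code in demos/energy/coal.py. The `stage` helper and the placeholder workloads are hypothetical.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Accumulate wall-clock seconds spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

# Placeholder workloads standing in for real pipeline stages.
with stage("preprocessing"):
    data = [[float(i), float(i % 7)] for i in range(10_000)]

with stage("thresholding"):
    kept = [row for row in data if row[1] > 3]

for name, seconds in timings.items():
    print(f"{name:<16}{seconds * 1000:8.2f} ms")
```

Using `finally` means a stage is still recorded if it raises, which helps when a long sweep fails partway through.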


Choosing Your Demo

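===========================  ====================  ============================
Domain                       Demo                  Why Choose It
===========================  ====================  ============================
Education / Getting Started  Palmer Penguins       Fastest, most intuitive
Research / Benchmarks        MMLU                  Reveals hidden structure
Healthcare / Trajectories    PhysioNet (Clinical)  Time-series aware
Healthcare / Signals         ECG Arrhythmia        Feature engineering
Infrastructure / Scale       Coal Plants           Real-world, production-ready
===========================  ====================  ============================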

Next Steps

Once you’ve explored a demo:

  1. Use with Claude AI: Set up the MCP server and point Claude at your own data. The AI will handle parameter tuning and generate statistical dossiers.

  2. Adapt for your data: Copy the nearest demo’s YAML config and adjust for your feature scales and desired PCA dimensions.

  3. Deep dive on parameters: See the Tuning Guide for guidance on epsilon ranges and dimension selection.

  4. Deploy to production: The coal demo shows how to instrument timing and validation. See the Tuning Guide for configuration and parameter guidance.
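On step 2, "feature scales" matter because distance-based methods let large-valued columns (grams) dominate small-valued ones (millimetres). A minimal z-score sketch, assuming you standardize yourself before handing data to any pipeline. Whether Pulsar's config already does this for you is not stated here, and the rows below are illustrative values only.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has zero spread; divide by 1.0 so it maps to all zeros.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Illustrative rows: bill length (mm), body mass (g).
rows = [
    [39.1, 3750.0],
    [46.5, 4200.0],
    [50.0, 5700.0],
]
scaled = standardize(rows)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, both columns have mean 0 and unit spread, so neither dominates a Euclidean distance.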