Quickstart
Get from zero to insights in under 10 minutes.
uv pip install pulsar
Prerequisites
Python 3.10+
For development: Rust toolchain
Option 1: Use a Pre-Built Demo (Fastest)
The fastest way to see Pulsar in action:
# Run the penguins demo (no data download needed)
cd /path/to/pulsar
uv sync
uv run maturin develop --release
python -c "
from pulsar.pipeline import ThemaRS
config = {'run': {'name': 'penguins', 'data': 'demos/penguins/penguins.csv'}}
model = ThemaRS.from_dict(config)
model.fit()
print(f'Cosmic graph: {len(model.cosmic_graph.nodes())} nodes, {len(model.cosmic_graph.edges())} edges')
"
Done! You’ve discovered penguin species structure without looking at species labels.
For the full list of demos, see Demos.
Option 2: Use with Claude AI (No Code)
Let Claude handle the analysis:
1. Set up the Pulsar MCP server (see MCP Server).
2. Open Claude Desktop.
3. Paste: “Analyze the file at demos/penguins/penguins.csv using Pulsar. Find the hidden structure.”
Claude will orchestrate parameter tuning and generate a statistical dossier.
Option 3: YAML-Driven Workflow (Recommended for Reproducibility)
Use YAML configuration for transparent, reproducible pipelines.
Step 1: Create a configuration file
Create params.yaml:
data:
  path: "data.csv"
preprocessing:
  drop_columns: [id]
  impute:
    age: {method: fill_mean}
    salary: {method: fill_median}
    category: {method: sample_categorical, seed: 42}
  encode:
    category: {method: one_hot}
sweep:
  pca:
    dimensions: {values: [2, 5, 10]}
    seed: {values: [42, 7, 13]}
  ball_mapper:
    epsilon: {range: {min: 0.1, max: 0.5, steps: 5}}
cosmic_graph:
  threshold: "auto"
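To get a feel for how large this sweep is, the sketch below expands the sweep section by hand. It assumes, without confirmation from Pulsar itself, that a range entry means `steps` evenly spaced values from `min` to `max` inclusive, and that the sweep is the full Cartesian product of dimensions, seeds, and epsilon values. The `expand` helper is hypothetical and not part of the Pulsar API.

```python
from itertools import product


def expand(spec):
    """Expand one sweep entry into a list of concrete values.

    Assumed semantics (not confirmed by Pulsar): {"values": [...]} is taken
    literally; {"range": {...}} means `steps` evenly spaced points from
    `min` to `max`, both ends included.
    """
    if "values" in spec:
        return list(spec["values"])
    r = spec["range"]
    if r["steps"] == 1:
        return [r["min"]]
    width = (r["max"] - r["min"]) / (r["steps"] - 1)
    # Round to suppress floating-point noise such as 0.30000000000000004.
    return [round(r["min"] + i * width, 10) for i in range(r["steps"])]


# The sweep section of params.yaml, written as a Python dict.
sweep = {
    "pca": {
        "dimensions": {"values": [2, 5, 10]},
        "seed": {"values": [42, 7, 13]},
    },
    "ball_mapper": {
        "epsilon": {"range": {"min": 0.1, "max": 0.5, "steps": 5}},
    },
}

dimensions = expand(sweep["pca"]["dimensions"])
seeds = expand(sweep["pca"]["seed"])
epsilons = expand(sweep["ball_mapper"]["epsilon"])

# Assumption: every (dimension, seed, epsilon) combination is evaluated.
grid = list(product(dimensions, seeds, epsilons))

print(f"epsilon values: {epsilons}")
print(f"configurations: {len(grid)}")
```

Under these assumptions the configuration above yields 3 × 3 × 5 = 45 combinations, which is a useful number to keep in mind when trimming the sweep for large datasets.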
Step 2: Run the pipeline
from pulsar import ThemaRS
model = ThemaRS("params.yaml")
model.fit()
# Access the final graph
graph = model.cosmic_graph
print(f"Nodes: {graph.number_of_nodes()}")
print(f"Edges: {graph.number_of_edges()}")
Step 3: Select representatives
# Get the top 3 representative configurations
reps = model.select_representatives(k=3)
for i, rep in enumerate(reps):
    print(f"Representative {i+1}: {rep}")
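Pulsar describes selection as choosing representative configurations via graph distances, but this page does not spell out the rule. One common way to pick k spread-out representatives from pairwise distances is greedy farthest-point sampling, sketched below. This is an illustration under that assumption, not Pulsar's actual selection logic; the function name and the toy distance matrix are hypothetical.

```python
def farthest_point_representatives(dist, k):
    """Pick up to k indices from a symmetric distance matrix (list of lists).

    Starts from the medoid (smallest total distance to all others), then
    repeatedly adds the item farthest from its nearest chosen representative.
    """
    n = len(dist)
    first = min(range(n), key=lambda i: sum(dist[i]))
    chosen = [first]
    while len(chosen) < min(k, n):
        remaining = (i for i in range(n) if i not in chosen)
        nxt = max(remaining, key=lambda i: min(dist[i][c] for c in chosen))
        chosen.append(nxt)
    return chosen


# Toy distances between four configurations: 0 and 1 are close to each
# other, 2 and 3 are close to each other, and the two pairs are far apart.
dist = [
    [0, 1, 9, 8],
    [1, 0, 8, 7],
    [9, 8, 0, 2],
    [8, 7, 2, 0],
]

two = farthest_point_representatives(dist, 2)
three = farthest_point_representatives(dist, 3)
print(two, three)
```

With k=2 the sketch returns one configuration from each of the two tight pairs, which is the behavior one would want from a representative set: few configurations that together cover the distinct structures found by the sweep.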
Option 4: Programmatic Configuration (Full Control)
For maximum control, configure directly in Python:
from pulsar import ThemaRS
model = ThemaRS(
    data="data.csv",
    pca_dims=[2, 5, 10],
    epsilon_range=(0.1, 0.5, 5),
    random_state=42,
)
model.fit()
Understanding the Pipeline
Pulsar executes these stages:
1. Impute: Fill missing values in the specified columns
2. Scale: Standardize features to zero mean and unit variance (StandardScaler)
3. PCA sweep: Project the data to multiple dimensions
4. Ball Mapper sweep: Build neighborhood graphs at multiple epsilon values
5. Pseudo-Laplacians: Compute graph Laplacians for each configuration
6. Cosmic graph: Aggregate into a weighted similarity graph
7. Selection: Choose representative configurations via graph distances
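The first two stages can be illustrated in plain Python. This is a sketch of what the method names used in params.yaml (fill_mean, fill_median, sample_categorical) suggest, not Pulsar's Rust implementation; the function bodies, and in particular the rule of sampling from the observed values, are assumptions. Standardization uses the population standard deviation, which is what scikit-learn's StandardScaler does.

```python
import random
import statistics


def fill_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    mean = statistics.fmean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


def fill_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def sample_categorical(values, seed):
    """Replace missing entries by sampling from the observed categories.

    Assumption: sampling is proportional to observed frequency, and the
    seed makes the result reproducible.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


def standardize(values):
    """Scale to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


age = [20, None, 40]
salary = [100, None, 300, 1000]
category = ["a", "b", None, "a"]

age_filled = fill_mean(age)              # the gap becomes 30.0
salary_filled = fill_median(salary)      # the gap becomes 300
category_filled = sample_categorical(category, seed=42)
age_scaled = standardize(age_filled)

print(age_filled, salary_filled, category_filled)
print(age_scaled)
```

Note how the median is less sensitive than the mean to the outlying salary of 1000, which is the usual reason to prefer fill_median for skewed columns.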
# Access intermediate results
print(f"PCA configurations: {len(model.pca_results_)}")
print(f"Ball Mapper graphs: {len(model.ball_mapper_graphs_)}")
print(f"Weighted adjacency shape: {model.weighted_adjacency_.shape}")
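To make the Ball Mapper stage more concrete, here is a minimal pure-Python sketch of the general Ball Mapper idea: greedily pick landmark points so that every point lies within epsilon of some landmark, treat each landmark's epsilon-ball as a graph node, and connect two nodes when their balls share a point. It is an illustration only; Pulsar's Rust implementation may choose landmarks differently, and the `ball_mapper` function here is hypothetical.

```python
import math
from itertools import combinations


def ball_mapper(points, epsilon):
    """Return (landmarks, balls, edges) for a greedy Ball Mapper cover.

    landmarks: indices of points chosen as ball centers
    balls:     one set of point indices per landmark
    edges:     pairs of ball indices whose balls overlap
    """
    covered = [False] * len(points)
    landmarks = []
    for i, p in enumerate(points):
        if covered[i]:
            continue
        # An uncovered point becomes a new landmark and covers its neighbors.
        landmarks.append(i)
        for j, q in enumerate(points):
            if math.dist(p, q) <= epsilon:
                covered[j] = True

    balls = [
        {j for j, q in enumerate(points) if math.dist(points[c], q) <= epsilon}
        for c in landmarks
    ]
    edges = [
        (a, b)
        for a, b in combinations(range(len(balls)), 2)
        if balls[a] & balls[b]
    ]
    return landmarks, balls, edges


# Four points along a line plus one distant outlier.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (5.0, 0.0)]
landmarks, balls, edges = ball_mapper(points, epsilon=0.6)

print(f"landmarks: {landmarks}")
print(f"balls: {balls}")
print(f"edges: {edges}")
```

In this toy data the first two balls overlap and are joined by an edge, while the outlier forms an isolated node. A larger epsilon merges more points into fewer balls, which is why the pipeline sweeps epsilon rather than fixing a single value.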
Performance Tips
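Pulsar’s Rust core handles the heavy computation, but every extra PCA dimension or epsilon step still adds configurations to the sweep. For large datasets, start with a coarse sweep and refine it later: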
# Reduce sweep resolution for faster iteration
model = ThemaRS(
    data="large_data.csv",
    pca_dims=[5],                 # Single dimension
    epsilon_range=(0.2, 0.4, 3),  # Fewer epsilon steps
)
Next Steps
Programmatic - Full API control
Tuning Guide - Tuning sweep parameters
Configuration - YAML schema reference