Quickstart

Get from zero to insights in under 10 minutes.

uv pip install pulsar

Prerequisites

Python 3.10+
For development builds from source: a Rust toolchain (maturin compiles the Rust core)

Option 1: Use a Pre-Built Demo (Fastest)

Run the bundled penguins demo from a source checkout of Pulsar:

# Run the penguins demo (no data download needed)
cd /path/to/pulsar
uv sync
uv run maturin develop --release
uv run python -c "
from pulsar.pipeline import ThemaRS
config = {'run': {'name': 'penguins', 'data': 'demos/penguins/penguins.csv'}}
model = ThemaRS.from_dict(config)
model.fit()
print(f'Cosmic graph: {len(model.cosmic_graph.nodes())} nodes, {len(model.cosmic_graph.edges())} edges')
"

Done! You’ve discovered penguin species structure without looking at species labels.

For the full list of demos, see Demos.

Option 2: Use with Claude AI (No Code)

Let Claude handle the analysis:

1. Set up the Pulsar MCP server (see MCP Server).
2. Open Claude Desktop.
3. Paste: “Analyze the file at demos/penguins/penguins.csv using Pulsar. Find the hidden structure.”

Claude will orchestrate parameter tuning and generate a statistical dossier.

Option 3: YAML-Driven Workflow (Recommended for Reproducibility)

Use YAML configuration for transparent, reproducible pipelines.

Step 1: Create a configuration file

Create params.yaml:

data:
  path: "data.csv"

preprocessing:
  drop_columns: [id]
  impute:
    age:      {method: fill_mean}
    salary:   {method: fill_median}
    category: {method: sample_categorical, seed: 42}
  encode:
    category: {method: one_hot}

sweep:
  projection:
    method: jl
    dimensions: {values: [2, 5, 10]}
    seed: {values: [42, 7, 13]}
    center: true
  ball_mapper:
    epsilon: {range: {min: 0.1, max: 0.5, steps: 5}}

cosmic_graph:
  construction: minhash
  construction_threshold: "auto"
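
To get a feel for how large this sweep is, the sketch below expands the sweep entries into concrete parameter combinations. It is not Pulsar code: the expand helper is hypothetical, and it assumes that range entries are evenly spaced with both endpoints included and that the pipeline takes the full Cartesian product of dimensions, seeds, and epsilon values. Check the Configuration reference for the actual semantics.

```python
from itertools import product


def expand(spec):
    """Expand one sweep entry into a list of concrete values (hypothetical helper)."""
    if isinstance(spec, dict) and "values" in spec:
        return list(spec["values"])
    if isinstance(spec, dict) and "range" in spec:
        r = spec["range"]
        lo, hi, steps = r["min"], r["max"], r["steps"]
        if steps == 1:
            return [lo]
        # Assumption: evenly spaced values, endpoints included.
        # Rounding hides floating-point noise such as 0.30000000000000004.
        return [round(lo + i * (hi - lo) / (steps - 1), 10) for i in range(steps)]
    return [spec]  # a plain scalar is a sweep of one value


# The sweep entries from params.yaml, written as a Python dict
sweep = {
    "dimensions": {"values": [2, 5, 10]},
    "seed": {"values": [42, 7, 13]},
    "epsilon": {"range": {"min": 0.1, "max": 0.5, "steps": 5}},
}

# Assumption: every combination of the three parameters is evaluated
grid = list(product(*(expand(spec) for spec in sweep.values())))
print(len(grid))  # 3 dimensions * 3 seeds * 5 epsilons = 45
```

Under these assumptions the configuration above describes 45 combinations, which is why trimming any one axis of the sweep shortens a run noticeably.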

Step 2: Run the pipeline

from pulsar import ThemaRS

model = ThemaRS("params.yaml")
model.fit()

# Access the final graph
graph = model.cosmic_graph
print(f"Nodes: {graph.number_of_nodes()}")
print(f"Edges: {graph.number_of_edges()}")

Step 3: Select representatives

# Get the top 3 representative configurations
reps = model.select_representatives(k=3)
for i, rep in enumerate(reps, start=1):
    print(f"Representative {i}: {rep}")

Option 4: Programmatic Configuration (Full Control)

For maximum control, configure directly in Python:

from pulsar import ThemaRS

model = ThemaRS(
    data="data.csv",
    pca_dims=[2, 5, 10],          # Compatibility alias for projection dimensions
    epsilon_range=(0.1, 0.5, 5),  # (min, max, steps)
    random_state=42,
)
model.fit()

Understanding the Pipeline

Pulsar executes these stages in order:

Impute: Fill missing values in specified columns
Scale: StandardScaler normalization
Projection sweep: Project data to multiple dimensions with JL by default, or PCA when configured explicitly
Ball Mapper sweep: Build neighborhood graphs at multiple epsilon values
Cosmic graph construction: Fuse Ball Mapper outputs via MinHash (default) or exact sparse pseudo-Laplacian accumulation
Threshold & assembly: Apply construction_threshold to produce a sparse weighted similarity graph
Selection: Choose representative configurations via graph distances
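
To make the Ball Mapper stage concrete, here is a minimal pure-Python sketch of the general technique: pick landmarks greedily so that every point lies within epsilon of some landmark, form a ball of radius epsilon around each landmark, and connect two balls when they share a point. This illustrates the idea only and is not Pulsar's Rust implementation. The ball_mapper function and its return shape are hypothetical.

```python
import math


def ball_mapper(points, epsilon):
    """Greedy Ball Mapper sketch: returns (landmarks, balls, edges)."""
    landmarks = []
    covered = [False] * len(points)
    for i, p in enumerate(points):
        if covered[i]:
            continue
        # An uncovered point becomes a new landmark and covers its neighbors
        landmarks.append(i)
        for j, q in enumerate(points):
            if math.dist(p, q) <= epsilon:
                covered[j] = True

    # Each ball holds the indices of all points within epsilon of its landmark
    balls = [
        {j for j, q in enumerate(points) if math.dist(points[c], q) <= epsilon}
        for c in landmarks
    ]

    # Two balls are connected when they share at least one point
    edges = {
        (a, b)
        for a in range(len(balls))
        for b in range(a + 1, len(balls))
        if balls[a] & balls[b]
    }
    return landmarks, balls, edges


points = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (5.0, 5.0)]
landmarks, balls, edges = ball_mapper(points, epsilon=0.4)
print(landmarks)  # [0, 2, 3]
print(edges)      # {(0, 1)}
```

The first two balls overlap at the point (0.3, 0.0), so they are joined by an edge, while the distant point forms an isolated ball. A larger epsilon merges more points into fewer balls, which is why the pipeline sweeps over several values.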

# Access intermediate results
print(f"Projection embeddings: {len(model._embeddings)}")
print(f"Ball Mapper graphs: {len(model.ball_mapper_graphs_)}")
print(f"Weighted adjacency shape: {model.weighted_adjacency_.shape}")
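
The fusion and thresholding stages can be pictured with a deliberately simplified stand-in. The sketch below counts, for each pair of points, the fraction of Ball Mapper covers in which the two points share a ball, then keeps only pairs at or above a fixed threshold. This plain co-membership count is a substitute for illustration; it is not the MinHash or pseudo-Laplacian construction that Pulsar uses, and it does not model the "auto" threshold. The names fuse_covers and apply_threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def fuse_covers(covers):
    """Weight each point pair by the fraction of covers where it shares a ball."""
    counts = Counter()
    for balls in covers:
        pairs = set()
        for ball in balls:
            pairs.update(combinations(sorted(ball), 2))
        counts.update(pairs)  # each pair counts at most once per cover
    return {pair: n / len(covers) for pair, n in counts.items()}


def apply_threshold(weights, threshold):
    """Keep only pairs whose weight is at least the threshold."""
    return {pair: w for pair, w in weights.items() if w >= threshold}


# Three toy covers of the points 0..3, e.g. from three sweep configurations
covers = [
    [{0, 1}, {1, 2}, {3}],
    [{0, 1, 2}, {3}],
    [{0, 1}, {2, 3}],
]

weights = fuse_covers(covers)
graph = apply_threshold(weights, threshold=0.5)
print(sorted(graph))  # [(0, 1), (1, 2)]
```

Pairs that co-occur in only one of the three covers fall below the threshold and are dropped, which is how thresholding turns a dense similarity structure into a sparse weighted graph.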

Performance Tips

Pulsar’s Rust core provides significant speedups, but runtime still grows with the size of the sweep. For large datasets, start with a coarser sweep:

from pulsar import ThemaRS

# Reduce sweep resolution for faster iteration
model = ThemaRS(
    data="large_data.csv",
    pca_dims=[5],                 # Compatibility alias for projection dimensions
    epsilon_range=(0.2, 0.4, 3),  # Fewer epsilon steps
)
model.fit()

Next Steps

Programmatic - Full API control
Tuning Guide - Tuning sweep parameters
Configuration - YAML schema reference