Overview¶
Pulsar is a Rust-accelerated topological pipeline for exploring model spaces through systematic parameter sweeps. It transforms raw data into a Cosmic graph that reveals relationships between different model configurations.
The Problem¶
When analyzing data, you face many preprocessing choices:
Which imputation strategy?
How many PCA dimensions?
What neighborhood size for graph construction?
Each combination produces a different representation. Pulsar explores this space systematically and uses topological methods to identify representative configurations.
Architecture¶
Pulsar combines Python ergonomics with Rust performance:
graph TB
subgraph "Python Layer"
A["ThemaRS API (pipeline.py)"]
B["Config parsing (config.py)"]
C["NetworkX integration (analysis/hooks.py)"]
P["Progress reporting (runtime/progress.py)"]
T["Temporal graphs (representations/temporal.py)"]
end
subgraph "Rust Core (PyO3)"
D["Imputation"]
E["PCA computation"]
F["Ball Mapper"]
G["Laplacian accumulation"]
end
subgraph "Output"
H["Cosmic Graph"]
I["Representatives"]
end
A --> B
B --> D
D --> E
E --> F
F --> G
G --> H
H --> C
C --> I
style A fill:#f9f9f9,stroke:#999
style D fill:#FCF3CF,stroke:#D4AC0D,stroke-width:2px
style E fill:#FCF3CF,stroke:#D4AC0D,stroke-width:2px
style F fill:#FCF3CF,stroke:#D4AC0D,stroke-width:2px
style G fill:#FCF3CF,stroke:#D4AC0D,stroke-width:2px
style H fill:#DFF0D8,stroke:#3C763D,stroke-width:2px
Pipeline Stages¶
1. Data Loading & Imputation
Load tabular data and fill missing values with configurable strategies (mean, median, or custom). Multiple imputation seeds generate diverse candidates.
2. Scaling & PCA Sweep
StandardScaler normalization followed by PCA projection. Pulsar sweeps across multiple dimension settings to explore different embedding spaces.
3. Ball Mapper Graph Construction
For each PCA projection, build Ball Mapper graphs at multiple epsilon values. This captures local structure at different scales.
4. Pseudo-Laplacian Accumulation
Compute graph Laplacians for each Ball Mapper configuration and aggregate them into a summary representation.
5. Cosmic Graph Assembly
Combine pseudo-Laplacians into a weighted graph where edges represent similarity between configurations.
6. Representative Selection
Use graph distances (e.g., Forman-Ricci curvature) to identify the most central configurations.
Configuration Model¶
Pulsar uses a hierarchical configuration:
run:
name: my_experiment
data: path/to/data.csv
preprocessing:
drop_columns: [id, timestamp]
impute:
age: {method: sample_normal, seed: 42}
category: {method: sample_categorical, seed: 7}
sweep:
pca:
dimensions: {values: [2, 5, 10, 20]}
seed: {values: [42, 7, 13]}
ball_mapper:
epsilon: {range: {min: 0.1, max: 1.5, steps: 8}}
cosmic_graph:
threshold: "auto"
Key Outputs¶
Output |
Description |
|---|---|
|
NetworkX graph with weighted edges |
|
Dense similarity matrix |
|
List of PCA projections |
|
List of Ball Mapper graphs |
|
Threshold selection diagnostics (if |
Performance¶
The Rust core provides significant speedups:
10-100x faster Ball Mapper construction
Parallel PCA computation across configurations
Memory efficient Laplacian accumulation
For large datasets (>10k rows) or extensive sweeps (>100 configurations), Pulsar’s Rust implementation is essential.
Next Steps¶
Quickstart - Run your first pipeline
User Guide - Configuration details
Configuration - YAML schema reference