.. _overview:

========
Overview
========

Pulsar is a **Rust-accelerated topological pipeline** for exploring model spaces through systematic parameter sweeps. It transforms raw data into a Cosmic graph that reveals relationships between different model configurations.

The Problem
-----------

When analyzing data, you face many preprocessing choices:

- Which imputation strategy?
- How many PCA dimensions?
- What neighborhood size for graph construction?

Each combination produces a different representation. Pulsar explores this space systematically and uses topological methods to identify representative configurations.

Architecture
------------

Pulsar combines Python ergonomics with Rust performance:

.. mermaid::

   graph TB
       subgraph "Python Layer"
           A["ThemaRS API (pipeline.py)"]
           B["Config parsing (config.py)"]
           C["NetworkX integration (analysis/hooks.py)"]
           P["Progress reporting (runtime/progress.py)"]
           T["Temporal graphs (representations/temporal.py)"]
       end

       subgraph "Rust Core (PyO3)"
           D["Imputation"]
           E["PCA computation"]
           F["Ball Mapper"]
           G["Laplacian accumulation"]
       end

       subgraph "Output"
           H["Cosmic Graph"]
           I["Representatives"]
       end

       A --> B
       B --> D
       D --> E
       E --> F
       F --> G
       G --> H
       H --> C
       C --> I

       style A fill:#f9f9f9,stroke:#999
       style D fill:#FCF3CF,stroke:#D4AC0D,stroke-width:2px
       style E fill:#FCF3CF,stroke:#D4AC0D,stroke-width:2px
       style F fill:#FCF3CF,stroke:#D4AC0D,stroke-width:2px
       style G fill:#FCF3CF,stroke:#D4AC0D,stroke-width:2px
       style H fill:#DFF0D8,stroke:#3C763D,stroke-width:2px

Pipeline Stages
---------------

**1. Data Loading & Imputation**

Load tabular data and fill missing values with configurable strategies (mean, median, or custom). Multiple imputation seeds generate diverse candidates.

**2. Scaling & PCA Sweep**

StandardScaler normalization followed by PCA projection. Pulsar sweeps across multiple dimension settings to explore different embedding spaces.

**3.
Ball Mapper Graph Construction**

For each PCA projection, build Ball Mapper graphs at multiple epsilon values. This captures local structure at different scales.

**4. Pseudo-Laplacian Accumulation**

Compute graph Laplacians for each Ball Mapper configuration and aggregate them into a summary representation.

**5. Cosmic Graph Assembly**

Combine pseudo-Laplacians into a weighted graph where edges represent similarity between configurations.

**6. Representative Selection**

Use graph-based measures (e.g., Forman-Ricci curvature) to identify the most central configurations.

Configuration Model
-------------------

Pulsar uses a hierarchical configuration:

.. code-block:: yaml

   run:
     name: my_experiment
     data: path/to/data.csv
   preprocessing:
     drop_columns: [id, timestamp]
     impute:
       age: {method: sample_normal, seed: 42}
       category: {method: sample_categorical, seed: 7}
   sweep:
     pca:
       dimensions: {values: [2, 5, 10, 20]}
       seed: {values: [42, 7, 13]}
     ball_mapper:
       epsilon: {range: {min: 0.1, max: 1.5, steps: 8}}
   cosmic_graph:
     threshold: "auto"

Key Outputs
-----------

========================= ========================================================
Output                    Description
========================= ========================================================
``cosmic_graph``          NetworkX graph with weighted edges
``weighted_adjacency``    Dense similarity matrix
``pca_results_``          List of PCA projections
``ball_mapper_graphs_``   List of Ball Mapper graphs
``stability_result``      Threshold selection diagnostics (if threshold is ``auto``)
========================= ========================================================

Performance
-----------

The Rust core provides significant speedups:

- **10-100x faster** Ball Mapper construction
- **Parallel** PCA computation across configurations
- **Memory efficient** Laplacian accumulation

For large datasets (>10k rows) or extensive sweeps (>100 configurations), Pulsar's Rust implementation is essential.
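
To judge whether a sweep approaches the >100-configuration mark, it can help to count the configurations a sweep definition implies. The sketch below is not Pulsar's own code. It assumes that the sweep is the full Cartesian product of the PCA dimensions, PCA seeds, and epsilon values from the example configuration, and that an epsilon ``range`` expands to linearly spaced values with both endpoints included. The helper ``linspace`` and all variable names are hypothetical.

```python
from itertools import product


def linspace(lo, hi, steps):
    """Return `steps` evenly spaced values from lo to hi, endpoints included."""
    if steps == 1:
        return [lo]
    step = (hi - lo) / (steps - 1)
    return [lo + i * step for i in range(steps)]


# Values copied from the `sweep` section of the example configuration.
pca_dimensions = [2, 5, 10, 20]
pca_seeds = [42, 7, 13]

# Assumption: {min: 0.1, max: 1.5, steps: 8} means 8 linearly spaced values.
epsilons = linspace(0.1, 1.5, 8)

# Assumption: every (dimension, seed, epsilon) combination is evaluated once.
configurations = list(product(pca_dimensions, pca_seeds, epsilons))

print(len(configurations))
```

Under these assumptions the example configuration yields 4 x 3 x 8 = 96 configurations, just below the >100 mark.
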

Next Steps
----------

- :ref:`Quickstart ` - Run your first pipeline
- :ref:`User Guide ` - Configuration details
- :ref:`Configuration ` - YAML schema reference