.. _quickstart:

==========
Quickstart
==========

Get from zero to insights in under 10 minutes.

.. code-block:: bash

   uv pip install pulsar

Prerequisites
-------------

- Python 3.10+
- For development: Rust toolchain

Option 1: Use a Pre-Built Demo (Fastest)
-----------------------------------------

The fastest way to see Pulsar in action:

.. code-block:: bash

   # Run the penguins demo (no data download needed)
   cd /path/to/pulsar
   uv sync
   uv run maturin develop --release
   python -c "
   from pulsar.pipeline import ThemaRS
   config = {'run': {'name': 'penguins', 'data': 'demos/penguins/penguins.csv'}}
   model = ThemaRS.from_dict(config)
   model.fit()
   print(f'Cosmic graph: {len(model.cosmic_graph.nodes())} nodes, {len(model.cosmic_graph.edges())} edges')
   "

Done! You've discovered penguin species structure without looking at species labels.

For all demos: :ref:`demos`

Option 2: Use with Claude AI (No Code)
---------------------------------------

Let Claude handle the analysis:

1. Set up Pulsar MCP server (see :ref:`mcp`)
2. Open Claude Desktop
3. Paste: *"Analyze the file at ``demos/penguins/penguins.csv`` using Pulsar. Find the hidden structure."*

Claude will orchestrate parameter tuning and generate a statistical dossier.

Option 3: YAML-Driven Workflow (Recommended for Reproducibility)
-----------------------------------------------------------------

Use YAML configuration for transparent, reproducible pipelines.

**Step 1: Create a configuration file**

Create ``params.yaml``:

.. code-block:: yaml

   data:
     path: "data.csv"

   preprocessing:
     drop_columns: [id]
     impute:
       age: {method: fill_mean}
       salary: {method: fill_median}
       category: {method: sample_categorical, seed: 42}
     encode:
       category: {method: one_hot}

   sweep:
     pca:
       dimensions: {values: [2, 5, 10]}
       seed: {values: [42, 7, 13]}
     ball_mapper:
       epsilon: {range: {min: 0.1, max: 0.5, steps: 5}}

   cosmic_graph:
     threshold: "auto"

**Step 2: Run the pipeline**

.. code-block:: python

   from pulsar import ThemaRS

   model = ThemaRS("params.yaml")
   model.fit()

   # Access the final graph
   graph = model.cosmic_graph
   print(f"Nodes: {graph.number_of_nodes()}")
   print(f"Edges: {graph.number_of_edges()}")

**Step 3: Select representatives**

.. code-block:: python

   # Get the top 3 representative configurations
   reps = model.select_representatives(k=3)
   for i, rep in enumerate(reps):
       print(f"Representative {i+1}: {rep}")

Option 4: Programmatic Configuration (Full Control)
-----------------------------------------------------

For maximum control, configure directly in Python:

.. code-block:: python

   from pulsar import ThemaRS

   model = ThemaRS(
       data="data.csv",
       pca_dims=[2, 5, 10],
       epsilon_range=(0.1, 0.5, 5),
       random_state=42,
   )
   model.fit()

Understanding the Pipeline
--------------------------

Pulsar executes these stages:

1. **Impute**: Fill missing values in specified columns
2. **Scale**: StandardScaler normalization
3. **PCA sweep**: Project data to multiple dimensions
4. **Ball Mapper sweep**: Build neighborhood graphs at multiple epsilon values
5. **Pseudo-Laplacians**: Compute graph Laplacians for each configuration
6. **Cosmic graph**: Aggregate into a weighted similarity graph
7. **Selection**: Choose representative configurations via graph distances

.. code-block:: python

   # Access intermediate results
   print(f"PCA configurations: {len(model.pca_results_)}")
   print(f"Ball Mapper graphs: {len(model.ball_mapper_graphs_)}")
   print(f"Weighted adjacency shape: {model.weighted_adjacency_.shape}")

Performance Tips
----------------

Pulsar's Rust core provides significant speedups. For large datasets:

.. code-block:: python

   # Reduce sweep resolution for faster iteration
   model = ThemaRS(
       data="large_data.csv",
       pca_dims=[5],                  # Single dimension
       epsilon_range=(0.2, 0.4, 3),   # Fewer epsilon steps
   )

Next Steps
----------

- :doc:`programmatic` - Full API control
- :doc:`intermediate` - Tuning sweep parameters
- :ref:`Configuration ` - YAML schema reference
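
To get a feel for how sweep resolution drives run time, it can help to count the configurations a ``sweep`` block implies before calling ``fit()``. The sketch below expands the ``sweep`` section of the ``params.yaml`` example in plain Python. It rests on two assumptions that this page does not confirm: that a ``range`` entry means evenly spaced values including both endpoints, and that the sweep is the full Cartesian product of PCA dimensions, seeds, and epsilon values. ``expand_axis`` and ``sweep_grid`` are hypothetical helpers written for illustration, not part of Pulsar.

```python
from itertools import product


def expand_axis(spec):
    """Expand one sweep axis into a list of values (hypothetical helper).

    Assumes `range` means evenly spaced values with both endpoints included.
    """
    if "values" in spec:
        return list(spec["values"])
    rng = spec["range"]
    lo, hi, steps = rng["min"], rng["max"], rng["steps"]
    if steps == 1:
        return [lo]
    width = (hi - lo) / (steps - 1)
    # Round to suppress floating-point noise such as 0.30000000000000004
    return [round(lo + i * width, 10) for i in range(steps)]


def sweep_grid(sweep):
    """Enumerate (pca_dim, seed, epsilon) triples, assuming a full Cartesian product."""
    dims = expand_axis(sweep["pca"]["dimensions"])
    seeds = expand_axis(sweep["pca"]["seed"])
    epsilons = expand_axis(sweep["ball_mapper"]["epsilon"])
    return list(product(dims, seeds, epsilons))


# The `sweep` section of the params.yaml example, written as a Python dict
sweep = {
    "pca": {
        "dimensions": {"values": [2, 5, 10]},
        "seed": {"values": [42, 7, 13]},
    },
    "ball_mapper": {
        "epsilon": {"range": {"min": 0.1, "max": 0.5, "steps": 5}},
    },
}

grid = sweep_grid(sweep)
print(len(grid))  # 45 = 3 dimensions x 3 seeds x 5 epsilon values
```

Under these assumptions, dropping to a single PCA dimension and three epsilon steps, as in the Performance Tips example, shrinks the grid from 45 entries to 9 with the same three seeds.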