topos.utils.tree_sitter

Tree-Sitter Module

Infrastructure for AST parsing and normalization.

This module provides a language-agnostic interface to tree-sitter. It currently ships parsers for Python, Rust, JavaScript, TypeScript/TSX, and C++, and the LanguageParser protocol describes the interface a parser for another language is expected to provide.

Mathematical Context:

In our categorical framework, parsing acts as a functor from the category of source texts to the category of syntax trees. It discards surface-level detail (whitespace, comments, formatting) while preserving computational structure. We treat it as the left adjoint to the 'realization' (pretty-printing) functor that maps trees back to text. We do not treat it as a forgetful functor, which would map objects with more structure to objects with less.

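tree-sitter supports incremental parsing: after an edit it can reuse the unchanged parts of the previous tree instead of reparsing from scratch. This makes repeated analysis of evolving code efficient, for example when evaluating code as it is being generated or modified.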

Usage:

from topos.utils.tree_sitter import parse_python, PythonParser

# Quick parsing
root = parse_python("def foo(): pass")

# Parser instance for repeated use
parser = PythonParser()
root = parser.parse("def bar(): return 42")

class topos.utils.tree_sitter.LanguageParser(*args, **kwargs)[source]

Bases: Protocol

Protocol for language-specific parsers.

language
parse(source)[source]

Parse source code and return the root AST node.
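
Because LanguageParser is a Protocol, a class can satisfy it structurally, by exposing a language attribute and a parse method, without inheriting from it. The following is a minimal sketch of that idea, not the module's actual definition: ParserLike, EchoParser, and describe are hypothetical names introduced for illustration, and the runtime_checkable decorator is an assumption.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ParserLike(Protocol):
    """Hypothetical stand-in mirroring the documented LanguageParser shape."""

    language: str

    def parse(self, source: str) -> Any:
        """Parse source code and return the root AST node."""
        ...


class EchoParser:
    """Toy parser for illustration; it does not inherit from ParserLike."""

    language = "echo"

    def parse(self, source: str) -> dict:
        # A real parser would return a tree-sitter Node; this returns a dict.
        return {"type": "module", "text": source}


def describe(parser: ParserLike, source: str) -> str:
    """Accept any object that structurally matches the protocol."""
    root = parser.parse(source)
    return f"{parser.language}:{root['type']}"
```

Code written against the protocol, like describe here, can then accept a parser for any language without knowing its concrete class.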

class topos.utils.tree_sitter.PythonParser(language='python')[source]

Bases: object

Parser for Python source code using tree-sitter.

This class wraps tree-sitter’s Python parser, providing a clean interface for AST generation.

language

The language identifier ('python').

Type:

str

Example:

parser = PythonParser()
root = parser.parse("print('hello')")
for child in root.children:
    print(child.type)
language = 'python'
parse(source)[source]

Parse Python source code into an AST.

Parameters:

source – Python source code as a string.

Returns:

The root Node of the parsed AST.

Raises:

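ValueError – If parsing fails catastrophically. Ordinary syntax errors do not raise, because tree-sitter recovers from them and records ERROR nodes in the tree; use find_errors to locate them.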

parse_bytes(source)[source]

Parse Python source code from bytes.

Parameters:

source – Python source code as bytes.

Returns:

The root Node of the parsed AST.

class topos.utils.tree_sitter.RustParser(language='rust')[source]

Bases: object

Parser for Rust source code using tree-sitter.

language = 'rust'
parse(source)[source]
class topos.utils.tree_sitter.JavaScriptParser(language='javascript')[source]

Bases: object

Parser for JavaScript source code using tree-sitter.

language = 'javascript'
parse(source)[source]
class topos.utils.tree_sitter.TypeScriptParser(language='typescript', _is_tsx=False)[source]

Bases: object

Parser for TypeScript / TSX using tree-sitter-typescript / tree-sitter-tsx.

language = 'typescript'
parse(source)[source]
class topos.utils.tree_sitter.CppParser(language='cpp')[source]

Bases: object

Parser for C++ source code using tree-sitter.

language = 'cpp'
parse(source)[source]
topos.utils.tree_sitter.get_python_parser()[source]

Get the shared Python parser instance.

Returns a singleton parser instance for efficiency when parsing multiple files.

Returns:

The shared PythonParser instance.
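
The shared-instance pattern described above can be sketched as follows. This is one common way to get a lazily created singleton and is not necessarily how get_python_parser is implemented; StubParser and get_stub_parser are hypothetical names.

```python
from functools import lru_cache


class StubParser:
    """Hypothetical stand-in for a parser that is costly to construct."""

    instances_created = 0

    def __init__(self) -> None:
        # Count constructions so the sharing is observable.
        StubParser.instances_created += 1


@lru_cache(maxsize=None)
def get_stub_parser() -> StubParser:
    """Build the parser on the first call and return the same object afterwards."""
    return StubParser()


first = get_stub_parser()
second = get_stub_parser()
```

Both calls return the same object, so the construction cost is paid once no matter how many files are parsed.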

topos.utils.tree_sitter.get_rust_parser()[source]

Get the shared Rust parser instance.

topos.utils.tree_sitter.get_javascript_parser()[source]

Get the shared JavaScript parser instance.

topos.utils.tree_sitter.get_tsx_parser()[source]

Get the shared parser instance for .tsx (TypeScript with JSX) sources.

topos.utils.tree_sitter.get_typescript_parser()[source]

Get the shared parser instance for .ts sources (non-TSX grammar).

topos.utils.tree_sitter.get_cpp_parser()[source]

Get the shared C++ parser instance.

topos.utils.tree_sitter.parse_python(source)[source]

Parse Python source code into an AST.

Convenience function that uses the shared parser instance.

Parameters:

source – Python source code as a string.

Returns:

The root Node of the parsed AST.

Example

root = parse_python("x = 1 + 2")
assert root.type == "module"

topos.utils.tree_sitter.parse_rust(source)[source]

Parse Rust source code into an AST.

topos.utils.tree_sitter.parse_javascript(source)[source]

Parse JavaScript source code into an AST.

topos.utils.tree_sitter.parse_typescript(source, file=None)[source]

Parse TypeScript or TSX; uses the TSX grammar when file ends with .tsx.
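
The grammar choice described above amounts to a suffix check on the optional file argument. A minimal sketch, assuming a case-sensitive comparison and that a missing file falls back to the plain TypeScript grammar; pick_grammar is a hypothetical helper, not part of the module.

```python
from typing import Optional


def pick_grammar(file: Optional[str] = None) -> str:
    """Return 'tsx' for paths ending in .tsx, otherwise 'typescript'."""
    # Assumption: the match is case-sensitive and None means "not TSX".
    if file is not None and file.endswith(".tsx"):
        return "tsx"
    return "typescript"
```

Under these assumptions, pick_grammar("App.tsx") selects the TSX grammar, while pick_grammar("mod.ts") and pick_grammar() both select the TypeScript grammar.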

topos.utils.tree_sitter.parse_cpp(source)[source]

Parse C++ source code into an AST.

topos.utils.tree_sitter.node_text(node, source)[source]

Extract the source text corresponding to an AST node.

Parameters:
  • node – The AST node.

  • source – The original source code.

Returns:

The text slice corresponding to the node.
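
tree-sitter nodes report their position as byte offsets (start_byte and end_byte), so text extraction has to slice the encoded source rather than the Python string whenever non-ASCII characters are present. The sketch below illustrates that point under the assumption of UTF-8 input. It is not the module's implementation of node_text; Span and slice_text are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in exposing the byte offsets a tree-sitter Node has."""

    start_byte: int
    end_byte: int


def slice_text(node: Span, source: str) -> str:
    """Slice by byte offset (assuming UTF-8), not by character index."""
    data = source.encode("utf-8")
    return data[node.start_byte:node.end_byte].decode("utf-8")


source = 'name = "café"; y = 2'
# "é" occupies two bytes in UTF-8, so "y = 2" spans bytes 16..21.
span = Span(start_byte=16, end_byte=21)
by_bytes = slice_text(span, source)   # "y = 2"
by_chars = source[16:21]              # " = 2", shifted by one character
```

The mismatch between by_bytes and by_chars shows why indexing the str directly with byte offsets gives wrong results after the first multi-byte character.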

topos.utils.tree_sitter.node_to_sexp(node)[source]

Convert a node to S-expression format.

S-expressions are a compact textual representation of tree structure, useful for debugging and comparison.

Parameters:

node – The AST node to convert.

Returns:

The S-expression string representation.
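
To make the format concrete, here is a simplified sketch that renders a tree as nested parentheses. It works on a hypothetical stand-in node type (SexpNode) rather than a tree-sitter Node, and to_sexp is likewise a hypothetical helper. tree-sitter's own rendering is richer: it typically labels children with field names and omits anonymous tokens.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SexpNode:
    """Hypothetical stand-in with the type/children shape of a syntax node."""

    type: str
    children: List["SexpNode"] = field(default_factory=list)


def to_sexp(node: SexpNode) -> str:
    """Render a node as (type child child ...), recursing into children."""
    if not node.children:
        return f"({node.type})"
    inner = " ".join(to_sexp(child) for child in node.children)
    return f"({node.type} {inner})"


tree = SexpNode("module", [
    SexpNode("expression_statement", [
        SexpNode("assignment", [SexpNode("identifier"), SexpNode("integer")]),
    ]),
])
rendered = to_sexp(tree)
```

Two trees with the same shape render to the same string, which is what makes the format convenient for comparison.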

Example

>>> node_to_sexp(parse_python("x = 1"))
"(module (expression_statement (assignment ...)))"
topos.utils.tree_sitter.find_errors(node)[source]

Find all error nodes in an AST.

Error nodes indicate syntax errors in the source code.

Parameters:

node – The root node to search from.

Returns:

A list of all ERROR nodes in the tree.
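
The search described above can be sketched as a depth-first traversal that collects every node whose type is "ERROR". The node class here is a hypothetical stand-in with the same type and children attributes a tree-sitter Node exposes; FakeNode and collect_errors are illustrative names, and the traversal order of the real find_errors is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeNode:
    """Hypothetical stand-in for a tree-sitter Node."""

    type: str
    children: List["FakeNode"] = field(default_factory=list)


def collect_errors(node: FakeNode) -> List[FakeNode]:
    """Return all ERROR nodes in the subtree, in depth-first pre-order."""
    found = [node] if node.type == "ERROR" else []
    for child in node.children:
        found.extend(collect_errors(child))
    return found


tree = FakeNode("module", [
    FakeNode("ERROR", [FakeNode("identifier")]),
    FakeNode("expression_statement", [FakeNode("ERROR")]),
])
errors = collect_errors(tree)
```

An empty result means the tree contains no ERROR nodes. tree-sitter also flags tokens it inserted during recovery through a node's is_missing attribute; whether find_errors reports those is not stated here.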