topos.utils.tree_sitter
Tree-Sitter Module
Infrastructure for AST parsing and normalization.
This module provides a language-agnostic interface to tree-sitter. Python is documented in the most detail; parsers for Rust, JavaScript, TypeScript/TSX, and C++ are also available, with a clear extension path for others.
- Mathematical Context:
In our categorical framework, parsing acts as a functor from the category of source texts to the category of syntax trees. It discards surface-level detail (whitespace, comments, formatting) while preserving computational structure. We treat it as the left adjoint to the ‘realization’ (pretty-printing) functor that maps trees back to text, rather than as a forgetful functor, which would go from more structure to less.
tree-sitter provides incremental parsing, which makes repeated analysis of evolving code efficient, for example when evaluating code as it is being generated or modified.
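The parse/realize pairing described above can be illustrated with a small sketch. It uses the standard-library ast module as a stand-in, which is an assumption made only so the example runs without tree-sitter installed. tree-sitter's own trees can retain more surface detail (comment nodes, for instance), so this mirrors the idea rather than this module's exact behavior.

```python
import ast

# Two spellings of the same program: different whitespace, one with a comment.
compact = "x=1+2"
spaced = "x   =   1 + 2   # surface detail"

# Parsing maps both texts to the same tree (positions are not dumped by default).
tree_a = ast.dump(ast.parse(compact))
tree_b = ast.dump(ast.parse(spaced))
same_tree = tree_a == tree_b

# "Realization" (pretty-printing) maps a tree back to one canonical text.
canonical = ast.unparse(ast.parse(spaced))

# Parsing the canonical text yields the same tree again.
round_trip_stable = ast.dump(ast.parse(canonical)) == tree_a

print(same_tree, repr(canonical), round_trip_stable)
```

Text to tree to text is not the identity (the comment and spacing are gone), while tree to text to tree is, which is the asymmetry the paragraph above relies on.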
- Usage:
from topos.utils.tree_sitter import parse_python, PythonParser
# Quick parsing
root = parse_python("def foo(): pass")
# Parser instance for repeated use
parser = PythonParser()
root = parser.parse("def bar(): return 42")
- class topos.utils.tree_sitter.LanguageParser(*args, **kwargs)
Bases: Protocol
Protocol for language-specific parsers.
- language
- parse(source)
Parse source code and return the root AST node.
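A minimal sketch of how a structural protocol of this shape can be used. LanguageParserSketch, UppercaseParser, and root_type are hypothetical names introduced here for illustration; the real LanguageParser may declare different types, and a real parser returns a tree-sitter node rather than a dict.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LanguageParserSketch(Protocol):
    """Hypothetical stand-in for LanguageParser: an identifier plus parse()."""

    language: str

    def parse(self, source: str) -> Any:
        """Parse source code and return the root AST node."""
        ...


class UppercaseParser:
    """Toy parser; it satisfies the protocol structurally, with no inheritance."""

    language = "toy"

    def parse(self, source: str) -> dict:
        # A dict stands in for the root node a real parser would return.
        return {"type": "module", "text": source.upper()}


def root_type(parser: LanguageParserSketch, source: str) -> str:
    """Works with any object that has `language` and `parse`."""
    return parser.parse(source)["type"]


conforms = isinstance(UppercaseParser(), LanguageParserSketch)
kind = root_type(UppercaseParser(), "def foo(): pass")
print(conforms, kind)
```

Because the check is structural, a new language parser only needs the `language` attribute and a `parse` method to be accepted wherever the protocol is expected.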
- class topos.utils.tree_sitter.PythonParser(language='python')
Bases: object
Parser for Python source code using tree-sitter.
This class wraps tree-sitter’s Python parser, providing a clean interface for AST generation.
- language
The language identifier (‘python’).
- Type:
str
Example:
parser = PythonParser()
root = parser.parse("print('hello')")
for child in root.children:
    print(child.type)
- language = 'python'
- parse(source)
Parse Python source code into an AST.
- Parameters:
source – Python source code as a string.
- Returns:
The root Node of the parsed AST.
- Raises:
ValueError – If parsing fails catastrophically.
- parse_bytes(source)
Parse Python source code from bytes.
- Parameters:
source – Python source code as bytes.
- Returns:
The root Node of the parsed AST.
- class topos.utils.tree_sitter.RustParser(language='rust')
Bases: object
Parser for Rust source code using tree-sitter.
- language = 'rust'
- parse(source)
- class topos.utils.tree_sitter.JavaScriptParser(language='javascript')
Bases: object
Parser for JavaScript source code using tree-sitter.
- language = 'javascript'
- parse(source)
- class topos.utils.tree_sitter.TypeScriptParser(language='typescript', _is_tsx=False)
Bases: object
Parser for TypeScript / TSX using tree-sitter-typescript / tree-sitter-tsx.
- language = 'typescript'
- parse(source)
- class topos.utils.tree_sitter.CppParser(language='cpp')
Bases: object
Parser for C++ source code using tree-sitter.
- language = 'cpp'
- parse(source)
- topos.utils.tree_sitter.get_python_parser()
Get the shared Python parser instance.
Returns a singleton parser instance for efficiency when parsing multiple files.
- Returns:
The shared PythonParser instance.
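One common way to provide a shared instance like this is a cached factory function. The sketch below is an assumption about the pattern, not a description of how this module implements it (a module-level global would work equally well); ExpensiveParser and get_shared_parser are hypothetical names.

```python
from functools import lru_cache


class ExpensiveParser:
    """Hypothetical stand-in for a parser whose construction is costly."""

    instances_created = 0

    def __init__(self) -> None:
        # Count constructions so the sharing is observable.
        ExpensiveParser.instances_created += 1


@lru_cache(maxsize=None)
def get_shared_parser() -> ExpensiveParser:
    """Build the parser on the first call; return the same object afterwards."""
    return ExpensiveParser()


first = get_shared_parser()
second = get_shared_parser()
print(first is second, ExpensiveParser.instances_created)
```

Construction is deferred until the first call, so importing the module stays cheap even when no parsing happens.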
- topos.utils.tree_sitter.get_rust_parser()
Get the shared Rust parser instance.
- topos.utils.tree_sitter.get_javascript_parser()
Get the shared JavaScript parser instance.
- topos.utils.tree_sitter.get_tsx_parser()
Shared parser for .tsx (JSX) sources.
- topos.utils.tree_sitter.get_typescript_parser()
Shared parser for .ts sources (non-TSX grammar).
- topos.utils.tree_sitter.get_cpp_parser()
Get the shared C++ parser instance.
- topos.utils.tree_sitter.parse_python(source)
Parse Python source code into an AST.
Convenience function that uses the shared parser instance.
- Parameters:
source – Python source code as a string.
- Returns:
The root Node of the parsed AST.
Example
root = parse_python("x = 1 + 2")
assert root.type == "module"
- topos.utils.tree_sitter.parse_rust(source)
Parse Rust source code into an AST.
- topos.utils.tree_sitter.parse_javascript(source)
Parse JavaScript source code into an AST.
- topos.utils.tree_sitter.parse_typescript(source, file=None)
Parse TypeScript or TSX; uses the TSX grammar when file ends with .tsx.
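The grammar selection described for parse_typescript can be sketched as a suffix check. choose_grammar is a hypothetical helper, and the returned labels are illustrative; whether the real function compares case-sensitively or accepts path objects is not stated in the documentation above.

```python
from typing import Optional


def choose_grammar(file: Optional[str] = None) -> str:
    """Return "tsx" when `file` ends with .tsx, otherwise "typescript"."""
    if file is not None and file.endswith(".tsx"):
        return "tsx"
    # No file name, or any other suffix: fall back to the plain grammar.
    return "typescript"


print(choose_grammar("App.tsx"))
print(choose_grammar("util.ts"))
print(choose_grammar())
```

Keeping the two grammars separate matters because some constructs parse differently under each: angle-bracket type assertions such as `<T>value` are valid in .ts files but clash with JSX element syntax in .tsx files.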
- topos.utils.tree_sitter.parse_cpp(source)
Parse C++ source code into an AST.
- topos.utils.tree_sitter.node_text(node, source)
Extract the source text corresponding to an AST node.
- Parameters:
node – The AST node.
source – The original source code.
- Returns:
The text slice corresponding to the node.
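A sketch of the slicing that a helper like node_text typically performs. tree-sitter nodes report their extent as byte offsets (start_byte, end_byte), so slicing a str directly goes wrong once the source contains non-ASCII characters. Span and slice_text are hypothetical stand-ins; whether the real node_text accepts str, bytes, or both is an assumption left open here.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a node; only the two byte offsets are modeled."""

    start_byte: int
    end_byte: int


def slice_text(node: Span, source: str) -> str:
    """Slice by byte offsets: encode first, then decode the slice."""
    data = source.encode("utf-8")
    return data[node.start_byte:node.end_byte].decode("utf-8")


source = 'name = "café"; n = 1'
second_statement = Span(start_byte=16, end_byte=21)

by_bytes = slice_text(second_statement, source)
by_chars = source[16:21]  # wrong: 'é' is two bytes but one character
print(repr(by_bytes), repr(by_chars))
```

The character-based slice is shifted by one position because of the two-byte 'é', which is why the encode/decode step is needed.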
- topos.utils.tree_sitter.node_to_sexp(node)
Convert a node to S-expression format.
S-expressions are a compact textual representation of tree structure, useful for debugging and comparison.
- Parameters:
node – The AST node to convert.
- Returns:
The S-expression string representation.
Example
>>> node_to_sexp(parse_python("x = 1"))
"(module (expression_statement (assignment ...)))"
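To show how a tree becomes an S-expression, here is a simplified sketch over a toy node type. ToyNode and to_sexp are hypothetical; the string produced by tree-sitter itself may differ in detail (for example, it can include field names), so this illustrates the format rather than reproducing node_to_sexp.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for an AST node: a type name and child nodes."""

    type: str
    children: List["ToyNode"] = field(default_factory=list)


def to_sexp(node: ToyNode) -> str:
    """Render a node as (type child child ...), recursing into children."""
    if not node.children:
        return f"({node.type})"
    inner = " ".join(to_sexp(child) for child in node.children)
    return f"({node.type} {inner})"


assignment = ToyNode("assignment", [ToyNode("identifier"), ToyNode("integer")])
tree = ToyNode("module", [ToyNode("expression_statement", [assignment])])

sexp = to_sexp(tree)
print(sexp)
```

Two trees with the same shape render to the same string, which is what makes this representation convenient for comparison in tests.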
- topos.utils.tree_sitter.find_errors(node)
Find all error nodes in an AST.
Error nodes indicate syntax errors in the source code.
- Parameters:
node – The root node to search from.
- Returns:
A list of all ERROR nodes in the tree.
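The search can be sketched as a tree walk that keeps every node whose type is ERROR. ToyNode and collect_errors are hypothetical stand-ins, and the traversal order of the real find_errors is an assumption. The sketch uses an explicit stack rather than recursion so that deeply nested trees do not hit Python's recursion limit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for an AST node: a type name and child nodes."""

    type: str
    children: List["ToyNode"] = field(default_factory=list)


def collect_errors(root: ToyNode) -> List[ToyNode]:
    """Return every node of type ERROR, in depth-first, left-to-right order."""
    found: List[ToyNode] = []
    stack = [root]
    while stack:
        current = stack.pop()
        if current.type == "ERROR":
            found.append(current)
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(current.children))
    return found


tree = ToyNode("module", [
    ToyNode("expression_statement", [ToyNode("identifier")]),
    ToyNode("ERROR", [ToyNode("identifier")]),
    ToyNode("function_definition", [ToyNode("ERROR")]),
])

errors = collect_errors(tree)
print(len(errors))
```

An empty result means the walk found no ERROR nodes. With real tree-sitter trees, nodes the parser inserted to recover from missing tokens are flagged separately, so callers may want to check for those as well.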