Language Handler Extension Guide
This guide walks you through creating a new language handler for the FAOS Code Intelligence platform. By the end, you will have a working handler that passes all contract tests and integrates with the existing parsing pipeline.
Time estimate: 1-2 weeks for a production-quality handler (NFR-SC4).
Architecture Overview
The code intelligence system uses a plugin architecture (SDP-3: Open/Closed Principle): adding a new language requires no changes to the pipeline logic itself. You add a handler module, register its grammar and factory (Steps 2 and 3), and the pipeline picks it up.
tree_sitter_parser.py      protocol.py                   handler_registry.py
(grammar loading)          (LanguageHandler protocol)    (LANGUAGE_HANDLERS dict)
        |                           |                             |
        v                           v                             v
parse source ──> handler.parse(source) ──> list[CodeEntity] ──> pipeline
Key modules (all under services/api/src/faos_api/knowledge_fabric/code/parsers/):
| Module | Purpose |
|---|---|
| protocol.py | LanguageHandler protocol, CodeEntity, EntityType |
| handler_registry.py | LANGUAGE_HANDLERS dict + get_handler() |
| tree_sitter_parser.py | Grammar loading, detect_language(), TreeSitterParser |
| python_handler.py | Reference implementation (Python) |
| languages/ | Sub-package for newer handlers (C#, Go, and yours) |
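The plugin idea above can be illustrated with a minimal, self-contained sketch. It does not use the real FAOS classes: `Entity`, `Handler`, `LineHandler`, `HANDLERS`, and `run_pipeline` are hypothetical stand-ins chosen for illustration, and the real `LanguageHandler` protocol has more methods than the single `parse()` shown here.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, runtime_checkable


@dataclass
class Entity:
    """Hypothetical stand-in for CodeEntity (the real class has more fields)."""

    name: str
    file_path: str


@runtime_checkable
class Handler(Protocol):
    """Hypothetical stand-in for the LanguageHandler protocol."""

    def parse(
        self, source: str, *, file_path: str = "", module_path: str = ""
    ) -> list[Entity]: ...


class LineHandler:
    """Toy handler: treats every non-blank line as one entity."""

    def parse(
        self, source: str, *, file_path: str = "", module_path: str = ""
    ) -> list[Entity]:
        return [
            Entity(line.strip(), file_path)
            for line in source.splitlines()
            if line.strip()
        ]


# Registry: language name -> zero-argument factory.
HANDLERS: dict[str, Callable[[], Handler]] = {"lines": LineHandler}


def run_pipeline(language: str, source: str, file_path: str) -> list[Entity]:
    """Core code looks handlers up by name and never names a concrete language."""
    factory = HANDLERS.get(language)
    if factory is None:
        return []  # assumption: unknown languages are skipped
    return factory().parse(source, file_path=file_path)
```

Because `run_pipeline` only consults the registry, supporting another language in this sketch means adding one entry to `HANDLERS`; the function body stays untouched.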
Prerequisites
- Python 3.11+
- tree-sitter (Python bindings)
- tree-sitter-{language} grammar package for your target language
- Familiarity with Tree-sitter AST node types for the language
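A quick way to confirm the prerequisites before writing any handler code is sketched below. It relies only on the standard library. The helper name `missing_prerequisites` is hypothetical, and the assumption that the grammar's import name is `tree_sitter_{language}` follows the `tree_sitter_lua` import used later in this guide; other grammar packages may be named differently.

```python
import importlib.util
import sys


def missing_prerequisites(
    language: str, min_python: tuple[int, int] = (3, 11)
) -> list[str]:
    """Return a list of unmet prerequisites (empty means everything was found)."""
    problems: list[str] = []

    if sys.version_info[:2] < min_python:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )

    # Assumption: the grammar package is importable as tree_sitter_<language>.
    for module in ("tree_sitter", f"tree_sitter_{language}"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing module: {module}")

    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites("lua"):
        print(problem)
```

Running the script prints nothing when the interpreter and both packages are available.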
Step 1: Create the Handler Module
Create a new file in parsers/languages/:
parsers/languages/lua_handler.py
Minimal Handler Skeleton
"""
Lua language handler for code intelligence.

Implements the LanguageHandler protocol for Lua source files.
"""

from __future__ import annotations

import logging
from collections.abc import Sequence

from tree_sitter import Node

from faos_api.knowledge_fabric.code.parsers.protocol import (
    ClassInfo,
    CodeEntity,
    EnumInfo,
    EntityType,
    ExportInfo,
    FunctionInfo,
    ImportInfo,
    InheritanceInfo,
    InterfaceInfo,
    MethodInfo,
    TypeAliasInfo,
)

logger = logging.getLogger(__name__)


class LuaHandler:
    """
    Lua-specific language handler implementing the LanguageHandler protocol.
    """

    # ---- Unified Parse (required) ----

    def parse(
        self,
        source: str,
        *,
        file_path: str = "",
        module_path: str = "",
    ) -> list[CodeEntity]:
        """
        Parse Lua source and return all entities as CodeEntity list.

        Args:
            source: Lua source code string.
            file_path: Path to the source file.
            module_path: Dot-separated module path for qualified names.

        Returns:
            List of CodeEntity. Empty list on parse failure.
        """
        try:
            return self._parse_impl(source, file_path=file_path, module_path=module_path)
        except Exception:
            logger.exception("Failed to parse %s", file_path)
            return []

    def _parse_impl(
        self,
        source: str,
        *,
        file_path: str,
        module_path: str,
    ) -> list[CodeEntity]:
        """Core parse logic."""
        from faos_api.knowledge_fabric.code.parsers.tree_sitter_parser import (
            TreeSitterParser,
        )

        src_bytes = source.encode("utf-8")
        parser = TreeSitterParser()
        tree = parser.parse_bytes(src_bytes, "lua")
        root = tree.root_node

        entities: list[CodeEntity] = []
        # Extract functions
        entities.extend(self._extract_functions_as_entities(root, src_bytes, file_path, module_path))
        return entities

    def _extract_functions_as_entities(
        self,
        root: Node,
        source: bytes,
        file_path: str,
        module_path: str,
    ) -> list[CodeEntity]:
        """Extract Lua function declarations as CodeEntity."""
        entities: list[CodeEntity] = []
        for node in root.children:
            if node.type != "function_declaration":
                continue
            name_node = node.child_by_field_name("name")
            name = (
                source[name_node.start_byte : name_node.end_byte].decode()
                if name_node
                else "<anonymous>"
            )
            q_name = f"{module_path}.{name}" if module_path else name
            entities.append(
                CodeEntity(
                    name=name,
                    qualified_name=q_name,
                    file_path=file_path,
                    line_start=node.start_point[0] + 1,
                    line_end=node.end_point[0] + 1,
                    signature=f"function {name}(...)",
                    entity_type=EntityType.FUNCTION,
                )
            )
        return entities

    # ---- Legacy extraction methods (required by protocol) ----

    def extract_classes(self, root: Node, source: bytes) -> Sequence[ClassInfo]:
        return []  # Lua has no classes

    def extract_functions(self, root: Node, source: bytes) -> Sequence[FunctionInfo]:
        functions: list[FunctionInfo] = []
        for node in root.children:
            if node.type != "function_declaration":
                continue
            name_node = node.child_by_field_name("name")
            name = (
                source[name_node.start_byte : name_node.end_byte].decode()
                if name_node
                else "<anonymous>"
            )
            functions.append(
                FunctionInfo(
                    name=name,
                    qualified_name="",
                    start_line=node.start_point[0] + 1,
                    end_line=node.end_point[0] + 1,
                )
            )
        return functions

    def extract_methods(self, cls_node: Node, source: bytes) -> Sequence[MethodInfo]:
        return []

    def extract_imports(self, root: Node, source: bytes) -> Sequence[ImportInfo]:
        return []  # Lua uses require(), handle if needed

    def extract_exports(self, root: Node, source: bytes) -> Sequence[ExportInfo]:
        return []

    def extract_inheritance(self, cls_node: Node, source: bytes) -> Sequence[InheritanceInfo]:
        return []

    def extract_enums(self, root: Node, source: bytes) -> Sequence[EnumInfo]:
        return []

    def extract_interfaces(self, root: Node, source: bytes) -> Sequence[InterfaceInfo]:
        return []

    def get_qualified_name(self, node: Node, module_path: str) -> str:
        name_node = node.child_by_field_name("name")
        name = name_node.text.decode() if name_node and name_node.text else ""
        return f"{module_path}.{name}" if module_path else name

    # ---- Language Metadata (required) ----

    def package_indicator_files(self) -> list[str]:
        return []  # Lua has no package indicator files

    def resolve_module_system(self, import_str: str, file_path: str) -> str:
        return "lua"

    # ---- Optional Extension Points ----

    def custom_tree_sitter_queries(self) -> dict[str, str]:
        return {}
Key Patterns
- parse() wraps _parse_impl() with try/except: errors must never crash the pipeline (AC6 from 255-2).
- Deferred import of TreeSitterParser inside _parse_impl() to avoid circular imports.
- Legacy methods return empty sequences for entity types the language does not have.
- All entities are CodeEntity instances with proper entity_type.
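To see why the try/except wrapper matters, the sketch below runs a toy handler over a small batch of files. `FlakyHandler` and `index_files` are hypothetical names used only for this illustration; the real pipeline loop may look different, and entities are plain strings here rather than `CodeEntity` objects.

```python
import logging

logger = logging.getLogger(__name__)


class FlakyHandler:
    """Toy handler whose core logic fails on sources containing a NUL byte."""

    def parse(
        self, source: str, *, file_path: str = "", module_path: str = ""
    ) -> list[str]:
        # Same shape as the skeleton: delegate, log on failure, return [].
        try:
            return self._parse_impl(source)
        except Exception:
            logger.exception("Failed to parse %s", file_path)
            return []

    def _parse_impl(self, source: str) -> list[str]:
        if "\x00" in source:
            raise ValueError("simulated parser failure")
        return source.split()


def index_files(handler: FlakyHandler, files: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical pipeline loop: one bad file must not abort the batch."""
    return {path: handler.parse(src, file_path=path) for path, src in files.items()}
```

In this sketch the failing file is logged and contributes an empty list, while the remaining files are still indexed.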
Step 2: Register the Grammar in tree_sitter_parser.py
Three changes are needed.
2a. Extension Mapping
Add entries to EXTENSION_MAP (and COMPOUND_EXTENSIONS if needed):
# In tree_sitter_parser.py
EXTENSION_MAP: dict[str, str] = {
    # ... existing entries ...
    ".lua": "lua",
}
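For orientation, here is a rough sketch of how an extension lookup with both maps could work. The map contents and the lookup order (compound extensions first, then the plain suffix) are assumptions made for illustration; the real `detect_language()` in tree_sitter_parser.py may behave differently.

```python
from pathlib import PurePosixPath
from typing import Optional

# Illustrative contents only; the real maps hold the project's own entries.
EXTENSION_MAP: dict[str, str] = {
    ".py": "python",
    ".ts": "typescript",
    ".lua": "lua",
}
COMPOUND_EXTENSIONS: dict[str, str] = {
    ".d.ts": "typescript",  # hypothetical compound entry
}


def detect_language(file_path: str) -> Optional[str]:
    """Map a file path to a language name, or None if the extension is unknown."""
    name = PurePosixPath(file_path).name.lower()

    # Compound extensions are checked first because Path.suffix only
    # returns the last component (".ts" for "index.d.ts").
    for extension, language in COMPOUND_EXTENSIONS.items():
        if name.endswith(extension):
            return language

    return EXTENSION_MAP.get(PurePosixPath(name).suffix)
```

A language such as Lua with a single plain extension only needs the `EXTENSION_MAP` entry.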
2b. Grammar Loading
Add a branch to _get_language():
elif language == "lua":
    import tree_sitter_lua

    return tree_sitter.Language(tree_sitter_lua.language())
2c. Install the Grammar Package
pip install tree-sitter-lua
Also add the package to the dependencies in your pyproject.toml or requirements.txt.
Step 3: Register the Handler in handler_registry.py
3a. Factory Function
def _lua_handler_factory() -> LanguageHandler:
    """Lazily create a LuaHandler to avoid circular imports."""
    from faos_api.knowledge_fabric.code.parsers.languages.lua_handler import (
        LuaHandler,
    )

    return LuaHandler()
3b. Add to LANGUAGE_HANDLERS Dict
LANGUAGE_HANDLERS: dict[str, Callable[[], LanguageHandler]] = {
    # ... existing entries ...
    "lua": _lua_handler_factory,
}
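The lazy-factory pattern can be demonstrated in isolation. The sketch below generalizes the hand-written factory into a hypothetical `make_lazy_factory` helper and uses standard-library classes as stand-ins for handlers. Returning `None` for an unknown language is an assumption about `get_handler()`, not confirmed behavior.

```python
import importlib
from typing import Any, Callable, Optional


def make_lazy_factory(module_name: str, class_name: str) -> Callable[[], Any]:
    """Build a factory that imports its module only when first called."""

    def factory() -> Any:
        module = importlib.import_module(module_name)  # deferred import
        return getattr(module, class_name)()

    return factory


# Stand-ins: a stdlib class plays the handler, and one entry points at a
# module that does not exist to show that building the dict is harmless.
FACTORIES: dict[str, Callable[[], Any]] = {
    "json": make_lazy_factory("json", "JSONDecoder"),
    "broken": make_lazy_factory("no_such_module_xyz", "Missing"),
}


def get_handler(language: str) -> Optional[Any]:
    """Instantiate the handler for a language, or return None if unregistered."""
    factory = FACTORIES.get(language)
    return factory() if factory is not None else None
```

Defining `FACTORIES` succeeds even though one module is missing; the `ImportError` surfaces only when that specific factory is invoked, which is what keeps import cycles and optional dependencies out of module load time.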
3c. Add to create_default_registry()
# Lua handler (optional)
try:
    from faos_api.knowledge_fabric.code.parsers.languages.lua_handler import (
        LuaHandler,
    )

    registry.register("lua", LuaHandler())
except ImportError:
    logger.debug("Lua handler not yet available")
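The optional-registration pattern is sketched below in a self-contained form. `HandlerRegistry` and `register_optional` are hypothetical; only `register(language, handler)` is taken from the snippet above, and the real registry class may expose a different interface.

```python
import importlib
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


class HandlerRegistry:
    """Hypothetical minimal registry keyed by language name."""

    def __init__(self) -> None:
        self._handlers: dict[str, Any] = {}

    def register(self, language: str, handler: Any) -> None:
        self._handlers[language] = handler

    def get(self, language: str) -> Optional[Any]:
        return self._handlers.get(language)

    def languages(self) -> list[str]:
        return sorted(self._handlers)


def register_optional(
    registry: HandlerRegistry, language: str, module_name: str, class_name: str
) -> bool:
    """Register a handler if its module imports cleanly; otherwise skip it."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        logger.debug("%s handler not yet available", language)
        return False
    registry.register(language, getattr(module, class_name)())
    return True
```

A missing handler module leaves the registry usable for every other language, which is the point of guarding each registration separately.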
Step 4: Write Contract Tests
Create tests/unit/python/knowledge_fabric/code_intel/test_lua_handler.py:
"""
Tests for Lua Language Handler.

Follows the contract testing pattern from existing handlers.
"""

import pytest

from faos_api.knowledge_fabric.code.parsers.handler_registry import (
    LANGUAGE_HANDLERS,
    get_handler,
)
from faos_api.knowledge_fabric.code.parsers.protocol import (
    CodeEntity,
    EntityType,
    LanguageHandler,
)
from faos_api.knowledge_fabric.code.parsers.languages.lua_handler import LuaHandler


@pytest.fixture
def handler() -> LuaHandler:
    return LuaHandler()


# ---- Protocol conformance (required for every handler) ----


class TestProtocolConformance:
    """Every handler MUST pass these tests."""

    def test_is_language_handler(self):
        assert isinstance(LuaHandler(), LanguageHandler)

    def test_parse_returns_list_of_code_entity(self, handler: LuaHandler):
        entities = handler.parse("function foo() end")
        assert isinstance(entities, list)
        for e in entities:
            assert isinstance(e, CodeEntity)

    def test_parse_empty_source(self, handler: LuaHandler):
        entities = handler.parse("")
        assert entities == []

    def test_handler_registered(self):
        assert "lua" in LANGUAGE_HANDLERS
        h = LANGUAGE_HANDLERS["lua"]()
        assert isinstance(h, LuaHandler)

    def test_get_handler_returns_lua(self):
        h = get_handler("lua")
        assert h is not None
        assert isinstance(h, LuaHandler)


# ---- Entity extraction ----


class TestFunctionExtraction:
    def test_simple_function(self, handler: LuaHandler):
        src = "function greet(name)\n print(name)\nend"
        entities = handler.parse(src)
        funcs = [e for e in entities if e.entity_type == EntityType.FUNCTION]
        assert len(funcs) >= 1
        assert funcs[0].name == "greet"

    def test_qualified_name(self, handler: LuaHandler):
        src = "function add(a, b) return a + b end"
        entities = handler.parse(src, module_path="math_utils")
        funcs = [e for e in entities if e.entity_type == EntityType.FUNCTION]
        assert funcs[0].qualified_name == "math_utils.add"

    def test_file_path_propagated(self, handler: LuaHandler):
        src = "function foo() end"
        entities = handler.parse(src, file_path="lib/utils.lua")
        assert all(e.file_path == "lib/utils.lua" for e in entities)


# ---- Error resilience ----


class TestErrorResilience:
    def test_malformed_source_returns_empty(self, handler: LuaHandler):
        """parse() must never raise -- return [] on error."""
        entities = handler.parse("function ??? {{{{")
        assert isinstance(entities, list)
Required Test Categories
Every handler test suite must include:
| Category | Purpose | Minimum |
|---|---|---|
| Protocol conformance | isinstance(handler, LanguageHandler), parse returns list[CodeEntity] | 5 tests |
| Entity extraction | Classes, functions, methods, enums per language | 10+ tests |
| Qualified names | module_path propagation | 2+ tests |
| Error resilience | Malformed source returns [] | 2+ tests |
| Registration | Present in LANGUAGE_HANDLERS, get_handler() works | 2+ tests |
| Language-specific | Unique patterns (e.g., Go receivers, Java annotations) | 5+ tests |
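The protocol-conformance and error-resilience rows lend themselves to a reusable check. The sketch below is one possible shape for such a helper, written without pytest so it can be called from any test. `contract_violations` and the two toy handlers are hypothetical names, and plain strings stand in for `CodeEntity`.

```python
from typing import Any


def contract_violations(handler: Any, sample_source: str, entity_type: type) -> list[str]:
    """Return human-readable contract violations (an empty list means it passed)."""
    if not callable(getattr(handler, "parse", None)):
        return ["handler has no callable parse()"]

    violations: list[str] = []
    cases = {"empty": "", "sample": sample_source, "malformed": "function ??? {{{{"}
    for label, source in cases.items():
        try:
            result = handler.parse(source, file_path="sample.lua", module_path="sample")
        except Exception as exc:  # parse() must never raise
            violations.append(f"{label}: parse() raised {type(exc).__name__}")
            continue
        if not isinstance(result, list):
            violations.append(f"{label}: expected list, got {type(result).__name__}")
        elif not all(isinstance(item, entity_type) for item in result):
            violations.append(f"{label}: list contains non-entity items")
        elif label == "empty" and result:
            violations.append("empty: expected [] for empty source")
    return violations


class GoodHandler:
    """Toy handler: one entity (here a plain str) per non-blank line."""

    def parse(self, source, *, file_path="", module_path=""):
        return [line for line in source.splitlines() if line.strip()]


class RaisingHandler:
    """Toy handler that violates the contract by raising."""

    def parse(self, source, *, file_path="", module_path=""):
        raise RuntimeError("boom")
```

In a real suite the call would presumably pass `CodeEntity` as `entity_type` and assert that the returned list is empty.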
Step 5: CI Integration
Handler tests automatically run in CI because they follow the standard pytest structure:
# Run your handler tests
pytest tests/unit/python/knowledge_fabric/code_intel/test_lua_handler.py -v
# Run ALL handler tests (contract suite)
pytest tests/unit/python/knowledge_fabric/code_intel/ -v -k "handler"
Test File Naming Convention
test_{language}_handler.py # e.g., test_lua_handler.py
Tests are discovered automatically by the CI pipeline. No additional CI configuration is needed.
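Since discovery depends on the file name, a small guard can flag registered languages that have no matching test file. This is a sketch only: the helper names are hypothetical, and the regular expression encodes the `test_{language}_handler.py` convention as this guide states it.

```python
import re
from typing import Optional

# Matches test_{language}_handler.py and captures {language}.
_TEST_FILE = re.compile(r"^test_([a-z0-9_]+)_handler\.py$")


def language_from_test_file(filename: str) -> Optional[str]:
    """Extract the language from a handler test file name, or None if it does not match."""
    match = _TEST_FILE.match(filename)
    return match.group(1) if match else None


def languages_missing_tests(languages: list[str], filenames: list[str]) -> list[str]:
    """Return the languages that have no test_{language}_handler.py file."""
    covered = {language_from_test_file(name) for name in filenames}
    return [language for language in languages if language not in covered]
```

Such a check could be fed the keys of `LANGUAGE_HANDLERS` and a directory listing of the test folder.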
Ground Truth Validation (Optional)
For production-quality handlers, create ground truth fixtures:
tests/fixtures/code_intel/{language}/
    simple_class.{ext}
    complex_module.{ext}
    expected_entities.json
The expected_entities.json schema:
{
  "file": "simple_class.lua",
  "entities": [
    {
      "name": "MyClass",
      "entity_type": "class",
      "line_start": 1,
      "line_end": 10,
      "qualified_name": "simple_class.MyClass"
    }
  ]
}
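One way to use such a fixture is to diff the handler's output against it. The sketch below is an assumption-laden illustration: `Entity` is a hypothetical stand-in for `CodeEntity`, `entity_type` is a plain string here (the real field is an `EntityType` enum whose serialized value would need to match the JSON), and `diff_entities` is not part of the platform.

```python
import json
from dataclasses import dataclass

# Fields compared between parsed entities and the fixture.
KEYS = ("name", "entity_type", "line_start", "line_end", "qualified_name")


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for CodeEntity with only the compared fields."""

    name: str
    entity_type: str
    line_start: int
    line_end: int
    qualified_name: str


def load_expected(fixture_text: str) -> set[tuple]:
    """Turn the fixture's entity list into a set of comparable tuples."""
    document = json.loads(fixture_text)
    return {tuple(item[key] for key in KEYS) for item in document["entities"]}


def diff_entities(actual: list[Entity], fixture_text: str) -> dict[str, list[tuple]]:
    """Report entities the handler missed and entities it produced unexpectedly."""
    expected = load_expected(fixture_text)
    got = {tuple(getattr(entity, key) for key in KEYS) for entity in actual}
    return {
        "missing": sorted(expected - got),
        "unexpected": sorted(got - expected),
    }
```

An empty result in both lists means the handler output matches the fixture on every compared field; an off-by-one line number shows up as one missing and one unexpected entry.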
Complete Checklist
Before submitting a PR for your handler:
- Handler class implements all LanguageHandler protocol methods
- parse() returns list[CodeEntity] (never raises)
- Grammar package installed and branch added to _get_language()
- Extension(s) added to EXTENSION_MAP
- Factory function added to LANGUAGE_HANDLERS
- Entry added to create_default_registry()
- 25+ tests covering protocol, extraction, qualified names, errors
- isinstance(YourHandler(), LanguageHandler) passes
- get_handler("your_language") returns your handler
- All CI tests pass
Existing Handlers for Reference
| Handler | File | Language-specific features |
|---|---|---|
| PythonHandler | python_handler.py | Decorators, __all__ exports, enum detection |
| TypeScriptHandler | typescript_handler.py | ESM/CJS imports, interfaces, JSX |
| JavaHandler | java_handler.py | Annotations, inner classes, generics, Spring Boot |
| CSharpHandler | languages/csharp_handler.py | Namespaces, properties, LINQ, attributes |
| GoHandler | languages/go_handler.py | Receiver methods, embedded structs, implicit interfaces |