
Language Handler Extension Guide

This guide walks you through creating a new language handler for the FAOS Code Intelligence platform. By the end, you will have a working handler that passes all contract tests and integrates with the existing parsing pipeline.

Time estimate: 1-2 weeks for a production-quality handler (NFR-SC4).

Architecture Overview

The code intelligence system uses a plugin architecture (SDP-3: Open/Closed Principle): adding a new language requires zero changes to core code. You drop in a handler module, register it, and the pipeline picks it up.

tree_sitter_parser.py      protocol.py                   handler_registry.py
(grammar loading)          (LanguageHandler protocol)    (LANGUAGE_HANDLERS dict)
        |                          |                              |
        v                          v                              v
  parse source  ──>  handler.parse(source)  ──>  list[CodeEntity]  ──>  pipeline
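
The flow in the diagram can be sketched in a few lines. This is an illustrative stand-in, not the platform's code: `Entity`, `Handler`, `LineHandler`, `HANDLERS`, and `run_pipeline` are hypothetical names, and the real `CodeEntity` carries more fields.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Entity:
    """Hypothetical, trimmed-down stand-in for CodeEntity."""

    name: str
    entity_type: str


class Handler(Protocol):
    """Hypothetical stand-in for the LanguageHandler protocol."""

    def parse(self, source: str, *, file_path: str = "", module_path: str = "") -> list[Entity]: ...


class LineHandler:
    """Toy handler: treats every non-blank line as one entity."""

    def parse(self, source: str, *, file_path: str = "", module_path: str = "") -> list[Entity]:
        return [Entity(line.strip(), "line") for line in source.splitlines() if line.strip()]


# Adding a language means adding one entry here; run_pipeline() stays untouched.
HANDLERS: dict[str, Callable[[], Handler]] = {"toy": LineHandler}


def run_pipeline(source: str, language: str) -> list[Entity]:
    """Core code: look up the handler by language key and delegate to it."""
    factory = HANDLERS.get(language)
    if factory is None:
        return []  # unknown language: nothing to index
    return factory().parse(source)
```

The point of the sketch is that `run_pipeline` never names a concrete language, which is what keeps the core closed to modification.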

Key modules (all under services/api/src/faos_api/knowledge_fabric/code/parsers/):

| Module | Purpose |
| --- | --- |
| protocol.py | LanguageHandler protocol, CodeEntity, EntityType |
| handler_registry.py | LANGUAGE_HANDLERS dict + get_handler() |
| tree_sitter_parser.py | Grammar loading, detect_language(), TreeSitterParser |
| python_handler.py | Reference implementation (Python) |
| languages/ | Sub-package for newer handlers (C#, Go, and yours) |

Prerequisites

  • Python 3.11+
  • tree-sitter (Python bindings)
  • tree-sitter-{language} grammar package for your target language
  • Familiarity with Tree-sitter AST node types for the language

Step 1: Create the Handler Module

Create a new file in parsers/languages/:

parsers/languages/lua_handler.py

Minimal Handler Skeleton

"""
Lua language handler for code intelligence.

Implements the LanguageHandler protocol for Lua source files.
"""

from __future__ import annotations

import logging
from collections.abc import Sequence

from tree_sitter import Node

from faos_api.knowledge_fabric.code.parsers.protocol import (
    ClassInfo,
    CodeEntity,
    EnumInfo,
    EntityType,
    ExportInfo,
    FunctionInfo,
    ImportInfo,
    InheritanceInfo,
    InterfaceInfo,
    MethodInfo,
    TypeAliasInfo,
)

logger = logging.getLogger(__name__)


class LuaHandler:
    """
    Lua-specific language handler implementing the LanguageHandler protocol.
    """

    # ---- Unified Parse (required) ----

    def parse(
        self,
        source: str,
        *,
        file_path: str = "",
        module_path: str = "",
    ) -> list[CodeEntity]:
        """
        Parse Lua source and return all entities as CodeEntity list.

        Args:
            source: Lua source code string.
            file_path: Path to the source file.
            module_path: Dot-separated module path for qualified names.

        Returns:
            List of CodeEntity. Empty list on parse failure.
        """
        try:
            return self._parse_impl(source, file_path=file_path, module_path=module_path)
        except Exception:
            logger.exception("Failed to parse %s", file_path)
            return []

    def _parse_impl(
        self,
        source: str,
        *,
        file_path: str,
        module_path: str,
    ) -> list[CodeEntity]:
        """Core parse logic."""
        from faos_api.knowledge_fabric.code.parsers.tree_sitter_parser import (
            TreeSitterParser,
        )

        src_bytes = source.encode("utf-8")
        parser = TreeSitterParser()
        tree = parser.parse_bytes(src_bytes, "lua")
        root = tree.root_node

        entities: list[CodeEntity] = []
        # Extract functions
        entities.extend(self._extract_functions_as_entities(root, src_bytes, file_path, module_path))
        return entities

    def _extract_functions_as_entities(
        self,
        root: Node,
        source: bytes,
        file_path: str,
        module_path: str,
    ) -> list[CodeEntity]:
        """Extract Lua function declarations as CodeEntity."""
        entities: list[CodeEntity] = []
        for node in root.children:
            if node.type != "function_declaration":
                continue

            name_node = node.child_by_field_name("name")
            name = (
                source[name_node.start_byte : name_node.end_byte].decode()
                if name_node
                else "<anonymous>"
            )
            q_name = f"{module_path}.{name}" if module_path else name

            entities.append(
                CodeEntity(
                    name=name,
                    qualified_name=q_name,
                    file_path=file_path,
                    line_start=node.start_point[0] + 1,
                    line_end=node.end_point[0] + 1,
                    signature=f"function {name}(...)",
                    entity_type=EntityType.FUNCTION,
                )
            )
        return entities

    # ---- Legacy extraction methods (required by protocol) ----

    def extract_classes(self, root: Node, source: bytes) -> Sequence[ClassInfo]:
        return []  # Lua has no classes

    def extract_functions(self, root: Node, source: bytes) -> Sequence[FunctionInfo]:
        functions: list[FunctionInfo] = []
        for node in root.children:
            if node.type != "function_declaration":
                continue
            name_node = node.child_by_field_name("name")
            name = (
                source[name_node.start_byte : name_node.end_byte].decode()
                if name_node
                else "<anonymous>"
            )
            functions.append(
                FunctionInfo(
                    name=name,
                    qualified_name="",
                    start_line=node.start_point[0] + 1,
                    end_line=node.end_point[0] + 1,
                )
            )
        return functions

    def extract_methods(self, cls_node: Node, source: bytes) -> Sequence[MethodInfo]:
        return []

    def extract_imports(self, root: Node, source: bytes) -> Sequence[ImportInfo]:
        return []  # Lua uses require(), handle if needed

    def extract_exports(self, root: Node, source: bytes) -> Sequence[ExportInfo]:
        return []

    def extract_inheritance(self, cls_node: Node, source: bytes) -> Sequence[InheritanceInfo]:
        return []

    def extract_enums(self, root: Node, source: bytes) -> Sequence[EnumInfo]:
        return []

    def extract_interfaces(self, root: Node, source: bytes) -> Sequence[InterfaceInfo]:
        return []

    def get_qualified_name(self, node: Node, module_path: str) -> str:
        name_node = node.child_by_field_name("name")
        name = name_node.text.decode() if name_node and name_node.text else ""
        return f"{module_path}.{name}" if module_path else name

    # ---- Language Metadata (required) ----

    def package_indicator_files(self) -> list[str]:
        return []  # Lua has no package indicator files

    def resolve_module_system(self, import_str: str, file_path: str) -> str:
        return "lua"

    # ---- Optional Extension Points ----

    def custom_tree_sitter_queries(self) -> dict[str, str]:
        return {}
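
The skeleton only inspects `root.children`, so declarations nested inside other blocks are not visited. If nested declarations matter for your language, a depth-first walk is one option. The sketch below is duck-typed over any object exposing `type` and `children` (which tree-sitter's `Node` does); `FakeNode` and `iter_nodes_of_type` are hypothetical names, used so the example runs without a grammar installed.

```python
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing the two Node attributes the walk needs."""

    type: str
    children: list["FakeNode"] = field(default_factory=list)


def iter_nodes_of_type(root, node_type: str) -> Iterator:
    """Yield root and every descendant whose type matches, in document order."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.type == node_type:
            yield node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))


# A function declared inside another function's body.
inner = FakeNode("function_declaration")
outer = FakeNode("function_declaration", [FakeNode("block", [inner])])
tree = FakeNode("chunk", [outer, FakeNode("comment")])

found = list(iter_nodes_of_type(tree, "function_declaration"))
```

An explicit stack is used instead of recursion so that deeply nested sources cannot hit Python's recursion limit.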

Key Patterns

  1. parse() wraps _parse_impl() with try/except -- errors must never crash the pipeline (AC6 from 255-2).
  2. Deferred import of TreeSitterParser inside _parse_impl() to avoid circular imports.
  3. Legacy methods return empty sequences for entity types the language does not have.
  4. All entities are CodeEntity instances with proper entity_type.
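
Pattern 1 can be checked in isolation by forcing the inner implementation to fail. A minimal sketch, where `SafeParseBase` and `ExplodingHandler` are hypothetical classes that mirror the wrapper in the skeleton rather than anything in the codebase:

```python
import logging

logger = logging.getLogger(__name__)


class SafeParseBase:
    """Mirrors the skeleton: parse() delegates to _parse_impl() and swallows failures."""

    def parse(self, source: str, *, file_path: str = "", module_path: str = "") -> list:
        try:
            return self._parse_impl(source, file_path=file_path, module_path=module_path)
        except Exception:
            # logger.exception records the traceback, so the failure stays visible in logs.
            logger.exception("Failed to parse %s", file_path)
            return []

    def _parse_impl(self, source: str, *, file_path: str, module_path: str) -> list:
        raise NotImplementedError


class ExplodingHandler(SafeParseBase):
    """Test double whose implementation always fails."""

    def _parse_impl(self, source: str, *, file_path: str, module_path: str) -> list:
        raise RuntimeError("grammar blew up")


result = ExplodingHandler().parse("function foo() end", file_path="broken.lua")
```

Catching `Exception` rather than using a bare `except` leaves `KeyboardInterrupt` and `SystemExit` alone, since those derive from `BaseException`.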

Step 2: Register the Grammar in tree_sitter_parser.py

Three changes are needed.

2a. Extension Mapping

Add entries to EXTENSION_MAP (and COMPOUND_EXTENSIONS if needed):

# In tree_sitter_parser.py

EXTENSION_MAP: dict[str, str] = {
    # ... existing entries ...
    ".lua": "lua",
}
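
To see why compound extensions need their own map, here is a hedged sketch of how a lookup might consult both. The map contents, the `.d.ts` entry, the case-insensitive matching, and the name `detect_language_sketch` are all assumptions made for illustration; the real `detect_language()` in tree_sitter_parser.py may behave differently.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical contents; the real maps live in tree_sitter_parser.py.
EXTENSION_MAP = {".py": "python", ".ts": "typescript", ".lua": "lua"}
COMPOUND_EXTENSIONS = {".d.ts": "typescript"}


def detect_language_sketch(file_path: str) -> Optional[str]:
    """Return the language key for a path, or None when the extension is unknown."""
    name = PurePosixPath(file_path).name.lower()
    # Check compound extensions first: .suffix only sees the last dot-segment.
    for ext, language in COMPOUND_EXTENSIONS.items():
        if name.endswith(ext):
            return language
    return EXTENSION_MAP.get(PurePosixPath(name).suffix)
```

For a single-segment extension such as `.lua`, an entry in the plain extension map is enough.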

2b. Grammar Loading

Add a branch to _get_language():

elif language == "lua":
    import tree_sitter_lua

    return tree_sitter.Language(tree_sitter_lua.language())

2c. Install the Grammar Package

pip install tree-sitter-lua

Also add the package to your pyproject.toml or requirements.txt.
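
A quick way to confirm the grammar is importable in the current environment, without importing it, is the standard library's `importlib.util.find_spec`. The helper name `grammar_installed` is hypothetical.

```python
import importlib.util


def grammar_installed(module_name: str) -> bool:
    """True when the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not grammar_installed("tree_sitter_lua"):
    print("tree-sitter-lua is missing; run: pip install tree-sitter-lua")
```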

Step 3: Register the Handler in handler_registry.py

3a. Factory Function

def _lua_handler_factory() -> LanguageHandler:
    """Lazily create a LuaHandler to avoid circular imports."""
    from faos_api.knowledge_fabric.code.parsers.languages.lua_handler import (
        LuaHandler,
    )

    return LuaHandler()

3b. Add to LANGUAGE_HANDLERS Dict

LANGUAGE_HANDLERS: dict[str, Callable[[], LanguageHandler]] = {
    # ... existing entries ...
    "lua": _lua_handler_factory,
}
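
Storing factories rather than instances means a handler (and its grammar import) is only paid for when a file of that language is actually seen. The sketch below illustrates that laziness with stand-ins: `LuaHandlerStub`, `HANDLER_FACTORIES`, `created`, and `get_handler_sketch` are hypothetical, and returning `None` for an unregistered language is an assumption about how `get_handler()` behaves.

```python
from typing import Callable, Optional

created: list[str] = []  # records which handlers were actually instantiated


class LuaHandlerStub:
    """Hypothetical stand-in for LuaHandler."""

    def __init__(self) -> None:
        created.append("lua")


def _lua_factory() -> LuaHandlerStub:
    # In the real registry the handler import happens here, at call time.
    return LuaHandlerStub()


HANDLER_FACTORIES: dict[str, Callable[[], object]] = {"lua": _lua_factory}


def get_handler_sketch(language: str) -> Optional[object]:
    """Instantiate the handler for `language`, or return None if none is registered."""
    factory = HANDLER_FACTORIES.get(language)
    return factory() if factory is not None else None
```

Building the dict creates no handler; the first `get_handler_sketch("lua")` call does.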

3c. Add to create_default_registry()

# Lua handler (optional)
try:
    from faos_api.knowledge_fabric.code.parsers.languages.lua_handler import (
        LuaHandler,
    )
    registry.register("lua", LuaHandler())
except ImportError:
    logger.debug("Lua handler not yet available")

Step 4: Write Contract Tests

Create tests/unit/python/knowledge_fabric/code_intel/test_lua_handler.py:

"""
Tests for Lua Language Handler.

Follows the contract testing pattern from existing handlers.
"""

import pytest

from faos_api.knowledge_fabric.code.parsers.handler_registry import (
    LANGUAGE_HANDLERS,
    get_handler,
)
from faos_api.knowledge_fabric.code.parsers.protocol import (
    CodeEntity,
    EntityType,
    LanguageHandler,
)
from faos_api.knowledge_fabric.code.parsers.languages.lua_handler import LuaHandler


@pytest.fixture
def handler() -> LuaHandler:
    return LuaHandler()


# ---- Protocol conformance (required for every handler) ----

class TestProtocolConformance:
    """Every handler MUST pass these tests."""

    def test_is_language_handler(self):
        assert isinstance(LuaHandler(), LanguageHandler)

    def test_parse_returns_list_of_code_entity(self, handler: LuaHandler):
        entities = handler.parse("function foo() end")
        assert isinstance(entities, list)
        for e in entities:
            assert isinstance(e, CodeEntity)

    def test_parse_empty_source(self, handler: LuaHandler):
        entities = handler.parse("")
        assert entities == []

    def test_handler_registered(self):
        assert "lua" in LANGUAGE_HANDLERS
        h = LANGUAGE_HANDLERS["lua"]()
        assert isinstance(h, LuaHandler)

    def test_get_handler_returns_lua(self):
        h = get_handler("lua")
        assert h is not None
        assert isinstance(h, LuaHandler)


# ---- Entity extraction ----

class TestFunctionExtraction:
    def test_simple_function(self, handler: LuaHandler):
        src = "function greet(name)\n print(name)\nend"
        entities = handler.parse(src)
        funcs = [e for e in entities if e.entity_type == EntityType.FUNCTION]
        assert len(funcs) >= 1
        assert funcs[0].name == "greet"

    def test_qualified_name(self, handler: LuaHandler):
        src = "function add(a, b) return a + b end"
        entities = handler.parse(src, module_path="math_utils")
        funcs = [e for e in entities if e.entity_type == EntityType.FUNCTION]
        assert funcs[0].qualified_name == "math_utils.add"

    def test_file_path_propagated(self, handler: LuaHandler):
        src = "function foo() end"
        entities = handler.parse(src, file_path="lib/utils.lua")
        assert all(e.file_path == "lib/utils.lua" for e in entities)


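# ---- Error resilience ----

class TestErrorResilience:
    def test_malformed_source_returns_empty(self, handler: LuaHandler):
        """parse() must never raise -- return [] on error."""
        # Tree-sitter recovers from syntax errors and may still yield partial
        # entities, so only the no-raise contract and return type are asserted.
        entities = handler.parse("function ??? {{{{")
        assert isinstance(entities, list)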

Required Test Categories

Every handler test suite must include:

| Category | Purpose | Minimum |
| --- | --- | --- |
| Protocol conformance | isinstance(handler, LanguageHandler), parse returns list[CodeEntity] | 5 tests |
| Entity extraction | Classes, functions, methods, enums per language | 10+ tests |
| Qualified names | module_path propagation | 2+ tests |
| Error resilience | Malformed source returns [] | 2+ tests |
| Registration | Present in LANGUAGE_HANDLERS, get_handler() works | 2+ tests |
| Language-specific | Unique patterns (e.g., Go receivers, Java annotations) | 5+ tests |
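
One caveat on the protocol conformance category: assuming LanguageHandler is a `typing.Protocol` decorated with `@runtime_checkable` (which is what an `isinstance` check against a protocol requires), the check only verifies that the method names exist. It does not compare signatures or return types, so it cannot replace the behavioural tests. The sketch below demonstrates this with a hypothetical two-method `HandlerProtocol`; the class names are illustrative.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HandlerProtocol(Protocol):
    """Hypothetical two-method stand-in for LanguageHandler."""

    def parse(self, source: str) -> list: ...

    def package_indicator_files(self) -> list: ...


class Complete:
    def parse(self, source: str) -> list:
        return []

    def package_indicator_files(self) -> list:
        return []


class WrongSignature:
    # Both names exist, but parse() takes no source argument.
    def parse(self) -> list:
        return []

    def package_indicator_files(self) -> list:
        return []


class MissingMethod:
    def parse(self, source: str) -> list:
        return []


complete_ok = isinstance(Complete(), HandlerProtocol)
wrong_signature_ok = isinstance(WrongSignature(), HandlerProtocol)
missing_method_ok = isinstance(MissingMethod(), HandlerProtocol)
```

`WrongSignature` passes the `isinstance` check even though calling `parse(source)` on it would fail, which is why the extraction and error-resilience categories carry most of the weight.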

Step 5: CI Integration

Handler tests automatically run in CI because they follow the standard pytest structure:

# Run your handler tests
pytest tests/unit/python/knowledge_fabric/code_intel/test_lua_handler.py -v

# Run ALL handler tests (contract suite)
pytest tests/unit/python/knowledge_fabric/code_intel/ -v -k "handler"

Test File Naming Convention

test_{language}_handler.py # e.g., test_lua_handler.py

Tests are discovered automatically by the CI pipeline. No additional CI configuration is needed.

Ground Truth Validation (Optional)

For production-quality handlers, create ground truth fixtures:

tests/fixtures/code_intel/{language}/
    simple_class.{ext}
    complex_module.{ext}
    expected_entities.json

The expected_entities.json schema:

{
  "file": "simple_class.lua",
  "entities": [
    {
      "name": "MyClass",
      "entity_type": "class",
      "line_start": 1,
      "line_end": 10,
      "qualified_name": "simple_class.MyClass"
    }
  ]
}
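
A fixture in this shape can be compared against handler output with a small set difference. The sketch below is one possible approach, not an existing test utility: `EntityStub`, `FIELDS`, and `diff_against_ground_truth` are hypothetical names, and `entity_type` is treated as a plain string (if the real `CodeEntity` stores an enum, compare against its value instead).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityStub:
    """Hypothetical stand-in with the CodeEntity fields used by the fixture."""

    name: str
    entity_type: str
    line_start: int
    line_end: int
    qualified_name: str


FIELDS = ("name", "entity_type", "line_start", "line_end", "qualified_name")


def diff_against_ground_truth(expected_json: str, actual: list) -> tuple[list, list]:
    """Return (missing, unexpected) entity tuples, each sorted for stable output."""
    expected = {
        tuple(item[f] for f in FIELDS)
        for item in json.loads(expected_json)["entities"]
    }
    found = {tuple(getattr(e, f) for f in FIELDS) for e in actual}
    return sorted(expected - found), sorted(found - expected)
```

Reporting both directions separates entities the handler failed to extract from entities it extracted with wrong line numbers or names.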

Complete Checklist

Before submitting a PR for your handler:

  • Handler class implements all LanguageHandler protocol methods
  • parse() returns list[CodeEntity] (never raises)
  • Grammar package installed and branch added to _get_language()
  • Extension(s) added to EXTENSION_MAP
  • Factory function added to LANGUAGE_HANDLERS
  • Entry added to create_default_registry()
  • 25+ tests covering protocol, extraction, qualified names, errors
  • isinstance(YourHandler(), LanguageHandler) passes
  • get_handler("your_language") returns your handler
  • All CI tests pass

Existing Handlers for Reference

| Handler | File | Language-specific features |
| --- | --- | --- |
| PythonHandler | python_handler.py | Decorators, __all__ exports, enum detection |
| TypeScriptHandler | typescript_handler.py | ESM/CJS imports, interfaces, JSX |
| JavaHandler | java_handler.py | Annotations, inner classes, generics, Spring Boot |
| CSharpHandler | languages/csharp_handler.py | Namespaces, properties, LINQ, attributes |
| GoHandler | languages/go_handler.py | Receiver methods, embedded structs, implicit interfaces |