Architecture Decisions: Designing for Agent Autonomy
When we started designing FAOSX, we had a whiteboard full of questions and zero answers.
Should agents be stateless or stateful? How do you coordinate ten agents without creating chaos? What happens when an agent makes a mistake at 2 AM? Where does the "intelligence" live—in the agent, the orchestrator, or somewhere in between?
Every architectural decision we made had to answer one core question: How do we give agents enough autonomy to be useful while maintaining enough control to be trusted?
Too much autonomy and you get unpredictable systems that enterprises will never deploy. Too little autonomy and you've just built a fancy chatbot with extra steps.
Architecting for Autonomy Is Different
Traditional software architecture assumes determinism. You call a function with inputs, you get predictable outputs. The system does what you designed it to do.
Agentic architecture is fundamentally different.
Agents make decisions you didn't explicitly program. They reason about novel situations. They might take different paths to the same goal. Two identical requests might produce different—but equally valid—results.
This probabilistic nature changes everything:
- State is more complex. An agent's decision depends not just on inputs but on reasoning chains that are difficult to predict or reproduce.
- Failure modes are non-obvious. An agent might "succeed" at a task while producing subtly wrong output. Traditional error handling doesn't catch reasoning errors.
- Coordination is harder. When two agents collaborate, their interaction isn't a simple request-response. It's a negotiation between two reasoning systems.
We spent our first two months ignoring these differences. We tried to build agents like we'd build microservices. It didn't work.
The breakthrough came when we embraced a different principle: Trust but verify at every step.
The FAOSX Framework: Our Foundation
The framework at the heart of FAOSX has four core components:
- Agents: Specialized personas with defined capabilities. Each agent has an identity, capabilities, constraints, and a communication style. Agents are defined in configuration files, not code.
- Workflows: YAML files that define how work gets done. Steps, transitions, checkpoints, error handling. Workflows contain structure. Agents contain intelligence.
- Tasks: Atomic units of work where actual agent execution happens. Context preparation, agent invocation, output capture, validation.
- Orchestrator: The coordinator that ties everything together. Loads workflows, tracks state, dispatches tasks, manages context, handles errors.
The orchestrator is deliberately simple. It doesn't make decisions—it just coordinates. All the intelligence is pushed to the edges.
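A minimal sketch of that "coordinate, don't decide" split, in Python. The `Orchestrator` class, the callable-based agent interface, and the dict-shaped workflow are assumptions made for illustration, not FAOSX's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    # Hypothetical interface: each agent is a callable (task, context) -> output.
    agents: dict
    state: dict = field(default_factory=dict)  # workflow state shared across steps
    log: list = field(default_factory=list)    # record of what was dispatched

    def run(self, workflow: dict) -> dict:
        for step in workflow["steps"]:
            agent = self.agents[step["agent"]]
            # Dispatch only: the orchestrator passes a copy of the state as
            # context and never inspects or rewrites the agent's reasoning.
            output = agent(step["task"], dict(self.state))
            self.state[step.get("output", step["agent"])] = output
            self.log.append((step["agent"], step["task"]))
        return self.state


def architect(task: str, context: dict) -> str:
    # Stand-in for a real agent invocation.
    return f"architect handled: {task}"


orchestrator = Orchestrator(agents={"architect": architect})
result = orchestrator.run({
    "name": "architecture-review",
    "steps": [
        {"agent": "architect", "task": "Review system design", "output": "review_findings"},
    ],
})
```

All the branching here is about logistics (which agent, which output key). Anything that looks like judgment lives inside the agent callable.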
Agent Design Principles
After building dozens of agents, clear principles emerged:
- Single Responsibility: Each agent should have a clear domain. When agents try to do too much, their outputs get fuzzy.
- Explicit Boundaries: Agents must know what they can and cannot do. An agent without boundaries will hallucinate capabilities.
- Observable Decisions: Every agent decision must be logged and traceable. Not just what they decided, but why they decided it.
- Graceful Degradation: Agents will fail. The architecture must contain failures. Failed steps don't crash workflows. Agents can escalate when uncertain.
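The last two principles can be sketched together: a step wrapper that records why each decision was made and contains failures instead of letting them propagate. The names `StepResult`, `Uncertain`, and `run_step` are hypothetical, and the agents are stubs that return an `(output, reason)` pair.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    status: str             # "ok", "escalated", or "failed"
    output: Optional[str]
    reason: str             # why the agent decided this, or why the step failed


class Uncertain(Exception):
    """Raised by a (hypothetical) agent that wants a human to decide."""


def run_step(name: str, agent: Callable, task: str, trace: list) -> StepResult:
    try:
        output, reason = agent(task)
        result = StepResult("ok", output, reason)
    except Uncertain as exc:
        # Escalation is a normal outcome, not a crash.
        result = StepResult("escalated", None, str(exc))
    except Exception as exc:
        # Contain the failure so the workflow can continue or recover.
        result = StepResult("failed", None, f"{type(exc).__name__}: {exc}")
    # Observable decisions: log the "why" alongside the "what".
    trace.append({"agent": name, "task": task,
                  "status": result.status, "reason": result.reason})
    return result


def reviewer(task):
    return ("looks fine", "no risky patterns found")


def unsure(task):
    raise Uncertain("conflicting requirements; needs a human")


def broken(task):
    raise ValueError("malformed input")


trace = []
results = [run_step(name, agent, "review change", trace)
           for name, agent in [("reviewer", reviewer), ("unsure", unsure), ("broken", broken)]]
statuses = [r.status for r in results]
```

Every step leaves a trace entry whether it succeeded, escalated, or failed, so the workflow has something to inspect before deciding what to do next.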
The Trade-offs That Kept Us Up at Night
Configuration-Driven vs. Code-Driven Agents
We chose configuration-driven. Agents are YAML/Markdown files, not code modules.
Why: Lower barrier for non-engineers, faster iteration, AI-native (agents can read their own configs), and portability.
The trade-off: Less flexibility than code. We accepted this because it forces simplicity.
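To make "agents are data, not code" concrete, here is a small sketch that loads an agent definition and enforces its boundaries. The field names and the `AgentSpec` type are assumptions, and JSON stands in for the YAML/Markdown files only so the example runs with the standard library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    # Hypothetical schema mirroring "identity, capabilities, constraints, style".
    name: str
    capabilities: tuple
    constraints: tuple
    style: str

    def can(self, action: str) -> bool:
        # Explicit boundary: anything not listed is out of scope.
        return action in self.capabilities


def load_agent(raw: str) -> AgentSpec:
    data = json.loads(raw)
    return AgentSpec(
        name=data["name"],
        capabilities=tuple(data["capabilities"]),
        constraints=tuple(data.get("constraints", [])),
        style=data.get("style", "neutral"),
    )


# Illustrative config; a real definition would live in its own file.
CONFIG = """
{
  "name": "security",
  "capabilities": ["threat_model", "dependency_audit"],
  "constraints": ["never modify code"],
  "style": "terse"
}
"""

security = load_agent(CONFIG)
```

Because the definition is plain data, changing what an agent may do is a config edit rather than a code change, which is the iteration-speed argument made above.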
Stateless vs. Stateful Agents
We chose hybrid. Workflow state is external and persistent. Conversation state is internal and ephemeral.
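Why: External state gives durability and observability. Internal state gives natural conversation flow. Because conversation state is discarded when an agent finishes, anything worth keeping has to be summarized into workflow state first, and that summarization step forces us to capture important context.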
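One way to picture the hybrid, as a sketch under assumptions: workflow state is written to a JSON file, conversation state lives only in memory, and a summary is the only thing that crosses from one to the other. The class names and the "last turn is the summary" shortcut are placeholders.

```python
import json
import tempfile
from pathlib import Path


class WorkflowState:
    """External, persistent state: survives restarts and can be inspected."""

    def __init__(self, path: Path):
        self.path = path
        self.data = json.loads(path.read_text()) if path.exists() else {}

    def put(self, key: str, value) -> None:
        self.data[key] = value
        self.path.write_text(json.dumps(self.data, indent=2))


class Conversation:
    """Internal, ephemeral state: lives only for one agent invocation."""

    def __init__(self):
        self.turns = []

    def say(self, text: str) -> None:
        self.turns.append(text)

    def summarize(self) -> str:
        # Placeholder: a real system would likely ask the agent to summarize.
        return self.turns[-1] if self.turns else ""


state_file = Path(tempfile.mkdtemp()) / "workflow_state.json"
state = WorkflowState(state_file)

chat = Conversation()
chat.say("Considering two queue designs")
chat.say("Decision: use the durable queue")

# Only the summary is persisted; the conversation itself is thrown away.
state.put("architect_summary", chat.summarize())
del chat

# A fresh process could pick up from the file alone.
reloaded = WorkflowState(state_file)
```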
Central Orchestrator vs. Peer-to-Peer
We chose a lightweight orchestrator with direct collaboration.
The orchestrator handles logistics. But when agents need to collaborate on a decision, they can interact directly—what we call "Party Mode."
Why: Central orchestration prevents chaos. Direct collaboration enables emergent solutions.
The Workflow Engine: Three Versions Later
Version 1: The Turing-Complete Mistake
Our first workflow engine was essentially a programming language. Loops, conditionals, variables, functions.
Disaster. Workflows became impossible to understand.
Lesson: Workflow languages should be simple, not powerful.
Version 2: The Over-Simplified Mistake
Linear steps only. No conditionals. No parallelism.
Too constraining. Real work has branches.
Lesson: Simple doesn't mean simplistic.
Version 3: Structured Flexibility
Sequential steps, parallel steps, conditional branches, loops with limits, human gates.
That's it. This covers 95% of real workflows.
name: architecture-review
steps:
  - agent: architect
    task: 'Review system design'
    output: review_findings
  - parallel:
      - agent: security
        task: 'Security assessment'
      - agent: performance
        task: 'Performance analysis'
  - condition:
      if: "review_findings.risk_level > 'medium'"
      then: detailed_review
      else: standard_approval
  - gate:
      type: human_approval
      approvers: [tech_lead]
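To show how little machinery these step kinds need, here is a rough Python interpreter for a workflow shaped like the one above. It is a sketch, not the FAOSX engine: the `RISK_ORDER` ranking, the single supported condition form, the stub agents, and the `approve` callback are all assumptions, and the condition step only records which branch was chosen rather than running it.

```python
import re
from concurrent.futures import ThreadPoolExecutor

RISK_ORDER = ["low", "medium", "high"]  # assumed ranking of risk levels


def eval_condition(expr: str, outputs: dict) -> bool:
    # Supports only the form: <output>.<field> > '<level>'
    match = re.fullmatch(r"(\w+)\.(\w+)\s*>\s*'(\w+)'", expr)
    if match is None:
        raise ValueError(f"unsupported condition: {expr}")
    name, field, level = match.groups()
    return RISK_ORDER.index(outputs[name][field]) > RISK_ORDER.index(level)


def run(workflow: dict, agents: dict, approve) -> tuple:
    outputs, path = {}, []
    for step in workflow["steps"]:
        if "agent" in step:
            outputs[step.get("output", step["agent"])] = agents[step["agent"]](step["task"])
            path.append(step["agent"])
        elif "parallel" in step:
            # Leaving the with-block waits for every submitted task.
            with ThreadPoolExecutor() as pool:
                futures = {s["agent"]: pool.submit(agents[s["agent"]], s["task"])
                           for s in step["parallel"]}
            for name, future in futures.items():
                outputs[name] = future.result()
                path.append(name)
        elif "condition" in step:
            cond = step["condition"]
            path.append(cond["then"] if eval_condition(cond["if"], outputs) else cond["else"])
        elif "gate" in step:
            if not approve(step["gate"]["approvers"]):
                path.append("rejected")
                break
            path.append("approved")
    return outputs, path


workflow = {
    "name": "architecture-review",
    "steps": [
        {"agent": "architect", "task": "Review system design", "output": "review_findings"},
        {"parallel": [
            {"agent": "security", "task": "Security assessment"},
            {"agent": "performance", "task": "Performance analysis"},
        ]},
        {"condition": {"if": "review_findings.risk_level > 'medium'",
                       "then": "detailed_review", "else": "standard_approval"}},
        {"gate": {"type": "human_approval", "approvers": ["tech_lead"]}},
    ],
}

# Stub agents returning canned findings.
agents = {
    "architect": lambda task: {"risk_level": "high"},
    "security": lambda task: {"risk_level": "low"},
    "performance": lambda task: {"risk_level": "low"},
}

outputs, path = run(workflow, agents, approve=lambda approvers: "tech_lead" in approvers)
```

Note the explicit ranking in `RISK_ORDER`: comparing the strings directly would sort them alphabetically, which is not the order a condition like `> 'medium'` intends.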
What We'd Do Differently
Over-engineered: The Plugin System
We built elaborate plugin isolation—sandboxing, versioning, dependency resolution. Enterprise-grade architecture.
We've used maybe 10% of it. Two months building infrastructure for hypothetical requirements.
Under-engineered: Context Management
We treated context as simple. Just pass relevant information to each agent.
Deciding what counts as relevant turned out to be very hard. We should have invested more architectural effort here from the start.
Under-estimated: Debugging Tools
We built agents before we built ways to understand what agents were doing. Debugging was painful.
Lesson: Observability isn't optional. Build it first.
Architecture Is a Living Document
Our foundation principles remain constant:
- Trust but verify at every step
- Intelligence at the edges, logistics at the center
- Configuration over code where possible
- Observable, traceable, recoverable
Everything else can change.
The architecture we have today isn't the architecture we started with. It's not the architecture we'll have next year. But the principles persist because they address fundamental truths about agentic systems.
Next up: Post 3 — The Agent Persona System
How we design agents with real expertise, not just generic AI responses. Why specialized personas dramatically outperform generic prompts.
This is Post 2 of 10 in the series "Building the Agentic Enterprise: The FAOSX Journey."
