Architecture Decisions: Designing for Agent Autonomy

January 18, 2026 · 5 min read

Founder & CEO, FAOSX | CIO 100 Asia 2025 | AI & Digital Transformation Leader

When we started designing FAOSX, we had a whiteboard full of questions and zero answers.

Should agents be stateless or stateful? How do you coordinate ten agents without creating chaos? What happens when an agent makes a mistake at 2 AM? Where does the "intelligence" live—in the agent, the orchestrator, or somewhere in between?

Every architectural decision we made had to answer one core question: How do we give agents enough autonomy to be useful while maintaining enough control to be trusted?

Too much autonomy and you get unpredictable systems that enterprises will never deploy. Too little autonomy and you've just built a fancy chatbot with extra steps.

Architecting for Autonomy Is Different

Traditional software architecture assumes determinism. You call a function with inputs, you get predictable outputs. The system does what you designed it to do.

Agentic architecture is fundamentally different.

Agents make decisions you didn't explicitly program. They reason about novel situations. They might take different paths to the same goal. Two identical requests might produce different—but equally valid—results.

This probabilistic nature changes everything:

State is more complex. An agent's decision depends not just on inputs but on reasoning chains that are difficult to predict or reproduce.

Failure modes are non-obvious. An agent might "succeed" at a task while producing subtly wrong output. Traditional error handling doesn't catch reasoning errors.

Coordination is harder. When two agents collaborate, their interaction isn't a simple request-response. It's a negotiation between two reasoning systems.

We spent our first two months ignoring these differences. We tried to build agents like we'd build microservices. It didn't work.

The breakthrough came when we embraced a different principle: Trust but verify at every step.
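One way to read "trust but verify" in code is that every agent output passes through an explicit check before the next step is allowed to use it. The sketch below is illustrative only, not the FAOSX implementation; `run_verified`, `StepResult`, and the toy agent and validator are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    output: Optional[str]
    accepted: bool
    problems: List[str]


def run_verified(
    agent: Callable[[str], str],
    validator: Callable[[str], List[str]],
    task: str,
) -> StepResult:
    """Invoke the agent (trust), then check its output (verify)."""
    output = agent(task)
    problems = validator(output)  # an empty list means the output passed
    return StepResult(output=output, accepted=not problems, problems=problems)


# Hypothetical stand-ins for a real agent and a real check.
def toy_agent(task: str) -> str:
    return f"Summary for: {task}"


def must_mention_risk(output: str) -> List[str]:
    return [] if "risk" in output.lower() else ["output never mentions risk"]


result = run_verified(toy_agent, must_mention_risk, "Review system design")
```

Here the agent call itself raises no error, yet `result.accepted` is false because the validator found a problem. That is the kind of "successful but wrong" output that ordinary exception handling would let through.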

The FAOSX Framework: Our Foundation

The framework at the heart of FAOSX has four core components:

Agents — Specialized personas with defined capabilities. Each agent has identity, capabilities, constraints, and communication style. Agents are defined in configuration files, not code.

Workflows — YAML files that define how work gets done. Steps, transitions, checkpoints, error handling. Workflows contain structure. Agents contain intelligence.

Tasks — Atomic units of work where actual agent execution happens. Context preparation, agent invocation, output capture, validation.

Orchestrator — The coordinator that ties everything together. Loads workflows, tracks state, dispatches tasks, manages context, handles errors.

The orchestrator is deliberately simple. It doesn't make decisions—it just coordinates. All the intelligence is pushed to the edges.
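To make "intelligence at the edges" concrete, here is a minimal sketch of an orchestrator that only looks up agents, passes context, and stores results. It assumes agents are plain callables and steps are small records; the `Step` and `Orchestrator` shapes and the two toy agents are hypothetical, and a real system would load them from configuration files rather than define them inline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical shape: an agent takes a task and a context, and returns text.
Agent = Callable[[str, Dict[str, str]], str]


@dataclass
class Step:
    agent: str   # name of the agent to invoke
    task: str    # instruction passed to that agent
    output: str  # key under which the result is stored


@dataclass
class Orchestrator:
    agents: Dict[str, Agent]
    state: Dict[str, str] = field(default_factory=dict)

    def run(self, steps: List[Step]) -> Dict[str, str]:
        for step in steps:
            agent = self.agents[step.agent]
            # Pass a copy of the state as context; the agent decides what to do.
            self.state[step.output] = agent(step.task, dict(self.state))
        return self.state


def architect(task: str, context: Dict[str, str]) -> str:
    return f"architect handled '{task}'"


def security(task: str, context: Dict[str, str]) -> str:
    return f"security saw {sorted(context)}"


orchestrator = Orchestrator(agents={"architect": architect, "security": security})
final_state = orchestrator.run([
    Step("architect", "Review system design", "review_findings"),
    Step("security", "Security assessment", "security_findings"),
])
```

Note that `run` contains no branching on what the agents said. It dispatches and records, and anything that looks like judgment stays inside the agent callables.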

Agent Design Principles

After building dozens of agents, clear principles emerged:

Single Responsibility — Each agent should have a clear domain. When agents try to do too much, their outputs get fuzzy.

Explicit Boundaries — Agents must know what they can and cannot do. An agent without boundaries will hallucinate capabilities.

Observable Decisions — Every agent decision must be logged and traceable. Not just what they decided—why they decided it.

Graceful Degradation — Agents will fail. The architecture must contain failures. Failed steps don't crash workflows. Agents can escalate when uncertain.
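Three of these principles (explicit boundaries, observable decisions, graceful degradation) can be sketched in a single step runner. This is an assumption-laden illustration rather than FAOSX code: `run_step`, `Decision`, the status strings, and the example handlers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Decision:
    agent: str
    action: str
    reason: str  # the "why", not just the "what"
    status: str  # "ok", "escalated", or "failed"


def run_step(
    agent_name: str,
    allowed_actions: Set[str],
    action: str,
    handler: Callable[[], str],
    log: List[Decision],
) -> str:
    # Explicit boundaries: refuse anything outside the declared capabilities.
    if action not in allowed_actions:
        log.append(Decision(agent_name, action, "outside declared capabilities", "escalated"))
        return "escalated"
    try:
        reason = handler()  # the handler returns the agent's stated reasoning
    except Exception as exc:
        # Graceful degradation: record the failure instead of crashing the workflow.
        log.append(Decision(agent_name, action, f"error: {exc}", "failed"))
        return "failed"
    log.append(Decision(agent_name, action, reason, "ok"))
    return "ok"


def broken_handler() -> str:
    raise RuntimeError("model timeout")


log: List[Decision] = []
statuses = [
    run_step("security", {"scan"}, "scan", lambda: "no open ports found", log),
    run_step("security", {"scan"}, "deploy", lambda: "n/a", log),
    run_step("security", {"scan"}, "scan", broken_handler, log),
]
```

All three calls return a status and leave a log entry, including the one whose handler raised. The caller can then decide whether a "failed" or "escalated" step should pause the workflow or route to a human.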

The Trade-offs That Kept Us Up at Night

Configuration-Driven vs. Code-Driven Agents

We chose configuration-driven. Agents are YAML/Markdown files, not code modules.

Why: Lower barrier for non-engineers, faster iteration, AI-native (agents can read their own configs), and portability.

The trade-off: Less flexibility than code. We accepted this because it forces simplicity.

Stateless vs. Stateful Agents

We chose a hybrid. Workflow state is external and persistent. Conversation state is internal and ephemeral.

Why: External state gives durability and observability. Internal state gives natural conversation flow. Because conversation state is discarded, anything worth keeping has to be summarized into workflow state, and that summarization step forces us to capture important context.
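A rough sketch of the hybrid split, assuming workflow state is a JSON file and conversation state is a local list that disappears when the task returns. The `WorkflowState` class, `run_task`, and the "keep the last turn" summarization are hypothetical placeholders, not how FAOSX stores or summarizes anything.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class WorkflowState:
    """External, persistent state: survives restarts and can be inspected."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.data: Dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def save(self, key: str, value: str) -> None:
        self.data[key] = value
        self.path.write_text(json.dumps(self.data))


def run_task(state: WorkflowState, key: str, turns: List[str]) -> None:
    conversation: List[str] = []  # internal, ephemeral: lives only for this call
    for turn in turns:
        conversation.append(turn)
    # Placeholder summarization: keep only the last turn. A real system
    # would likely ask a model to summarize the conversation.
    state.save(key, conversation[-1])


state_file = Path(tempfile.mkdtemp()) / "workflow_state.json"
run_task(WorkflowState(state_file), "review_findings", ["draft", "risk is high"])

# A fresh object reading the same file sees the summary, not the conversation.
reloaded = WorkflowState(state_file)
```

After the task finishes, only what was written through `save` is recoverable. That constraint is what makes the summarization step a forcing function for deciding which context matters.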

Central Orchestrator vs. Peer-to-Peer

We chose a lightweight orchestrator with direct collaboration.

The orchestrator handles logistics. But when agents need to collaborate on a decision, they can interact directly—what we call "Party Mode."

Why: Central orchestration prevents chaos. Direct collaboration enables emergent solutions.

The Workflow Engine: Three Versions Later

Version 1: The Turing-Complete Mistake

Our first workflow engine was essentially a programming language. Loops, conditionals, variables, functions.

Disaster. Workflows became impossible to understand.

Lesson: Workflow languages should be simple, not powerful.

Version 2: The Over-Simplified Mistake

Linear steps only. No conditionals. No parallelism.

Too constraining. Real work has branches.

Lesson: Simple doesn't mean simplistic.

Version 3: Structured Flexibility

Sequential steps, parallel steps, conditional branches, loops with limits, human gates.

That's it. In our experience, this covers 95% of real workflows.

name: architecture-review
steps:
  - agent: architect
    task: 'Review system design'
    output: review_findings

  - parallel:
      - agent: security
        task: 'Security assessment'
      - agent: performance
        task: 'Performance analysis'

  - condition:
      if: "review_findings.risk_level > 'medium'"
      then: detailed_review
      else: standard_approval

  - gate:
      type: human_approval
      approvers: [tech_lead]

What We'd Do Differently

Over-engineered: The Plugin System

We built elaborate plugin isolation—sandboxing, versioning, dependency resolution. Enterprise-grade architecture.

We've used maybe 10% of it. Two months building infrastructure for hypothetical requirements.

Under-engineered: Context Management

We treated context as simple: just pass relevant information to each agent.

Deciding what is relevant turns out to be very hard. We should have invested more architecture here from the start.

Under-estimated: Debugging Tools

We built agents before we built ways to understand what agents were doing. Debugging was painful.

Lesson: Observability isn't optional. Build it first.

Architecture Is a Living Document

Our foundation principles remain constant:

Trust but verify at every step
Intelligence at the edges, logistics at the center
Configuration over code where possible
Observable, traceable, recoverable

Everything else can change.

The architecture we have today isn't the architecture we started with. It's not the architecture we'll have next year. But the principles persist because they address fundamental truths about agentic systems.

Next up: Post 3 — The Agent Persona System

How we design agents with real expertise, not just generic AI responses. Why specialized personas dramatically outperform generic prompts.

This is Post 2 of 10 in the series "Building the Agentic Enterprise: The FAOSX Journey."

