The AI Operations Blueprint

An LLM-agnostic orchestration layer that connects to your AI provider, your systems and your team.
Inputs
You
"Check why response times spiked in EU and fix it if it's the CDN again"
Your systems
Alerts Logs Metrics Tickets Events APIs
Stourio
AI Orchestrator Engine
Analyzes & delegates
Routes to the right capability
Capabilities
AI Agents
When the situation is dynamic or new
Reasons about context
Adapts to unknowns
Deploys agents for new patterns
Diagnose & Repair
Escalate
Take Action
MCP Gateway
Air-gapped tool execution
Automation
When the pattern is known and repeatable
Follows predefined rules
Fast, consistent, high-volume
Workflows, APIs, escalations
Your rules
Thresholds & logic you define
Full visibility
Every decision, auditable
Override anytime
Kill switch. Always yours.

The core principle

AI Agents vs. AI Automation

AI agents are autonomous, reasoning entities that make decisions to achieve goals. AI automation follows pre-set, if-this-then-that rules for repetitive tasks. Agents excel in dynamic, unpredictable environments. Automation is best for consistent, high-volume workflows. Agents adapt. Automation is rigid. You need both.

Stourio is an orchestration layer that sits between your team, your systems, and your AI provider. It receives operational signals, uses an LLM to reason about them, then delegates work to either AI agents (for novel situations) or automation workflows (for known patterns).

The key insight: Stourio doesn't care which LLM powers it. You connect the AI provider that fits your needs, your budget, and your compliance requirements.

Any LLM with tool use / function calling support works. The orchestrator communicates via a standard interface: send context, receive a decision, execute the action.

Architecture overview

Five layers, each with a clear responsibility. The orchestrator is the decision point. Everything else is either an input, a capability, a guardrail, or persistence.

======================================================================
 SERVER A: STOURIO CORE ("THE BRAIN")
======================================================================

  INPUTS                   ORCHESTRATOR             ROUTING

  You (Chat) ──────▶                     ──────▶  AI Agents
                           Stourio Core
  Systems ────────▶        (Your LLM)    ──────▶  Automation
  (Webhooks)                                      Workflows
                                 ▲
  ┌──────────────────────────────│───────────────────────────┐
  │        PERSISTENCE: Redis Stream + Postgres              │
  └──────────────────────────────│───────────────────────────┘
                                 │
                                 │ Tool Execution Request (HTTP POST)
                                 ▼ Headers: Authorization Bearer

======================================================================
 SERVER B: MCP GATEWAY ("THE MUSCLE" - AIR-GAPPED)
======================================================================

  CAPABILITIES

  [Read Runbooks]     [Query Kibana]      [Scale AWS Nodes]
     (Context)        (Investigation)         (Action)
======================================================================

Layer by layer

1. Inputs

Two channels feed into the orchestrator. Both are always active.

You (Chat interface). A direct conversation channel where you talk to Stourio in natural language. "Check why response times spiked in EU and fix it if it's the CDN again." This can be a web app, a Slack bot, a Teams integration, or a mobile app. Standard WebSocket or REST endpoint that forwards your message to the orchestrator along with conversation history.

Your systems (Webhooks & Redis Streams). Stourio listens to your operational infrastructure through a high-throughput webhook API. When Grafana, Datadog, or PagerDuty fires an alert, it hits POST /api/webhook. To prevent the orchestrator from crashing during an alert storm, the webhook immediately drops the payload into a Redis Stream and returns a 202 Accepted. A background consumer worker dequeues the signals, processes them through the rules engine and LLM, and acknowledges (ACK) each one only after successful processing, giving at-least-once delivery so no alerts are dropped.

The combination is what makes this an assistant, not a pipeline. You can ask questions, give commands, and have a conversation. Meanwhile, your systems feed signals that Stourio processes autonomously in the background.

2. Orchestrator

The brain. Stourio receives an input (your message or a system signal), sends it to your connected LLM with the full context (your rules, available tools, conversation history), and gets back a decision.

The LLM doesn't execute anything directly. It returns one of five possible responses:

Gather more context — calls an MCP tool to investigate before deciding.
Delegate to an agent — routes to a specialized AI agent for reasoning-heavy work.
Trigger automation — fires a predefined workflow for a known pattern.
Ask you first — the feedback loop. For high-risk actions, Stourio comes back and asks for confirmation before proceeding.
Respond directly — answers your question or provides a status update.

The routing decision starts with a deterministic short-circuit. Incoming signals are first evaluated by a fast, deterministic rule engine (e.g., regex matches, exact event signatures). If a signal matches a known pattern, the system triggers the automation workflow directly, bypassing the LLM. If the situation is ambiguous or unmatched, the LLM evaluates the context and delegates to the correct agent. If the resulting action is high-risk, it asks the user first.
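The short-circuit can be sketched as a signature table checked before any LLM call. The pattern strings and workflow IDs below are invented for illustration:

```python
import re

# Known event signatures mapped to automation workflow IDs (illustrative).
KNOWN_SIGNATURES = {
    r"^cdn_cache_miss_rate_high$": "wf-purge-cdn-cache",
    r"^disk_usage_over_90pct$":    "wf-rotate-logs",
}

def route(signal: dict) -> tuple[str, str]:
    """Return ("automation", workflow_id) on an exact match, else defer to the LLM."""
    event = signal.get("event", "")
    for pattern, workflow_id in KNOWN_SIGNATURES.items():
        if re.match(pattern, event):
            return ("automation", workflow_id)   # known pattern: bypass the LLM
    return ("llm", "orchestrator")               # ambiguous: let the LLM decide
```

Because the table is checked first, a known urgent alert never waits on LLM latency, and the LLM only sees novel or ambiguous signals.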

3. Capabilities

Two lanes, each designed for a fundamentally different type of work.

AI Agents & The MCP Gateway. Agents are focused LLM loops with specialized roles (e.g., "Diagnose & Repair"). When an agent decides to take action or gather data, it does not execute code directly. Instead, it sends an HTTP POST request to a standalone MCP Gateway. This enforces an air-gapped security model: The Orchestrator ("Brain") lives on Server A and holds the LLM API keys. The MCP Gateway ("Muscle") lives on Server B and holds your AWS, database, and infrastructure credentials. If the LLM is compromised or hallucinates, it cannot touch your infrastructure directly; it can only request execution of strictly predefined tools on the Gateway.
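The Brain/Muscle split can be sketched as two functions that would live on different servers: one builds a signed HTTP request, the other checks a predefined tool allow-list before doing anything. The gateway URL and tool names are invented for illustration:

```python
# Server B's predefined tool set (illustrative names).
ALLOWED_TOOLS = {"read_runbook", "query_kibana", "scale_aws_nodes"}

def build_tool_request(tool: str, args: dict, token: str) -> dict:
    """Server A: the orchestrator holds no infra credentials, only a bearer token."""
    return {
        "url": "https://mcp-gateway.internal/execute",  # Server B endpoint (assumed)
        "headers": {"Authorization": f"Bearer {token}"},
        "json": {"tool": tool, "args": args},
    }

def gateway_execute(request_body: dict) -> dict:
    """Server B: refuse anything outside the predefined tool set."""
    tool = request_body.get("tool")
    if tool not in ALLOWED_TOOLS:
        return {"status": "rejected", "reason": f"unknown tool: {tool}"}
    return {"status": "accepted", "tool": tool}  # real impl dispatches to the tool
```

Even a fully compromised Brain can only ask the Muscle to run one of these named tools; it never touches AWS or database credentials directly.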

Agents are stored as templates in a library: a role description, a set of allowed tools, and constraints. The orchestrator selects and configures the right agent for the situation. Over time, you add new agent templates as you encounter new patterns. The system grows its capabilities through use.

Automation — for known, repeatable patterns. Standard workflow execution via an engine like Temporal, n8n, or plain API orchestration. The orchestrator triggers a predefined workflow by ID with parameters. The workflow runs its steps (health check, apply fix, validate, notify) and returns a result. Fast, consistent, no reasoning needed.

The bridge between the two: when automation encounters something unexpected or fails, it falls back to the agent lane. Patterns that agents solve repeatedly can be "promoted" to automation rules. The system learns which situations need thinking and which need executing.
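The automation lane and its agent fallback can be sketched as a step runner: each step is a callable returning True on success, and the first failure hands off to the agent lane instead of retrying blindly. Step names are illustrative:

```python
from typing import Callable

def run_workflow(steps: list[tuple[str, Callable[[], bool]]]) -> dict:
    """Run predefined steps in order; on the first failure, hand off to agents."""
    completed = []
    for name, step in steps:
        if not step():
            # Unexpected state: escalate to the agent lane for reasoning.
            return {"ok": False, "failed_at": name, "fallback": "agent"}
        completed.append(name)
    return {"ok": True, "steps": completed}
```

A healthy run walks health check, fix, validate, notify with no LLM calls at all; only the failure path costs tokens.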

4. Guardrails

Every decision passes through three control mechanisms.

Your rules. Stored in a database, injected into the orchestrator's context on every call. Risk thresholds, blast radius limits, time-of-day restrictions, approval requirements. You define them through an admin interface. They're versioned for audit trail.
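A rule check can be sketched as below, assuming rules are versioned rows with a blast-radius threshold and an approval flag. The field names and the example rule are invented, not Stourio's schema:

```python
# Illustrative versioned rule rows (would live in Postgres).
RULES = [
    {"id": "r1", "version": 3, "match": "scale_aws_nodes",
     "max_blast_radius": 2, "requires_approval": True},
]

def evaluate(action: str, blast_radius: int) -> dict:
    """Check an action against the rule store before the orchestrator proceeds."""
    for rule in RULES:
        if rule["match"] == action:
            if blast_radius > rule["max_blast_radius"]:
                return {"allowed": False, "rule": rule["id"]}       # hard block
            if rule["requires_approval"]:
                return {"allowed": True, "needs_approval": True,
                        "rule": rule["id"]}                          # feedback loop
    return {"allowed": True, "needs_approval": False, "rule": None}
```

Because the rules are data, not code, changing a threshold is a database write with a version bump, and the audit trail records which rule version was in force for each decision.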

Full visibility. Every orchestrator decision is logged: what input triggered it, what the LLM reasoned, what action was taken, what tools were used, what the outcome was. Agent execution traces record every sub-step. Everything is queryable: "show me all actions Stourio took on EU infrastructure last week."

Override anytime. A global circuit breaker. For AI Agents, this is implemented as middleware that checks a Redis flag before every tool execution. For Automation, the orchestrator calls the cancellation APIs of external engines (Temporal, n8n) to terminate running workflows. It halts both reasoning and distributed execution.
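Both halves of the override can be sketched as follows. A module-level flag stands in for the Redis key the middleware would check, and the cancellation callables stand in for Temporal/n8n API calls; all names are illustrative:

```python
KILL_SWITCH = {"engaged": False}  # in production: a Redis flag checked per call

def guarded_tool_call(tool_fn, *args):
    """Middleware: refuse every tool execution while the switch is engaged."""
    if KILL_SWITCH["engaged"]:
        raise RuntimeError("kill switch engaged: tool execution halted")
    return tool_fn(*args)

def engage_kill_switch(cancel_workflow_fns):
    """Flip the flag AND cancel externally running workflows (Temporal, n8n)."""
    KILL_SWITCH["engaged"] = True
    # Flipping the local flag is not enough: workflows already handed to an
    # external engine keep running unless that engine is told to cancel.
    return [cancel() for cancel in cancel_workflow_fns]
```

The second function is the important part: a kill switch that only gates new tool calls would leave in-flight Temporal or n8n workflows running to completion.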

5. Persistence

LLMs are stateless. Every call needs the full context. The persistence layer maintains continuity across conversations and actions.

| Store | Purpose | Recommended |
| --- | --- | --- |
| Conversation state | Chat history for each orchestrator call | PostgreSQL |
| Agent state | Running agent context, distributed locking (mutex) to prevent race conditions | PostgreSQL + Redis (with Redlock) |
| Rule store | User-defined rules, versioned | PostgreSQL |
| Audit log | Every decision and action, immutable | PostgreSQL (append-only) |
| Signal queue | Incoming system events awaiting processing | Redis Streams or SQS |
| Session cache | Active sessions, kill switch flags | Redis |

Connecting your AI provider

Stourio communicates with your LLM through a standard interface. Every provider that supports tool use / function calling works the same way from the orchestrator's perspective: send a message with context and tool definitions, receive a response with either text or a tool call.

// The orchestrator sends the same structure to any provider
Request:
  system_prompt → Stourio's role + user rules + available tools
  messages      → Conversation history + current input
  tools         → MCP Gateway endpoints + automation triggers

Response (one of):
  text      → Direct answer to the user
  tool_call → Action to execute (agent, automation, MCP query)

// Provider-specific adapters handle the API differences.
// OpenAI uses "functions", Anthropic uses "tools", etc.
// The orchestrator doesn't care — a thin adapter normalizes both.

A provider adapter layer translates between Stourio's internal format and each provider's API, acting as a strict security boundary. It validates every LLM tool call against a predefined schema (e.g., with Zod or Pydantic) before execution, and sanitizes every raw MCP response before injecting it back into the context window. Switching providers means mapping to this secure boundary, not rewriting the system.

Provider considerations

Different LLMs have different strengths. Models with strong reasoning (Claude Opus 4.6, GPT-5.2xhigh, Gemini 3.1 PRO) work better for the agent lane. Faster, cheaper models (Claude Haiku, Gemini 3 or GPT 5.2) work well for the orchestrator's routing decisions and simple automation triggers. You can use different models for different parts of the system.

Cost scales linearly with usage; the main driver is LLM token consumption. Using a smaller model for routing (orchestrator) and a larger model for reasoning (agents) optimizes the cost-to-quality ratio.

The feedback loop

Not every decision should be autonomous. When the orchestrator encounters a high-risk action (as defined by your rules), it pauses and comes back to you with a structured plan and a confirmation request: what it wants to do, why, what the risk is and what the blast radius would be.

You approve, reject, or modify. Every approval request has a strict Time-to-Live (TTL). If unapproved within the window, the action defaults to 'Reject' to prevent stale execution. Upon approval, the orchestrator performs a rapid state re-validation via MCP tools to ensure the environment hasn't changed before executing the action. This isn't a limitation — it's the core safety mechanism that makes autonomous operations viable in production. Without it, you're trusting an LLM with your infrastructure. With it, you're trusting an LLM that asks before doing anything dangerous.

The threshold for "high-risk" is yours to define. Some teams want confirmation before any production change. Others only want it for actions that affect multiple regions. The rule engine handles this.

Build sequence

You don't build all five layers at once. Start with the smallest useful version and expand.

Phase 1: Foundation

Orchestrator service + chat interface + one MCP server (pick your monitoring tool). At the end of this phase, you can talk to Stourio, it reads your alerts, and it reasons about them. No actions yet — just understanding and responding.

Phase 2: Guardrails

Rule engine + audit log + kill switch + feedback loop. This is the mandatory safety foundation. Rules are enforced, every routing decision is logged, and the distributed override mechanism is operational before any actions are allowed.

Phase 3: Actions

Two automation workflows for your most common known patterns + the Diagnose & Repair agent. Stourio can now fix known issues automatically and investigate unknown ones, operating strictly within the Phase 2 guardrails.

Phase 4: Scale

Additional MCP servers for your other systems + Escalate agent + Take Action agent. The full schema becomes operational across multiple integrations, and your growing runbook library will outgrow what a single context window can efficiently hold.

Phase 5: Growth

Agent template UI + pattern promotion (recurring agent solutions become automation rules). Admin interface for managing the agent library. The system learns from its own usage.

What can go wrong

| Risk | Impact | Mitigation |
| --- | --- | --- |
| LLM reasoning error | Wrong action executed | Guardrails layer, confirmation on high-risk, blast radius limits |
| LLM provider downtime | System stops reasoning | Queue signals, retry with backoff, fallback to automation-only mode |
| Prompt injection | Malicious signals manipulate the LLM | Sanitize all external inputs before including in LLM context |
| Agent loops | Agents calling agents indefinitely | Max depth limit (3-4 hops), timeout per agent execution |
| Rule conflicts | Contradictory rules cause unpredictable behavior | Validation on rule creation, priority ordering |
| Runaway automation | Destructive workflows continue executing after orchestrator shutdown | Kill switch tied directly to external workflow engine cancellation APIs, not just local middleware |
| Stale approvals | Executing an outdated plan on a changed infrastructure state causes secondary outages | TTL on all approval requests + mandatory state re-validation post-approval |
| Thundering herd (event storms) | Multiple agents spawned for the same root cause collide and corrupt infrastructure state | Signal debouncing and correlation windows at the queue layer before orchestrator processing |
| Open-ended command execution | Agent hallucinates a destructive terminal command, wiping production data or infrastructure | Strict command allow-lists, ephemeral least-privilege credentials, and an absolute ban on raw shell access for all agents |
| Probabilistic routing drift | LLM misroutes a known urgent issue to a slow reasoning agent instead of instant automation, breaching MTTR | Deterministic rules engine before the LLM to handle known alert signatures; restrict LLM routing strictly to novel or ambiguous signals |
| Agent state collision (race conditions) | Concurrent agents read stale state and execute conflicting actions on the same infrastructure component | Strict distributed locking (e.g., Redis Redlock) on target infrastructure before an agent begins reasoning or execution |
| Malformed LLM tool calls | The LLM hallucinates incorrect parameters or invalid JSON, causing external automation engines to panic or execute broken workflows | Strict schema validation at the adapter boundary; drop and retry any tool call that fails schema enforcement before it reaches the execution layer |

What you don't need

This architecture is deliberately simple. Standard web infrastructure plus an LLM API. Specifically, you do not need: custom ML models or training, GPU infrastructure, a complex multi-agent framework such as LangGraph or CrewAI (direct API calls are simpler and more reliable), or Kubernetes (unless you choose to over-engineer from day one).

The entire system runs on application servers, a database, a cache, and API calls to your LLM provider. That's the point. The intelligence comes from the model. The value comes from the orchestration.