The AI Operations Blueprint

An LLM-agnostic orchestration layer that connects to your AI provider, your systems and your team.
Inputs
You
"Check why response times spiked in EU and fix it if it's the CDN again"
Your systems
Alerts Logs Metrics Tickets Events APIs
Stourio
AI Orchestrator Engine
Analyzes & delegates
Routes to the right capability
Capabilities
AI Agents
When the situation is dynamic or new
Reasons about context
Adapts to unknowns
Deploys agents for new patterns
Diagnose & Repair
Escalate
Take Action
MCP Gateway
Air-gapped tool execution
Automation
When the pattern is known and repeatable
Follows predefined rules
Fast, consistent, high-volume
Workflows, APIs, escalations
Your rules
Thresholds & logic you define
Full visibility
Every decision, auditable
Override anytime
Kill switch. Always yours.

The core principle

AI Agents vs. AI Automation

AI agents are autonomous, reasoning entities that make decisions to achieve goals. AI automation follows pre-set, if-this-then-that rules for repetitive tasks. Agents excel in dynamic, unpredictable environments. Automation is best for consistent, high-volume workflows. Agents adapt. Automation is rigid. You need both.

Stourio is an orchestration layer that sits between your team, your systems, and your AI provider. It receives operational signals, uses an LLM to reason about them, then delegates work to either AI agents (for novel situations) or automation workflows (for known patterns).

The key insight: Stourio doesn't care which LLM powers it. You connect the AI provider that fits your needs, your budget, and your compliance requirements.

Any LLM with tool use / function calling support works. The orchestrator communicates via a standard interface: send context, receive a decision, execute the action.

Architecture overview

Five layers, each with a clear responsibility. The orchestrator is the decision point. Everything else is either an input, a capability, a guardrail, or persistence.

======================================================================
 SERVER A: STOURIO CORE ("THE BRAIN")
======================================================================

  INPUTS                   ORCHESTRATOR             ROUTING

  You (Chat) ──────▶                     ──────▶  AI Agents
                           Stourio Core
  Systems ────────▶        (Your LLM)    ──────▶  Automation
  (Webhooks)                                      Workflows
                                 ▲
  ┌──────────────────────────────│───────────────────────────┐
  │        PERSISTENCE: Redis Stream + Postgres              │
  └──────────────────────────────│───────────────────────────┘
                                 │
                                 │ Tool Execution Request (HTTP POST)
                                 ▼ Headers: Authorization Bearer

======================================================================
 SERVER B: MCP GATEWAY ("THE MUSCLE" - AIR-GAPPED)
======================================================================

  CAPABILITIES

  [Read Runbooks]     [Query Kibana]      [Scale AWS Nodes]
     (Context)        (Investigation)         (Action)
======================================================================

Layer by layer

1. Inputs

Two channels feed into the orchestrator. Both are always active.

You (Chat interface). A direct conversation channel where you talk to Stourio in natural language. "Check why response times spiked in EU and fix it if it's the CDN again." This can be a web app, a Slack bot, a Teams integration, or a mobile app. Standard WebSocket or REST endpoint that forwards your message to the orchestrator along with conversation history.

Your systems (Webhooks & Redis Streams). Stourio listens to your operational infrastructure through a high-throughput webhook API. When Grafana, Datadog, or PagerDuty fires an alert, it hits POST /api/webhook. To prevent the orchestrator from crashing during an alert storm, the webhook immediately drops the payload into a Redis Stream and returns a 202 Accepted. A background consumer worker dequeues the signals, processes them through the rules engine and LLM, and acknowledges (ACK) each one only after successful processing, giving at-least-once delivery so no alerts are dropped.

The combination is what makes this an assistant, not a pipeline. You can ask questions, give commands, and have a conversation. Meanwhile, your systems feed signals that Stourio processes autonomously in the background.

2. Orchestrator

The brain. Stourio receives an input (your message or a system signal), sends it to your connected LLM with the full context (your rules, available tools, conversation history), and gets back a decision.

The LLM doesn't execute anything directly. It returns one of five possible responses:

Gather more context — calls an MCP tool to investigate before deciding.
Delegate to an agent — routes to a specialized AI agent for reasoning-heavy work.
Trigger automation — fires a predefined workflow for a known pattern.
Ask you first — the feedback loop. For high-risk actions, Stourio comes back and asks for confirmation before proceeding.
Respond directly — answers your question or provides a status update.

The routing decision starts with a deterministic short-circuit. Incoming signals are first evaluated by a fast, deterministic rule engine (e.g., regex matches, exact event signatures). If a signal matches a known pattern, the system triggers the automation workflow directly, bypassing the LLM. If the situation is ambiguous or unmatched, the LLM evaluates the context and delegates to the correct agent. If the resulting action is high-risk, it asks the user first.
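The short-circuit can be sketched as a signature table checked before any LLM call. The pattern strings and workflow IDs below are invented for illustration:

```python
import re

# Known event signatures mapped to automation workflow IDs (illustrative).
KNOWN_SIGNATURES = {
    r"^cdn_cache_miss_rate_high$": "wf-purge-cdn-cache",
    r"^disk_usage_over_90pct$":    "wf-rotate-logs",
}

def route(signal: dict) -> tuple[str, str]:
    """Return ("automation", workflow_id) on an exact match, else defer to the LLM."""
    event = signal.get("event", "")
    for pattern, workflow_id in KNOWN_SIGNATURES.items():
        if re.match(pattern, event):
            return ("automation", workflow_id)   # known pattern: bypass the LLM
    return ("llm", "orchestrator")               # ambiguous: let the LLM decide
```

Because the table is checked first, a known urgent alert never waits on LLM latency, and the LLM only sees novel or ambiguous signals.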

3. Capabilities

Two lanes, each designed for a fundamentally different type of work.

AI Agents & The MCP Gateway. Agents are focused LLM loops with specialized roles (e.g., "Diagnose & Repair"). When an agent decides to take action or gather data, it does not execute code directly. Instead, it sends an HTTP POST request to a standalone MCP Gateway. This enforces an air-gapped security model: The Orchestrator ("Brain") lives on Server A and holds the LLM API keys. The MCP Gateway ("Muscle") lives on Server B and holds your AWS, database, and infrastructure credentials. If the LLM is compromised or hallucinates, it cannot touch your infrastructure directly; it can only request execution of strictly predefined tools on the Gateway.
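The Brain/Muscle split can be sketched as two functions that would live on different servers: one builds a signed HTTP request, the other checks a predefined tool allow-list before doing anything. The gateway URL and tool names are invented for illustration:

```python
# Server B's predefined tool set (illustrative names).
ALLOWED_TOOLS = {"read_runbook", "query_kibana", "scale_aws_nodes"}

def build_tool_request(tool: str, args: dict, token: str) -> dict:
    """Server A: the orchestrator holds no infra credentials, only a bearer token."""
    return {
        "url": "https://mcp-gateway.internal/execute",  # Server B endpoint (assumed)
        "headers": {"Authorization": f"Bearer {token}"},
        "json": {"tool": tool, "args": args},
    }

def gateway_execute(request_body: dict) -> dict:
    """Server B: refuse anything outside the predefined tool set."""
    tool = request_body.get("tool")
    if tool not in ALLOWED_TOOLS:
        return {"status": "rejected", "reason": f"unknown tool: {tool}"}
    return {"status": "accepted", "tool": tool}  # real impl dispatches to the tool
```

Even a fully compromised Brain can only ask the Muscle to run one of these named tools; it never touches AWS or database credentials directly.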

Agents are stored as templates in a library: a role description, a set of allowed tools, and constraints. The orchestrator selects and configures the right agent for the situation. Over time, you add new agent templates as you encounter new patterns. The system grows its capabilities through use.

Automation — for known, repeatable patterns. Standard workflow execution via an engine like Temporal, n8n, or plain API orchestration. The orchestrator triggers a predefined workflow by ID with parameters. The workflow runs its steps (health check, apply fix, validate, notify) and returns a result. Fast, consistent, no reasoning needed.

The bridge between the two: when automation encounters something unexpected or fails, it falls back to the agent lane. Patterns that agents solve repeatedly can be "promoted" to automation rules. The system learns which situations need thinking and which need executing.
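The automation lane and its agent fallback can be sketched as a step runner: each step is a callable returning True on success, and the first failure hands off to the agent lane instead of retrying blindly. Step names are illustrative:

```python
from typing import Callable

def run_workflow(steps: list[tuple[str, Callable[[], bool]]]) -> dict:
    """Run predefined steps in order; on the first failure, hand off to agents."""
    completed = []
    for name, step in steps:
        if not step():
            # Unexpected state: escalate to the agent lane for reasoning.
            return {"ok": False, "failed_at": name, "fallback": "agent"}
        completed.append(name)
    return {"ok": True, "steps": completed}
```

A healthy run walks health check, fix, validate, notify with no LLM calls at all; only the failure path costs tokens.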

4. Guardrails

Every decision passes through three control mechanisms.

Your rules. Stored in a database, injected into the orchestrator's context on every call. Risk thresholds, blast radius limits, time-of-day restrictions, approval requirements. You define them through an admin interface. They're versioned for audit trail.
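A rule check can be sketched as below, assuming rules are versioned rows with a blast-radius threshold and an approval flag. The field names and the example rule are invented, not Stourio's schema:

```python
# Illustrative versioned rule rows (would live in Postgres).
RULES = [
    {"id": "r1", "version": 3, "match": "scale_aws_nodes",
     "max_blast_radius": 2, "requires_approval": True},
]

def evaluate(action: str, blast_radius: int) -> dict:
    """Check an action against the rule store before the orchestrator proceeds."""
    for rule in RULES:
        if rule["match"] == action:
            if blast_radius > rule["max_blast_radius"]:
                return {"allowed": False, "rule": rule["id"]}       # hard block
            if rule["requires_approval"]:
                return {"allowed": True, "needs_approval": True,
                        "rule": rule["id"]}                          # feedback loop
    return {"allowed": True, "needs_approval": False, "rule": None}
```

Because the rules are data, not code, changing a threshold is a database write with a version bump, and the audit trail records which rule version was in force for each decision.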

Full visibility. Every orchestrator decision is logged: what input triggered it, what the LLM reasoned, what action was taken, what tools were used, what the outcome was. Agent execution traces record every sub-step. Everything is queryable: "show me all actions Stourio took on EU infrastructure last week."

Override anytime. A global circuit breaker. For AI Agents, this is implemented as middleware that checks a Redis flag before every tool execution. For Automation, the orchestrator calls the cancellation APIs of external engines (Temporal, n8n) to terminate running workflows. It halts both reasoning and distributed execution.
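Both halves of the override can be sketched as follows. A module-level flag stands in for the Redis key the middleware would check, and the cancellation callables stand in for Temporal/n8n API calls; all names are illustrative:

```python
KILL_SWITCH = {"engaged": False}  # in production: a Redis flag checked per call

def guarded_tool_call(tool_fn, *args):
    """Middleware: refuse every tool execution while the switch is engaged."""
    if KILL_SWITCH["engaged"]:
        raise RuntimeError("kill switch engaged: tool execution halted")
    return tool_fn(*args)

def engage_kill_switch(cancel_workflow_fns):
    """Flip the flag AND cancel externally running workflows (Temporal, n8n)."""
    KILL_SWITCH["engaged"] = True
    # Flipping the local flag is not enough: workflows already handed to an
    # external engine keep running unless that engine is told to cancel.
    return [cancel() for cancel in cancel_workflow_fns]
```

The second function is the important part: a kill switch that only gates new tool calls would leave in-flight Temporal or n8n workflows running to completion.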

5. Persistence

LLMs are stateless. Every call needs the full context. The persistence layer maintains continuity across conversations and actions.

| Store | Purpose | Recommended |
| --- | --- | --- |
| Conversation state | Chat history for each orchestrator call | PostgreSQL |
| Agent state | Running agent context, distributed locking (mutex) to prevent race conditions | PostgreSQL + Redis (with Redlock) |
| Rule store | User-defined rules, versioned | PostgreSQL |
| Audit log | Every decision and action, immutable | PostgreSQL (append-only) |
| Signal queue | Incoming system events awaiting processing | Redis Streams or SQS |
| Session cache | Active sessions, kill switch flags | Redis |

Connecting your AI provider

Stourio communicates with your LLM through a standard interface. Every provider that supports tool use / function calling works the same way from the orchestrator's perspective: send a message with context and tool definitions, receive a response with either text or a tool call.

// The orchestrator sends the same structure to any provider
Request:
  system_prompt → Stourio's role + user rules + available tools
  messages      → Conversation history + current input
  tools         → MCP Gateway endpoints + automation triggers

Response (one of):
  text      → Direct answer to the user
  tool_call → Action to execute (agent, automation, MCP query)

// Provider-specific adapters handle the API differences.
// OpenAI uses "functions", Anthropic uses "tools", etc.
// The orchestrator doesn't care — a thin adapter normalizes both.

A provider adapter layer translates between Stourio's internal format and each provider's API, acting as a strict security boundary. It validates every LLM tool call against a predefined schema (e.g., with Zod or Pydantic) before execution, and sanitizes every raw MCP response before injecting it back into the context window. Switching providers means mapping to this secure boundary, not rewriting the system.

Provider considerations

Different LLMs have different strengths. Models with strong reasoning (Claude Opus 4.6, GPT-5.2xhigh, Gemini 3.1 PRO) work better for the agent lane. Faster, cheaper models (Claude Haiku, Gemini 3 or GPT 5.2) work well for the orchestrator's routing decisions and simple automation triggers. You can use different models for different parts of the system.

Cost scales linearly with usage; the main driver is LLM token consumption. Using a smaller model for routing (orchestrator) and a larger model for reasoning (agents) optimizes the cost-to-quality ratio.

The feedback loop

Not every decision should be autonomous. When the orchestrator encounters a high-risk action (as defined by your rules), it pauses and comes back to you with a structured plan and a confirmation request: what it wants to do, why, what the risk is and what the blast radius would be.

You approve, reject, or modify. Every approval request has a strict Time-to-Live (TTL). If unapproved within the window, the action defaults to 'Reject' to prevent stale execution. Upon approval, the orchestrator performs a rapid state re-validation via MCP tools to ensure the environment hasn't changed before executing the action. This isn't a limitation — it's the core safety mechanism that makes autonomous operations viable in production. Without it, you're trusting an LLM with your infrastructure. With it, you're trusting an LLM that asks before doing anything dangerous.

The threshold for "high-risk" is yours to define. Some teams want confirmation before any production change. Others only want it for actions that affect multiple regions. The rule engine handles this.

Build sequence

You don't build all five layers at once. Start with the smallest useful version and expand.

Phase 1: Foundation

Orchestrator service + chat interface + one MCP server (pick your monitoring tool). At the end of this phase, you can talk to Stourio, it reads your alerts, and it reasons about them. No actions yet — just understanding and responding.

Phase 2: Guardrails

Rule engine + audit log + kill switch + feedback loop. This is the mandatory safety foundation. Rules are enforced, every routing decision is logged, and the distributed override mechanism is operational before any actions are allowed.

Phase 3: Actions

Two automation workflows for your most common known patterns + the Diagnose & Repair agent. Stourio can now fix known issues automatically and investigate unknown ones, operating strictly within the Phase 2 guardrails.

Phase 4: Scale

Additional MCP servers for your other systems + Escalate agent + Take Action agent. The full schema becomes operational across multiple integrations, and your growing runbook library will outgrow what a single context window can efficiently hold.

Phase 5: Growth

Agent template UI + pattern promotion (recurring agent solutions become automation rules). Admin interface for managing the agent library. The system learns from its own usage.

What can go wrong

| Risk | Impact | Mitigation |
| --- | --- | --- |
| LLM reasoning error | Wrong action executed | Guardrails layer, confirmation on high-risk, blast radius limits |
| LLM provider downtime | System stops reasoning | Queue signals, retry with backoff, fallback to automation-only mode |
| Prompt injection | Malicious signals manipulate the LLM | Sanitize all external inputs before including in LLM context |
| Agent loops | Agents calling agents indefinitely | Max depth limit (3-4 hops), timeout per agent execution |
| Rule conflicts | Contradictory rules cause unpredictable behavior | Validation on rule creation, priority ordering |
| Runaway automation | Destructive workflows continue executing after orchestrator shutdown | Kill switch tied directly to external workflow engine cancellation APIs, not just local middleware |
| Stale approvals | Executing an outdated plan on a changed infrastructure state causes secondary outages | TTL on all approval requests + mandatory state re-validation post-approval |
| Thundering herd (event storms) | Multiple agents spawned for the same root cause collide and corrupt infrastructure state | Signal debouncing and correlation windows at the queue layer before orchestrator processing |
| Open-ended command execution | Agent hallucinates a destructive terminal command, wiping production data or infrastructure | Strict command allow-lists, ephemeral least-privilege credentials, and an absolute ban on raw shell access for all agents |
| Probabilistic routing drift | LLM misroutes a known urgent issue to a slow reasoning agent instead of instant automation, breaching MTTR | Deterministic rules engine before the LLM to handle known alert signatures; restrict LLM routing strictly to novel or ambiguous signals |
| Agent state collision (race conditions) | Concurrent agents read stale state and execute conflicting actions on the same infrastructure component | Strict distributed locking (e.g., Redis Redlock) on target infrastructure before an agent begins reasoning or execution |
| Malformed LLM tool calls | The LLM hallucinates incorrect parameters or invalid JSON, causing external automation engines to panic or execute broken workflows | Strict schema validation at the adapter boundary; drop and retry any tool call that fails schema enforcement before it reaches the execution layer |

What you don't need

This architecture is deliberately simple. Standard web infrastructure plus an LLM API. Specifically, you do not need: custom ML models or training, GPU infrastructure, a complex multi-agent framework such as LangGraph or CrewAI (direct API calls are simpler and more reliable), or Kubernetes (unless you choose to over-engineer from day one).

The entire system runs on application servers, a database, a cache, and API calls to your LLM provider. That's the point. The intelligence comes from the model. The value comes from the orchestration.