Engine Documentation

Complete technical reference for the Stourio Core engine and the MCP Gateway. Two repositories, two servers, one system.

Deployment architecture

The system runs across two isolated servers. The engine Server holds the orchestration logic, all state, and the automation engine. It has no access to your infrastructure credentials. The MCP Server holds only the gateway that executes privileged tool calls. It sits in a private subnet and accepts traffic exclusively from the engine Server.

    Server 1: engine Server                     Server 2: MCP Server
    Public or internal network                  Private subnet only

    ┌────────────────────────────┐              ┌──────────────────────┐
    │  Stourio Core :8000        │              │  MCP Gateway :8080   │
    │  Orchestrator + API        │              │  Tool execution      │
    │                            │     POST     │                      │
    │  default_tool_executor ────│──/execute───▶│  TOOL_REGISTRY       │
    │                            │   + Bearer   │  ├ read_runbook      │
    ├────────────────────────────┤    token     │  ├ get_metrics       │
    │  PostgreSQL :5432          │              │  ├ get_logs          │
    │  Redis      :6379          │              │  ├ execute_remed.    │
    │  n8n        :5678          │              │  ├ send_notif.       │
    │  Jaeger     :16686         │              │  ├ call_api          │
    └────────────────────────────┘              │  └ generate_report   │
                                                └──────────────────────┘

    No infrastructure credentials               Holds AWS keys, service
    on this server.                             tokens, internal docs.
                                                Firewall: allow only
                                                engine Server IP.

Why two servers?

Credential isolation. The orchestrator sends LLM-generated tool calls to the gateway. If the orchestrator is compromised, the attacker gets conversation history and routing logic but never touches your infrastructure keys. The gateway validates every request against a shared secret and only executes tools from its own registry.

Server 1: Stourio Core engine

Repository: stourio-core-engine. This is the orchestration brain. It receives inputs (user chat or system webhooks), routes them through a rules engine and an LLM, and delegates work to agents or automation workflows. All state is persisted to PostgreSQL. All real-time coordination goes through Redis. All tool execution is forwarded to the MCP Gateway over the network.

Prerequisites

- Docker + Docker Compose
- At least one LLM API key
- Python 3.12 (for key generation script)

Quick start

    # 1. Clone and configure
    cp .env.example .env
    python3 scripts/generate_key.py   # generates STOURIO_API_KEY

    # 2. Add your LLM keys to .env
    # 3. Change POSTGRES_PASSWORD and REDIS_PASSWORD from defaults

    # 4. Start everything
    docker compose up -d

    # 5. Verify
    curl http://localhost:8000/       # Should return JSON with version
    curl http://localhost:8000/docs   # Swagger UI

On first start, the system creates all PostgreSQL tables and seeds four default safety rules. The signal consumer worker begins listening on the Redis stream immediately.

Infrastructure services

Five containers run on the engine Server via Docker Compose. Stourio publishes port 8000 to the host. n8n (:5678) and Jaeger (:16686) are published on 127.0.0.1 only, for local admin access. PostgreSQL and Redis are internal-only (no published ports).

1. Stourio Core

The FastAPI application. Hosts the API, the orchestrator, the agent runtime, the rules engine, and the automation dispatcher. Runs on port 8000.

On startup, it initializes the database schema, seeds default rules, configures OpenTelemetry tracing to Jaeger, and starts the background signal consumer worker that dequeues system events from Redis Streams.

2. PostgreSQL

The permanent state store. Four tables are created automatically on first start:

Table                  Purpose                                                  Key columns
audit_log              Immutable record of every decision, action, and event    action, detail, input_id, execution_id, risk_level
conversation_messages  Chat history per conversation                            conversation_id, role, content
rules                  User-defined routing rules (versioned, active/inactive)  pattern, pattern_type, action, risk_level, automation_id
approvals              Pending and resolved approval requests                   action_description, status, expires_at

Runs on postgres:16-alpine. Internal port 5432 only. Connection pool size: 10 (configurable via SQLAlchemy engine).

3. Redis

Handles five distinct responsibilities through different data structures:

Function           Redis type                 Key pattern
Signal queue       Stream + Consumer Group    stourio:signals
Kill switch        Key/Value                  stourio:kill_switch
Approval cache     Key/Value with TTL         stourio:approval:{id}
Distributed locks  Key/Value with NX + TTL    stourio:lock:{resource}
Rate limiting      Key/Value with INCR + TTL  stourio:ratelimit:{ip}:{path}:{window}

Runs on redis:7-alpine. Requires password authentication. Max memory 256MB with LRU eviction. Internal port 6379 only.

Signals use consumer groups for reliable delivery. A signal is only removed from the stream after the orchestrator acknowledges successful processing. If the worker crashes, unacknowledged signals are redelivered on restart.
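The acknowledgement behaviour can be modelled in a few lines. The sketch below is an in-memory stand-in, not the engine's worker: `SignalStream` and its `add`, `read`, and `ack` methods are hypothetical names that mimic the "pending until acknowledged" semantics described above, without talking to Redis.

```python
from dataclasses import dataclass, field


@dataclass
class SignalStream:
    """In-memory model of a stream read by one consumer group (illustrative only)."""
    entries: dict = field(default_factory=dict)  # id -> payload, kept until acknowledged
    next_id: int = 1

    def add(self, payload: dict) -> int:
        """Enqueue a signal and return its id."""
        entry_id = self.next_id
        self.entries[entry_id] = payload
        self.next_id += 1
        return entry_id

    def read(self) -> list:
        """Return every unacknowledged entry, oldest first (includes redeliveries)."""
        return list(self.entries.items())

    def ack(self, entry_id: int) -> None:
        """Acknowledge successful processing; the entry is not delivered again."""
        self.entries.pop(entry_id, None)


stream = SignalStream()
first = stream.add({"title": "High CPU on web-server-03"})
second = stream.add({"title": "Disk full on db-01"})

stream.read()          # worker receives both signals
stream.ack(first)      # ...processes and acknowledges only the first, then crashes
redelivered = stream.read()   # after restart, only the unacknowledged one comes back
```

The property to preserve in a real worker is the ordering: acknowledge only after processing has succeeded, so a crash in between results in a redelivery rather than a lost signal.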

4. n8n

The deterministic automation engine. When the orchestrator triggers a workflow, it sends a POST request to http://n8n:5678/webhook/stourio with the workflow ID, execution context, and step definitions. n8n executes the steps sequentially (health check, apply fix, validate, notify) and returns a result.

Exposed on 127.0.0.1:5678 for local admin access to the visual workflow editor. Not exposed to the public network.

5. Jaeger

Receives OpenTelemetry traces from the Stourio application via OTLP gRPC on internal port 4317. The web UI is available at 127.0.0.1:16686 for viewing request traces, orchestrator decisions, agent execution steps, and timing data.

Stourio instruments every orchestrator call with custom spans: signal source, routing decision, agent type, execution duration. Every FastAPI endpoint is auto-instrumented via opentelemetry-instrumentation-fastapi.

API reference

All endpoints require the X-STOURIO-KEY header. If the key is not set in the environment, the system rejects all requests with HTTP 503. The full interactive API documentation is available at /docs (Swagger UI).

Method  Endpoint             Purpose                                              Rate limit
POST    /api/chat            Send a user message through the orchestrator         30/min
POST    /api/webhook         Ingest a system signal (queued, returns 202)         120/min
GET     /api/approvals       List pending approval requests                       60/min
POST    /api/approvals/{id}  Approve or reject an action                          60/min
POST    /api/kill            Activate the kill switch (halt all operations)       5/min
POST    /api/resume          Deactivate the kill switch                           5/min
GET     /api/rules           List all active rules                                30/min
POST    /api/rules           Create a new rule                                    30/min
DELETE  /api/rules/{id}      Delete a rule                                        30/min
GET     /api/status          System status, kill switch state, pending approvals  60/min
GET     /api/audit           Recent audit log entries (default: 50)               30/min

Chat request

    POST /api/chat

    Headers:
      X-STOURIO-KEY: your-api-key

    Body:
    {
      "message": "Why is latency high on the EU CDN?",
      "conversation_id": "optional-existing-id"   // omit to start new
    }

    Response:
    {
      "conversation_id": "01JARX...",
      "status": "completed",        // or awaiting_approval, needs_info
      "message": "The agent found...",
      "type": "agent",              // or direct, automation
      "execution_id": "01JARX...",
      "steps": [...]
    }

Webhook signal

    POST /api/webhook

    Headers:
      X-STOURIO-KEY: your-api-key

    Body:
    {
      "source": "datadog",
      "event_type": "alert",
      "title": "High CPU on web-server-03",
      "severity": "critical",
      "payload": {
        "host": "web-server-03",
        "cpu_percent": 97.3,
        "duration_minutes": 5
      }
    }

    Response: 202 Accepted
    {
      "status": "queued",
      "message": "Signal accepted for correlation."
    }

Webhook signals are enqueued to a Redis Stream and processed asynchronously by the background consumer worker. The endpoint returns immediately to prevent blocking your monitoring system's webhook delivery.
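For monitoring systems that need a custom sender, the request above can be assembled with the Python standard library. This is a minimal sketch: `build_signal_request` is a hypothetical helper, and the base URL and key are placeholders taken from the quick start.

```python
import json
import urllib.request


def build_signal_request(base_url: str, api_key: str, signal: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/webhook request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/webhook",
        data=json.dumps(signal).encode("utf-8"),
        headers={"X-STOURIO-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_signal_request(
    "http://localhost:8000",
    "your-api-key",
    {
        "source": "datadog",
        "event_type": "alert",
        "title": "High CPU on web-server-03",
        "severity": "critical",
        "payload": {"host": "web-server-03", "cpu_percent": 97.3, "duration_minutes": 5},
    },
)
# To send it: urllib.request.urlopen(request, timeout=5)
# A 202 status means the signal was queued, not that it has been processed.
```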

Orchestrator routing

Every input follows the same five-step pipeline: kill switch check, deterministic rules evaluation, LLM routing (if no rule matched), execution, and result return.

    Input received
          │
          ▼
    Kill switch active? ──yes──▶ Return "System halted"
          │ no
          ▼
    Rules engine match?
          │
          ├── hard_reject ──────────▶ Block. Return rejection message.
          ├── require_approval ─────▶ Create approval request. Pause.
          ├── trigger_automation ───▶ Fire workflow via n8n. Return result.
          ├── force_agent ──────────▶ Skip LLM routing, go to agent.
          │
          └── No match ──▶ Send to LLM with routing tools
                    │
                    ├── route_to_agent
                    │     ├── risk high/critical ──▶ Require approval first
                    │     └── risk low/medium ─────▶ Execute immediately
                    ├── route_to_automation ──▶ Fire workflow
                    ├── respond_directly ─────▶ Return text
                    └── request_more_info ────▶ Ask for clarification

The rules engine runs before the LLM on every request. Known patterns (destructive commands, known alert signatures) are handled deterministically with zero LLM token cost. The LLM only sees inputs that don't match any rule.
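The ordering of these checks can be sketched as a single function. This is an illustration of the decision order only, not the code in src/orchestrator/core.py: `route`, `match_rule`, and `llm_route` are hypothetical names, and the LLM is represented by a plain callable returning a (decision, risk) pair.

```python
from typing import Callable, Optional, Tuple


def route(
    text: str,
    kill_switch_active: bool,
    match_rule: Callable[[str], Optional[str]],
    llm_route: Callable[[str], Tuple[str, str]],
) -> str:
    """Return a label for the branch an input would take (illustrative only)."""
    if kill_switch_active:
        return "halted"
    action = match_rule(text)          # deterministic, spends no LLM tokens
    if action is not None:
        return action                  # e.g. hard_reject, require_approval
    decision, risk = llm_route(text)   # reached only when no rule matched
    if decision == "route_to_agent" and risk in ("high", "critical"):
        return "require_approval"      # risky agent work pauses for a human
    return decision


# Stand-ins for the rules engine and the routing LLM.
def demo_rule(text: str) -> Optional[str]:
    return "hard_reject" if "rm -rf /" in text else None


def demo_llm(text: str) -> Tuple[str, str]:
    return ("route_to_agent", "high")


print(route("rm -rf / --no-preserve-root", False, demo_rule, demo_llm))
print(route("restart the cache cluster", False, demo_rule, demo_llm))
```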

Rules engine

Rules are stored in PostgreSQL, cached in memory, and evaluated in order (first match wins). Four pattern types are supported:

Pattern type   Matches against                  Example pattern
regex          Input content (sanitized + raw)  DROP\s+(DATABASE|TABLE)
keyword        Normalized input text            production deploy
event_type     Signal header text               alert:critical
payload_match  Parsed webhook JSON payload      severity:critical

Regex patterns are evaluated against both the raw input and a sanitized version that strips SQL comments, C-style block comments, and excessive whitespace. This prevents obfuscation bypasses on destructive commands.
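A minimal sketch of this dual evaluation follows. It is not the code in src/rules/engine.py: the `sanitize` and `matches` helpers, the choice to replace comments with a space, and case-insensitive matching are all assumptions made for illustration.

```python
import re

BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)  # C-style /* ... */
LINE_COMMENT = re.compile(r"--[^\n]*")               # SQL "--" to end of line
WHITESPACE = re.compile(r"\s+")


def sanitize(text: str) -> str:
    """Strip comments and collapse whitespace so split keywords rejoin."""
    text = BLOCK_COMMENT.sub(" ", text)
    text = LINE_COMMENT.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def matches(pattern: str, text: str) -> bool:
    """Check the pattern against both the raw and the sanitized input."""
    regex = re.compile(pattern, re.IGNORECASE)
    return bool(regex.search(text) or regex.search(sanitize(text)))


pattern = r"DROP\s+(DATABASE|TABLE)"
obfuscated = "DROP/**/TABLE users"   # no whitespace between the keywords
print(bool(re.search(pattern, obfuscated)))  # the raw text alone does not match
print(matches(pattern, obfuscated))          # the sanitized text does
```

Checking both forms matters: the sanitized version catches comment-based obfuscation, while the raw version still catches patterns that sanitization might alter.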

Default safety rules

Seeded automatically on first start if no rules exist:

Rule             Pattern                  Action              Risk
prevent_db_drop  DROP\s+(DATABASE|TABLE)  Require approval    Critical
block_ssh_root   ssh\s+root@              Hard reject         Critical
block_rm_rf      rm\s+-rf\s+/             Hard reject         Critical
auto_scale_cpu   CPU\s*>\s*9[0-9]%        Trigger automation  Low
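To see how first-match-wins evaluation plays out with these defaults, the patterns can be tried against sample inputs. This sketch assumes the table order is the evaluation order and that matching is case-insensitive; neither is confirmed here, and `first_match` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# (name, pattern, action) taken from the default safety rules table.
DEFAULT_RULES = [
    ("prevent_db_drop", r"DROP\s+(DATABASE|TABLE)", "require_approval"),
    ("block_ssh_root", r"ssh\s+root@", "hard_reject"),
    ("block_rm_rf", r"rm\s+-rf\s+/", "hard_reject"),
    ("auto_scale_cpu", r"CPU\s*>\s*9[0-9]%", "trigger_automation"),
]


def first_match(text: str) -> Optional[Tuple[str, str]]:
    """Return (name, action) for the first rule whose pattern matches, else None."""
    for name, pattern, action in DEFAULT_RULES:
        if re.search(pattern, text, re.IGNORECASE):
            return name, action
    return None


for sample in ("please DROP TABLE users", "CPU > 97% on web-server-03", "CPU > 89%"):
    print(sample, "->", first_match(sample))
```

Because evaluation stops at the first hit, an input that matches several rules takes the action of the earliest one, so ordering more restrictive rules first is the safer default.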

Creating a rule via API

    POST /api/rules

    {
      "name": "block_prod_delete",
      "pattern": "DELETE.*FROM.*production",
      "pattern_type": "regex",
      "action": "hard_reject",
      "risk_level": "critical"
    }

    // Actions: require_approval, hard_reject, trigger_automation, force_agent, allow
    // Risk levels: low, medium, high, critical
    // For trigger_automation, include "automation_id": "workflow_id_here"

Agent templates

Agents are stored as templates in src/agents/runtime.py. Each template defines a role (system prompt), a set of allowed tools, and a maximum step count. The orchestrator selects the right template based on the LLM's routing decision, then the agent runtime executes it in a loop: LLM call, tool call, LLM call, tool call, until the agent produces a final text response or hits the step limit.

Template         Role                                                             Tools                                                                            Max steps
diagnose_repair  Diagnose system issues, fetch runbooks, propose and apply fixes  get_system_metrics, get_recent_logs, execute_remediation, read_internal_runbook  8
escalate         Summarize situation, assess severity, notify the right people    send_notification                                                                4
take_action      General-purpose: API calls, report generation, data lookups      call_api, generate_report                                                        6

Every tool call from every agent is routed through default_tool_executor, which forwards it to the MCP Gateway's /execute endpoint. The agent never executes tools locally.
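The loop itself is small. The sketch below shows its shape under simplifying assumptions: the LLM is a callable that returns either a final `text` or a `tool` request, and `execute_tool` stands in for the forwarding executor. The name `run_agent` and the reply format are hypothetical and do not mirror the real runtime.

```python
from typing import Callable


def run_agent(
    llm: Callable[[list], dict],
    execute_tool: Callable[[str, dict], dict],
    allowed_tools: set,
    max_steps: int,
) -> str:
    """Alternate LLM calls and tool calls until a final text or the step limit."""
    history: list = []
    for _ in range(max_steps):
        reply = llm(history)
        if "text" in reply:                     # final answer, stop looping
            return reply["text"]
        name, args = reply["tool"], reply.get("arguments", {})
        if name not in allowed_tools:
            result = {"error": f"tool '{name}' is not allowed for this agent"}
        else:
            result = execute_tool(name, args)   # forwarded, never run locally
        history.append({"tool": name, "result": result})
    return "Step limit reached without a final answer."


# Scripted stand-in for the LLM: one tool call, then a final answer.
script = iter([
    {"tool": "get_system_metrics", "arguments": {"host": "web-server-03"}},
    {"text": "CPU is saturated on web-server-03."},
])
answer = run_agent(
    llm=lambda history: next(script),
    execute_tool=lambda name, args: {"cpu_percent": 97.3},
    allowed_tools={"get_system_metrics"},
    max_steps=8,
)
print(answer)
```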

Agent safety mechanisms

Each agent loop checks the kill switch before every LLM call. Each agent acquires a distributed lock with a fencing token on its work resource. If the lock is overtaken by a newer process, the agent terminates. A background heartbeat extends the lock TTL every 10 seconds while the agent is active.
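The fencing-token idea can be illustrated without Redis. In the sketch below, `FencedLockStore` is a hypothetical in-memory stand-in: each successful acquire hands out a strictly larger token, and a holder whose token is no longer current fails its next heartbeat. The 30-second TTL in the usage is an assumed value; only the 10-second heartbeat interval comes from the text above.

```python
import itertools
from typing import Optional


class FencedLockStore:
    """In-memory lock with a monotonically increasing fencing token (illustrative)."""

    def __init__(self) -> None:
        self._tokens = itertools.count(1)
        self._holders: dict = {}   # resource -> (token, expires_at)

    def acquire(self, resource: str, now: float, ttl: float) -> Optional[int]:
        """Take the lock if it is free or expired; return a token, or None if held."""
        current = self._holders.get(resource)
        if current is not None and current[1] > now:
            return None
        token = next(self._tokens)
        self._holders[resource] = (token, now + ttl)
        return token

    def still_owner(self, resource: str, token: int) -> bool:
        """True only while no newer token has taken over the resource."""
        current = self._holders.get(resource)
        return current is not None and current[0] == token

    def heartbeat(self, resource: str, token: int, now: float, ttl: float) -> bool:
        """Extend the TTL; returns False if the lock was overtaken."""
        if not self.still_owner(resource, token):
            return False
        self._holders[resource] = (token, now + ttl)
        return True


store = FencedLockStore()
old = store.acquire("web-server-03", now=0, ttl=30)
store.heartbeat("web-server-03", old, now=10, ttl=30)   # lock now expires at t=40
new = store.acquire("web-server-03", now=41, ttl=30)    # old holder stalled, lock expired
print(store.heartbeat("web-server-03", old, now=42, ttl=30))  # False: old agent must stop
```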

Adding a new agent template

Add a new entry to the AGENT_TEMPLATES dictionary in src/agents/runtime.py. The template ID must also be added to the orchestrator's routing tools enum in src/orchestrator/core.py (the agent_type enum list in the route_to_agent tool definition).

    # In src/agents/runtime.py, add to AGENT_TEMPLATES:
    "security_audit": AgentTemplate(
        id="security_audit",
        name="Security Audit",
        role="""You are a security agent. Analyze access logs, check for anomalies, and report findings.""",
        tools=[
            ToolDefinition(
                name="get_recent_logs",
                description="Retrieve recent log entries",
                parameters={...},
            ),
        ],
        max_steps=6,
    ),

    # Then in src/orchestrator/core.py, update the route_to_agent tool:
    # "enum": ["diagnose_repair", "escalate", "take_action", "security_audit"]

Automation workflows

Defined in src/automation/workflows.py. Each workflow has an ID, a name, and a list of steps. When triggered, the orchestrator sends the full step definition to n8n's webhook endpoint. n8n handles the actual execution.

Workflow ID            Name                   Steps
auto_scale_horizontal  Horizontal Auto-Scale  Get instance count, scale +2, verify health
restart_service        Rolling Restart        Drain oldest, restart, verify health, resume traffic
flush_cdn_cache        CDN Cache Flush        Purge CDN by region, verify origin response

Adding a workflow

    # In src/automation/workflows.py, add to WORKFLOWS:
    "rotate_secrets": AutomationWorkflow(
        id="rotate_secrets",
        name="Secret Rotation",
        description="Rotate API keys for a service",
        steps=[
            {"action": "generate_new_key", "target": "{{service}}"},
            {"action": "update_secret_store", "target": "{{service}}"},
            {"action": "restart_service", "target": "{{service}}"},
            {"action": "verify_health", "target": "{{service}}", "timeout": 30},
            {"action": "revoke_old_key", "target": "{{service}}"},
        ],
    ),

    # Then configure the matching workflow in n8n's visual editor
    # to receive the webhook payload and execute each step.

The workflow ID here must match the workflow configured in your n8n instance. Stourio sends the payload; n8n defines how each step is actually executed.

Server 2: MCP Gateway

Repository: stourio-mcp-engine. A single-purpose FastAPI service with one functional endpoint, POST /execute, plus the /health check used during setup. The orchestrator sends a tool name and arguments; the gateway looks up the handler in its internal registry and executes it. No routing logic, no LLM calls, no state management. Just tool dispatch.

Setup

    # 1. Generate shared secret
    python3 setup_gateway.py   # creates .env with MCP_SHARED_SECRET

    # 2. Copy the secret to the engine Server's .env too

    # 3. Add your runbooks (-p: the directory may already exist in the repository)
    mkdir -p runbooks
    # Add .md files: runbooks/redis-cache.md, runbooks/api-errors.md, etc.

    # 4. Build and run
    docker build -t mcp-gateway .
    docker run -d -p 8080:8080 --name mcp-gateway --env-file .env mcp-gateway

    # 5. Verify
    curl http://localhost:8080/health   # no auth required

The /execute contract

    POST /execute

    Headers:
      Authorization: Bearer <MCP_SHARED_SECRET>

    Body:
    {
      "tool_name": "read_internal_runbook",
      "arguments": { "service_name": "redis-cache" }
    }

    Success (200):
    { "result": "# Redis Cache Runbook\n..." }

    Unknown tool (404):
    { "detail": "Tool 'unknown' is not registered on this gateway." }

    Rate limited (429):
    { "error": "Rate limit exceeded. Max 60 requests/minute." }

MCP tool registry

Tools are registered in gateway.py using the @register_tool decorator. The gateway dispatches by matching tool_name against the registry dictionary. Unknown tools are rejected with a 404.

Tool                   Status  Description
read_internal_runbook  Live    Reads a Markdown file from the /app/docs directory. Path traversal protected.
get_system_metrics     Stub    Connect to Prometheus, CloudWatch, or Datadog.
get_recent_logs        Stub    Connect to Loki, CloudWatch Logs, or ELK.
execute_remediation    Stub    Connect to AWS SSM, Ansible, or Rundeck.
send_notification      Stub    Connect to Slack webhook, SendGrid, or PagerDuty.
call_api               Stub    HTTP dispatch with URL allowlist.
generate_report        Stub    Report formatting and export.

Stub tools return structured JSON explaining they are not yet connected. The agent LLM receives this as a tool result and handles it gracefully, typically reporting that the integration is not yet configured.
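A decorator-based registry of this kind can be sketched in a few lines. This is an illustration of the pattern, not the contents of gateway.py: the `dispatch` helper and the stub's JSON fields are hypothetical, while the 404 message follows the /execute contract above.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

TOOL_REGISTRY: dict = {}


def register_tool(name: str):
    """Decorator that records an async handler under a tool name."""
    def decorator(func: Callable[[dict], Awaitable[dict]]):
        TOOL_REGISTRY[name] = func
        return func
    return decorator


@register_tool("get_system_metrics")
async def tool_get_system_metrics(arguments: dict) -> dict:
    # Stub: tells the agent that the integration is not wired up yet.
    return {"status": "not_connected", "tool": "get_system_metrics"}


async def dispatch(tool_name: str, arguments: dict) -> Tuple[int, dict]:
    """Return (http_status, body) for a tool call."""
    handler = TOOL_REGISTRY.get(tool_name)
    if handler is None:
        return 404, {"detail": f"Tool '{tool_name}' is not registered on this gateway."}
    return 200, {"result": await handler(arguments)}


print(asyncio.run(dispatch("get_system_metrics", {})))
print(asyncio.run(dispatch("unknown", {})))
```

Keeping dispatch a plain dictionary lookup means the set of callable tools is exactly the set of decorated functions, which is what makes the registry easy to audit.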

Adding a tool

Three steps: add the handler to the gateway, add the tool definition to the agent template, and rebuild the gateway image.

    # Step 1: In gateway.py, add the handler
    @register_tool("check_disk_usage")
    async def tool_check_disk_usage(arguments: dict) -> dict:
        host = arguments.get("host", "localhost")
        # ... your implementation ...
        return {"usage_percent": 73.2, "mount": "/"}

    # Step 2: In stourio-core-engine/src/agents/runtime.py,
    # add the tool definition to the appropriate agent template:
    ToolDefinition(
        name="check_disk_usage",
        description="Check disk usage on a host",
        parameters={
            "type": "object",
            "properties": {
                "host": {"type": "string"}
            },
            "required": ["host"]
        },
    ),

    # Step 3: Rebuild and redeploy the gateway.
    # docker restart would keep running the old image, so recreate the container.
    docker build -t mcp-gateway .
    docker rm -f mcp-gateway
    docker run -d -p 8080:8080 --name mcp-gateway --env-file .env mcp-gateway

Why both sides?

The agent template defines what the LLM knows it can call (the tool's name, description, and parameter schema). The gateway defines what actually happens when it's called. The core's tool executor validates that every LLM tool call exists in the agent's allowed set before forwarding it to the gateway. This is defense in depth: even if the LLM hallucinates a tool name, it's rejected before reaching the network.

Removing a tool

Remove the @register_tool function from gateway.py and remove the corresponding ToolDefinition from the agent template in the core. Rebuild both images. If the tool is removed from the gateway but not from the agent template, the agent will call it and receive a 404. If the tool is removed from the agent template but not from the gateway, it becomes unreachable (the core's whitelist blocks it before it ever reaches the gateway).

Security model

1. API authentication

Every request to the Core API requires the X-STOURIO-KEY header. Generated via python3 scripts/generate_key.py (cryptographically secure, 32 characters). If the key is not configured in the environment, the system returns HTTP 503 on all endpoints until it is set.
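One way a script could produce a key with these properties is the standard library's `secrets` module. This is a sketch under that assumption, not the contents of scripts/generate_key.py; `generate_api_key` is a hypothetical name.

```python
import secrets


def generate_api_key() -> str:
    """Return a 32-character URL-safe key (24 random bytes, base64-encoded)."""
    return secrets.token_urlsafe(24)


print(f"STOURIO_API_KEY={generate_api_key()}")
```

Twenty-four random bytes encode to exactly 32 base64 characters with no padding, which gives 192 bits of entropy.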

2. MCP gateway authentication

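Every request to the Gateway's /execute endpoint requires Authorization: Bearer <MCP_SHARED_SECRET>. The check is applied at the FastAPI dependency level, so endpoints are protected by default rather than opt-in; the /health check used during setup is the one route that needs no auth. The same secret must be configured on both servers.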
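A shared-secret check of this kind is usually written with a timing-safe comparison. The sketch below shows the idea with the standard library only; `is_authorized` is a hypothetical helper, not the gateway's actual dependency.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], shared_secret: str) -> bool:
    """Validate an 'Authorization: Bearer <token>' header against the shared secret."""
    if not authorization_header or not shared_secret:
        return False                       # missing header or unconfigured secret
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode("utf-8"), shared_secret.encode("utf-8"))


print(is_authorized("Bearer s3cret", "s3cret"))
print(is_authorized("Bearer guess", "s3cret"))
```

Rejecting requests when the secret itself is empty prevents an unconfigured gateway from accepting an empty bearer token.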

3. Rate limiting

The Core uses Redis-backed per-IP rate limiting with configurable limits per endpoint prefix (see API reference table). The Gateway uses in-memory sliding window rate limiting, defaulting to 60 requests/minute per IP. Both return HTTP 429 with a Retry-After header.
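An in-memory sliding window like the Gateway's can be sketched as follows. `SlidingWindowLimiter` is a hypothetical class: it keeps the timestamps of accepted requests per IP and reports how long a rejected caller should wait, which is the value a Retry-After header would carry.

```python
from collections import defaultdict, deque
from typing import Tuple


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within the last `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)   # client IP -> timestamps of accepted requests

    def check(self, client_ip: str, now: float) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds)."""
        hits = self._hits[client_ip]
        while hits and hits[0] <= now - self.window:
            hits.popleft()                # drop timestamps that left the window
        if len(hits) >= self.limit:
            return False, hits[0] + self.window - now
        hits.append(now)
        return True, 0.0


limiter = SlidingWindowLimiter(limit=2, window=60.0)
print(limiter.check("10.0.0.5", now=0.0))
print(limiter.check("10.0.0.5", now=10.0))
print(limiter.check("10.0.0.5", now=20.0))   # third request inside the window is rejected
```

Passing `now` explicitly keeps the sketch deterministic; a real middleware would read a monotonic clock instead.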

4. Tool call validation

Every tool call passes three checks. The Core's default_tool_executor applies two before forwarding a call to the gateway: (1) a whitelist check against all tool names defined in agent templates, and (2) a regex validation that the tool name contains only [a-zA-Z0-9_-]. Then (3) the gateway itself rejects any tool not in its registry. Three layers across two independent codebases.
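The two Core-side checks can be sketched as a single guard function. The name `validate_tool_call` and the use of ValueError are hypothetical; the character class is the one given above.

```python
import re

TOOL_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")


def validate_tool_call(tool_name: str, allowed_tools: set) -> None:
    """Raise ValueError unless the name is well-formed and whitelisted."""
    if not TOOL_NAME_PATTERN.fullmatch(tool_name):
        raise ValueError(f"malformed tool name: {tool_name!r}")
    if tool_name not in allowed_tools:
        raise ValueError(f"tool not in allowed set: {tool_name!r}")


allowed = {"get_recent_logs", "send_notification"}
validate_tool_call("get_recent_logs", allowed)   # passes silently

for candidate in ("delete_everything", "get_recent_logs; rm -rf /"):
    try:
        validate_tool_call(candidate, allowed)
    except ValueError as exc:
        print("rejected:", exc)
```

Using `fullmatch` rather than `search` matters here: it requires the whole name to consist of permitted characters, so a valid prefix followed by injected text is still rejected.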

5. Network isolation

PostgreSQL and Redis expose no ports to the host (Docker expose only, no ports). Jaeger and n8n are bound to 127.0.0.1. The MCP Gateway should be firewalled to accept traffic only from the engine Server IP on port 8080.

6. Kill switch

A Redis flag checked before every orchestrator decision and before every agent tool call. When activated via POST /api/kill, all new inputs are rejected and all running agents halt at their next step. Deactivate via POST /api/resume. Both actions are recorded in the audit log.

Environment variables

Stourio Core (.env)

Variable                Required  Default                                                      Purpose
STOURIO_API_KEY         Yes       -                                                            API authentication key for all endpoints
ORCHESTRATOR_PROVIDER   No        openai                                                       LLM provider for routing: openai, anthropic, deepseek, google
ORCHESTRATOR_MODEL      No        gpt-4o-mini                                                  Model for routing decisions (fast, cheap recommended)
AGENT_PROVIDER          No        openai                                                       LLM provider for agent reasoning
AGENT_MODEL             No        gpt-4o-mini                                                  Model for agent work (strong reasoning recommended)
OPENAI_API_KEY          If using  -                                                            OpenAI API key
ANTHROPIC_API_KEY       If using  -                                                            Anthropic API key
DEEPSEEK_API_KEY        If using  -                                                            DeepSeek API key
GOOGLE_API_KEY          If using  -                                                            Google Gemini API key
POSTGRES_PASSWORD       Yes       changeme                                                     PostgreSQL password (change before first start)
REDIS_PASSWORD          Yes       changeme                                                     Redis password (change before first start)
DATABASE_URL            No        postgresql+asyncpg://stourio:changeme@postgres:5432/stourio  Full connection string (auto-composed from password in docker-compose)
REDIS_URL               No        redis://:changeme@redis:6379/0                               Full Redis URL
AUTOMATION_WEBHOOK_URL  No        http://n8n:5678/webhook/stourio                              n8n webhook endpoint
MCP_SERVER_URL          Yes       -                                                            Full URL to MCP gateway (e.g. http://10.0.1.50:8080)
MCP_SHARED_SECRET       Yes       -                                                            Bearer token for gateway auth (must match gateway .env)
CORS_ORIGINS            No        http://localhost:3000,http://localhost:8000                  Comma-separated allowed origins
MAX_AGENT_DEPTH         No        4                                                            Maximum agent nesting depth
APPROVAL_TTL_SECONDS    No        300                                                          Seconds before unapproved actions auto-reject
LOG_LEVEL               No        info                                                         Logging level: debug, info, warning, error

MCP Gateway (.env)

Variable           Required  Default    Purpose
MCP_SHARED_SECRET  Yes       -          Bearer token for auth (must match core .env)
MCP_RATE_LIMIT     No        60         Max requests per minute per IP
MCP_DOCS_DIR       No        /app/docs  Runbook directory path inside container

Project structure

stourio-core-engine

    stourio-core-engine/
    ├── docker-compose.yml           All 5 services
    ├── Dockerfile                   Stourio container
    ├── .env.example                 Copy to .env and configure
    ├── scripts/
    │   ├── generate_key.py          Generate STOURIO_API_KEY
    │   ├── setup.sh                 First-time setup helper
    │   └── test.sh
    ├── src/
    │   ├── main.py                  FastAPI app, lifespan, signal worker
    │   ├── config.py                Pydantic settings from .env
    │   ├── telemetry.py             OpenTelemetry + Jaeger setup
    │   ├── api/
    │   │   ├── routes.py            All API endpoints
    │   │   └── rate_limit.py        Redis-backed rate limiter middleware
    │   ├── orchestrator/
    │   │   └── core.py              Routing pipeline: rules → LLM → execute
    │   ├── agents/
    │   │   └── runtime.py           Agent templates, execution loop, tool executor
    │   ├── automation/
    │   │   └── workflows.py         Workflow definitions + n8n dispatch
    │   ├── rules/
    │   │   └── engine.py            Rule evaluation, sanitization, seeding
    │   ├── guardrails/
    │   │   └── approvals.py         Approval lifecycle (create/resolve/expire)
    │   ├── persistence/
    │   │   ├── database.py          SQLAlchemy models + table definitions
    │   │   ├── redis_store.py       Kill switch, locks, queues, approvals
    │   │   ├── audit.py             Audit log writes
    │   │   └── conversations.py     Chat history persistence
    │   ├── adapters/
    │   │   ├── base.py              Abstract LLM adapter interface
    │   │   ├── registry.py          Provider selection from config
    │   │   ├── openai_adapter.py
    │   │   ├── anthropic_adapter.py
    │   │   └── google_adapter.py
    │   └── models/
    │       └── schemas.py           All Pydantic models and enums
    └── tests/

stourio-mcp-engine

    stourio-mcp-engine/
    ├── Dockerfile           Gateway container (bakes in runbooks)
    ├── .env.example         Copy to .env and configure
    ├── gateway.py           FastAPI app, tool registry, all handlers
    ├── setup_gateway.py     Generate MCP_SHARED_SECRET
    └── runbooks/            Your internal docs (.md files)
        └── .gitkeep

Production checklist

Change all default passwords

Change POSTGRES_PASSWORD and REDIS_PASSWORD, and generate STOURIO_API_KEY and MCP_SHARED_SECRET. Never deploy with "changeme".

Configure LLM providers (Multi-Model Matrix)

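Stourio can route different kinds of work to different models. Set an API key for every provider in use: the orchestrator provider (ORCHESTRATOR_PROVIDER), the default agent fallback (AGENT_PROVIDER), and any provider named in an explicit agent template override.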

Set MCP_SERVER_URL to the gateway's private IP

Not localhost. The actual internal IP of the MCP Server (e.g. http://10.0.1.50:8080).

Firewall the MCP Gateway

Port 8080 accepts traffic only from the engine Server IP. Block everything else.

Restrict CORS_ORIGINS

Remove localhost entries. Set to your actual frontend domain(s) only.

Verify Jaeger and n8n are not publicly exposed

Both should be bound to 127.0.0.1 in docker-compose.yml (already the default).

Add your runbooks to the MCP Gateway

Place Markdown files in the runbooks/ directory. Rebuild the image. These are the docs your agents will reference when diagnosing issues.

Configure n8n workflows

Access n8n at localhost:5678 and create workflows that match the IDs in your automation definitions (auto_scale_horizontal, restart_service, flush_cdn_cache).

Replace MCP tool stubs with real implementations

Connect get_system_metrics to your monitoring, get_recent_logs to your log aggregator, send_notification to Slack/PagerDuty. Each is a single async function in gateway.py.

Test the kill switch

POST /api/kill, verify all operations halt, POST /api/resume. Confirm both actions appear in the audit log.