Complete technical reference for the Stourio Core engine and the MCP Gateway. Two repositories, two servers, one system.
The system runs across two isolated servers. The engine Server holds the orchestration logic, all state, and the automation engine. It has no access to your infrastructure credentials. The MCP Server holds only the gateway that executes privileged tool calls. It sits in a private subnet and accepts traffic exclusively from the engine Server.
The split exists for credential isolation. The orchestrator sends LLM-generated tool calls to the gateway. If the orchestrator is compromised, the attacker gets conversation history and routing logic but never touches your infrastructure keys. The gateway validates every request against a shared secret and only executes tools from its own registry.
Repository: stourio-core-engine. This is the orchestration brain. It receives inputs (user chat or system webhooks), routes them through a rules engine and an LLM, and delegates work to agents or automation workflows. All state is persisted to PostgreSQL. All real-time coordination goes through Redis. All tool execution is forwarded to the MCP Gateway over the network.
On first start, the system creates all PostgreSQL tables and seeds four default safety rules. The signal consumer worker begins listening on the Redis stream immediately.
Five containers run on the engine Server via Docker Compose. Stourio (:8000) publishes its API port to the host; n8n (:5678) and Jaeger (:16686) publish ports on 127.0.0.1 only, for local admin access. PostgreSQL and Redis are internal-only (no published ports).
Stourio is the FastAPI application. It hosts the API, the orchestrator, the agent runtime, the rules engine, and the automation dispatcher, and runs on port 8000.
On startup, it initializes the database schema, seeds default rules, configures OpenTelemetry tracing to Jaeger, and starts the background signal consumer worker that dequeues system events from Redis Streams.
PostgreSQL is the permanent state store. Four tables are created automatically on first start:
| Table | Purpose | Key columns |
|---|---|---|
| audit_log | Immutable record of every decision, action, and event | action, detail, input_id, execution_id, risk_level |
| conversation_messages | Chat history per conversation | conversation_id, role, content |
| rules | User-defined routing rules (versioned, active/inactive) | pattern, pattern_type, action, risk_level, automation_id |
| approvals | Pending and resolved approval requests | action_description, status, expires_at |
Runs on postgres:16-alpine. Internal port 5432 only. Connection pool size: 10 (configurable via SQLAlchemy engine).
Redis handles five distinct responsibilities through different data structures:
| Function | Redis type | Key pattern |
|---|---|---|
| Signal queue | Stream + Consumer Group | stourio:signals |
| Kill switch | Key/Value | stourio:kill_switch |
| Approval cache | Key/Value with TTL | stourio:approval:{id} |
| Distributed locks | Key/Value with NX + TTL | stourio:lock:{resource} |
| Rate limiting | Key/Value with INCR + TTL | stourio:ratelimit:{ip}:{path}:{window} |
Runs on redis:7-alpine. Requires password authentication. Max memory 256MB with LRU eviction. Internal port 6379 only.
Signals use consumer groups for reliable delivery. A signal is only removed from the stream after the orchestrator acknowledges successful processing. If the worker crashes, unacknowledged signals are redelivered on restart.
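The acknowledge-then-remove behaviour can be illustrated with a toy in-memory model. This is only a sketch of the delivery semantics described above, not the actual worker: in a real deployment the same roles would be played by Redis Streams commands such as XREADGROUP and XACK, and the class and method names here are hypothetical.

```python
from collections import OrderedDict


class ToySignalStream:
    """In-memory stand-in for a stream with one consumer group (illustrative only)."""

    def __init__(self):
        self._entries = OrderedDict()  # entry_id -> payload, kept until acknowledged
        self._delivered = set()        # ids handed to a worker but not yet acknowledged
        self._next_id = 0

    def add(self, payload):
        """Enqueue a signal and return its id."""
        self._next_id += 1
        self._entries[self._next_id] = payload
        return self._next_id

    def read_new(self):
        """Deliver entries that have never been handed to a worker."""
        fresh = [(i, p) for i, p in self._entries.items() if i not in self._delivered]
        self._delivered.update(i for i, _ in fresh)
        return fresh

    def read_pending(self):
        """Redeliver entries that were handed out but never acknowledged."""
        return [(i, p) for i, p in self._entries.items() if i in self._delivered]

    def ack(self, entry_id):
        """Acknowledge successful processing; only now is the entry removed."""
        self._entries.pop(entry_id, None)
        self._delivered.discard(entry_id)
```

If a worker reads two signals, acknowledges the first, and then crashes, a restarted worker calling `read_pending()` sees only the second signal again.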
n8n is the deterministic automation engine. When the orchestrator triggers a workflow, it sends a POST request to http://n8n:5678/webhook/stourio with the workflow ID, execution context, and step definitions. n8n executes the steps sequentially (health check, apply fix, validate, notify) and returns a result.
Exposed on 127.0.0.1:5678 for local admin access to the visual workflow editor. Not exposed to the public network.
Jaeger receives OpenTelemetry traces from the Stourio application via OTLP gRPC on internal port 4317. The web UI is available at 127.0.0.1:16686 for viewing request traces, orchestrator decisions, agent execution steps, and timing data.
Stourio instruments every orchestrator call with custom spans: signal source, routing decision, agent type, execution duration. Every FastAPI endpoint is auto-instrumented via opentelemetry-instrumentation-fastapi.
All endpoints require the X-STOURIO-KEY header. If the key is not set in the environment, the system rejects all requests with HTTP 503. The full interactive API documentation is available at /docs (Swagger UI).
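The header check described above could look roughly like the following. This is a minimal sketch, not the project's actual dependency: the function name is hypothetical, and the 401 status for a wrong or missing key is an assumption, since the text only specifies the 503 case.

```python
import hmac
from typing import Optional


def check_api_key(configured_key: Optional[str], presented_key: Optional[str]) -> int:
    """Return the HTTP status an auth check might produce for one request."""
    if not configured_key:
        # Key missing from the environment: reject every request.
        return 503
    if presented_key is None:
        # Assumption: the source does not state the status for a missing header.
        return 401
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(configured_key.encode(), presented_key.encode()):
        return 401  # Assumption, as above.
    return 200
```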
| Method | Endpoint | Purpose | Rate limit |
|---|---|---|---|
| POST | /api/chat | Send a user message through the orchestrator | 30/min |
| POST | /api/webhook | Ingest a system signal (queued, returns 202) | 120/min |
| GET | /api/approvals | List pending approval requests | 60/min |
| POST | /api/approvals/{id} | Approve or reject an action | 60/min |
| POST | /api/kill | Activate the kill switch (halt all operations) | 5/min |
| POST | /api/resume | Deactivate the kill switch | 5/min |
| GET | /api/rules | List all active rules | 30/min |
| POST | /api/rules | Create a new rule | 30/min |
| DELETE | /api/rules/{id} | Delete a rule | 30/min |
| GET | /api/status | System status, kill switch state, pending approvals | 60/min |
| GET | /api/audit | Recent audit log entries (default: 50) | 30/min |
Webhook signals are enqueued to a Redis Stream and processed asynchronously by the background consumer worker. The endpoint returns immediately to prevent blocking your monitoring system's webhook delivery.
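A monitoring system might build its webhook call as sketched below. The endpoint path and the X-STOURIO-KEY header come from the reference above; the payload field names (`source`, `event_type`, `content`) and the base URL are assumptions for illustration, so check /docs for the real schema.

```python
import json
import urllib.request


def build_webhook_request(base_url: str, api_key: str, signal: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/webhook."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/webhook",
        data=json.dumps(signal).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-STOURIO-KEY": api_key},
        method="POST",
    )


# Hypothetical payload; the real field names may differ.
request = build_webhook_request(
    "http://localhost:8000",
    "example-key",
    {"source": "prometheus", "event_type": "alert:critical", "content": "CPU > 95% on web-1"},
)
# urllib.request.urlopen(request) would send it; per the table above,
# a 202 response means the signal was queued, not that it was processed.
```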
Every input follows the same five-step pipeline: kill switch check, deterministic rules evaluation, LLM routing (if no rule matched), execution, and result return.
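The five steps can be sketched as a single function with each stage injected as a callable. This is an illustrative outline only; the function and parameter names are hypothetical and the real orchestrator is asynchronous and carries more context.

```python
from typing import Callable, Optional


def handle_input(
    text: str,
    kill_switch_active: Callable[[], bool],
    match_rule: Callable[[str], Optional[str]],
    llm_route: Callable[[str], str],
    execute: Callable[[str, str], str],
) -> str:
    # 1. Kill switch check: nothing runs while the switch is on.
    if kill_switch_active():
        return "rejected: kill switch active"
    # 2. Deterministic rules evaluation.
    decision = match_rule(text)
    # 3. LLM routing, only if no rule matched.
    if decision is None:
        decision = llm_route(text)
    # 4. Execution of the chosen route.
    result = execute(decision, text)
    # 5. Result return.
    return result
```

Because the rule check runs first, an input that matches a rule never reaches `llm_route`, which is where the zero-token-cost property comes from.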
The rules engine runs before the LLM on every request. Known patterns (destructive commands, known alert signatures) are handled deterministically with zero LLM token cost. The LLM only sees inputs that don't match any rule.
Rules are stored in PostgreSQL, cached in memory, and evaluated in order (first match wins). Four pattern types are supported:
| Pattern type | Matches against | Example pattern |
|---|---|---|
| regex | Input content (sanitized + raw) | DROP\s+(DATABASE\|TABLE) |
| keyword | Normalized input text | production deploy |
| event_type | Signal header text | alert:critical |
| payload_match | Parsed webhook JSON payload | severity:critical |
Regex patterns are evaluated against both the raw input and a sanitized version that strips SQL comments, C-style block comments, and excessive whitespace. This prevents obfuscation bypasses on destructive commands.
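A minimal sketch of that dual evaluation follows. The helper names are hypothetical, the exact sanitization rules in the engine may differ, and case-insensitive matching is an assumption. Comments are replaced with a space rather than deleted so that `DROP/**/TABLE` becomes `DROP TABLE` instead of `DROPTABLE`.

```python
import re

_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)  # C-style block comments
_LINE_COMMENT = re.compile(r"--[^\n]*")               # SQL line comments
_WHITESPACE = re.compile(r"\s+")


def sanitize(text: str) -> str:
    """Strip comments and collapse whitespace runs to single spaces."""
    text = _BLOCK_COMMENT.sub(" ", text)
    text = _LINE_COMMENT.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def regex_rule_matches(pattern: str, raw: str) -> bool:
    """Match against both the raw input and its sanitized form."""
    compiled = re.compile(pattern, re.IGNORECASE)  # assumption: case-insensitive
    return bool(compiled.search(raw) or compiled.search(sanitize(raw)))
```

With the seeded pattern `DROP\s+(DATABASE|TABLE)`, the input `DROP/**/TABLE users` does not match in raw form but does match once sanitized.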
The following default rules are seeded automatically on first start if no rules exist:
| Rule | Pattern | Action | Risk |
|---|---|---|---|
| prevent_db_drop | DROP\s+(DATABASE\|TABLE) | Require approval | Critical |
| block_ssh_root | ssh\s+root@ | Hard reject | Critical |
| block_rm_rf | rm\s+-rf\s+/ | Hard reject | Critical |
| auto_scale_cpu | CPU\s*>\s*9[0-9]% | Trigger automation | Low |
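First-match-wins evaluation over the seeded rules can be sketched as below. The patterns are copied from the table; the `Rule` type, the action strings, and case-insensitive matching are assumptions made for the example.

```python
import re
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    pattern: str
    action: str
    risk: str


# Patterns from the table above; action and risk strings are illustrative.
DEFAULT_RULES: List[Rule] = [
    Rule("prevent_db_drop", r"DROP\s+(DATABASE|TABLE)", "require_approval", "critical"),
    Rule("block_ssh_root", r"ssh\s+root@", "hard_reject", "critical"),
    Rule("block_rm_rf", r"rm\s+-rf\s+/", "hard_reject", "critical"),
    Rule("auto_scale_cpu", r"CPU\s*>\s*9[0-9]%", "trigger_automation", "low"),
]


def first_match(text: str, rules: List[Rule] = DEFAULT_RULES) -> Optional[Rule]:
    """Return the first rule whose pattern matches, or None to fall through to the LLM."""
    for rule in rules:
        if re.search(rule.pattern, text, re.IGNORECASE):
            return rule
    return None
```

Order matters: an input that contains both a `DROP TABLE` and an `rm -rf /` is handled by `prevent_db_drop` because it comes first in the list.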
Agents are stored as templates in src/agents/runtime.py. Each template defines a role (system prompt), a set of allowed tools, and a maximum step count. The orchestrator selects the right template based on the LLM's routing decision, then the agent runtime executes it in a loop: LLM call, tool call, LLM call, tool call, until the agent produces a final text response or hits the step limit.
| Template | Role | Tools | Max steps |
|---|---|---|---|
| diagnose_repair | Diagnose system issues, fetch runbooks, propose and apply fixes | get_system_metrics, get_recent_logs, execute_remediation, read_internal_runbook | 8 |
| escalate | Summarize situation, assess severity, notify the right people | send_notification | 4 |
| take_action | General-purpose: API calls, report generation, data lookups | call_api, generate_report | 6 |
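The execution loop described before the table can be sketched with a scripted stand-in for the LLM. This is an outline under assumptions: the names are hypothetical, one LLM call is counted as one step, and the message returned at the step limit is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class ToolCall:
    name: str
    arguments: dict


LLMReply = Union[str, ToolCall]  # final text, or a request to call a tool
History = List[Tuple[ToolCall, dict]]


def run_agent(
    llm: Callable[[History], LLMReply],
    execute_tool: Callable[[str, dict], dict],
    max_steps: int,
) -> str:
    """Alternate LLM calls and tool calls until a text answer or the step limit."""
    history: History = []
    for _ in range(max_steps):
        reply = llm(history)
        if isinstance(reply, str):
            return reply  # final text response ends the loop
        result = execute_tool(reply.name, reply.arguments)
        history.append((reply, result))  # tool result feeds the next LLM call
    return "step limit reached"
```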
Every tool call from every agent is routed through default_tool_executor, which forwards it to the MCP Gateway's /execute endpoint. The agent never executes tools locally.
Each agent loop checks the kill switch before every LLM call. Each agent acquires a distributed lock with a fencing token on its work resource. If the lock is overtaken by a newer process, the agent terminates. A background heartbeat extends the lock TTL every 10 seconds while the agent is active.
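The fencing-token idea can be shown with a toy in-memory lock store. This is a sketch of the concept only: the real locks live in Redis (set with NX and a TTL), and the class, method names, and explicit `now` parameter are hypothetical choices that make the behaviour easy to follow.

```python
import itertools
from typing import Dict, Optional, Tuple


class ToyLockStore:
    """In-memory model of a TTL lock that hands out increasing fencing tokens."""

    def __init__(self):
        self._tokens = itertools.count(1)
        self._holders: Dict[str, Tuple[int, float]] = {}  # resource -> (token, expires_at)

    def acquire(self, resource: str, ttl: float, now: float) -> Optional[int]:
        """Return a new fencing token, or None if an unexpired lock exists."""
        holder = self._holders.get(resource)
        if holder is not None and holder[1] > now:
            return None
        token = next(self._tokens)
        self._holders[resource] = (token, now + ttl)
        return token

    def heartbeat(self, resource: str, token: int, ttl: float, now: float) -> bool:
        """Extend the TTL only if this token still owns an unexpired lock."""
        holder = self._holders.get(resource)
        if holder is None or holder[0] != token or holder[1] <= now:
            return False  # overtaken or expired: the agent should terminate
        self._holders[resource] = (token, now + ttl)
        return True
```

An agent whose heartbeat returns `False` has lost the lock to a newer process and should stop, which is the termination rule described above.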
To add a new agent, add a new entry to the AGENT_TEMPLATES dictionary in src/agents/runtime.py. The template ID must also be added to the orchestrator's routing tools enum in src/orchestrator/core.py (the agent_type enum list in the route_to_agent tool definition).
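Since a template must be registered in two places, a small consistency check can catch a forgotten enum entry. Everything below is a hypothetical sketch: the `AgentTemplate` shape, the `capacity_report` agent, and the `ROUTE_TO_AGENT_TYPES` list are stand-ins, and the real structures in runtime.py and core.py may look different.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AgentTemplate:
    """Hypothetical shape: role prompt, allowed tools, step limit."""
    role: str
    tools: List[str]
    max_steps: int


AGENT_TEMPLATES: Dict[str, AgentTemplate] = {
    "escalate": AgentTemplate(
        role="Summarize situation, assess severity, notify the right people",
        tools=["send_notification"],
        max_steps=4,
    ),
    # New entry: an invented example agent, not part of the shipped templates.
    "capacity_report": AgentTemplate(
        role="Summarize capacity trends and produce a report",
        tools=["get_system_metrics", "generate_report"],
        max_steps=5,
    ),
}

# Stand-in for the agent_type enum list in the route_to_agent tool definition.
ROUTE_TO_AGENT_TYPES: List[str] = ["diagnose_repair", "escalate", "take_action"]


def unroutable_templates(templates: Dict[str, AgentTemplate], agent_types: List[str]) -> List[str]:
    """Template IDs the orchestrator could never select because the enum lacks them."""
    return sorted(set(templates) - set(agent_types))
```

Here `unroutable_templates(AGENT_TEMPLATES, ROUTE_TO_AGENT_TYPES)` reports `capacity_report` until that ID is added to the enum list.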
Automation workflows are defined in src/automation/workflows.py. Each workflow has an ID, a name, and a list of steps. When triggered, the orchestrator sends the full step definition to n8n's webhook endpoint. n8n handles the actual execution.
| Workflow ID | Name | Steps |
|---|---|---|
| auto_scale_horizontal | Horizontal Auto-Scale | Get instance count, scale +2, verify health |
| restart_service | Rolling Restart | Drain oldest, restart, verify health, resume traffic |
| flush_cdn_cache | CDN Cache Flush | Purge CDN by region, verify origin response |
The workflow ID here must match the workflow configured in your n8n instance. Stourio sends the payload; n8n defines how each step is actually executed.
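The payload sent to n8n might be assembled along these lines. The workflow ID, name, and steps are taken from the table above; the dictionary layout and the JSON field names (`workflow_id`, `name`, `steps`, `context`) are assumptions, since the source only says the request carries the workflow ID, execution context, and step definitions.

```python
import json
from typing import Dict

# Hypothetical in-code representation of one workflow from the table.
WORKFLOWS: Dict[str, dict] = {
    "restart_service": {
        "name": "Rolling Restart",
        "steps": ["Drain oldest", "restart", "verify health", "resume traffic"],
    },
}


def build_automation_payload(workflow_id: str, context: dict) -> str:
    """Serialize the workflow ID, execution context, and steps as a JSON body."""
    workflow = WORKFLOWS[workflow_id]  # raises KeyError for an unknown workflow
    return json.dumps(
        {
            "workflow_id": workflow_id,
            "name": workflow["name"],
            "steps": workflow["steps"],
            "context": context,
        }
    )
```

The resulting body would be POSTed to the URL configured in AUTOMATION_WEBHOOK_URL.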
Repository: stourio-mcp-engine. A single-purpose FastAPI service with one endpoint: POST /execute. The orchestrator sends a tool name and arguments; the gateway looks up the handler in its internal registry and executes it. No routing logic, no LLM calls, no state management. Just tool dispatch.
Tools are registered in gateway.py using the @register_tool decorator. The gateway dispatches by matching tool_name against the registry dictionary. Unknown tools are rejected with a 404.
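A decorator-based registry with dispatch by name can be sketched as follows. This is not the gateway's code: the decorator signature (taking the tool name as an argument), the handler signature, the `UnknownTool` exception, and the stub's response fields are all assumptions. In the real service the unknown-tool case is returned as an HTTP 404.

```python
import asyncio
from typing import Awaitable, Callable, Dict

ToolHandler = Callable[[dict], Awaitable[dict]]
TOOL_REGISTRY: Dict[str, ToolHandler] = {}


def register_tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Record the decorated coroutine in the registry under the given name."""
    def decorator(func: ToolHandler) -> ToolHandler:
        TOOL_REGISTRY[name] = func
        return func
    return decorator


class UnknownTool(Exception):
    """Raised for tools missing from the registry; maps to HTTP 404."""
    status_code = 404


@register_tool("get_system_metrics")
async def get_system_metrics(arguments: dict) -> dict:
    # Stub handler: returns structured JSON instead of real metrics.
    return {"status": "stub", "message": "metrics backend not connected"}


async def dispatch(tool_name: str, arguments: dict) -> dict:
    """Look up the handler by name and run it."""
    handler = TOOL_REGISTRY.get(tool_name)
    if handler is None:
        raise UnknownTool(tool_name)
    return await handler(arguments)
```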
| Tool | Status | Description |
|---|---|---|
| read_internal_runbook | Live | Reads a Markdown file from the /app/docs directory. Path traversal protected. |
| get_system_metrics | Stub | Connect to Prometheus, CloudWatch, or Datadog. |
| get_recent_logs | Stub | Connect to Loki, CloudWatch Logs, or ELK. |
| execute_remediation | Stub | Connect to AWS SSM, Ansible, or Rundeck. |
| send_notification | Stub | Connect to Slack webhook, SendGrid, or PagerDuty. |
| call_api | Stub | HTTP dispatch with URL allowlist. |
| generate_report | Stub | Report formatting and export. |
Stub tools return structured JSON explaining they are not yet connected. The agent LLM receives this as a tool result and handles it gracefully, typically reporting that the integration is not yet configured.
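The path traversal protection noted for read_internal_runbook in the table above is commonly implemented by resolving the requested path and checking that it stays under the docs directory. The sketch below shows that general technique; the function name and the Markdown-only restriction are assumptions, not the gateway's actual code.

```python
from pathlib import Path


def resolve_runbook(docs_dir: str, requested: str) -> Path:
    """Return the resolved runbook path, refusing anything outside docs_dir."""
    base = Path(docs_dir).resolve()
    # resolve() collapses ".." segments; an absolute `requested` replaces base entirely.
    candidate = (base / requested).resolve()
    if base not in candidate.parents:
        raise PermissionError(f"path escapes the docs directory: {requested}")
    if candidate.suffix != ".md":
        # Assumption: only Markdown runbooks are served.
        raise ValueError(f"not a Markdown file: {requested}")
    return candidate
```

With `/app/docs` as the base, `restart.md` and `db/failover.md` resolve normally, while `../secrets.md` and `/etc/passwd` raise `PermissionError`.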
Adding a tool takes three steps: add the handler to the gateway, add the tool definition to the agent template in the core, and rebuild the affected images.
The agent template defines what the LLM knows it can call (the tool's name, description, and parameter schema). The gateway defines what actually happens when it's called. The core's tool executor validates that every LLM tool call exists in the agent's allowed set before forwarding it to the gateway. This is defense in depth: even if the LLM hallucinates a tool name, it's rejected before reaching the network.
To remove a tool, remove the @register_tool function from gateway.py and remove the corresponding ToolDefinition from the agent template in the core. Rebuild both images. If the tool is removed from the gateway but not from the agent template, the agent will call it and receive a 404. If the tool is removed from the agent template but not from the gateway, it becomes unreachable (the core's whitelist blocks it before it ever reaches the gateway).
Every request to the Core API requires the X-STOURIO-KEY header. Generate the key with python3 scripts/generate_key.py (cryptographically secure, 32 characters). If the key is not configured in the environment, the system returns HTTP 503 on all endpoints until it is set.
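For reference, a cryptographically secure 32-character key can be produced with Python's standard `secrets` module. This is an illustrative equivalent, not the contents of scripts/generate_key.py, whose alphabet and method are not described here.

```python
import secrets


def generate_key() -> str:
    """Return 32 hexadecimal characters drawn from the OS CSPRNG (16 random bytes)."""
    return secrets.token_hex(16)


if __name__ == "__main__":
    print(generate_key())
```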
Every request to the Gateway requires Authorization: Bearer <MCP_SHARED_SECRET>. Applied at the FastAPI dependency level (all endpoints protected by default, not opt-in). The same secret must be configured on both servers.
The Core uses Redis-backed per-IP rate limiting with configurable limits per endpoint prefix (see API reference table). The Gateway uses in-memory sliding window rate limiting, defaulting to 60 requests/minute per IP. Both return HTTP 429 with a Retry-After header.
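An in-memory sliding window limiter of the kind described for the Gateway can be sketched as below. The class name and the injectable clock are assumptions made so the behaviour can be exercised deterministically; the returned retry delay corresponds to what would go in the Retry-After header.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Per-IP sliding window: at most `limit` requests in any `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, ip: str) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds); retry_after is 0 when allowed."""
        now = self._clock()
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            # The oldest hit leaving the window frees the next slot.
            retry_after = hits[0] + self._window - now
            return False, max(1, math.ceil(retry_after))
        hits.append(now)
        return True, 0
```

Note that state held in process memory resets on restart and is not shared between replicas, which is the trade-off against the Redis-backed limiter used by the Core.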
Every tool call passes three checks. Before forwarding a call to the gateway, the Core's default_tool_executor applies (1) a whitelist check against all tool names defined in agent templates and (2) regex validation that the tool name contains only [a-zA-Z0-9_-]. Then (3) the gateway itself rejects any tool not in its registry. Three layers across two independent codebases.
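The two Core-side checks can be sketched as a single predicate. The function name and the way the allowed set is passed in are assumptions; only the whitelist rule and the character class come from the text above.

```python
import re
from typing import Iterable

# Only letters, digits, underscore, and hyphen; fullmatch rejects empty names too.
TOOL_NAME_RE = re.compile(r"[a-zA-Z0-9_-]+")


def is_forwardable(tool_name: str, allowed_tools: Iterable[str]) -> bool:
    """Apply the whitelist check, then the character-set check."""
    if tool_name not in set(allowed_tools):
        return False  # hallucinated or unlisted tool name
    return TOOL_NAME_RE.fullmatch(tool_name) is not None
```

The regex check is independent of the whitelist, so a malformed name is rejected even if it were somehow present in a template.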
PostgreSQL and Redis expose no ports to the host (Docker expose only, no ports). Jaeger and n8n are bound to 127.0.0.1. The MCP Gateway should be firewalled to accept traffic only from the engine Server IP on port 8080.
The kill switch is a Redis flag checked before every orchestrator decision and before every agent tool call. When activated via POST /api/kill, all new inputs are rejected and all running agents halt at their next step. Deactivate via POST /api/resume. Both actions are recorded in the audit log.
| Variable | Required | Default | Purpose |
|---|---|---|---|
| STOURIO_API_KEY | Yes | | API authentication key for all endpoints |
| ORCHESTRATOR_PROVIDER | No | openai | LLM provider for routing: openai, anthropic, deepseek, google |
| ORCHESTRATOR_MODEL | No | gpt-4o-mini | Model for routing decisions (fast, cheap recommended) |
| AGENT_PROVIDER | No | openai | LLM provider for agent reasoning |
| AGENT_MODEL | No | gpt-4o-mini | Model for agent work (strong reasoning recommended) |
| OPENAI_API_KEY | If using | | OpenAI API key |
| ANTHROPIC_API_KEY | If using | | Anthropic API key |
| DEEPSEEK_API_KEY | If using | | DeepSeek API key |
| GOOGLE_API_KEY | If using | | Google Gemini API key |
| POSTGRES_PASSWORD | Yes | changeme | PostgreSQL password (change before first start) |
| REDIS_PASSWORD | Yes | changeme | Redis password (change before first start) |
| DATABASE_URL | No | postgresql+asyncpg://stourio:changeme@postgres:5432/stourio | Full connection string (auto-composed from password in docker-compose) |
| REDIS_URL | No | redis://:changeme@redis:6379/0 | Full Redis URL |
| AUTOMATION_WEBHOOK_URL | No | http://n8n:5678/webhook/stourio | n8n webhook endpoint |
| MCP_SERVER_URL | Yes | | Full URL to MCP gateway (e.g. http://10.0.1.50:8080) |
| MCP_SHARED_SECRET | Yes | | Bearer token for gateway auth (must match gateway .env) |
| CORS_ORIGINS | No | http://localhost:3000,http://localhost:8000 | Comma-separated allowed origins |
| MAX_AGENT_DEPTH | No | 4 | Maximum agent nesting depth |
| APPROVAL_TTL_SECONDS | No | 300 | Seconds before unapproved actions auto-reject |
| LOG_LEVEL | No | info | Logging level: debug, info, warning, error |
| Variable | Required | Default | Purpose |
|---|---|---|---|
| MCP_SHARED_SECRET | Yes | | Bearer token for auth (must match core .env) |
| MCP_RATE_LIMIT | No | 60 | Max requests per minute per IP |
| MCP_DOCS_DIR | No | /app/docs | Runbook directory path inside container |
Change POSTGRES_PASSWORD and REDIS_PASSWORD, and generate STOURIO_API_KEY and MCP_SHARED_SECRET. Never deploy with "changeme".
Set the LLM provider API keys. Stourio uses a multi-model routing cache, so you must set API keys for the orchestrator, the default agent fallback, and any explicit agent template overrides.
Set MCP_SERVER_URL to the actual internal IP of the MCP Server (e.g. http://10.0.1.50:8080), not localhost.
Firewall the MCP Server so that port 8080 accepts traffic only from the engine Server IP. Block everything else.
Remove the localhost entries from CORS_ORIGINS. Set it to your actual frontend domain(s) only.
Confirm that Jaeger and n8n are both bound to 127.0.0.1 in docker-compose.yml (already the default).
Place Markdown runbooks in the runbooks/ directory and rebuild the gateway image. These are the docs your agents will reference when diagnosing issues.
Access n8n at localhost:5678 and create workflows that match the IDs in your automation definitions (auto_scale_horizontal, restart_service, flush_cdn_cache).
Connect the stub tools: get_system_metrics to your monitoring, get_recent_logs to your log aggregator, send_notification to Slack/PagerDuty. Each is a single async function in gateway.py.
Test the kill switch: POST /api/kill, verify all operations halt, then POST /api/resume. Confirm both actions appear in the audit log.
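Several of the checklist items above can be verified mechanically before deployment. The following is a hypothetical preflight helper, not part of either repository; it only inspects a dictionary of environment values (for example `dict(os.environ)`) using the variable names from the configuration tables.

```python
from typing import List, Mapping
from urllib.parse import urlparse

REQUIRED = (
    "STOURIO_API_KEY",
    "POSTGRES_PASSWORD",
    "REDIS_PASSWORD",
    "MCP_SERVER_URL",
    "MCP_SHARED_SECRET",
)


def preflight(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    for name in REQUIRED:
        if not env.get(name):
            problems.append(f"{name} is not set")
    for name in ("POSTGRES_PASSWORD", "REDIS_PASSWORD"):
        if env.get(name) == "changeme":
            problems.append(f"{name} still uses the default 'changeme'")
    # urlparse("").hostname is None, so a missing URL is only reported once above.
    host = urlparse(env.get("MCP_SERVER_URL", "")).hostname
    if host in ("localhost", "127.0.0.1"):
        problems.append("MCP_SERVER_URL points at localhost")
    if "localhost" in env.get("CORS_ORIGINS", ""):
        problems.append("CORS_ORIGINS still contains localhost entries")
    return problems
```

Firewall rules, port bindings, and the kill switch test still need to be checked by hand, since they cannot be inferred from environment variables.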