Running Agentic AI in Production

From Chat to Orchestrated Multi-Agent Systems

Tobias Weiss

Agenda

  1. The Evolution: Chat → Agent → Multi-Agent
  2. Model Orchestration — LiteLLM as the Central Proxy
  3. Agent Framework — OpenCode
  4. Agent Orchestration — Oh-My-OpenAgent
  5. Skills, Plugins & MCP — Extensibility
  6. ACP — The Agent Client Protocol
  7. Infrastructure — Ansible Automation
  8. Wrap Up

Chat → Agent → Multi-Agent: Three Stages of AI Integration

Stage       | What                       | Example
Chat        | One prompt, one response   | ChatGPT in the browser
Agent       | Tool use, context-aware    | OpenCode CLI
Multi-Agent | Orchestration, delegation  | Oh-My-OpenAgent + ACP

From experimental chat to reproducible production workflows.

Production AI Infrastructure Must Be Secure, Controllable, Integrable

Requirements:

  • Secure — Data sovereignty, GDPR compliance
  • Controllable — Traceable agent decisions
  • Integrable — APIs, existing systems, CI/CD

What we built (2026):

  • Local GPU cluster: Qwen3.5 397B, Gemma 4 26B
  • LiteLLM proxy as unified API layer (25+ models)
  • Neo4j Knowledge Graph with 4 indexed repositories
  • ACP-based multi-project orchestration

Model Orchestration — LiteLLM Proxy

One Endpoint, 25+ Models — Zero Vendor Lock-In

LiteLLM Proxy Architecture

  • Agents never know which provider serves them
  • Swap models without code changes

LiteLLM Handles Fallback, Caching, Logging, and Rate Limiting

  • No vendor lock-in: swap models without code changes
  • Automatic fallback: SAIA down → Local — zero interruption
  • Unified logging: every API call in PostgreSQL (cost, latency, tokens)
  • Redis caching + rate limiting: identical prompts computed once, budget per model
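The fallback behaviour described above can be sketched in a few lines. This is a simplified illustration, not LiteLLM's actual implementation (which also handles retries, cooldowns, and per-model budgets); the provider functions are simulated stand-ins.

```python
# Minimal sketch of provider fallback: try providers in priority order,
# return the first successful response, raise only if all fail.
def complete_with_fallback(prompt, providers):
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ConnectionError as exc:
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

def saia(prompt):
    # Simulated outage of the primary provider
    raise ConnectionError("SAIA unreachable")

def local(prompt):
    # Simulated local GPU model
    return f"local answer to: {prompt}"

used, answer = complete_with_fallback("ping", [("saia", saia), ("local", local)])
```

The caller never sees the outage: the request transparently lands on the local model.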

Two Providers Cover Every Use Case

Local GPUs for sovereignty and fast tasks, SAIA for primary agent work and heavy reasoning.

Provider  | Models                                                          | Highlights
Local GPU | Qwen3.5 397B, Gemma 4 26B                                       | Data sovereignty, no VPN for local tasks
SAIA      | GLM 5 Turbo, GLM 5.1, GLM 4.7, GLM 4.6V                         | Primary agent models, vision
SAIA      | Mistral Large 3, GPT OSS, Devstral 2, DeepSeek R1, Qwen3.5 122B | Flagship reasoning, code-gen, chain-of-thought

Agent-to-Model Mapping — Each Agent Gets Its Optimal Model

Category           | Model        | Why
ultrabrain         | GLM 5 Turbo  | Hardest reasoning tasks
deep               | GLM 4.7      | Complex implementation
visual-engineering | Qwen3.5 122B | UI/UX, styling, animation
quick              | Gemma 4 26B  | Trivial fixes (fast, free)
writing            | GLM 5.1      | Documentation, articles

6 more categories follow the same pattern: right model, right cost. Up to 30 background agents simultaneously.
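The category-to-model mapping is effectively a lookup table. A minimal sketch (aliases taken from the table above; the fallback-to-cheap-default rule is an illustrative assumption, not Oh-My-OpenAgent's documented behaviour):

```python
# Illustrative mapping of agent categories to LiteLLM model aliases.
CATEGORY_MODEL = {
    "ultrabrain": "glm-5-turbo",
    "deep": "glm-4.7",
    "visual-engineering": "qwen3.5-122b",
    "quick": "gemma-4-26b",
    "writing": "glm-5.1",
}

def model_for(category: str) -> str:
    # Unknown categories fall back to the cheap, fast default (assumption)
    return CATEGORY_MODEL.get(category, "gemma-4-26b")
```

Because the values are LiteLLM aliases, remapping a category never touches agent code.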

Agent Config Is Decoupled From Provider Infrastructure

Every agent model is a LiteLLM alias. The agent doesn't know where the model runs.

{
  "provider": {
    "litellm": {
      "models": {
        "glm-5-turbo": { "name": "GLM 5 Turbo (SAIA)" },
        "saia/gpt-oss-120b": { "name": "SAIA GPT OSS 120B" },
        "qwen3.5-397b": { "name": "Qwen3.5 397B (local)" }
      },
      "options": {
        "baseURL": "http://127.0.0.1:4000/v1",
        "timeout": 120000,
        "maxRetries": 5
      }
    }
  }
}

Model swap = one line in LiteLLM config.
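That one line lives in the proxy's `model_list`. A sketch of a LiteLLM proxy config entry (field names follow LiteLLM's `model_list` format; the endpoint, key reference, and alias values are illustrative):

```yaml
model_list:
  - model_name: glm-5-turbo            # the alias agents see
    litellm_params:
      model: openai/glm-5-turbo        # swap this line to change backends
      api_base: https://saia.example/v1
      api_key: os.environ/SAIA_API_KEY
```

Changing `litellm_params.model` reroutes every agent using the alias, with no agent-side change.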

Agent Framework — OpenCode

OpenCode: CLI Agent That Reads, Writes, and Tests Code Autonomously

  • Multi-model routing via LiteLLM (one config, any model)
  • 30+ parallel background agents with context compaction
  • LSP integration + AST-grep (real code understanding, 25 languages)
  • Plugin system: Oh-My-OpenAgent, DCP, Morph

OpenCode Integrates Models, MCP Servers, Plugins, and LSP in One CLI

OpenCode Architecture

Three plugins extend OpenCode: Oh-My-OpenAgent (multi-agent), DCP (context management), Morph (runtime config).

Agent Orchestration — Oh-My-OpenAgent

Sisyphus Delegates to Specialised Agents by Role

Agent Hierarchy

  • Each agent runs on its optimal model via LiteLLM
  • 10 agents cover orchestration, planning, research, code, and review

Orchestrator + Consultant Agents — The Brain

Agent      | Role         | Model        | Why This Model
Sisyphus   | Orchestrator | GLM 5 Turbo  | Strong reasoning for delegation
Prometheus | Planner      | GLM 4.7      | Structured work breakdowns
Oracle     | Consultant   | GPT OSS 120B | High-IQ read-only analysis
Metis      | Pre-planning | GLM 5 Turbo  | Ambiguities and edge cases
Momus      | Reviewer     | Qwen3.5 122B | Plan quality assurance

Worker + Research Agents — The Hands

Agent           | Role             | Model           | Why This Model
Sisyphus-Junior | Worker           | Devstral 2 123B | Code-gen, cheaper than orchestrator
Atlas           | Worker           | GLM 5 Turbo     | Implementation execution
Librarian       | Reference search | Qwen3.5 35B     | External docs / API research
Explore         | Code search      | Gemma 4 26B     | Fast local grep (cheap)
Multimodal      | Vision           | GLM 4.6V        | Image analysis, screenshots

Real Workflow: 6 Steps, 5 Models, 1 Endpoint

User: "Fix the scheduler bug in the neo4jknowledgebase repo"

  1. Sisyphus (GLM 5 Turbo) — Detects bugfix intent
  2. Explore (Gemma 4 26B) — Finds scheduler code
  3. Librarian (Qwen3.5 35B) — Researches embedding API
  4. Oracle (GPT OSS 120B) — Root cause: model removed from config
  5. Sisyphus-Junior (Devstral 2 123B) — Implements fix
  6. Sisyphus (GLM 5 Turbo) — Verifies build passes

Cost: 5 different models, ~15 API calls, all through one LiteLLM endpoint.
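The delegation chain above reduces to an ordered list of (agent, model) pairs, all resolved through the same endpoint. A sketch (pairs copied from the steps above; the pipeline structure is illustrative, not Oh-My-OpenAgent's internal representation):

```python
# The workflow as a sequence of (agent, LiteLLM alias) steps.
PIPELINE = [
    ("Sisyphus", "glm-5-turbo"),         # detect intent
    ("Explore", "gemma-4-26b"),          # find scheduler code
    ("Librarian", "qwen3.5-35b"),        # research embedding API
    ("Oracle", "gpt-oss-120b"),          # root-cause analysis
    ("Sisyphus-Junior", "devstral-2-123b"),  # implement fix
    ("Sisyphus", "glm-5-turbo"),         # verify build
]
distinct_models = {model for _, model in PIPELINE}
```

Five distinct models behind one endpoint: the orchestrator picks per step, the proxy hides the providers.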

Skills, Plugins & MCP — Extensibility

MCP Connects Agents to the Outside World

MCP Server       | Function                          | Transport
Inkscape         | SVG creation and manipulation     | local
CodeGraphContext | Code-indexed knowledge (Neo4j KG) | local (SSH tunnel)
Playwright       | Browser automation, screenshots   | built-in
Git              | Repository operations             | built-in
File System      | Read/write access to workspace    | built-in

CodeGraphContext indexes 4 repos (131 files, 736 functions) via Neo4j.

Skills: Reusable Workflow Templates

Skill                          | Description
graphwiz-reporter              | Autonomous: KG + RSS → research → published article
test-driven-development        | Test first, then implement
systematic-debugging           | Structured error analysis before fixes
dispatching-parallel-agents    | Parallel task distribution
verification-before-completion | Evidence before assertions

6 more skills for git worktrees, code review, planning, brainstorming, email, and graphics.

ACP — Agent Client Protocol

ACP Is the LSP Moment for AI Agents — One Protocol, Any Editor, Any Agent

ACP (Agent Client Protocol, by Zed Industries): JSON-RPC standard for communication between clients and agents.

Client (Editor/CLI) → JSON-RPC (stdio/HTTP) → Agent (Claude/Codex/Gemini/OpenCode/...)

Status (April 2026): 30+ agents, 40+ clients.
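Over the wire, this is plain JSON-RPC 2.0. A sketch of what a client-to-agent request looks like (the `initialize` method name and params follow common JSON-RPC handshake practice and are illustrative, not quoted from the ACP spec):

```python
import json

# Hypothetical ACP-style JSON-RPC 2.0 request, as serialized onto stdio.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"protocolVersion": 1},
}
wire = json.dumps(request)       # what the client writes to the agent
decoded = json.loads(wire)       # what the agent reads back
```

Because both sides only agree on this envelope, any conforming client can drive any conforming agent.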

ACP in Practice — Multi-Project Orchestration

ACP Multi-Project Orchestration

# Batch prompt across all Python projects
acp run "check for outdated dependencies" --tags=python

# Autonomous improvement loop
acp loop next-graphwiz-ai --max-iter 100 --rotate 25

Session rotation every 25 iterations prevents context saturation.
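The rotation policy behind `--rotate 25` can be sketched as a modulo check: every `rotate` iterations, discard the session and start fresh with an empty context. This is an illustrative reconstruction of the flag's described behaviour, not the `acp` implementation.

```python
# Hypothetical sketch: run max_iter iterations, opening a new agent
# session (empty context) every `rotate` iterations.
def run_loop(max_iter: int, rotate: int) -> int:
    sessions = 0
    for i in range(max_iter):
        if i % rotate == 0:
            sessions += 1  # fresh session, context reset
        # ... one improvement iteration in the current session ...
    return sessions
```

With `max_iter=100` and `rotate=25`, the loop spans four sessions, so no single context accumulates more than 25 iterations of history.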

Infrastructure — Ansible Automation

9 Hosts, Everything as Code — Ansible Playbooks

Playbook            | Function
litellm-proxy       | Docker Compose: LiteLLM + PostgreSQL + Redis
opencode-deploy     | Build & install OpenCode from source (Go)
opencode-sync       | Sync agent configs to all hosts
knowledge-graph     | Neo4j + CodeGraphContext deployment
vpn-hub / vpn-peers | WireGuard mesh networking
traefik             | Reverse proxy + TLS (Let's Encrypt)

ansible-playbook site.yml --limit ai_cluster

Traefik → LiteLLM → GPUs → MCP: The Full Infrastructure Stack

Infrastructure Stack

Wrap Up

The Architecture: Three Decoupling Layers

Layer          | Tool                   | Role
Models         | SAIA, Local GPUs       | 25+ LLMs, 2 providers
Orchestration  | LiteLLM Proxy          | One API, fallback, logging
Agent          | OpenCode CLI           | Reads, writes, tests code
Multi-Agent    | Oh-My-OpenAgent        | 10 specialised agents
Cross-Editor   | ACP                    | Standardised agent protocol
Extension      | MCP + Skills + Plugins | Tool integration
Infrastructure | Ansible                | Reproducible deployment

LiteLLM decouples models from agents. Oh-My-OpenAgent decouples orchestration from implementation. ACP decouples agents from editors.

Resources — Core Tools

Resources — Extensions & Infrastructure

Questions & Discussion

Thank you!

Discussion topics:

  1. Governance rules for agentic systems?
  2. Running your own models on local hardware?
  3. Centralised vs. decentralised agent configuration?
  4. Cost management across multiple providers?

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

CC BY 4.0

Tobias Weiss — except where otherwise noted.
