Run your own AI inference cluster with Ollama, Open WebUI, and LiteLLM — fully containerized for production. Pull models, chat through a polished interface, and route requests through an OpenAI-compatible proxy.
What's Included
- docker-compose.yml — 3 services: Ollama (LLM engine), Open WebUI (chat interface), LiteLLM (OpenAI-compatible proxy)
- LiteLLM proxy config — Model routing, rate limits, and fallback configuration
- .env.example — All environment variables documented
- README.md — Architecture diagram, quick start, production checklist
Requirements
- Docker Engine 24+ with Docker Compose v2
- NVIDIA GPU with 8GB+ VRAM (for 7B models)
- NVIDIA Container Toolkit
Download
Download the zip, extract it, run docker compose up -d.