How to Run Ornith 1.0: Complete Local Setup Guide
Run Ornith 1.0 on your own hardware — no API keys, no cloud costs, completely private. Set up a local Ornith 1.0 server in minutes with vLLM, Ollama, or LM Studio.
Quick Start with Ornith 1.0
The fastest way to get Ornith 1.0 running locally — choose your preferred method:
vLLM
High-throughput OpenAI-compatible server for production deployments.
vllm serve deepreinforce-ai/Ornith-1.0-9B \
--served-model-name Ornith-1.0-9B \
--host 0.0.0.0 --port 8000 \
--max-model-len 262144 \
--gpu-memory-utilization 0.90 \
--enable-prefix-caching \
--enable-auto-tool-choice --tool-call-parser qwen3_xml \
--reasoning-parser qwen3 \
--trust-remote-code SGLang
Fast serving engine with optimized scheduling for MoE models.
python -m sglang.launch_server \
--model-path deepreinforce-ai/Ornith-1.0-9B \
--served-model-name Ornith-1.0-9B \
--host 0.0.0.0 --port 8000 \
--context-length 262144 \
--mem-fraction-static 0.85 \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 Ollama
One-command setup for local use — pull and run immediately.
ollama run hf.co/deepreinforce-ai/Ornith-1.0-9B-GGUF LM Studio
GUI application for Mac/Windows — search and download Ornith 1.0 GGUF models directly.
# 1. Open LM Studio
# 2. Search "Ornith-1.0" in the model browser
# 3. Download Q4_K_M or Q5_K_M quantization
# 4. Load and start chatting llama.cpp
Lightweight C++ inference — serve as an OpenAI-compatible API.
llama-server -hf deepreinforce-ai/Ornith-1.0-9B-GGUF \
--port 8000 -c 262144 Connect Ornith 1.0 to Coding Agents
Once your Ornith 1.0 server is running, point any OpenAI-compatible coding agent at it:
Claude Code
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY" OpenHands
export LLM_MODEL="openai/deepreinforce-ai/Ornith-1.0-9B"
export LLM_BASE_URL="http://localhost:8000/v1"
export LLM_API_KEY="EMPTY"
openhands OpenClaw
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export OPENAI_MODEL="deepreinforce-ai/Ornith-1.0-9B" Hermes Agent
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export MODEL="deepreinforce-ai/Ornith-1.0-9B" Ornith 1.0 Deployment Tips
Choose the Right Ornith 1.0 Model Size
For a single consumer GPU (RTX 3090/4090 with 24GB), use Ornith 1.0-35B MoE at Q5_K_M. It is faster and more accurate than Ornith 1.0-9B. For 8-16GB VRAM, use Ornith 1.0-9B at Q4_K_M. The 397B model requires 8x 80GB GPUs.
Enable Prefix Caching for Ornith 1.0
When serving Ornith 1.0 with vLLM, add --enable-prefix-caching to reuse computed KV cache across requests with shared prefixes. This dramatically speeds up agent loops where each turn shares the same system prompt.
Ornith 1.0 Context Window
All Ornith 1.0 models support up to 262K token context. Set --max-model-len 262144 in vLLM or --context-length 262144 in SGLang. For llama.cpp, use -c 262144. This gives Ornith 1.0 enough context for large repository analysis.
vLLM vs SGLang Tool Call Formats
Ornith 1.0 uses --tool-call-parser qwen3_xml for vLLM but --tool-call-parser qwen3_coder for SGLang. Using the wrong parser may affect tool-calling quality. Always match the parser to your serving framework.
Ornith 1.0 Setup FAQ
What is the easiest way to run Ornith 1.0 locally?
Can I use Ornith 1.0 with Claude Code?
Which serving framework should I use for Ornith 1.0?
Does Ornith 1.0 support tool calling?
Compare Ornith 1.0 Performance
See how Ornith 1.0 stacks up against Claude, Qwen, and DeepSeek on agentic coding benchmarks.