How to Run Ornith 1.0: Complete Local Setup Guide

Run Ornith 1.0 on your own hardware — no API keys, no cloud costs, completely private. Set up a local Ornith 1.0 server in minutes with vLLM, Ollama, or LM Studio.

Quick Start with Ornith 1.0

The fastest way to get Ornith 1.0 running locally — choose your preferred method:

vLLM

High-throughput OpenAI-compatible server for production deployments.

vllm serve deepreinforce-ai/Ornith-1.0-9B \
--served-model-name Ornith-1.0-9B \
--host 0.0.0.0 --port 8000 \
--max-model-len 262144 \
--gpu-memory-utilization 0.90 \
--enable-prefix-caching \
--enable-auto-tool-choice --tool-call-parser qwen3_xml \
--reasoning-parser qwen3 \
--trust-remote-code

SGLang

Fast serving engine with optimized scheduling for MoE models.

python -m sglang.launch_server \
--model-path deepreinforce-ai/Ornith-1.0-9B \
--served-model-name Ornith-1.0-9B \
--host 0.0.0.0 --port 8000 \
--context-length 262144 \
--mem-fraction-static 0.85 \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3

Ollama

One-command setup for local use — pull and run immediately.

ollama run hf.co/deepreinforce-ai/Ornith-1.0-9B-GGUF

LM Studio

GUI application for Mac/Windows — search and download Ornith 1.0 GGUF models directly.

# 1. Open LM Studio
# 2. Search "Ornith-1.0" in the model browser
# 3. Download Q4_K_M or Q5_K_M quantization
# 4. Load and start chatting

llama.cpp

Lightweight C++ inference — serve as an OpenAI-compatible API.

llama-server -hf deepreinforce-ai/Ornith-1.0-9B-GGUF \
--port 8000 -c 262144

Connect Ornith 1.0 to Coding Agents

Once your Ornith 1.0 server is running, point any OpenAI-compatible coding agent at it:

Claude Code

export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"

OpenHands

export LLM_MODEL="openai/deepreinforce-ai/Ornith-1.0-9B"
export LLM_BASE_URL="http://localhost:8000/v1"
export LLM_API_KEY="EMPTY"
openhands

OpenClaw

export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export OPENAI_MODEL="deepreinforce-ai/Ornith-1.0-9B"

Hermes Agent

export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export MODEL="deepreinforce-ai/Ornith-1.0-9B"

Ornith 1.0 Deployment Tips

Choose the Right Ornith 1.0 Model Size

For a single consumer GPU (RTX 3090/4090 with 24GB), use Ornith 1.0-35B MoE at Q5_K_M. It is faster and more accurate than Ornith 1.0-9B. For 8-16GB VRAM, use Ornith 1.0-9B at Q4_K_M. The 397B model requires 8x 80GB GPUs.

Enable Prefix Caching for Ornith 1.0

When serving Ornith 1.0 with vLLM, add --enable-prefix-caching to reuse computed KV cache across requests with shared prefixes. This dramatically speeds up agent loops where each turn shares the same system prompt.

Ornith 1.0 Context Window

All Ornith 1.0 models support up to 262K token context. Set --max-model-len 262144 in vLLM or --context-length 262144 in SGLang. For llama.cpp, use -c 262144. This gives Ornith 1.0 enough context for large repository analysis.

vLLM vs SGLang Tool Call Formats

Ornith 1.0 uses --tool-call-parser qwen3_xml for vLLM but --tool-call-parser qwen3_coder for SGLang. Using the wrong parser may affect tool-calling quality. Always match the parser to your serving framework.

Ornith 1.0 Setup FAQ

What is the easiest way to run Ornith 1.0 locally?

The easiest way to run Ornith 1.0 is with Ollama. Just run 'ollama run hf.co/deepreinforce-ai/Ornith-1.0-9B-GGUF' and it downloads and starts the model automatically. No Python or GPU setup needed — Ollama handles everything.

Can I use Ornith 1.0 with Claude Code?

Yes. Start an Ornith 1.0 server (vLLM, SGLang, or llama.cpp), then set OPENAI_BASE_URL to your server address and OPENAI_API_KEY to 'EMPTY'. Claude Code will route requests through your local Ornith 1.0 instance instead of the Anthropic API.

Which serving framework should I use for Ornith 1.0?

For production deployments, use vLLM — it offers the highest throughput with OpenAI-compatible API. For quick local testing, use Ollama or LM Studio. For MoE models, SGLang offers optimized scheduling. All three support Ornith 1.0 out of the box.

Does Ornith 1.0 support tool calling?

Yes. Ornith 1.0 emits well-formed tool calls for agent loops. When serving with vLLM, enable tool calling with --enable-auto-tool-choice --tool-call-parser qwen3_xml --reasoning-parser qwen3. The reasoning trace appears in a separate reasoning_content field.

Compare Ornith 1.0 Performance

See how Ornith 1.0 stacks up against Claude, Qwen, and DeepSeek on agentic coding benchmarks.

View Benchmarks Ornith vs Others