# Using Claude Code with Open Models
Below is a practical, step-by-step tutorial that shows you how to point Claude Code at OpenAI's open-model releases (gpt-oss-20b / gpt-oss-120b) or Qwen3-Coder, either by self-hosting on Hugging Face Inference Endpoints or by routing through OpenRouter. It demonstrates the minimal environment-variable technique (base URL + key) as well as an optional LiteLLM proxy for larger fleets. Follow the path that best fits your infrastructure.
## Prerequisites

- Claude Code ≥ 0.5.3 with gateway support (check with `claude --version`). (LiteLLM)
- A Hugging Face account with a read/write token (Settings → Access Tokens). (Hugging Face)
- For OpenRouter, an OpenRouter API key. (OpenRouter)
## Option 1: Self-host on Hugging Face Inference Endpoints

- Open the GPT-OSS repo (`openai/gpt-oss-20b` or `openai/gpt-oss-120b`) on Hugging Face and accept the Apache-2.0 license. (Hugging Face, OpenAI)
- For Qwen, choose `Qwen/Qwen3-Coder-480B-A35B-Instruct` (or a smaller GGUF spin-off if you lack GPUs). (Hugging Face)
- Click Deploy → Inference Endpoint on the model page.
- Select the Text Generation Inference (TGI) template, version ≥ 1.4.0. TGI ships an OpenAI-compatible Messages API; tick "Enable OpenAI compatibility" or add `--enable-openai` in the advanced settings. (Hugging Face)
- Choose hardware (A10G, A100, or CPU for the 20B model) and create the endpoint. (Hugging Face)
After the endpoint status is "Running", copy:

- `ENDPOINT_URL` (ends in `/v1`).
- `HF_API_TOKEN` (your user or org token). (Hugging Face)
Set the environment variables in the shell that launches Claude Code:

```bash
export ANTHROPIC_BASE_URL="https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud"
export ANTHROPIC_AUTH_TOKEN="hf_xxxxxxxxxxxxxxxxx"
export ANTHROPIC_MODEL="gpt-oss-20b"   # or gpt-oss-120b / your Qwen model id
```
Claude Code now believes it is talking to Anthropic but routes to your open model, because TGI mirrors the OpenAI schema. Test it:

```bash
claude --model gpt-oss-20b
```

Streaming works too: TGI returns token streams under `/v1/chat/completions`, just like the real OpenAI API. (Hugging Face)
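If you want to sanity-check the endpoint outside Claude Code first, a direct request to the OpenAI-compatible route should return a completion. This is a minimal sketch; the endpoint URL, token, and model name are placeholders for your own values.

```bash
# Placeholder endpoint URL and token -- substitute the values copied above.
curl "https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions" \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "stream": false
      }'
```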
- HF Inference Endpoints auto-scale, so watch your credit burn. (Hugging Face)
- If you need local control, run TGI in Docker with `docker run --name tgi -p 8080:80 ... --enable-openai`; a sketch follows below. (Hugging Face, GitHub)
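For reference, a local launch might look like the sketch below. The image tag, volume path, and token value are illustrative, and whether you still need `--enable-openai` depends on your TGI version, so check the TGI docs for your release.

```bash
# Illustrative local TGI launch; adjust GPU flags, paths, and the model id for your setup.
docker run --name tgi --gpus all -p 8080:80 \
  -v "$PWD/tgi-data:/data" \
  -e HF_TOKEN="hf_xxxxxxxxxxxxxxxxx" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id openai/gpt-oss-20b
  # append --enable-openai here if your TGI build requires it (see the note above)

# Then point Claude Code at the local server instead of the hosted endpoint:
export ANTHROPIC_BASE_URL="http://localhost:8080"
```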
## Option 2: Route through OpenRouter

OpenRouter exposes hundreds of models (including the new GPT-OSS and Qwen3-Coder slugs) behind one OpenAI-compatible endpoint.

- Sign up at openrouter.ai and copy your API key. (OpenRouter)
- Model slugs:
  - `openai/gpt-oss-20b` or `openai/gpt-oss-120b` (OpenAI open models). (OpenRouter)
  - `qwen/qwen3-coder-480b` (Qwen coder). (OpenRouter)
- Set the environment variables:

  ```bash
  export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
  export ANTHROPIC_AUTH_TOKEN="or_xxxxxxxxx"
  export ANTHROPIC_MODEL="openai/gpt-oss-20b"
  ```
- Run:

  ```bash
  claude --model openai/gpt-oss-20b
  ```

OpenRouter handles billing and fallback; Claude Code stays unchanged. (OpenRouter)
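Before launching Claude Code you can confirm that the key and slug work with a direct request to OpenRouter's OpenAI-compatible endpoint; this is just a sanity check and reuses the variables exported above.

```bash
# Quick check that the OpenRouter key is valid and the slug resolves.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "ping"}]}'
```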
## Optional: a LiteLLM proxy for larger fleets

If you want Claude Code to hot-swap between Anthropic, GPT-OSS, Qwen, and Azure models, drop LiteLLM in front:
```yaml
model_list:
  - model_name: gpt-oss-20b
    litellm_params:
      model: openai/gpt-oss-20b            # via OpenRouter or local TGI
      api_key: os.environ/OPENROUTER_KEY
  - model_name: qwen3-coder
    litellm_params:
      model: openrouter/qwen/qwen3-coder
      api_key: os.environ/OPENROUTER_KEY
```
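One way to bring the proxy up is the LiteLLM CLI. The sketch below assumes the YAML above is saved as `config.yaml` (an illustrative filename) and that the proxy extra is installed; the master key value mirrors the token used in the next step.

```bash
# Install the proxy extra and start LiteLLM on port 4000 with the config above.
pip install 'litellm[proxy]'
export OPENROUTER_KEY="or_xxxxxxxxx"        # key referenced by os.environ/OPENROUTER_KEY
export LITELLM_MASTER_KEY="litellm_master"  # token Claude Code will send as its auth header
litellm --config config.yaml --port 4000
```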
With the proxy running, point Claude Code at it:

```bash
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="litellm_master"
claude --model gpt-oss-20b
```
LiteLLM keeps a cost log and supports simple-shuffle routing; avoid the latency-based routing mode when you still call Anthropic models. (LiteLLM)
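If a model name does not resolve, you can list what the proxy actually serves through its OpenAI-style models route; the port and master key here match the example above.

```bash
# List the model_name entries the proxy exposes (should include gpt-oss-20b and qwen3-coder).
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer litellm_master"
```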
## Troubleshooting

| Symptom | Fix |
|---|---|
| 404 on `/v1/chat/completions` | Ensure the `--enable-openai` flag is active in TGI. (Hugging Face) |
| Empty responses | Verify that `ANTHROPIC_MODEL` matches the slug you mapped. (LiteLLM) |
| 400 error after model swap | Switch the LiteLLM router to simple-shuffle, not latency-based routing. (LiteLLM) |
| Slow first token | Warm up the endpoint with a small prompt after it scales to zero. (Hugging Face) |
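For the slow-first-token case, a one-shot non-interactive prompt is usually enough to wake a scaled-to-zero endpoint before you start a real session; this assumes the `ANTHROPIC_*` variables are already exported.

```bash
# Fire a tiny prompt in print mode to warm the endpoint.
claude -p "ping" --model gpt-oss-20b
```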
## Key takeaways

- Claude Code needs only `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` to talk to any OpenAI-compatible backend.
- Hugging Face TGI 1.4+ exposes that schema, letting you host GPT-OSS or Qwen in your own cloud with minimal glue.
- OpenRouter is the fastest route if you want zero DevOps.
- LiteLLM sits in front when you want policy-based routing across many vendors.

With these methods, you can mix and match open-source and proprietary models inside the same CLI workflow, keeping costs low while preserving the familiar Claude Code developer experience.
## Claude Flow integration

Claude Flow can enhance this setup with its swarm orchestration capabilities:
- Initialize Claude Flow MCP:

  ```bash
  claude mcp add claude-flow npx claude-flow@alpha mcp start
  ```
- Configure for Open Models:

  ```bash
  # Set your chosen model backend
  export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
  export ANTHROPIC_AUTH_TOKEN="your_key"
  export ANTHROPIC_MODEL="openai/gpt-oss-20b"

  # Enable Claude Flow features
  export CLAUDE_FLOW_HOOKS_ENABLED="true"
  export CLAUDE_FLOW_TELEMETRY_ENABLED="true"
  ```
- Leverage Swarm Coordination:

  ```bash
  # Initialize a swarm for complex tasks
  npx claude-flow@alpha swarm init --topology mesh --max-agents 5

  # Use SPARC methodology with open models
  npx claude-flow@alpha sparc run architect "Design authentication system"
  ```
- Cost Optimization: Route simple tasks to smaller models and complex ones to larger models (see the sketch after this list)
- Performance Tracking: Monitor token usage across different models
- Swarm Coordination: Distribute work across multiple model instances
- Memory Persistence: Maintain context across sessions regardless of model
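As a rough illustration of the cost-optimization point, nothing stops you from picking the model per invocation; the prompts below are placeholders and the slugs come from the OpenRouter section above.

```bash
# Route a small cleanup task to the 20B model and a design task to the large coder model.
claude --model openai/gpt-oss-20b -p "Tidy up the helper functions in utils.py"
claude --model qwen/qwen3-coder-480b -p "Design an authentication module for the REST API"
```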
Sources: OpenAI, Business Insider, Hugging Face, OpenRouter, LiteLLM, WIRED
Last updated: January 2025 | Part of Claude Flow v2.0.0-alpha.87 documentation