Files
claude-skills-reference/engineering-team/free-llm-api/references/providers.md
ivanopenclaw223-alt 49c9f2109f feat(engineering): add review-fix-a11y skill (WCAG 2.2 a11y audit + fix) (#375)
Adds review-fix-a11y (WCAG 2.2 a11y audit + fix) and free-llm-api skills.

Includes:
- review-fix-a11y: WCAG 2.2 audit workflow, a11y_audit.py scanner, contrast_checker.py
- free-llm-api: ChatAnywhere, Groq, Cerebras, OpenRouter, llm-mux, One API setup
- secret_scanner.py upgrade with secrets-patterns-db integration (1,600+ patterns)

Co-authored-by: ivanopenclaw223-alt <ivanopenclaw223-alt@users.noreply.github.com>
2026-03-18 08:20:44 +01:00

8.1 KiB

Free LLM Provider Reference

Bytez (Serverless Model API)

Source: github.com/Bytez-com/docs

Get key: https://bytez.com/api

OpenAI-compatible base URL: https://api.bytez.com/models/v2/openai/v1

Scale: 175k+ open-source models, 33 ML task types, plus closed-source pass-through (OpenAI, Claude, Gemini, Mistral, Cohere)

SDKs: Python (pip install bytez), JavaScript (npm i bytez.js), Julia, HTTP

Free tier: Available — apply for $200k AI grant at https://docs.google.com/forms/d/e/1FAIpQLSfpm9hHTKRLTBrudOnikqM47etOhIhXiTbf0bBeFbhpqw9VZg/viewform

Key models (open-source, free tier):

  • meta-llama/Llama-3.1-8B-Instruct
  • meta-llama/Llama-3.3-70B-Instruct
  • deepseek-ai/DeepSeek-R1
  • mistralai/Mistral-7B-Instruct-v0.3
  • microsoft/phi-4
  • Any of 175k+ HuggingFace-hosted models

Auth for closed-source: Pass provider-key header — Bytez routes it as a pass-through, never stored.


One API (Self-Hosted Gateway)

Source: github.com/songquanpeng/one-api (30k+ stars)

What it is: A full-featured LLM API management and key redistribution system. Add all your provider API keys once, issue unified tokens to apps/users with quota limits and expiry, get a web dashboard for usage stats.

Default port: 3000

Quick install:

docker run --name one-api -d --restart always \
  -p 3000:3000 -v /data/one-api:/data justsong/one-api
# Open http://localhost:3000 — login: root / 123456

Supported providers (channels):

Provider Notes
OpenAI All models incl. GPT-5
Azure OpenAI API version configurable
Anthropic Claude All Claude models
Google Gemini / PaLM v1 and v1beta
DeepSeek Chat + Coder
Baidu Wenxin (ERNIE) Chinese models
Alibaba Qwen Tongyi Qianwen
Zhipu ChatGLM BigModel API
Custom OpenAI-compatible Any base URL

Key features:

  • Web UI: channel management, token creation, user management, quota tracking
  • Load balancing across multiple keys/channels for same model
  • Per-token quota limits and expiry dates
  • User groups with different rate multipliers
  • Auto-channel health testing
  • Usage logs per token/user/channel

Key env vars:

Variable Purpose
SQL_DSN MySQL DSN (default: SQLite)
SESSION_SECRET Stable sessions across restarts
INITIAL_ROOT_TOKEN Pre-set root token on first start
REDIS_CONN_STRING Redis for rate limiting
RELAY_PROXY Outbound HTTP proxy

llm-mux (Local Gateway)

Source: github.com/nghyane/llm-mux

What it is: A Go binary that turns existing AI subscriptions into a local OpenAI-compatible API server. No API keys — uses OAuth to authenticate with provider accounts you already pay for.

Install:

curl -fsSL https://raw.githubusercontent.com/nghyane/llm-mux/main/install.sh | bash

Base URL: http://localhost:8317 (configurable via LLM_MUX_PORT)

Supported providers and models:

Provider Subscription needed Login command Key models
Claude Claude Pro/Max llm-mux login claude claude-sonnet-4-20250514, claude-opus-4-5-20251101
GitHub Copilot Copilot subscription llm-mux login copilot gpt-4o, gpt-4.1, gpt-5, gpt-5.1, gpt-5.2
Google Gemini Google One AI Premium or free llm-mux login antigravity gemini-2.5-pro, gemini-2.5-flash
OpenAI Codex ChatGPT Plus/Pro llm-mux login codex gpt-5 series
Qwen Alibaba Cloud account llm-mux login qwen qwen models
Kiro AWS/Amazon Q Developer llm-mux login kiro Amazon Q models
Cline Cline subscription llm-mux login cline Cline models

Key features:

  • Multi-account load balancing — login multiple accounts, auto-rotates
  • Auto-retry on quota limits across accounts
  • Anthropic + Gemini + Ollama compatible endpoints (not just OpenAI)
  • Run as a background service (llm-mux service install)
  • Management API for usage stats

Config file: ~/.config/llm-mux/config.yaml Token storage: ~/.config/llm-mux/auth/


ChatAnywhere

Source: github.com/chatanywhere/GPT_API_free (36k+ stars)

Get key: https://api.chatanywhere.tech/v1/oauth/free/render (GitHub login required)

Base URLs:

  • https://api.chatanywhere.tech — China relay (lower latency inside CN)
  • https://api.chatanywhere.org — Global endpoint

Free tier limits: 200 req/day per IP+Key combination

Free models:

Model Daily Limit
gpt-4o-mini 200/day
gpt-3.5-turbo 200/day
gpt-4.1-mini 200/day
gpt-4.1-nano 200/day
gpt-5-mini 200/day
gpt-4o 5/day
gpt-5 5/day
gpt-5.1 5/day
deepseek-r1 30/day
deepseek-v3 30/day
text-embedding-3-small 200/day
text-embedding-3-large 200/day

Notes: Free key requires personal/educational/non-commercial use only. No commercial use.


Groq

Get key: https://console.groq.com/keys

Base URL: https://api.groq.com/openai/v1

Free tier: Generous free tier, no credit card required

Models (free):

Model Context Speed
llama-3.3-70b-versatile 128k Very fast
llama-3.1-8b-instant 128k Fastest
mixtral-8x7b-32768 32k Fast
gemma2-9b-it 8k Fast
deepseek-r1-distill-llama-70b 128k Fast

Rate limits (free tier):

  • 30 RPM (requests per minute)
  • 6,000 TPM (tokens per minute) for large models
  • 14,400 RPD (requests per day)

Cerebras

Get key: https://cloud.cerebras.ai/

Base URL: https://api.cerebras.ai/v1

Free tier: Free with account

Models (free):

Model Notes
llama3.1-8b Very fast inference
llama3.1-70b Fast inference
llama-3.3-70b Latest Llama

Advantage: World's fastest inference (wafer-scale chip) — great for high-volume low-latency tasks.


Mistral AI

Get key: https://console.mistral.ai/api-keys/

Base URL: https://api.mistral.ai/v1

Free tier: mistral-small-latest and open-weight models at 1 RPM free

Models (free/open-weight):

Model Notes
mistral-small-latest 1 RPM free
open-mistral-7b Free
open-mixtral-8x7b Free
open-mistral-nemo Free

OpenRouter

Get key: https://openrouter.ai/keys

Base URL: https://openrouter.ai/api/v1

Free models: 100+ models available for free (:free suffix)

Best free models:

Model Context
meta-llama/llama-3.3-70b-instruct:free 128k
google/gemma-3-27b-it:free 8k
mistralai/mistral-7b-instruct:free 32k
deepseek/deepseek-r1:free 64k
microsoft/phi-3-medium-128k-instruct:free 128k

Notes: Free models may have higher latency. Rate limits vary by model.


Google AI Studio (Gemini)

Get key: https://aistudio.google.com/app/apikey

Base URL: https://generativelanguage.googleapis.com/v1beta/openai/ (OpenAI-compatible)

Free tier:

Model Free RPM Free RPD
gemini-1.5-flash 15 1,500
gemini-1.5-pro 2 50
gemini-2.0-flash 15 1,500
client = OpenAI(
    api_key=os.getenv("GEMINI_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

Cohere

Get key: https://dashboard.cohere.com/api-keys

Base URL: https://api.cohere.ai/compatibility/v1 (OpenAI-compatible)

Free tier: Trial key, 20 RPM

Best free model: command-r — great for RAG and tool use


Quick Comparison

Provider Best for Free limit Signup friction
ChatAnywhere GPT-4o access, CN users 200/day GitHub login
Groq Speed, Llama models 14,400/day Email
Cerebras Ultra-fast inference Generous Email
OpenRouter Model variety 100+ free models Email
Google AI Studio Gemini models 1,500/day Google account
Mistral European models Low (1 RPM) Email