# Free LLM Provider Reference

## Bytez (Serverless Model API)
Source: github.com/Bytez-com/docs
Get key: https://bytez.com/api
OpenAI-compatible base URL: https://api.bytez.com/models/v2/openai/v1
Scale: 175k+ open-source models, 33 ML task types, plus closed-source pass-through (OpenAI, Claude, Gemini, Mistral, Cohere)
SDKs: Python (pip install bytez), JavaScript (npm i bytez.js), Julia, HTTP
Free tier: Available — apply for $200k AI grant at https://docs.google.com/forms/d/e/1FAIpQLSfpm9hHTKRLTBrudOnikqM47etOhIhXiTbf0bBeFbhpqw9VZg/viewform
Key models (open-source, free tier):
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Llama-3.3-70B-Instruct
- deepseek-ai/DeepSeek-R1
- mistralai/Mistral-7B-Instruct-v0.3
- microsoft/phi-4
- Any of 175k+ HuggingFace-hosted models
Auth for closed-source: Pass provider-key header — Bytez routes it as a pass-through, never stored.
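As a sketch of calling the OpenAI-compatible endpoint with only the standard library (the helper below only builds the request, it does not send it; the base URL and model names come from the section above, and the key is a placeholder):

```python
import json
import urllib.request

BYTEZ_BASE_URL = "https://api.bytez.com/models/v2/openai/v1"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BYTEZ_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Pass the resulting request to urllib.request.urlopen (or use the official SDKs above) with a real key.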
## One API (Self-Hosted Gateway)
Source: github.com/songquanpeng/one-api (30k+ stars)
What it is: A full-featured LLM API management and key redistribution system. Add all your provider API keys once, issue unified tokens to apps/users with quota limits and expiry, get a web dashboard for usage stats.
Default port: 3000
Quick install:
```shell
docker run --name one-api -d --restart always \
  -p 3000:3000 -v /data/one-api:/data justsong/one-api
# Open http://localhost:3000 — login: root / 123456
```
Supported providers (channels):
| Provider | Notes |
|---|---|
| OpenAI | All models incl. GPT-5 |
| Azure OpenAI | API version configurable |
| Anthropic Claude | All Claude models |
| Google Gemini / PaLM | v1 and v1beta |
| DeepSeek | Chat + Coder |
| Baidu Wenxin (ERNIE) | Chinese models |
| Alibaba Qwen | Tongyi Qianwen |
| Zhipu ChatGLM | BigModel API |
| Custom OpenAI-compatible | Any base URL |
Key features:
- Web UI: channel management, token creation, user management, quota tracking
- Load balancing across multiple keys/channels for same model
- Per-token quota limits and expiry dates
- User groups with different rate multipliers
- Auto-channel health testing
- Usage logs per token/user/channel
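The load-balancing idea can be sketched as a simple round-robin over channels that serve the same model (our own illustration, not One API's actual implementation; the channel names are hypothetical):

```python
# Rotate requests across multiple keys/channels for the same model —
# an illustration of the load-balancing idea, not One API's code.
class ChannelPool:
    def __init__(self, channels):
        self.channels = list(channels)
        self._i = 0

    def next_channel(self):
        ch = self.channels[self._i % len(self.channels)]
        self._i += 1
        return ch

pool = ChannelPool(["openai-key-1", "openai-key-2", "azure-east"])
```

One API additionally weights channels by health and priority; the rotation above is only the baseline behavior.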
Key env vars:
| Variable | Purpose |
|---|---|
| SQL_DSN | MySQL DSN (default: SQLite) |
| SESSION_SECRET | Stable sessions across restarts |
| INITIAL_ROOT_TOKEN | Pre-set root token on first start |
| REDIS_CONN_STRING | Redis for rate limiting |
| RELAY_PROXY | Outbound HTTP proxy |
## llm-mux (Local Gateway)
Source: github.com/nghyane/llm-mux
What it is: A Go binary that turns existing AI subscriptions into a local OpenAI-compatible API server. No API keys — uses OAuth to authenticate with provider accounts you already pay for.
Install:
```shell
curl -fsSL https://raw.githubusercontent.com/nghyane/llm-mux/main/install.sh | bash
```
Base URL: http://localhost:8317 (configurable via LLM_MUX_PORT)
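Since the port is configurable, a client can derive the base URL from the environment (a minimal sketch; the default 8317 and the LLM_MUX_PORT variable are from above):

```python
import os

def llm_mux_base_url(env=None) -> str:
    # Default port is 8317, overridable via LLM_MUX_PORT.
    env = os.environ if env is None else env
    port = env.get("LLM_MUX_PORT", "8317")
    return f"http://localhost:{port}"
```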
Supported providers and models:
| Provider | Subscription needed | Login command | Key models |
|---|---|---|---|
| Claude | Claude Pro/Max | llm-mux login claude | claude-sonnet-4-20250514, claude-opus-4-5-20251101 |
| GitHub Copilot | Copilot subscription | llm-mux login copilot | gpt-4o, gpt-4.1, gpt-5, gpt-5.1, gpt-5.2 |
| Google Gemini | Google One AI Premium or free | llm-mux login antigravity | gemini-2.5-pro, gemini-2.5-flash |
| OpenAI Codex | ChatGPT Plus/Pro | llm-mux login codex | gpt-5 series |
| Qwen | Alibaba Cloud account | llm-mux login qwen | qwen models |
| Kiro | AWS/Amazon Q Developer | llm-mux login kiro | Amazon Q models |
| Cline | Cline subscription | llm-mux login cline | Cline models |
Key features:
- Multi-account load balancing — login multiple accounts, auto-rotates
- Auto-retry on quota limits across accounts
- Anthropic + Gemini + Ollama compatible endpoints (not just OpenAI)
- Run as a background service (llm-mux service install)
- Management API for usage stats
Config file: ~/.config/llm-mux/config.yaml
Token storage: ~/.config/llm-mux/auth/
## ChatAnywhere
Source: github.com/chatanywhere/GPT_API_free (36k+ stars)
Get key: https://api.chatanywhere.tech/v1/oauth/free/render (GitHub login required)
Base URLs:
- https://api.chatanywhere.tech — China relay (lower latency inside CN)
- https://api.chatanywhere.org — Global endpoint
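A client can pick the relay by region (URLs from above; a trivial helper assuming the caller knows whether it runs inside CN):

```python
def chatanywhere_base_url(in_china: bool) -> str:
    # The .tech relay has lower latency inside China; .org is global.
    return ("https://api.chatanywhere.tech" if in_china
            else "https://api.chatanywhere.org")
```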
Free tier limits: 200 req/day per IP+Key combination
Free models:
| Model | Daily Limit |
|---|---|
| gpt-4o-mini | 200/day |
| gpt-3.5-turbo | 200/day |
| gpt-4.1-mini | 200/day |
| gpt-4.1-nano | 200/day |
| gpt-5-mini | 200/day |
| gpt-4o | 5/day |
| gpt-5 | 5/day |
| gpt-5.1 | 5/day |
| deepseek-r1 | 30/day |
| deepseek-v3 | 30/day |
| text-embedding-3-small | 200/day |
| text-embedding-3-large | 200/day |
Notes: The free key is restricted to personal, educational, and non-commercial use.
## Groq
Get key: https://console.groq.com/keys
Base URL: https://api.groq.com/openai/v1
Free tier: Generous, no credit card required
Models (free):
| Model | Context | Speed |
|---|---|---|
| llama-3.3-70b-versatile | 128k | Very fast |
| llama-3.1-8b-instant | 128k | Fastest |
| mixtral-8x7b-32768 | 32k | Fast |
| gemma2-9b-it | 8k | Fast |
| deepseek-r1-distill-llama-70b | 128k | Fast |
Rate limits (free tier):
- 30 RPM (requests per minute)
- 6,000 TPM (tokens per minute) for large models
- 14,400 RPD (requests per day)
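To stay under the 30 RPM cap client-side, a sliding-window limiter is enough (a minimal sketch of the standard technique, not anything Groq ships; timestamps are passed in explicitly so the logic is easy to test):

```python
from collections import deque

class RateLimiter:
    """Sliding-window limiter, defaulting to Groq's free-tier 30 RPM."""

    def __init__(self, max_calls: int = 30, window: float = 60.0):
        self.max_calls = max_calls
        self.window = window
        self.calls = deque()  # timestamps of recent requests

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next request is allowed."""
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self, now: float) -> None:
        self.calls.append(now)
```

Call wait_time(time.time()) before each request, sleep for the returned duration, then record the send time.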
## Cerebras
Get key: https://cloud.cerebras.ai/
Base URL: https://api.cerebras.ai/v1
Free tier: Free with account
Models (free):
| Model | Notes |
|---|---|
| llama3.1-8b | Very fast inference |
| llama3.1-70b | Fast inference |
| llama-3.3-70b | Latest Llama |
Advantage: World's fastest inference (wafer-scale chip) — great for high-volume low-latency tasks.
## Mistral AI
Get key: https://console.mistral.ai/api-keys/
Base URL: https://api.mistral.ai/v1
Free tier: mistral-small-latest and open-weight models at 1 RPM free
Models (free/open-weight):
| Model | Notes |
|---|---|
| mistral-small-latest | 1 RPM free |
| open-mistral-7b | Free |
| open-mixtral-8x7b | Free |
| open-mistral-nemo | Free |
## OpenRouter
Get key: https://openrouter.ai/keys
Base URL: https://openrouter.ai/api/v1
Free models: 100+ models available for free (:free suffix)
Best free models:
| Model | Context |
|---|---|
| meta-llama/llama-3.3-70b-instruct:free | 128k |
| google/gemma-3-27b-it:free | 8k |
| mistralai/mistral-7b-instruct:free | 32k |
| deepseek/deepseek-r1:free | 64k |
| microsoft/phi-3-medium-128k-instruct:free | 128k |
Notes: Free models may have higher latency, and rate limits vary by model.
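Because individual free models can be rate-limited or offline, a simple fallback chain over the table above is a common pattern (our own sketch; `call` stands in for any function that sends a request for a given model ID and raises on failure):

```python
# Model IDs from the "Best free models" table above, in preference order.
FREE_MODELS = [
    "meta-llama/llama-3.3-70b-instruct:free",
    "deepseek/deepseek-r1:free",
    "mistralai/mistral-7b-instruct:free",
]

def first_available(call, models=FREE_MODELS):
    """Return (model, result) from the first model whose call succeeds."""
    last_err = None
    for model in models:
        try:
            return model, call(model)
        except Exception as err:  # rate limit, model offline, etc.
            last_err = err
    raise RuntimeError("all free models failed") from last_err
```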
## Google AI Studio (Gemini)
Get key: https://aistudio.google.com/app/apikey
Base URL: https://generativelanguage.googleapis.com/v1beta/openai/ (OpenAI-compatible)
Free tier:
| Model | Free RPM | Free RPD |
|---|---|---|
| gemini-1.5-flash | 15 | 1,500 |
| gemini-1.5-pro | 2 | 50 |
| gemini-2.0-flash | 15 | 1,500 |
Python client setup via the OpenAI SDK:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("GEMINI_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
```
## Cohere
Get key: https://dashboard.cohere.com/api-keys
Base URL: https://api.cohere.ai/compatibility/v1 (OpenAI-compatible)
Free tier: Trial key, 20 RPM
Best free model: command-r — great for RAG and tool use
## Quick Comparison
| Provider | Best for | Free limit | Signup friction |
|---|---|---|---|
| ChatAnywhere | GPT-4o access, CN users | 200/day | GitHub login |
| Groq | Speed, Llama models | 14,400/day | No credit card |
| Cerebras | Ultra-fast inference | Generous | Account signup |
| OpenRouter | Model variety | 100+ free models | |
| Google AI Studio | Gemini models | 1,500/day | Google account |
| Mistral | European models | Low (1 RPM) | |
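For convenience, the OpenAI-compatible base URLs collected from the sections above in one place:

```python
# Base URLs copied from the provider sections in this document.
BASE_URLS = {
    "bytez": "https://api.bytez.com/models/v2/openai/v1",
    "chatanywhere-cn": "https://api.chatanywhere.tech",
    "chatanywhere-global": "https://api.chatanywhere.org",
    "groq": "https://api.groq.com/openai/v1",
    "cerebras": "https://api.cerebras.ai/v1",
    "mistral": "https://api.mistral.ai/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai/",
    "cohere": "https://api.cohere.ai/compatibility/v1",
}
```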