daymade 11b7539f10 Update promptfoo-evaluation skill with relay API and concurrency lessons
- Update model ID to claude-sonnet-4-6 (latest, Feb 2026)
- Add Relay/Proxy API Configuration section with apiBaseUrl patterns
- Document that maxConcurrency MUST be under commandLineOptions (not top-level)
- Add LLM-as-judge relay provider config (apiBaseUrl not inherited)
- Add 5 new Troubleshooting entries from real-world title-agent eval
- Add Concurrency Control section to API reference
- Clarify file:// path resolution (always relative to promptfooconfig.yaml)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 22:29:24 +08:00

# Promptfoo API Reference
## Provider Configuration
### Echo Provider (No API Calls)
```yaml
providers:
  - echo # Returns prompt as-is, no API calls
```
**Use cases:**
- Preview rendered prompts without cost
- Debug variable substitution
- Verify few-shot structure
- Test configuration before production runs
**Cost:** Free - no tokens consumed.
### Anthropic
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-6
    config:
      max_tokens: 4096
      temperature: 0.7
      # For relay/proxy APIs:
      # apiBaseUrl: https://your-relay.example.com/api
```
### OpenAI
```yaml
providers:
  - id: openai:gpt-4.1
    config:
      temperature: 0.5
      max_tokens: 2048
```
### Multiple Providers (A/B Testing)
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-6
    label: Claude
  - id: openai:gpt-4.1
    label: GPT-4.1
```
## Assertion Reference
### Python Assertion Context
```python
class AssertionContext:
    prompt: str            # Raw prompt sent to LLM
    vars: dict             # Test case variables
    test: dict             # Complete test case
    config: dict           # Assertion config
    provider: Any          # Provider info
    providerResponse: Any  # Full response
```
### GradingResult Format
```python
{
    "pass": bool,            # Required: pass/fail
    "score": float,          # 0.0-1.0 score
    "reason": str,           # Explanation
    "named_scores": dict,    # Custom metrics
    "component_results": []  # Nested results
}
```
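A file-based `python` assertion returns this structure from a `get_assert` hook. A minimal sketch (the file name and the `required` keywords are hypothetical, chosen for illustration):

```python
# custom_assert.py - referenced from the config as:
#   assert:
#     - type: python
#       value: file://custom_assert.py
# promptfoo calls get_assert(output, context) and accepts a
# GradingResult dict (or a bare bool/float) as the result.

def get_assert(output, context) -> dict:
    """Pass when every required keyword appears in the output."""
    required = ["title", "summary"]  # hypothetical keywords
    missing = [kw for kw in required if kw not in output.lower()]
    return {
        "pass": not missing,
        "score": 1.0 - len(missing) / len(required),
        "reason": "all keywords present"
        if not missing
        else "missing: " + ", ".join(missing),
    }
```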
### Assertion Types
| Type | Description | Parameters |
|------|-------------|------------|
| `contains` | Substring check | `value` |
| `icontains` | Case-insensitive | `value` |
| `equals` | Exact match | `value` |
| `regex` | Pattern match | `value` |
| `not-contains` | Absence check | `value` |
| `starts-with` | Prefix check | `value` |
| `contains-any` | Any substring | `value` (array) |
| `contains-all` | All substrings | `value` (array) |
| `cost` | Token cost | `threshold` |
| `latency` | Response time | `threshold` (ms) |
| `perplexity` | Model confidence | `threshold` |
| `python` | Custom Python | `value` (file/code) |
| `javascript` | Custom JS | `value` (code) |
| `llm-rubric` | LLM grading | `value`, `threshold` |
| `factuality` | Fact checking | `value` (reference) |
| `model-graded-closedqa` | Q&A grading | `value` |
| `similar` | Semantic similarity | `value`, `threshold` |
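The deterministic string checks in the table reduce to simple predicates. A sketch of their semantics in Python (an illustration, not promptfoo's implementation):

```python
import re

# Illustrative semantics of the deterministic string assertions.
def contains(output: str, value: str) -> bool:
    return value in output

def icontains(output: str, value: str) -> bool:
    return value.lower() in output.lower()

def regex(output: str, value: str) -> bool:
    return re.search(value, output) is not None

def contains_any(output: str, values: list) -> bool:
    return any(v in output for v in values)

def contains_all(output: str, values: list) -> bool:
    return all(v in output for v in values)
```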
## Test Case Configuration
### Full Test Case Structure
```yaml
- description: "Test name"
  vars:
    var1: "value"
    var2: file://path.txt
  assert:
    - type: contains
      value: "expected"
  metadata:
    category: "test-category"
    priority: high
  options:
    provider: specific-provider
    transform: "output.trim()"
```
### Loading Variables from Files
```yaml
vars:
  # Text file (loaded as string)
  content: file://data/input.txt
  # JSON/YAML (parsed to object)
  config: file://config.json
  # Python script (executed, returns value)
  dynamic: file://scripts/generate.py
  # PDF (text extracted)
  document: file://docs/report.pdf
  # Image (base64 encoded)
  image: file://images/photo.png
```
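A script referenced this way is expected to expose a `get_var` hook that promptfoo calls per variable. A minimal sketch, assuming that hook signature (the `today` variable name is hypothetical):

```python
# scripts/generate.py - sketch of a dynamic-variable script.
# promptfoo calls get_var(var_name, prompt, other_vars) and reads the
# "output" key of the returned dict (or "error" on failure).
import datetime

def get_var(var_name, prompt, other_vars):
    if var_name == "today":  # hypothetical variable name
        return {"output": datetime.date.today().isoformat()}
    return {"error": f"unknown variable: {var_name}"}
```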
## Advanced Patterns
### Dynamic Test Generation (Python)
```python
# tests/generate.py
def get_tests():
    return [
        {
            "vars": {"input": f"test {i}"},
            "assert": [{"type": "contains", "value": str(i)}],
        }
        for i in range(10)
    ]
```
```yaml
tests: file://tests/generate.py:get_tests
```
### Scenario-based Testing
```yaml
scenarios:
  - config:
      - vars:
          language: "French"
      - vars:
          language: "Spanish"
    tests:
      - vars:
          text: "Hello"
        assert:
          - type: llm-rubric
            value: "Translation is accurate"
```
### Transform Output
```yaml
defaultTest:
  options:
    transform: |
      output.replace(/\n/g, ' ').trim()
```
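The same normalization can live in a file-based transform, referenced as `transform: file://transform.py`. A sketch, assuming promptfoo's Python transform hook `get_transform`:

```python
# transform.py - file-based transform.
# Collapses all whitespace runs to single spaces and trims, a slightly
# stronger normalization than the inline JS transform.
def get_transform(output: str, context) -> str:
    return " ".join(output.split())
```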
### Custom Grading Provider
```yaml
defaultTest:
  options:
    provider: openai:gpt-4.1
  assert:
    - type: llm-rubric
      value: "Evaluate quality"
      provider: anthropic:claude-3-haiku # Override for this assertion
```
### Relay/Proxy Provider with LLM-as-Judge
When using a relay or proxy API, each `llm-rubric` needs its own `provider` config — the main provider's `apiBaseUrl` is NOT inherited:
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-6
    config:
      apiBaseUrl: https://your-relay.example.com/api

defaultTest:
  assert:
    - type: llm-rubric
      value: "Evaluate quality"
      provider:
        id: anthropic:messages:claude-sonnet-4-6
        config:
          apiBaseUrl: https://your-relay.example.com/api # Must repeat here
```
## Environment Variables
| Variable | Description |
|----------|-------------|
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `PROMPTFOO_PYTHON` | Python binary path |
| `PROMPTFOO_CACHE_ENABLED` | Enable caching (default: true) |
| `PROMPTFOO_CACHE_PATH` | Cache directory |
## Concurrency Control
**CRITICAL**: `maxConcurrency` must be placed under `commandLineOptions:` in the YAML config. Placing it at the top level is silently ignored.
```yaml
# ✅ Correct — under commandLineOptions
commandLineOptions:
  maxConcurrency: 1

# ❌ Wrong — top level (silently ignored, defaults to ~4)
maxConcurrency: 1
```
When using relay APIs with LLM-as-judge, set `maxConcurrency: 1` because generation and grading share the same concurrent request pool.
## CLI Commands
```bash
# Initialize project
npx promptfoo@latest init
# Run evaluation
npx promptfoo@latest eval [options]
# Options:
# --config <path> Config file path
# --output <path> Output file path
# --grader <provider> Override grader model
# --no-cache Disable caching (important for re-runs)
# --filter-metadata Filter tests by metadata
# --repeat <n> Repeat each test n times
# --delay <ms> Delay between requests
# --max-concurrency Parallel requests (CLI override)
# View results
npx promptfoo@latest view [options]
# Share results
npx promptfoo@latest share
# Generate report
npx promptfoo@latest generate dataset
```
## Output Formats
```bash
# JSON (default)
--output results.json
# CSV
--output results.csv
# HTML report
--output results.html
# YAML
--output results.yaml
```