### Added

- **New Skill**: iOS-APP-developer v1.1.0 - iOS development with XcodeGen, SwiftUI, and SPM
  - XcodeGen project.yml configuration
  - SPM dependency resolution
  - Device deployment and code signing
  - Camera/AVFoundation debugging
  - iOS version compatibility handling
  - Library not loaded @rpath framework error fixes
  - State machine testing patterns for @MainActor classes
  - Bundled references: xcodegen-full.md, camera-avfoundation.md, swiftui-compatibility.md, testing-mainactor.md
- **New Skill**: promptfoo-evaluation v1.0.0 - LLM evaluation framework using Promptfoo
  - Promptfoo configuration (promptfooconfig.yaml)
  - Python custom assertions
  - llm-rubric for LLM-as-judge evaluations
  - Few-shot example management
  - Model comparison and prompt testing
  - Bundled reference: promptfoo_api.md

### Changed

- Updated marketplace version from 1.16.0 to 1.18.0
- Updated marketplace skills count from 23 to 25
- Updated skill-creator to v1.2.2:
  - Fixed best practices documentation URL (platform.claude.com)
  - Enhanced quick_validate.py to exclude file:// prefixed paths from validation
- Updated marketplace.json metadata description to include new skills

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

# Promptfoo API Reference

## Provider Configuration

### Echo Provider (No API Calls)

```yaml
providers:
  - echo  # Returns prompt as-is, no API calls
```

**Use cases:**

- Preview rendered prompts without cost
- Debug variable substitution
- Verify few-shot structure
- Test configuration before production runs

**Cost:** Free - no tokens consumed.

### Anthropic

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 4096
      temperature: 0.7
```

### OpenAI

```yaml
providers:
  - id: openai:gpt-4.1
    config:
      temperature: 0.5
      max_tokens: 2048
```

### Multiple Providers (A/B Testing)

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    label: Claude
  - id: openai:gpt-4.1
    label: GPT-4.1
```

## Assertion Reference

### Python Assertion Context

```python
from typing import Any


class AssertionContext:
    prompt: str            # Raw prompt sent to LLM
    vars: dict             # Test case variables
    test: dict             # Complete test case
    config: dict           # Assertion config
    provider: Any          # Provider info
    providerResponse: Any  # Full response
```

### GradingResult Format

```python
{
    "pass": bool,            # Required: pass/fail
    "score": float,          # 0.0-1.0 score
    "reason": str,           # Explanation
    "named_scores": dict,    # Custom metrics
    "component_results": []  # Nested results
}
```

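The context and result shapes above can be tied together in a small custom assertion. The sketch below assumes promptfoo's documented convention that a file referenced by a `python` assertion exposes a `get_assert(output, context)` function. The variable name `expected_keyword` and the scoring rule are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Check that the output mentions a keyword supplied as a test variable."""
    # `expected_keyword` is a hypothetical variable name used only in this sketch.
    keyword = str(context.get("vars", {}).get("expected_keyword", ""))
    found = bool(keyword) and keyword.lower() in output.lower()
    return {
        "pass": found,
        "score": 1.0 if found else 0.0,
        "reason": f"Keyword {keyword!r} {'found' if found else 'not found'} in output",
    }
```

A file like this would typically be referenced from a test as an assertion of `type: python` whose `value` is a `file://` path to the script.
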
### Assertion Types

| Type | Description | Parameters |
|------|-------------|------------|
| `contains` | Substring check | `value` |
| `icontains` | Case-insensitive substring check | `value` |
| `equals` | Exact match | `value` |
| `regex` | Pattern match | `value` |
| `not-contains` | Absence check | `value` |
| `starts-with` | Prefix check | `value` |
| `contains-any` | Any substring | `value` (array) |
| `contains-all` | All substrings | `value` (array) |
| `cost` | Token cost | `threshold` |
| `latency` | Response time | `threshold` (ms) |
| `perplexity` | Model confidence | `threshold` |
| `python` | Custom Python | `value` (file/code) |
| `javascript` | Custom JS | `value` (code) |
| `llm-rubric` | LLM grading | `value`, `threshold` |
| `factuality` | Fact checking | `value` (reference) |
| `model-graded-closedqa` | Q&A grading | `value` |
| `similar` | Semantic similarity | `value`, `threshold` |

## Test Case Configuration

### Full Test Case Structure

```yaml
- description: "Test name"
  vars:
    var1: "value"
    var2: file://path.txt
  assert:
    - type: contains
      value: "expected"
  metadata:
    category: "test-category"
    priority: high
  options:
    provider: specific-provider
    transform: "output.trim()"
```

### Loading Variables from Files

```yaml
vars:
  # Text file (loaded as string)
  content: file://data/input.txt

  # JSON/YAML (parsed to object)
  config: file://config.json

  # Python script (executed, returns value)
  dynamic: file://scripts/generate.py

  # PDF (text extracted)
  document: file://docs/report.pdf

  # Image (base64 encoded)
  image: file://images/photo.png
```

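For the Python-script case, promptfoo's documentation describes a `get_var(var_name, prompt, other_vars)` function that returns a dict with an `output` key, or an `error` key on failure. A minimal sketch of what `scripts/generate.py` might look like under that assumption follows; the `topic` variable and the generated text are hypothetical.

```python
from typing import Any, Dict


def get_var(var_name: str, prompt: str, other_vars: Dict[str, Any]) -> Dict[str, str]:
    """Build the value of a dynamic variable from the other test variables."""
    # `topic` is a hypothetical variable used only for this sketch.
    topic = other_vars.get("topic")
    if not topic:
        return {"error": f"Cannot compute {var_name!r}: 'topic' is missing"}
    return {"output": f"Background notes about {topic}"}
```

Returning an `error` entry instead of raising keeps the failure attached to the specific variable that could not be computed.
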
## Advanced Patterns

### Dynamic Test Generation (Python)

```python
# tests/generate.py
def get_tests():
    return [
        {
            "vars": {"input": f"test {i}"},
            "assert": [{"type": "contains", "value": str(i)}]
        }
        for i in range(10)
    ]
```

```yaml
tests: file://tests/generate.py:get_tests
```

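Because the generator is plain Python, its output can be sanity-checked before promptfoo loads it. The helper `check_tests` below is hypothetical and not part of promptfoo; it only verifies the shape used in the generator above (a `vars` dict and assertions that carry a `type`).

```python
from typing import Any, Dict, List


def get_tests() -> List[Dict[str, Any]]:
    """Same generator as above, repeated so this sketch runs on its own."""
    return [
        {
            "vars": {"input": f"test {i}"},
            "assert": [{"type": "contains", "value": str(i)}],
        }
        for i in range(10)
    ]


def check_tests(tests: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found; an empty list means the shape looks fine."""
    problems = []
    for index, test in enumerate(tests):
        if not isinstance(test.get("vars"), dict):
            problems.append(f"test {index}: 'vars' must be a dict")
        for assertion in test.get("assert", []):
            if "type" not in assertion:
                problems.append(f"test {index}: assertion without 'type'")
    return problems


if __name__ == "__main__":
    print(check_tests(get_tests()) or "all generated tests look well-formed")
```
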
### Scenario-based Testing

```yaml
scenarios:
  - config:
      - vars:
          language: "French"
      - vars:
          language: "Spanish"
    tests:
      - vars:
          text: "Hello"
        assert:
          - type: llm-rubric
            value: "Translation is accurate"
```

### Transform Output

```yaml
defaultTest:
  options:
    transform: |
      output.replace(/\n/g, ' ').trim()
```

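The same cleanup can be written in Python. This sketch assumes promptfoo's documented support for pointing `transform` at a Python file that exposes `get_transform(output, context)`; the file name you choose for it is up to you.

```python
from typing import Any, Dict


def get_transform(output: str, context: Dict[str, Any]) -> str:
    """Replace newlines with spaces and strip surrounding whitespace."""
    # Mirrors the JavaScript expression: output.replace(/\n/g, ' ').trim()
    return output.replace("\n", " ").strip()
```

A file-based transform is easier to unit test than an inline expression, which matters once the cleanup logic grows beyond a single line.
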
### Custom Grading Provider

```yaml
defaultTest:
  options:
    provider: openai:gpt-4.1
  assert:
    - type: llm-rubric
      value: "Evaluate quality"
      provider: anthropic:messages:claude-3-haiku-20240307  # Override for this assertion
```

## Environment Variables

| Variable | Description |
|----------|-------------|
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `PROMPTFOO_PYTHON` | Python binary path |
| `PROMPTFOO_CACHE_ENABLED` | Enable caching (default: true) |
| `PROMPTFOO_CACHE_PATH` | Cache directory |

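A short preflight check can catch a missing API key before an evaluation starts. The helper below is a hypothetical sketch, not part of promptfoo; it maps the provider prefix of each ID (the part before the first colon) to the key variables listed in the table above.

```python
import os
from typing import Dict, List, Mapping

# Provider prefix -> API key variable, taken from the table above.
REQUIRED_KEYS: Dict[str, str] = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_keys(provider_ids: List[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the API key variables that the given provider IDs need but lack."""
    missing: List[str] = []
    for provider_id in provider_ids:
        prefix = provider_id.split(":", 1)[0]
        variable = REQUIRED_KEYS.get(prefix)
        # Providers without an entry (such as `echo`) need no key.
        if variable and not env.get(variable) and variable not in missing:
            missing.append(variable)
    return missing
```

Passing `env` explicitly makes the function testable without touching the real environment.
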
## CLI Commands

```bash
# Initialize project
npx promptfoo@latest init

# Run evaluation
npx promptfoo@latest eval [options]

# Options:
#   --config <path>                Config file path
#   --output <path>                Output file path
#   --grader <provider>            Override grader model
#   --no-cache                     Disable caching
#   --filter-metadata <key=value>  Filter tests by metadata
#   --repeat <n>                   Repeat each test n times
#   --delay <ms>                   Delay between requests
#   --max-concurrency <n>          Parallel requests

# View results
npx promptfoo@latest view [options]

# Share results
npx promptfoo@latest share

# Generate synthetic test cases
npx promptfoo@latest generate dataset
```

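When evaluations are launched from a script, it can help to assemble the command as an argument list rather than a shell string. The function `build_eval_command` below is a hypothetical helper that uses only flags listed above; it builds the command without running it.

```python
import shlex
from typing import List, Optional


def build_eval_command(
    config: str = "promptfooconfig.yaml",
    output: Optional[str] = None,
    no_cache: bool = False,
    repeat: Optional[int] = None,
    max_concurrency: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for `promptfoo eval` from the options listed above."""
    command = ["npx", "promptfoo@latest", "eval", "--config", config]
    if output:
        command += ["--output", output]
    if no_cache:
        command.append("--no-cache")
    if repeat is not None:
        command += ["--repeat", str(repeat)]
    if max_concurrency is not None:
        command += ["--max-concurrency", str(max_concurrency)]
    return command


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute.
    print(shlex.join(build_eval_command(output="results.json", no_cache=True)))
```

Keeping the arguments as a list avoids shell quoting problems when paths contain spaces.
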
## Output Formats

```bash
# JSON (default)
--output results.json

# CSV
--output results.csv

# HTML report
--output results.html

# YAML
--output results.yaml
```