# Promptfoo API Reference
## Provider Configuration

### Echo Provider (No API Calls)

```yaml
providers:
  - echo  # Returns prompt as-is, no API calls
```

Use cases:

- Preview rendered prompts without cost
- Debug variable substitution
- Verify few-shot structure
- Test configuration before production runs

Cost: free; no tokens are consumed.
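A minimal end-to-end config built around the echo provider might look like this (the file name, prompt, and test values are illustrative):

```yaml
# promptfooconfig.yaml — dry-run setup; no API keys needed
prompts:
  - "Summarize the following text: {{text}}"
providers:
  - echo
tests:
  - vars:
      text: "Promptfoo is an LLM evaluation framework."
    assert:
      - type: contains
        value: "Summarize"
```

Because echo returns the rendered prompt verbatim, the `contains` assertion here is really checking the prompt template, which is exactly what you want when debugging variable substitution.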
### Anthropic

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 4096
      temperature: 0.7
```
### OpenAI

```yaml
providers:
  - id: openai:gpt-4.1
    config:
      temperature: 0.5
      max_tokens: 2048
```
### Multiple Providers (A/B Testing)

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    label: Claude
  - id: openai:gpt-4.1
    label: GPT-4.1
```
## Assertion Reference

### Python Assertion Context

```python
from typing import Any

class AssertionContext:
    prompt: str            # Raw prompt sent to the LLM
    vars: dict             # Test case variables
    test: dict             # Complete test case
    config: dict           # Assertion config
    provider: Any          # Provider info
    providerResponse: Any  # Full response
```
### GradingResult Format

```python
{
    "pass": bool,             # Required: pass/fail
    "score": float,           # 0.0-1.0 score
    "reason": str,            # Explanation
    "named_scores": dict,     # Custom metrics
    "component_results": [],  # Nested results
}
```
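A sketch of a custom Python assertion that returns this structure. Promptfoo invokes a module-level `get_assert(output, context)` when a `type: python` assertion points at a file; the word-count policy below is purely illustrative:

```python
# assert_length.py — custom assertion returning a GradingResult dict.
# Called by promptfoo as get_assert(output, context) for file-based
# `type: python` assertions; `context` carries vars, test, and config.

def get_assert(output: str, context=None) -> dict:
    words = len(output.split())
    passed = 10 <= words <= 200  # illustrative length policy
    return {
        "pass": passed,
        "score": 1.0 if passed else 0.0,
        "reason": f"Response has {words} words",
        "named_scores": {"word_count": words},
    }
```

Referenced from the config as `type: python` with `value: file://assert_length.py`.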
### Assertion Types

| Type | Description | Parameters |
|---|---|---|
| `contains` | Substring check | `value` |
| `icontains` | Case-insensitive substring check | `value` |
| `equals` | Exact match | `value` |
| `regex` | Pattern match | `value` |
| `not-contains` | Absence check | `value` |
| `starts-with` | Prefix check | `value` |
| `contains-any` | Any substring matches | `value` (array) |
| `contains-all` | All substrings match | `value` (array) |
| `cost` | Token cost | `threshold` |
| `latency` | Response time | `threshold` (ms) |
| `perplexity` | Model confidence | `threshold` |
| `python` | Custom Python | `value` (file/code) |
| `javascript` | Custom JS | `value` (code) |
| `llm-rubric` | LLM grading | `value`, `threshold` |
| `factuality` | Fact checking | `value` (reference) |
| `model-graded-closedqa` | Q&A grading | `value` |
| `similar` | Semantic similarity | `value`, `threshold` |
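Assertion types compose freely on a single test case; a test passes only when every assertion passes. A hedged example mixing deterministic and model-graded checks (the values and threshold are illustrative):

```yaml
assert:
  - type: contains-all
    value: ["refund", "policy"]
  - type: not-contains
    value: "I cannot help"
  - type: latency
    threshold: 5000  # ms
  - type: llm-rubric
    value: "Answer is polite and factually grounded"
    threshold: 0.8
```

Cheap string checks catch regressions instantly; the `llm-rubric` check costs tokens, so it is worth keeping last and reserving for qualities strings cannot express.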
## Test Case Configuration

### Full Test Case Structure

```yaml
- description: "Test name"
  vars:
    var1: "value"
    var2: file://path.txt
  assert:
    - type: contains
      value: "expected"
  metadata:
    category: "test-category"
    priority: high
  options:
    provider: specific-provider
    transform: "output.trim()"
```
### Loading Variables from Files

```yaml
vars:
  # Text file (loaded as a string)
  content: file://data/input.txt
  # JSON/YAML (parsed to an object)
  config: file://config.json
  # Python script (executed; returns a value)
  dynamic: file://scripts/generate.py
  # PDF (text extracted)
  document: file://docs/report.pdf
  # Image (base64-encoded)
  image: file://images/photo.png
```
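A sketch of the Python-script case. My understanding is that for `vars: file://script.py` promptfoo calls a module-level `get_var(var_name, prompt, other_vars)` and expects a `{"output": ...}` dict; verify the hook name against the promptfoo docs for your version. The date-stamping logic is illustrative:

```python
# scripts/generate.py — dynamic variable provider (illustrative).
# Assumed hook: promptfoo invokes get_var(var_name, prompt, other_vars)
# and reads the value from the returned {"output": ...} dict.
import datetime

def get_var(var_name: str, prompt: str, other_vars: dict) -> dict:
    # Example: stamp each test run with the current date
    today = datetime.date.today().isoformat()
    return {"output": f"Report generated on {today}"}
```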
## Advanced Patterns

### Dynamic Test Generation (Python)

```python
# tests/generate.py
def get_tests():
    return [
        {
            "vars": {"input": f"test {i}"},
            "assert": [{"type": "contains", "value": str(i)}],
        }
        for i in range(10)
    ]
```

Reference the generator from the config:

```yaml
tests: file://tests/generate.py:get_tests
```
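The generator can be sanity-checked locally before handing it to promptfoo; each item should carry the test-case keys shown earlier (`vars`, `assert`). The function is repeated here so the snippet runs standalone:

```python
# Local sanity check for the dynamic test generator.
def get_tests():
    return [
        {
            "vars": {"input": f"test {i}"},
            "assert": [{"type": "contains", "value": str(i)}],
        }
        for i in range(10)
    ]

tests = get_tests()
assert len(tests) == 10
assert all({"vars", "assert"} <= set(t) for t in tests)
print(tests[3])  # spot-check one generated case
```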
### Scenario-based Testing

```yaml
scenarios:
  - config:
      - vars:
          language: "French"
      - vars:
          language: "Spanish"
    tests:
      - vars:
          text: "Hello"
        assert:
          - type: llm-rubric
            value: "Translation is accurate"
```
### Transform Output

```yaml
defaultTest:
  options:
    transform: |
      output.replace(/\n/g, ' ').trim()
```
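Transforms can also live in a Python file referenced as `transform: file://transform.py`; my understanding is that promptfoo then calls a module-level `get_transform(output, context)`, but verify the hook name against the docs for your version. A sketch equivalent to the JS expression above:

```python
# transform.py — normalize whitespace before assertions run (illustrative).
# Assumed hook: promptfoo calls get_transform(output, context) and uses
# the return value as the transformed output.
import re

def get_transform(output: str, context=None) -> str:
    # Collapse newlines and runs of whitespace to single spaces, then trim
    return re.sub(r"\s+", " ", output).strip()
```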
### Custom Grading Provider

```yaml
# Default grader for all model-graded assertions
defaultTest:
  options:
    provider: openai:gpt-4.1
```

```yaml
# Per-assertion override
assert:
  - type: llm-rubric
    value: "Evaluate quality"
    provider: anthropic:claude-3-haiku  # Override for this assertion
```
## Environment Variables

| Variable | Description |
|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `PROMPTFOO_PYTHON` | Python binary path |
| `PROMPTFOO_CACHE_ENABLED` | Enable caching (default: `true`) |
| `PROMPTFOO_CACHE_PATH` | Cache directory |
## CLI Commands

```bash
# Initialize a project
npx promptfoo@latest init

# Run an evaluation
npx promptfoo@latest eval [options]
#   --config <path>       Config file path
#   --output <path>       Output file path
#   --grader <provider>   Override grader model
#   --no-cache            Disable caching
#   --filter-metadata     Filter tests by metadata
#   --repeat <n>          Repeat each test n times
#   --delay <ms>          Delay between requests
#   --max-concurrency     Parallel requests

# View results
npx promptfoo@latest view [options]

# Generate a test dataset
npx promptfoo@latest generate dataset

# Share results
npx promptfoo@latest share
```
## Output Formats

```bash
# JSON (default)
--output results.json

# CSV
--output results.csv

# HTML report
--output results.html

# YAML
--output results.yaml
```