Release v1.18.0: Add iOS-APP-developer and promptfoo-evaluation skills
### Added

- **New Skill**: iOS-APP-developer v1.1.0 - iOS development with XcodeGen, SwiftUI, and SPM
  - XcodeGen project.yml configuration
  - SPM dependency resolution
  - Device deployment and code signing
  - Camera/AVFoundation debugging
  - iOS version compatibility handling
  - Library not loaded @rpath framework error fixes
  - State machine testing patterns for @MainActor classes
  - Bundled references: xcodegen-full.md, camera-avfoundation.md, swiftui-compatibility.md, testing-mainactor.md
- **New Skill**: promptfoo-evaluation v1.0.0 - LLM evaluation framework using Promptfoo
  - Promptfoo configuration (promptfooconfig.yaml)
  - Python custom assertions
  - llm-rubric for LLM-as-judge evaluations
  - Few-shot example management
  - Model comparison and prompt testing
  - Bundled reference: promptfoo_api.md

### Changed

- Updated marketplace version from 1.16.0 to 1.18.0
- Updated marketplace skills count from 23 to 25
- Updated skill-creator to v1.2.2:
  - Fixed best practices documentation URL (platform.claude.com)
  - Enhanced quick_validate.py to exclude file:// prefixed paths from validation
- Updated marketplace.json metadata description to include new skills

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
promptfoo-evaluation/references/promptfoo_api.md (new file, 249 lines)
# Promptfoo API Reference

## Provider Configuration

### Echo Provider (No API Calls)

```yaml
providers:
  - echo # Returns prompt as-is, no API calls
```

**Use cases:**

- Preview rendered prompts without cost
- Debug variable substitution
- Verify few-shot structure
- Test configuration before production runs

**Cost:** Free - no tokens consumed.

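Conceptually, the echo provider is an identity function over the rendered prompt: variables are substituted into the template, and the result comes back unchanged with no API call. A minimal sketch of that behavior (promptfoo renders `{{ var }}` templates with Nunjucks; the regex here is a simplified stand-in, not promptfoo's implementation):

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    # Crude stand-in for promptfoo's Nunjucks rendering of {{ var }};
    # unknown variables are left in place.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

def echo_provider(prompt: str) -> str:
    # No API call: the "response" is the rendered prompt itself.
    return prompt

rendered = render_prompt(
    "Translate to {{ language }}: {{ text }}",
    {"language": "French", "text": "Hello"},
)
# echo_provider(rendered) == "Translate to French: Hello"
```

This is why echo runs are free: the only work is template rendering, which is exactly what you want to inspect when debugging variable substitution.
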
### Anthropic

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 4096
      temperature: 0.7
```

### OpenAI

```yaml
providers:
  - id: openai:gpt-4.1
    config:
      temperature: 0.5
      max_tokens: 2048
```

### Multiple Providers (A/B Testing)

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    label: Claude
  - id: openai:gpt-4.1
    label: GPT-4.1
```

## Assertion Reference

### Python Assertion Context

```python
class AssertionContext:
    prompt: str            # Raw prompt sent to LLM
    vars: dict             # Test case variables
    test: dict             # Complete test case
    config: dict           # Assertion config
    provider: Any          # Provider info
    providerResponse: Any  # Full response
```

### GradingResult Format

```python
{
    "pass": bool,            # Required: pass/fail
    "score": float,          # 0.0-1.0 score
    "reason": str,           # Explanation
    "named_scores": dict,    # Custom metrics
    "component_results": []  # Nested results
}
```

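Putting the two together: a Python assertion file exposes a `get_assert(output, context)` entry point and can return a GradingResult dict in the shape above. A minimal sketch (the word-count threshold and file name are illustrative, not promptfoo defaults):

```python
# assertions/check_length.py -- illustrative custom assertion

def get_assert(output: str, context) -> dict:
    """Grade the model output against a word budget and
    return a GradingResult dict."""
    max_words = 50  # illustrative threshold, not a promptfoo default
    word_count = len(output.split())
    passed = word_count <= max_words
    return {
        "pass": passed,
        # Full score when under budget, otherwise degrade proportionally
        "score": 1.0 if passed else max_words / word_count,
        "reason": f"{word_count} words (limit {max_words})",
        "named_scores": {"word_count": word_count},
    }
```

Referenced from config as `- type: python` with `value: file://assertions/check_length.py`.
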
### Assertion Types

| Type | Description | Parameters |
|------|-------------|------------|
| `contains` | Substring check | `value` |
| `icontains` | Case-insensitive | `value` |
| `equals` | Exact match | `value` |
| `regex` | Pattern match | `value` |
| `not-contains` | Absence check | `value` |
| `starts-with` | Prefix check | `value` |
| `contains-any` | Any substring | `value` (array) |
| `contains-all` | All substrings | `value` (array) |
| `cost` | Token cost | `threshold` |
| `latency` | Response time | `threshold` (ms) |
| `perplexity` | Model confidence | `threshold` |
| `python` | Custom Python | `value` (file/code) |
| `javascript` | Custom JS | `value` (code) |
| `llm-rubric` | LLM grading | `value`, `threshold` |
| `factuality` | Fact checking | `value` (reference) |
| `model-graded-closedqa` | Q&A grading | `value` |
| `similar` | Semantic similarity | `value`, `threshold` |

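The deterministic string assertions in the table reduce to simple predicates on the output text. A sketch of their semantics (for clarity only; this is not promptfoo's implementation):

```python
import re

def check(assertion_type: str, output: str, value) -> bool:
    """Illustrative semantics of the deterministic string assertions."""
    if assertion_type == "contains":
        return value in output
    if assertion_type == "icontains":
        return value.lower() in output.lower()
    if assertion_type == "equals":
        return output == value
    if assertion_type == "regex":
        return re.search(value, output) is not None
    if assertion_type == "not-contains":
        return value not in output
    if assertion_type == "starts-with":
        return output.startswith(value)
    if assertion_type == "contains-any":
        return any(v in output for v in value)   # value is an array
    if assertion_type == "contains-all":
        return all(v in output for v in value)   # value is an array
    raise ValueError(f"unsupported type: {assertion_type}")
```

The model-graded types (`llm-rubric`, `factuality`, `similar`, ...) instead dispatch to a grading provider, which is why they accept a `threshold`.
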
## Test Case Configuration

### Full Test Case Structure

```yaml
- description: "Test name"
  vars:
    var1: "value"
    var2: file://path.txt
  assert:
    - type: contains
      value: "expected"
  metadata:
    category: "test-category"
    priority: high
  options:
    provider: specific-provider
    transform: "output.trim()"
```

### Loading Variables from Files

```yaml
vars:
  # Text file (loaded as string)
  content: file://data/input.txt

  # JSON/YAML (parsed to object)
  config: file://config.json

  # Python script (executed, returns value)
  dynamic: file://scripts/generate.py

  # PDF (text extracted)
  document: file://docs/report.pdf

  # Image (base64 encoded)
  image: file://images/photo.png
```

## Advanced Patterns

### Dynamic Test Generation (Python)

```python
# tests/generate.py
def get_tests():
    return [
        {
            "vars": {"input": f"test {i}"},
            "assert": [{"type": "contains", "value": str(i)}]
        }
        for i in range(10)
    ]
```

```yaml
tests: file://tests/generate.py:get_tests
```

### Scenario-based Testing

```yaml
scenarios:
  - config:
      - vars:
          language: "French"
      - vars:
          language: "Spanish"
    tests:
      - vars:
          text: "Hello"
        assert:
          - type: llm-rubric
            value: "Translation is accurate"
```

### Transform Output

```yaml
defaultTest:
  options:
    transform: |
      output.replace(/\n/g, ' ').trim()
```

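Transforms are JavaScript expressions evaluated against the raw `output` before assertions run; the one above collapses newlines to spaces and trims surrounding whitespace. The equivalent logic in Python, shown only to make the normalization explicit (promptfoo itself evaluates the transform in JS):

```python
def normalize(output: str) -> str:
    # Same effect as the JS transform: output.replace(/\n/g, ' ').trim()
    return output.replace("\n", " ").strip()
```

Applying the transform in `defaultTest` means every assertion in every test sees the normalized output.
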
### Custom Grading Provider

```yaml
defaultTest:
  options:
    provider: openai:gpt-4.1
  assert:
    - type: llm-rubric
      value: "Evaluate quality"
      provider: anthropic:claude-3-haiku # Override for this assertion
```

## Environment Variables

| Variable | Description |
|----------|-------------|
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `PROMPTFOO_PYTHON` | Python binary path |
| `PROMPTFOO_CACHE_ENABLED` | Enable caching (default: true) |
| `PROMPTFOO_CACHE_PATH` | Cache directory |

## CLI Commands

```bash
# Initialize project
npx promptfoo@latest init

# Run evaluation
npx promptfoo@latest eval [options]

# Options:
#   --config <path>       Config file path
#   --output <path>       Output file path
#   --grader <provider>   Override grader model
#   --no-cache            Disable caching
#   --filter-metadata     Filter tests by metadata
#   --repeat <n>          Repeat each test n times
#   --delay <ms>          Delay between requests
#   --max-concurrency     Parallel requests

# View results
npx promptfoo@latest view [options]

# Share results
npx promptfoo@latest share

# Generate a test dataset
npx promptfoo@latest generate dataset
```

## Output Formats

```bash
# JSON (default)
--output results.json

# CSV
--output results.csv

# HTML report
--output results.html

# YAML
--output results.yaml
```