# Promptfoo API Reference
## Provider Configuration

### Echo Provider (No API Calls)

```yaml
providers:
  - echo  # Returns prompt as-is, no API calls
```

Use cases:

- Preview rendered prompts without cost
- Debug variable substitution
- Verify few-shot structure
- Test configuration before production runs

Cost: free; no tokens are consumed.
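A minimal end-to-end config built around the echo provider might look like this (the file name, prompt, and test values are illustrative):

```yaml
# promptfooconfig.yaml — dry-run setup; no API keys needed
prompts:
  - "Summarize the following text: {{text}}"
providers:
  - echo
tests:
  - vars:
      text: "Promptfoo is an LLM evaluation framework."
    assert:
      - type: contains
        value: "Summarize"
```

Because echo returns the rendered prompt verbatim, the `contains` assertion here is really checking the prompt template, which is exactly what you want when debugging variable substitution.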
### Anthropic

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 4096
      temperature: 0.7
```
### OpenAI

```yaml
providers:
  - id: openai:gpt-4.1
    config:
      temperature: 0.5
      max_tokens: 2048
```
### Multiple Providers (A/B Testing)

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    label: Claude
  - id: openai:gpt-4.1
    label: GPT-4.1
```
## Assertion Reference

### Python Assertion Context

```python
from typing import Any

class AssertionContext:
    prompt: str            # Raw prompt sent to the LLM
    vars: dict             # Test case variables
    test: dict             # Complete test case
    config: dict           # Assertion config
    provider: Any          # Provider info
    providerResponse: Any  # Full response
```
### GradingResult Format

```python
{
    "pass": bool,             # Required: pass/fail
    "score": float,           # 0.0-1.0 score
    "reason": str,            # Explanation
    "named_scores": dict,     # Custom metrics
    "component_results": [],  # Nested results
}
```
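A sketch of a custom Python assertion that returns this structure. Promptfoo invokes a module-level `get_assert(output, context)` when a `type: python` assertion points at a file; the word-count policy below is purely illustrative:

```python
# assert_length.py — custom assertion returning a GradingResult dict.
# Called by promptfoo as get_assert(output, context) for file-based
# `type: python` assertions; `context` carries vars, test, and config.

def get_assert(output: str, context=None) -> dict:
    words = len(output.split())
    passed = 10 <= words <= 200  # illustrative length policy
    return {
        "pass": passed,
        "score": 1.0 if passed else 0.0,
        "reason": f"Response has {words} words",
        "named_scores": {"word_count": words},
    }
```

Referenced from the config as `type: python` with `value: file://assert_length.py`.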
### Assertion Types

| Type | Description | Parameters |
|---|---|---|
| `contains` | Substring check | `value` |
| `icontains` | Case-insensitive substring check | `value` |
| `equals` | Exact match | `value` |
| `regex` | Pattern match | `value` |
| `not-contains` | Absence check | `value` |
| `starts-with` | Prefix check | `value` |
| `contains-any` | Any substring matches | `value` (array) |
| `contains-all` | All substrings match | `value` (array) |
| `cost` | Token cost | `threshold` |
| `latency` | Response time | `threshold` (ms) |
| `perplexity` | Model confidence | `threshold` |
| `python` | Custom Python | `value` (file/code) |
| `javascript` | Custom JS | `value` (code) |
| `llm-rubric` | LLM grading | `value`, `threshold` |
| `factuality` | Fact checking | `value` (reference) |
| `model-graded-closedqa` | Q&A grading | `value` |
| `similar` | Semantic similarity | `value`, `threshold` |
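Assertion types compose freely on a single test case; a test passes only when every assertion passes. A hedged example mixing deterministic and model-graded checks (the values and threshold are illustrative):

```yaml
assert:
  - type: contains-all
    value: ["refund", "policy"]
  - type: not-contains
    value: "I cannot help"
  - type: latency
    threshold: 5000  # ms
  - type: llm-rubric
    value: "Answer is polite and factually grounded"
    threshold: 0.8
```

Cheap string checks catch regressions instantly; the `llm-rubric` check costs tokens, so it is worth keeping last and reserving for qualities strings cannot express.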
## Test Case Configuration

### Full Test Case Structure

```yaml
- description: "Test name"
  vars:
    var1: "value"
    var2: file://path.txt
  assert:
    - type: contains
      value: "expected"
  metadata:
    category: "test-category"
    priority: high
  options:
    provider: specific-provider
    transform: "output.trim()"
```
### Loading Variables from Files

```yaml
vars:
  # Text file (loaded as a string)
  content: file://data/input.txt
  # JSON/YAML (parsed to an object)
  config: file://config.json
  # Python script (executed; returns a value)
  dynamic: file://scripts/generate.py
  # PDF (text extracted)
  document: file://docs/report.pdf
  # Image (base64-encoded)
  image: file://images/photo.png
```
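A sketch of the Python-script case. My understanding is that for `vars: file://script.py` promptfoo calls a module-level `get_var(var_name, prompt, other_vars)` and expects a `{"output": ...}` dict; verify the hook name against the promptfoo docs for your version. The date-stamping logic is illustrative:

```python
# scripts/generate.py — dynamic variable provider (illustrative).
# Assumed hook: promptfoo invokes get_var(var_name, prompt, other_vars)
# and reads the value from the returned {"output": ...} dict.
import datetime

def get_var(var_name: str, prompt: str, other_vars: dict) -> dict:
    # Example: stamp each test run with the current date
    today = datetime.date.today().isoformat()
    return {"output": f"Report generated on {today}"}
```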
## Advanced Patterns

### Dynamic Test Generation (Python)

```python
# tests/generate.py
def get_tests():
    return [
        {
            "vars": {"input": f"test {i}"},
            "assert": [{"type": "contains", "value": str(i)}],
        }
        for i in range(10)
    ]
```

Reference the generator from the config:

```yaml
tests: file://tests/generate.py:get_tests
```
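The generator can be sanity-checked locally before handing it to promptfoo; each item should carry the test-case keys shown earlier (`vars`, `assert`). The function is repeated here so the snippet runs standalone:

```python
# Local sanity check for the dynamic test generator.
def get_tests():
    return [
        {
            "vars": {"input": f"test {i}"},
            "assert": [{"type": "contains", "value": str(i)}],
        }
        for i in range(10)
    ]

tests = get_tests()
assert len(tests) == 10
assert all({"vars", "assert"} <= set(t) for t in tests)
print(tests[3])  # spot-check one generated case
```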
### Scenario-based Testing

```yaml
scenarios:
  - config:
      - vars:
          language: "French"
      - vars:
          language: "Spanish"
    tests:
      - vars:
          text: "Hello"
        assert:
          - type: llm-rubric
            value: "Translation is accurate"
```
### Transform Output

```yaml
defaultTest:
  options:
    transform: |
      output.replace(/\n/g, ' ').trim()
```
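Transforms can also live in a Python file referenced as `transform: file://transform.py`; my understanding is that promptfoo then calls a module-level `get_transform(output, context)`, but verify the hook name against the docs for your version. A sketch equivalent to the JS expression above:

```python
# transform.py — normalize whitespace before assertions run (illustrative).
# Assumed hook: promptfoo calls get_transform(output, context) and uses
# the return value as the transformed output.
import re

def get_transform(output: str, context=None) -> str:
    # Collapse newlines and runs of whitespace to single spaces, then trim
    return re.sub(r"\s+", " ", output).strip()
```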
### Custom Grading Provider

```yaml
# Default grader for all model-graded assertions
defaultTest:
  options:
    provider: openai:gpt-4.1
```

```yaml
# Per-assertion override
assert:
  - type: llm-rubric
    value: "Evaluate quality"
    provider: anthropic:claude-3-haiku  # Override for this assertion
```
## Environment Variables

| Variable | Description |
|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `PROMPTFOO_PYTHON` | Python binary path |
| `PROMPTFOO_CACHE_ENABLED` | Enable caching (default: `true`) |
| `PROMPTFOO_CACHE_PATH` | Cache directory |
## CLI Commands

```bash
# Initialize a project
npx promptfoo@latest init

# Run an evaluation
npx promptfoo@latest eval [options]
#   --config <path>       Config file path
#   --output <path>       Output file path
#   --grader <provider>   Override grader model
#   --no-cache            Disable caching
#   --filter-metadata     Filter tests by metadata
#   --repeat <n>          Repeat each test n times
#   --delay <ms>          Delay between requests
#   --max-concurrency     Parallel requests

# View results
npx promptfoo@latest view [options]

# Generate a test dataset
npx promptfoo@latest generate dataset

# Share results
npx promptfoo@latest share
```
## Output Formats

```bash
# JSON (default)
--output results.json

# CSV
--output results.csv

# HTML report
--output results.html

# YAML
--output results.yaml
```