- Add eval/ directory with 10 pilot skill eval configs - Add GitHub Action (skill-eval.yml) for automated eval on PR - Add generate-eval-config.py script for bootstrapping new evals - Add reusable assertion helpers (skill-quality.js) - Add eval README with setup and usage docs Skills covered: copywriting, cto-advisor, seo-audit, content-strategy, aws-solution-architect, agile-product-owner, senior-frontend, senior-security, mcp-server-builder, launch-strategy CI integration: - Triggers on PR to dev when SKILL.md files change - Detects which skills changed and runs only those evals - Posts results as PR comments (non-blocking) - Uploads full results as artifacts No existing files modified.
58 lines
2.3 KiB
YAML
58 lines
2.3 KiB
YAML
# Eval: copywriting
|
|
# Source: marketing-skill/copywriting/SKILL.md
|
|
# Run: npx promptfoo@latest eval -c eval/skills/copywriting.yaml
|
|
|
|
description: "Evaluate copywriting skill — marketing copy generation"
|
|
|
|
prompts:
|
|
- |
|
|
You are an expert AI assistant. You have the following skill loaded:
|
|
|
|
---BEGIN SKILL---
|
|
{{skill_content}}
|
|
---END SKILL---
|
|
|
|
Now complete this task: {{task}}
|
|
|
|
providers:
|
|
- id: anthropic:messages:claude-sonnet-4-6
|
|
config:
|
|
max_tokens: 4096
|
|
temperature: 0.7
|
|
|
|
tests:
|
|
- vars:
|
|
skill_content: file://../../marketing-skill/copywriting/SKILL.md
|
|
task: "Write homepage copy for a B2B SaaS that automates invoicing for freelancers called InvoiceFlow"
|
|
assert:
|
|
- type: llm-rubric
|
|
value: "Output includes a clear headline, subheadline, at least 3 value propositions, and a call-to-action"
|
|
- type: llm-rubric
|
|
value: "Copy is specific to InvoiceFlow and freelancer invoicing, not generic B2B marketing"
|
|
- type: llm-rubric
|
|
value: "Copy follows direct-response copywriting principles with benefit-driven language"
|
|
- type: javascript
|
|
value: "output.length > 500"
|
|
|
|
- vars:
|
|
skill_content: file://../../marketing-skill/copywriting/SKILL.md
|
|
task: "Rewrite this landing page headline and subheadline: 'Welcome to our platform. We help businesses grow with our comprehensive solution for managing operations.' Make it compelling for a project management tool targeting remote teams."
|
|
assert:
|
|
- type: llm-rubric
|
|
value: "The rewritten headline is specific, benefit-driven, and not generic"
|
|
- type: llm-rubric
|
|
value: "The output specifically addresses remote teams, not generic businesses"
|
|
- type: javascript
|
|
value: "output.length > 100"
|
|
|
|
- vars:
|
|
skill_content: file://../../marketing-skill/copywriting/SKILL.md
|
|
task: "Write a pricing page for a developer tool with 3 tiers: Free, Pro ($29/mo), and Enterprise (custom). The tool is an API monitoring service called PingGuard."
|
|
assert:
|
|
- type: llm-rubric
|
|
value: "Output includes copy for all three pricing tiers with differentiated value propositions"
|
|
- type: llm-rubric
|
|
value: "Each tier has clear feature descriptions and the copy encourages upgrade paths"
|
|
- type: javascript
|
|
value: "output.length > 400"
|