- Add eval/ directory with 10 pilot skill eval configs
- Add GitHub Action (skill-eval.yml) for automated eval on PR
- Add generate-eval-config.py script for bootstrapping new evals
- Add reusable assertion helpers (skill-quality.js)
- Add eval README with setup and usage docs

Skills covered: copywriting, cto-advisor, seo-audit, content-strategy, aws-solution-architect, agile-product-owner, senior-frontend, senior-security, mcp-server-builder, launch-strategy

CI integration:
- Triggers on PR to dev when SKILL.md files change
- Detects which skills changed and runs only those evals
- Posts results as PR comments (non-blocking)
- Uploads full results as artifacts

No existing files modified.
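The "detects which skills changed" step in the CI integration can be pictured with a short Python sketch. This is not the logic in skill-eval.yml, which is not shown here. The function names `changed_skills` and `eval_configs` and the `eval/skills/<name>.yaml` layout are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def changed_skills(changed_files):
    """Return the sorted names of skills whose SKILL.md is in changed_files.

    A skill name is taken to be the directory that directly contains SKILL.md,
    e.g. engineering-team/aws-solution-architect/SKILL.md -> aws-solution-architect.
    """
    skills = set()
    for path in changed_files:
        p = PurePosixPath(path)
        # Skip files that are not SKILL.md, and a SKILL.md with no parent directory.
        if p.name == "SKILL.md" and len(p.parts) >= 2:
            skills.add(p.parent.name)
    return sorted(skills)


def eval_configs(skills, eval_dir="eval/skills"):
    """Map skill names to eval config paths (hypothetical directory layout)."""
    return [f"{eval_dir}/{name}.yaml" for name in skills]


if __name__ == "__main__":
    # In CI this list would come from the PR diff; it is hard-coded here.
    files = [
        "engineering-team/aws-solution-architect/SKILL.md",
        "README.md",
    ]
    for config in eval_configs(changed_skills(files)):
        print(config)
```

Keying on the file name rather than on a fixed list of directories means a newly added skill is picked up without changing the detection code, although it would still need its own eval config.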
42 lines
1.7 KiB
YAML
# Eval: aws-solution-architect
# Source: engineering-team/aws-solution-architect/SKILL.md

description: "Evaluate AWS solution architect skill"

prompts:
  - |
    You are an expert AI assistant. You have the following skill loaded:

    ---BEGIN SKILL---
    {{skill_content}}
    ---END SKILL---

    Now complete this task: {{task}}

providers:
  - id: anthropic:messages:claude-sonnet-4-6
    config:
      max_tokens: 4096
      temperature: 0.7

tests:
  - vars:
      skill_content: file://../../engineering-team/aws-solution-architect/SKILL.md
      task: "Design a serverless architecture for a real-time notification system that needs to handle 10K messages per second with sub-200ms delivery. Users connect via WebSocket. Budget is $500/month."
    assert:
      - type: llm-rubric
        value: "Response uses specific AWS services (API Gateway WebSocket, Lambda, DynamoDB, etc.) not generic cloud patterns"
      - type: llm-rubric
        value: "Response addresses the throughput requirement (10K msg/s) with concrete scaling strategy"
      - type: llm-rubric
        value: "Response includes cost estimation relative to the $500/month budget constraint"

  - vars:
      skill_content: file://../../engineering-team/aws-solution-architect/SKILL.md
      task: "We're migrating a Django monolith from Heroku to AWS. We have PostgreSQL, Redis, Celery workers, and S3 for file storage. Team of 3 devs, no DevOps experience. What's the simplest production-ready setup?"
    assert:
      - type: llm-rubric
        value: "Response recommends managed services appropriate for a small team without DevOps (e.g., ECS Fargate, RDS, ElastiCache)"
      - type: llm-rubric
        value: "Response includes a migration plan with phases, not just target architecture"
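
The prompt in this config is a template: each test case supplies `skill_content` and `task`, and these are substituted into the `{{...}}` placeholders before the prompt is sent to the provider. The Python sketch below imitates that substitution for simple variable names only. It is a stand-in, not the eval tool's own template engine, and the `render` helper and the placeholder skill text are hypothetical.

```python
import re

# Prompt text copied from the config; placeholders use the {{name}} form.
PROMPT_TEMPLATE = """You are an expert AI assistant. You have the following skill loaded:

---BEGIN SKILL---
{{skill_content}}
---END SKILL---

Now complete this task: {{task}}
"""


def render(template, variables):
    """Replace every {{name}} placeholder with variables[name].

    Raises KeyError when the template uses a name that was not supplied,
    so a misspelled variable fails loudly instead of reaching the model.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for template variable {name!r}")
        return variables[name]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


if __name__ == "__main__":
    prompt = render(
        PROMPT_TEMPLATE,
        {
            # Stand-in for the contents of SKILL.md, which the config loads via file://
            "skill_content": "(skill text would go here)",
            "task": "Design a serverless architecture for a real-time notification system.",
        },
    )
    print(prompt)
```

Both test cases point `skill_content` at the same SKILL.md, so only `task` and the rubric assertions vary between them.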