Evaluation Rubric
Score each case on a 0-100 scale using weighted criteria:
- Expected content coverage: +weight
- Forbidden content violations: -weight
- Regex/format compliance: +weight
- Output length sanity: +/-weight
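One way the weighted criteria above could be combined is sketched here. The rubric does not specify weight values, so the `WEIGHTS` table, the `Case` fields, and the `score_case` helper are all hypothetical; the weights are chosen only so that a fully compliant output reaches 100, and the result is clamped to the 0-100 range.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical weights (not from the rubric); positive weights sum to 100.
WEIGHTS = {"expected": 50, "forbidden": 30, "format": 30, "length": 20}


@dataclass
class Case:
    """Hypothetical test-case definition for one prompt evaluation."""
    expected: List[str] = field(default_factory=list)   # substrings that should appear
    forbidden: List[str] = field(default_factory=list)  # substrings that must not appear
    pattern: Optional[str] = None                        # regex the output should match
    min_len: int = 0
    max_len: int = 10_000


def score_case(output: str, case: Case) -> float:
    text = output.lower()
    score = 0.0

    # Expected content coverage: +weight, scaled by the fraction found.
    if case.expected:
        hits = sum(1 for s in case.expected if s.lower() in text)
        score += WEIGHTS["expected"] * hits / len(case.expected)
    else:
        score += WEIGHTS["expected"]

    # Forbidden content violations: -weight per violation.
    violations = sum(1 for s in case.forbidden if s.lower() in text)
    score -= WEIGHTS["forbidden"] * violations

    # Regex/format compliance: +weight when the pattern matches (or none is set).
    if case.pattern is None or re.search(case.pattern, output):
        score += WEIGHTS["format"]

    # Output length sanity: +weight inside the bounds, -weight outside.
    if case.min_len <= len(output) <= case.max_len:
        score += WEIGHTS["length"]
    else:
        score -= WEIGHTS["length"]

    # Clamp to the rubric's 0-100 range.
    return max(0.0, min(100.0, score))
```

With these assumed weights, an output that covers every expected string, avoids forbidden ones, matches the pattern, and stays within the length bounds scores 100.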
Recommended acceptance gates:
- Average score >= 85
- No case below 70
- Zero critical forbidden-content hits
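The acceptance gates above can be checked mechanically once per-case scores are available. The following is a minimal sketch; the function name `passes_gates` and its parameters are hypothetical, and it assumes critical forbidden-content hits are counted separately from the scores. Treating an empty score list as a failure is also an assumption.

```python
from typing import List


def passes_gates(
    scores: List[float],
    critical_hits: int,
    min_average: float = 85.0,
    min_case: float = 70.0,
) -> bool:
    """Return True only if all three recommended gates hold."""
    if not scores:
        # Assumption: no evaluated cases means the run cannot be accepted.
        return False
    average = sum(scores) / len(scores)
    return (
        average >= min_average        # average score >= 85
        and min(scores) >= min_case   # no case below 70
        and critical_hits == 0        # zero critical forbidden-content hits
    )
```

For example, `passes_gates([90, 85, 80], critical_hits=0)` passes, while a single case scoring 69 or a single critical hit fails the run regardless of the average.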