# Evaluation Rubric

Score each case on 0-100 via weighted criteria:

- Expected content coverage: +weight
- Forbidden content violations: -weight
- Regex/format compliance: +weight
- Output length sanity: +/-weight

Recommended acceptance gates:

- Average score >= 85
- No case below 70
- Zero critical forbidden-content hits
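
As a rough illustration, the criteria and gates in this rubric could be sketched in Python as below. The rubric does not specify weight values, so the numbers in `WEIGHTS` are assumptions, as are the `CaseResult` fields, the function names, and the choice to clamp scores to the 0-100 range. Treat it as one possible reading, not the reference implementation.

```python
from dataclasses import dataclass

# Hypothetical weights: the rubric only gives the sign of each criterion.
WEIGHTS = {"coverage": 60.0, "format": 30.0, "length": 10.0, "forbidden": 40.0}


@dataclass
class CaseResult:
    coverage: float          # fraction of expected content found, 0.0-1.0
    format_ok: float         # fraction of regex/format checks passed, 0.0-1.0
    length_ok: bool          # True if the output length looks sane
    forbidden_hits: int = 0  # count of forbidden-content violations
    critical_hits: int = 0   # subset of those violations deemed critical


def score_case(r: CaseResult) -> float:
    """Combine the four criteria into a score clamped to 0-100."""
    score = WEIGHTS["coverage"] * r.coverage
    score += WEIGHTS["format"] * r.format_ok
    # Length sanity is +/-weight: reward a sane length, penalize otherwise.
    score += WEIGHTS["length"] if r.length_ok else -WEIGHTS["length"]
    score -= WEIGHTS["forbidden"] * r.forbidden_hits
    return max(0.0, min(100.0, score))


def passes_gates(results: list[CaseResult]) -> bool:
    """Apply the three recommended acceptance gates to a set of cases."""
    if not results:
        return False  # assumption: an empty run does not pass
    scores = [score_case(r) for r in results]
    return (
        sum(scores) / len(scores) >= 85                # average score gate
        and min(scores) >= 70                          # no case below 70
        and all(r.critical_hits == 0 for r in results) # zero critical hits
    )
```

With these assumed weights, a case with full coverage, full format compliance, and a sane length scores 100, and a single forbidden-content hit drops it to 60, which fails the per-case gate on its own.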