From 25bb9e99a6a06d8504e067a4ef944cf72ef03925 Mon Sep 17 00:00:00 2001
From: Hamel Husain
Date: Wed, 4 Mar 2026 11:23:39 -0800
Subject: [PATCH] Add skill: hamelsmu/evals-skills

Add 7 LLM evaluation skills from hamelsmu/prompts to the Development and
Testing section:

- eval-audit: Audit LLM eval pipelines
- error-analysis: Identify failure modes in LLM pipelines
- generate-synthetic-data: Create synthetic test inputs
- write-judge-prompt: Design LLM-as-Judge evaluators
- validate-evaluator: Calibrate judges against human labels
- evaluate-rag: Evaluate RAG retrieval and generation
- build-review-interface: Build annotation interfaces for traces

Ref: https://x.com/omerfarukaplak/status/2029270930552439281
---
 README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index 219ac27..4aee8c9 100644
--- a/README.md
+++ b/README.md
@@ -791,6 +791,13 @@ Official Web3 and trading skills from the Binance team. Includes crypto market d
 - **[NeoLabHQ/ddd](https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/ddd)** - Domain-driven development skills that also include Clean Architecture, SOLID principles, and design patterns.
 - **[NeoLabHQ/sadd](https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/sadd)** - Dispatches independent subagents for individual tasks with code review checkpoints between iterations for rapid, controlled development.
 - **[NeoLabHQ/kaizen](https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/kaizen)** - Applies continuous improvement methodology with multiple analytical approaches, based on Japanese Kaizen philosophy and Lean methodology.
+- **[hamelsmu/eval-audit](https://github.com/hamelsmu/prompts/tree/main/evals-skills/skills/eval-audit)** - Audit LLM eval pipelines and surface problems
+- **[hamelsmu/error-analysis](https://github.com/hamelsmu/prompts/tree/main/evals-skills/skills/error-analysis)** - Systematically identify failure modes in LLM pipelines
+- **[hamelsmu/generate-synthetic-data](https://github.com/hamelsmu/prompts/tree/main/evals-skills/skills/generate-synthetic-data)** - Create diverse synthetic test inputs for LLM evals
+- **[hamelsmu/write-judge-prompt](https://github.com/hamelsmu/prompts/tree/main/evals-skills/skills/write-judge-prompt)** - Design LLM-as-Judge evaluators for subjective criteria
+- **[hamelsmu/validate-evaluator](https://github.com/hamelsmu/prompts/tree/main/evals-skills/skills/validate-evaluator)** - Calibrate LLM judges against human labels
+- **[hamelsmu/evaluate-rag](https://github.com/hamelsmu/prompts/tree/main/evals-skills/skills/evaluate-rag)** - Evaluate RAG retrieval and generation quality
+- **[hamelsmu/build-review-interface](https://github.com/hamelsmu/prompts/tree/main/evals-skills/skills/build-review-interface)** - Build annotation interfaces for reviewing LLM traces