# Prompt Engineering Patterns
Specific prompt techniques with example inputs and expected outputs.
## Patterns Index
- Zero-Shot Prompting
- Few-Shot Prompting
- Chain-of-Thought (CoT)
- Role Prompting
- Structured Output
- Self-Consistency
- ReAct (Reasoning + Acting)
- Tree of Thoughts
- Retrieval-Augmented Generation
- Meta-Prompting
## 1. Zero-Shot Prompting

**When to use:** Simple, well-defined tasks where the model has sufficient training knowledge.

**Pattern:**
[Task instruction]
[Input]
**Example:**
Input:
Classify the following customer review as positive, negative, or neutral.
Review: "The shipping was fast but the product quality was disappointing."
**Expected Output:**

negative

**Best practices:**
- Be explicit about output format
- Use clear, unambiguous verbs (classify, extract, summarize)
- Specify constraints (word limits, format requirements)
**When to avoid:**
- Tasks requiring specific formatting the model hasn't seen
- Domain-specific tasks requiring specialized knowledge
- Tasks where consistency is critical
## 2. Few-Shot Prompting

**When to use:** Tasks requiring consistent formatting or domain-specific patterns.

**Pattern:**
[Task description]
Example 1:
Input: [example input]
Output: [example output]
Example 2:
Input: [example input]
Output: [example output]
Now process:
Input: [actual input]
Output:
**Example:**
Input:
Extract the company name and founding year from the text.
Example 1:
Input: "Apple Inc. was founded in 1976 by Steve Jobs."
Output: {"company": "Apple Inc.", "year": 1976}
Example 2:
Input: "Microsoft Corporation started in 1975."
Output: {"company": "Microsoft Corporation", "year": 1975}
Example 3:
Input: "Founded in 1994, Amazon has grown into a tech giant."
Output: {"company": "Amazon", "year": 1994}
Now process:
Input: "Tesla, Inc. was established in 2003 by Martin Eberhard."
Output:
**Expected Output:**

{"company": "Tesla, Inc.", "year": 2003}

**Example selection guidelines:**
| Example Type | Purpose | Count |
|---|---|---|
| Simple/typical | Establish basic pattern | 1-2 |
| Edge case | Handle ambiguity | 1 |
| Different format | Show variations | 1 |
| **Total** | | **3-5** |

**Common mistakes:**
- Too many examples (wastes tokens, may confuse)
- Inconsistent formatting between examples
- Examples too similar (model doesn't learn variation)
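The pattern above can be sketched as a helper that renders every example with identical labels, which guards against the inconsistent-formatting mistake; `build_few_shot_prompt` is a hypothetical name.

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]],
                          actual_input: str) -> str:
    """Render the few-shot pattern: task description, numbered examples
    with consistent Input/Output labels, then the real input."""
    lines = [task, ""]
    for i, (example_in, example_out) in enumerate(examples, start=1):
        lines += [f"Example {i}:", f"Input: {example_in}",
                  f"Output: {example_out}", ""]
    # End on a bare "Output:" so the model completes the final answer.
    lines += ["Now process:", f"Input: {actual_input}", "Output:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Extract the company name and founding year from the text.",
    [('"Apple Inc. was founded in 1976 by Steve Jobs."',
      '{"company": "Apple Inc.", "year": 1976}'),
     ('"Microsoft Corporation started in 1975."',
      '{"company": "Microsoft Corporation", "year": 1975}')],
    '"Tesla, Inc. was established in 2003 by Martin Eberhard."',
)
```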
## 3. Chain-of-Thought (CoT)

**When to use:** Math problems, logical reasoning, multi-step analysis.

**Pattern:**
[Problem]
Think through this step by step:
**Example:**
Input:
A store sells apples for $2 each. If you buy 5 or more, you get 20% off.
How much do 7 apples cost?
Think through this step by step:
**Expected Output:**
Step 1: Calculate the regular price for 7 apples
7 apples × $2 = $14
Step 2: Check if discount applies
7 apples ≥ 5 apples, so 20% discount applies
Step 3: Calculate the discount amount
20% of $14 = 0.20 × $14 = $2.80
Step 4: Calculate final price
$14 - $2.80 = $11.20
The 7 apples cost $11.20
**Variations:**
| Variation | Trigger Phrase | Use Case |
|---|---|---|
| Basic CoT | "Think step by step" | General reasoning |
| Explicit steps | "Break this into numbered steps" | Complex procedures |
| Show work | "Show your calculations" | Math problems |
| Reasoning chain | "Explain your reasoning" | Justification needed |
**Zero-shot CoT:** Simply append "Let's think step by step." to any prompt.

## 4. Role Prompting

**When to use:** Tasks requiring expertise, specific perspective, or consistent persona.

**Pattern:**
You are a [role/expert description].
[Context about the task]
[Specific instruction]
**Example:**
Input:
You are a senior security engineer with 15 years of experience in web application security.
Review the following code snippet for security vulnerabilities:
```python
def login(username, password):
query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
result = db.execute(query)
return result
```

Identify vulnerabilities and provide specific fixes.

**Expected Output:**

Security Vulnerabilities Identified:

1. **SQL Injection (Critical)**
   - The code directly interpolates user input into the SQL query
   - An attacker can input: `username=' OR '1'='1' --`
   - This bypasses authentication entirely
   - Fix:

     ```python
     def login(username, password):
         query = "SELECT * FROM users WHERE username=? AND password=?"
         result = db.execute(query, (username, password))
         return result
     ```

2. **Plain-text Password Storage (Critical)**
   - Passwords appear to be stored and compared in plain text
   - Fix: Use bcrypt or argon2 for password hashing

3. **Missing Rate Limiting (Medium)**
   - No protection against brute force attacks
   - Fix: Implement login attempt throttling
**Effective role descriptions:**
| Weak | Strong |
|------|--------|
| "You are an expert" | "You are a senior data scientist with 10 years of experience in NLP" |
| "You are helpful" | "You are a patient teacher explaining concepts to beginners" |
| "You know about X" | "You are a certified AWS solutions architect specializing in serverless" |
---
## 5. Structured Output
**When to use:** When you need parseable responses (JSON, XML, CSV).
**Pattern:**
[Task instruction]
Respond in JSON format with exactly these fields:
- field1 (type): description
- field2 (type): description
[Input]
Return ONLY valid JSON, no markdown or explanation.
**Example:**
Input:
Extract meeting details from this email.
Respond in JSON format with exactly these fields:
- date (string, ISO format): Meeting date
- time (string, 24h format): Meeting time
- attendees (array of strings): List of attendees
- topic (string): Meeting topic
- location (string or null): Meeting location if mentioned
Email: "Hi team, let's meet tomorrow at 2pm to discuss Q4 planning. Sarah, Mike, and Lisa should attend. We'll use Conference Room B."
Today's date is 2024-01-15.
Return ONLY valid JSON, no markdown or explanation.
**Expected Output:**
```json
{
"date": "2024-01-16",
"time": "14:00",
"attendees": ["Sarah", "Mike", "Lisa"],
"topic": "Q4 planning",
"location": "Conference Room B"
}
```

**Format enforcement techniques:**

```text
# Strong enforcement
"Return ONLY valid JSON. Start with { and end with }"

# Schema validation hint
"The output must be valid JSON matching this TypeScript type:
type Output = { name: string; age: number; active: boolean }"

# Negative instruction
"Do NOT include markdown code blocks. Do NOT add explanations."
```
## 6. Self-Consistency

**When to use:** Complex reasoning where multiple valid paths exist.

**Pattern:**
1. Generate multiple reasoning paths (temperature > 0)
2. Extract final answers from each path
3. Select the most common answer (majority vote)

**Example approach:**

```text
# Run this prompt 5 times with temperature=0.7
Solve this logic puzzle. Think through it step by step.

Three friends (Alice, Bob, Carol) each have a different pet (cat, dog, bird).
- Alice doesn't have the dog
- The person with the bird is not Carol
- Bob's pet has four legs

What pet does each person have?
```
**Implementation:**

```python
# Pseudo-code for self-consistency
answers = []
for i in range(5):
    response = llm.generate(prompt, temperature=0.7)
    answer = extract_final_answer(response)
    answers.append(answer)
final_answer = majority_vote(answers)
```
**When to use:**
- Math word problems
- Logic puzzles
- Any task where you can verify consistency
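The majority-vote step can be made concrete with `collections.Counter`; the `canned` answers below are a deterministic stub standing in for `llm.generate` plus answer extraction.

```python
from collections import Counter
from typing import Callable

def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> str:
    """Sample n reasoning paths, collect their final answers, and
    return the majority-vote winner (ties break by first-seen order)."""
    answers = [sample() for _ in range(n)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Stub sampler standing in for a temperature>0 LLM call + answer extraction.
canned = iter(["$11.20", "$14.00", "$11.20", "$11.20", "$14.00"])
result = self_consistent_answer(lambda: next(canned), n=5)
```

In practice `sample` would issue the same prompt each time and parse out only the final answer, so that divergent reasoning paths still vote on a comparable value.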
## 7. ReAct (Reasoning + Acting)

**When to use:** Tasks requiring external tool use or information gathering.

**Pattern:**
Answer the question using the available tools.
Tools available:
- search(query): Search the web
- calculate(expression): Evaluate math
- lookup(term): Look up in knowledge base
Format your response as:
Thought: [reasoning about what to do]
Action: [tool_name(arguments)]
Observation: [result from tool]
... (repeat as needed)
Thought: I now have enough information
Final Answer: [answer]
Question: [user question]
**Example:**
Input:
Question: What is the population of France divided by the population of Belgium?
**Expected Output:**
Thought: I need to find the population of France and Belgium, then calculate the ratio.
Action: search("population of France 2024")
Observation: France has a population of approximately 67.75 million.
Thought: Now I need Belgium's population.
Action: search("population of Belgium 2024")
Observation: Belgium has a population of approximately 11.7 million.
Thought: Now I can calculate the ratio.
Action: calculate(67.75 / 11.7)
Observation: 5.79
Thought: I now have enough information to answer.
Final Answer: The population of France is approximately 5.79 times the population of Belgium.
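A minimal dispatcher for the `Action:` lines in the transcript above might look like this; the regex, `run_react_step`, and the stub tools are illustrative, and a real `calculate` tool should use a safe expression parser rather than `eval`.

```python
import re

def run_react_step(response_line: str, tools: dict) -> str:
    """Parse one 'Action: tool_name(arguments)' line from a ReAct
    transcript, dispatch it, and return the Observation line."""
    match = re.match(r"Action:\s*(\w+)\((.*)\)", response_line)
    if not match:
        raise ValueError(f"not an action line: {response_line!r}")
    name, arg = match.group(1), match.group(2).strip("\"'")
    if name not in tools:
        return f"Observation: unknown tool {name}"
    return f"Observation: {tools[name](arg)}"

# Stub tools standing in for real search/calculator backends.
# eval is used only for this toy example, with builtins disabled.
tools = {
    "calculate": lambda expr: round(eval(expr, {"__builtins__": {}}), 2),
    "search": lambda q: "67.75 million" if "France" in q else "11.7 million",
}
```

The driver loop would feed each Observation back to the model and repeat until a `Final Answer:` line appears.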
## 8. Tree of Thoughts

**When to use:** Complex problems requiring exploration of multiple solution paths.

**Pattern:**
Problem: [complex problem]
Generate 3 different approaches to solve this:
Approach 1: [description]
Evaluation: [pros/cons]
Approach 2: [description]
Evaluation: [pros/cons]
Approach 3: [description]
Evaluation: [pros/cons]
Best approach: [selection with reasoning]
Solution using best approach: [detailed solution]
**Example:**
Input:
Problem: Design a system to handle 10,000 concurrent user sessions with sub-100ms response times.
Generate 3 different architectural approaches:
**Expected Output:**
Approach 1: Horizontal scaling with load balancer
- Multiple stateless app servers behind load balancer
- Session state in Redis cluster
Evaluation: Simple, proven pattern. May have Redis as bottleneck.
Approach 2: Serverless with edge computing
- AWS Lambda@Edge or Cloudflare Workers
- Session state in DynamoDB Global Tables
Evaluation: Auto-scales, but cold starts may impact latency.
Approach 3: Actor model with sticky sessions
- Akka/Orleans cluster with virtual actors
- Session state co-located with actor
Evaluation: Lowest latency, but complex to operate.
Best approach: Approach 1 (Horizontal scaling)
Reasoning: Best balance of simplicity, reliability, and performance for the given requirements.
Solution: [detailed architecture follows]
## 9. Retrieval-Augmented Generation

**When to use:** Tasks requiring factual accuracy or current information.

**Pattern:**
Use the following context to answer the question. If the answer is not in the context, say "I don't have information about that."
Context:
[retrieved documents]
Question: [user question]
Answer based only on the context provided:
**Example:**
Input:
Use the following context to answer the question.
Context:
Document 1: "The Acme Widget 3000 was released in March 2024. It features
a 12-hour battery life and weighs 250 grams."
Document 2: "Pricing for the Acme Widget 3000 starts at $299 for the base
model. The Pro version with extended battery is $399."
Question: How much does the Acme Widget 3000 cost and what is its battery life?
Answer based only on the context provided:
**Expected Output:**
The Acme Widget 3000 starts at $299 for the base model, with a Pro version
available at $399. The standard model has a 12-hour battery life.
**Chunking strategies:**
| Strategy | Chunk Size | Overlap | Best For |
|---|---|---|---|
| Fixed | 500 tokens | 50 tokens | General purpose |
| Semantic | Paragraph | None | Well-structured docs |
| Sentence | 3-5 sentences | 1 sentence | Q&A systems |
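As a sketch of the "Fixed" row, with whitespace-split words standing in for tokens (a real implementation would count with the model's tokenizer):

```python
def chunk_fixed(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks of `chunk_size` words, with the
    last `overlap` words of each chunk repeated at the start of the next."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # advance by size minus overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), step)]
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk; note the final chunk may be shorter than `chunk_size`.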
## 10. Meta-Prompting

**When to use:** Generating or improving prompts programmatically.

**Pattern:**
You are a prompt engineering expert.
Task: [description of what the prompt should do]
Target model: [GPT-4/Claude/etc.]
Constraints: [length limits, format requirements]
Generate an optimized prompt for this task.
**Example:**
Input:
You are a prompt engineering expert.
Task: Create a prompt that extracts action items from meeting notes
Target model: GPT-4
Constraints:
- Output must be valid JSON
- Each action item needs: task, owner, due_date
- Handle cases where information is missing
Generate an optimized prompt for this task.
**Expected Output:**
Extract action items from the following meeting notes.
For each action item, identify:
- task: The specific action to be taken
- owner: Person responsible (use "Unassigned" if not mentioned)
- due_date: Deadline in YYYY-MM-DD format (use null if not specified)
Meeting Notes:
{meeting_notes}
Respond with a JSON array. Example format:
[
{"task": "Review proposal", "owner": "Sarah", "due_date": "2024-01-20"},
{"task": "Send update", "owner": "Unassigned", "due_date": null}
]
Return ONLY the JSON array, no additional text.
## Pattern Selection Guide
| Task Type | Recommended Pattern |
|---|---|
| Simple classification | Zero-shot |
| Consistent formatting needed | Few-shot |
| Math/logic problems | Chain-of-Thought |
| Need expertise/perspective | Role Prompting |
| API integration | Structured Output |
| High-stakes decisions | Self-Consistency |
| Tool use required | ReAct |
| Complex problem solving | Tree of Thoughts |
| Factual Q&A | RAG |
| Prompt generation | Meta-Prompting |