feat(bundles): add editorial bundle plugins
@@ -0,0 +1,33 @@
{
  "name": "antigravity-bundle-llm-application-developer",
  "version": "8.10.0",
  "description": "Install the \"LLM Application Developer\" editorial skill bundle from Antigravity Awesome Skills.",
  "author": {
    "name": "sickn33 and contributors",
    "url": "https://github.com/sickn33/antigravity-awesome-skills"
  },
  "homepage": "https://github.com/sickn33/antigravity-awesome-skills",
  "repository": "https://github.com/sickn33/antigravity-awesome-skills",
  "license": "MIT",
  "keywords": [
    "codex",
    "skills",
    "bundle",
    "llm-application-developer",
    "productivity"
  ],
  "skills": "./skills/",
  "interface": {
    "displayName": "LLM Application Developer",
    "shortDescription": "AI & Agents · 5 curated skills",
    "longDescription": "For building production LLM applications. Covers LLM App Patterns, RAG Implementation, and 3 more skills.",
    "developerName": "sickn33 and contributors",
    "category": "AI & Agents",
    "capabilities": [
      "Interactive",
      "Write"
    ],
    "websiteURL": "https://github.com/sickn33/antigravity-awesome-skills",
    "brandColor": "#111827"
  }
}
@@ -0,0 +1,58 @@
---
name: context-window-management
description: "You're a context engineering specialist who has optimized LLM applications handling millions of conversations. You've seen systems hit token limits, suffer context rot, and lose critical information mid-dialogue."
risk: unknown
source: "vibeship-spawner-skills (Apache 2.0)"
date_added: "2026-02-27"
---

# Context Window Management

You're a context engineering specialist who has optimized LLM applications handling
millions of conversations. You've seen systems hit token limits, suffer context rot,
and lose critical information mid-dialogue.

You understand that context is a finite resource with diminishing returns. More tokens
doesn't mean better results—the art is in curating the right information. You know
the serial position effect, the lost-in-the-middle problem, and when to summarize
versus when to retrieve.

## Capabilities

- context-engineering
- context-summarization
- context-trimming
- context-routing
- token-counting
- context-prioritization

## Patterns

### Tiered Context Strategy

Apply different strategies based on context size.

### Serial Position Optimization

Place important content at the start and end, where models attend most reliably.

### Intelligent Summarization

Summarize by importance, not just recency.
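
Taken together, these patterns can be sketched as a minimal context fitter (a sketch; `estimate_tokens` is a rough stand-in for a real tokenizer such as tiktoken, and the budget is illustrative):

```python
def estimate_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer (~4 characters per token).
    return max(1, len(text) // 4)

def fit_context(messages: list[str], budget: int) -> list[str]:
    """Tiered strategy: keep everything if it fits; otherwise preserve
    the first and last messages (serial position effect) and keep as
    much of the recent middle as the budget allows."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget:
        return messages  # Tier 1: everything fits, no action needed

    # Tier 2: anchor the start and end, trim the middle oldest-first.
    head, tail = messages[0], messages[-1]
    used = estimate_tokens(head) + estimate_tokens(tail)
    kept = []
    for m in reversed(messages[1:-1]):  # prefer the most recent middle messages
        cost = estimate_tokens(m)
        if used + cost <= budget:
            kept.append(m)
            used += cost
    return [head] + list(reversed(kept)) + [tail]
```

A third tier would replace the dropped middle with an importance-weighted summary rather than discarding it outright.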

## Anti-Patterns

### ❌ Naive Truncation

Cutting context at a fixed length can sever sentences and drop the information the model needs most.

### ❌ Ignoring Token Costs

Every token in context adds cost and latency; unmeasured context growth quietly inflates both.

### ❌ One-Size-Fits-All

A single trimming strategy rarely suits both short chats and long multi-session dialogues; match the strategy to the context size.

## Related Skills

Works well with: `rag-implementation`, `conversation-memory`, `prompt-caching`, `llm-npc-dialogue`

## When to Use

Use this skill when you need to execute the workflow or actions described in the overview.
@@ -0,0 +1,243 @@

---
name: langfuse
description: "You are an expert in LLM observability and evaluation. You think in terms of traces, spans, and metrics. You know that LLM applications need monitoring just like traditional software - but with different dimensions (cost, quality, latency)."
risk: unknown
source: "vibeship-spawner-skills (Apache 2.0)"
date_added: "2026-02-27"
---

# Langfuse

**Role**: LLM Observability Architect

You are an expert in LLM observability and evaluation. You think in terms of
traces, spans, and metrics. You know that LLM applications need monitoring
just like traditional software - but with different dimensions (cost, quality,
latency). You use data to drive prompt improvements and catch regressions.

## Capabilities

- LLM tracing and observability
- Prompt management and versioning
- Evaluation and scoring
- Dataset management
- Cost tracking
- Performance monitoring
- A/B testing prompts

## Requirements

- Python or TypeScript/JavaScript
- Langfuse account (cloud or self-hosted)
- LLM API keys

## Patterns

### Basic Tracing Setup

Instrument LLM calls with Langfuse.

**When to use**: Any LLM application

```python
import openai

from langfuse import Langfuse

# Initialize client
langfuse = Langfuse(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://cloud.langfuse.com"  # or self-hosted URL
)

# Create a trace for a user request
trace = langfuse.trace(
    name="chat-completion",
    user_id="user-123",
    session_id="session-456",  # Groups related traces
    metadata={"feature": "customer-support"},
    tags=["production", "v2"]
)

# Log a generation (LLM call)
generation = trace.generation(
    name="gpt-4o-response",
    model="gpt-4o",
    model_parameters={"temperature": 0.7},
    input={"messages": [{"role": "user", "content": "Hello"}]},
    metadata={"attempt": 1}
)

# Make the actual LLM call
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Complete the generation with output
generation.end(
    output=response.choices[0].message.content,
    usage={
        "input": response.usage.prompt_tokens,
        "output": response.usage.completion_tokens
    }
)

# Score the trace
trace.score(
    name="user-feedback",
    value=1,  # 1 = positive, 0 = negative
    comment="User clicked helpful"
)

# Flush before exit (important in serverless)
langfuse.flush()
```

### OpenAI Integration

Automatic tracing with the OpenAI SDK.

**When to use**: OpenAI-based applications

```python
from langfuse.openai import openai

# Drop-in replacement for the OpenAI client:
# all calls are automatically traced.

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    # Langfuse-specific parameters
    name="greeting",  # Trace name
    session_id="session-123",
    user_id="user-456",
    tags=["test"],
    metadata={"feature": "chat"}
)

# Works with streaming
stream = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
    name="story-generation"
)

for chunk in stream:
    print(chunk.choices[0].delta.content, end="")

# Works with async
import asyncio
from langfuse.openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def main():
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        name="async-greeting"
    )
    return response

asyncio.run(main())
```

### LangChain Integration

Trace LangChain applications.

**When to use**: LangChain-based applications

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langfuse.callback import CallbackHandler

# Create Langfuse callback handler
langfuse_handler = CallbackHandler(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://cloud.langfuse.com",
    session_id="session-123",
    user_id="user-456"
)

# Use with any LangChain component
llm = ChatOpenAI(model="gpt-4o")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])

chain = prompt | llm

# Pass the handler to invoke
response = chain.invoke(
    {"input": "Hello"},
    config={"callbacks": [langfuse_handler]}
)

# Or set as default
import langchain
langchain.callbacks.manager.set_handler(langfuse_handler)

# Then all calls are traced
response = chain.invoke({"input": "Hello"})

# Works with agents, retrievers, etc.
from langchain.agents import AgentExecutor, create_openai_tools_agent

# `tools` is a list of LangChain tools defined elsewhere
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

result = agent_executor.invoke(
    {"input": "What's the weather?"},
    config={"callbacks": [langfuse_handler]}
)
```

## Anti-Patterns

### ❌ Not Flushing in Serverless

**Why bad**: Traces are batched.
Serverless may exit before flush.
Data is lost.

**Instead**: Always call langfuse.flush() at the end.
Use context managers where available.
Consider sync mode for critical traces.
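
The flush-in-`finally` remedy looks like this in a serverless handler. The client below is a tiny stand-in so the control flow is runnable anywhere; with the real SDK it would be `langfuse = Langfuse()` and the same `flush()` call:

```python
import atexit

class StubLangfuse:
    """Stand-in for the Langfuse client: buffers traces, sends on flush."""
    def __init__(self):
        self.buffer, self.sent = [], []
    def trace(self, name):
        self.buffer.append(name)
    def flush(self):
        self.sent.extend(self.buffer)
        self.buffer.clear()

langfuse = StubLangfuse()

def handler(event):
    """Serverless entry point: flush in `finally` so batched traces
    are sent even if the handler raises or the runtime freezes."""
    langfuse.trace(name="serverless-request")
    try:
        return {"status": "ok", "echo": event}
    finally:
        langfuse.flush()

# Belt and braces for long-lived processes: flush again at interpreter exit.
atexit.register(langfuse.flush)
```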

### ❌ Tracing Everything

**Why bad**: Noisy traces.
Performance overhead.
Hard to find important info.

**Instead**: Focus on LLM calls, key logic, and user actions.
Group related operations.
Use meaningful span names.

### ❌ No User/Session IDs

**Why bad**: Can't debug specific users.
Can't track sessions.
Analytics are limited.

**Instead**: Always pass user_id and session_id.
Use consistent identifiers.
Add relevant metadata.

## Limitations

- Self-hosted deployments require infrastructure
- High-volume workloads may need optimization
- The real-time dashboard has some latency
- Evaluation requires setup

## Related Skills

Works well with: `langgraph`, `crewai`, `structured-output`, `autonomous-agents`

## When to Use

Use this skill when you need to execute the workflow or actions described in the overview.
@@ -0,0 +1,763 @@

---
name: llm-app-patterns
description: "Production-ready patterns for building LLM applications, inspired by [Dify](https://github.com/langgenius/dify) and industry best practices."
risk: unknown
source: community
date_added: "2026-02-27"
---

# 🤖 LLM Application Patterns

> Production-ready patterns for building LLM applications, inspired by [Dify](https://github.com/langgenius/dify) and industry best practices.

## When to Use This Skill

Use this skill when:

- Designing LLM-powered applications
- Implementing RAG (Retrieval-Augmented Generation)
- Building AI agents with tools
- Setting up LLMOps monitoring
- Choosing between agent architectures

---

## 1. RAG Pipeline Architecture

### Overview

RAG (Retrieval-Augmented Generation) grounds LLM responses in your data.

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Ingest    │────▶│  Retrieve   │────▶│  Generate   │
│  Documents  │     │   Context   │     │  Response   │
└─────────────┘     └─────────────┘     └─────────────┘
      │                   │                   │
      ▼                   ▼                   ▼
 ┌─────────┐        ┌───────────┐       ┌───────────┐
 │ Chunking│        │  Vector   │       │   LLM     │
 │Embedding│        │  Search   │       │ + Context │
 └─────────┘        └───────────┘       └───────────┘
```

### 1.1 Document Ingestion

```python
# Chunking strategies
class ChunkingStrategy:
    # Fixed-size chunks (simple but may break context)
    FIXED_SIZE = "fixed_size"  # e.g., 512 tokens

    # Semantic chunking (preserves meaning)
    SEMANTIC = "semantic"  # Split on paragraphs/sections

    # Recursive splitting (tries multiple separators)
    RECURSIVE = "recursive"  # ["\n\n", "\n", " ", ""]

    # Document-aware (respects structure)
    DOCUMENT_AWARE = "document_aware"  # Headers, lists, etc.

# Recommended settings
CHUNK_CONFIG = {
    "chunk_size": 512,       # tokens
    "chunk_overlap": 50,     # token overlap between chunks
    "separators": ["\n\n", "\n", ". ", " "],
}
```
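
The `RECURSIVE` strategy can be sketched as follows (sizes measured in characters for simplicity; production code would count tokens, and this sketch omits `chunk_overlap`):

```python
def recursive_split(text: str, chunk_size: int = 512,
                    separators: tuple = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Try the coarsest separator first; if a piece is still too large,
    re-split it with the next, finer separator."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # The piece alone exceeds the budget: recurse with finer separators.
                chunks.extend(recursive_split(piece, chunk_size, rest))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```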

### 1.2 Embedding & Storage

```python
# Vector database selection
VECTOR_DB_OPTIONS = {
    "pinecone": {
        "use_case": "Production, managed service",
        "scale": "Billions of vectors",
        "features": ["Hybrid search", "Metadata filtering"]
    },
    "weaviate": {
        "use_case": "Self-hosted, multi-modal",
        "scale": "Millions of vectors",
        "features": ["GraphQL API", "Modules"]
    },
    "chromadb": {
        "use_case": "Development, prototyping",
        "scale": "Thousands of vectors",
        "features": ["Simple API", "In-memory option"]
    },
    "pgvector": {
        "use_case": "Existing Postgres infrastructure",
        "scale": "Millions of vectors",
        "features": ["SQL integration", "ACID compliance"]
    }
}

# Embedding model selection
EMBEDDING_MODELS = {
    "openai/text-embedding-3-small": {
        "dimensions": 1536,
        "cost": "$0.02/1M tokens",
        "quality": "Good for most use cases"
    },
    "openai/text-embedding-3-large": {
        "dimensions": 3072,
        "cost": "$0.13/1M tokens",
        "quality": "Best for complex queries"
    },
    "local/bge-large": {
        "dimensions": 1024,
        "cost": "Free (compute only)",
        "quality": "Comparable to OpenAI small"
    }
}
```
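
Whatever the store, the core operation is the same: rank stored vectors by similarity to the query embedding. A minimal in-memory version (illustrative, cosine similarity over plain lists):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query: list[float], store: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the top_k document ids most similar to the query vector."""
    return sorted(store, key=lambda d: cosine(query, store[d]), reverse=True)[:top_k]
```

Real vector databases replace this linear scan with approximate-nearest-neighbor indexes, but the ranking they implement is the same.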

### 1.3 Retrieval Strategies

```python
# Basic semantic search
def semantic_search(query: str, top_k: int = 5):
    query_embedding = embed(query)
    results = vector_db.similarity_search(
        query_embedding,
        top_k=top_k
    )
    return results

# Hybrid search (semantic + keyword)
def hybrid_search(query: str, top_k: int = 5, alpha: float = 0.5):
    """
    alpha=1.0: Pure semantic
    alpha=0.0: Pure keyword (BM25)
    alpha=0.5: Balanced
    """
    semantic_results = vector_db.similarity_search(query)
    keyword_results = bm25_search(query)

    # Reciprocal Rank Fusion
    return rrf_merge(semantic_results, keyword_results, alpha)

# Multi-query retrieval
def multi_query_retrieval(query: str):
    """Generate multiple query variations for better recall"""
    queries = llm.generate_query_variations(query, n=3)
    all_results = []
    for q in queries:
        all_results.extend(semantic_search(q))
    return deduplicate(all_results)

# Contextual compression
def compressed_retrieval(query: str):
    """Retrieve then compress to relevant parts only"""
    docs = semantic_search(query, top_k=10)
    compressed = llm.extract_relevant_parts(docs, query)
    return compressed
```
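
The `rrf_merge` helper is assumed above; a minimal weighted Reciprocal Rank Fusion could look like this (the `k=60` smoothing constant is the commonly used default):

```python
def rrf_merge(semantic_results: list[str], keyword_results: list[str],
              alpha: float = 0.5, k: int = 60) -> list[str]:
    """Score each doc by 1/(k + rank) in every list it appears in;
    alpha weights semantic vs. keyword rankings."""
    scores: dict[str, float] = {}
    for rank, doc in enumerate(semantic_results):
        scores[doc] = scores.get(doc, 0.0) + alpha / (k + rank + 1)
    for rank, doc in enumerate(keyword_results):
        scores[doc] = scores.get(doc, 0.0) + (1 - alpha) / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF works on ranks rather than raw scores, it needs no calibration between the semantic and BM25 scoring scales.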

### 1.4 Generation with Context

```python
RAG_PROMPT_TEMPLATE = """
Answer the user's question based ONLY on the following context.
If the context doesn't contain enough information, say "I don't have enough information to answer that."

Context:
{context}

Question: {question}

Answer:"""

def generate_with_rag(question: str):
    # Retrieve
    context_docs = hybrid_search(question, top_k=5)
    context = "\n\n".join([doc.content for doc in context_docs])

    # Generate
    prompt = RAG_PROMPT_TEMPLATE.format(
        context=context,
        question=question
    )

    response = llm.generate(prompt)

    # Return with citations
    return {
        "answer": response,
        "sources": [doc.metadata for doc in context_docs]
    }
```

---

## 2. Agent Architectures

### 2.1 ReAct Pattern (Reasoning + Acting)

```
Thought: I need to search for information about X
Action: search("X")
Observation: [search results]
Thought: Based on the results, I should...
Action: calculate(...)
Observation: [calculation result]
Thought: I now have enough information
Action: final_answer("The answer is...")
```

```python
REACT_PROMPT = """
You are an AI assistant that can use tools to answer questions.

Available tools:
{tools_description}

Use this format:
Thought: [your reasoning about what to do next]
Action: [tool_name(arguments)]
Observation: [tool result - this will be filled in]
... (repeat Thought/Action/Observation as needed)
Thought: I have enough information to answer
Final Answer: [your final response]

Question: {question}
"""

class ReActAgent:
    def __init__(self, tools: list, llm):
        self.tools = {t.name: t for t in tools}
        self.llm = llm
        self.max_iterations = 10

    def run(self, question: str) -> str:
        prompt = REACT_PROMPT.format(
            tools_description=self._format_tools(),
            question=question
        )

        for _ in range(self.max_iterations):
            response = self.llm.generate(prompt)

            if "Final Answer:" in response:
                return self._extract_final_answer(response)

            action = self._parse_action(response)
            observation = self._execute_tool(action)
            prompt += f"\nObservation: {observation}\n"

        return "Max iterations reached"
```
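
The `_parse_action` helper is left abstract above. Given the `Action: tool_name(arguments)` format the prompt requests, a minimal parser could be (a sketch; production agents need more robust handling of malformed model output):

```python
import re

def parse_action(response: str) -> tuple[str, str]:
    """Extract the tool name and raw argument string from an `Action:` line,
    e.g. 'Action: search("X")' yields ("search", '"X"')."""
    match = re.search(r'Action:\s*(\w+)\((.*)\)', response)
    if not match:
        raise ValueError("No Action line found in model response")
    return match.group(1), match.group(2)
```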

### 2.2 Function Calling Pattern

```python
# Define tools as functions with schemas
TOOLS = [
    {
        "name": "search_web",
        "description": "Search the web for current information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Perform mathematical calculations",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression to evaluate"
                }
            },
            "required": ["expression"]
        }
    }
]

class FunctionCallingAgent:
    def run(self, question: str) -> str:
        messages = [{"role": "user", "content": question}]

        while True:
            response = self.llm.chat(
                messages=messages,
                tools=TOOLS,
                tool_choice="auto"
            )

            if response.tool_calls:
                for tool_call in response.tool_calls:
                    result = self._execute_tool(
                        tool_call.name,
                        tool_call.arguments
                    )
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": str(result)
                    })
            else:
                return response.content
```

### 2.3 Plan-and-Execute Pattern

```python
class PlanAndExecuteAgent:
    """
    1. Create a plan (list of steps)
    2. Execute each step
    3. Replan if needed
    """

    def run(self, task: str) -> str:
        # Planning phase
        plan = self.planner.create_plan(task)
        # Returns: ["Step 1: ...", "Step 2: ...", ...]

        results = []
        for step in plan:
            # Execute each step
            result = self.executor.execute(step, context=results)
            results.append(result)

            # Check if replan needed
            if self._needs_replan(task, results):
                new_plan = self.planner.replan(
                    task,
                    completed=results,
                    remaining=plan[len(results):]
                )
                plan = new_plan

        # Synthesize final answer
        return self.synthesizer.summarize(task, results)
```

### 2.4 Multi-Agent Collaboration

```python
class AgentTeam:
    """
    Specialized agents collaborating on complex tasks
    """

    def __init__(self):
        self.agents = {
            "researcher": ResearchAgent(),
            "analyst": AnalystAgent(),
            "writer": WriterAgent(),
            "critic": CriticAgent()
        }
        self.coordinator = CoordinatorAgent()

    def solve(self, task: str) -> str:
        # Coordinator assigns subtasks
        assignments = self.coordinator.decompose(task)

        results = {}
        for assignment in assignments:
            agent = self.agents[assignment.agent]
            result = agent.execute(
                assignment.subtask,
                context=results
            )
            results[assignment.id] = result

        # Critic reviews
        critique = self.agents["critic"].review(results)

        if critique.needs_revision:
            # Iterate with feedback
            return self.solve_with_feedback(task, results, critique)

        return self.coordinator.synthesize(results)
```

---

## 3. Prompt IDE Patterns

### 3.1 Prompt Templates with Variables

```python
class PromptTemplate:
    def __init__(self, template: str, variables: list[str]):
        self.template = template
        self.variables = variables

    def format(self, **kwargs) -> str:
        # Validate all variables provided
        missing = set(self.variables) - set(kwargs.keys())
        if missing:
            raise ValueError(f"Missing variables: {missing}")

        return self.template.format(**kwargs)

    def with_examples(self, examples: list[dict]) -> str:
        """Add few-shot examples"""
        example_text = "\n\n".join([
            f"Input: {ex['input']}\nOutput: {ex['output']}"
            for ex in examples
        ])
        return f"{example_text}\n\n{self.template}"

# Usage
summarizer = PromptTemplate(
    template="Summarize the following text in {style} style:\n\n{text}",
    variables=["style", "text"]
)

prompt = summarizer.format(
    style="professional",
    text="Long article content..."
)
```

### 3.2 Prompt Versioning & A/B Testing

```python
import hashlib
from datetime import datetime

class PromptRegistry:
    def __init__(self, db):
        self.db = db

    def register(self, name: str, template: str, version: str):
        """Store prompt with version"""
        self.db.save({
            "name": name,
            "template": template,
            "version": version,
            "created_at": datetime.now(),
            "metrics": {}
        })

    def get(self, name: str, version: str = "latest") -> str:
        """Retrieve specific version"""
        return self.db.get(name, version)

    def ab_test(self, name: str, user_id: str) -> str:
        """Return variant based on user bucket"""
        variants = self.db.get_all_versions(name)
        # Use a stable hash: Python's built-in hash() is salted per process,
        # so buckets would change between runs.
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        bucket = int(digest, 16) % len(variants)
        return variants[bucket]

    def record_outcome(self, prompt_id: str, outcome: dict):
        """Track prompt performance"""
        self.db.update_metrics(prompt_id, outcome)
```

### 3.3 Prompt Chaining

```python
class PromptChain:
    """
    Chain prompts together, passing output as input to next
    """

    def __init__(self, steps: list[dict]):
        self.steps = steps

    def run(self, initial_input: str) -> dict:
        context = {"input": initial_input}
        results = []

        for step in self.steps:
            prompt = step["prompt"].format(**context)
            output = llm.generate(prompt)

            # Parse output if needed
            if step.get("parser"):
                output = step["parser"](output)

            context[step["output_key"]] = output
            results.append({
                "step": step["name"],
                "output": output
            })

        return {
            "final_output": context[self.steps[-1]["output_key"]],
            "intermediate_results": results
        }

# Example: Research → Analyze → Summarize
chain = PromptChain([
    {
        "name": "research",
        "prompt": "Research the topic: {input}",
        "output_key": "research"
    },
    {
        "name": "analyze",
        "prompt": "Analyze these findings:\n{research}",
        "output_key": "analysis"
    },
    {
        "name": "summarize",
        "prompt": "Summarize this analysis in 3 bullet points:\n{analysis}",
        "output_key": "summary"
    }
])
```

---

## 4. LLMOps & Observability

### 4.1 Metrics to Track

```python
LLM_METRICS = {
    # Performance
    "latency_p50": "50th percentile response time",
    "latency_p99": "99th percentile response time",
    "tokens_per_second": "Generation speed",

    # Quality
    "user_satisfaction": "Thumbs up/down ratio",
    "task_completion": "% tasks completed successfully",
    "hallucination_rate": "% responses with factual errors",

    # Cost
    "cost_per_request": "Average $ per API call",
    "tokens_per_request": "Average tokens used",
    "cache_hit_rate": "% requests served from cache",

    # Reliability
    "error_rate": "% failed requests",
    "timeout_rate": "% requests that timed out",
    "retry_rate": "% requests needing retry"
}
```
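
The two latency metrics can be computed from raw per-request timings with the standard library (a sketch; `statistics.quantiles` with `n=100` yields the percentile cut points):

```python
import statistics

def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """Compute p50/p99 from raw per-request latencies in milliseconds."""
    # quantiles(n=100) returns the 99 cut points between percentiles;
    # index 49 is the 50th percentile, index 98 the 99th.
    cuts = statistics.quantiles(sorted(latencies_ms), n=100)
    return {"latency_p50": cuts[49], "latency_p99": cuts[98]}
```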

### 4.2 Logging & Tracing

```python
import json
import logging
from datetime import datetime

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

class LLMLogger:
    def log_request(self, request_id: str, data: dict):
        """Log LLM request for debugging and analysis"""
        log_entry = {
            "request_id": request_id,
            "timestamp": datetime.now().isoformat(),
            "model": data["model"],
            "prompt": data["prompt"][:500],  # Truncate for storage
            "prompt_tokens": data["prompt_tokens"],
            "temperature": data.get("temperature", 1.0),
            "user_id": data.get("user_id"),
        }
        logging.info(f"LLM_REQUEST: {json.dumps(log_entry)}")

    def log_response(self, request_id: str, data: dict):
        """Log LLM response"""
        log_entry = {
            "request_id": request_id,
            "completion_tokens": data["completion_tokens"],
            "total_tokens": data["total_tokens"],
            "latency_ms": data["latency_ms"],
            "finish_reason": data["finish_reason"],
            "cost_usd": self._calculate_cost(data),
        }
        logging.info(f"LLM_RESPONSE: {json.dumps(log_entry)}")

# Distributed tracing
@tracer.start_as_current_span("llm_call")
def call_llm(prompt: str) -> str:
    span = trace.get_current_span()
    span.set_attribute("prompt.length", len(prompt))

    response = llm.generate(prompt)

    span.set_attribute("response.length", len(response.content))
    span.set_attribute("tokens.total", response.usage.total_tokens)

    return response.content
```

### 4.3 Evaluation Framework

```python
class LLMEvaluator:
    """
    Evaluate LLM outputs for quality
    """

    def evaluate_response(self,
                          question: str,
                          response: str,
                          ground_truth: str = None) -> dict:
        scores = {}

        # Relevance: Does it answer the question?
        scores["relevance"] = self._score_relevance(question, response)

        # Coherence: Is it well-structured?
        scores["coherence"] = self._score_coherence(response)

        # Groundedness: Is it based on provided context?
        scores["groundedness"] = self._score_groundedness(response)

        # Accuracy: Does it match ground truth?
        if ground_truth:
            scores["accuracy"] = self._score_accuracy(response, ground_truth)

        # Harmfulness: Is it safe?
        scores["safety"] = self._score_safety(response)

        return scores

    def run_benchmark(self, test_cases: list[dict]) -> dict:
        """Run evaluation on test set"""
        results = []
        for case in test_cases:
            response = llm.generate(case["prompt"])
            scores = self.evaluate_response(
                question=case["prompt"],
                response=response,
                ground_truth=case.get("expected")
            )
            results.append(scores)

        return self._aggregate_scores(results)
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Production Patterns
|
||||
|
||||
### 5.1 Caching Strategy
|
||||
|
||||
```python
|
||||
import hashlib
|
||||
from functools import lru_cache
|
||||
|
||||
class LLMCache:
|
||||
def __init__(self, redis_client, ttl_seconds=3600):
|
||||
self.redis = redis_client
|
||||
self.ttl = ttl_seconds
|
||||
|
||||
def _cache_key(self, prompt: str, model: str, **kwargs) -> str:
|
||||
"""Generate deterministic cache key"""
|
||||
content = f"{model}:{prompt}:{json.dumps(kwargs, sort_keys=True)}"
|
||||
return hashlib.sha256(content.encode()).hexdigest()
|
||||
|
||||
def get_or_generate(self, prompt: str, model: str, **kwargs) -> str:
|
||||
key = self._cache_key(prompt, model, **kwargs)
|
||||
|
||||
# Check cache
|
||||
cached = self.redis.get(key)
|
||||
if cached:
|
||||
return cached.decode()
|
||||
|
||||
# Generate
|
||||
response = llm.generate(prompt, model=model, **kwargs)
|
||||
|
||||
# Cache (only cache deterministic outputs)
|
||||
if kwargs.get("temperature", 1.0) == 0:
|
||||
self.redis.setex(key, self.ttl, response)
|
||||
|
||||
return response
|
||||
```
|
||||
|
||||
### 5.2 Rate Limiting & Retry
|
||||
|
||||
```python
|
||||
import time
|
||||
from tenacity import retry, wait_exponential, stop_after_attempt
|
||||
|
||||
class RateLimiter:
|
||||
def __init__(self, requests_per_minute: int):
|
||||
self.rpm = requests_per_minute
|
||||
self.timestamps = []
|
||||
|
||||
def acquire(self):
|
||||
"""Wait if rate limit would be exceeded"""
|
||||
now = time.time()
|
||||
|
||||
# Remove old timestamps
|
||||
self.timestamps = [t for t in self.timestamps if now - t < 60]
|
||||
|
||||
if len(self.timestamps) >= self.rpm:
|
||||
sleep_time = 60 - (now - self.timestamps[0])
|
||||
time.sleep(sleep_time)
|
||||
|
||||
self.timestamps.append(time.time())
|
||||
|
||||
# Retry with exponential backoff
|
||||
@retry(
|
||||
wait=wait_exponential(multiplier=1, min=4, max=60),
|
||||
stop=stop_after_attempt(5)
|
||||
)
|
||||
def call_llm_with_retry(prompt: str) -> str:
|
||||
try:
|
||||
return llm.generate(prompt)
|
||||
except RateLimitError:
|
||||
raise # Will trigger retry
|
||||
except APIError as e:
|
||||
if e.status_code >= 500:
|
||||
raise # Retry server errors
|
||||
raise # Don't retry client errors
|
||||
```

### 5.3 Fallback Strategy

```python
import logging


class AllModelsFailedError(Exception):
    pass


class LLMWithFallback:
    def __init__(self, primary: str, fallbacks: list[str]):
        self.primary = primary
        self.fallbacks = fallbacks

    def generate(self, prompt: str, **kwargs) -> str:
        models = [self.primary] + self.fallbacks

        for model in models:
            try:
                return llm.generate(prompt, model=model, **kwargs)
            except (RateLimitError, APIError) as e:
                logging.warning(f"Model {model} failed: {e}")
                continue

        raise AllModelsFailedError("All models exhausted")


# Usage
llm_client = LLMWithFallback(
    primary="gpt-4-turbo",
    fallbacks=["gpt-3.5-turbo", "claude-3-sonnet"],
)
```

---

## Architecture Decision Matrix

| Pattern | Use When | Complexity | Cost |
| :------------------- | :--------------- | :--------- | :-------- |
| **Simple RAG** | FAQ, docs search | Low | Low |
| **Hybrid RAG** | Mixed queries | Medium | Medium |
| **ReAct Agent** | Multi-step tasks | Medium | Medium |
| **Function Calling** | Structured tools | Low | Low |
| **Plan-Execute** | Complex tasks | High | High |
| **Multi-Agent** | Research tasks | Very High | Very High |

---

## Resources

- [Dify Platform](https://github.com/langgenius/dify)
- [LangChain Docs](https://python.langchain.com/)
- [LlamaIndex](https://www.llamaindex.ai/)
- [Anthropic Cookbook](https://github.com/anthropics/anthropic-cookbook)
@@ -0,0 +1,66 @@
---
name: prompt-caching
description: "You're a caching specialist who has reduced LLM costs by 90% through strategic caching. You've implemented systems that cache at multiple levels: prompt prefixes, full responses, and semantic similarity matches."
risk: unknown
source: "vibeship-spawner-skills (Apache 2.0)"
date_added: "2026-02-27"
---

# Prompt Caching

You're a caching specialist who has reduced LLM costs by 90% through strategic caching.
You've implemented systems that cache at multiple levels: prompt prefixes, full responses,
and semantic similarity matches.

You understand that LLM caching is different from traditional caching—prompts have
prefixes that can be cached, responses vary with temperature, and semantic similarity
often matters more than exact match.

Your core principles:
1. Cache at the right level—prefix, response, or both
2. K
## Capabilities

- prompt-cache
- response-cache
- kv-cache
- cag-patterns
- cache-invalidation

## Patterns

### Anthropic Prompt Caching

Use Claude's native prompt caching for repeated prefixes
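
A minimal sketch of how such a prefix might be structured for the Anthropic Messages API. The helper name and the split between `instructions` and `reference_docs` are illustrative; current model support and minimum cacheable prompt lengths should be checked against the official docs.

```python
def build_cached_system(instructions: str, reference_docs: str) -> list[dict]:
    """Build a system array whose large, stable suffix is marked cacheable.

    Anthropic caches the prompt prefix up to each cache_control
    breakpoint, so stable content must come before anything that varies.
    """
    return [
        {"type": "text", "text": instructions},
        {
            "type": "text",
            "text": reference_docs,
            # Marks everything up to and including this block as cacheable
            "cache_control": {"type": "ephemeral"},
        },
    ]
```

The result is passed as `system=` to `client.messages.create(...)`; subsequent calls that reuse the identical prefix read it from the cache at a reduced input-token rate.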

### Response Caching

Cache full LLM responses for identical or similar queries
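
A sketch of a two-tier response cache: exact match on a normalized prompt hash, then embedding similarity. `embed` is a stand-in for whatever embedding function the application already uses, and the 0.95 threshold is illustrative.

```python
import hashlib
import math


class ResponseCache:
    def __init__(self, embed, threshold: float = 0.95):
        self.embed = embed          # callable: str -> list[float]
        self.threshold = threshold
        self.exact = {}             # sha256(normalized prompt) -> response
        self.semantic = []          # (embedding, response) pairs

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.strip().lower().encode()).hexdigest()

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def get(self, prompt: str):
        # Tier 1: exact match
        if (key := self._key(prompt)) in self.exact:
            return self.exact[key]
        # Tier 2: semantic match
        emb = self.embed(prompt)
        for cached_emb, response in self.semantic:
            if self._cosine(emb, cached_emb) >= self.threshold:
                return response
        return None

    def put(self, prompt: str, response: str) -> None:
        self.exact[self._key(prompt)] = response
        self.semantic.append((self.embed(prompt), response))
```

The semantic tier here does a linear scan; at scale this lookup would go through the same vector index used for retrieval.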

### Cache Augmented Generation (CAG)

Pre-cache documents in the prompt instead of retrieving them with RAG

## Anti-Patterns

### ❌ Caching with High Temperature

### ❌ No Cache Invalidation

### ❌ Caching Everything

## ⚠️ Sharp Edges

| Issue | Severity | Solution |
|-------|----------|----------|
| Cache miss causes latency spike with additional overhead | high | Optimize for cache misses, not just hits |
| Cached responses become incorrect over time | high | Implement proper cache invalidation |
| Prompt caching doesn't work due to prefix changes | medium | Keep stable content at the start of the prompt |

## Related Skills

Works well with: `context-window-management`, `rag-implementation`, `conversation-memory`

## When to Use

Use this skill when adding prompt, response, or semantic caching to an LLM application, or when diagnosing cache hit-rate and invalidation problems.
@@ -0,0 +1,196 @@
---
name: rag-implementation
description: "RAG (Retrieval-Augmented Generation) implementation workflow covering embedding selection, vector database setup, chunking strategies, and retrieval optimization."
category: granular-workflow-bundle
risk: safe
source: personal
date_added: "2026-02-27"
---

# RAG Implementation Workflow

## Overview

Specialized workflow for implementing RAG (Retrieval-Augmented Generation) systems including embedding model selection, vector database setup, chunking strategies, retrieval optimization, and evaluation.

## When to Use This Workflow

Use this workflow when:
- Building RAG-powered applications
- Implementing semantic search
- Creating knowledge-grounded AI
- Setting up document Q&A systems
- Optimizing retrieval quality

## Workflow Phases

### Phase 1: Requirements Analysis

#### Skills to Invoke
- `ai-product` - AI product design
- `rag-engineer` - RAG engineering

#### Actions
1. Define use case
2. Identify data sources
3. Set accuracy requirements
4. Determine latency targets
5. Plan evaluation metrics

#### Copy-Paste Prompts
```
Use @ai-product to define RAG application requirements
```

### Phase 2: Embedding Selection

#### Skills to Invoke
- `embedding-strategies` - Embedding selection
- `rag-engineer` - RAG patterns

#### Actions
1. Evaluate embedding models
2. Test domain relevance
3. Measure embedding quality
4. Consider cost/latency
5. Select model

#### Copy-Paste Prompts
```
Use @embedding-strategies to select optimal embedding model
```

### Phase 3: Vector Database Setup

#### Skills to Invoke
- `vector-database-engineer` - Vector DB
- `similarity-search-patterns` - Similarity search

#### Actions
1. Choose vector database
2. Design schema
3. Configure indexes
4. Set up connection
5. Test queries

#### Copy-Paste Prompts
```
Use @vector-database-engineer to set up vector database
```

### Phase 4: Chunking Strategy

#### Skills to Invoke
- `rag-engineer` - Chunking strategies
- `rag-implementation` - RAG implementation

#### Actions
1. Choose chunk size
2. Implement chunking
3. Add overlap handling
4. Create metadata
5. Test retrieval quality

#### Copy-Paste Prompts
```
Use @rag-engineer to implement chunking strategy
```
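
The chunking actions above can be sketched as a minimal fixed-size chunker with overlap. Sizes are illustrative, and production chunkers often split on sentence or token boundaries rather than raw characters.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[dict]:
    """Split text into overlapping chunks with positional metadata."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append({
            "id": len(chunks),        # sequential chunk id
            "start": start,           # character offset into the source
            "text": text[start:start + chunk_size],
        })
        start += step
    return chunks
```

The `start` offset in the metadata lets retrieved chunks be traced back to their position in the source document.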

### Phase 5: Retrieval Implementation

#### Skills to Invoke
- `similarity-search-patterns` - Similarity search
- `hybrid-search-implementation` - Hybrid search

#### Actions
1. Implement vector search
2. Add keyword search
3. Configure hybrid search
4. Set up reranking
5. Optimize latency

#### Copy-Paste Prompts
```
Use @similarity-search-patterns to implement retrieval
```

```
Use @hybrid-search-implementation to add hybrid search
```
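
One common way to merge the vector and keyword result lists from the actions above is Reciprocal Rank Fusion (RRF); a minimal sketch, using the conventional k = 60 damping constant.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked doc-id lists via Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in, so
    documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization across retrievers, which is why it is a popular default for hybrid search before any dedicated reranking stage.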

### Phase 6: LLM Integration

#### Skills to Invoke
- `llm-application-dev-ai-assistant` - LLM integration
- `llm-application-dev-prompt-optimize` - Prompt optimization

#### Actions
1. Select LLM provider
2. Design prompt template
3. Implement context injection
4. Add citation handling
5. Test generation quality

#### Copy-Paste Prompts
```
Use @llm-application-dev-ai-assistant to integrate LLM
```
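
Context injection with citation handling (steps 3 and 4 above) can be sketched as a simple prompt template; the wording and the `[n]` numbering scheme are illustrative.

```python
def build_rag_prompt(question: str, docs: list[str]) -> str:
    """Inject retrieved chunks as numbered context the model can cite."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
    return (
        "Answer the question using only the context below. "
        "Cite supporting passages as [n]. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Numbering the chunks gives the model a stable handle for citations, which can then be mapped back to chunk metadata when rendering answers.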

### Phase 7: Caching

#### Skills to Invoke
- `prompt-caching` - Prompt caching
- `rag-engineer` - RAG optimization

#### Actions
1. Implement response caching
2. Set up embedding cache
3. Configure TTL
4. Add cache invalidation
5. Monitor hit rates

#### Copy-Paste Prompts
```
Use @prompt-caching to implement RAG caching
```

### Phase 8: Evaluation

#### Skills to Invoke
- `llm-evaluation` - LLM evaluation
- `evaluation` - AI evaluation

#### Actions
1. Define evaluation metrics
2. Create test dataset
3. Measure retrieval accuracy
4. Evaluate generation quality
5. Iterate on improvements

#### Copy-Paste Prompts
```
Use @llm-evaluation to evaluate RAG system
```
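
Retrieval accuracy (step 3 above) is commonly measured with recall@k and MRR over a labeled test set; a minimal sketch of both.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0
```

Averaging these over the test dataset gives a baseline to iterate against when tuning chunking, embeddings, or hybrid-search weights.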

## RAG Architecture

```
User Query -> Embedding -> Vector Search -> Retrieved Docs -> LLM -> Response
                  |              |               |             |
                Model        Vector DB      Chunk Store   Prompt + Context
```

## Quality Gates

- [ ] Embedding model selected
- [ ] Vector DB configured
- [ ] Chunking implemented
- [ ] Retrieval working
- [ ] LLM integrated
- [ ] Evaluation passing

## Related Workflow Bundles

- `ai-ml` - AI/ML development
- `ai-agent-development` - AI agents
- `database` - Vector databases