feat(bundles): add editorial bundle plugins

This commit is contained in:
sickn33
2026-03-27 08:48:03 +01:00
parent 8eff08b706
commit dffac91d3b
1052 changed files with 212282 additions and 68 deletions

View File

@@ -0,0 +1,33 @@
{
  "name": "antigravity-bundle-llm-application-developer",
  "version": "8.10.0",
  "description": "Install the \"LLM Application Developer\" editorial skill bundle from Antigravity Awesome Skills.",
  "author": {
    "name": "sickn33 and contributors",
    "url": "https://github.com/sickn33/antigravity-awesome-skills"
  },
  "homepage": "https://github.com/sickn33/antigravity-awesome-skills",
  "repository": "https://github.com/sickn33/antigravity-awesome-skills",
  "license": "MIT",
  "keywords": [
    "codex",
    "skills",
    "bundle",
    "llm-application-developer",
    "productivity"
  ],
  "skills": "./skills/",
  "interface": {
    "displayName": "LLM Application Developer",
    "shortDescription": "AI & Agents · 5 curated skills",
    "longDescription": "For building production LLM applications. Covers LLM App Patterns, RAG Implementation, and 3 more skills.",
    "developerName": "sickn33 and contributors",
    "category": "AI & Agents",
    "capabilities": [
      "Interactive",
      "Write"
    ],
    "websiteURL": "https://github.com/sickn33/antigravity-awesome-skills",
    "brandColor": "#111827"
  }
}

View File

@@ -0,0 +1,58 @@
---
name: context-window-management
description: "You're a context engineering specialist who has optimized LLM applications handling millions of conversations. You've seen systems hit token limits, suffer context rot, and lose critical information mid-dialogue."
risk: unknown
source: "vibeship-spawner-skills (Apache 2.0)"
date_added: "2026-02-27"
---
# Context Window Management
You're a context engineering specialist who has optimized LLM applications handling
millions of conversations. You've seen systems hit token limits, suffer context rot,
and lose critical information mid-dialogue.
You understand that context is a finite resource with diminishing returns. More tokens
doesn't mean better results—the art is in curating the right information. You know
the serial position effect, the lost-in-the-middle problem, and when to summarize
versus when to retrieve.
Your cor
## Capabilities
- context-engineering
- context-summarization
- context-trimming
- context-routing
- token-counting
- context-prioritization
## Patterns
### Tiered Context Strategy
Different strategies based on context size
### Serial Position Optimization
Place important content at start and end
### Intelligent Summarization
Summarize by importance, not just recency
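A minimal sketch of the patterns above in Python; the thresholds, the head/tail split, the character-based token estimate, and the `summarize_by_importance` helper are illustrative assumptions, not part of the original skill:
```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); swap in a real tokenizer when available.
    return len(text) // 4

def build_context(messages: list[str], budget: int = 8000) -> list[str]:
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget:
        return messages                               # small: pass everything through
    if total <= budget * 2:
        # medium: keep the ends (serial position effect) and drop the middle
        return messages[:2] + ["[older turns omitted]"] + messages[-6:]
    # large: summarize the middle by importance, then keep the recent tail
    summary = summarize_by_importance(messages[2:-6])  # assumed helper
    return messages[:2] + [summary] + messages[-6:]
```
The exact budgets and slice sizes are placeholders; the point is that the strategy escalates from pass-through, to trimming the middle, to importance-based summarization as the conversation grows.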
## Anti-Patterns
### ❌ Naive Truncation
### ❌ Ignoring Token Costs
### ❌ One-Size-Fits-All
## Related Skills
Works well with: `rag-implementation`, `conversation-memory`, `prompt-caching`, `llm-npc-dialogue`
## When to Use
Use this skill to execute the workflow or actions described in the overview.

View File

@@ -0,0 +1,243 @@
---
name: langfuse
description: "You are an expert in LLM observability and evaluation. You think in terms of traces, spans, and metrics. You know that LLM applications need monitoring just like traditional software - but with different dimensions (cost, quality, latency)."
risk: unknown
source: "vibeship-spawner-skills (Apache 2.0)"
date_added: "2026-02-27"
---
# Langfuse
**Role**: LLM Observability Architect
You are an expert in LLM observability and evaluation. You think in terms of
traces, spans, and metrics. You know that LLM applications need monitoring
just like traditional software - but with different dimensions (cost, quality,
latency). You use data to drive prompt improvements and catch regressions.
## Capabilities
- LLM tracing and observability
- Prompt management and versioning
- Evaluation and scoring
- Dataset management
- Cost tracking
- Performance monitoring
- A/B testing prompts
## Requirements
- Python or TypeScript/JavaScript
- Langfuse account (cloud or self-hosted)
- LLM API keys
## Patterns
### Basic Tracing Setup
Instrument LLM calls with Langfuse
**When to use**: Any LLM application
```python
from langfuse import Langfuse
import openai  # OpenAI client used for the example chat call below
# Initialize client
langfuse = Langfuse(
public_key="pk-...",
secret_key="sk-...",
host="https://cloud.langfuse.com" # or self-hosted URL
)
# Create a trace for a user request
trace = langfuse.trace(
name="chat-completion",
user_id="user-123",
session_id="session-456", # Groups related traces
metadata={"feature": "customer-support"},
tags=["production", "v2"]
)
# Log a generation (LLM call)
generation = trace.generation(
name="gpt-4o-response",
model="gpt-4o",
model_parameters={"temperature": 0.7},
input={"messages": [{"role": "user", "content": "Hello"}]},
metadata={"attempt": 1}
)
# Make actual LLM call
response = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}]
)
# Complete the generation with output
generation.end(
output=response.choices[0].message.content,
usage={
"input": response.usage.prompt_tokens,
"output": response.usage.completion_tokens
}
)
# Score the trace
trace.score(
name="user-feedback",
value=1, # 1 = positive, 0 = negative
comment="User clicked helpful"
)
# Flush before exit (important in serverless)
langfuse.flush()
```
### OpenAI Integration
Automatic tracing with OpenAI SDK
**When to use**: OpenAI-based applications
```python
from langfuse.openai import openai
# Drop-in replacement for OpenAI client
# All calls automatically traced
response = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
# Langfuse-specific parameters
name="greeting", # Trace name
session_id="session-123",
user_id="user-456",
tags=["test"],
metadata={"feature": "chat"}
)
# Works with streaming
stream = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True,
name="story-generation"
)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
# Works with async
import asyncio
from langfuse.openai import AsyncOpenAI
async_client = AsyncOpenAI()
async def main():
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        name="async-greeting"
    )

asyncio.run(main())
```
### LangChain Integration
Trace LangChain applications
**When to use**: LangChain-based applications
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langfuse.callback import CallbackHandler
# Create Langfuse callback handler
langfuse_handler = CallbackHandler(
public_key="pk-...",
secret_key="sk-...",
host="https://cloud.langfuse.com",
session_id="session-123",
user_id="user-456"
)
# Use with any LangChain component
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant."),
("user", "{input}")
])
chain = prompt | llm
# Pass handler to invoke
response = chain.invoke(
{"input": "Hello"},
config={"callbacks": [langfuse_handler]}
)
# Or set as default
import langchain
langchain.callbacks.manager.set_handler(langfuse_handler)
# Then all calls are traced
response = chain.invoke({"input": "Hello"})
# Works with agents, retrievers, etc.
from langchain.agents import AgentExecutor, create_openai_tools_agent

# `tools` is assumed to be a list of LangChain tools defined elsewhere
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)
result = agent_executor.invoke(
{"input": "What's the weather?"},
config={"callbacks": [langfuse_handler]}
)
```
## Anti-Patterns
### ❌ Not Flushing in Serverless
**Why bad**: Traces are batched.
Serverless may exit before flush.
Data is lost.
**Instead**: Always call langfuse.flush() at end.
Use context managers where available.
Consider sync mode for critical traces.
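As a minimal illustration (the handler signature is a generic serverless shape, not from the original skill), flushing before the function returns looks like:
```python
from langfuse import Langfuse

langfuse = Langfuse()  # assumes keys are supplied via environment variables

def handler(event, context):
    trace = langfuse.trace(name="chat-request", user_id=event.get("user_id"))
    # ... LLM calls recorded against the trace ...
    langfuse.flush()  # block until queued events are delivered before the runtime freezes
    return {"statusCode": 200}
```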
### ❌ Tracing Everything
**Why bad**: Noisy traces.
Performance overhead.
Hard to find important info.
**Instead**: Focus on: LLM calls, key logic, user actions.
Group related operations.
Use meaningful span names.
### ❌ No User/Session IDs
**Why bad**: Can't debug specific users.
Can't track sessions.
Analytics limited.
**Instead**: Always pass user_id and session_id.
Use consistent identifiers.
Add relevant metadata.
## Limitations
- Self-hosted requires infrastructure
- High-volume may need optimization
- Real-time dashboard has latency
- Evaluation requires setup
## Related Skills
Works well with: `langgraph`, `crewai`, `structured-output`, `autonomous-agents`
## When to Use
Use this skill to execute the workflow or actions described in the overview.

View File

@@ -0,0 +1,763 @@
---
name: llm-app-patterns
description: "Production-ready patterns for building LLM applications, inspired by [Dify](https://github.com/langgenius/dify) and industry best practices."
risk: unknown
source: community
date_added: "2026-02-27"
---
# 🤖 LLM Application Patterns
> Production-ready patterns for building LLM applications, inspired by [Dify](https://github.com/langgenius/dify) and industry best practices.
## When to Use This Skill
Use this skill when:
- Designing LLM-powered applications
- Implementing RAG (Retrieval-Augmented Generation)
- Building AI agents with tools
- Setting up LLMOps monitoring
- Choosing between agent architectures
---
## 1. RAG Pipeline Architecture
### Overview
RAG (Retrieval-Augmented Generation) grounds LLM responses in your data.
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Ingest    │────▶│  Retrieve   │────▶│  Generate   │
│  Documents  │     │   Context   │     │  Response   │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       ▼                   ▼                   ▼
 ┌─────────┐         ┌───────────┐       ┌───────────┐
 │ Chunking│         │  Vector   │       │    LLM    │
 │Embedding│         │  Search   │       │ + Context │
 └─────────┘         └───────────┘       └───────────┘
```
### 1.1 Document Ingestion
```python
# Chunking strategies
class ChunkingStrategy:
    # Fixed-size chunks (simple but may break context)
    FIXED_SIZE = "fixed_size"          # e.g., 512 tokens

    # Semantic chunking (preserves meaning)
    SEMANTIC = "semantic"              # Split on paragraphs/sections

    # Recursive splitting (tries multiple separators)
    RECURSIVE = "recursive"            # ["\n\n", "\n", " ", ""]

    # Document-aware (respects structure)
    DOCUMENT_AWARE = "document_aware"  # Headers, lists, etc.

# Recommended settings
CHUNK_CONFIG = {
    "chunk_size": 512,      # tokens
    "chunk_overlap": 50,    # token overlap between chunks
    "separators": ["\n\n", "\n", ". ", " "],
}
```
### 1.2 Embedding & Storage
```python
# Vector database selection
VECTOR_DB_OPTIONS = {
"pinecone": {
"use_case": "Production, managed service",
"scale": "Billions of vectors",
"features": ["Hybrid search", "Metadata filtering"]
},
"weaviate": {
"use_case": "Self-hosted, multi-modal",
"scale": "Millions of vectors",
"features": ["GraphQL API", "Modules"]
},
"chromadb": {
"use_case": "Development, prototyping",
"scale": "Thousands of vectors",
"features": ["Simple API", "In-memory option"]
},
"pgvector": {
"use_case": "Existing Postgres infrastructure",
"scale": "Millions of vectors",
"features": ["SQL integration", "ACID compliance"]
}
}
# Embedding model selection
EMBEDDING_MODELS = {
"openai/text-embedding-3-small": {
"dimensions": 1536,
"cost": "$0.02/1M tokens",
"quality": "Good for most use cases"
},
"openai/text-embedding-3-large": {
"dimensions": 3072,
"cost": "$0.13/1M tokens",
"quality": "Best for complex queries"
},
"local/bge-large": {
"dimensions": 1024,
"cost": "Free (compute only)",
"quality": "Comparable to OpenAI small"
}
}
```
### 1.3 Retrieval Strategies
```python
# Basic semantic search
def semantic_search(query: str, top_k: int = 5):
    query_embedding = embed(query)
    results = vector_db.similarity_search(
        query_embedding,
        top_k=top_k
    )
    return results

# Hybrid search (semantic + keyword)
def hybrid_search(query: str, top_k: int = 5, alpha: float = 0.5):
    """
    alpha=1.0: Pure semantic
    alpha=0.0: Pure keyword (BM25)
    alpha=0.5: Balanced
    """
    semantic_results = vector_db.similarity_search(query)
    keyword_results = bm25_search(query)
    # Reciprocal Rank Fusion
    return rrf_merge(semantic_results, keyword_results, alpha)

# Multi-query retrieval
def multi_query_retrieval(query: str):
    """Generate multiple query variations for better recall"""
    queries = llm.generate_query_variations(query, n=3)
    all_results = []
    for q in queries:
        all_results.extend(semantic_search(q))
    return deduplicate(all_results)

# Contextual compression
def compressed_retrieval(query: str):
    """Retrieve then compress to relevant parts only"""
    docs = semantic_search(query, top_k=10)
    compressed = llm.extract_relevant_parts(docs, query)
    return compressed
```
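The `rrf_merge` helper above is left undefined; a minimal sketch of weighted Reciprocal Rank Fusion (document identity via an assumed `doc.id` attribute, `k=60` from the original RRF formulation) might look like:
```python
def rrf_merge(semantic_results, keyword_results, alpha: float = 0.5, k: int = 60):
    """Blend two ranked lists with Reciprocal Rank Fusion.

    alpha weights the semantic list; (1 - alpha) weights the keyword list.
    k dampens the influence of the very top ranks.
    """
    scores, docs = {}, {}
    for weight, results in ((alpha, semantic_results), (1 - alpha, keyword_results)):
        for rank, doc in enumerate(results):
            docs[doc.id] = doc
            scores[doc.id] = scores.get(doc.id, 0.0) + weight / (k + rank + 1)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [docs[doc_id] for doc_id in ranked]
```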
### 1.4 Generation with Context
```python
RAG_PROMPT_TEMPLATE = """
Answer the user's question based ONLY on the following context.
If the context doesn't contain enough information, say "I don't have enough information to answer that."
Context:
{context}
Question: {question}
Answer:"""
def generate_with_rag(question: str):
    # Retrieve
    context_docs = hybrid_search(question, top_k=5)
    context = "\n\n".join([doc.content for doc in context_docs])

    # Generate
    prompt = RAG_PROMPT_TEMPLATE.format(
        context=context,
        question=question
    )
    response = llm.generate(prompt)

    # Return with citations
    return {
        "answer": response,
        "sources": [doc.metadata for doc in context_docs]
    }
```
---
## 2. Agent Architectures
### 2.1 ReAct Pattern (Reasoning + Acting)
```
Thought: I need to search for information about X
Action: search("X")
Observation: [search results]
Thought: Based on the results, I should...
Action: calculate(...)
Observation: [calculation result]
Thought: I now have enough information
Action: final_answer("The answer is...")
```
```python
REACT_PROMPT = """
You are an AI assistant that can use tools to answer questions.
Available tools:
{tools_description}
Use this format:
Thought: [your reasoning about what to do next]
Action: [tool_name(arguments)]
Observation: [tool result - this will be filled in]
... (repeat Thought/Action/Observation as needed)
Thought: I have enough information to answer
Final Answer: [your final response]
Question: {question}
"""
class ReActAgent:
    def __init__(self, tools: list, llm):
        self.tools = {t.name: t for t in tools}
        self.llm = llm
        self.max_iterations = 10

    def run(self, question: str) -> str:
        prompt = REACT_PROMPT.format(
            tools_description=self._format_tools(),
            question=question
        )
        for _ in range(self.max_iterations):
            response = self.llm.generate(prompt)
            if "Final Answer:" in response:
                return self._extract_final_answer(response)
            action = self._parse_action(response)
            observation = self._execute_tool(action)
            # Feed the model's own Thought/Action back along with the observation
            prompt += f"\n{response}\nObservation: {observation}\n"
        return "Max iterations reached"
```
### 2.2 Function Calling Pattern
```python
# Define tools as functions with schemas
TOOLS = [
{
"name": "search_web",
"description": "Search the web for current information",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query"
}
},
"required": ["query"]
}
},
{
"name": "calculate",
"description": "Perform mathematical calculations",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "Math expression to evaluate"
}
},
"required": ["expression"]
}
}
]
class FunctionCallingAgent:
    def run(self, question: str) -> str:
        messages = [{"role": "user", "content": question}]

        while True:
            response = self.llm.chat(
                messages=messages,
                tools=TOOLS,
                tool_choice="auto"
            )

            if response.tool_calls:
                for tool_call in response.tool_calls:
                    result = self._execute_tool(
                        tool_call.name,
                        tool_call.arguments
                    )
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": str(result)
                    })
            else:
                return response.content
```
### 2.3 Plan-and-Execute Pattern
```python
class PlanAndExecuteAgent:
    """
    1. Create a plan (list of steps)
    2. Execute each step
    3. Replan if needed
    """
    def run(self, task: str) -> str:
        # Planning phase
        plan = self.planner.create_plan(task)
        # Returns: ["Step 1: ...", "Step 2: ...", ...]

        results = []
        for step in plan:
            # Execute each step
            result = self.executor.execute(step, context=results)
            results.append(result)

            # Check if replan needed
            if self._needs_replan(task, results):
                new_plan = self.planner.replan(
                    task,
                    completed=results,
                    remaining=plan[len(results):]
                )
                plan = new_plan

        # Synthesize final answer
        return self.synthesizer.summarize(task, results)
```
### 2.4 Multi-Agent Collaboration
```python
class AgentTeam:
    """
    Specialized agents collaborating on complex tasks
    """
    def __init__(self):
        self.agents = {
            "researcher": ResearchAgent(),
            "analyst": AnalystAgent(),
            "writer": WriterAgent(),
            "critic": CriticAgent()
        }
        self.coordinator = CoordinatorAgent()

    def solve(self, task: str) -> str:
        # Coordinator assigns subtasks
        assignments = self.coordinator.decompose(task)

        results = {}
        for assignment in assignments:
            agent = self.agents[assignment.agent]
            result = agent.execute(
                assignment.subtask,
                context=results
            )
            results[assignment.id] = result

        # Critic reviews
        critique = self.agents["critic"].review(results)
        if critique.needs_revision:
            # Iterate with feedback
            return self.solve_with_feedback(task, results, critique)

        return self.coordinator.synthesize(results)
```
---
## 3. Prompt IDE Patterns
### 3.1 Prompt Templates with Variables
```python
class PromptTemplate:
    def __init__(self, template: str, variables: list[str]):
        self.template = template
        self.variables = variables

    def format(self, **kwargs) -> str:
        # Validate all variables provided
        missing = set(self.variables) - set(kwargs.keys())
        if missing:
            raise ValueError(f"Missing variables: {missing}")
        return self.template.format(**kwargs)

    def with_examples(self, examples: list[dict]) -> str:
        """Add few-shot examples"""
        example_text = "\n\n".join([
            f"Input: {ex['input']}\nOutput: {ex['output']}"
            for ex in examples
        ])
        return f"{example_text}\n\n{self.template}"

# Usage
summarizer = PromptTemplate(
    template="Summarize the following text in {style} style:\n\n{text}",
    variables=["style", "text"]
)
prompt = summarizer.format(
    style="professional",
    text="Long article content..."
)
```
### 3.2 Prompt Versioning & A/B Testing
```python
from datetime import datetime  # used for created_at timestamps

class PromptRegistry:
    def __init__(self, db):
        self.db = db

    def register(self, name: str, template: str, version: str):
        """Store prompt with version"""
        self.db.save({
            "name": name,
            "template": template,
            "version": version,
            "created_at": datetime.now(),
            "metrics": {}
        })

    def get(self, name: str, version: str = "latest") -> str:
        """Retrieve specific version"""
        return self.db.get(name, version)

    def ab_test(self, name: str, user_id: str) -> str:
        """Return variant based on user bucket"""
        variants = self.db.get_all_versions(name)
        bucket = hash(user_id) % len(variants)
        return variants[bucket]

    def record_outcome(self, prompt_id: str, outcome: dict):
        """Track prompt performance"""
        self.db.update_metrics(prompt_id, outcome)
```
### 3.3 Prompt Chaining
```python
class PromptChain:
    """
    Chain prompts together, passing output as input to next
    """
    def __init__(self, steps: list[dict]):
        self.steps = steps

    def run(self, initial_input: str) -> dict:
        context = {"input": initial_input}
        results = []

        for step in self.steps:
            prompt = step["prompt"].format(**context)
            output = llm.generate(prompt)

            # Parse output if needed
            if step.get("parser"):
                output = step["parser"](output)

            context[step["output_key"]] = output
            results.append({
                "step": step["name"],
                "output": output
            })

        return {
            "final_output": context[self.steps[-1]["output_key"]],
            "intermediate_results": results
        }

# Example: Research → Analyze → Summarize
chain = PromptChain([
    {
        "name": "research",
        "prompt": "Research the topic: {input}",
        "output_key": "research"
    },
    {
        "name": "analyze",
        "prompt": "Analyze these findings:\n{research}",
        "output_key": "analysis"
    },
    {
        "name": "summarize",
        "prompt": "Summarize this analysis in 3 bullet points:\n{analysis}",
        "output_key": "summary"
    }
])
```
---
## 4. LLMOps & Observability
### 4.1 Metrics to Track
```python
LLM_METRICS = {
# Performance
"latency_p50": "50th percentile response time",
"latency_p99": "99th percentile response time",
"tokens_per_second": "Generation speed",
# Quality
"user_satisfaction": "Thumbs up/down ratio",
"task_completion": "% tasks completed successfully",
"hallucination_rate": "% responses with factual errors",
# Cost
"cost_per_request": "Average $ per API call",
"tokens_per_request": "Average tokens used",
"cache_hit_rate": "% requests served from cache",
# Reliability
"error_rate": "% failed requests",
"timeout_rate": "% requests that timed out",
"retry_rate": "% requests needing retry"
}
```
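As an illustration of how a few of these roll up from raw request logs, a sketch in Python; the record fields and the per-1K-token prices are assumptions for the sketch, not quoted provider rates:
```python
import statistics

# Assumed prices per 1K tokens, for illustration only; check your provider's current pricing.
PRICE_PER_1K = {"input": 0.0025, "output": 0.01}

def summarize_requests(records: list[dict]) -> dict:
    """Each record is assumed to look like:
    {"latency_ms": 420, "prompt_tokens": 900, "completion_tokens": 150, "error": False}
    """
    latencies = sorted(r["latency_ms"] for r in records)
    costs = [
        r["prompt_tokens"] / 1000 * PRICE_PER_1K["input"]
        + r["completion_tokens"] / 1000 * PRICE_PER_1K["output"]
        for r in records
    ]
    return {
        "latency_p50": statistics.quantiles(latencies, n=100)[49],
        "latency_p99": statistics.quantiles(latencies, n=100)[98],
        "cost_per_request": sum(costs) / len(costs),
        "error_rate": sum(r["error"] for r in records) / len(records),
    }
```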
### 4.2 Logging & Tracing
```python
import json
import logging
from datetime import datetime

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

class LLMLogger:
    def log_request(self, request_id: str, data: dict):
        """Log LLM request for debugging and analysis"""
        log_entry = {
            "request_id": request_id,
            "timestamp": datetime.now().isoformat(),
            "model": data["model"],
            "prompt": data["prompt"][:500],  # Truncate for storage
            "prompt_tokens": data["prompt_tokens"],
            "temperature": data.get("temperature", 1.0),
            "user_id": data.get("user_id"),
        }
        logging.info(f"LLM_REQUEST: {json.dumps(log_entry)}")

    def log_response(self, request_id: str, data: dict):
        """Log LLM response"""
        log_entry = {
            "request_id": request_id,
            "completion_tokens": data["completion_tokens"],
            "total_tokens": data["total_tokens"],
            "latency_ms": data["latency_ms"],
            "finish_reason": data["finish_reason"],
            "cost_usd": self._calculate_cost(data),
        }
        logging.info(f"LLM_RESPONSE: {json.dumps(log_entry)}")

# Distributed tracing
@tracer.start_as_current_span("llm_call")
def call_llm(prompt: str) -> str:
    span = trace.get_current_span()
    span.set_attribute("prompt.length", len(prompt))

    response = llm.generate(prompt)

    span.set_attribute("response.length", len(response))
    span.set_attribute("tokens.total", response.usage.total_tokens)
    return response.content
```
### 4.3 Evaluation Framework
```python
class LLMEvaluator:
    """
    Evaluate LLM outputs for quality
    """
    def evaluate_response(self,
                          question: str,
                          response: str,
                          ground_truth: str = None) -> dict:
        scores = {}

        # Relevance: Does it answer the question?
        scores["relevance"] = self._score_relevance(question, response)

        # Coherence: Is it well-structured?
        scores["coherence"] = self._score_coherence(response)

        # Groundedness: Is it based on provided context?
        scores["groundedness"] = self._score_groundedness(response)

        # Accuracy: Does it match ground truth?
        if ground_truth:
            scores["accuracy"] = self._score_accuracy(response, ground_truth)

        # Harmfulness: Is it safe?
        scores["safety"] = self._score_safety(response)

        return scores

    def run_benchmark(self, test_cases: list[dict]) -> dict:
        """Run evaluation on test set"""
        results = []
        for case in test_cases:
            response = llm.generate(case["prompt"])
            scores = self.evaluate_response(
                question=case["prompt"],
                response=response,
                ground_truth=case.get("expected")
            )
            results.append(scores)
        return self._aggregate_scores(results)
```
---
## 5. Production Patterns
### 5.1 Caching Strategy
```python
import hashlib
import json

class LLMCache:
    def __init__(self, redis_client, ttl_seconds=3600):
        self.redis = redis_client
        self.ttl = ttl_seconds

    def _cache_key(self, prompt: str, model: str, **kwargs) -> str:
        """Generate deterministic cache key"""
        content = f"{model}:{prompt}:{json.dumps(kwargs, sort_keys=True)}"
        return hashlib.sha256(content.encode()).hexdigest()

    def get_or_generate(self, prompt: str, model: str, **kwargs) -> str:
        key = self._cache_key(prompt, model, **kwargs)

        # Check cache
        cached = self.redis.get(key)
        if cached:
            return cached.decode()

        # Generate
        response = llm.generate(prompt, model=model, **kwargs)

        # Cache (only cache deterministic outputs)
        if kwargs.get("temperature", 1.0) == 0:
            self.redis.setex(key, self.ttl, response)

        return response
```
### 5.2 Rate Limiting & Retry
```python
import time
from tenacity import retry, wait_exponential, stop_after_attempt

class RateLimiter:
    def __init__(self, requests_per_minute: int):
        self.rpm = requests_per_minute
        self.timestamps = []

    def acquire(self):
        """Wait if rate limit would be exceeded"""
        now = time.time()
        # Remove old timestamps
        self.timestamps = [t for t in self.timestamps if now - t < 60]

        if len(self.timestamps) >= self.rpm:
            sleep_time = 60 - (now - self.timestamps[0])
            time.sleep(sleep_time)

        self.timestamps.append(time.time())

# Retry with exponential backoff
@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5)
)
def call_llm_with_retry(prompt: str) -> str:
    try:
        return llm.generate(prompt)
    except RateLimitError:
        raise  # Will trigger retry
    except APIError as e:
        if e.status_code >= 500:
            raise  # Retry server errors
        raise  # Don't retry client errors
```
### 5.3 Fallback Strategy
```python
class LLMWithFallback:
    def __init__(self, primary: str, fallbacks: list[str]):
        self.primary = primary
        self.fallbacks = fallbacks

    def generate(self, prompt: str, **kwargs) -> str:
        models = [self.primary] + self.fallbacks

        for model in models:
            try:
                return llm.generate(prompt, model=model, **kwargs)
            except (RateLimitError, APIError) as e:
                logging.warning(f"Model {model} failed: {e}")
                continue

        raise AllModelsFailedError("All models exhausted")

# Usage
llm_client = LLMWithFallback(
    primary="gpt-4-turbo",
    fallbacks=["gpt-3.5-turbo", "claude-3-sonnet"]
)
```
---
## Architecture Decision Matrix
| Pattern | Use When | Complexity | Cost |
| :------------------- | :--------------- | :--------- | :-------- |
| **Simple RAG** | FAQ, docs search | Low | Low |
| **Hybrid RAG** | Mixed queries | Medium | Medium |
| **ReAct Agent** | Multi-step tasks | Medium | Medium |
| **Function Calling** | Structured tools | Low | Low |
| **Plan-Execute** | Complex tasks | High | High |
| **Multi-Agent** | Research tasks | Very High | Very High |
---
## Resources
- [Dify Platform](https://github.com/langgenius/dify)
- [LangChain Docs](https://python.langchain.com/)
- [LlamaIndex](https://www.llamaindex.ai/)
- [Anthropic Cookbook](https://github.com/anthropics/anthropic-cookbook)

View File

@@ -0,0 +1,66 @@
---
name: prompt-caching
description: "You're a caching specialist who has reduced LLM costs by 90% through strategic caching. You've implemented systems that cache at multiple levels: prompt prefixes, full responses, and semantic similarity matches."
risk: unknown
source: "vibeship-spawner-skills (Apache 2.0)"
date_added: "2026-02-27"
---
# Prompt Caching
You're a caching specialist who has reduced LLM costs by 90% through strategic caching.
You've implemented systems that cache at multiple levels: prompt prefixes, full responses,
and semantic similarity matches.
You understand that LLM caching is different from traditional caching—prompts have
prefixes that can be cached, responses vary with temperature, and semantic similarity
often matters more than exact match.
Your core principles:
1. Cache at the right level—prefix, response, or both
2. K
## Capabilities
- prompt-cache
- response-cache
- kv-cache
- cag-patterns
- cache-invalidation
## Patterns
### Anthropic Prompt Caching
Use Claude's native prompt caching for repeated prefixes
### Response Caching
Cache full LLM responses for identical or similar queries
### Cache Augmented Generation (CAG)
Pre-cache documents in prompt instead of RAG retrieval
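A minimal sketch of the Anthropic prompt-caching pattern above; the model name and document variable are placeholders, and `cache_control` marks the large, stable prefix as cacheable:
```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY in the environment

LONG_REFERENCE_DOC = "..."  # large, stable prefix worth caching across requests

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_REFERENCE_DOC,
            "cache_control": {"type": "ephemeral"},  # cache everything up to this block
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}],
)
```
Only the varying user turn is billed at the full input rate on cache hits, which is why keeping the prefix byte-for-byte stable matters.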
## Anti-Patterns
### ❌ Caching with High Temperature
### ❌ No Cache Invalidation
### ❌ Caching Everything
## ⚠️ Sharp Edges
| Issue | Severity | Solution |
|-------|----------|----------|
| Cache miss causes latency spike with additional overhead | high | Optimize for cache misses, not just hits |
| Cached responses become incorrect over time | high | Implement proper cache invalidation |
| Prompt caching doesn't work due to prefix changes | medium | Structure prompts for optimal caching |
## Related Skills
Works well with: `context-window-management`, `rag-implementation`, `conversation-memory`
## When to Use
Use this skill to execute the workflow or actions described in the overview.

View File

@@ -0,0 +1,196 @@
---
name: rag-implementation
description: "RAG (Retrieval-Augmented Generation) implementation workflow covering embedding selection, vector database setup, chunking strategies, and retrieval optimization."
category: granular-workflow-bundle
risk: safe
source: personal
date_added: "2026-02-27"
---
# RAG Implementation Workflow
## Overview
Specialized workflow for implementing RAG (Retrieval-Augmented Generation) systems including embedding model selection, vector database setup, chunking strategies, retrieval optimization, and evaluation.
## When to Use This Workflow
Use this workflow when:
- Building RAG-powered applications
- Implementing semantic search
- Creating knowledge-grounded AI
- Setting up document Q&A systems
- Optimizing retrieval quality
## Workflow Phases
### Phase 1: Requirements Analysis
#### Skills to Invoke
- `ai-product` - AI product design
- `rag-engineer` - RAG engineering
#### Actions
1. Define use case
2. Identify data sources
3. Set accuracy requirements
4. Determine latency targets
5. Plan evaluation metrics
#### Copy-Paste Prompts
```
Use @ai-product to define RAG application requirements
```
### Phase 2: Embedding Selection
#### Skills to Invoke
- `embedding-strategies` - Embedding selection
- `rag-engineer` - RAG patterns
#### Actions
1. Evaluate embedding models
2. Test domain relevance
3. Measure embedding quality
4. Consider cost/latency
5. Select model
#### Copy-Paste Prompts
```
Use @embedding-strategies to select optimal embedding model
```
### Phase 3: Vector Database Setup
#### Skills to Invoke
- `vector-database-engineer` - Vector DB
- `similarity-search-patterns` - Similarity search
#### Actions
1. Choose vector database
2. Design schema
3. Configure indexes
4. Set up connection
5. Test queries
#### Copy-Paste Prompts
```
Use @vector-database-engineer to set up vector database
```
### Phase 4: Chunking Strategy
#### Skills to Invoke
- `rag-engineer` - Chunking strategies
- `rag-implementation` - RAG implementation
#### Actions
1. Choose chunk size
2. Implement chunking
3. Add overlap handling
4. Create metadata
5. Test retrieval quality
#### Copy-Paste Prompts
```
Use @rag-engineer to implement chunking strategy
```
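For reference, a minimal overlap-aware chunker in Python; the character-based chunk size, the paragraph-boundary heuristic, and the metadata fields are illustrative defaults, not mandated by this workflow:
```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[dict]:
    """Split text into overlapping chunks and attach simple positional metadata."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Prefer to break on a paragraph boundary inside the window if one exists
        window = text[start:end]
        cut = window.rfind("\n\n")
        if cut > chunk_size // 2 and end < len(text):
            end = start + cut
        chunks.append({
            "text": text[start:end],
            "metadata": {"start": start, "end": end, "index": len(chunks)},
        })
        start = max(end - overlap, start + 1)  # overlap keeps context across boundaries
    return chunks
```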
### Phase 5: Retrieval Implementation
#### Skills to Invoke
- `similarity-search-patterns` - Similarity search
- `hybrid-search-implementation` - Hybrid search
#### Actions
1. Implement vector search
2. Add keyword search
3. Configure hybrid search
4. Set up reranking
5. Optimize latency
#### Copy-Paste Prompts
```
Use @similarity-search-patterns to implement retrieval
```
```
Use @hybrid-search-implementation to add hybrid search
```
### Phase 6: LLM Integration
#### Skills to Invoke
- `llm-application-dev-ai-assistant` - LLM integration
- `llm-application-dev-prompt-optimize` - Prompt optimization
#### Actions
1. Select LLM provider
2. Design prompt template
3. Implement context injection
4. Add citation handling
5. Test generation quality
#### Copy-Paste Prompts
```
Use @llm-application-dev-ai-assistant to integrate LLM
```
### Phase 7: Caching
#### Skills to Invoke
- `prompt-caching` - Prompt caching
- `rag-engineer` - RAG optimization
#### Actions
1. Implement response caching
2. Set up embedding cache
3. Configure TTL
4. Add cache invalidation
5. Monitor hit rates
#### Copy-Paste Prompts
```
Use @prompt-caching to implement RAG caching
```
### Phase 8: Evaluation
#### Skills to Invoke
- `llm-evaluation` - LLM evaluation
- `evaluation` - AI evaluation
#### Actions
1. Define evaluation metrics
2. Create test dataset
3. Measure retrieval accuracy
4. Evaluate generation quality
5. Iterate on improvements
#### Copy-Paste Prompts
```
Use @llm-evaluation to evaluate RAG system
```
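One retrieval metric worth automating in this phase is recall@k over a small labelled test set; a sketch, where the test-case shape and the `retrieve` callable are assumptions:
```python
def recall_at_k(test_cases: list[dict], retrieve, k: int = 5) -> float:
    """test_cases: [{"query": "...", "relevant_ids": {"doc-12", "doc-40"}}, ...]
    retrieve(query, k) is assumed to return a ranked list of document ids."""
    hits = 0
    for case in test_cases:
        retrieved = set(retrieve(case["query"], k))
        if retrieved & case["relevant_ids"]:
            hits += 1
    return hits / len(test_cases)
```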
## RAG Architecture
```
User Query -> Embedding -> Vector Search -> Retrieved Docs -> LLM -> Response
                  |              |                 |             |
                Model        Vector DB        Chunk Store   Prompt + Context
```
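Tying the phases together, a compact sketch of the query path; every helper name here is a placeholder for the components chosen in Phases 2 through 6:
```python
def answer(query: str, k: int = 5) -> dict:
    query_vec = embed(query)                        # Phase 2: embedding model
    hits = vector_db.search(query_vec, top_k=k)     # Phases 3/5: vector DB + retrieval
    context = "\n\n".join(h["text"] for h in hits)  # Phase 4: chunks carry the text
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return {
        "answer": llm.generate(prompt),             # Phase 6: LLM integration
        "sources": [h["metadata"] for h in hits],   # citation handling
    }
```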
## Quality Gates
- [ ] Embedding model selected
- [ ] Vector DB configured
- [ ] Chunking implemented
- [ ] Retrieval working
- [ ] LLM integrated
- [ ] Evaluation passing
## Related Workflow Bundles
- `ai-ml` - AI/ML development
- `ai-agent-development` - AI agents
- `database` - Vector databases