| name | description |
|---|---|
| rag-architect | Use when the user asks to design RAG pipelines, optimize retrieval strategies, choose embedding models, implement vector search, or build knowledge retrieval systems. |
RAG Architect - POWERFUL
Overview
The RAG (Retrieval-Augmented Generation) Architect skill provides comprehensive tools and knowledge for designing, implementing, and optimizing production-grade RAG pipelines. This skill covers the entire RAG ecosystem from document chunking strategies to evaluation frameworks, enabling you to build scalable, efficient, and accurate retrieval systems.
Core Competencies
1. Document Processing & Chunking Strategies
Fixed-Size Chunking
- Character-based chunking: Simple splitting by character count (e.g., 512, 1024, 2048 chars)
- Token-based chunking: Splitting by token count to respect model limits
- Overlap strategies: 10-20% overlap to maintain context continuity
- Pros: Predictable chunk sizes, simple implementation, consistent processing time
- Cons: Ignores semantic and structural boundaries, so chunks can split sentences or ideas
- Best for: Uniform documents, when consistent chunk sizes are critical
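A minimal sketch of character-based fixed-size chunking with overlap. The function name `chunk_fixed` and its defaults are illustrative, not part of any library; a token-based variant would slice a list of tokenizer IDs the same way.

```python
def chunk_fixed(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of chunk_size chars, each overlapping the previous by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_fixed(doc, chunk_size=1000, overlap=150)` gives 15% overlap, inside the 10-20% range above.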
Sentence-Based Chunking
- Sentence boundary detection: Using NLTK, spaCy, or regex patterns
- Sentence grouping: Combining consecutive sentences until a size threshold is reached
- Paragraph preservation: Avoiding mid-paragraph splits when possible
- Pros: Preserves natural language boundaries, better readability
- Cons: Variable chunk sizes, potential for very short/long chunks
- Best for: Narrative text, articles, books
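A rough sketch of sentence grouping, assuming a naive regex boundary detector in place of NLTK or spaCy (both handle abbreviations such as "e.g." far better). `chunk_sentences` is a hypothetical helper.

```python
import re

def chunk_sentences(text: str, max_chars: int = 1000) -> list[str]:
    # Naive boundary detection: split after ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)   # threshold reached: close the current chunk
            current = sentence
        else:
            current = candidate      # keep grouping sentences
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than `max_chars` still becomes its own oversized chunk; a production version would hand it to a fixed-size or recursive splitter.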
Paragraph-Based Chunking
- Paragraph detection: Double newlines, HTML tags, markdown formatting
- Hierarchical splitting: Respecting document structure (sections, subsections)
- Size balancing: Merging small paragraphs, splitting large ones
- Pros: Preserves logical document structure, maintains topic coherence
- Cons: Highly variable sizes, may create very large chunks
- Best for: Structured documents, technical documentation
Semantic Chunking
- Topic modeling: Using TF-IDF or embedding similarity to detect topics
- Heading-aware splitting: Respecting document hierarchy (H1, H2, H3)
- Content-based boundaries: Detecting topic shifts using semantic similarity
- Pros: Maintains semantic coherence, respects document structure
- Cons: Complex implementation, computationally expensive
- Best for: Long-form content, technical manuals, research papers
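One way to sketch content-based boundary detection: compare each sentence with the next and start a new chunk when similarity drops below a threshold. The bag-of-words `bow_embed` is a deliberately crude, hypothetical stand-in for a real embedding model, and the 0.2 threshold is arbitrary; with real embeddings the threshold has to be tuned per corpus.

```python
import math
import re
from collections import Counter
from typing import Callable

Vector = dict[str, float]

def bow_embed(sentence: str) -> Vector:
    """Hypothetical stand-in for an embedding model: bag-of-words term counts."""
    return {t: float(n) for t, n in Counter(re.findall(r"\w+", sentence.lower())).items()}

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk_semantic(sentences: list[str], threshold: float = 0.2,
                   embed: Callable[[str], Vector] = bow_embed) -> list[list[str]]:
    """Start a new chunk wherever adjacent sentences fall below `threshold` similarity."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            chunks.append([sentence])   # similarity drop: treat as a topic shift
        else:
            chunks[-1].append(sentence)
    return chunks
```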
Recursive Chunking
- Hierarchical approach: Try larger chunks first, recursively split if needed
- Multi-level splitting: Different strategies at different levels
- Size optimization: Minimize number of chunks while respecting size limits
- Pros: Chunks stay close to the size limit, and context is preserved when possible
- Cons: Complex logic, potential performance overhead
- Best for: Mixed content types, when chunk count optimization is important
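A sketch of recursive splitting, similar in spirit to LangChain's `RecursiveCharacterTextSplitter` but not its actual implementation: keep text whole if it fits, otherwise split on the coarsest separator (paragraphs, then lines, then sentences, then words), greedily re-merge small pieces, and recurse into pieces that are still too large. The separator list and `chunk_recursive` are assumptions for illustration.

```python
def chunk_recursive(text: str, max_chars: int = 1000,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Keep text whole if it fits; otherwise split on the coarsest separator and recurse."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard character split
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate   # greedily merge small pieces up to the limit
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            chunks.extend(chunk_recursive(piece, max_chars, rest))  # too big: use a finer separator
            current = ""
    if current:
        chunks.append(current)
    return chunks
```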
Document-Aware Chunking
- File type detection: PDF pages, Word sections, HTML elements
- Metadata preservation: Headers, footers, page numbers, sections
- Table and image handling: Special processing for non-text elements
- Pros: Preserves document structure and metadata
- Cons: Format-specific implementation required
- Best for: Multi-format document collections, when metadata is important
2. Embedding Model Selection
Dimension Considerations
- 128-256 dimensions: Fast retrieval, lower memory usage, suitable for simple domains
- 512-768 dimensions: Balanced performance, good for most applications
- 1024-1536 dimensions: High quality, better for complex domains, higher cost
- 2048+ dimensions: Maximum quality, specialized use cases, significant resources
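A quick worked estimate of what dimension choice means for memory, assuming uncompressed float32 vectors (4 bytes per value) and ignoring index structures such as HNSW graph links and metadata, which add more on top:

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors (float32 = 4 bytes); excludes index and metadata overhead."""
    return num_vectors * dims * bytes_per_value

for dims in (384, 768, 1536):
    gb = raw_vector_bytes(1_000_000, dims) / 1e9
    print(f"{dims:>5} dims x 1M vectors: {gb:.2f} GB")
```

This prints 1.54 GB, 3.07 GB and 6.14 GB respectively, so moving from 384 to 1536 dimensions quadruples raw storage.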
Speed vs Quality Tradeoffs
- Fast models: sentence-transformers/all-MiniLM-L6-v2 (384 dim, ~14k sentences/sec)
- Balanced models: sentence-transformers/all-mpnet-base-v2 (768 dim, ~2.8k sentences/sec)
- Quality models: text-embedding-ada-002 (1536 dim, OpenAI API)
- Specialized models: Domain-specific fine-tuned models
Model Categories
- General purpose: all-MiniLM, all-mpnet, Universal Sentence Encoder
- Code embeddings: CodeBERT, GraphCodeBERT, CodeT5
- Scientific text: SciBERT, BioBERT, ClinicalBERT
- Multilingual: LaBSE, multilingual-e5, paraphrase-multilingual
3. Vector Database Selection
Pinecone
- Managed service: Fully hosted, auto-scaling
- Features: Metadata filtering, hybrid search, real-time updates
- Pricing: About $70/month for 1M vectors (1536 dim) at the time of writing, with pay-per-use scaling
- Best for: Production applications, when managed service is preferred
- Cons: Vendor lock-in, costs can scale quickly
Weaviate
- Open source: Self-hosted or cloud options available
- Features: GraphQL API, multi-modal search, automatic vectorization
- Scaling: Horizontal scaling, HNSW indexing
- Best for: Complex data types, when GraphQL API is preferred
- Cons: Learning curve, requires infrastructure management
Qdrant
- Rust-based: High performance, low memory footprint
- Features: Payload filtering, clustering, distributed deployment
- API: REST and gRPC interfaces
- Best for: High-performance requirements, resource-constrained environments
- Cons: Smaller community, fewer integrations
Chroma
- Embedded database: SQLite-based, easy local development
- Features: Collections, metadata filtering, persistence
- Scaling: Limited, suitable for prototyping and small deployments
- Best for: Development, testing, small-scale applications
- Cons: Not suitable for production scale
pgvector (PostgreSQL)
- SQL integration: Leverage existing PostgreSQL infrastructure
- Features: ACID compliance, joins with relational data, mature ecosystem
- Performance: ivfflat and HNSW indexing, parallel query processing
- Best for: When you already use PostgreSQL, need ACID compliance
- Cons: Requires PostgreSQL expertise, less specialized than purpose-built DBs
4. Retrieval Strategies
Dense Retrieval
- Semantic similarity: Ranking documents by cosine similarity between query and document embeddings
- Advantages: Captures semantic meaning, handles paraphrasing well
- Limitations: May miss exact keyword matches, requires good embeddings
- Implementation: Vector similarity search with k-NN or ANN algorithms
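For intuition, exact dense retrieval is cosine similarity against every stored vector. The sketch below, with a hypothetical `dense_top_k` helper and plain Python lists in place of a real vector store, is the brute-force baseline that ANN indexes (HNSW, IVF) approximate at scale:

```python
import heapq
import math

def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def dense_top_k(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact (brute-force) k-NN by cosine similarity: dot product of unit-length vectors."""
    q = normalize(query)
    scores = ((doc_id, sum(a * b for a, b in zip(q, normalize(vec)))) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scores, key=lambda item: item[1])
```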
Sparse Retrieval
- Keyword-based: TF-IDF or BM25 scoring, e.g., via Elasticsearch
- Advantages: Exact keyword matching, interpretable results
- Limitations: Misses semantic similarity, vulnerable to vocabulary mismatch
- Implementation: Inverted indexes, term frequency analysis
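A compact Okapi BM25 scorer over pre-tokenized documents, as a sketch of the scoring a search engine computes from its inverted index. The defaults `k1=1.2`, `b=0.75` match Elasticsearch's documented BM25 defaults, and the IDF uses the Lucene-style `log(1 + ...)` form so it never goes negative; the class itself is illustrative.

```python
import math
from collections import Counter

class BM25:
    """Minimal Okapi BM25 over pre-tokenized documents."""

    def __init__(self, corpus: list[list[str]], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(doc) for doc in corpus]     # term frequencies per document
        self.doc_lens = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_lens) / len(corpus)
        df = Counter(term for doc in corpus for term in set(doc))
        n = len(corpus)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: list[str], idx: int) -> float:
        tf, dl = self.docs[idx], self.doc_lens[idx]
        total = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f:
                # Saturating term frequency, normalized by document length
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: list[str]) -> list[int]:
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
```

Tokenization (lowercasing, stemming) matters as much as the formula: in this sketch `"cats"` does not match `"cat"`.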
Hybrid Retrieval
- Combination approach: Dense + sparse retrieval with score fusion
- Fusion strategies: Reciprocal Rank Fusion (RRF), weighted combination
- Benefits: Combines semantic understanding with exact matching
- Complexity: Requires tuning fusion weights, more complex infrastructure
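Reciprocal Rank Fusion fits in a few lines. Each document scores the sum of 1 / (k + rank) over the ranked lists it appears in; `k = 60` is the constant commonly used since the original RRF paper, and the document IDs below are made up:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense = ["a", "b", "c"]    # e.g. vector search results
sparse = ["c", "a", "d"]   # e.g. BM25 results
fused = reciprocal_rank_fusion([dense, sparse])  # order: a, c, b, d
```

Because RRF uses only ranks, BM25 scores and cosine similarities never need to be normalized onto a common scale, which avoids most of the weight tuning a weighted combination requires.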
Reranking
- Two-stage approach: Initial retrieval followed by reranking
- Reranking models: Cross-encoders, specialized reranking transformers
- Benefits: Higher precision, can use more sophisticated models for final ranking
- Tradeoff: Additional latency, computational cost
5. Query Transformation Techniques
HyDE (Hypothetical Document Embeddings)
- Approach: Generate a hypothetical answer and embed it instead of the query
- Benefits: Improves retrieval by matching document style rather than query style
- Implementation: Use an LLM to generate a hypothetical document, then embed that document
- Use cases: When queries and documents have different styles
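A minimal HyDE sketch with the model and index abstracted away. `generate`, `embed` and `search` are hypothetical callables standing in for an LLM client, an embedding model and a vector-store query; the prompt wording is also an assumption.

```python
from typing import Callable, Sequence

def hyde_retrieve(query: str,
                  generate: Callable[[str], str],
                  embed: Callable[[str], Sequence[float]],
                  search: Callable[[Sequence[float], int], list[str]],
                  k: int = 5) -> list[str]:
    """HyDE: search with the embedding of an LLM-written hypothetical answer, not the raw query."""
    prompt = f"Write a short passage that answers the question:\n{query}"
    hypothetical_doc = generate(prompt)   # may contain errors; used only as a search key
    return search(embed(hypothetical_doc), k)
```

The hypothetical passage is never shown to the user, so factual errors in it only matter insofar as they pull retrieval toward the wrong documents.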
Multi-Query Generation
- Approach: Generate multiple query variations, retrieve for each, merge results
- Benefits: Increases recall, handles query ambiguity
- Implementation: An LLM generates 3-5 query variations; results are merged and deduplicated
- Considerations: Higher cost and latency due to multiple retrievals
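Merging the per-variant results can be as simple as first-seen deduplication. In this sketch, `rewrite` (the LLM that produces variations) and `retrieve` are hypothetical callables:

```python
from typing import Callable

def multi_query_retrieve(query: str,
                         rewrite: Callable[[str, int], list[str]],
                         retrieve: Callable[[str], list[str]],
                         n_variants: int = 3) -> list[str]:
    """Retrieve for the original query plus rewrites; merge results, keeping first occurrences."""
    queries = [query] + rewrite(query, n_variants)
    seen: set[str] = set()
    merged: list[str] = []
    for q in queries:
        for doc_id in retrieve(q):
            if doc_id not in seen:   # deduplicate across variants
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```

Reciprocal Rank Fusion is a common alternative to first-seen ordering when documents retrieved by several variants should rank higher.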
Step-Back Prompting
- Approach: Generate a broader, more general version of a specific query
- Benefits: Retrieves more general context that helps answer specific questions
- Implementation: Transform "What is the capital of France?" to "What are European capitals?"
- Use cases: When specific questions need general context
6. Context Window Optimization
Dynamic Context Assembly
- Relevance-based ordering: Most relevant chunks first
- Diversity optimization: Avoid redundant information
- Token budget management: Fit within model context limits
- Hierarchical inclusion: Include summaries before detailed chunks
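A greedy sketch of token-budget packing: take chunks in relevance order, skip exact duplicates, and skip any chunk that no longer fits. Counting whitespace-separated words is an assumption standing in for the target model's tokenizer, and `Chunk`/`assemble_context` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Chunk:
    text: str
    score: float  # relevance score from the retriever or reranker

def assemble_context(chunks: list[Chunk], token_budget: int,
                     count_tokens: Callable[[str], int] = lambda s: len(s.split())) -> list[Chunk]:
    """Greedily pack the most relevant, non-duplicate chunks into a token budget."""
    selected, used, seen = [], 0, set()
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        key = chunk.text.strip().lower()
        if key in seen:
            continue  # exact duplicate: a crude diversity filter
        cost = count_tokens(chunk.text)
        if used + cost > token_budget:
            continue  # does not fit; a smaller, less relevant chunk still might
        selected.append(chunk)
        used += cost
        seen.add(key)
    return selected
```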
Context Compression
- Summarization: Compress less relevant chunks while preserving key information
- Key information extraction: Extract only relevant facts/entities
- Template-based compression: Use structured formats to reduce token usage
- Selective inclusion: Include only chunks above relevance threshold
7. Evaluation Frameworks
Faithfulness Metrics
- Definition: How well generated answers are grounded in retrieved context
- Measurement: Fact verification against source documents
- Implementation: NLI models to check entailment between answer and context
- Threshold: >90% of answer claims supported by the retrieved context for production systems
Relevance Metrics
- Context relevance: How relevant retrieved chunks are to the query
- Answer relevance: How well the answer addresses the original question
- Measurement: Embedding similarity, human evaluation, LLM-as-judge
- Targets: Context relevance >0.8, Answer relevance >0.85
Context Precision & Recall
- Precision@K: Percentage of top-K results that are relevant
- Recall@K: Percentage of relevant documents found in top-K results
- Mean Reciprocal Rank (MRR): Mean, over all queries, of 1/rank of the first relevant result
- NDCG@K: Normalized Discounted Cumulative Gain at K
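Reference implementations of these ranking metrics for a single query with binary relevance labels (a sketch; evaluation libraries add graded relevance and averaging over query sets):

```python
import math

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    return sum(d in relevant for d in retrieved[:k]) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    return sum(d in relevant for d in retrieved[:k]) / len(relevant) if relevant else 0.0

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0  # MRR is the mean of this value over all queries

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG = sum of 1 / log2(i + 1) over relevant positions i = 1..k."""
    dcg = sum(1.0 / math.log2(i + 1) for i, d in enumerate(retrieved[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

For `retrieved = ["d3", "d1", "d7", "d2"]` and `relevant = {"d1", "d2"}`, Precision@2 is 0.5, Recall@4 is 1.0 and the reciprocal rank is 0.5.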
End-to-End Metrics
- RAGAS: Comprehensive RAG evaluation framework
- Correctness: Factual accuracy of generated answers
- Completeness: Coverage of all relevant aspects
- Consistency: Stability of answers across repeated runs of the same query
8. Production Patterns
Caching Strategies
- Query-level caching: Cache results for identical queries
- Semantic caching: Cache for semantically similar queries
- Chunk-level caching: Cache embedding computations
- Multi-level caching: Redis for hot queries, disk for warm queries
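A minimal semantic cache sketch: find the most similar previously seen query and reuse its answer above a similarity threshold. The linear scan, the 0.95 threshold and the injected `embed` function are assumptions; a real deployment would keep entries in a vector index or in Redis with TTLs.

```python
import math
from typing import Callable, Optional

class SemanticCache:
    """Return a cached answer when a new query's embedding is close enough to a stored one."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.95):
        self.embed, self.threshold = embed, threshold
        self.entries: list[tuple[list[float], str]] = []

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, query: str) -> Optional[str]:
        q = self.embed(query)
        best = max(self.entries, key=lambda e: self._cosine(q, e[0]), default=None)
        if best is not None and self._cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: run the full retrieval pipeline

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))
```

The threshold trades hit rate against the risk of answering a subtly different question, so it is worth validating on logged query pairs.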
Streaming Retrieval
- Progressive loading: Stream results as they become available
- Incremental generation: Generate answers while still retrieving
- Real-time updates: Handle document updates without full reprocessing
- Connection management: Handle client disconnections gracefully
Fallback Mechanisms
- Graceful degradation: Fallback to simpler retrieval if primary fails
- Cache fallbacks: Serve stale results when retrieval is unavailable
- Alternative sources: Multiple vector databases for redundancy
- Error handling: Comprehensive error recovery and user communication
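A sketch of graceful degradation: try retrievers in priority order (for example a vector store, then a keyword index) and fall back to the last good results cached for the query. The retriever names and the in-memory `stale_cache` dict are hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

def retrieve_with_fallback(query: str,
                           retrievers: list[tuple[str, Callable[[str], list[str]]]],
                           stale_cache: dict[str, list[str]]) -> list[str]:
    """Try each retriever in order; if all fail, serve stale cached results (possibly empty)."""
    for name, retriever in retrievers:
        try:
            results = retriever(query)
        except Exception as exc:  # e.g. a timeout or connection error from the backend
            logger.warning("retriever %s failed: %s", name, exc)
            continue
        stale_cache[query] = results  # refresh the fallback copy on success
        return results
    return stale_cache.get(query, [])
```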
9. Cost Optimization
Embedding Cost Management
- Batch processing: Batch documents for embedding to reduce API costs
- Caching strategies: Cache embeddings to avoid recomputation
- Model selection: Balance cost vs quality for embedding models
- Update optimization: Only re-embed changed documents
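Change detection and batching can be sketched with content hashes: only documents whose SHA-256 digest changed are sent for embedding, in batches. `embed_batch` is a hypothetical callable wrapping whichever embedding API is used, and deleted documents are not handled here.

```python
import hashlib
from typing import Callable

def sync_embeddings(docs: dict[str, str],
                    stored_hashes: dict[str, str],
                    embed_batch: Callable[[list[str]], list[list[float]]],
                    batch_size: int = 64) -> dict[str, list[float]]:
    """Embed only new or changed documents, in batches; returns {doc_id: vector} for those docs."""
    changed = []
    for doc_id, text in docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if stored_hashes.get(doc_id) != digest:
            changed.append((doc_id, text, digest))
    updated: dict[str, list[float]] = {}
    for start in range(0, len(changed), batch_size):
        batch = changed[start:start + batch_size]
        vectors = embed_batch([text for _, text, _ in batch])  # one API call per batch
        for (doc_id, _, digest), vec in zip(batch, vectors):
            updated[doc_id] = vec
            stored_hashes[doc_id] = digest  # record the hash only after a successful embed
    return updated
```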
Vector Database Optimization
- Index optimization: Choose index types appropriate to the use case
- Compression: Use quantization to reduce storage costs
- Tiered storage: Hot/warm/cold data strategies
- Resource scaling: Auto-scaling based on query patterns
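As an illustration of quantization, a symmetric int8 scalar scheme stores one float scale per vector and one byte per dimension, roughly a 4x reduction over float32. This is a from-scratch sketch of the idea; vector databases that offer quantization implement their own schemes.

```python
def quantize_int8(vec: list[float]) -> tuple[list[int], float]:
    """Symmetric scalar quantization: map floats to integers in [-127, 127] with one scale per vector."""
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(x / scale) for x in vec], scale

def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    return [c * scale for c in codes]
```

Reconstruction error per component is at most about half the scale, and rescoring the top candidates with full-precision vectors is a common way to recover accuracy.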
Query Optimization
- Query routing: Route simple queries to cheaper methods
- Result caching: Avoid repeated expensive retrievals
- Batch querying: Process multiple queries together when possible
- Smart filtering: Use metadata filters to reduce search space
10. Guardrails & Safety
Content Filtering
- Toxicity detection: Filter harmful or inappropriate content
- PII detection: Identify and handle personally identifiable information
- Content validation: Ensure retrieved content meets quality standards
- Source verification: Validate document authenticity and reliability
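A regex-based PII redaction sketch for retrieved or generated text. The patterns are illustrative assumptions (US-style SSNs, loose phone and email shapes) and will both miss and over-match real data; dedicated PII tools or NER models are the usual production choice.

```python
import re

# Illustrative patterns only; applied in insertion order
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # checked before PHONE, which would also match it
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact_pii(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```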
Query Safety
- Injection prevention: Detect and block malicious instructions injected through user queries (prompt injection)
- Rate limiting: Prevent abuse and ensure fair usage
- Query validation: Sanitize and validate user inputs
- Access controls: Ensure users can only access authorized content
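Rate limiting is often implemented as a token bucket. A minimal sketch, with an injectable clock so it can be tested deterministically; keeping one bucket per user or API key is left to the caller.

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill based on elapsed time, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```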
Response Safety
- Hallucination detection: Identify when model generates unsupported claims
- Confidence scoring: Provide confidence levels for generated responses
- Source attribution: Always provide sources for factual claims
- Uncertainty handling: Gracefully handle cases where answer is uncertain
Implementation Best Practices
Development Workflow
- Requirements gathering: Understand use case, scale, and quality requirements
- Data analysis: Analyze document corpus characteristics
- Prototype development: Build a minimal viable RAG pipeline
- Chunking optimization: Test different chunking strategies
- Retrieval tuning: Optimize retrieval parameters and thresholds
- Evaluation setup: Implement comprehensive evaluation metrics
- Production deployment: Scale-ready implementation with monitoring
Monitoring & Observability
- Query analytics: Track query patterns and performance
- Retrieval metrics: Monitor precision, recall, and latency
- Generation quality: Track faithfulness and relevance scores
- System health: Monitor database performance and availability
- Cost tracking: Monitor embedding and vector database costs
Maintenance & Updates
- Document refresh: Handle new documents and updates
- Index maintenance: Regular vector database optimization
- Model updates: Evaluate and migrate to improved models
- Performance tuning: Continuous optimization based on usage patterns
- Security updates: Regular security assessments and updates
Common Pitfalls & Solutions
Poor Chunking Strategy
- Problem: Chunks break mid-sentence or lose context
- Solution: Use boundary-aware chunking with overlap
Low Retrieval Precision
- Problem: Retrieved chunks are not relevant to query
- Solution: Improve embedding model, add reranking, tune similarity threshold
High Latency
- Problem: Slow retrieval and generation
- Solution: Optimize vector indexing, implement caching, use faster embedding models
Inconsistent Quality
- Problem: Variable answer quality across different queries
- Solution: Implement comprehensive evaluation, add quality scoring, improve fallbacks
Scalability Issues
- Problem: System doesn't scale with increased load
- Solution: Implement proper caching, database sharding, and auto-scaling
Conclusion
Building effective RAG systems requires careful consideration of each component in the pipeline. The key to success is understanding the tradeoffs between different approaches and choosing the right combination of techniques for your specific use case. Start with simple approaches and gradually add sophistication based on evaluation results and production requirements.
This skill provides the foundation for making informed decisions throughout the RAG development lifecycle, from initial design to production deployment and ongoing maintenance.