Commit Graph

9 Commits

Author SHA1 Message Date
yusyus
73adda0b17 docs: update all chunk flag names to match renamed CLI flags
Replace all occurrences of old ambiguous flag names with the new explicit ones:
  --chunk-size (tokens)  → --chunk-tokens
  --chunk-overlap        → --chunk-overlap-tokens
  --chunk                → --chunk-for-rag
  --streaming-chunk-size → --streaming-chunk-chars
  --streaming-overlap    → --streaming-overlap-chars
  --chunk-size (pages)   → --pdf-pages-per-chunk

Updated: CLI_REFERENCE (EN+ZH), user-guide (EN+ZH), integrations (Haystack,
Chroma, Weaviate, FAISS, Qdrant), features/PDF_CHUNKING, examples/haystack-pipeline,
strategy docs, archive docs, and CHANGELOG.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:15:14 +03:00
yusyus
53d37e61dd docs: Add 4 comprehensive vector database examples (Weaviate, Chroma, FAISS, Qdrant)
Created complete working examples for all 4 vector databases with RAG adaptors:

Weaviate Example:
- Comprehensive README with hybrid search guide
- 3 Python scripts (generate, upload, query)
- Sample outputs and query results
- Covers hybrid search, filtering, schema design

Chroma Example:
- Simple, local-first approach
- In-memory and persistent storage options
- Semantic search and metadata filtering
- Comparison with Weaviate

FAISS Example:
- Facebook AI Similarity Search integration
- OpenAI embeddings generation
- Index building and persistence
- Performance-focused for scale

Qdrant Example:
- Advanced filtering capabilities
- Production-ready features
- Complex query patterns
- Rust-based performance

Each example includes:
- Detailed README with setup and troubleshooting
- requirements.txt with dependencies
- 3 working Python scripts
- Sample outputs directory

Total files: 20 (4 examples × 5 files each)
Documentation: 4 comprehensive READMEs (~800 lines total)

Phase 2 of optional enhancements complete.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-07 22:38:15 +03:00
yusyus
bad84ceac2 feat: Add Cursor React example repo (Task 3.2)
Complete working example demonstrating Cursor + Skill Seekers workflow:

**Main Example (examples/cursor-react-skill/):**
- README.md (400+ lines) - Comprehensive guide with expected outputs
- generate_cursorrules.py - Automation script for complete workflow
- .cursorrules.example - Sample generated rules (React 18+ patterns)
- requirements.txt - Python dependencies

**Example Project (example-project/):**
- package.json - React 18 + TypeScript + Vite
- tsconfig.json - Strict TypeScript configuration
- src/App.tsx - Sample counter component
- src/index.tsx - React entry point
- README.md - Testing instructions

**Workflow Demonstrated:**
1. Scrape React docs → skill-seekers scrape
2. Package for Cursor → skill-seekers package --target claude
3. Extract and copy → unzip + cp to .cursorrules
4. Test in Cursor IDE with AI prompts

**Example Prompts Included:**
- useState hook patterns
- Data fetching with useEffect
- Custom hooks for validation
- TypeScript typing examples

Shows before/after comparison of AI suggestions with and without .cursorrules.

Updates: README.md + INTEGRATIONS.md (added Haystack to supported list)
2026-02-07 21:07:11 +03:00
yusyus
1c888e7817 feat: Add Haystack RAG framework adaptor (Task 2.2)
Implements complete Haystack 2.x integration for RAG pipelines:

**Haystack Adaptor (src/skill_seekers/cli/adaptors/haystack.py):**
- Document format: {content: str, meta: dict}
- JSON packaging for Haystack pipelines
- Compatible with InMemoryDocumentStore, BM25Retriever
- Registered in adaptor factory as 'haystack'

**Example Pipeline (examples/haystack-pipeline/):**
- README.md with comprehensive guide and troubleshooting
- quickstart.py demonstrating BM25 retrieval
- requirements.txt (haystack-ai>=2.0.0)
- Shows document loading, indexing, and querying

**Tests (tests/test_adaptors/test_haystack_adaptor.py):**
- 11 tests covering all adaptor functionality
- Format validation, packaging, upload messages
- Edge cases: empty dirs, references-only skills
- All 93 adaptor tests passing (100% suite pass rate)

**Features:**
- No upload endpoint (local use only like LangChain/LlamaIndex)
- No AI enhancement (enhance before packaging)
- Same packaging pattern as other RAG frameworks
- InMemoryDocumentStore + BM25Retriever example

Test: pytest tests/test_adaptors/test_haystack_adaptor.py -v
2026-02-07 21:01:49 +03:00
yusyus
bdd61687c5 feat: Complete Phase 1 - AI Coding Assistant Integrations (v2.10.0)
Add comprehensive integration guides for 4 AI coding assistants:

## New Integration Guides (98KB total)
- docs/integrations/WINDSURF.md (20KB) - Windsurf IDE with .windsurfrules
- docs/integrations/CLINE.md (25KB) - Cline VS Code extension with MCP
- docs/integrations/CONTINUE_DEV.md (28KB) - Continue.dev for any IDE
- docs/integrations/INTEGRATIONS.md (25KB) - Comprehensive hub with decision tree

## Working Examples (3 directories, 11 files)
- examples/windsurf-fastapi-context/ - FastAPI + Windsurf automation
- examples/cline-django-assistant/ - Django + Cline with MCP server
- examples/continue-dev-universal/ - HTTP context server for all IDEs

## README.md Updates
- Updated tagline: Universal preprocessor for 10+ AI systems
- Expanded Supported Integrations table (7 → 10 platforms)
- Added 'AI Coding Assistant Integrations' section (60+ lines)
- Cross-links to all new guides and examples

## Impact
- Week 2 of ACTION_PLAN.md: 4/4 tasks complete (100%) 
- Total new documentation: ~3,000 lines
- Total new code: ~1,000 lines (automation scripts, servers)
- Integration coverage: LangChain, LlamaIndex, Pinecone, Cursor, Windsurf,
  Cline, Continue.dev, Claude, Gemini, ChatGPT

## Key Features
- All guides follow proven 11-section pattern from CURSOR.md
- Real-world examples with automation scripts
- Multi-IDE consistency (Continue.dev works in VS Code, JetBrains, Vim)
- MCP integration for dynamic documentation access
- Complete troubleshooting sections with solutions

Positions Skill Seekers as universal preprocessor for ANY AI system.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-07 20:46:26 +03:00
yusyus
1552e1212d feat: Week 1 Complete - Universal RAG Preprocessor Foundation
Implements Week 1 of the 4-week strategic plan to position Skill Seekers
as universal infrastructure for AI systems. Adds RAG ecosystem integrations
(LangChain, LlamaIndex, Pinecone, Cursor) with comprehensive documentation.

## Technical Implementation (Tasks #1-2)

### New Platform Adaptors
- Add LangChain adaptor (langchain.py) - exports Document format
- Add LlamaIndex adaptor (llama_index.py) - exports TextNode format
- Implement platform adaptor pattern with clean abstractions
- Preserve all metadata (source, category, file, type)
- Generate stable unique IDs for LlamaIndex nodes

### CLI Integration
- Update main.py with --target argument
- Modify package_skill.py for new targets
- Register adaptors in factory pattern (__init__.py)

## Documentation (Tasks #3-7)

### Integration Guides Created (2,300+ lines)
- docs/integrations/LANGCHAIN.md (400+ lines)
  * Quick start, setup guide, advanced usage
  * Real-world examples, troubleshooting
- docs/integrations/LLAMA_INDEX.md (400+ lines)
  * VectorStoreIndex, query/chat engines
  * Advanced features, best practices
- docs/integrations/PINECONE.md (500+ lines)
  * Production deployment, hybrid search
  * Namespace management, cost optimization
- docs/integrations/CURSOR.md (400+ lines)
  * .cursorrules generation, multi-framework
  * Project-specific patterns
- docs/integrations/RAG_PIPELINES.md (600+ lines)
  * Complete RAG architecture
  * 5 pipeline patterns, 2 deployment examples
  * Performance benchmarks, 3 real-world use cases

### Working Examples (Tasks #3-5)
- examples/langchain-rag-pipeline/
  * Complete QA chain with Chroma vector store
  * Interactive query mode
- examples/llama-index-query-engine/
  * Query engine with chat memory
  * Source attribution
- examples/pinecone-upsert/
  * Batch upsert with progress tracking
  * Semantic search with filters

Each example includes:
- quickstart.py (production-ready code)
- README.md (usage instructions)
- requirements.txt (dependencies)

## Marketing & Positioning (Tasks #8-9)

### Blog Post
- docs/blog/UNIVERSAL_RAG_PREPROCESSOR.md (500+ lines)
  * Problem statement: 70% of RAG time = preprocessing
  * Solution: Skill Seekers as universal preprocessor
  * Architecture diagrams and data flow
  * Real-world impact: 3 case studies with ROI
  * Platform adaptor pattern explanation
  * Time/quality/cost comparisons
  * Getting started paths (quick/custom/full)
  * Integration code examples
  * Vision & roadmap (Weeks 2-4)

### README Updates
- New tagline: "Universal preprocessing layer for AI systems"
- Prominent "Universal RAG Preprocessor" hero section
- Integrations table with links to all guides
- RAG Quick Start (4-step getting started)
- Updated "Why Use This?" - RAG use cases first
- New "RAG Framework Integrations" section
- Version badge updated to v2.9.0-dev

## Key Features

 Platform-agnostic preprocessing
 99% faster than manual preprocessing (days → 15-45 min)
 Rich metadata for better retrieval accuracy
 Smart chunking preserves code blocks
 Multi-source combining (docs + GitHub + PDFs)
 Backward compatible (all existing features work)

## Impact

Before: Claude-only skill generator
After: Universal preprocessing layer for AI systems

Integrations:
- LangChain Documents 
- LlamaIndex TextNodes 
- Pinecone (ready for upsert) 
- Cursor IDE (.cursorrules) 
- Claude AI Skills (existing) 
- Gemini (existing) 
- OpenAI ChatGPT (existing) 

Documentation: 2,300+ lines
Examples: 3 complete projects
Time: 12 hours (50% faster than estimated 24-30h)

## Breaking Changes

None - fully backward compatible

## Testing

All existing tests pass
Ready for Week 2 implementation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 23:32:58 +03:00
MiaoDX
bd974148a2 feat: Update MCP to use server_fastmcp with venv Python support
This PR improves MCP server configuration by updating all documentation
to use the current server_fastmcp module and ensuring setup scripts
automatically use virtual environment Python instead of system Python.

## Changes

### 1. Documentation Updates (server → server_fastmcp)

Updated all references from deprecated `server` module to `server_fastmcp`:

**User-facing documentation:**
- examples/http_transport_examples.sh: All 13 command examples
- README.md: Configuration examples and troubleshooting commands
- docs/guides/MCP_SETUP.md: Enhanced migration guide with stdio/HTTP examples
- docs/guides/TESTING_GUIDE.md: Test import statements
- docs/guides/MULTI_AGENT_SETUP.md: Updated examples
- docs/guides/SETUP_QUICK_REFERENCE.md: Updated paths
- CLAUDE.md: CLI command examples

**MCP module:**
- src/skill_seekers/mcp/README.md: Updated config examples
- src/skill_seekers/mcp/agent_detector.py: Use server_fastmcp module

Note: Historical release notes (CHANGELOG.md) preserved unchanged.

### 2. Venv Python Configuration

**setup_mcp.sh improvements:**
- Added automatic venv detection (checks .venv, venv, and $VIRTUAL_ENV)
- Sets PYTHON_CMD to venv Python path when available
- **CRITICAL FIX**: Now updates PYTHON_CMD after creating/activating venv
- Generates MCP configs with full venv Python path
- Falls back to system python3 if no venv found
- Displays detected Python version and path

**Config examples updated:**
- .claude/mcp_config.example.json: Use venv Python path
- example-mcp-config.json: Use venv Python path
- Added "type": "stdio" for clarity
- Updated to use server_fastmcp module

### 3. Bug Fix: PYTHON_CMD Not Updated After Venv Creation

Previously, when setup_mcp.sh created or activated a venv, it failed to
update PYTHON_CMD, causing generated configs to still use system python3.

**Fixed cases:**
- When $VIRTUAL_ENV is already set → Update PYTHON_CMD to venv Python
- When existing venv is activated → Set PYTHON_CMD="$REPO_PATH/venv/bin/python3"
- When new venv is created → Set PYTHON_CMD="$REPO_PATH/venv/bin/python3"

## Benefits

### For Users:
 No deprecation warnings - All docs show current module
 Proper Python environment - MCP uses venv with all dependencies
 No system Python issues - Avoids "module not found" errors
 No global installation needed - No --break-system-packages required
 Automatic detection - setup_mcp.sh finds venv automatically
 Clean isolation - Projects don't interfere with system Python

### For Maintainers:
 Prepared for v3.0.0 - Documentation ready for server.py removal
 Reduced support burden - Fewer MCP configuration issues
 Consistent examples - All docs use same module/pattern

## Testing

**Verified:**
-  All command examples use server_fastmcp
-  No deprecated module references in user-facing docs (0 results)
-  New module correctly referenced (129 instances)
-  setup_mcp.sh detects venv and generates correct config
-  PYTHON_CMD properly updated after venv creation
-  MCP server starts correctly with venv Python

**Files changed:** 12 files (+262/-107 lines)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 15:55:46 +08:00
Pablo Estevez
5ed767ff9a run ruff 2026-01-17 17:29:21 +00:00
yusyus
9e41094436 feat: v2.4.0 - MCP 2025 upgrade with multi-agent support (#217)
* feat: v2.4.0 - MCP 2025 upgrade with multi-agent support

Major MCP infrastructure upgrade to 2025 specification with HTTP + stdio
transport and automatic configuration for 5+ AI coding agents.

### 🚀 What's New

**MCP 2025 Specification (SDK v1.25.0)**
- FastMCP framework integration (68% code reduction)
- HTTP + stdio dual transport support
- Multi-agent auto-configuration
- 17 MCP tools (up from 9)
- Improved performance and reliability

**Multi-Agent Support**
- Auto-detects 5 AI coding agents (Claude Code, Cursor, Windsurf, VS Code, IntelliJ)
- Generates correct config for each agent (stdio vs HTTP)
- One-command setup via ./setup_mcp.sh
- HTTP server for concurrent multi-client support

**Architecture Improvements**
- Modular tool organization (tools/ package)
- Graceful degradation for testing
- Backward compatibility maintained
- Comprehensive test coverage (606 tests passing)

### 📦 Changed Files

**Core MCP Server:**
- src/skill_seekers/mcp/server_fastmcp.py (NEW - 300 lines, FastMCP-based)
- src/skill_seekers/mcp/server.py (UPDATED - compatibility shim)
- src/skill_seekers/mcp/agent_detector.py (NEW - multi-agent detection)

**Tool Modules:**
- src/skill_seekers/mcp/tools/config_tools.py (NEW)
- src/skill_seekers/mcp/tools/scraping_tools.py (NEW)
- src/skill_seekers/mcp/tools/packaging_tools.py (NEW)
- src/skill_seekers/mcp/tools/splitting_tools.py (NEW)
- src/skill_seekers/mcp/tools/source_tools.py (NEW)

**Version Updates:**
- pyproject.toml: 2.3.0 → 2.4.0
- src/skill_seekers/cli/main.py: version string updated
- src/skill_seekers/mcp/__init__.py: 2.0.0 → 2.4.0

**Documentation:**
- README.md: Added multi-agent support section
- docs/MCP_SETUP.md: Complete rewrite for MCP 2025
- docs/HTTP_TRANSPORT.md (NEW)
- docs/MULTI_AGENT_SETUP.md (NEW)
- CHANGELOG.md: v2.4.0 entry with migration guide

**Tests:**
- tests/test_mcp_fastmcp.py (NEW - 57 tests)
- tests/test_server_fastmcp_http.py (NEW - HTTP transport tests)
- All existing tests updated and passing (606/606)

###  Test Results

**E2E Testing:**
- Fresh venv installation: 
- stdio transport: 
- HTTP transport:  (health check, SSE endpoint)
- Agent detection:  (found Claude Code)
- Full test suite:  606 passed, 152 skipped

**Test Coverage:**
- Core functionality: 100% passing
- Backward compatibility: Verified
- No breaking changes: Confirmed

### 🔄 Migration Path

**Existing Users:**
- Old `python -m skill_seekers.mcp.server` still works
- Existing configs unchanged
- All tools function identically
- Deprecation warnings added (removal in v3.0.0)

**New Users:**
- Use `./setup_mcp.sh` for auto-configuration
- Or manually use `python -m skill_seekers.mcp.server_fastmcp`
- HTTP mode: `--http --port 8000`

### 📊 Metrics

- Lines of code: 2200 → 300 (87% reduction in server.py)
- Tools: 9 → 17 (88% increase)
- Agents supported: 1 → 5 (400% increase)
- Tests: 427 → 606 (42% increase)
- All tests passing: 

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: Add backward compatibility exports to server.py for tests

Re-export tool functions from server.py to maintain backward compatibility
with test_mcp_server.py which imports from the legacy server module.

This fixes CI test failures where tests expected functions like list_tools()
and generate_config_tool() to be importable from skill_seekers.mcp.server.

All tool functions are now re-exported for compatibility while maintaining
the deprecation warning for direct server execution.

* fix: Export run_subprocess_with_streaming and fix tool schemas for backward compatibility

- Add run_subprocess_with_streaming export from scraping_tools
- Fix tool schemas to include properties field (required by tests)
- Resolves 9 failing tests in test_mcp_server.py

* fix: Add call_tool router and fix test patches for modular architecture

- Add call_tool function to server.py for backward compatibility
- Fix test patches to use correct module paths (scraping_tools instead of server)
- Update 7 test decorators to patch the correct function locations
- Resolves remaining CI test failures

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-26 00:45:48 +03:00