Commit Graph

3 Commits

Author SHA1 Message Date
yusyus
459c6cfd5b fix: Add YAML frontmatter to unified, GitHub, and PDF skill builders
**Problem:** (PR #170 verified)
Three skill builders were generating SKILL.md files without YAML
frontmatter, making skills invisible to Claude after upload:
- unified_skill_builder.py
- github_scraper.py
- pdf_scraper.py

Only doc_scraper.py had frontmatter implemented.

**Root Cause:**
Claude requires YAML frontmatter with 'name' and 'description' fields
to recognize and index skills. Without it, uploaded skills don't appear
in skill lists and can't be triggered.

**Fix:**
Added consistent frontmatter generation to all three builders:
- Normalizes skill name (lowercase, hyphens, max 64 chars)
- Truncates description to 1024 chars (Claude requirement)
- Generates YAML frontmatter with proper formatting

**Test Results:**
 All 390/390 tests passing (0 failures, 0 skipped)
 Consistent implementation across all builders
 Meets Claude's official skill specification

**Example Output:**
```yaml
---
name: my-skill-name
description: Skill description here
---

# My Skill Name
...
```

**Credits:**
Original fix by @AbdelrahmanHafez in PR #170
Rebased to current development by Claude Code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: AbdelrahmanHafez <AbdelrahmanHafez@users.noreply.github.com>
2025-11-06 23:56:31 +03:00
yusyus
7cc3d8b175 Fix all tests: 297/297 passing, 0 skipped, 0 failed
CHANGES:

1. **Fixed 9 PDF Scraper Test Failures:**
   - Added .get() safety for missing page keys (headings, text, code_blocks, images)
   - Supported both 'code_samples' and 'code_blocks' keys for compatibility
   - Fixed extract_pdf() to raise RuntimeError on failure (tests expect exception)
   - Added image saving functionality to _generate_reference_file()
   - Updated all test methods to override skill_dir with temp directory
   - Fixed categorization to handle pre-categorized test data

2. **Fixed 25 MCP Test Skips:**
   - Renamed mcp/ directory to skill_seeker_mcp/ to avoid shadowing external mcp package
   - Updated all imports in tests/test_mcp_server.py
   - Simplified skill_seeker_mcp/server.py import logic (no more shadowing workarounds)
   - Updated tests/test_package_structure.py to reference skill_seeker_mcp

3. **Test Results:**
   -  297 tests passing (100%)
   -  0 tests skipped
   -  0 tests failed
   - All test categories passing:
     * 23 package structure tests
     * 18 PDF scraper tests
     * 67 PDF extractor/advanced tests
     * 25 MCP server tests
     * 164 other core tests

BREAKING CHANGE: MCP server directory renamed from `mcp/` to `skill_seeker_mcp/`

📦 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 00:51:18 +03:00
yusyus
6936057820 Add PDF documentation support (Tasks B1.1-B1.8)
Complete PDF extraction and skill conversion functionality:
- pdf_extractor_poc.py (1,004 lines): Extract text, code, images from PDFs
- pdf_scraper.py (353 lines): Convert PDFs to Claude skills
- MCP tool scrape_pdf: PDF scraping via Claude Code
- 7 comprehensive documentation guides (4,705 lines)
- Example PDF config format (configs/example_pdf.json)

Features:
- 3 code detection methods (font, indent, pattern)
- 19+ programming languages detected with confidence scoring
- Syntax validation and quality scoring (0-10 scale)
- Image extraction with size filtering (--extract-images)
- Chapter/section detection and page chunking
- Quality-filtered code examples (--min-quality)
- Three usage modes: config file, direct PDF, from extracted JSON

Technical:
- PyMuPDF (fitz) as primary library (60x faster than alternatives)
- Language detection with confidence scoring
- Code block merging across pages
- Comprehensive metadata and statistics
- Compatible with existing Skill Seeker workflow

MCP Integration:
- New scrape_pdf tool (10th MCP tool total)
- Supports all three usage modes
- 10-minute timeout for large PDFs
- Real-time streaming output

Documentation (4,705 lines):
- B1_COMPLETE_SUMMARY.md: Overview of all 8 tasks
- PDF_PARSING_RESEARCH.md: Library comparison and benchmarks
- PDF_EXTRACTOR_POC.md: POC documentation
- PDF_CHUNKING.md: Page chunking guide
- PDF_SYNTAX_DETECTION.md: Syntax detection guide
- PDF_IMAGE_EXTRACTION.md: Image extraction guide
- PDF_SCRAPER.md: PDF scraper usage guide
- PDF_MCP_TOOL.md: MCP integration guide

Tasks completed: B1.1-B1.8
Addresses Issue #27
See docs/B1_COMPLETE_SUMMARY.md for complete details

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 00:23:16 +03:00