Commit Graph

85 Commits

Author SHA1 Message Date
Edgar I.
4e871588ae feat: add get_proper_filename() for .txt to .md conversion 2025-10-24 18:27:17 +04:00
Edgar I.
e123de9055 feat: add detect_all() for multi-variant detection 2025-10-24 18:27:17 +04:00
Edgar I.
38ebc66749 docs: add Phase 1 implementation plan for active skills 2025-10-24 18:27:17 +04:00
Edgar I.
38aa2cecec docs: add active skills design for demand-driven documentation 2025-10-24 18:27:17 +04:00
Edgar I.
812c0992b3 docs: add comprehensive llms.txt feature documentation 2025-10-24 18:27:17 +04:00
Edgar I.
697b42e9eb docs: update MCP tool description for llms.txt 2025-10-24 18:27:17 +04:00
Edgar I.
41d1846278 test: add e2e test for llms.txt workflow 2025-10-24 18:27:17 +04:00
Edgar I.
104818f983 feat: enable llms.txt for hono config 2025-10-24 18:27:17 +04:00
Edgar I.
99a40d3a1b feat: support explicit llms_txt_url in config 2025-10-24 18:27:17 +04:00
Edgar I.
0b6c2ed593 docs: add llms.txt support documentation 2025-10-24 18:27:17 +04:00
Edgar I.
12424e390c feat: integrate llms.txt detection into scraping workflow 2025-10-24 18:26:10 +04:00
Edgar I.
e88a4b0fcc fix: add retries, markdown validation, and test mocking to downloader
- Implement retry logic with exponential backoff (default: 3 retries)
- Add markdown validation to check for markdown patterns
- Replace flaky HTTP tests with comprehensive mocking
- Add 10 test cases covering all scenarios:
  - Successful download
  - Timeout with retry
  - Empty content rejection (<100 chars)
  - Non-markdown rejection
  - HTTP error handling
  - Exponential backoff validation
  - Markdown pattern detection
  - Custom timeout parameter
  - Custom max_retries parameter
  - User agent header verification

All tests now pass reliably (10/10) without making real HTTP requests.
2025-10-24 18:26:10 +04:00
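The retry and validation behaviour described in the commit above can be sketched as follows. This is a minimal illustration, not the project's downloader: the function names, the injectable `fetch` and `sleep` callables, the 1-second base delay, and the three regex patterns are all assumptions. Only the default of 3 retries and the 100-character minimum come from the commit message.

```python
import re
import time
from typing import Callable, Optional

# Hypothetical heuristics: a heading, an inline link, or a fenced code block.
MARKDOWN_PATTERNS = [r"^#{1,6} ", r"\[[^\]]+\]\([^)]+\)", r"^```"]


def looks_like_markdown(text: str, min_chars: int = 100) -> bool:
    """Reject short content and content with no markdown-like pattern."""
    if len(text) < min_chars:
        return False
    return any(re.search(p, text, re.MULTILINE) for p in MARKDOWN_PATTERNS)


def download_with_retries(
    fetch: Callable[[], str],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() up to max_retries times, doubling the delay after each failure."""
    for attempt in range(max_retries):
        try:
            text = fetch()
        except OSError:
            # Back off before the next attempt: base_delay * 1, 2, 4, ...
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)
            continue
        # A successful fetch is still rejected if it does not look like markdown.
        return text if looks_like_markdown(text) else None
    return None
```

Passing `fetch` and `sleep` in as parameters lets tests simulate timeouts and record backoff delays without making real HTTP requests or actually waiting.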
Edgar I.
3dd928b34b feat: add llms.txt downloader with error handling 2025-10-24 18:26:10 +04:00
Edgar I.
a18ea8cf68 feat: add llms.txt markdown parser 2025-10-24 18:26:10 +04:00
Edgar I.
60fefb6c0b fix: improve URL parsing and add test mocking for llms.txt detector 2025-10-24 18:26:10 +04:00
Edgar I.
8f44193b61 feat: add llms.txt detection module 2025-10-24 18:26:10 +04:00
yusyus
691318117c Reorganize Key Features section with clear categories 2025-10-23 22:02:39 +03:00
yusyus
d309e1cfe7 Fix formatting in Key Features section
Add blank line after PDF Documentation Support section for better readability

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 21:57:56 +03:00
yusyus
a612096fd3 Merge development into main (v1.2.0 release)
Release v1.2.0 - PDF Advanced Features

This release includes:
- v1.1.0: Documentation Scraping Enhancements (unlimited scraping, parallel mode)
- v1.2.0: PDF Advanced Features (OCR, passwords, tables, 3x faster)

Priority 2 Features:
- OCR support for scanned PDFs
- Password-protected PDF support
- Complex table extraction

Priority 3 Features:
- Parallel page processing (3x faster)
- Intelligent caching (50% faster re-runs)

Testing: 142/142 tests passing (100%)

See CHANGELOG.md for full details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 21:46:52 +03:00
yusyus
7c853e5e9c Merge feature/pdf-support-clean into development
Adds PDF Advanced Features (v1.2.0)

This merge brings Priority 2 & 3 PDF features:
- OCR support for scanned PDFs
- Password-protected PDF support
- Complex table extraction
- Parallel page processing (3x faster)
- Intelligent caching (50% faster re-runs)

All 142 tests passing (100%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 21:44:15 +03:00
yusyus
394eab218e Add PDF Advanced Features (v1.2.0)
Priority 2 & 3 Features Implemented:
- OCR support for scanned PDFs (pytesseract + Pillow)
- Password-protected PDF support
- Complex table extraction
- Parallel page processing (3x faster)
- Intelligent caching (50% faster re-runs)

Testing:
- New test file: test_pdf_advanced_features.py (26 tests)
- Updated test_pdf_extractor.py (23 tests)
- Updated test_pdf_scraper.py (18 tests)
- Total: 49/49 PDF tests passing (100%)
- Overall: 142/142 tests passing (100%)

Documentation:
- Added docs/PDF_ADVANCED_FEATURES.md (580 lines)
- Updated CHANGELOG.md with v1.1.0 and v1.2.0
- Updated README.md version badges and features
- Updated docs/TESTING.md with new test counts

Dependencies:
- Added Pillow==11.0.0
- Added pytesseract==0.3.13

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 21:43:05 +03:00
yusyus
8ebd736055 Update documentation to include PDF support
- Add PDF support to README.md Key Features
- Add PDF CLI example (Option 3)
- Update MCP README from 9 to 10 tools
- Add scrape_pdf tool documentation
- Add PDF workflow example
- Update tool descriptions

All main documentation now reflects PDF functionality
2025-10-23 00:33:44 +03:00
yusyus
6936057820 Add PDF documentation support (Tasks B1.1-B1.8)
Complete PDF extraction and skill conversion functionality:
- pdf_extractor_poc.py (1,004 lines): Extract text, code, images from PDFs
- pdf_scraper.py (353 lines): Convert PDFs to Claude skills
- MCP tool scrape_pdf: PDF scraping via Claude Code
- 7 comprehensive documentation guides (4,705 lines)
- Example PDF config format (configs/example_pdf.json)

Features:
- 3 code detection methods (font, indent, pattern)
- 19+ programming languages detected with confidence scoring
- Syntax validation and quality scoring (0-10 scale)
- Image extraction with size filtering (--extract-images)
- Chapter/section detection and page chunking
- Quality-filtered code examples (--min-quality)
- Three usage modes: config file, direct PDF, from extracted JSON

Technical:
- PyMuPDF (fitz) as primary library (60x faster than alternatives)
- Language detection with confidence scoring
- Code block merging across pages
- Comprehensive metadata and statistics
- Compatible with existing Skill Seeker workflow

MCP Integration:
- New scrape_pdf tool (10th MCP tool total)
- Supports all three usage modes
- 10-minute timeout for large PDFs
- Real-time streaming output

Documentation (4,705 lines):
- B1_COMPLETE_SUMMARY.md: Overview of all 8 tasks
- PDF_PARSING_RESEARCH.md: Library comparison and benchmarks
- PDF_EXTRACTOR_POC.md: POC documentation
- PDF_CHUNKING.md: Page chunking guide
- PDF_SYNTAX_DETECTION.md: Syntax detection guide
- PDF_IMAGE_EXTRACTION.md: Image extraction guide
- PDF_SCRAPER.md: PDF scraper usage guide
- PDF_MCP_TOOL.md: MCP integration guide

Tasks completed: B1.1-B1.8
Addresses Issue #27
See docs/B1_COMPLETE_SUMMARY.md for complete details

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 00:23:16 +03:00
yusyus
05dc5c1cf6 Update GitHub Actions to use development branch
Changed:
- tests.yml: Run on 'development' instead of 'dev'
- Triggers on push to: main, development
- Triggers on PRs to: main, development

This ensures:
- All PRs to development run tests
- Pushes to development run tests
- Branch protection can require 'Tests' check
- CI works with new two-branch workflow

Related: Two-branch workflow setup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-22 23:35:47 +03:00
yusyus
15fffd236b Establish two-branch workflow: main + development
Changes:
1. Created 'development' branch as integration branch
2. Set 'development' as default branch for all PRs
3. Protected both branches with appropriate rules

Branch Protection:
- main: Requires tests + 1 review, only maintainer merges
- development: Requires tests, open for all contributor PRs

Updated CONTRIBUTING.md:
- Added comprehensive Branch Workflow section
- Updated all examples to use 'development' branch
- Clear visual diagram of branch structure
- Step-by-step workflow example

Workflow:
- Contributors: Create feature branches from 'development'
- PRs: Always target 'development' (not main)
- Releases: Maintainer merges 'development' → 'main'

This ensures:
- main always stable and production-ready
- development integrates all ongoing work
- Clear separation between integration and production
- Only maintainer controls production releases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-22 23:30:45 +03:00
yusyus
8f062bb96c Fix GitHub Actions release workflow permissions
Problem:
- Release workflow failing with "Resource not accessible by integration"
- Missing permissions for GITHUB_TOKEN to create releases
- Workflow tried to create releases that already exist manually

Fix:
1. Added `permissions: contents: write` at workflow level
   - Grants GITHUB_TOKEN permission to create/edit releases
   - Required for softprops/action-gh-release@v1

2. Added release existence check before creation
   - Prevents errors when release already exists
   - Skips creation gracefully with informative message
   - Useful for manually created releases (like v1.1.0)

Changes:
- Lines 8-9: Added permissions section
- Lines 48-57: Check if release exists with gh CLI
- Lines 59-60: Only create if release doesn't exist
- Lines 69-73: Skip message when release already exists

This allows:
- Automatic release creation on new tags
- Manual release creation without workflow conflicts
- Proper error handling and user feedback

Related: GitHub Actions permissions model
https://docs.github.com/en/actions/security-guides/automatic-token-authentication

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-22 23:13:55 +03:00
yusyus
0c5515129b Fix flaky upload_skill tests by restoring cwd in parallel scraping tests
Problem:
- 2 tests in test_upload_skill.py failing intermittently in CI
- Tests passed individually but failed when run after test_parallel_scraping.py
- Tests failed with exit code 2 instead of 0 when running `--help`

Root Cause:
- test_parallel_scraping.py calls `os.chdir(tmpdir)` to create temporary test directories
- These directory changes persisted across test classes
- When upload_skill CLI tests ran subprocess with path 'cli/upload_skill.py',
  the relative path was broken because cwd was still in the temp directory
- Result: subprocess couldn't find the script, returned exit code 2

Fix:
- Added setUp/tearDown to all 6 test classes in test_parallel_scraping.py
- setUp saves original cwd with `self.original_cwd = os.getcwd()`
- tearDown restores it with `os.chdir(self.original_cwd)`
- Ensures tests don't pollute working directory state for subsequent tests

Impact:
- All 158 tests now pass consistently
- No more flaky failures in CI
- Test isolation properly maintained

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-22 22:53:49 +03:00
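The setUp/tearDown fix described in the commit above can be sketched as below. The class and test names are hypothetical. Only the `self.original_cwd = os.getcwd()` and `os.chdir(self.original_cwd)` lines are taken from the commit message.

```python
import os
import shutil
import tempfile
import unittest


class CwdRestoringTestCase(unittest.TestCase):
    """Hypothetical base class: saves the cwd before each test and restores it after."""

    def setUp(self):
        self.original_cwd = os.getcwd()

    def tearDown(self):
        # Runs even when the test fails, so later tests see the original cwd.
        os.chdir(self.original_cwd)


class ExampleParallelScrapingTest(CwdRestoringTestCase):
    def test_runs_inside_temp_dir(self):
        tmpdir = tempfile.mkdtemp()
        # Cleanups run after tearDown, so the cwd has already left tmpdir.
        self.addCleanup(shutil.rmtree, tmpdir, ignore_errors=True)
        os.chdir(tmpdir)
        self.assertEqual(os.path.realpath(os.getcwd()), os.path.realpath(tmpdir))
```

With the cwd restored after every test, a later test that launches a subprocess with a relative path such as `cli/upload_skill.py` resolves it from the repository root again.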
IbrahimAlbyrk-luduArts
7e94c276be Add unlimited scraping, parallel mode, and rate limit control (#144)
Add three major features for improved performance and flexibility:

1. **Unlimited Scraping Mode**
   - Support max_pages: null or -1 for complete documentation coverage
   - Added unlimited parameter to MCP tools
   - Warning messages for unlimited mode

2. **Parallel Scraping (1-10 workers)**
   - ThreadPoolExecutor for concurrent requests
   - Thread-safe with proper locking
   - 20x performance improvement (10K pages: 83min → 4min)
   - Workers parameter in config

3. **Configurable Rate Limiting**
   - CLI overrides for rate_limit
   - --no-rate-limit flag for maximum speed
   - Per-worker rate limiting semantics

4. **MCP Streaming & Timeouts**
   - Non-blocking subprocess with real-time output
   - Intelligent timeouts per operation type
   - Prevents frozen/hanging behavior

**Thread-Safety Fixes:**
- Fixed race condition on visited_urls.add()
- Protected pages_scraped counter with lock
- Added explicit exception checking for workers
- All shared state operations properly synchronized

**Test Coverage:**
- Added 17 comprehensive tests for new features
- All 117 tests passing
- Thread safety validated

**Performance:**
- 1000 pages: 8.3min → 0.4min (20x faster)
- 10000 pages: 83min → 4min (20x faster)
- Maintains backward compatibility (default: 0.5s, 1 worker)

**Commits:**
- 309bf71: feat: Add unlimited scraping mode support
- 3ebc2d7: fix(mcp): Add timeout and streaming output
- 5d16fdc: feat: Add configurable rate limiting and parallel scraping
- ae7883d: Fix MCP server tests for streaming subprocess
- e5713dd: Fix critical thread-safety issues in parallel scraping
- 303efaf: Add comprehensive tests for parallel scraping features

Co-authored-by: IbrahimAlbyrk-luduArts <ialbayrak@luduarts.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-22 22:46:02 +03:00
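The thread-safety points in the commit above (atomic updates to `visited_urls`, a locked `pages_scraped` counter, explicit exception checking for workers, per-worker rate limiting) can be sketched as follows. The `ParallelScraper` class, its method names, and the injectable `fetch` callable are assumptions made for illustration; the project's scraper may be structured differently.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ParallelScraper:
    """Hypothetical sketch; fetch is any callable that takes a URL."""

    def __init__(self, fetch, workers=1, rate_limit=0.5):
        self.fetch = fetch
        self.workers = workers
        self.rate_limit = rate_limit  # seconds each worker waits after a request
        self.visited_urls = set()
        self.pages_scraped = 0
        self.lock = threading.Lock()

    def _claim(self, url):
        # Check-and-add must be atomic, or two workers can claim the same URL.
        with self.lock:
            if url in self.visited_urls:
                return False
            self.visited_urls.add(url)
            return True

    def _scrape_one(self, url):
        if not self._claim(url):
            return
        self.fetch(url)
        with self.lock:
            self.pages_scraped += 1
        if self.rate_limit > 0:
            time.sleep(self.rate_limit)  # per-worker delay, not a global one

    def run(self, urls):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = [pool.submit(self._scrape_one, u) for u in urls]
            for future in futures:
                future.result()  # re-raises any exception raised in a worker
        return self.pages_scraped
```

Because each worker sleeps independently, the effective request rate grows with the worker count, which is one reading of the "per-worker rate limiting semantics" mentioned in the commit.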
yusyus
13fcce1f4e Add comprehensive test coverage for CLI utilities
Expand test suite from 118 to 166 tests (+48 new tests) with focus on
untested CLI tools and utility functions. Overall coverage increased
from 14% to 25%.

New test files:
- tests/test_utilities.py (42 tests) - API keys, file validation, formatting
- tests/test_package_skill.py (11 tests) - Skill packaging workflow
- tests/test_estimate_pages.py (8 tests) - Page estimation functionality
- tests/test_upload_skill.py (7 tests) - Skill upload validation

Coverage improvements by module:
- cli/utils.py: 0% → 72% (+72%)
- cli/upload_skill.py: 0% → 53% (+53%)
- cli/estimate_pages.py: 0% → 47% (+47%)
- cli/package_skill.py: 0% → 43% (+43%)

All 166 tests passing. Added pytest-cov for coverage reporting.
Updated requirements.txt with all dependencies including MCP packages.

Test execution: 9.6s for complete suite

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-22 22:08:02 +03:00
Preston Brown
de5344caf9 Add virtual environment setup and minimal dependencies (#149)
## Changes
- Add virtual environment setup instructions to all docs
- Create requirements.txt with minimal dependencies (13 packages)
- Make anthropic optional (only needed for API enhancement)
- Clarify path notation (~ = $HOME, /Users/yourname examples)
- Add venv activation reminders throughout documentation

## Files Changed
- README.md: Added venv setup section to CLI method
- BULLETPROOF_QUICKSTART.md: Replaced Step 4 with venv setup
- CLAUDE.md: Updated Prerequisites with venv instructions
- requirements.txt: Created with minimal deps (requests, beautifulsoup4, pytest)

## Why
- Prevents package conflicts and permission issues
- Standard Python development practice
- Enables proper pytest usage without pipx complications
- Makes setup clearer for beginners
2025-10-22 21:54:05 +03:00
yusyus
ff148cf98f Update documentation for new Ansible config
Added ansible-core.json config to available presets list in:
- README.md: Added to preset table and usage examples
- CLAUDE.md: Added to production configs list with details

Changes:
- Total configs: 11 → 12
- New category: DevOps & Automation
- Reorganized config list for better categorization

Related: PR #147
2025-10-22 21:51:45 +03:00
Schuyler Erle
183c7596a5 Add config for Ansible core documentation (#147)
Co-authored-by: Schuyler Erle <schuyler@ardc.net>
2025-10-22 21:50:59 +03:00
yusyus
c03186574d Add comprehensive CLI path tests and fix remaining issues
Added 18 new tests covering all aspects of CLI path corrections:
- Docstring/usage examples (5 tests)
- Print statements (3 tests)
- Subprocess calls (1 test)
- Documentation files (3 tests)
- Help output functionality (2 tests)
- Script executability (4 tests)

All tests verify that:
1. Scripts can be executed with cli/ prefix
2. Usage examples show correct paths
3. Print statements guide users correctly
4. No old hardcoded paths remain
5. Documentation is consistent

Fixed additional issues found by tests:
- cli/enhance_skill.py: Fixed 4 more occurrences in docstring and error message
- cli/package_skill.py: Fixed 1 occurrence in help epilog

Test Results:
- Total tests: 118 (100 existing + 18 new)
- All tests passing: 100%
- Coverage: CLI paths, scraper features, config validation, integration, MCP server

Related: PR #145
2025-10-22 21:45:51 +03:00
yusyus
581dbc792d Fix CLI path references in Python code
All Python scripts now use correct cli/ prefix in:
- Usage docstrings (shown in --help)
- Print statements (shown to users)
- Subprocess calls (when calling other scripts)

Changes:
- cli/doc_scraper.py: Fixed 9 references (usage, print, subprocess)
- cli/enhance_skill_local.py: Fixed 6 references (usage, print)
- cli/enhance_skill.py: Fixed 5 references (usage, print)
- cli/package_skill.py: Fixed 4 references (usage, epilog)
- cli/estimate_pages.py: Fixed 3 references (epilog examples)

All commands now correctly show:
- python3 cli/doc_scraper.py (not python3 doc_scraper.py)
- python3 cli/enhance_skill.py (not python3 enhance_skill.py)
- python3 cli/enhance_skill_local.py (not python3 enhance_skill_local.py)
- python3 cli/package_skill.py (not python3 package_skill.py)
- python3 cli/estimate_pages.py (not python3 estimate_pages.py)

Also fixed:
- Old hardcoded path in enhance_skill_local.py:221
  (was: /mnt/skills/examples/skill-creator/scripts/package_skill.py)
  (now: cli/package_skill.py)
- Old hardcoded path in enhance_skill.py:210
  (was: /mnt/skills/examples/skill-creator/scripts/package_skill.py)
  (now: cli/package_skill.py)

This ensures all user-facing messages and subprocess calls use the
correct paths when run from the repository root.

Related: PR #145
2025-10-22 21:38:56 +03:00
yusyus
66719cd53a Fix CLI path references in documentation
Following PR #145 which fixed README.md, this commit corrects all
remaining documentation files to use the correct cli/ directory prefix
for Python scripts.

Changes:
- QUICKSTART.md: Fixed 21 occurrences (doc_scraper.py, enhance_skill_local.py, package_skill.py)
- docs/UPLOAD_GUIDE.md: Fixed 10 occurrences (doc_scraper.py, enhance_skill_local.py, package_skill.py)
- docs/ENHANCEMENT.md: Fixed 9 occurrences (doc_scraper.py, enhance_skill.py, enhance_skill_local.py)

All commands now correctly reference:
- python3 cli/doc_scraper.py (not python3 doc_scraper.py)
- python3 cli/enhance_skill.py (not python3 enhance_skill.py)
- python3 cli/enhance_skill_local.py (not python3 enhance_skill_local.py)
- python3 cli/package_skill.py (not python3 package_skill.py)
- python3 cli/estimate_pages.py (not python3 estimate_pages.py)

This ensures all documentation examples work correctly when run from
the repository root directory.

Related: PR #145
2025-10-22 21:33:47 +03:00
Adam Creeger
9fcfc139bc Update README to use cli directory for all CLI examples (#145) 2025-10-22 21:30:45 +03:00
yusyus
e5f4d100b0 Merge pull request #143 from schuyler/main
Add config for Claude Code documentation
2025-10-22 21:22:55 +03:00
Schuyler Erle
ab585584d0 Add config for Claude Code documentation 2025-10-20 21:27:19 -07:00
yusyus
013523c81d Close Issues #117 and #125 - Tasks already complete
Discovered 2 tasks were already done:

Issue #117 (H1.4) - Answer Issue #3: Pro plan compatibility
===========================================================
Status: ALREADY COMPLETE

What it was:
- Answer user question about Pro plan compatibility

Why it's done:
- Issue #3 already answered comprehensively
- User question: "Will this work with pro plan?"
- Answer given: Works with any plan, no API key needed
- Issue #3 already closed by owner

Time: 0 hours (already done)

Issue #125 (I2.1) - Write troubleshooting guide
===============================================
Status: ALREADY COMPLETE

What it was:
- Write comprehensive troubleshooting guide
- Document common issues and solutions

Why it's done:
- TROUBLESHOOTING.md created during H1.1 (Issue #8)
- 447 lines of comprehensive troubleshooting
- Covers: installation, runtime, MCP, scraping, platform-specific
- Already committed in 9028974

Time: 1.5 hours (done as part of H1.1)

Updated Documentation:
=====================

TODO.md:
- Added H1.4 and I2.1 to completed tasks
- Updated Category H summary (3/5 done)
- Added to Progress Tracking section

NEXT_TASKS.md:
- Marked H1.4 as DONE (Issue #3 already answered)
- Marked I2.1 as DONE (TROUBLESHOOTING.md created)
- Updated sprint progress: 6/12 tasks (50%)
- Added H1.5 to starter pack
- Updated results summary

Impact:
=======
- H1 Group: 4/5 tasks complete (80%)
- I2 Group: 1/5 tasks complete (20%)
- Week Progress: 6/12 tasks (50%)
- Only H1.3 and H1.5 remain in H1

Next Priority: H1.3 - Create example project folder (2-3 hours)

Files modified: TODO.md, NEXT_TASKS.md
Issues closed: #117, #125
2025-10-21 00:56:52 +03:00
yusyus
831ea67d58 Update task tracking and CLAUDE.md with latest progress
Documentation Updates:
======================

TODO.md:
--------
Added "Completed This Week" section:
   - H1.1: Issue #8 fixed (bulletproof docs + MCP setup)
   - H1.2: Issue #7 fixed (11/11 configs working)
   - H1.4: Issue #4 linked to roadmap
   - PR #5: Reviewed and approved

Updated "Immediate Tasks" list:
   - Removed completed tasks
   - Added H1.3 (example project) as next priority

Updated Progress Tracking:
   - 10 items completed this week
   - Clear visibility of accomplishments
   - Next steps clearly defined

NEXT_TASKS.md:
--------------
Marked completed tasks in Starter Pack:
   - H1.1 (Issue #8) - DONE
   - H1.2 (Issue #7) - DONE
   - H1.4 (Issue #4) - DONE
   - PR #5 Review - DONE

Updated Current Sprint (Oct 20-27):
   - Monday/Tuesday: 4/4 tasks completed
   - Wednesday/Thursday: 3 tasks remaining
   - Progress: 4/10 tasks (40%)

Added specific accomplishments:
   - Community engaged (3 issues)
   - All configs fixed (11/11)
   - PR security verified
   - Bulletproof documentation

CLAUDE.md:
----------
Added "Current Status" section at top:
   - Version: v1.0.0
   - Recent updates this week
   - Community response wins
   - Next priorities

Added configs status:
   - 11/11 verified working (100%)
   - New Laravel config
   - All selectors tested

Added roadmap reference:
   - 134 tasks in 22 groups
   - Project board link
   - Clear next steps

Added Laravel to Quick Start examples

Added "Available Production Configs" section:
   - All 11 configs listed with selectors
   - Content extraction stats
   - Organized by category
   - Verification date

Updated Additional Documentation:
   - Added BULLETPROOF_QUICKSTART.md
   - Added TROUBLESHOOTING.md
   - Added FLEXIBLE_ROADMAP.md
   - Added NEXT_TASKS.md
   - Added TODO.md

Impact:
-------
- Clear visibility of progress (4 major items this week)
- Updated guidance for Claude Code
- Accurate config information (11 working configs)
- Better onboarding with new docs
- Transparent roadmap tracking

Files modified: TODO.md, NEXT_TASKS.md, CLAUDE.md
2025-10-21 00:42:36 +03:00
yusyus
8bd3ccfcdf Merge pull request #5 from jjshanks/anchor-fix
Strip anchors from urls so that the pages aren't duplicated
2025-10-21 00:26:26 +03:00
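Stripping anchors so that pages are not duplicated, as the merged pull request above describes, can be sketched with the standard library. The helper names and example URLs are hypothetical, and this is not necessarily how the pull request implements it.

```python
from urllib.parse import urldefrag


def strip_anchor(url: str) -> str:
    """Return the URL without its #fragment."""
    return urldefrag(url).url


def unique_pages(urls):
    """Deduplicate URLs that differ only by anchor, keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        page = strip_anchor(url)
        if page not in seen:
            seen.add(page)
            result.append(page)
    return result
```

Normalizing each link before it is added to the visited set means `page#install` and `page#usage` are fetched once rather than twice.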
yusyus
80382551b1 Fix Issue #7: Fix all broken configs and add Laravel support
Tested and fixed all 11 production configs - now 100% working!

Fixed Configs:
1. Django (configs/django.json)
   - Was using: div.document (selector doesn't exist)
   - Now using: article (1,688 chars of content)
   - Verified on: https://docs.djangoproject.com/en/stable/

2. Astro (configs/astro.json)
   - Was using: homepage URL (no article element)
   - Now using: /en/getting-started/ with article selector
   - Added: start_urls, categories, improved URL patterns
   - Increased max_pages from 15 to 100

3. Tailwind (configs/tailwind.json)
   - Was using: article (selector doesn't exist)
   - Now using: div.prose (195 chars of content)
   - Verified on: https://tailwindcss.com/docs

New Config:
4. Laravel (configs/laravel.json) - NEW!
   - Created complete Laravel 9.x config
   - Selector: #main-content (16,131 chars of content)
   - Base URL: https://laravel.com/docs/9.x/
   - Includes: 8 start_urls covering installation, routing,
     controllers, views, Blade, Eloquent, migrations, auth
   - Categories: getting_started, routing, views, models,
     authentication, api
   - max_pages: 500

Test Results:
- 11/11 configs tested and verified (100%)
- All selectors extract content properly
- All base URLs accessible

Working Configs:
- astro.json
- django.json
- fastapi.json
- godot.json
- godot-large-example.json
- kubernetes.json
- laravel.json (NEW)
- react.json
- steam-economy-complete.json
- tailwind.json
- vue.json

How I Tested:
1. Created test_selectors.py to find correct CSS selectors
2. Tested each config's base_url + selector combination
3. Verified content extraction (not just "found" but actual text)
4. Ensured meaningful content length (50+ chars minimum)

Fixes Issue #7 - Laravel scraping not working
Fixes #7
2025-10-21 00:16:39 +03:00
yusyus
9028974da9 Fix Issue #8: Add bulletproof setup and prerequisites
Addresses community feedback about missing setup steps.

New Documentation:
+ BULLETPROOF_QUICKSTART.md - Complete beginner guide
  - Step-by-step Python/Git installation
  - Every step with expected output
  - Troubleshooting for each step
  - Test example (5-page scrape)
  - 15-30 minute complete setup

+ TROUBLESHOOTING.md - Comprehensive troubleshooting
  - Installation issues (Python, pip, permissions)
  - Runtime issues (file not found, configs)
  - MCP setup issues (placeholder paths!)
  - Scraping issues (slow, empty content)
  - Platform-specific (macOS/Linux/Windows)
  - Verification commands

Setup Script Improvements:
Fixed setup_mcp.sh path expansion
  - Now shows ACTUAL paths (not $REPO_PATH placeholder)
  - Verifies paths exist after writing config
  - Shows config contents for verification
  - Tests MCP server path validity
  - Clear warning about placeholders

README Updates:
Added Prerequisites section
  - Python 3.10+ requirement clear
  - Git requirement clear
  - Links to bulletproof guide
Added git clone step to Quick Start
Reorganized Documentation section
  - Getting Started (new, beginner, troubleshooting)
  - Guides (advanced topics)
  - Technical (architecture)

Fixes:
- Issue #8 - Prereqs to Getting Started
- Issue #114 on project board (H1.1)
- Placeholder path problem in MCP setup
- Missing beginner-friendly docs

Impact: New users can now get started without confusion!
2025-10-21 00:04:26 +03:00
yusyus
d9e9fb53ad Complete comprehensive planning verification and fix gaps
Issues Found & Fixed:
- Found 7 missing E1 tasks (E1.3-E1.9)
- Created issues #136-#142
- Added to project board
- Assigned Feature Group E1

Documentation Updates:
- Updated README.md (127 → 134 tasks)
- Updated GITHUB_BOARD_SETUP_COMPLETE.md (127 → 134 tasks)
- Added complete E1 task list (#136-#142)
- Created PLANNING_VERIFICATION.md (comprehensive report)

Verification Results:
- All 134 tasks in roadmap
- All 134 GitHub issues created (#9-#142)
- All 134 items on project board
- All 22 feature groups assigned
- All 6 custom fields configured
- All documentation consistent
- No gaps or holes found

System Status: 100% COMPLETE AND VERIFIED

Files Changed:
- README.md
- GITHUB_BOARD_SETUP_COMPLETE.md
+ PLANNING_VERIFICATION.md (new)

GitHub Issues: #9-#142 (134 total)
Project Board: https://github.com/users/yusufkaraaslan/projects/2
Feature Groups: 22 (A1-J1)
Categories: 10 (A-J)
2025-10-20 23:51:47 +03:00
yusyus
5f29c1c191 Configure project board for incremental development workflow
New Workflow:
- Added 'Workflow Stage' field with 5 stages
- 📋 Backlog (120 tasks) - All available tasks
- Quick Wins (7 tasks) - High priority starters
- 🎯 Ready to Start (0-5 tasks) - Personal queue
- 🔨 In Progress (1-2 max) - Active work
- Done - Completed tasks

Quick Wins Pre-selected:
- #130 - Install MCP (5 min)
- #114 - Respond to Issue #8 (30 min)
- #117 - Answer Issue #3 (30 min)
- #27 - Research PDF parsing (30 min)
- #21 - GitHub Pages site (1-2 hours)
- #93 - URL normalization (1-2 hours)
- #116 - Example project (2-3 hours)

Updated PROJECT_BOARD_GUIDE.md:
- Explained Workflow Stage field
- Step-by-step incremental workflow
- Recommended views and filters
- Tips for incremental success

Philosophy: Small tasks → Pick one → Complete → Move to next!
2025-10-20 23:13:10 +03:00
yusyus
e1e3968537 Add GitHub Project Board setup and guide
- Added 3 custom fields: Category, Time Estimate, Priority
- Created comprehensive project README
- Added PROJECT_BOARD_GUIDE.md with usage instructions
- Project board fully configured for flexible development

Custom Fields:
- Category: 10 categories matching our roadmap
- Time Estimate: 5 levels (5min to 8+ hours)
- Priority: High/Medium/Low/Starter
- Status: Todo/In Progress/Done (default)

Project: https://github.com/users/yusufkaraaslan/projects/2
2025-10-20 23:03:14 +03:00
yusyus
e092318351 Add project board link to README
- Added project board badge
- Added prominent link to development roadmap
- Links to 127 tasks across 10 categories
2025-10-20 22:58:04 +03:00
yusyus
90449cc86d Add step-by-step project board setup instructions
- Complete checklist for manual setup
- GitHub CLI automation commands
- Label colors and descriptions
- Milestone creation guide
- Issue creation workflow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 13:40:29 +03:00
yusyus
29a181752a Add GitHub Project Board setup and issue templates
- Add comprehensive PROJECT_BOARD_SETUP.md with 20 issues
- Create 3 milestones: v1.1.0, v1.2.0, v2.0.0
- Add issue templates: feature, bug, documentation
- Add PR template with checklist
- Define labels for priority, type, component, status
- Include setup instructions for web UI and CLI

Features:
- 6-column project board structure
- 20 pre-defined issues covering website, core improvements, advanced features
- Custom fields: Effort, Impact, Category
- Success metrics and community engagement guidelines

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 13:38:13 +03:00
yusyus
efaa1454e5 Merge pull request #6 from lwsinclair/add-mseep-badge
Add MseeP.ai badge
2025-10-20 12:43:47 +03:00