From 3bad7cf365d91233293b006b89454eb7d10ef30c Mon Sep 17 00:00:00 2001
From: yusyus
Date: Fri, 27 Feb 2026 22:26:21 +0300
Subject: [PATCH] fix: RAG chunking crash using non-existent converter.output_dir

DocToSkillConverter has self.skill_dir (string), not self.output_dir.
The --chunk-for-rag flag on scrape command crashed with AttributeError.
Changed to Path(converter.skill_dir).

Co-Authored-By: Claude Opus 4.6
---
 CHANGELOG.md                         | 1 +
 src/skill_seekers/cli/doc_scraper.py | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5d2c3f9..b2e7351 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -22,6 +22,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`docx` optional dependency group** — `pip install skill-seekers[docx]` (mammoth + python-docx)
 
 ### Fixed
+- **RAG chunking crash (`AttributeError: output_dir`)** — `execute_scraping_and_building()` used `converter.output_dir` which doesn't exist on `DocToSkillConverter`. Changed to `Path(converter.skill_dir)`. Affected `--chunk-for-rag` flag on `scrape` command.
 - **Issue #301: `setup.sh` fails on macOS with mismatched Python/pip** — `pip3` can point to a different Python than `python3` (e.g. pip3 → 3.9, python3 → 3.14), causing "no matching distribution" errors. Changed `setup.sh` to use `python3 -m pip` instead of bare `pip3` to guarantee the correct interpreter.
 - **Issue #300: Selector fallback & dry-run link discovery** — `create https://reactflow.dev/` now finds 20+ pages (was 1). Root causes:
   - `extract_content()` extracted links after the early-return when no content selector matched, so they were never discovered. Moved link extraction before the early return.
diff --git a/src/skill_seekers/cli/doc_scraper.py b/src/skill_seekers/cli/doc_scraper.py
index 9d59bf9..62cac55 100755
--- a/src/skill_seekers/cli/doc_scraper.py
+++ b/src/skill_seekers/cli/doc_scraper.py
@@ -2289,10 +2289,11 @@ def execute_scraping_and_building(
         )
 
         # Chunk the skill
-        chunks = chunker.chunk_skill(converter.output_dir)
+        skill_dir = Path(converter.skill_dir)
+        chunks = chunker.chunk_skill(skill_dir)
 
         # Save chunks
-        chunks_path = converter.output_dir / "rag_chunks.json"
+        chunks_path = skill_dir / "rag_chunks.json"
         chunker.save_chunks(chunks, chunks_path)
 
         logger.info(f"✅ Generated {len(chunks)} RAG chunks")
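
Reviewer note: the fix wraps the string-valued skill_dir in Path() before joining it with "rag_chunks.json". The sketch below illustrates why that wrap is needed. It is a minimal, standalone illustration, not code from the repository: FakeConverter, chunks_path_for and the "output/react" path are hypothetical stand-ins, and the only thing assumed about DocToSkillConverter is what the commit message states (skill_dir is a string, output_dir does not exist).

```python
from pathlib import Path


class FakeConverter:
    """Hypothetical stand-in for DocToSkillConverter: skill_dir is a plain string."""

    def __init__(self, skill_dir: str) -> None:
        self.skill_dir = skill_dir


def chunks_path_for(converter) -> Path:
    """Build the rag_chunks.json path the same way the patched lines do."""
    skill_dir = Path(converter.skill_dir)  # wrap the string so "/" joining works
    return skill_dir / "rag_chunks.json"


converter = FakeConverter("output/react")  # hypothetical example path

# The stand-in has no output_dir attribute, mirroring the reported AttributeError.
has_output_dir = hasattr(converter, "output_dir")

# Joining a bare string with "/" raises TypeError, so switching the attribute
# name alone would not have been enough.
try:
    converter.skill_dir / "rag_chunks.json"
    str_join_failed = False
except TypeError:
    str_join_failed = True

chunks_path = chunks_path_for(converter)
```

Converting once and reusing the local skill_dir, as the patch does, keeps the chunk_skill() call and the output path consistent with each other.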