fix: resolve 18 bugs and code quality issues across adaptors, CLI, and chunking pipeline

Bug fixes:
- Fix --var flag silently dropped in create routing (args.workflow_var → args.var)
- Fix double _score_code_quality() call in word scraper
- Add .docx file extension validation in WordToSkillConverter
- Fix weaviate ImportError masked by generic Exception handler
- Fix RAG chunking crash using non-existent converter.output_dir

Chunking pipeline improvements:
- Wire --chunk-overlap-tokens through entire package pipeline
  (package_skill → adaptor.package → format_skill_md → _maybe_chunk_content → RAGChunker)
- Add auto-scaling overlap: max(50, chunk_tokens//10) when chunk size is non-default
- Rename --no-preserve-code to --no-preserve-code-blocks (backward-compat alias kept)
- Replace hardcoded 512/50 chunk defaults with DEFAULT_CHUNK_TOKENS/DEFAULT_CHUNK_OVERLAP_TOKENS
  constants across all 12 concrete adaptors, rag_chunker, base, and package_skill

Code quality:
- Extract shared _generate_openai_embeddings() and _generate_st_embeddings() to SkillAdaptor
  base class, removing ~150 lines of duplication from chroma/weaviate/pinecone
- Add Pinecone adaptor with full upload support (pinecone_adaptor.py)

Tests (14 new):
- chunk_overlap_tokens parameter wiring, auto-scaling overlap, preserve_code_blocks flag
- .docx/.doc/no-extension file validation, --var flag routing E2E
- Embedding method inheritance verification, backward-compatible flag aliases

Docs:
- Update CHANGELOG, CLI_REFERENCE, API_REFERENCE, packaging guide (EN+ZH)
- Update README test count badge (1880+ → 2283+)

All 2283 tests passing, 8 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
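
The auto-scaling overlap rule from the chunking notes, max(50, chunk_tokens//10) when the chunk size is non-default, can be sketched roughly as follows. This is only an illustration, not the code in this commit: `resolve_chunk_overlap` is a hypothetical helper name, the constant values are taken from the "512/50" defaults mentioned in the message, and the choice to let an explicitly supplied overlap take precedence over auto-scaling is an assumption.

```python
from __future__ import annotations

# Values taken from the "hardcoded 512/50" defaults named in the commit message.
DEFAULT_CHUNK_TOKENS = 512
DEFAULT_CHUNK_OVERLAP_TOKENS = 50


def resolve_chunk_overlap(chunk_tokens: int, overlap_tokens: int | None = None) -> int:
    """Hypothetical helper: pick the overlap (in tokens) for a given chunk size."""
    # Assumption: an overlap passed explicitly by the caller is used unchanged.
    if overlap_tokens is not None:
        return overlap_tokens
    # Non-default chunk size: scale overlap to a tenth of the chunk, floor of 50.
    if chunk_tokens != DEFAULT_CHUNK_TOKENS:
        return max(50, chunk_tokens // 10)
    # Default chunk size keeps the default overlap.
    return DEFAULT_CHUNK_OVERLAP_TOKENS
```

Under these assumptions a 2048-token chunk gets an overlap of 204 tokens, while a 256-token chunk is held at the 50-token floor.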
2026-02-28 21:57:59 +03:00
parent 3bad7cf365
commit 064405c052
41 changed files with 1864 additions and 237 deletions
--- a/src/skill_seekers/cli/codebase_scraper.py
+++ b/src/skill_seekers/cli/codebase_scraper.py
@@ -1057,6 +1057,7 @@ def analyze_codebase(
    enhance_level: int = 0,
    skill_name: str | None = None,
    skill_description: str | None = None,
+    doc_version: str = "",
 ) -> dict[str, Any]:
    """
    Analyze local codebase and extract code knowledge.
@@ -1603,6 +1604,7 @@ def analyze_codebase(
        docs_data=docs_data,
        skill_name=skill_name,
        skill_description=skill_description,
+        doc_version=doc_version,
    )

    return results
@@ -1622,6 +1624,7 @@ def _generate_skill_md(
    docs_data: dict[str, Any] | None = None,
    skill_name: str | None = None,
    skill_description: str | None = None,
+    doc_version: str = "",
 ):
    """
    Generate rich SKILL.md from codebase analysis results.
@@ -1657,6 +1660,7 @@ def _generate_skill_md(
    skill_content = f"""---
 name: {skill_name}
 description: {description}
+doc_version: {doc_version}
 ---

 # {repo_name} Codebase
@@ -2197,13 +2201,11 @@ def _generate_references(output_dir: Path):

        if source_dir.exists() and source_dir.is_dir():
            # Copy directory to references/ (not symlink, for portability)
-            if target_dir.exists():
-                import shutil
-
-                shutil.rmtree(target_dir)
-
            import shutil

+            if target_dir.exists():
+                shutil.rmtree(target_dir)
+
            shutil.copytree(source_dir, target_dir)
            logger.debug(f"Copied {source} → references/{target}")

@@ -2451,6 +2453,7 @@ Examples:
            enhance_level=args.enhance_level,  # AI enhancement level (0-3)
            skill_name=getattr(args, "name", None),
            skill_description=getattr(args, "description", None),
+            doc_version=getattr(args, "doc_version", ""),
        )

        # ============================================================