fix: resolve 18 bugs and code quality issues across adaptors, CLI, and chunking pipeline
Bug fixes:
- Fix --var flag silently dropped in create routing (args.workflow_var → args.var)
- Fix double _score_code_quality() call in word scraper
- Add .docx file extension validation in WordToSkillConverter
- Fix weaviate ImportError masked by generic Exception handler
- Fix RAG chunking crash using non-existent converter.output_dir

Chunking pipeline improvements:
- Wire --chunk-overlap-tokens through entire package pipeline
  (package_skill → adaptor.package → format_skill_md → _maybe_chunk_content → RAGChunker)
- Add auto-scaling overlap: max(50, chunk_tokens//10) when chunk size is non-default
- Rename --no-preserve-code to --no-preserve-code-blocks (backward-compat alias kept)
- Replace hardcoded 512/50 chunk defaults with DEFAULT_CHUNK_TOKENS/DEFAULT_CHUNK_OVERLAP_TOKENS
  constants across all 12 concrete adaptors, rag_chunker, base, and package_skill

Code quality:
- Extract shared _generate_openai_embeddings() and _generate_st_embeddings() to SkillAdaptor
  base class, removing ~150 lines of duplication from chroma/weaviate/pinecone
- Add Pinecone adaptor with full upload support (pinecone_adaptor.py)

Tests (14 new):
- chunk_overlap_tokens parameter wiring, auto-scaling overlap, preserve_code_blocks flag
- .docx/.doc/no-extension file validation, --var flag routing E2E
- Embedding method inheritance verification, backward-compatible flag aliases

Docs:
- Update CHANGELOG, CLI_REFERENCE, API_REFERENCE, packaging guide (EN+ZH)
- Update README test count badge (1880+ → 2283+)

All 2283 tests passing, 8 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
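
The auto-scaling overlap rule in the message can be sketched as below. This is a minimal illustration under stated assumptions, not the project's implementation: the helper name `resolve_chunk_overlap` is hypothetical, the constant values 512 and 50 are the hardcoded defaults this commit replaces, and it is assumed that scaling applies only when the overlap itself was left at its default.

```python
# Values of the previously hardcoded defaults (512 tokens per chunk, 50 overlap).
DEFAULT_CHUNK_TOKENS = 512
DEFAULT_CHUNK_OVERLAP_TOKENS = 50


def resolve_chunk_overlap(chunk_tokens: int, chunk_overlap_tokens: int) -> int:
    """Hypothetical helper: pick the overlap to use for a given chunk size.

    Assumption: an explicitly chosen overlap is always respected; scaling
    only kicks in when the overlap is still at its default value and the
    chunk size is not.
    """
    overlap_is_default = chunk_overlap_tokens == DEFAULT_CHUNK_OVERLAP_TOKENS
    chunk_is_default = chunk_tokens == DEFAULT_CHUNK_TOKENS
    if overlap_is_default and not chunk_is_default:
        # Rule from the commit message: max(50, chunk_tokens // 10)
        return max(50, chunk_tokens // 10)
    return chunk_overlap_tokens


if __name__ == "__main__":
    for size in (256, 512, 1024, 2000):
        print(size, resolve_chunk_overlap(size, DEFAULT_CHUNK_OVERLAP_TOKENS))
```

Under this reading, small chunk sizes never drop below an overlap of 50, while larger ones scale to roughly 10% of the chunk size.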
@@ -13,6 +13,7 @@ from skill_seekers.cli.arguments.create import (
     get_compatible_arguments,
     get_universal_argument_names,
 )
+from skill_seekers.cli.arguments.common import DEFAULT_CHUNK_TOKENS, DEFAULT_CHUNK_OVERLAP_TOKENS

 logger = logging.getLogger(__name__)

@@ -106,8 +107,8 @@ class CreateCommand:
         # Check against common defaults
         defaults = {
             "max_issues": 100,
-            "chunk_tokens": 512,
-            "chunk_overlap_tokens": 50,
+            "chunk_tokens": DEFAULT_CHUNK_TOKENS,
+            "chunk_overlap_tokens": DEFAULT_CHUNK_OVERLAP_TOKENS,
             "output": None,
         }

@@ -160,11 +161,11 @@ class CreateCommand:
         # RAG arguments (web scraper only)
         if getattr(self.args, "chunk_for_rag", False):
             argv.append("--chunk-for-rag")
-        if getattr(self.args, "chunk_tokens", None) and self.args.chunk_tokens != 512:
+        if getattr(self.args, "chunk_tokens", None) and self.args.chunk_tokens != DEFAULT_CHUNK_TOKENS:
             argv.extend(["--chunk-tokens", str(self.args.chunk_tokens)])
         if (
             getattr(self.args, "chunk_overlap_tokens", None)
-            and self.args.chunk_overlap_tokens != 50
+            and self.args.chunk_overlap_tokens != DEFAULT_CHUNK_OVERLAP_TOKENS
         ):
             argv.extend(["--chunk-overlap-tokens", str(self.args.chunk_overlap_tokens)])

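
The forwarding pattern in this hunk (pass a flag along only when its value differs from the shared default) can be tried in isolation with the sketch below. The function `build_rag_argv` is a hypothetical stand-in for the relevant part of `CreateCommand`, and the constant values are the literals removed by this commit, redefined locally so the snippet runs on its own.

```python
from argparse import Namespace

# Local copies of the shared defaults (values taken from the removed literals).
DEFAULT_CHUNK_TOKENS = 512
DEFAULT_CHUNK_OVERLAP_TOKENS = 50


def build_rag_argv(args: Namespace) -> list[str]:
    """Hypothetical stand-in: forward only non-default RAG options."""
    argv: list[str] = []
    if getattr(args, "chunk_for_rag", False):
        argv.append("--chunk-for-rag")

    chunk_tokens = getattr(args, "chunk_tokens", None)
    if chunk_tokens and chunk_tokens != DEFAULT_CHUNK_TOKENS:
        argv.extend(["--chunk-tokens", str(chunk_tokens)])

    overlap = getattr(args, "chunk_overlap_tokens", None)
    if overlap and overlap != DEFAULT_CHUNK_OVERLAP_TOKENS:
        argv.extend(["--chunk-overlap-tokens", str(overlap)])
    return argv


if __name__ == "__main__":
    ns = Namespace(chunk_for_rag=True, chunk_tokens=1024, chunk_overlap_tokens=50)
    print(build_rag_argv(ns))
```

As in the hunk, the check relies on truthiness, so an overlap of 0 would not be forwarded by this sketch. Comparing against a single named constant keeps the comparison and the parser default from drifting apart.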
@@ -428,6 +429,10 @@ class CreateCommand:
         if self.args.quiet:
             argv.append("--quiet")

+        # Documentation version metadata
+        if getattr(self.args, "doc_version", ""):
+            argv.extend(["--doc-version", self.args.doc_version])
+
         # Enhancement Workflow arguments
         if getattr(self.args, "enhance_workflow", None):
             for wf in self.args.enhance_workflow: