fix: resolve 18 bugs and code quality issues across adaptors, CLI, and chunking pipeline
Bug fixes:
- Fix --var flag silently dropped in create routing (args.workflow_var → args.var)
- Fix double _score_code_quality() call in word scraper
- Add .docx file extension validation in WordToSkillConverter
- Fix weaviate ImportError masked by generic Exception handler
- Fix RAG chunking crash using non-existent converter.output_dir

Chunking pipeline improvements:
- Wire --chunk-overlap-tokens through entire package pipeline (package_skill → adaptor.package → format_skill_md → _maybe_chunk_content → RAGChunker)
- Add auto-scaling overlap: max(50, chunk_tokens//10) when chunk size is non-default
- Rename --no-preserve-code to --no-preserve-code-blocks (backward-compat alias kept)
- Replace hardcoded 512/50 chunk defaults with DEFAULT_CHUNK_TOKENS/DEFAULT_CHUNK_OVERLAP_TOKENS constants across all 12 concrete adaptors, rag_chunker, base, and package_skill

Code quality:
- Extract shared _generate_openai_embeddings() and _generate_st_embeddings() to SkillAdaptor base class, removing ~150 lines of duplication from chroma/weaviate/pinecone
- Add Pinecone adaptor with full upload support (pinecone_adaptor.py)

Tests (14 new):
- chunk_overlap_tokens parameter wiring, auto-scaling overlap, preserve_code_blocks flag
- .docx/.doc/no-extension file validation, --var flag routing E2E
- Embedding method inheritance verification, backward-compatible flag aliases

Docs:
- Update CHANGELOG, CLI_REFERENCE, API_REFERENCE, packaging guide (EN+ZH)
- Update README test count badge (1880+ → 2283+)

All 2283 tests passing, 8 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
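The auto-scaling overlap rule from the chunking notes can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper name `resolve_chunk_overlap` is hypothetical, and treating an explicitly passed overlap as taking precedence is an assumption. Only the constant names, the 512/50 defaults, and the `max(50, chunk_tokens//10)` formula come from the message above.

```python
# Constant names and values are taken from the commit message (512/50 defaults).
DEFAULT_CHUNK_TOKENS = 512
DEFAULT_CHUNK_OVERLAP_TOKENS = 50


def resolve_chunk_overlap(chunk_tokens=DEFAULT_CHUNK_TOKENS, chunk_overlap_tokens=None):
    """Hypothetical helper: pick the overlap to use for a given chunk size."""
    # Assumption: an explicitly supplied overlap always wins.
    if chunk_overlap_tokens is not None:
        return chunk_overlap_tokens
    # Non-default chunk size: scale the overlap to roughly 10%, with a floor of 50.
    if chunk_tokens != DEFAULT_CHUNK_TOKENS:
        return max(50, chunk_tokens // 10)
    # Default chunk size keeps the default overlap.
    return DEFAULT_CHUNK_OVERLAP_TOKENS
```

Under these assumptions, a 1024-token chunk gets an overlap of 102 tokens, while a 256-token chunk stays at the floor of 50.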
@@ -16,6 +16,7 @@ Tests cover:
 """
 
 import json
+import os
 import shutil
 import tempfile
 import unittest
@@ -456,6 +457,37 @@ class TestWordErrorHandling(unittest.TestCase):
         with self.assertRaises((KeyError, TypeError)):
             self.WordToSkillConverter({"docx_path": "test.docx"})
 
+    def test_non_docx_file_raises_value_error(self):
+        """extract_docx raises ValueError for non-.docx files."""
+        # Create a real file with wrong extension
+        txt_path = os.path.join(self.temp_dir, "test.txt")
+        with open(txt_path, "w") as f:
+            f.write("not a docx")
+        config = {"name": "test", "docx_path": txt_path}
+        converter = self.WordToSkillConverter(config)
+        with self.assertRaises(ValueError):
+            converter.extract_docx()
+
+    def test_doc_file_raises_value_error(self):
+        """extract_docx raises ValueError for .doc (old Word format)."""
+        doc_path = os.path.join(self.temp_dir, "test.doc")
+        with open(doc_path, "w") as f:
+            f.write("not a docx")
+        config = {"name": "test", "docx_path": doc_path}
+        converter = self.WordToSkillConverter(config)
+        with self.assertRaises(ValueError):
+            converter.extract_docx()
+
+    def test_no_extension_file_raises_value_error(self):
+        """extract_docx raises ValueError for file with no extension."""
+        no_ext_path = os.path.join(self.temp_dir, "document")
+        with open(no_ext_path, "w") as f:
+            f.write("not a docx")
+        config = {"name": "test", "docx_path": no_ext_path}
+        converter = self.WordToSkillConverter(config)
+        with self.assertRaises(ValueError):
+            converter.extract_docx()
+
 
 class TestWordJSONWorkflow(unittest.TestCase):
     """Test building skills from extracted JSON."""
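For context, the behavior these three tests pin down (a ValueError for `.txt`, `.doc`, and extension-less paths) could come from a check like the one below. This is only a sketch: the converter's actual validation is not shown in this diff, `validate_docx_path` is a hypothetical name, and the case-insensitive suffix comparison is an assumption.

```python
from pathlib import Path


def validate_docx_path(docx_path: str) -> Path:
    """Hypothetical check: accept only paths whose suffix is .docx."""
    path = Path(docx_path)
    # Assumption: compare case-insensitively so "REPORT.DOCX" is accepted.
    # ".doc", ".txt", and an empty suffix all fail this comparison.
    if path.suffix.lower() != ".docx":
        raise ValueError(f"Expected a .docx file, got: {path.name!r}")
    return path
```

A suffix check like this rejects the legacy `.doc` format by name alone, before any attempt to open the file, which is what lets the tests use plain text files as fixtures.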