fix: remove arbitrary limits, fix hardcoded languages, and fix summarizer bugs

Stage 1 quality improvements from the Arbitrary Limits & Dead Code audit:

Reference file truncation removed:
- codebase_scraper.py: remove code[:500] truncation at 5 locations; reference files now contain complete code blocks for copy-paste usability
- unified_skill_builder.py: remove issues[:20], releases[:10], body[:500], and code_snippet[:300] caps in reference files; full content is preserved

Enhancement summarizer rewrite:
- enhance_skill_local.py: replace the arbitrary [:5] code block cap with a character-budget approach using target_ratio * content_chars
- Fix intro boundary bug: track code block state so the intro never ends inside a code block, which was desynchronizing the parser
- Remove dead _target_lines variable (assigned but never used)
- Heading chunks now also respect the character budget

Hardcoded language fixes:
- unified_skill_builder.py: test examples use ex["language"] instead of always "python" for syntax highlighting
- how_to_guide_builder.py: add a language field to the HowToGuide dataclass, set from the workflow at creation and used in the AI enhancement prompt

Test fixes:
- test_enhance_skill_local.py: rename test to test_code_blocks_not_arbitrarily_capped, fix the assertion to count actual blocks (result.count("```") // 2), use target_ratio=0.9

Documentation:
- Add Stage 1 plan, implementation summary, review, and corrected docs
- Update CHANGELOG.md with all changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 00:30:40 +03:00
parent b81d55fda0
commit b6d4dd8423
10 changed files with 1189 additions and 20 deletions
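The summarizer rewrite in the commit message combines two ideas: a character budget (target_ratio * content length) instead of a fixed block count, and fence-state tracking so the intro cannot stop inside a code block. The following is a minimal sketch of both under simplifying assumptions (the input is an intro followed by fenced code blocks; heading chunks are omitted). The function `summarize_reference_sketch` and its `intro_lines` parameter are hypothetical illustrations, not the code in enhance_skill_local.py.

```python
def summarize_reference_sketch(content: str, target_ratio: float = 0.3,
                               intro_lines: int = 20) -> str:
    """Hypothetical character-budget summarizer (illustration only).

    Keeps an intro, then appends whole fenced code blocks until the
    character budget (target_ratio * len(content)) is spent. There is
    no fixed cap on the number of code blocks.
    """
    lines = content.splitlines()
    budget = int(len(content) * target_ratio)

    # 1. Intro: take up to `intro_lines` lines, extended if needed until any
    #    open fence is closed, so the intro never ends inside a code block.
    in_code = False
    intro_end = 0
    for i, line in enumerate(lines):
        if line.strip().startswith("```"):
            in_code = not in_code
        intro_end = i + 1
        if intro_end >= intro_lines and not in_code:
            break
    intro = "\n".join(lines[:intro_end])

    # 2. Collect the remaining fenced code blocks as complete units.
    #    Starting with in_code=False is only safe because step 1 guarantees
    #    the intro ended outside a block (the desync the fix addresses).
    blocks = []
    current = []
    in_code = False
    for line in lines[intro_end:]:
        if line.strip().startswith("```"):
            current.append(line)
            if in_code:  # closing fence: the block is complete
                blocks.append("\n".join(current))
                current = []
            in_code = not in_code
        elif in_code:
            current.append(line)

    # 3. Append whole blocks while they fit in the budget (intro always kept).
    parts = [intro]
    used = len(intro)
    for block in blocks:
        cost = len(block) + 1  # +1 for the joining newline
        if used + cost > budget:
            break
        parts.append(block)
        used += cost
    return "\n".join(parts) + "\n"
```

Fed the same content as the renamed test (10 intro lines, 10 short blocks) with target_ratio=0.9, this sketch keeps more than five blocks, and because blocks are only ever added whole, the number of fence markers in the result stays even.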
--- a/tests/test_enhance_skill_local.py
+++ b/tests/test_enhance_skill_local.py
@@ -356,14 +356,17 @@ class TestSummarizeReference:
        # Result should be significantly shorter than original
        assert len(result) < len(content)

-    def test_code_blocks_capped_at_five(self, tmp_path):
+    def test_code_blocks_not_arbitrarily_capped(self, tmp_path):
+        """Code blocks should not be arbitrarily capped at 5; the summarizer uses a character budget."""
        enhancer = self._enhancer(tmp_path)
-        content = "\n".join(["Intro line"] * 20) + "\n"
+        content = "\n".join(["Intro line"] * 10) + "\n"  # Shorter intro
        for i in range(10):
-            content += f"```python\ncode_block_{i}()\n```\n"
-        result = enhancer.summarize_reference(content)
-        # Should have at most 5 code blocks
-        assert result.count("```python") <= 5
+            content += f"```\ncode_block_{i}()\n```\n"  # Short code blocks
+        # Use high ratio to ensure budget fits well beyond 5 blocks
+        result = enhancer.summarize_reference(content, target_ratio=0.9)
+        # Each block has opening + closing ```, so divide by 2 for actual block count
+        code_block_count = result.count("```") // 2
+        assert code_block_count > 5, f"Expected >5 code blocks, got {code_block_count}"


 # ---------------------------------------------------------------------------