fix: remove arbitrary limits, fix hardcoded languages, and fix summarizer bugs
Stage 1 quality improvements from the Arbitrary Limits & Dead Code audit:

Reference file truncation removed:
- codebase_scraper.py: remove code[:500] truncation at 5 locations; reference files now contain complete code blocks for copy-paste usability
- unified_skill_builder.py: remove issues[:20], releases[:10], body[:500], and code_snippet[:300] caps in reference files; full content preserved

Enhancement summarizer rewrite:
- enhance_skill_local.py: replace arbitrary [:5] code block cap with character-budget approach using target_ratio * content_chars
- Fix intro boundary bug: track code block state so intro never ends inside a code block, which was desynchronizing the parser
- Remove dead _target_lines variable (assigned but never used)
- Heading chunks now also respect the character budget

Hardcoded language fixes:
- unified_skill_builder.py: test examples use ex["language"] instead of always "python" for syntax highlighting
- how_to_guide_builder.py: add language field to HowToGuide dataclass, set from workflow at creation, used in AI enhancement prompt

Test fixes:
- test_enhance_skill_local.py: rename test to test_code_blocks_not_arbitrarily_capped, fix assertion to count actual blocks (```count // 2), use target_ratio=0.9

Documentation:
- Add Stage 1 plan, implementation summary, review, and corrected docs
- Update CHANGELOG.md with all changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
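The enhance_skill_local.py changes are not shown in this diff, so the following is only a minimal sketch of the two ideas named in the message: tracking fence state so the intro never ends inside a code block, and selecting code blocks against a character budget of target_ratio * content_chars instead of a fixed [:5] cap. The function names find_intro_end and select_code_blocks and the max_intro_lines parameter are hypothetical, and the real summarizer may differ in every detail.

```python
def find_intro_end(lines: list[str], max_intro_lines: int) -> int:
    """Return the index where the intro ends, never inside a fenced code block.

    Hypothetical helper: the cut point is the first line boundary at or past
    max_intro_lines that falls outside a ``` fence.
    """
    in_code_block = False
    for i, line in enumerate(lines):
        # Toggle fence state on every line that opens or closes a fence.
        if line.strip().startswith("```"):
            in_code_block = not in_code_block
        # Accept the boundary only once the limit is reached AND the fence is closed.
        if i + 1 >= max_intro_lines and not in_code_block:
            return i + 1
    return len(lines)


def select_code_blocks(
    blocks: list[str], content_chars: int, target_ratio: float
) -> list[str]:
    """Keep code blocks in order until the character budget is used up.

    Hypothetical helper: the budget is target_ratio * content_chars, so the
    number of blocks kept scales with the document instead of stopping at 5.
    """
    budget = int(target_ratio * content_chars)
    selected: list[str] = []
    used = 0
    for block in blocks:
        if used + len(block) > budget:
            break
        selected.append(block)
        used += len(block)
    return selected
```

With a limit of 3 intro lines and a fence opening on line 3, find_intro_end extends the intro to just after the closing fence, so a parser that resumes at that index starts outside any code block.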
@@ -419,7 +419,7 @@ def extract_markdown_structure(content: str) -> dict[str, Any]:
 structure["code_blocks"].append(
     {
         "language": language,
-        "code": code[:500],  # Truncate long code blocks
+        "code": code,  # Full code - no truncation
         "full_length": len(code),
     }
 )
@@ -486,7 +486,7 @@ def extract_rst_structure(content: str) -> dict[str, Any]:
 "code_blocks": [
     {
         "language": cb.language or "text",
-        "code": cb.code[:500] if len(cb.code) > 500 else cb.code,
+        "code": cb.code,  # Full code - no truncation
         "full_length": len(cb.code),
         "quality_score": cb.quality_score,
     }
@@ -572,7 +572,7 @@ def extract_rst_structure(content: str) -> dict[str, Any]:
 structure["code_blocks"].append(
     {
         "language": language,
-        "code": code[:500],
+        "code": code,  # Full code - no truncation
         "full_length": len(code),
     }
 )
@@ -717,7 +717,7 @@ def process_markdown_docs(
     for h in parsed_doc.headings
 ],
 "code_blocks": [
-    {"language": cb.language, "code": cb.code[:500]}
+    {"language": cb.language, "code": cb.code}  # Full code
     for cb in parsed_doc.code_blocks
 ],
 "tables": len(parsed_doc.tables),
@@ -743,7 +743,7 @@ def process_markdown_docs(
     for h in parsed_doc.headings
 ],
 "code_blocks": [
-    {"language": cb.language, "code": cb.code[:500]}
+    {"language": cb.language, "code": cb.code}  # Full code
     for cb in parsed_doc.code_blocks
 ],
 "tables": len(parsed_doc.tables),