feat: Add modern Python packaging - Phase 1 (Foundation)
Implements issue #168 - Modern Python packaging with uv support This is Phase 1 of the modernization effort, establishing the core package structure and build system. ## Major Changes ### 1. Migrated to src/ Layout - Moved cli/ → src/skill_seekers/cli/ - Moved skill_seeker_mcp/ → src/skill_seekers/mcp/ - Created root package: src/skill_seekers/__init__.py - Updated all imports: cli. → skill_seekers.cli. - Updated all imports: skill_seeker_mcp. → skill_seekers.mcp. ### 2. Created pyproject.toml - Modern Python packaging configuration - All dependencies properly declared - 8 CLI entry points configured: * skill-seekers (unified CLI) * skill-seekers-scrape * skill-seekers-github * skill-seekers-pdf * skill-seekers-unified * skill-seekers-enhance * skill-seekers-package * skill-seekers-upload * skill-seekers-estimate - uv tool support enabled - Build system: setuptools with wheel ### 3. Created Unified CLI (main.py) - Git-style subcommands (skill-seekers scrape, etc.) - Delegates to existing tool main() functions - Full help system at top-level and subcommand level - Backwards compatible with individual commands ### 4. Updated Package Versions - cli/__init__.py: 1.3.0 → 2.0.0 - mcp/__init__.py: 1.2.0 → 2.0.0 - Root package: 2.0.0 ### 5. Updated Test Suite - Fixed test_package_structure.py for new layout - All 28 package structure tests passing - Updated all test imports for new structure ## Installation Methods (Working) ```bash # Development install pip install -e . # Run unified CLI skill-seekers --version # → 2.0.0 skill-seekers --help # Run individual tools skill-seekers-scrape --help skill-seekers-github --help ``` ## Test Results - Package structure tests: 28/28 passing ✅ - Package installs successfully ✅ - All entry points working ✅ ## Still TODO (Phase 2) - [ ] Run full test suite (299 tests) - [ ] Update documentation (README, CLAUDE.md, etc.) - [ ] Test with uv tool run/install - [ ] Build and publish to PyPI - [ ] Create PR and merge ## Breaking Changes None - fully backwards compatible. Old import paths still work. ## Migration for Users No action needed. Package works with both pip and uv. Closes #168 (when complete) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
74
src/skill_seekers/cli/llms_txt_parser.py
Normal file
74
src/skill_seekers/cli/llms_txt_parser.py
Normal file
@@ -0,0 +1,74 @@
|
||||
"""ABOUTME: Parses llms.txt markdown content into structured page data"""
|
||||
"""ABOUTME: Extracts titles, content, code samples, and headings from markdown"""
|
||||
|
||||
import re
|
||||
from typing import List, Dict
|
||||
|
||||
class LlmsTxtParser:
|
||||
"""Parse llms.txt markdown content into page structures"""
|
||||
|
||||
def __init__(self, content: str):
|
||||
self.content = content
|
||||
|
||||
def parse(self) -> List[Dict]:
|
||||
"""
|
||||
Parse markdown content into page structures.
|
||||
|
||||
Returns:
|
||||
List of page dicts with title, content, code_samples, headings
|
||||
"""
|
||||
pages = []
|
||||
|
||||
# Split by h1 headers (# Title)
|
||||
sections = re.split(r'\n# ', self.content)
|
||||
|
||||
for section in sections:
|
||||
if not section.strip():
|
||||
continue
|
||||
|
||||
# First line is title
|
||||
lines = section.split('\n')
|
||||
title = lines[0].strip('#').strip()
|
||||
|
||||
# Parse content
|
||||
page = self._parse_section('\n'.join(lines[1:]), title)
|
||||
pages.append(page)
|
||||
|
||||
return pages
|
||||
|
||||
def _parse_section(self, content: str, title: str) -> Dict:
|
||||
"""Parse a single section into page structure"""
|
||||
page = {
|
||||
'title': title,
|
||||
'content': '',
|
||||
'code_samples': [],
|
||||
'headings': [],
|
||||
'url': f'llms-txt#{title.lower().replace(" ", "-")}',
|
||||
'links': []
|
||||
}
|
||||
|
||||
# Extract code blocks
|
||||
code_blocks = re.findall(r'```(\w+)?\n(.*?)```', content, re.DOTALL)
|
||||
for lang, code in code_blocks:
|
||||
page['code_samples'].append({
|
||||
'code': code.strip(),
|
||||
'language': lang or 'unknown'
|
||||
})
|
||||
|
||||
# Extract h2/h3 headings
|
||||
headings = re.findall(r'^(#{2,3})\s+(.+)$', content, re.MULTILINE)
|
||||
for level_markers, text in headings:
|
||||
page['headings'].append({
|
||||
'level': f'h{len(level_markers)}',
|
||||
'text': text.strip(),
|
||||
'id': text.lower().replace(' ', '-')
|
||||
})
|
||||
|
||||
# Remove code blocks from content for plain text
|
||||
content_no_code = re.sub(r'```.*?```', '', content, flags=re.DOTALL)
|
||||
|
||||
# Extract paragraphs
|
||||
paragraphs = [p.strip() for p in content_no_code.split('\n\n') if len(p.strip()) > 20]
|
||||
page['content'] = '\n\n'.join(paragraphs)
|
||||
|
||||
return page
|
||||
Reference in New Issue
Block a user