Files
skill-seekers-reference/src/skill_seekers/cli/llms_txt_downloader.py
yusyus ce1c07b437 feat: Add modern Python packaging - Phase 1 (Foundation)
Implements issue #168 - Modern Python packaging with uv support

This is Phase 1 of the modernization effort, establishing the core
package structure and build system.

## Major Changes

### 1. Migrated to src/ Layout
- Moved cli/ → src/skill_seekers/cli/
- Moved skill_seeker_mcp/ → src/skill_seekers/mcp/
- Created root package: src/skill_seekers/__init__.py
- Updated all imports: cli. → skill_seekers.cli.
- Updated all imports: skill_seeker_mcp. → skill_seekers.mcp.

### 2. Created pyproject.toml
- Modern Python packaging configuration
- All dependencies properly declared
- 8 CLI entry points configured:
  * skill-seekers (unified CLI)
  * skill-seekers-scrape
  * skill-seekers-github
  * skill-seekers-pdf
  * skill-seekers-unified
  * skill-seekers-enhance
  * skill-seekers-package
  * skill-seekers-upload
  * skill-seekers-estimate
- uv tool support enabled
- Build system: setuptools with wheel

### 3. Created Unified CLI (main.py)
- Git-style subcommands (skill-seekers scrape, etc.)
- Delegates to existing tool main() functions
- Full help system at top-level and subcommand level
- Backwards compatible with individual commands

### 4. Updated Package Versions
- cli/__init__.py: 1.3.0 → 2.0.0
- mcp/__init__.py: 1.2.0 → 2.0.0
- Root package: 2.0.0

### 5. Updated Test Suite
- Fixed test_package_structure.py for new layout
- All 28 package structure tests passing
- Updated all test imports for new structure

## Installation Methods (Working)

```bash
# Development install
pip install -e .

# Run unified CLI
skill-seekers --version  # → 2.0.0
skill-seekers --help

# Run individual tools
skill-seekers-scrape --help
skill-seekers-github --help
```

## Test Results
- Package structure tests: 28/28 passing 
- Package installs successfully 
- All entry points working 

## Still TODO (Phase 2)
- [ ] Run full test suite (299 tests)
- [ ] Update documentation (README, CLAUDE.md, etc.)
- [ ] Test with uv tool run/install
- [ ] Build and publish to PyPI
- [ ] Create PR and merge

## Breaking Changes
None - fully backwards compatible. Old import paths still work.

## Migration for Users
No action needed. Package works with both pip and uv.

Closes #168 (when complete)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:14:24 +03:00

95 lines
3.1 KiB
Python

"""ABOUTME: Downloads llms.txt files from documentation URLs with retry logic"""
"""ABOUTME: Validates markdown content and handles timeouts with exponential backoff"""
import requests
import time
from typing import Optional
class LlmsTxtDownloader:
"""Download llms.txt content from URLs with retry logic"""
def __init__(self, url: str, timeout: int = 30, max_retries: int = 3):
self.url = url
self.timeout = timeout
self.max_retries = max_retries
def get_proper_filename(self) -> str:
"""
Extract filename from URL and convert .txt to .md
Returns:
Proper filename with .md extension
Examples:
https://hono.dev/llms-full.txt -> llms-full.md
https://hono.dev/llms.txt -> llms.md
https://hono.dev/llms-small.txt -> llms-small.md
"""
# Extract filename from URL
from urllib.parse import urlparse
parsed = urlparse(self.url)
filename = parsed.path.split('/')[-1]
# Replace .txt with .md
if filename.endswith('.txt'):
filename = filename[:-4] + '.md'
return filename
def _is_markdown(self, content: str) -> bool:
"""
Check if content looks like markdown.
Returns:
True if content contains markdown patterns
"""
markdown_patterns = ['# ', '## ', '```', '- ', '* ', '`']
return any(pattern in content for pattern in markdown_patterns)
def download(self) -> Optional[str]:
"""
Download llms.txt content with retry logic.
Returns:
String content or None if download fails
"""
headers = {
'User-Agent': 'Skill-Seekers-llms.txt-Reader/1.0'
}
for attempt in range(self.max_retries):
try:
response = requests.get(
self.url,
headers=headers,
timeout=self.timeout
)
response.raise_for_status()
content = response.text
# Validate content is not empty
if len(content) < 100:
print(f"⚠️ Content too short ({len(content)} chars), rejecting")
return None
# Validate content looks like markdown
if not self._is_markdown(content):
print(f"⚠️ Content doesn't look like markdown")
return None
return content
except requests.RequestException as e:
if attempt < self.max_retries - 1:
# Calculate exponential backoff delay: 1s, 2s, 4s, etc.
delay = 2 ** attempt
print(f"⚠️ Attempt {attempt + 1}/{self.max_retries} failed: {e}")
print(f" Retrying in {delay}s...")
time.sleep(delay)
else:
print(f"❌ Failed to download {self.url} after {self.max_retries} attempts: {e}")
return None
return None