docs: update all documentation for 17 source types

Update 32 documentation files across English and Chinese (zh-CN) docs
to reflect the 10 new source types added in the previous commit.

Updated files:
- README.md, README.zh-CN.md — taglines, feature lists, examples, install extras
- docs/reference/ — CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/ — UNIFIED_SCRAPING with generic merge docs
- docs/advanced/ — multi-source guide, MCP server guide
- docs/getting-started/ — installation extras, quick-start examples
- docs/user-guide/ — core-concepts, scraping, packaging, workflows (complex-merge)
- docs/ — FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root — BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/ — Chinese translations for all of the above

32 files changed, +3,016 lines, -245 lines
yusyus
2026-03-15 15:56:04 +03:00
parent 53b911b697
commit 37cb307455
32 changed files with 3011 additions and 240 deletions


@@ -14,6 +14,7 @@ Comprehensive guide for diagnosing and resolving common issues with Skill Seeker
- [Storage Issues](#storage-issues)
- [Network Issues](#network-issues)
- [General Debug Techniques](#general-debug-techniques)
- [Source-Type-Specific Issues](#source-type-specific-issues)
## Installation Issues
@@ -893,6 +894,181 @@ If you're still experiencing issues:
- Steps to reproduce
- Diagnostic information (see above)
## Source-Type-Specific Issues
### Issue: Missing Optional Dependencies for New Source Types
**Symptoms:**
```
ModuleNotFoundError: No module named 'ebooklib'
ModuleNotFoundError: No module named 'docx'
ModuleNotFoundError: No module named 'pptx'
ImportError: Missing dependency for jupyter extraction
```
**Solutions:**
```bash
# Install all optional dependencies at once
# (quote the extra so shells like zsh don't expand the brackets)
pip install "skill-seekers[all]"

# Or install per source type
pip install python-docx          # Word (.docx) support, imported as `docx`
pip install ebooklib             # EPUB support
pip install python-pptx          # PowerPoint (.pptx) support, imported as `pptx`
pip install nbformat nbconvert   # Jupyter Notebook support
pip install pyyaml jsonschema    # OpenAPI/Swagger support
pip install feedparser           # RSS/Atom feed support

# AsciiDoc and man pages need system tools that are not on PyPI
gem install asciidoctor          # AsciiDoc (Ruby gem; or apt/brew install asciidoctor)
sudo apt install groff           # Man pages on Debian/Ubuntu (macOS: brew install groff)

# Video support (GPU-aware)
skill-seekers video --setup
```
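To see at a glance which of these packages are importable in the current environment, a short Python check can help. This is a minimal sketch, not part of skill-seekers: the `OPTIONAL_DEPS` mapping is an assumption based on the install commands above, and it makes explicit that some pip names differ from import names (`python-docx` installs `docx`, `pyyaml` installs `yaml`).

```python
import importlib.util

# Hypothetical mapping: pip package name -> top-level module it installs.
# Derived from the install commands above, not from skill-seekers itself.
OPTIONAL_DEPS = {
    "python-docx": "docx",
    "ebooklib": "ebooklib",
    "python-pptx": "pptx",
    "nbformat": "nbformat",
    "nbconvert": "nbconvert",
    "pyyaml": "yaml",
    "jsonschema": "jsonschema",
    "feedparser": "feedparser",
}


def missing_packages(deps: dict) -> list:
    """Return the pip names whose module cannot be found."""
    missing = []
    for pip_name, module_name in deps.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(OPTIONAL_DEPS)
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All optional dependencies are installed.")
```

Run it with the same interpreter that runs skill-seekers; it prints a single `pip install` line covering whatever is missing.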
### Issue: Confluence API Authentication Fails
**Symptoms:**
```
401 Unauthorized: Confluence API rejected credentials
Error: CONFLUENCE_TOKEN not found
```
**Solutions:**
```bash
# Set Confluence Cloud credentials
export CONFLUENCE_URL=https://yourorg.atlassian.net
export CONFLUENCE_EMAIL=your-email@example.com
export CONFLUENCE_TOKEN=your-api-token
# Generate API token at:
# https://id.atlassian.com/manage-profile/security/api-tokens
# Test connection
skill-seekers confluence --space MYSPACE --dry-run
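# For Confluence Server/Data Center, point CONFLUENCE_URL at your server
# and use a personal access token instead of an API token:
export CONFLUENCE_URL=https://confluence.yourorg.com
export CONFLUENCE_TOKEN=your-pat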
```
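If the dry run still fails, testing the same credentials directly against the Confluence REST API separates credential problems from tool problems. A minimal sketch, assuming the environment variables above: Confluence Cloud accepts HTTP Basic auth with email and API token under `/wiki/rest/api/space`, while a Server/Data Center personal access token is sent as a Bearer token under `/rest/api/space`. The helper names are hypothetical and not part of skill-seekers.

```python
import base64
import os
import urllib.error
import urllib.request
from typing import Optional


def auth_header(email: Optional[str], token: str) -> str:
    """Cloud: Basic auth with email:token. Server/DC personal access token: Bearer."""
    if email:
        raw = f"{email}:{token}".encode("utf-8")
        return "Basic " + base64.b64encode(raw).decode("ascii")
    return f"Bearer {token}"


def space_endpoint(base_url: str, cloud: bool) -> str:
    """URL of a cheap authenticated call that lists at most one space."""
    # Cloud serves the REST API under /wiki; Server/DC serves it under the base URL
    # (include any context path such as /confluence in CONFLUENCE_URL)
    prefix = "/wiki" if cloud else ""
    return base_url.rstrip("/") + prefix + "/rest/api/space?limit=1"


def check_confluence() -> int:
    """Return the HTTP status for the credentials in the environment."""
    base = os.environ["CONFLUENCE_URL"]
    token = os.environ["CONFLUENCE_TOKEN"]
    email = os.environ.get("CONFLUENCE_EMAIL")  # leave unset when using a Server/DC PAT
    cloud = base.rstrip("/").endswith(".atlassian.net")
    req = urllib.request.Request(
        space_endpoint(base, cloud),
        headers={"Authorization": auth_header(email, token), "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 401 means the credentials were rejected
```

Call `check_confluence()` with the variables exported: 200 means the credentials work, and 401 means Confluence rejected them before skill-seekers was involved.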
### Issue: Notion API Authentication Fails
**Symptoms:**
```
401 Unauthorized: Notion API rejected credentials
Error: NOTION_TOKEN not found
```
**Solutions:**
```bash
# Set Notion integration token
export NOTION_TOKEN=secret_...   # newer integration tokens start with ntn_ instead
# Create an integration at:
# https://www.notion.so/my-integrations
# IMPORTANT: Share the target database/page with your integration
# (click "..." menu on page → "Add connections" → select your integration).
# An unshared page returns 404 object_not_found, not 401.
# Test connection
skill-seekers notion --database DATABASE_ID --dry-run
```
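To tell a bad token apart from an unshared database, query the Notion API directly. A minimal standard-library sketch: `GET /v1/databases/{id}` with a Bearer token and the required `Notion-Version` header. The helper names and the pinned version string are assumptions, not skill-seekers code. A 401 points at the token, while a 404 usually means the database ID is wrong or the database has not been shared with the integration.

```python
import os
import urllib.error
import urllib.request

NOTION_VERSION = "2022-06-28"  # a published Notion API version; pin whichever you target


def database_request(database_id: str, token: str) -> urllib.request.Request:
    """Build a GET request that retrieves a single Notion database."""
    return urllib.request.Request(
        f"https://api.notion.com/v1/databases/{database_id}",
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
        },
    )


def explain_status(status: int) -> str:
    """Translate the HTTP status into the likely cause."""
    if status == 200:
        return "ok: token is valid and the database is shared with the integration"
    if status == 401:
        return "token rejected: check NOTION_TOKEN"
    if status == 404:
        return "not found: wrong ID, or the database is not shared with the integration"
    return f"unexpected HTTP {status}"


def check_notion(database_id: str) -> str:
    req = database_request(database_id, os.environ["NOTION_TOKEN"])
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return explain_status(resp.status)
    except urllib.error.HTTPError as err:
        return explain_status(err.code)
```

With `NOTION_TOKEN` exported, `print(check_notion("DATABASE_ID"))` reports which case applies.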
### Issue: Jupyter Notebook Extraction Fails
**Symptoms:**
```
Error: Cannot read notebook format
nbformat.reader.NotJSONError
```
**Solutions:**
```bash
# Ensure notebook is valid JSON
python -c "import json; json.load(open('notebook.ipynb'))"
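# Install required deps
pip install nbformat nbconvert
# Validate against the notebook schema (raises ValidationError if the structure is invalid)
python -c "import nbformat; nbformat.validate(nbformat.read('notebook.ipynb', as_version=4))"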
# Try with explicit format version
skill-seekers jupyter notebook.ipynb --nbformat 4
```
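`NotJSONError` means the file could not be parsed as JSON at all, which is common after a bad merge or an interrupted save. The hypothetical `diagnose_notebook` helper below sketches how to tell that case apart from an old-format (v3) notebook or a structurally incomplete one, using only the standard library; it illustrates the checks and is not what skill-seekers runs internally.

```python
import json
from pathlib import Path


def diagnose_notebook(path: str) -> str:
    """Return a one-line diagnosis of why a .ipynb file may fail to load."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        nb = json.loads(text)
    except json.JSONDecodeError as err:
        # Same root cause as nbformat's NotJSONError: the file is not valid JSON
        return f"not JSON: line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(nb, dict):
        return "not a notebook: top-level JSON value is not an object"
    major = nb.get("nbformat")
    if not isinstance(major, int):
        return "not a notebook: missing or invalid 'nbformat' key"
    if major < 4:
        # v3 and older keep cells under 'worksheets' rather than a top-level 'cells' list
        return f"old format v{major}: upgrade it with nbformat or nbconvert"
    if not isinstance(nb.get("cells"), list):
        return "malformed v4 notebook: 'cells' list is missing"
    return f"ok: nbformat {major}.{nb.get('nbformat_minor', 0)}, {len(nb['cells'])} cells"
```

For example, `print(diagnose_notebook("notebook.ipynb"))` points at the line and column of a JSON syntax error.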
### Issue: OpenAPI Spec Parsing Fails
**Symptoms:**
```
Error: Not a valid OpenAPI specification
Error: Missing 'openapi' or 'swagger' field
```
**Solutions:**
```bash
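# Validate your spec first (JSON or YAML; replace openapi.yaml with your file)
pip install openapi-spec-validator pyyaml
openapi-spec-validator openapi.yaml
# Or from Python
python -c "
import yaml
from openapi_spec_validator import validate
with open('openapi.yaml') as f:
    validate(yaml.safe_load(f))
"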
# Ensure the file has the 'openapi' or 'swagger' top-level key
# Supported: OpenAPI 3.x and Swagger 2.0
# For remote specs
skill-seekers openapi https://api.example.com/openapi.json --name my-api
```
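The "Missing 'openapi' or 'swagger' field" error comes down to the top-level version key. A minimal standard-library sketch of that check, assuming the supported versions listed above (OpenAPI 3.x and Swagger 2.0); `spec_kind` is a hypothetical helper, and `spec_kind_from_json` reads JSON only, so parse YAML specs with `yaml.safe_load` first and pass the result to `spec_kind`.

```python
import json


def spec_kind(spec: dict) -> str:
    """Classify a parsed spec by its top-level version key."""
    if "openapi" in spec:
        version = str(spec["openapi"])
        if version.startswith("3."):
            return f"OpenAPI {version}"
        return f"unsupported OpenAPI version {version}"
    if "swagger" in spec:
        version = str(spec["swagger"])
        if version == "2.0":
            return "Swagger 2.0"
        return f"unsupported Swagger version {version}"
    return "missing 'openapi' or 'swagger' key"


def spec_kind_from_json(path: str) -> str:
    # JSON only, to stay within the standard library
    with open(path, encoding="utf-8") as f:
        return spec_kind(json.load(f))
```

The `str()` conversion means an unquoted `openapi: 3.0` in YAML, which parses as a float, is still classified; strict validators expect the version as a string, though, so quote it in the spec.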
### Issue: EPUB Extraction Produces Empty Output
**Symptoms:**
```
Warning: No content found in EPUB
0 chapters extracted
```
**Solutions:**
```bash
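# Check the EPUB is valid (the PyPI epubcheck package wraps W3C EPUBCheck and needs a Java runtime)
pip install epubcheck
epubcheck book.epub
# Re-run with image extraction and verbose logging to see what the extractor finds
skill-seekers epub book.epub --extract-images --verbose
# DRM-protected EPUBs are encrypted and cannot be extracted;
# use a DRM-free copy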
```
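An EPUB is a ZIP container, so a quick structural check with the standard library shows whether there is anything to extract. A minimal sketch (the `inspect_epub` name is hypothetical): zero content documents explains an empty result, and a `META-INF/encryption.xml` entry means some resources are encrypted, which usually indicates DRM but can also be plain font obfuscation.

```python
import zipfile


def inspect_epub(path: str) -> dict:
    """Report basic structural facts about an EPUB container."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        # OCF requires a 'mimetype' entry whose content is application/epub+zip
        mimetype = None
        if "mimetype" in names:
            mimetype = zf.read("mimetype").decode("ascii").strip()
        # Chapters are (X)HTML content documents
        content_docs = [n for n in names if n.lower().endswith((".xhtml", ".html", ".htm"))]
        return {
            "mimetype_ok": mimetype == "application/epub+zip",
            "has_container": "META-INF/container.xml" in names,
            "content_documents": len(content_docs),
            # Present when resources are encrypted: DRM, or sometimes only font obfuscation
            "has_encryption_xml": "META-INF/encryption.xml" in names,
        }
```

For example, `print(inspect_epub("book.epub"))`; a file that is not a ZIP at all raises `zipfile.BadZipFile`, which is itself a useful diagnosis.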
### Issue: Slack/Discord Export Not Recognized
**Symptoms:**
```
Error: Cannot detect chat platform from export directory
Error: No messages found in export
```
**Solutions:**
```bash
# Specify platform explicitly
skill-seekers chat --platform slack --export-dir ./slack-export
skill-seekers chat --platform discord --export-dir ./discord-export
# For Slack: Export from Workspace Settings → Import/Export
# For Discord: Use DiscordChatExporter or similar tool
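# Check the export directory structure (unzip Slack exports first)
ls ./slack-export/
# Should contain channels.json, users.json, and one folder per channel
# (e.g. general/) holding daily YYYY-MM-DD.json files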
```
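Platform detection can be approximated with a layout check. A minimal sketch, not skill-seekers' actual detector: the Slack rule relies on the `channels.json` and `users.json` files a workspace export contains, and the Discord rule assumes DiscordChatExporter's JSON export format with top-level `guild`, `channel`, and `messages` keys. If neither rule matches, pass `--platform` explicitly as shown above.

```python
import json
from pathlib import Path
from typing import Optional


def guess_platform(export_dir: str) -> Optional[str]:
    """Guess 'slack' or 'discord' from an export directory's layout."""
    root = Path(export_dir)
    # Slack workspace exports ship top-level channels.json and users.json
    if (root / "channels.json").is_file() and (root / "users.json").is_file():
        return "slack"
    # Assumed DiscordChatExporter JSON layout: one file per channel with these keys
    for path in root.glob("*.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not valid JSON
        if isinstance(data, dict) and all(k in data for k in ("guild", "channel", "messages")):
            return "discord"
    return None  # unknown layout
```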
---
## Common Error Messages Reference
| Error | Cause | Solution |
@@ -907,6 +1083,11 @@ If you're still experiencing issues:
| `MemoryError` | Out of memory | Reduce batch size |
| `PermissionError` | Access denied | Check file permissions |
| `FileNotFoundError` | Missing file | Verify file path |
| `No module named 'ebooklib'` | EPUB dep missing | `pip install ebooklib` |
| `No module named 'docx'` | Word dep missing | `pip install python-docx` |
| `No module named 'pptx'` | PPTX dep missing | `pip install python-pptx` |
| `CONFLUENCE_TOKEN not found` | Confluence auth missing | Set env vars (see above) |
| `NOTION_TOKEN not found` | Notion auth missing | Set env vars (see above) |
---