Troubleshooting Guide
Comprehensive guide for diagnosing and resolving common issues with Skill Seekers.
Table of Contents
- Installation Issues
- Configuration Issues
- Scraping Issues
- GitHub API Issues
- API & Enhancement Issues
- Docker & Kubernetes Issues
- Performance Issues
- Storage Issues
- Network Issues
- General Debug Techniques
Installation Issues
Issue: Package Installation Fails
Symptoms:
ERROR: Could not build wheels for...
ERROR: Failed building wheel for...
Solutions:
# Update pip and setuptools
python -m pip install --upgrade pip setuptools wheel
# Install build dependencies (Ubuntu/Debian)
sudo apt install python3-dev build-essential libssl-dev
# Install build dependencies (RHEL/CentOS)
sudo yum install python3-devel gcc gcc-c++ openssl-devel
# Retry installation
pip install skill-seekers
Issue: Command Not Found After Installation
Symptoms:
$ skill-seekers --version
bash: skill-seekers: command not found
Solutions:
# Check if installed
pip show skill-seekers
# Add to PATH
export PATH="$HOME/.local/bin:$PATH"
# Or reinstall with --user flag
pip install --user skill-seekers
# Verify
which skill-seekers
Issue: Python Version Mismatch
Symptoms:
ERROR: Package requires Python >=3.10 but you are running 3.9
Solutions:
# Check Python version
python --version
python3 --version
# Use specific Python version
python3.12 -m pip install skill-seekers
# Create alias
alias python=python3.12
# Or use pyenv
pyenv install 3.12
pyenv global 3.12
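The version requirement from the error above (Python >=3.10) can also be checked from inside the interpreter you plan to use, which helps when several Python installs coexist. This is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of Skill Seekers.

```python
import sys

# Minimum version taken from the error message above (Python >=3.10).
MINIMUM = (3, 10)

def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if a (major, minor, ...) tuple satisfies the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)

# Report which interpreter is running and whether it is new enough.
current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if meets_minimum() else "too old"
print(f"{sys.executable}: Python {current} ({status})")
```

Running this with each candidate interpreter (python, python3, python3.12) shows which binary each command resolves to.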
Configuration Issues
Issue: API Keys Not Recognized
Symptoms:
Error: ANTHROPIC_API_KEY not found
401 Unauthorized
Solutions:
# Check environment variables
env | grep API_KEY
# Set in current session
export ANTHROPIC_API_KEY=sk-ant-...
# Set permanently (~/.bashrc or ~/.zshrc)
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
source ~/.bashrc
# Or use .env file
cat > .env <<EOF
ANTHROPIC_API_KEY=sk-ant-...
EOF
# Load .env
set -a
source .env
set +a
# Verify
skill-seekers config --test
Issue: Configuration File Not Found
Symptoms:
Error: Config file not found: configs/react.json
FileNotFoundError: [Errno 2] No such file or directory
Solutions:
# Check file exists
ls -la configs/react.json
# Use absolute path
skill-seekers scrape --config /full/path/to/configs/react.json
# Create config directory
mkdir -p ~/.config/skill-seekers/configs
# Copy config
cp configs/react.json ~/.config/skill-seekers/configs/
# List available configs
skill-seekers-config list
Issue: Invalid Configuration Format
Symptoms:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1
ValidationError: 1 validation error for Config
Solutions:
# Validate JSON syntax
python -m json.tool configs/myconfig.json
# Check required fields
skill-seekers-validate configs/myconfig.json
# Example valid config
cat > configs/test.json <<EOF
{
"name": "test",
"base_url": "https://docs.example.com/",
"selectors": {
"main_content": "article"
}
}
EOF
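The syntax check and the required-field check can be combined in a few lines of Python. The sketch below assumes that `name` and `base_url` are required, which is inferred only from the example config above; the real schema may differ, and `check_config` is a hypothetical helper, not a Skill Seekers function.

```python
import json

# Assumption: these keys are required. They come from the example config
# above; the actual Skill Seekers schema may require more or different fields.
REQUIRED_KEYS = ("name", "base_url")

def check_config(text):
    """Return a list of human-readable problems found in a config JSON string."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Report the position so the offending character is easy to find.
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    return [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in data]

sample = '{"name": "test", "base_url": "https://docs.example.com/"}'
print(check_config(sample) or "config looks valid")
```

An empty file produces the "line 1 column 1" error shown in the symptoms, so check first that the config file is not empty.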
Scraping Issues
Issue: No Content Extracted
Symptoms:
Warning: No content found for URL
0 pages scraped
Empty SKILL.md generated
Solutions:
# Enable debug mode
export LOG_LEVEL=DEBUG
skill-seekers scrape --config config.json --verbose
# Test selectors manually
python -c "
from bs4 import BeautifulSoup
import requests
soup = BeautifulSoup(requests.get('URL').content, 'html.parser')
print(soup.select_one('article')) # Test selector
"
# Adjust selectors in config (try different values for main_content)
{
  "selectors": {
    "main_content": "main",
    "title": "h1",
    "code_blocks": "pre"
  }
}
# Use fallback selectors
{
  "selectors": {
    "main_content": ["article", "main", ".content", "#content"]
  }
}
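To find out which selector in a fallback list actually matches a page, you can try them in order and stop at the first hit. The sketch below assumes this "first match wins" behavior; the scraper's real resolution logic is not documented here. `first_match` is a hypothetical helper, and `select_one` stands for any callable that returns an element or None, such as BeautifulSoup's `soup.select_one`.

```python
def first_match(selectors, select_one):
    """Try selectors in order; return (selector, element) for the first hit.

    `selectors` may be a single string or a list of strings, matching the two
    config shapes shown above. Returns (None, None) if nothing matches.
    """
    if isinstance(selectors, str):
        selectors = [selectors]
    for selector in selectors:
        element = select_one(selector)
        if element is not None:
            return selector, element
    return None, None

# Stand-in for a parsed page: only ".content" matches anything.
fake_page = {".content": "<div class='content'>...</div>"}
selector, element = first_match(["article", "main", ".content", "#content"], fake_page.get)
print(f"matched selector: {selector}")
```

With a real page, pass `soup.select_one` instead of `fake_page.get`.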
Issue: Scraping Takes Too Long
Symptoms:
Scraping has been running for 2 hours...
Progress: 50/500 pages (10%)
Solutions:
# Enable async scraping (2-3x faster)
skill-seekers scrape --config config.json --async
# Reduce max pages
skill-seekers scrape --config config.json --max-pages 100
# Increase concurrency
# Edit config.json (default concurrency is 10; a 0.2s rate_limit delay is faster):
{
  "concurrency": 20,
  "rate_limit": 0.2
}
# Use caching for re-runs
skill-seekers scrape --config config.json --use-cache
Issue: Pages Not Being Discovered
Symptoms:
Only 5 pages found
Expected 100+ pages
Solutions:
# Check URL patterns: "include" must match your doc URLs; remove restrictive "exclude" patterns
{
  "url_patterns": {
    "include": ["/docs"],
    "exclude": []
  }
}
# Enable breadth-first search ("bfs" instead of "dfs") and increase max_depth
{
  "crawl_strategy": "bfs",
  "max_depth": 10
}
# Debug URL discovery
skill-seekers scrape --config config.json --dry-run --verbose
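When only a few pages are discovered, it helps to test sample URLs against the include and exclude patterns by hand. The sketch below assumes the patterns are plain substring matches on the URL path; the scraper may instead use prefixes, globs, or regular expressions, so treat `url_allowed` as a hypothetical stand-in.

```python
from urllib.parse import urlparse

def url_allowed(url, include, exclude):
    """Assumed semantics: keep a URL if its path contains any include pattern
    (or include is empty) and contains no exclude pattern."""
    path = urlparse(url).path
    if include and not any(pattern in path for pattern in include):
        return False
    return not any(pattern in path for pattern in exclude)

urls = [
    "https://docs.example.com/docs/intro",
    "https://docs.example.com/documentation/intro",
    "https://docs.example.com/blog/release",
]
for url in urls:
    print(url, url_allowed(url, include=["/docs/"], exclude=[]))
```

A site that serves its pages under /documentation/ instead of /docs/ would be filtered out entirely by the include pattern in this example.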
GitHub API Issues
Issue: Rate Limit Exceeded
Symptoms:
403 Forbidden
API rate limit exceeded for user
X-RateLimit-Remaining: 0
Solutions:
# Check current rate limit
curl -H "Authorization: token $GITHUB_TOKEN" \
https://api.github.com/rate_limit
# Use multiple tokens
skill-seekers config --github
# Follow wizard to add multiple profiles
# Wait for reset
# Check X-RateLimit-Reset header for timestamp
# Use non-interactive mode in CI/CD
skill-seekers github --repo owner/repo --non-interactive
# Configure rate limit strategy
skill-seekers config --github
# Choose: prompt / wait / switch / fail
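The X-RateLimit-Reset header holds the reset time as UTC epoch seconds, so the wait time is a simple subtraction. A minimal sketch follows; the header value is a made-up example and both helper names are hypothetical.

```python
import time
from datetime import datetime, timezone

def seconds_until_reset(reset_epoch, now=None):
    """Return how many seconds remain until the rate limit resets (never negative)."""
    if now is None:
        now = time.time()
    return max(0, int(reset_epoch) - int(now))

def reset_time_utc(reset_epoch):
    """Format the reset timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(int(reset_epoch), tz=timezone.utc).isoformat()

# Hypothetical header value, as it would appear in X-RateLimit-Reset.
reset_header = "1700000000"
print(reset_time_utc(reset_header))
print(seconds_until_reset(reset_header, now=1699999400), "seconds to wait")
```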
Issue: Invalid GitHub Token
Symptoms:
401 Unauthorized
Bad credentials
Solutions:
# Verify token
curl -H "Authorization: token $GITHUB_TOKEN" \
https://api.github.com/user
# Generate new token
# Visit: https://github.com/settings/tokens
# Scopes needed: repo, read:org
# Update token
skill-seekers config --github
# Test token
skill-seekers config --test
Issue: Repository Not Found
Symptoms:
404 Not Found
Repository not found: owner/repo
Solutions:
# Check repository name (case-sensitive)
skill-seekers github --repo facebook/react # Correct
skill-seekers github --repo Facebook/React # Wrong
# Check if repo is private (requires token)
export GITHUB_TOKEN=ghp_...
skill-seekers github --repo private/repo
# Verify repo exists
curl https://api.github.com/repos/owner/repo
API & Enhancement Issues
Issue: Enhancement Fails
Symptoms:
Error: SKILL.md enhancement failed
AuthenticationError: Invalid API key
Solutions:
# Verify API key
skill-seekers config --test
# Try LOCAL mode (free, uses Claude Code Max)
skill-seekers enhance output/react/ --mode LOCAL
# Check API key format
# Claude: sk-ant-...
# OpenAI: sk-...
# Gemini: AIza...
# Test API directly
curl https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-sonnet-4-5","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'
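A quick prefix check catches keys pasted into the wrong variable. The sketch below uses only the prefixes listed in the key-format notes above; `guess_provider` is a hypothetical helper, and a matching prefix does not prove the key is valid.

```python
# Prefixes taken from the key-format notes above. Order matters: "sk-ant-"
# must be tested before the shorter "sk-" prefix.
KEY_PREFIXES = (
    ("sk-ant-", "Claude"),
    ("sk-", "OpenAI"),
    ("AIza", "Gemini"),
)

def guess_provider(api_key):
    """Return the provider suggested by the key's prefix, or None if unknown."""
    key = api_key.strip()  # stray whitespace is a common copy-paste error
    for prefix, provider in KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    return None

print(guess_provider("sk-ant-example"))
```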
Issue: Enhancement Hangs or Times Out
Symptoms:
Enhancement process not responding
Timeout after 300 seconds
Solutions:
# Increase timeout
skill-seekers enhance output/react/ --timeout 600
# Run in background
skill-seekers enhance output/react/ --background
# Monitor status
skill-seekers enhance-status output/react/ --watch
# Kill hung process
ps aux | grep enhance
kill -9 <PID>
# Check system resources
htop
df -h
Issue: API Cost Concerns
Symptoms:
Worried about API costs for enhancement
Need free alternative
Solutions:
# Use LOCAL mode (free!)
skill-seekers enhance output/react/ --mode LOCAL
# Skip enhancement entirely
skill-seekers scrape --config config.json --skip-enhance
# Estimate cost before enhancing
# Claude API: ~$0.15-$0.30 per skill
# Check usage: https://console.anthropic.com/
# Use batch processing
for dir in output/*/; do
skill-seekers enhance "$dir" --mode LOCAL --background
done
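The per-skill range quoted above scales linearly, so a rough budget for a batch is simple to compute. The sketch treats $0.15-$0.30 as an estimate only; actual cost depends on document size and model pricing, and `estimate_cost` is a hypothetical helper.

```python
# Per-skill range quoted above for the Claude API (USD); an estimate only.
COST_PER_SKILL_USD = (0.15, 0.30)

def estimate_cost(skill_count, per_skill=COST_PER_SKILL_USD):
    """Return the (low, high) cost estimate in USD for a number of skills."""
    low, high = per_skill
    return round(skill_count * low, 2), round(skill_count * high, 2)

low, high = estimate_cost(20)
print(f"20 skills: ${low:.2f} to ${high:.2f}")
```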
Docker & Kubernetes Issues
Issue: Container Won't Start
Symptoms:
Error response from daemon: Container ... is not running
Container exits immediately
Solutions:
# Check logs
docker logs skillseekers-mcp
# Common issues:
# 1. Missing environment variables
docker run -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY ...
# 2. Port already in use
sudo lsof -i :8765
docker run -p 8766:8765 ...
# 3. Permission issues
docker run --user $(id -u):$(id -g) ...
# Run interactively to debug
docker run -it --entrypoint /bin/bash skillseekers:latest
Issue: Kubernetes Pod CrashLoopBackOff
Symptoms:
NAME READY STATUS RESTARTS
skillseekers-mcp-xxx 0/1 CrashLoopBackOff 5
Solutions:
# Check pod logs
kubectl logs -n skillseekers skillseekers-mcp-xxx
# Describe pod
kubectl describe pod -n skillseekers skillseekers-mcp-xxx
# Check events
kubectl get events -n skillseekers --sort-by='.lastTimestamp'
# Common issues:
# 1. Missing secrets
kubectl get secrets -n skillseekers
# 2. Resource constraints
kubectl top nodes
kubectl edit deployment skillseekers-mcp -n skillseekers
# 3. Liveness probe failing
# Increase initialDelaySeconds in deployment
Issue: Image Pull Errors
Symptoms:
ErrImagePull
ImagePullBackOff
Failed to pull image
Solutions:
# Check image exists
docker pull skillseekers:latest
# Create image pull secret
kubectl create secret docker-registry regcred \
--docker-server=registry.example.com \
--docker-username=user \
--docker-password=pass \
-n skillseekers
# Add to deployment
spec:
  imagePullSecrets:
    - name: regcred
# Use public image (if available)
image: docker.io/skillseekers/skillseekers:latest
Performance Issues
Issue: High Memory Usage
Symptoms:
Process killed (OOM)
Memory usage: 8GB+
System swapping
Solutions:
# Check memory usage
ps aux --sort=-%mem | head -10
htop
# Reduce batch size
skill-seekers scrape --config config.json --batch-size 10
# Enable memory limits
# Docker:
docker run --memory=4g skillseekers:latest
# Kubernetes:
resources:
  limits:
    memory: 4Gi
# Clear cache
rm -rf ~/.cache/skill-seekers/
# Use streaming for large files
# (automatically handled by library)
Issue: Slow Performance
Symptoms:
Operations taking much longer than expected
High CPU usage
Disk I/O bottleneck
Solutions:
# Enable async operations
skill-seekers scrape --config config.json --async
# Increase concurrency (adjust based on available resources)
{
  "concurrency": 20
}
# Use SSD for storage
# Move output to SSD:
mv output/ /mnt/ssd/output/
# Monitor performance
# CPU:
mpstat 1
# Disk I/O:
iostat -x 1
# Network:
iftop
# Profile code
python -m cProfile -o profile.stats \
-m skill_seekers.cli.doc_scraper --config config.json
Issue: Disk Space Issues
Symptoms:
No space left on device
Disk full
Cannot create file
Solutions:
# Check disk usage
df -h
du -sh output/*
# Clean up old skills (top-level directories not modified in 30+ days)
find output/ -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
# Compress old benchmarks
tar czf benchmarks-archive.tar.gz benchmarks/
rm -rf benchmarks/*.json
# Use cloud storage
skill-seekers scrape --config config.json \
--storage s3 \
--bucket my-skills-bucket
# Clear cache
skill-seekers cache --clear
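Checking free space before a long run avoids a failure partway through. A minimal sketch using the standard library follows; the 1 GiB threshold is an arbitrary assumption and `has_free_space` is a hypothetical helper.

```python
import shutil

def has_free_space(path, required_gib):
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024 ** 3

# Assumption: 1 GiB of headroom; pick a value that fits your expected output size.
print("enough space:", has_free_space(".", required_gib=1))
```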
Storage Issues
Issue: S3 Upload Fails
Symptoms:
botocore.exceptions.NoCredentialsError
AccessDenied
Solutions:
# Check credentials
aws sts get-caller-identity
# Configure AWS CLI
aws configure
# Set environment variables
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
# Check bucket permissions
aws s3 ls s3://my-bucket/
# Test upload
echo "test" > test.txt
aws s3 cp test.txt s3://my-bucket/
Issue: GCS Authentication Failed
Symptoms:
google.auth.exceptions.DefaultCredentialsError
Permission denied
Solutions:
# Set credentials file
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
# Or use gcloud auth
gcloud auth application-default login
# Verify permissions
gsutil ls gs://my-bucket/
# Test upload
echo "test" > test.txt
gsutil cp test.txt gs://my-bucket/
Network Issues
Issue: Connection Timeouts
Symptoms:
requests.exceptions.ConnectionError
ReadTimeout
Connection refused
Solutions:
# Check network connectivity
ping google.com
curl https://docs.example.com/
# Increase timeout (in seconds)
{
  "timeout": 60
}
# Use proxy if behind firewall
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# Check DNS resolution
nslookup docs.example.com
dig docs.example.com
# Test with curl
curl -v https://docs.example.com/
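For transient failures, retrying with an increasing delay is often enough. The sketch below shows a generic exponential backoff wrapper; it is not how Skill Seekers handles retries internally, and `retry_with_backoff` is a hypothetical helper. When wrapping calls made with the requests library, pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, because those classes do not inherit from the built-in exceptions used as defaults here.

```python
import time

def retry_with_backoff(operation, attempts=4, base_delay=1.0, sleep=time.sleep,
                       retry_on=(ConnectionError, TimeoutError)):
    """Call `operation` until it succeeds, doubling the delay after each failure.

    Re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Simulated flaky operation: fails twice, then succeeds.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"

# The no-op sleep keeps the demo instant; omit it to get real delays.
print(retry_with_backoff(flaky, sleep=lambda seconds: None))
```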
Issue: SSL/TLS Errors
Symptoms:
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED]
SSLCertVerificationError
Solutions:
# Update certificates
# Ubuntu/Debian:
sudo apt update && sudo apt install --reinstall ca-certificates
# RHEL/CentOS:
sudo yum reinstall ca-certificates
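# As last resort (not recommended for production):
# Note: PYTHONHTTPSVERIFY comes from PEP 493 (Python 2.7); standard Python 3 builds may ignore it
export PYTHONHTTPSVERIFY=0
# Or use the CLI flag:
skill-seekers scrape --config config.json --no-verify-ssl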
General Debug Techniques
Enable Debug Logging
# Set debug level
export LOG_LEVEL=DEBUG
# Run with verbose output
skill-seekers scrape --config config.json --verbose
# Save logs to file
skill-seekers scrape --config config.json 2>&1 | tee debug.log
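The same LOG_LEVEL variable can drive logging in your own debug scripts. The sketch below shows one way to map it onto Python's logging module; how Skill Seekers itself reads LOG_LEVEL is not shown here, and `level_from_env` is a hypothetical helper.

```python
import logging
import os

def level_from_env(default="INFO"):
    """Translate the LOG_LEVEL environment variable into a logging level."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = logging.getLevelName(name)
    # getLevelName returns a string such as "Level FOO" for unknown names.
    return level if isinstance(level, int) else logging.INFO

logging.basicConfig(
    level=level_from_env(),
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger("debug-sketch").debug("visible only when LOG_LEVEL=DEBUG")
```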
Collect Diagnostic Information
# System info
uname -a
python --version
pip --version
# Package info
pip show skill-seekers
pip list | grep skill
# Environment (output may contain secrets; redact before sharing)
env | grep -E '(API_KEY|TOKEN|PATH)'
# Recent errors
grep -i error /var/log/skillseekers/*.log | tail -20
# Package all diagnostics
tar czf diagnostics.tar.gz \
debug.log \
~/.config/skill-seekers/ \
/var/log/skillseekers/
Test Individual Components
# Test scraper
python -c "
from skill_seekers.cli.doc_scraper import scrape_all
pages = scrape_all('configs/test.json')
print(f'Scraped {len(pages)} pages')
"
# Test GitHub API
python -c "
from skill_seekers.cli.github_fetcher import GitHubFetcher
fetcher = GitHubFetcher()
repo = fetcher.fetch('facebook/react')
print(repo['full_name'])
"
# Test embeddings
python -c "
from skill_seekers.embedding.generator import EmbeddingGenerator
gen = EmbeddingGenerator()
emb = gen.generate('test', model='text-embedding-3-small')
print(f'Embedding dimension: {len(emb)}')
"
Interactive Debugging
# Add breakpoint
import pdb; pdb.set_trace()
# Or use ipdb
import ipdb; ipdb.set_trace()
# Debug with IPython
ipython -i script.py
Getting More Help
If you're still experiencing issues:
- Search existing issues: https://github.com/yusufkaraaslan/Skill_Seekers/issues
- Check documentation: https://skillseekersweb.com/
- Ask on GitHub Discussions: https://github.com/yusufkaraaslan/Skill_Seekers/discussions
- Open a new issue: Include:
  - Skill Seekers version (skill-seekers --version)
  - Python version (python --version)
  - Operating system
  - Complete error message
  - Steps to reproduce
  - Diagnostic information (see above)
Common Error Messages Reference
| Error | Cause | Solution |
|---|---|---|
| ModuleNotFoundError | Package not installed | pip install skill-seekers |
| 401 Unauthorized | Invalid API key | Check API key format |
| 403 Forbidden | Rate limit exceeded | Add more GitHub tokens |
| 404 Not Found | Invalid URL/repo | Verify URL is correct |
| 429 Too Many Requests | API rate limit | Wait or use multiple keys |
| ConnectionError | Network issue | Check internet connection |
| TimeoutError | Request too slow | Increase timeout |
| MemoryError | Out of memory | Reduce batch size |
| PermissionError | Access denied | Check file permissions |
| FileNotFoundError | Missing file | Verify file path |
Still stuck? Open an issue with the "help wanted" label and we'll assist you!