# Troubleshooting Guide

Comprehensive guide for diagnosing and resolving common issues with Skill Seekers.

## Table of Contents

- [Installation Issues](#installation-issues)
- [Configuration Issues](#configuration-issues)
- [Scraping Issues](#scraping-issues)
- [GitHub API Issues](#github-api-issues)
- [API & Enhancement Issues](#api--enhancement-issues)
- [Docker & Kubernetes Issues](#docker--kubernetes-issues)
- [Performance Issues](#performance-issues)
- [Storage Issues](#storage-issues)
- [Network Issues](#network-issues)
- [General Debug Techniques](#general-debug-techniques)

## Installation Issues

### Issue: Package Installation Fails

**Symptoms:**

```
ERROR: Could not build wheels for...
ERROR: Failed building wheel for...
```

**Solutions:**

```bash
# Update pip and setuptools
python -m pip install --upgrade pip setuptools wheel

# Install build dependencies (Ubuntu/Debian)
sudo apt install python3-dev build-essential libssl-dev

# Install build dependencies (RHEL/CentOS)
sudo yum install python3-devel gcc gcc-c++ openssl-devel

# Retry installation
pip install skill-seekers
```

### Issue: Command Not Found After Installation

**Symptoms:**

```bash
$ skill-seekers --version
bash: skill-seekers: command not found
```

**Solutions:**

```bash
# Check if installed
pip show skill-seekers

# Add to PATH
export PATH="$HOME/.local/bin:$PATH"

# Or reinstall with --user flag
pip install --user skill-seekers

# Verify
which skill-seekers
```

### Issue: Python Version Mismatch

**Symptoms:**

```
ERROR: Package requires Python >=3.10 but you are running 3.9
```

**Solutions:**

```bash
# Check Python version
python --version
python3 --version

# Use specific Python version
python3.12 -m pip install skill-seekers

# Create alias
alias python=python3.12

# Or use pyenv
pyenv install 3.12
pyenv global 3.12
```

### Issue: Video Visual Dependencies Missing

**Symptoms:**

```
Missing video dependencies: easyocr
RuntimeError: Required video visual dependencies not installed
```
**Solutions:**

```bash
# Run the GPU-aware setup command
skill-seekers video --setup

# This auto-detects your GPU and installs:
# - PyTorch (correct CUDA/ROCm/CPU variant)
# - easyocr, opencv, pytesseract, scenedetect, faster-whisper
# - yt-dlp, youtube-transcript-api

# Verify installation
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
python -c "import easyocr; print('easyocr OK')"
```

**Common issues:**

- Running outside a virtual environment → `--setup` will warn you; create a venv first
- Missing system packages → Install `tesseract-ocr` and `ffmpeg` for your OS
- AMD GPU without ROCm → Install ROCm first, then re-run `--setup`

## Configuration Issues

### Issue: API Keys Not Recognized

**Symptoms:**

```
Error: ANTHROPIC_API_KEY not found
401 Unauthorized
```

**Solutions:**

```bash
# Check environment variables
env | grep API_KEY

# Set in current session
export ANTHROPIC_API_KEY=sk-ant-...

# Set permanently (~/.bashrc or ~/.zshrc)
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
source ~/.bashrc

# Or use .env file
cat > .env < configs/test.json <
# Check system resources
htop
df -h
```

### Issue: API Cost Concerns

**Symptoms:**

```
Worried about API costs for enhancement
Need free alternative
```

**Solutions:**

```bash
# Use LOCAL mode (free!)
skill-seekers enhance output/react/ --mode LOCAL

# Skip enhancement entirely
skill-seekers scrape --config config.json --skip-enhance

# Estimate cost before enhancing
# Claude API: ~$0.15-$0.30 per skill
# Check usage: https://console.anthropic.com/

# Use batch processing
for dir in output/*/; do
  skill-seekers enhance "$dir" --mode LOCAL --background
done
```

## Docker & Kubernetes Issues

### Issue: Container Won't Start

**Symptoms:**

```
Error response from daemon: Container ... is not running
Container exits immediately
```

**Solutions:**

```bash
# Check logs
docker logs skillseekers-mcp

# Common issues:
# 1. Missing environment variables
docker run -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" ...

# 2. Port already in use (find the process, or map a different host port)
sudo lsof -i :8765
docker run -p 8766:8765 ...

# 3. Permission issues
docker run --user "$(id -u):$(id -g)" ...

# Run interactively to debug
docker run -it --entrypoint /bin/bash skillseekers:latest
```

### Issue: Kubernetes Pod CrashLoopBackOff

**Symptoms:**

```
NAME                   READY   STATUS             RESTARTS
skillseekers-mcp-xxx   0/1     CrashLoopBackOff   5
```

**Solutions:**

```bash
# Check pod logs
kubectl logs -n skillseekers skillseekers-mcp-xxx

# Describe pod
kubectl describe pod -n skillseekers skillseekers-mcp-xxx

# Check events
kubectl get events -n skillseekers --sort-by='.lastTimestamp'

# Common issues:
# 1. Missing secrets
kubectl get secrets -n skillseekers

# 2. Resource constraints
kubectl top nodes
kubectl edit deployment skillseekers-mcp -n skillseekers

# 3. Liveness probe failing
# Increase initialDelaySeconds in deployment
```

### Issue: Image Pull Errors

**Symptoms:**

```
ErrImagePull
ImagePullBackOff
Failed to pull image
```

**Solutions:**

```bash
# Check image exists
docker pull skillseekers:latest

# Create image pull secret
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass \
  -n skillseekers

# Add to deployment spec (YAML, not a shell command):
#   imagePullSecrets:
#     - name: regcred

# Use public image (if available), also in the deployment spec:
#   image: docker.io/skillseekers/skillseekers:latest
```

## Performance Issues

### Issue: High Memory Usage

**Symptoms:**

```
Process killed (OOM)
Memory usage: 8GB+
System swapping
```

**Solutions:**

```bash
# Check memory usage
ps aux --sort=-%mem | head -10
htop

# Reduce batch size
skill-seekers scrape --config config.json --batch-size 10

# Enable memory limits
# Docker:
docker run --memory=4g skillseekers:latest
# Kubernetes (YAML in the container spec):
#   resources:
#     limits:
#       memory: 4Gi

# Clear cache
rm -rf ~/.cache/skill-seekers/

# Use streaming for large files
# (automatically handled by library)
```

### Issue: Slow Performance
**Symptoms:**

```
Operations taking much longer than expected
High CPU usage
Disk I/O bottleneck
```

**Solutions:**

```bash
# Enable async operations
skill-seekers scrape --config config.json --async

# Increase concurrency in the config file (adjust based on resources):
#   { "concurrency": 20 }

# Use SSD for storage
# Move output to SSD:
mv output/ /mnt/ssd/output/

# Monitor performance
# CPU:
mpstat 1
# Disk I/O:
iostat -x 1
# Network:
iftop

# Profile code
python -m cProfile -o profile.stats \
  -m skill_seekers.cli.doc_scraper --config config.json
```

### Issue: Disk Space Issues

**Symptoms:**

```
No space left on device
Disk full
Cannot create file
```

**Solutions:**

```bash
# Check disk usage
df -h
du -sh output/*

# Clean up skill directories older than 30 days
# (top-level entries only, so output/ itself is never removed)
find output/ -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +

# Compress old benchmarks (delete the JSON files only if the archive succeeds)
tar czf benchmarks-archive.tar.gz benchmarks/ && rm -f benchmarks/*.json

# Use cloud storage
skill-seekers scrape --config config.json \
  --storage s3 \
  --bucket my-skills-bucket

# Clear cache
skill-seekers cache --clear
```

## Storage Issues

### Issue: S3 Upload Fails

**Symptoms:**

```
botocore.exceptions.NoCredentialsError
AccessDenied
```

**Solutions:**

```bash
# Check credentials
aws sts get-caller-identity

# Configure AWS CLI
aws configure

# Set environment variables
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1

# Check bucket permissions
aws s3 ls s3://my-bucket/

# Test upload
echo "test" > test.txt
aws s3 cp test.txt s3://my-bucket/
```

### Issue: GCS Authentication Failed

**Symptoms:**

```
google.auth.exceptions.DefaultCredentialsError
Permission denied
```

**Solutions:**

```bash
# Set credentials file
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json

# Or use gcloud auth
gcloud auth application-default login

# Verify permissions
gsutil ls gs://my-bucket/

# Test upload
echo "test" > test.txt
gsutil cp test.txt gs://my-bucket/
```

## Network Issues

### Issue: Connection Timeouts

**Symptoms:**

```
requests.exceptions.ConnectionError
ReadTimeout
Connection refused
```

**Solutions:**

```bash
# Check network connectivity
ping google.com
curl https://docs.example.com/

# Increase timeout in the config file (seconds):
#   { "timeout": 60 }

# Use proxy if behind firewall
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080

# Check DNS resolution
nslookup docs.example.com
dig docs.example.com

# Test with curl
curl -v https://docs.example.com/
```

### Issue: SSL/TLS Errors

**Symptoms:**

```
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED]
SSLCertVerificationError
```

**Solutions:**

```bash
# Update certificates
# Ubuntu/Debian:
sudo apt update && sudo apt install --reinstall ca-certificates
# RHEL/CentOS:
sudo yum reinstall ca-certificates

# As last resort (not recommended for production):
# Note: PYTHONHTTPSVERIFY comes from PEP 493 and is honored only by some
# Python 2.7 builds; standard Python 3 ignores it.
export PYTHONHTTPSVERIFY=0

# Or disable verification for a single command:
skill-seekers scrape --config config.json --no-verify-ssl
```

## General Debug Techniques

### Enable Debug Logging

```bash
# Set debug level
export LOG_LEVEL=DEBUG

# Run with verbose output
skill-seekers scrape --config config.json --verbose

# Save logs to file
skill-seekers scrape --config config.json 2>&1 | tee debug.log
```

### Collect Diagnostic Information

```bash
# System info
uname -a
python --version
pip --version

# Package info
pip show skill-seekers
pip list | grep skill

# Environment (this prints secret values; redact them before sharing)
env | grep -E '(API_KEY|TOKEN|PATH)'

# Recent errors
grep -i error /var/log/skillseekers/*.log | tail -20

# Package all diagnostics (check the config directory for stored keys first)
tar czf diagnostics.tar.gz \
  debug.log \
  ~/.config/skill-seekers/ \
  /var/log/skillseekers/
```

### Test Individual Components

```bash
# Test scraper
python -c "
from skill_seekers.cli.doc_scraper import scrape_all
pages = scrape_all('configs/test.json')
print(f'Scraped {len(pages)} pages')
"

# Test GitHub API
python -c "
from skill_seekers.cli.github_fetcher import GitHubFetcher
fetcher = GitHubFetcher()
repo = fetcher.fetch('facebook/react')
print(repo['full_name'])
"

# Test embeddings
python -c "
from skill_seekers.embedding.generator import EmbeddingGenerator
gen = EmbeddingGenerator()
emb = gen.generate('test', model='text-embedding-3-small')
print(f'Embedding dimension: {len(emb)}')
"
```

### Interactive Debugging

```python
# Add breakpoint
import pdb; pdb.set_trace()

# Or use ipdb
import ipdb; ipdb.set_trace()

# Debug with IPython (run from the shell, not inside Python):
#   ipython -i script.py
```

## Getting More Help

If you're still experiencing issues:

1. **Search existing issues:** https://github.com/yusufkaraaslan/Skill_Seekers/issues
2. **Check documentation:** https://skillseekersweb.com/
3. **Ask on GitHub Discussions:** https://github.com/yusufkaraaslan/Skill_Seekers/discussions
4. **Open a new issue:** Include:
   - Skill Seekers version (`skill-seekers --version`)
   - Python version (`python --version`)
   - Operating system
   - Complete error message
   - Steps to reproduce
   - Diagnostic information (see above)

## Common Error Messages Reference

| Error | Cause | Solution |
|-------|-------|----------|
| `ModuleNotFoundError` | Package not installed | `pip install skill-seekers` |
| `401 Unauthorized` | Invalid API key | Check API key format |
| `403 Forbidden` | Rate limit exceeded | Add more GitHub tokens |
| `404 Not Found` | Invalid URL/repo | Verify URL is correct |
| `429 Too Many Requests` | API rate limit | Wait or use multiple keys |
| `ConnectionError` | Network issue | Check internet connection |
| `TimeoutError` | Request too slow | Increase timeout |
| `MemoryError` | Out of memory | Reduce batch size |
| `PermissionError` | Access denied | Check file permissions |
| `FileNotFoundError` | Missing file | Verify file path |

---

**Still stuck?** Open an issue with the "help wanted" label and we'll assist you!
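
The details requested for a new issue (package version, Python version, operating system, relevant environment variables) can be gathered in one step. The following is a minimal sketch and not part of Skill Seekers: `collect_diagnostics` and `mask` are hypothetical helper names, and the script assumes the package is installed under the distribution name `skill-seekers`, as in `pip install skill-seekers`. Values of variables whose names contain `API_KEY` or `TOKEN` are masked so that keys are not pasted into a public issue.

```python
import os
import platform
import sys
from importlib import metadata


def mask(value: str) -> str:
    """Hide all but the first four characters of a secret value."""
    return value[:4] + "*" * max(len(value) - 4, 0)


def collect_diagnostics(env=None) -> dict:
    """Gather version and environment details for a bug report (hypothetical helper)."""
    env = os.environ if env is None else env
    try:
        # Assumption: the installed distribution is named "skill-seekers".
        package_version = metadata.version("skill-seekers")
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    # Keep only variables that look like credentials, with masked values.
    secrets = {
        name: mask(value)
        for name, value in env.items()
        if "API_KEY" in name or "TOKEN" in name
    }
    return {
        "skill_seekers": package_version,
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "env": secrets,
    }


if __name__ == "__main__":
    for key, value in collect_diagnostics().items():
        print(f"{key}: {value}")
```

Running the script prints one `key: value` line per field, which can be pasted into the issue next to the complete error message and the steps to reproduce.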