skill-seekers-reference/docs/reference/API_REFERENCE.md
yusyus 6f1d0a9a45 docs: Comprehensive markdown documentation update for v2.7.0
Documentation Overhaul (7 new files, ~4,750 lines)

Version Consistency Updates:
- Updated all version references to v2.7.0 (ROADMAP.md)
- Standardized test counts to 1200+ tests (README.md, Quality Assurance)
- Updated MCP tool references to 18 tools (CHANGELOG.md)

New Documentation Files:
1. docs/reference/API_REFERENCE.md (750 lines)
   - Complete programmatic usage guide for Python integration
   - All 8 core APIs documented with examples
   - Configuration schema reference and error handling
   - CI/CD integration examples (GitHub Actions, GitLab CI)
   - Performance optimization and batch processing

2. docs/features/BOOTSTRAP_SKILL.md (450 lines)
   - Self-hosting capability documentation (dogfooding)
   - Architecture and workflow explanation (3 components)
   - Troubleshooting and testing guide
   - CI/CD integration examples
   - Advanced usage and customization

3. docs/reference/CODE_QUALITY.md (550 lines)
   - Comprehensive Ruff linting documentation
   - All 21 v2.7.0 fixes explained with examples
   - Testing requirements and coverage standards
   - CI/CD integration (GitHub Actions, pre-commit hooks)
   - Security scanning with Bandit
   - Development workflow best practices

4. docs/guides/TESTING_GUIDE.md (750 lines)
   - Complete testing reference (1200+ tests)
   - Unit, integration, E2E, and MCP testing guides
   - Coverage analysis and improvement strategies
   - Debugging tests and troubleshooting
   - CI/CD matrix testing (2 OS, 4 Python versions)
   - Best practices and common patterns

5. docs/QUICK_REFERENCE.md (300 lines)
   - One-page cheat sheet for quick lookup
   - All CLI commands with examples
   - Common workflows and shortcuts
   - Environment variables and configurations
   - Tips & tricks for power users

6. docs/guides/MIGRATION_GUIDE.md (400 lines)
   - Version upgrade guides (v1.0.0 → v2.7.0)
   - Breaking changes and migration steps
   - Compatibility tables for all versions
   - Rollback instructions
   - Common migration issues and solutions

7. docs/FAQ.md (550 lines)
   - Comprehensive Q&A covering all major topics
   - Installation, usage, platforms, features
   - Troubleshooting shortcuts
   - Platform-specific questions
   - Advanced usage and programmatic integration

Navigation Improvements:
- Added "New in v2.7.0" section to docs/README.md
- Integrated all new docs into navigation structure
- Enhanced "Finding What You Need" section with new entries
- Updated developer quick links (testing, code quality, API)
- Cross-referenced related documentation

Documentation Quality:
- All version references consistent (v2.7.0)
- Test counts standardized (1200+ tests)
- MCP tool counts accurate (18 tools)
- All internal links validated
- Format consistency maintained
- Proper heading hierarchy

Impact:
- 64 markdown files reviewed and validated
- 7 new documentation files created (~4,750 lines)
- 4 files updated (ROADMAP, README, CHANGELOG, docs/README)
- Comprehensive coverage of all v2.7.0 features
- Enhanced developer onboarding experience
- Improved user documentation accessibility

Related Issues:
- Addresses documentation gaps identified in v2.7.0 planning
- Supports code quality improvements (21 ruff fixes)
- Documents bootstrap skill feature
- Provides migration path for users upgrading from older versions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 01:16:22 +03:00


API Reference - Programmatic Usage

Version: 2.7.0 | Last Updated: 2026-01-18 | Status: Production Ready


Overview

Skill Seekers can be used programmatically for integration into other tools, automation scripts, and CI/CD pipelines. This guide covers the public APIs available for developers who want to embed Skill Seekers functionality into their own applications.

Use Cases:

  • Automated documentation skill generation in CI/CD
  • Batch processing multiple documentation sources
  • Custom skill generation workflows
  • Integration with internal tooling
  • Automated skill updates on documentation changes

Installation

Basic Installation

pip install skill-seekers

With Platform Dependencies

# Google Gemini support
pip install skill-seekers[gemini]

# OpenAI ChatGPT support
pip install skill-seekers[openai]

# All platform support
pip install skill-seekers[all-llms]

Development Installation

git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
pip install -e ".[all-llms]"

Core APIs

1. Documentation Scraping API

Extract content from documentation websites using BFS traversal and smart categorization.
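
To make the traversal idea concrete, here is a minimal sketch of a breadth-first crawl with include/exclude filtering and a page cap. It runs over a hypothetical in-memory link graph (`SITE_LINKS`) instead of real HTTP requests, and `is_allowed` and `bfs_crawl` are illustrative names, not part of Skill Seekers. Substring matching for patterns is an assumption made for brevity.

```python
from collections import deque

# Hypothetical in-memory site: each path maps to the paths it links to.
SITE_LINKS = {
    "/docs/": ["/docs/intro", "/docs/api", "/blog/news"],
    "/docs/intro": ["/docs/api", "/docs/guide"],
    "/docs/api": ["/docs/", "/changelog/"],
    "/docs/guide": [],
    "/blog/news": ["/docs/secret"],
}


def is_allowed(path, include, exclude):
    """Keep a path if it matches an include pattern and no exclude pattern."""
    if any(pattern in path for pattern in exclude):
        return False
    return any(pattern in path for pattern in include)


def bfs_crawl(start, links, include, exclude, max_pages):
    """Visit pages breadth-first, stopping once max_pages have been collected."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        path = queue.popleft()
        visited.append(path)
        for target in links.get(path, []):
            # Enqueue each allowed link exactly once.
            if target not in seen and is_allowed(target, include, exclude):
                seen.add(target)
                queue.append(target)
    return visited


pages = bfs_crawl("/docs/", SITE_LINKS, ["/docs/"], ["/blog/", "/changelog/"], 500)
print(pages)
```

Because excluded pages are never enqueued, anything reachable only through them (here `/docs/secret`) is never visited either.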

Basic Usage

from skill_seekers.cli.doc_scraper import scrape_all, build_skill
import json

# Load configuration
with open('configs/react.json', 'r') as f:
    config = json.load(f)

# Scrape documentation
pages = scrape_all(
    base_url=config['base_url'],
    selectors=config['selectors'],
    config=config,
    output_dir='output/react_data'
)

print(f"Scraped {len(pages)} pages")

# Build skill from scraped data
skill_path = build_skill(
    config_name='react',
    output_dir='output/react',
    data_dir='output/react_data'
)

print(f"Skill created at: {skill_path}")

Advanced Scraping Options

from skill_seekers.cli.doc_scraper import scrape_all

# Custom scraping with advanced options
pages = scrape_all(
    base_url='https://docs.example.com',
    selectors={
        'main_content': 'article',
        'title': 'h1',
        'code_blocks': 'pre code'
    },
    config={
        'name': 'my-framework',
        'description': 'Custom framework documentation',
        'rate_limit': 0.5,  # 0.5 second delay between requests
        'max_pages': 500,   # Limit to 500 pages
        'url_patterns': {
            'include': ['/docs/'],
            'exclude': ['/blog/', '/changelog/']
        }
    },
    output_dir='output/my-framework_data',
    use_async=True  # Enable async scraping (2-3x faster)
)

Rebuilding Without Scraping

from skill_seekers.cli.doc_scraper import build_skill

# Rebuild skill from existing data (fast!)
skill_path = build_skill(
    config_name='react',
    output_dir='output/react',
    data_dir='output/react_data',  # Use existing scraped data
    skip_scrape=True  # Don't re-scrape
)

2. GitHub Repository Analysis API

Analyze GitHub repositories with a three-stream architecture (Code + Docs + Insights).

Basic GitHub Analysis

from skill_seekers.cli.github_scraper import scrape_github_repo

# Analyze GitHub repository
result = scrape_github_repo(
    repo_url='https://github.com/facebook/react',
    output_dir='output/react-github',
    analysis_depth='c3x',  # Options: 'basic' or 'c3x'
    github_token='ghp_...'  # Optional: higher rate limits
)

print(f"Analysis complete: {result['skill_path']}")
print(f"Code files analyzed: {result['stats']['code_files']}")
print(f"Patterns detected: {result['stats']['patterns']}")

Stream-Specific Analysis

from skill_seekers.cli.github_scraper import scrape_github_repo

# Focus on specific streams
result = scrape_github_repo(
    repo_url='https://github.com/vercel/next.js',
    output_dir='output/nextjs',
    analysis_depth='c3x',
    enable_code_stream=True,      # C3.x codebase analysis
    enable_docs_stream=True,      # README, docs/, wiki
    enable_insights_stream=True,  # GitHub metadata, issues
    include_tests=True,           # Extract test examples
    include_patterns=True,        # Detect design patterns
    include_how_to_guides=True    # Generate guides from tests
)

3. PDF Extraction API

Extract content from PDF documents with OCR and image support.

Basic PDF Extraction

from skill_seekers.cli.pdf_scraper import scrape_pdf

# Extract from single PDF
skill_path = scrape_pdf(
    pdf_path='documentation.pdf',
    output_dir='output/pdf-skill',
    skill_name='my-pdf-skill',
    description='Documentation from PDF'
)

print(f"PDF skill created: {skill_path}")

Advanced PDF Processing

from skill_seekers.cli.pdf_scraper import scrape_pdf

# PDF extraction with all features
skill_path = scrape_pdf(
    pdf_path='large-manual.pdf',
    output_dir='output/manual',
    skill_name='product-manual',
    description='Product manual documentation',
    enable_ocr=True,              # OCR for scanned PDFs
    extract_images=True,          # Extract embedded images
    extract_tables=True,          # Parse tables
    chunk_size=50,                # Pages per chunk (large PDFs)
    language='eng',               # OCR language
    dpi=300                       # Image DPI for OCR
)

4. Unified Multi-Source Scraping API

Combine multiple sources (docs + GitHub + PDF) into a single unified skill.

Unified Scraping

from skill_seekers.cli.unified_scraper import unified_scrape

# Scrape from multiple sources
result = unified_scrape(
    config_path='configs/unified/react-unified.json',
    output_dir='output/react-complete'
)

print(f"Unified skill created: {result['skill_path']}")
print(f"Sources merged: {result['sources']}")
print(f"Conflicts detected: {result['conflicts']}")

Conflict Detection

from skill_seekers.cli.unified_scraper import detect_conflicts

# Detect discrepancies between sources
conflicts = detect_conflicts(
    docs_dir='output/react_data',
    github_dir='output/react-github',
    pdf_dir='output/react-pdf'
)

for conflict in conflicts:
    print(f"Conflict in {conflict['topic']}:")
    print(f"  Docs say: {conflict['docs_version']}")
    print(f"  Code shows: {conflict['code_version']}")
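
As a rough illustration of what a conflict record represents, the sketch below compares two hypothetical dictionaries of extracted facts, one from documentation and one from code. The function name `find_conflicts` and the sample data are assumptions. Only the record keys (`topic`, `docs_version`, `code_version`) mirror the example above.

```python
def find_conflicts(docs_facts, code_facts):
    """Return one record per topic where the two sources disagree."""
    conflicts = []
    # Only topics present in both sources can conflict.
    for topic in sorted(docs_facts.keys() & code_facts.keys()):
        if docs_facts[topic] != code_facts[topic]:
            conflicts.append({
                "topic": topic,
                "docs_version": docs_facts[topic],
                "code_version": code_facts[topic],
            })
    return conflicts


# Hypothetical facts extracted from each source.
docs_facts = {"useState": "returns [state, setState]", "render": "render(element, container)"}
code_facts = {"useState": "returns [state, setState]", "render": "render(element, container, callback)"}

for conflict in find_conflicts(docs_facts, code_facts):
    print(f"Conflict in {conflict['topic']}:")
    print(f"  Docs say: {conflict['docs_version']}")
    print(f"  Code shows: {conflict['code_version']}")
```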

5. Skill Packaging API

Package skills for different LLM platforms using the platform adaptor architecture.
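
One common way to organize such an adaptor layer is a registry that maps platform names to classes sharing a small interface. The sketch below is a hypothetical illustration of that pattern, not the actual Skill Seekers implementation: `PackagerSketch`, `get_packager`, and the file-name scheme are assumptions chosen to match the package names used elsewhere in this guide.

```python
from abc import ABC, abstractmethod
from pathlib import PurePosixPath


class PackagerSketch(ABC):
    """Hypothetical base class: one subclass per target platform."""

    extension = ""

    @abstractmethod
    def platform(self):
        """Return the platform identifier, e.g. 'claude'."""

    def package_name(self, skill_dir):
        # Derive "<skill>-<platform><extension>" from the skill directory name.
        skill = PurePosixPath(skill_dir).name
        return f"{skill}-{self.platform()}{self.extension}"


class ClaudePackager(PackagerSketch):
    extension = ".zip"

    def platform(self):
        return "claude"


class GeminiPackager(PackagerSketch):
    extension = ".tar.gz"

    def platform(self):
        return "gemini"


_REGISTRY = {"claude": ClaudePackager, "gemini": GeminiPackager}


def get_packager(name):
    """Look up a packager by platform name and fail clearly on unknown names."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown platform {name!r}; expected one of {sorted(_REGISTRY)}") from None


for name in ("claude", "gemini"):
    print(get_packager(name).package_name("output/react/"))
```

Calling code only depends on the shared interface, so supporting a new platform means adding one class and one registry entry.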

Basic Packaging

from skill_seekers.cli.adaptors import get_adaptor

# Get platform-specific adaptor
adaptor = get_adaptor('claude')  # Options: claude, gemini, openai, markdown

# Package skill
package_path = adaptor.package(
    skill_dir='output/react/',
    output_path='output/'
)

print(f"Claude skill package: {package_path}")

Multi-Platform Packaging

from skill_seekers.cli.adaptors import get_adaptor

# Package for all platforms
platforms = ['claude', 'gemini', 'openai', 'markdown']

for platform in platforms:
    adaptor = get_adaptor(platform)
    package_path = adaptor.package(
        skill_dir='output/react/',
        output_path='output/'
    )
    print(f"{platform.capitalize()} package: {package_path}")

Custom Packaging Options

from skill_seekers.cli.adaptors import get_adaptor

adaptor = get_adaptor('gemini')

# Gemini-specific packaging (.tar.gz format)
package_path = adaptor.package(
    skill_dir='output/react/',
    output_path='output/',
    compress_level=9,  # Maximum compression
    include_metadata=True
)

6. Skill Upload API

Upload packaged skills to LLM platforms via their APIs.

Claude AI Upload

import os
from skill_seekers.cli.adaptors import get_adaptor

adaptor = get_adaptor('claude')

# Upload to Claude AI
result = adaptor.upload(
    package_path='output/react-claude.zip',
    api_key=os.getenv('ANTHROPIC_API_KEY')
)

print(f"Uploaded to Claude AI: {result['skill_id']}")

Google Gemini Upload

import os
from skill_seekers.cli.adaptors import get_adaptor

adaptor = get_adaptor('gemini')

# Upload to Google Gemini
result = adaptor.upload(
    package_path='output/react-gemini.tar.gz',
    api_key=os.getenv('GOOGLE_API_KEY')
)

print(f"Gemini corpus ID: {result['corpus_id']}")

OpenAI ChatGPT Upload

import os
from skill_seekers.cli.adaptors import get_adaptor

adaptor = get_adaptor('openai')

# Upload to OpenAI Vector Store
result = adaptor.upload(
    package_path='output/react-openai.zip',
    api_key=os.getenv('OPENAI_API_KEY')
)

print(f"Vector store ID: {result['vector_store_id']}")

7. AI Enhancement API

Enhance skills with AI-powered improvements using platform-specific models.

API Mode Enhancement

import os
from skill_seekers.cli.adaptors import get_adaptor

adaptor = get_adaptor('claude')

# Enhance using Claude API
result = adaptor.enhance(
    skill_dir='output/react/',
    mode='api',
    api_key=os.getenv('ANTHROPIC_API_KEY')
)

print(f"Enhanced skill: {result['enhanced_path']}")
print(f"Quality score: {result['quality_score']}/10")

LOCAL Mode Enhancement

from skill_seekers.cli.adaptors import get_adaptor

adaptor = get_adaptor('claude')

# Enhance using Claude Code CLI (free!)
result = adaptor.enhance(
    skill_dir='output/react/',
    mode='LOCAL',
    execution_mode='headless',  # Options: headless, background, daemon
    timeout=300  # 5 minute timeout
)

print(f"Enhanced skill: {result['enhanced_path']}")

Background Enhancement with Monitoring

from skill_seekers.cli.enhance_skill_local import enhance_skill
from skill_seekers.cli.enhance_status import monitor_enhancement
import time

# Start background enhancement
result = enhance_skill(
    skill_dir='output/react/',
    mode='background'
)

pid = result['pid']
print(f"Enhancement started in background (PID: {pid})")

# Monitor progress
while True:
    status = monitor_enhancement('output/react/')
    print(f"Status: {status['state']}, Progress: {status['progress']}%")

    if status['state'] == 'completed':
        print(f"Enhanced skill: {status['output_path']}")
        break
    elif status['state'] == 'failed':
        print(f"Enhancement failed: {status['error']}")
        break

    time.sleep(5)  # Check every 5 seconds
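
The loop above runs until the job reports a terminal state. In unattended environments such as CI it is usually safer to bound the wait. The following sketch shows one way to add a deadline. `poll_until_done` is a hypothetical helper, and the status source is a stand-in callable rather than `monitor_enhancement`.

```python
import time


def poll_until_done(get_status, timeout=600.0, interval=5.0, sleep=time.sleep):
    """Poll get_status() until it reports a terminal state or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status["state"] in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no terminal state after {timeout} seconds")
        sleep(interval)


# Stand-in status source that finishes on the third poll.
states = iter([
    {"state": "running", "progress": 40},
    {"state": "running", "progress": 80},
    {"state": "completed", "progress": 100},
])

# The sleep function is replaced with a no-op so the demo finishes immediately.
final = poll_until_done(lambda: next(states), timeout=60.0, sleep=lambda seconds: None)
print(final["state"], final["progress"])
```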

8. Complete Workflow Automation API

Automate the entire workflow: fetch config → scrape → enhance → package → upload.
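
The stage order can be pictured as a simple pipeline in which optional stages are skipped by flag. The sketch below uses placeholder stage functions that only record what they would do. All names here (`STAGES`, `run_pipeline`, and the stage bodies) are hypothetical and do not reflect how `install_skill` is implemented.

```python
def fetch_config(ctx):
    ctx["config"] = {"name": ctx["config_name"]}
    return ctx


def scrape(ctx):
    ctx["pages"] = [f"{ctx['config']['name']}/page-{i}" for i in range(3)]
    return ctx


def enhance(ctx):
    ctx["enhanced"] = True
    return ctx


def package(ctx):
    ctx["package_path"] = f"output/{ctx['config']['name']}.zip"
    return ctx


def upload(ctx):
    ctx["uploaded"] = True
    return ctx


# Stages run in this fixed order: fetch config, scrape, enhance, package, upload.
STAGES = [
    ("fetch_config", fetch_config),
    ("scrape", scrape),
    ("enhance", enhance),
    ("package", package),
    ("upload", upload),
]


def run_pipeline(config_name, enhance_enabled=True, upload_enabled=True):
    """Run every stage in order, skipping the optional ones that are disabled."""
    skipped = set()
    if not enhance_enabled:
        skipped.add("enhance")
    if not upload_enabled:
        skipped.add("upload")
    ctx = {"config_name": config_name, "ran": []}
    for name, stage in STAGES:
        if name in skipped:
            continue
        ctx = stage(ctx)
        ctx["ran"].append(name)
    return ctx


outcome = run_pipeline("react", enhance_enabled=False, upload_enabled=False)
print(outcome["ran"], outcome["package_path"])
```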

One-Command Install

import os
from skill_seekers.cli.install_skill import install_skill

# Complete workflow automation
result = install_skill(
    config_name='react',  # Use preset config
    target='claude',      # Target platform
    api_key=os.getenv('ANTHROPIC_API_KEY'),
    enhance=True,         # Enable AI enhancement
    upload=True,          # Upload to platform
    force=True            # Skip confirmations
)

print(f"Skill installed: {result['skill_id']}")
print(f"Package path: {result['package_path']}")
print(f"Time taken: {result['duration']}s")

Custom Config Install

import os
from skill_seekers.cli.install_skill import install_skill

# Install with custom configuration
result = install_skill(
    config_path='configs/custom/my-framework.json',
    target='gemini',
    api_key=os.getenv('GOOGLE_API_KEY'),
    enhance=True,
    upload=True,
    analysis_depth='c3x',  # Deep codebase analysis
    enable_router=True     # Generate router for large docs
)

Configuration Objects

Config Schema

Skill Seekers uses JSON configuration files to define scraping behavior.

{
  "name": "framework-name",
  "description": "When to use this skill",
  "base_url": "https://docs.example.com/",
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code",
    "navigation": "nav.sidebar"
  },
  "url_patterns": {
    "include": ["/docs/", "/api/", "/guides/"],
    "exclude": ["/blog/", "/changelog/", "/archive/"]
  },
  "categories": {
    "getting_started": ["intro", "quickstart", "installation"],
    "api": ["api", "reference", "methods"],
    "guides": ["guide", "tutorial", "how-to"],
    "examples": ["example", "demo", "sample"]
  },
  "rate_limit": 0.5,
  "max_pages": 500,
  "llms_txt_url": "https://example.com/llms.txt",
  "enable_async": true
}

Required Fields

| Field | Type | Description |
|-------|------|-------------|
| name | string | Skill name (alphanumeric + hyphens) |
| description | string | When to use this skill |
| base_url | string | Documentation website URL |
| selectors | object | CSS selectors for content extraction |

Optional Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| url_patterns.include | array | [] | URL path patterns to include |
| url_patterns.exclude | array | [] | URL path patterns to exclude |
| categories | object | {} | Category keywords mapping |
| rate_limit | float | 0.5 | Delay between requests (seconds) |
| max_pages | int | 500 | Maximum pages to scrape |
| llms_txt_url | string | null | URL to llms.txt file |
| enable_async | bool | false | Enable async scraping (faster) |
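
The two field tables can be turned into a small pre-flight check before a long scrape. This is a minimal sketch that uses only the field names, types, and defaults listed above. `check_config` and `with_defaults` are hypothetical helpers, not the library's own validator.

```python
import copy

# Required fields and their expected Python types, taken from the table above.
REQUIRED_FIELDS = {"name": str, "description": str, "base_url": str, "selectors": dict}

# Optional fields and their documented defaults.
OPTIONAL_DEFAULTS = {
    "url_patterns": {"include": [], "exclude": []},
    "categories": {},
    "rate_limit": 0.5,
    "max_pages": 500,
    "llms_txt_url": None,
    "enable_async": False,
}


def check_config(config):
    """Return a list of problems; an empty list means the required fields look fine."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in config:
            errors.append(f"missing required field: {field}")
        elif not isinstance(config[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    return errors


def with_defaults(config):
    """Fill in documented defaults; a supplied top-level key replaces its default entirely."""
    merged = copy.deepcopy(OPTIONAL_DEFAULTS)
    merged.update(config)
    return merged


config = {
    "name": "framework-name",
    "description": "When to use this skill",
    "base_url": "https://docs.example.com/",
    "selectors": {"main_content": "article"},
}
print(check_config(config))
print(with_defaults(config)["max_pages"])
```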

Unified Config Schema (Multi-Source)

{
  "name": "framework-unified",
  "description": "Complete framework documentation",
  "sources": {
    "documentation": {
      "type": "docs",
      "base_url": "https://docs.example.com/",
      "selectors": { "main_content": "article" }
    },
    "github": {
      "type": "github",
      "repo_url": "https://github.com/org/repo",
      "analysis_depth": "c3x"
    },
    "pdf": {
      "type": "pdf",
      "pdf_path": "manual.pdf",
      "enable_ocr": true
    }
  },
  "conflict_resolution": "prefer_code",
  "merge_strategy": "smart"
}

Advanced Options

Custom Selectors

from skill_seekers.cli.doc_scraper import scrape_all

# Custom CSS selectors for complex sites
pages = scrape_all(
    base_url='https://complex-site.com',
    selectors={
        'main_content': 'div.content-wrapper > article',
        'title': 'h1.page-title',
        'code_blocks': 'pre.highlight code',
        'navigation': 'aside.sidebar nav',
        'metadata': 'meta[name="description"]'
    },
    config={'name': 'complex-site'}
)

URL Pattern Matching

# Advanced URL filtering
config = {
    'url_patterns': {
        'include': [
            '/docs/',           # Exact path match
            '/api/**',          # Wildcard: all subpaths
            '/guides/v2.*'      # Regex: version-specific
        ],
        'exclude': [
            '/blog/',
            '/changelog/',
            '**/*.png',         # Exclude images
            '**/*.pdf'          # Exclude PDFs
        ]
    }
}
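
How include and exclude lists combine can be sketched with the standard library alone. The example below is an approximation under stated assumptions: patterns containing `*` are treated as shell-style globs via `fnmatch`, everything else as a substring test, excludes win over includes, and an empty include list allows every path. The library's actual matching rules may differ, and `matches` and `should_scrape` are hypothetical names.

```python
from fnmatch import fnmatchcase


def matches(path, pattern):
    """Glob patterns (containing '*') use fnmatch; anything else is a substring test."""
    if "*" in pattern:
        return fnmatchcase(path, pattern)
    return pattern in path


def should_scrape(path, include, exclude):
    """Excludes take priority; an empty include list allows every remaining path."""
    if any(matches(path, pattern) for pattern in exclude):
        return False
    return not include or any(matches(path, pattern) for pattern in include)


include = ["/docs/", "/api/**"]
exclude = ["/blog/", "/changelog/", "**/*.png", "**/*.pdf"]

for path in ["/docs/intro", "/api/users/list", "/docs/img/logo.png", "/blog/post", "/pricing"]:
    print(path, should_scrape(path, include, exclude))
```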

Category Inference

from skill_seekers.cli.doc_scraper import infer_categories

# Auto-detect categories from URL structure
categories = infer_categories(
    pages=[
        {'url': 'https://docs.example.com/getting-started/intro'},
        {'url': 'https://docs.example.com/api/authentication'},
        {'url': 'https://docs.example.com/guides/tutorial'}
    ]
)

print(categories)
# Output: {
#   'getting-started': ['intro'],
#   'api': ['authentication'],
#   'guides': ['tutorial']
# }
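
The mapping shown in the output comment can be reproduced with a few lines of URL parsing. This sketch assumes the category is the first path segment and the entry is the last one. `infer_categories_sketch` is a hypothetical stand-in, and the real function may use additional signals.

```python
from collections import defaultdict
from urllib.parse import urlparse


def infer_categories_sketch(pages):
    """Group pages by first path segment, recording each page's last segment."""
    categories = defaultdict(list)
    for page in pages:
        segments = [part for part in urlparse(page["url"]).path.split("/") if part]
        # Pages directly under the site root have no category segment.
        if len(segments) >= 2:
            categories[segments[0]].append(segments[-1])
    return dict(categories)


pages = [
    {"url": "https://docs.example.com/getting-started/intro"},
    {"url": "https://docs.example.com/api/authentication"},
    {"url": "https://docs.example.com/guides/tutorial"},
]
print(infer_categories_sketch(pages))
```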

Error Handling

Common Exceptions

from skill_seekers.cli.doc_scraper import scrape_all
from skill_seekers.exceptions import (
    NetworkError,
    InvalidConfigError,
    ScrapingError,
    RateLimitError
)

try:
    pages = scrape_all(
        base_url='https://docs.example.com',
        selectors={'main_content': 'article'},
        config={'name': 'example'}
    )
except NetworkError as e:
    print(f"Network error: {e}")
    # Retry with exponential backoff
except InvalidConfigError as e:
    print(f"Invalid config: {e}")
    # Fix configuration and retry
except RateLimitError as e:
    print(f"Rate limited: {e}")
    # Increase rate_limit in config
except ScrapingError as e:
    print(f"Scraping failed: {e}")
    # Check selectors and URL patterns

Retry Logic

from skill_seekers.cli.doc_scraper import scrape_all
from skill_seekers.utils import retry_with_backoff

@retry_with_backoff(max_retries=3, base_delay=1.0)
def scrape_with_retry(base_url, config):
    return scrape_all(
        base_url=base_url,
        selectors=config['selectors'],
        config=config
    )

# Automatically retries on network errors
pages = scrape_with_retry(
    base_url='https://docs.example.com',
    config={'name': 'example', 'selectors': {...}}
)
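
For readers curious about what a backoff decorator typically does, here is a self-contained sketch. It is not the source of `retry_with_backoff`: the name `retry_backoff_sketch`, the `retry_on` and `sleep` parameters, and the doubling schedule are all assumptions. The sleep function is injectable so the delays can be observed without waiting.

```python
import functools
import time


def retry_backoff_sketch(max_retries=3, base_delay=1.0, retry_on=(ConnectionError,), sleep=time.sleep):
    """Retry the wrapped function, doubling the delay after each failed attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries:
                        raise  # Out of retries: surface the last error.
                    sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


delays = []          # Records the requested delays instead of sleeping.
calls = {"count": 0}


@retry_backoff_sketch(max_retries=3, base_delay=1.0, sleep=delays.append)
def flaky_fetch():
    """Fail twice with a transient error, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky_fetch()
print(result, delays)
```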

Testing Your Integration

Unit Tests

import pytest
from skill_seekers.cli.doc_scraper import scrape_all

def test_basic_scraping():
    """Test basic documentation scraping."""
    pages = scrape_all(
        base_url='https://docs.example.com',
        selectors={'main_content': 'article'},
        config={
            'name': 'test-framework',
            'max_pages': 10  # Limit for testing
        }
    )

    assert len(pages) > 0
    assert all('title' in p for p in pages)
    assert all('content' in p for p in pages)

def test_config_validation():
    """Test configuration validation."""
    from skill_seekers.cli.config_validator import validate_config

    config = {
        'name': 'test',
        'description': 'Test configuration for validation',
        'base_url': 'https://example.com',
        'selectors': {'main_content': 'article'}
    }

    is_valid, errors = validate_config(config)
    assert is_valid
    assert len(errors) == 0

Integration Tests

import pytest
import os
from skill_seekers.cli.install_skill import install_skill

@pytest.mark.integration
def test_end_to_end_workflow():
    """Test complete skill installation workflow."""
    result = install_skill(
        config_name='react',
        target='markdown',  # No API key needed for markdown
        enhance=False,      # Skip AI enhancement
        upload=False,       # Don't upload
        force=True
    )

    assert result['success']
    assert os.path.exists(result['package_path'])
    assert result['package_path'].endswith('.zip')

@pytest.mark.integration
def test_multi_platform_packaging():
    """Test packaging for multiple platforms."""
    from skill_seekers.cli.adaptors import get_adaptor

    platforms = ['claude', 'gemini', 'openai', 'markdown']

    for platform in platforms:
        adaptor = get_adaptor(platform)
        package_path = adaptor.package(
            skill_dir='output/test-skill/',
            output_path='output/'
        )
        assert os.path.exists(package_path)

Performance Optimization

Async Scraping

from skill_seekers.cli.doc_scraper import scrape_all

# Enable async for 2-3x speed improvement
pages = scrape_all(
    base_url='https://docs.example.com',
    selectors={'main_content': 'article'},
    config={'name': 'example'},
    use_async=True  # 2-3x faster
)

Caching and Rebuilding

from skill_seekers.cli.doc_scraper import build_skill

# First scrape (slow - 15-45 minutes)
build_skill(config_name='react', output_dir='output/react')

# Rebuild without re-scraping (fast - <1 minute)
build_skill(
    config_name='react',
    output_dir='output/react',
    data_dir='output/react_data',
    skip_scrape=True  # Use cached data
)

Batch Processing

from concurrent.futures import ThreadPoolExecutor
from skill_seekers.cli.install_skill import install_skill

configs = ['react', 'vue', 'angular', 'svelte']

def install_config(config_name):
    return install_skill(
        config_name=config_name,
        target='markdown',
        enhance=False,
        upload=False,
        force=True
    )

# Process 4 configs in parallel
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(install_config, configs))

for config, result in zip(configs, results):
    print(f"{config}: {result['success']}")
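
With `executor.map`, an exception in one job is re-raised while iterating the results, so the outcomes of later jobs are lost. A possible alternative, sketched below, collects a per-job outcome with `submit` and `as_completed`. `run_batch` and `fake_install` are hypothetical, and `fake_install` stands in for the real install call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(names, task, max_workers=4):
    """Run task(name) for every name in parallel and record success or failure per name."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(task, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"success": True, "value": future.result()}
            except Exception as exc:  # One failing job must not abort the batch.
                results[name] = {"success": False, "error": str(exc)}
    return results


def fake_install(name):
    """Stand-in for a real install call; fails for one config to show error capture."""
    if name == "angular":
        raise RuntimeError("config not found")
    return f"output/{name}.zip"


batch = run_batch(["react", "vue", "angular", "svelte"], fake_install)
for name in sorted(batch):
    print(name, batch[name])
```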

CI/CD Integration Examples

GitHub Actions

name: Generate Skills

on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight
  workflow_dispatch:

jobs:
  generate-skills:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install Skill Seekers
        run: pip install skill-seekers[all-llms]

      - name: Generate Skills
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
        run: |
          skill-seekers install react --target claude --enhance --upload
          skill-seekers install vue --target gemini --enhance --upload

      - name: Archive Skills
        uses: actions/upload-artifact@v4
        with:
          name: skills
          path: |
            output/**/*.zip
            output/**/*.tar.gz

GitLab CI

generate_skills:
  image: python:3.11
  script:
    - pip install skill-seekers[all-llms]
    - skill-seekers install react --target claude --enhance --upload
    - skill-seekers install vue --target gemini --enhance --upload
  artifacts:
    paths:
      - output/
  only:
    - schedules

Best Practices

1. Use Configuration Files

Store configs in version control for reproducibility:

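import json
from skill_seekers.cli.doc_scraper import scrape_all

with open('configs/my-framework.json') as f:
    config = json.load(f)
pages = scrape_all(base_url=config['base_url'], selectors=config['selectors'], config=config)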

2. Enable Async for Large Sites

pages = scrape_all(base_url=url, selectors=config['selectors'], config=config, use_async=True)

3. Cache Scraped Data

# Scrape once
scrape_all(base_url=config['base_url'], selectors=config['selectors'], config=config, output_dir='output/data')

# Rebuild many times (fast!)
build_skill(config_name='framework', data_dir='output/data', skip_scrape=True)

4. Use Platform Adaptors

# Good: Platform-agnostic
adaptor = get_adaptor(target_platform)
adaptor.package(skill_dir)

# Bad: Hardcoded for one platform
# create_zip_for_claude(skill_dir)

5. Handle Errors Gracefully

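from skill_seekers.exceptions import NetworkError, InvalidConfigError

try:
    result = install_skill(config_name='framework', target='claude')
except NetworkError as e:
    # Transient failure: retry, ideally with exponential backoff
    print(f"Network error, retry later: {e}")
except InvalidConfigError as e:
    # Permanent failure: fix the configuration before rerunning
    print(f"Invalid config: {e}")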

6. Monitor Background Enhancements

# Start enhancement
enhance_skill(skill_dir='output/react/', mode='background')

# Monitor progress
monitor_enhancement('output/react/', watch=True)

API Reference Summary

| API | Module | Use Case |
|-----|--------|----------|
| Documentation Scraping | doc_scraper | Extract from docs websites |
| GitHub Analysis | github_scraper | Analyze code repositories |
| PDF Extraction | pdf_scraper | Extract from PDF files |
| Unified Scraping | unified_scraper | Multi-source scraping |
| Skill Packaging | adaptors | Package for LLM platforms |
| Skill Upload | adaptors | Upload to platforms |
| AI Enhancement | adaptors | Improve skill quality |
| Complete Workflow | install_skill | End-to-end automation |
