# Test Example Extraction (C3.2)

**Transform test files into documentation assets by extracting real API usage patterns**

## Overview

The Test Example Extractor analyzes test files to automatically extract meaningful usage examples showing:

- **Object Instantiation**: Real parameter values and configuration
- **Method Calls**: Expected behaviors and return values
- **Configuration Examples**: Valid configuration dictionaries
- **Setup Patterns**: Initialization from setUp() methods and pytest fixtures
- **Multi-Step Workflows**: Integration test sequences

### Supported Languages (9)

| Language | Extraction Method | Supported Features |
|----------|------------------|-------------------|
| **Python** | AST-based (deep) | All categories, high accuracy |
| JavaScript | Regex patterns | Instantiation, assertions, configs |
| TypeScript | Regex patterns | Instantiation, assertions, configs |
| Go | Regex patterns | Table tests, assertions |
| Rust | Regex patterns | Test macros, assertions |
| Java | Regex patterns | JUnit patterns |
| C# | Regex patterns | xUnit patterns |
| PHP | Regex patterns | PHPUnit patterns |
| Ruby | Regex patterns | RSpec patterns |

## Quick Start

### CLI Usage

```bash
# Extract from directory
skill-seekers extract-test-examples tests/ --language python

# Extract from single file
skill-seekers extract-test-examples --file tests/test_scraper.py

# JSON output
skill-seekers extract-test-examples tests/ --json > examples.json

# Markdown output
skill-seekers extract-test-examples tests/ --markdown > examples.md

# Filter by confidence
skill-seekers extract-test-examples tests/ --min-confidence 0.7

# Limit examples per file
skill-seekers extract-test-examples tests/ --max-per-file 5
```

### MCP Tool Usage

```python
# From Claude Code
extract_test_examples(directory="tests/", language="python")

# Single file with JSON output
extract_test_examples(file="tests/test_api.py", json=True)

# High confidence only
extract_test_examples(directory="tests/", min_confidence=0.7)
```

### Codebase Integration

```bash
# Combine with codebase analysis
skill-seekers analyze --directory . --extract-test-examples
```

## Output Formats

### JSON Schema

```json
{
  "total_examples": 42,
  "examples_by_category": {
    "instantiation": 15,
    "method_call": 12,
    "config": 8,
    "setup": 4,
    "workflow": 3
  },
  "examples_by_language": {
    "Python": 42
  },
  "avg_complexity": 0.65,
  "high_value_count": 28,
  "examples": [
    {
      "example_id": "a3f2b1c0",
      "test_name": "test_database_connection",
      "category": "instantiation",
      "code": "db = Database(host=\"localhost\", port=5432)",
      "language": "Python",
      "description": "Instantiate Database: Test database connection",
      "expected_behavior": "self.assertTrue(db.connect())",
      "setup_code": null,
      "file_path": "tests/test_db.py",
      "line_start": 15,
      "line_end": 15,
      "complexity_score": 0.6,
      "confidence": 0.85,
      "tags": ["unittest"],
      "dependencies": ["unittest", "database"]
    }
  ]
}
```

### Markdown Format

````markdown
# Test Example Extraction Report

**Total Examples**: 42
**High Value Examples** (confidence > 0.7): 28
**Average Complexity**: 0.65

## Examples by Category

- **instantiation**: 15
- **method_call**: 12
- **config**: 8
- **setup**: 4
- **workflow**: 3

## Extracted Examples

### test_database_connection

**Category**: instantiation
**Description**: Instantiate Database: Test database connection
**Expected**: self.assertTrue(db.connect())
**Confidence**: 0.85
**Tags**: unittest

```python
db = Database(host="localhost", port=5432)
```

*Source: tests/test_db.py:15*
````

## Extraction Categories

### 1. Instantiation

**Extracts**: Object creation with real parameters

```python
# Example from test
db = Database(
    host="localhost",
    port=5432,
    user="admin",
    password="secret"
)
```

**Use Case**: Shows valid initialization parameters

### 2. Method Call

**Extracts**: Method calls followed by assertions

```python
# Example from test
response = api.get("/users/1")
assert response.status_code == 200
```

**Use Case**: Demonstrates expected behavior

### 3. Config

**Extracts**: Configuration dictionaries (2+ keys)

```python
# Example from test
config = {
    "debug": True,
    "database_url": "postgresql://localhost/test",
    "cache_enabled": False
}
```

**Use Case**: Shows valid configuration examples

### 4. Setup

**Extracts**: setUp() methods and pytest fixtures

```python
# Example from setUp
self.client = APIClient(api_key="test-key")
self.client.connect()
```

**Use Case**: Demonstrates initialization sequences

### 5. Workflow

**Extracts**: Multi-step integration tests (3+ steps)

```python
# Example workflow
user = User(name="John", email="john@example.com")
user.save()
user.verify()
session = user.login(password="secret")
assert session.is_active
```

**Use Case**: Shows complete usage patterns

## Quality Filtering

### Confidence Scoring (0.0 - 1.0)

- **Instantiation**: 0.8 (high - clear object creation)
- **Method Call + Assertion**: 0.85 (very high - behavior proven)
- **Config Dict**: 0.75 (good - clear configuration)
- **Workflow**: 0.9 (excellent - complete pattern)

### Automatic Filtering

**Removes**:

- Trivial patterns: `assertTrue(True)`, `assertEqual(1, 1)`
- Mock-only code: `Mock()`, `MagicMock()`
- Too short: < 20 characters
- Empty constructors: `MyClass()` with no parameters

**Adjustable Thresholds**:

```bash
# High confidence only (0.7+)
--min-confidence 0.7

# Allow lower confidence for discovery
--min-confidence 0.4
```

## Use Cases

### 1. Enhanced Documentation

**Problem**: Documentation often lacks real usage examples

**Solution**: Extract examples from working tests

```bash
# Generate examples for SKILL.md
skill-seekers extract-test-examples tests/ --markdown >> SKILL.md
```

### 2. API Understanding

**Problem**: New developers struggle with API usage

**Solution**: Show how APIs are actually tested

### 3. Tutorial Generation

**Problem**: Creating step-by-step guides is time-consuming

**Solution**: Use workflow examples as tutorial steps

### 4. Configuration Examples

**Problem**: Valid configuration is unclear

**Solution**: Extract config dictionaries from tests

## Architecture

### Core Components

```
TestExampleExtractor (Orchestrator)
├── PythonTestAnalyzer (AST-based)
│   ├── extract_from_test_class()
│   ├── extract_from_test_function()
│   ├── _find_instantiations()
│   ├── _find_method_calls_with_assertions()
│   ├── _find_config_dicts()
│   └── _find_workflows()
├── GenericTestAnalyzer (Regex-based)
│   └── PATTERNS (per-language regex)
└── ExampleQualityFilter
    ├── filter()
    └── _is_trivial()
```

### Data Flow

1. **Find Test Files**: Glob patterns (`test_*.py`, `*_test.go`, etc.)
2. **Detect Language**: File extension mapping
3. **Extract Examples**:
   - Python → PythonTestAnalyzer (AST)
   - Others → GenericTestAnalyzer (Regex)
4. **Apply Quality Filter**: Remove trivial patterns
5. **Limit Per File**: Top N by confidence
6. **Generate Report**: JSON or Markdown

## Limitations

### Current Scope

- **Python**: Full AST-based extraction (all categories)
- **Other Languages**: Regex-based (limited to common patterns)
- **Focus**: Test files only (not production code)
- **Complexity**: Simple to moderate test patterns

### Not Extracted

- Complex mocking setups
- Parameterized tests (partial support)
- Nested helper functions
- Dynamically generated tests

### Future Enhancements (Roadmap C3.3-C3.5)

- C3.3: Build 'how to' guides from workflow examples
- C3.4: Extract configuration patterns
- C3.5: Architectural overview from test coverage

## Troubleshooting

### No Examples Extracted

**Symptom**: `total_examples: 0`

**Causes**:

1. Test files not found (check patterns: `test_*.py`, `*_test.go`)
2. Confidence threshold too high
3. Language not supported

**Solutions**:

```bash
# Lower confidence threshold
--min-confidence 0.3

# Check test file detection
ls tests/test_*.py

# Verify language support
--language python  # Use supported language
```

### Low Quality Examples

**Symptom**: Many trivial or incomplete examples

**Causes**:

1. Tests use heavy mocking
2. Tests are too simple
3. Confidence threshold too low

**Solutions**:

```bash
# Increase confidence threshold
--min-confidence 0.7

# Reduce examples per file (get best only)
--max-per-file 3
```

### Parsing Errors

**Symptom**: `Failed to parse` warnings

**Causes**:

1. Syntax errors in test files
2. Incompatible Python version
3. Dynamic code generation

**Solutions**:

- Fix syntax errors in test files
- Ensure tests are valid Python/JS/Go code
- Errors are logged but don't stop extraction

## Examples

### Python unittest

```python
# tests/test_database.py
import unittest

class TestDatabase(unittest.TestCase):
    def test_connection(self):
        """Test database connection with real params"""
        db = Database(
            host="localhost",
            port=5432,
            user="admin",
            timeout=30
        )
        self.assertTrue(db.connect())
```

**Extracts**:

- Category: instantiation
- Code: `db = Database(host="localhost", port=5432, user="admin", timeout=30)`
- Confidence: 0.8
- Expected: `self.assertTrue(db.connect())`

### Python pytest

```python
# tests/test_api.py
import pytest

@pytest.fixture
def client():
    return APIClient(base_url="https://api.test.com")

def test_get_user(client):
    """Test fetching user data"""
    response = client.get("/users/123")
    assert response.status_code == 200
    assert response.json()["id"] == 123
```

**Extracts**:

- Category: method_call
- Setup: `# Fixtures: client`
- Code: `response = client.get("/users/123")\nassert response.status_code == 200`
- Confidence: 0.85

### Go Table Test

```go
// add_test.go
func TestAdd(t *testing.T) {
    calc := Calculator{mode: "basic"}
    result := calc.Add(2, 3)
    if result != 5 {
        t.Errorf("Add(2, 3) = %d; want 5", result)
    }
}
```

**Extracts**:

- Category: instantiation
- Code: `calc := Calculator{mode: "basic"}`
- Confidence: 0.6

## Performance

| Metric | Value |
|--------|-------|
| Processing Speed | ~100 files/second (Python AST) |
| Memory Usage | ~50MB for 1000 test files |
| Example Quality | 80%+ high-confidence (>0.7) |
| False Positives | <5% (with default filtering) |

## Integration Points

### 1. Standalone CLI

```bash
skill-seekers extract-test-examples tests/
```

### 2. Codebase Analysis

```bash
codebase-scraper --directory . --extract-test-examples
```

### 3. MCP Server

```python
# Via Claude Code
extract_test_examples(directory="tests/")
```

### 4. Python API

```python
from skill_seekers.cli.test_example_extractor import TestExampleExtractor

extractor = TestExampleExtractor(min_confidence=0.6)
report = extractor.extract_from_directory("tests/")

print(f"Found {report.total_examples} examples")
for example in report.examples:
    print(f"- {example.test_name}: {example.code[:50]}...")
```

## See Also

- [Pattern Detection (C3.1)](../src/skill_seekers/cli/pattern_recognizer.py) - Detect design patterns
- [Codebase Scraper](../src/skill_seekers/cli/codebase_scraper.py) - Analyze local repositories
- [Unified Scraping](UNIFIED_SCRAPING.md) - Multi-source documentation

---

**Status**: ✅ Implemented in v2.6.0
**Issue**: #TBD (C3.2)
**Related Tasks**: C3.1 (Pattern Detection), C3.3-C3.5 (Future enhancements)
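
**Appendix sketch: quality filter and per-file limit.** Steps 4 and 5 of the Data Flow, together with the Automatic Filtering rules, can be illustrated roughly as follows. This is a minimal sketch and not the actual `ExampleQualityFilter` code: the `Example` dataclass, `is_trivial`, and `filter_and_limit` are hypothetical names introduced here, and treating any snippet that contains `Mock()` or `MagicMock()` as mock-only is a simplifying assumption.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical stand-in for an extracted example; the real class may differ.
@dataclass
class Example:
    code: str
    confidence: float
    file_path: str


# Rules taken from the "Automatic Filtering" list above.
TRIVIAL_SNIPPETS = ("assertTrue(True)", "assertEqual(1, 1)")
MOCK_MARKERS = ("Mock()", "MagicMock()")
MIN_CODE_LENGTH = 20
# Matches an assignment from a constructor call with no arguments, e.g. `x = MyClass()`.
EMPTY_CONSTRUCTOR = re.compile(r"\w+\s*=\s*[\w.]+\(\)")


def is_trivial(example: Example) -> bool:
    """Return True if the example should be dropped as low value."""
    code = example.code.strip()
    if len(code) < MIN_CODE_LENGTH:
        return True
    if any(snippet in code for snippet in TRIVIAL_SNIPPETS):
        return True
    # Assumption: any use of a bare mock marks the snippet as mock-only.
    if any(marker in code for marker in MOCK_MARKERS):
        return True
    return EMPTY_CONSTRUCTOR.fullmatch(code) is not None


def filter_and_limit(examples, min_confidence, max_per_file):
    """Drop trivial or low-confidence examples, then keep the top N per file."""
    by_file = defaultdict(list)
    for example in examples:
        if example.confidence >= min_confidence and not is_trivial(example):
            by_file[example.file_path].append(example)

    kept = []
    for path in sorted(by_file):
        # Highest confidence first, mirroring "Top N by confidence".
        ranked = sorted(by_file[path], key=lambda e: e.confidence, reverse=True)
        kept.extend(ranked[:max_per_file])
    return kept
```

Calling `filter_and_limit(examples, min_confidence=0.7, max_per_file=5)` would correspond to combining `--min-confidence 0.7` with `--max-per-file 5`; the thresholds are passed explicitly because the tool's defaults are not stated here.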