Commit Graph

84 Commits

Author SHA1 Message Date
yusyus
2a14309342 docs: update changelog, readme, and docs for v3.5.0
- Add CHANGELOG.md entry for v3.5.0 with all PR #336 changes
- Update README.md: version 3.5.0, agent-agnostic examples, marketplace
  pipeline, SPA discovery
- Update CLAUDE.md: AgentClient architecture, 40 MCP tools, new modules
- Update docs/: UML architecture, MCP reference (40 tools, new tool
  categories), enhancement modes (multi-provider/multi-agent), FAQ
- Update src/skill_seekers/mcp/README.md: accurate tool count and paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 04:57:32 +03:00
yusyus
6fded977dd feat: add Kotlin language support for codebase analysis (#287)
Adds full C3.x pipeline support for Kotlin (.kt, .kts):
- Language detection patterns (40+ weighted patterns for data/sealed classes, coroutines, companion objects, KMP, etc.)
- AST regex parser in code_analyzer.py (classes, objects, functions, extension functions, suspend functions)
- Dependency extraction for Kotlin import statements (with alias support)
- Design pattern adaptations (object→Singleton, companion→Factory, sealed→Strategy, data→Builder, Flow→Observer)
- Test example extraction for JUnit 4/5, Kotest, MockK, Spek
- Config detection for build.gradle.kts / settings.gradle.kts
- Extension maps registered in codebase_scraper, unified_codebase_analyzer, github_scraper, generate_router

Also fixes pre-existing parser count tests (35→36 for doctor command added in previous commit).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 23:25:12 +03:00
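The Kotlin import extraction mentioned in 6fded977dd (including alias support) could be approximated with a single regular expression. This is a minimal sketch, not the project's parser: the names `IMPORT_RE` and `extract_kotlin_imports` are hypothetical, and backtick-quoted identifiers and trailing comments are deliberately ignored.

```python
import re

# Hypothetical pattern: matches "import a.b.C", "import a.b.*" and "import a.b.C as D".
IMPORT_RE = re.compile(
    r"^[ \t]*import[ \t]+([A-Za-z_][\w.]*(?:\.\*)?)(?:[ \t]+as[ \t]+(\w+))?[ \t]*;?[ \t]*$",
    re.MULTILINE,
)


def extract_kotlin_imports(source):
    """Return (imported_path, alias_or_None) pairs found in Kotlin source text."""
    return [(m.group(1), m.group(2)) for m in IMPORT_RE.finditer(source)]


kotlin_source = """package demo

import kotlinx.coroutines.flow.Flow
import com.example.network.Client as HttpClient
import java.util.*

fun main() {}
"""

for path, alias in extract_kotlin_imports(kotlin_source):
    print(path, alias)
```

A regex like this keeps the extractor dependency-free, at the cost of missing unusual but legal import forms.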
yusyus
ea4fed0be4 feat: add headless browser rendering for JavaScript SPA sites (#321)
New BrowserRenderer class uses Playwright to render JavaScript-heavy
documentation sites (React, Vue SPAs) that return empty HTML shells
with requests.get(). Activated via --browser flag on web scraping.

- browser_renderer.py: Playwright wrapper with lazy browser launch,
  auto-install Chromium on first use, context manager support
- doc_scraper.py: browser_mode config, _render_with_browser() helper,
  integrated into scrape_page() and scrape_page_async()
- SPA detection warnings now suggest --browser flag
- Optional dep: pip install "skill-seekers[browser]"
- 14 real e2e tests (actual Chromium, no mocks)
- UML updated: Scrapers class diagram (BrowserRenderer + dependency),
  Parsers (DoctorParser), Utilities (Doctor), Components, and new
  Browser Rendering sequence diagram (#20)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:06:14 +03:00
yusyus
006cccabae feat: add skill-seekers doctor health check command (#316)
8 diagnostic checks: Python version (3.10+), package install, git,
14 core deps, 10 optional deps, API keys, MCP server, output dir.
Each check reports pass/warn/fail with --verbose for extra detail.
Exit code 0 if no critical failures, 1 otherwise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:27:17 +03:00
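The exit-code rule in 006cccabae (0 if no critical failures, 1 otherwise) can be pictured as a small reduction over per-check results. This is a sketch under assumptions: the function name `doctor_exit_code` and the check names are hypothetical, and every "fail" status is treated as critical, which the commit message does not spell out.

```python
def doctor_exit_code(results):
    """Map check results (name -> "pass" | "warn" | "fail") to a process exit code."""
    # Assumption: any "fail" counts as a critical failure; "warn" never does.
    return 1 if any(status == "fail" for status in results.values()) else 0


checks = {"python_version": "pass", "git": "pass", "api_keys": "warn"}
print(doctor_exit_code(checks))
```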
yusyus
43bdabb84f feat: add prompt injection check workflow for content security (#324)
New bundled workflow `prompt-injection-check` scans scraped content for
prompt injection patterns (role assumption, instruction overrides,
delimiter injection, hidden instructions, encoded payloads) using AI.

Flags suspicious content without removing it — preserves documentation
accuracy while warning about adversarial content. Added as first stage
in both `default` and `security-focus` workflows so it runs automatically
with --enhance-level >= 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:17:57 +03:00
yusyus
c6c17ada95 docs: add 6 behavioral UML diagrams verified against codebase
3 sequence diagrams (create command dispatch, GitHub+C3.x pipeline with
all 5 stages, MCP dual-path invocation), 2 activity diagrams (source
detection in correct code order, enhancement level flag mapping), and
1 component diagram with corrected runtime dependency arrows.

All diagrams cross-referenced against source code for accuracy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 20:45:30 +03:00
yusyus
d381315340 fix: pass enhance_level instead of removed enhance_with_ai/ai_mode to analyze_codebase (#323)
Two call sites (_run_c3_analysis in unified_scraper.py and _analyze_c3x in
unified_codebase_analyzer.py) still passed the old enhance_with_ai and ai_mode
kwargs which were replaced by enhance_level. This caused a TypeError when
running C3.x codebase analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 22:14:51 +03:00
yusyus
d71c1d3aa3 fix: filter non-integer metadata from GitHub languages API response (#322)
PyGithub's get_languages() returns raw API JSON which in some environments
includes non-integer metadata keys (e.g., "url"), causing a TypeError in
sum(). Now filters to integer values only before calculating percentages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:44:52 +03:00
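The filtering described in d71c1d3aa3 might look roughly like the following. It is a sketch only: `language_percentages` is a hypothetical name, the input is a plain dict standing in for the API response, and rounding to two decimals is an assumption.

```python
def language_percentages(languages):
    """Convert a languages mapping (name -> byte count) into percentages."""
    # Keep integer byte counts only; bool is excluded because it subclasses int.
    byte_counts = {
        name: count
        for name, count in languages.items()
        if isinstance(count, int) and not isinstance(count, bool)
    }
    total = sum(byte_counts.values())
    if total == 0:
        return {}
    return {name: round(count / total * 100, 2) for name, count in byte_counts.items()}


# A stray metadata key such as "url" would otherwise break sum().
raw = {"Python": 7500, "Shell": 2500, "url": "https://api.github.com/repos/owner/repo/languages"}
print(language_percentages(raw))
```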
yusyus
d76ab1d9a4 fix: report accurate saved/skipped page counts and detect SPA sites (#320, #321)
The scraper previously reported len(visited_urls) as "Scraped N pages"
even when save_page() silently skipped pages with empty content (<50
chars). For JavaScript SPA sites this meant "Scraped 190 pages" followed
by "No scraped data found!" with no explanation.

Changes:
- Added pages_saved/pages_skipped counters to DocToSkillConverter
- save_page() now increments pages_skipped on skip, pages_saved on save
- New _log_scrape_completion() reports "(N saved, M skipped)" breakdown
- SPA detection warns when all/most pages have empty content
- build_skill() error now explains empty content cause when pages skipped
- Updated both sync and async scrape completion paths
- 14 new tests across 4 test classes (counting, messages, SPA, build)

Fixes #320

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 22:26:35 +03:00
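The saved/skipped accounting from d76ab1d9a4 can be illustrated with a small counter class. This is a simplified sketch, not the actual DocToSkillConverter: the class name `PageCounter`, the constant `MIN_CONTENT_CHARS`, and the exact message wording are assumptions. Only the 50-character threshold and the "(N saved, M skipped)" breakdown come from the commit message.

```python
MIN_CONTENT_CHARS = 50  # pages with less content than this are skipped


class PageCounter:
    """Tracks how many pages were actually saved versus silently skipped."""

    def __init__(self):
        self.pages_saved = 0
        self.pages_skipped = 0

    def save_page(self, content):
        """Count the page as skipped when its content is too short, else as saved."""
        if len(content.strip()) < MIN_CONTENT_CHARS:
            self.pages_skipped += 1
            return False
        self.pages_saved += 1
        return True

    def completion_message(self):
        total = self.pages_saved + self.pages_skipped
        return f"Scraped {total} pages ({self.pages_saved} saved, {self.pages_skipped} skipped)"


counter = PageCounter()
counter.save_page("<div id='root'></div>")  # empty SPA shell
counter.save_page("Real documentation text. " * 10)
print(counter.completion_message())
```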
yusyus
6bb7078fbc docs: update all documentation for 12 LLM platforms and 18 agents
- README.md + 11 i18n READMEs: 5→12 LLM platforms, 11→18 agents, new platform/agent tables
- CLAUDE.md: updated --target list, adaptor directory tree
- CHANGELOG.md: added v3.4.0 entry with all Phase 1-4 changes
- docs/reference/CLI_REFERENCE.md: new --target and --agent options
- docs/reference/FEATURE_MATRIX.md: updated all platform counts and tables
- docs/user-guide/04-packaging.md: new platform and agent rows
- docs/FAQ.md: expanded platform/agent answers
- docs/zh-CN/*: synchronized Chinese documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 20:42:31 +03:00
yusyus
ca0890ba6f chore: bump version to 3.3.0 and finalize changelog
- Bump version in pyproject.toml: 3.2.0 -> 3.3.0
- Rename [Unreleased] to [3.3.0] - 2026-03-16 with theme line
- Add Supported Source Types (17) reference table
- Add 12 missing changelog entries:
  - feat: sync-config command (#306)
  - feat: best practices guide (#206)
  - docs: 32 files updated for 17 source types
  - docs: README translations for 10 languages
  - perf: pre-compiled regex, bisect line indexing, O(1) dedup (#309)
  - fix: Invalid IPv6 URL on bracket URLs (#284)
  - fix: GitHub scraper PaginatedList crash (#269)
  - fix: release workflow version mismatch and 3.10 compat
  - fix: infer_categories key mismatch
  - fix: flaky benchmark test
  - fix: CI branch protection pending
2026-03-16 00:23:48 +03:00
yusyus
53b911b697 feat: add 10 new skill source types (17 total) with full pipeline integration
Add Jupyter Notebook, Local HTML, OpenAPI/Swagger, AsciiDoc, PowerPoint,
RSS/Atom, Man Pages, Confluence, Notion, and Slack/Discord Chat as new
skill source types. Each type is fully integrated across:

- Standalone CLI commands (skill-seekers <type>)
- Auto-detection via 'skill-seekers create' (file extension + content sniffing)
- Unified multi-source configs (scraped_data, dispatch, config validation)
- Unified skill builder (generic merge + source-attributed synthesis)
- MCP server (scrape_generic tool with per-type flag mapping)
- pyproject.toml (entry points, optional deps, [all] group)

Also fixes: EPUB unified pipeline gap, missing word/video config validators,
OpenAPI yaml import guard, MCP flag mismatch for all 10 types, stale
docstrings, and adds 77 integration tests + complex-merge workflow.

50 files changed, +20,201 lines
2026-03-15 15:30:15 +03:00
yusyus
2e30970dfb feat: add EPUB input support (#310)
Adds EPUB as a first-class input source for skill generation.

- EpubToSkillConverter (epub_scraper.py, ~1200 lines) following PDF scraper pattern
- Dublin Core metadata, spine items, code blocks, tables, images extraction
- DRM detection (Adobe ADEPT, Apple FairPlay, Readium LCP) with fail-fast
- EPUB 3 NCX TOC bug workaround (ignore_ncx=True)
- ebooklib as optional dep: pip install skill-seekers[epub]
- Wired into create command with .epub auto-detection
- 104 tests, all passing

Review fixes: removed 3 empty test stubs, fixed SVG double-counting in
_extract_images(), added logger.debug to bare except pass.

Based on PR #310 by @christianbaumann.
Co-authored-by: Christian Baumann <mail@chriss-baumann.de>
2026-03-15 02:34:41 +03:00
yusyus
a535c7cf18 chore: bump version to 3.2.0 for release
Update version across pyproject.toml, _version.py fallbacks,
CHANGELOG.md, and README badges for v3.2.0 release.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 22:24:18 +03:00
yusyus
d19ad7d820 feat: video pipeline OCR quality fixes + two-pass AI enhancement
- Skip OCR on WEBCAM/OTHER frames (eliminates ~64 junk results per video)
- Add _clean_ocr_line() to strip line numbers, IDE decorations, collapse markers
- Add _fix_intra_line_duplication() for multi-engine OCR overlap artifacts
- Add _is_likely_code() filter to prevent UI junk in reference code fences
- Add language detection to get_text_groups() via LanguageDetector
- Apply OCR cleaning in _assemble_structured_text() pipeline
- Add two-pass AI enhancement: Pass 1 cleans reference Code Timeline
  using transcript context, Pass 2 generates SKILL.md from cleaned refs
- Update video-tutorial.yaml prompts for pre-cleaned references
- Add 17 new tests (197 total video tests), 2540 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 21:48:21 +03:00
yusyus
bb54b3f7b6 docs: comprehensive changelog update for all changes since v3.1.3
Add missing video pipeline feature (the main 15K+ line addition),
15 video bug fixes, and restructure [Unreleased] section with
proper hierarchy: Video Pipeline Core → Video --setup → Word support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 19:55:22 +03:00
yusyus
cc9cc32417 feat: add skill-seekers video --setup for GPU auto-detection and dependency installation
Auto-detects NVIDIA (CUDA), AMD (ROCm), or CPU-only systems and installs the
correct PyTorch variant + easyocr + all visual extraction dependencies.
Removes easyocr from video-full pip extras to avoid pulling ~2GB of wrong
CUDA packages on non-NVIDIA systems.

New files:
- video_setup.py (835 lines): GPU detection, PyTorch install, ROCm config,
  venv checks, system dep validation, module selection, verification
- test_video_setup.py (60 tests): Full coverage of detection, install, verify

Updated docs: CHANGELOG, AGENTS.md, CLAUDE.md, README.md, CLI_REFERENCE,
FAQ, TROUBLESHOOTING, installation guide, video dependency plan

All 2523 tests passing (15 skipped).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:39:16 +03:00
yusyus
064405c052 fix: resolve 18 bugs and code quality issues across adaptors, CLI, and chunking pipeline
Bug fixes:
- Fix --var flag silently dropped in create routing (args.workflow_var → args.var)
- Fix double _score_code_quality() call in word scraper
- Add .docx file extension validation in WordToSkillConverter
- Fix weaviate ImportError masked by generic Exception handler
- Fix RAG chunking crash using non-existent converter.output_dir

Chunking pipeline improvements:
- Wire --chunk-overlap-tokens through entire package pipeline
  (package_skill → adaptor.package → format_skill_md → _maybe_chunk_content → RAGChunker)
- Add auto-scaling overlap: max(50, chunk_tokens//10) when chunk size is non-default
- Rename --no-preserve-code to --no-preserve-code-blocks (backward-compat alias kept)
- Replace hardcoded 512/50 chunk defaults with DEFAULT_CHUNK_TOKENS/DEFAULT_CHUNK_OVERLAP_TOKENS
  constants across all 12 concrete adaptors, rag_chunker, base, and package_skill

Code quality:
- Extract shared _generate_openai_embeddings() and _generate_st_embeddings() to SkillAdaptor
  base class, removing ~150 lines of duplication from chroma/weaviate/pinecone
- Add Pinecone adaptor with full upload support (pinecone_adaptor.py)

Tests (14 new):
- chunk_overlap_tokens parameter wiring, auto-scaling overlap, preserve_code_blocks flag
- .docx/.doc/no-extension file validation, --var flag routing E2E
- Embedding method inheritance verification, backward-compatible flag aliases

Docs:
- Update CHANGELOG, CLI_REFERENCE, API_REFERENCE, packaging guide (EN+ZH)
- Update README test count badge (1880+ → 2283+)

All 2283 tests passing, 8 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:57:59 +03:00
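The auto-scaling overlap rule from 064405c052, `max(50, chunk_tokens//10)` when the chunk size is non-default, could be sketched as below. The constant names and their 512/50 values come from the commit message. The function `resolve_overlap` is hypothetical, and so is the assumption that an explicitly chosen overlap is left untouched.

```python
DEFAULT_CHUNK_TOKENS = 512
DEFAULT_CHUNK_OVERLAP_TOKENS = 50


def resolve_overlap(chunk_tokens=DEFAULT_CHUNK_TOKENS,
                    chunk_overlap_tokens=DEFAULT_CHUNK_OVERLAP_TOKENS):
    """Pick the overlap to use for a given chunk size."""
    # Assumption: scaling only applies when the overlap was left at its default
    # while the chunk size was changed.
    if (chunk_tokens != DEFAULT_CHUNK_TOKENS
            and chunk_overlap_tokens == DEFAULT_CHUNK_OVERLAP_TOKENS):
        return max(50, chunk_tokens // 10)
    return chunk_overlap_tokens


for size in (512, 256, 2000):
    print(size, resolve_overlap(size))
```

The floor of 50 keeps small chunks from ending up with almost no overlap.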
yusyus
3bad7cf365 fix: RAG chunking crash using non-existent converter.output_dir
DocToSkillConverter has self.skill_dir (string), not self.output_dir.
The --chunk-for-rag flag on scrape command crashed with AttributeError.
Changed to Path(converter.skill_dir).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 22:26:21 +03:00
yusyus
4b59bd43be docs: add issue #301 fix to changelog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 22:36:19 +03:00
yusyus
4c8e16c8b1 fix(#300): centralize selector fallback, fix dry-run link discovery, and smart --config routing
- Add FALLBACK_MAIN_SELECTORS constant and _find_main_content() helper to
  eliminate 3 duplicated fallback loops in doc_scraper.py
- Move link extraction before early return in extract_content() so links
  are always discovered from the full page, not just main content
- Fix single-threaded dry-run to extract links from soup (full page)
  instead of main element only — fixes reactflow.dev finding only 1 page
- Add link extraction to async dry-run path (was completely missing)
- Remove main_content from get_configuration() defaults so fallback logic
  kicks in instead of a broad CSS comma selector matching body
- Smart create --config routing: peek at JSON to determine unified
  (sources array → unified_scraper) vs simple (base_url → doc_scraper)
- Update docs/user-guide/02-scraping.md and docs/reference/CONFIG_FORMAT.md
  to use unified config format (legacy format rejected since v2.11.0)
- Fix test_auto_fetch_enabled and test_mcp_validate_legacy_config

Closes #300

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 22:25:59 +03:00
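The smart `--config` routing in 4c8e16c8b1 peeks at the JSON to choose between the two scrapers. A minimal sketch of that decision follows. The function name `pick_scraper` and the error messages are hypothetical; the `sources` and `base_url` keys and the two scraper names are taken from the commit message.

```python
import json


def pick_scraper(config_text):
    """Decide which scraper a config should be routed to, based on its shape."""
    config = json.loads(config_text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    # Unified configs carry a "sources" array; simple ones carry a "base_url".
    if isinstance(config.get("sources"), list):
        return "unified_scraper"
    if "base_url" in config:
        return "doc_scraper"
    raise ValueError("config has neither a 'sources' array nor a 'base_url'")


print(pick_scraper('{"name": "demo", "sources": [{"type": "documentation"}]}'))
print(pick_scraper('{"name": "demo", "base_url": "https://example.com/docs/"}'))
```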
yusyus
b6d4dd8423 fix: remove arbitrary limits, fix hardcoded languages, and fix summarizer bugs
Stage 1 quality improvements from the Arbitrary Limits & Dead Code audit:

Reference file truncation removed:
- codebase_scraper.py: remove code[:500] truncation at 5 locations — reference
  files now contain complete code blocks for copy-paste usability
- unified_skill_builder.py: remove issues[:20], releases[:10], body[:500],
  and code_snippet[:300] caps in reference files — full content preserved

Enhancement summarizer rewrite:
- enhance_skill_local.py: replace arbitrary [:5] code block cap with
  character-budget approach using target_ratio * content_chars
- Fix intro boundary bug: track code block state so intro never ends
  inside a code block, which was desynchronizing the parser
- Remove dead _target_lines variable (assigned but never used)
- Heading chunks now also respect the character budget

Hardcoded language fixes:
- unified_skill_builder.py: test examples use ex["language"] instead of
  always "python" for syntax highlighting
- how_to_guide_builder.py: add language field to HowToGuide dataclass,
  set from workflow at creation, used in AI enhancement prompt

Test fixes:
- test_enhance_skill_local.py: rename test to test_code_blocks_not_arbitrarily_capped,
  fix assertion to count actual blocks (```count // 2), use target_ratio=0.9

Documentation:
- Add Stage 1 plan, implementation summary, review, and corrected docs
- Update CHANGELOG.md with all changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 00:30:40 +03:00
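The intro boundary fix in b6d4dd8423 relies on tracking whether the parser is inside a code block. One way to picture that idea is shown below. It is only a sketch: `split_intro` and `FENCE` are hypothetical names, and treating the first heading after line 0 as the end of the intro is an assumption, not the summarizer's actual rule.

```python
FENCE = "`" * 3  # Markdown code fence marker


def split_intro(lines):
    """Split lines into (intro, rest) at the first heading outside a code block."""
    in_code_block = False
    for index, line in enumerate(lines):
        if line.lstrip().startswith(FENCE):
            # Toggle state on every fence so "#" lines inside code are ignored.
            in_code_block = not in_code_block
            continue
        if index > 0 and not in_code_block and line.startswith("#"):
            return lines[:index], lines[index:]
    return lines, []


doc = [
    "# Title",
    "Intro text",
    FENCE + "bash",
    "# a shell comment, not a heading",
    FENCE,
    "## Usage",
    "More text",
]
intro, rest = split_intro(doc)
print(len(intro), rest[0])
```

Without the state flag, the shell comment would be mistaken for a heading and the intro would end in the middle of the code block.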
yusyus
91d6340c3c chore: bump version to 3.1.3
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:24:03 +03:00
yusyus
bbc1674f77 docs: complete changelog for unreleased session work
Add missing entries to [Unreleased]:
- Issue #299 fix (package --target claude argument crash)
- package_skill.py argparser refactor (105-line inline → add_package_arguments())
- Expand setup_logging() entry to include doc_scraper.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:17:24 +03:00
yusyus
73adda0b17 docs: update all chunk flag names to match renamed CLI flags
Replace all occurrences of old ambiguous flag names with the new explicit ones:
  --chunk-size (tokens)  → --chunk-tokens
  --chunk-overlap        → --chunk-overlap-tokens
  --chunk                → --chunk-for-rag
  --streaming-chunk-size → --streaming-chunk-chars
  --streaming-overlap    → --streaming-overlap-chars
  --chunk-size (pages)   → --pdf-pages-per-chunk

Updated: CLI_REFERENCE (EN+ZH), user-guide (EN+ZH), integrations (Haystack,
Chroma, Weaviate, FAISS, Qdrant), features/PDF_CHUNKING, examples/haystack-pipeline,
strategy docs, archive docs, and CHANGELOG.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:15:14 +03:00
yusyus
93ed5c79a8 chore: bump version to 3.1.2 and update CHANGELOG
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 07:09:22 +03:00
YusufKaraaslanSpyke
3adc5a8c1d fix: unify scraper argument interface and fix create command forwarding
All scrapers (scrape, github, analyze, pdf) now share a common argument
contract via add_all_standard_arguments() in arguments/common.py.
Universal flags (--dry-run, --verbose, --quiet, --name, --description,
workflow args) work consistently across all source types.

Previously, `create <url> --dry-run`, `create owner/repo --dry-run`,
and `create ./path --dry-run` would crash because sub-scrapers didn't
accept those flags. Also fixes main.py _handle_analyze_command() not
forwarding --dry-run, --preset, --quiet, --name, --description to
codebase_scraper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:56:13 +03:00
Claude
40cec4dffd hotfix: v3.1.1 — fix create command max_pages AttributeError
Merge fix from development (#293, #294) and bump version to 3.1.1.
Fixes crash when max_pages argument was not provided in web source routing.

https://claude.ai/code/session_01HS5q7ghjfEUravNPZRCGux
2026-02-23 06:37:39 +00:00
yusyus
d799a8d8c8 chore: update CHANGELOG for v3.1.0 release — add configs work, correct test count
- Update date to 2026-02-23
- Update test count: 2115 → 2280+ (2158 non-MCP + ~122 MCP)
- Add "Config Repository" section documenting all 178 configs reviewed,
  max_pages removed, URL fixes, structural fixes, doc/script alignment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 01:35:34 +03:00
yusyus
f7117c35a9 chore: bump version to 3.1.0 and update CHANGELOG
- pyproject.toml: version 3.0.0 → 3.1.0
- src/skill_seekers/_version.py: update hardcoded fallback to 3.1.0
- CHANGELOG.md: comprehensive [3.1.0] release notes covering all
  features and fixes since v3.0.0 (unified create command, workflow
  presets, RST parser, smart enhance dispatcher, CLI flag parity,
  60 new workflow YAMLs, test suite improvements)
- Deprecation messages: update "removed in v3.0.0" → "v4.0.0" across
  analyze_presets.py, codebase_scraper.py, mcp/server.py
- tests/test_cli_paths.py: update version assertion to 3.1.0
- tests/test_package_structure.py: update __version__ assertions to 3.1.0
- tests/test_preset_system.py: update deprecation message version to v4.0.0

All 2267 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 21:52:04 +03:00
yusyus
ba9a8ff8b5 docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations
Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/

Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)

Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config

Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:01:51 +03:00
yusyus
265214ac27 feat: enhancement workflow preset system with multi-target CLI
- Add YAML-based enhancement workflow presets shipped inside the package
  (default, minimal, security-focus, architecture-comprehensive, api-documentation)
- Add `skill-seekers workflows` subcommand: list, show, copy, add, remove, validate
- copy/add/remove all accept multiple names/files in one invocation with partial-failure behaviour
- `add --name` override restricted to single-file operations
- Add 5 MCP tools: list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow
- Fix: create command _add_common_args() now correctly forwards each --enhance-workflow
  as a separate flag instead of passing the whole list as a single argument
- Update README: reposition as "data layer for AI systems" with AI Skills front and centre
- Update CHANGELOG, QUICK_REFERENCE, CLAUDE.md with workflow preset details
- 1,880+ tests passing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 21:22:16 +03:00
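The forwarding fix in 265214ac27 (each `--enhance-workflow` passed as its own flag rather than the whole list as one argument) can be sketched with `argparse`. The helper name `build_workflow_args` is hypothetical, and using `action="append"` on the receiving parser is an assumption about how the sub-command collects repeated flags.

```python
import argparse


def build_workflow_args(workflows):
    """Expand a list of workflow names into repeated --enhance-workflow flags."""
    argv = []
    for name in workflows:
        argv.extend(["--enhance-workflow", name])
    return argv


# Receiving side: a parser that collects every occurrence of the flag.
parser = argparse.ArgumentParser()
parser.add_argument("--enhance-workflow", action="append")

forwarded = build_workflow_args(["default", "security-focus"])
args = parser.parse_args(forwarded)
print(forwarded)
print(args.enhance_workflow)
```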
yusyus
394882cb5b Release v3.0.0 - Universal Intelligence Platform
Major release with 16 platform adaptors, 26 MCP tools, and 1,852 tests.

Highlights:
- 16 platform adaptors (up from 4): LangChain, LlamaIndex, Chroma, FAISS,
  Haystack, Qdrant, Weaviate, Cursor, Windsurf, Cline, Continue.dev, and more
- 26 MCP tools (up from 9) for AI agent integration
- Cloud storage support (S3, GCS, Azure)
- GitHub Action and Docker support for CI/CD
- 1,852 tests across 100 test files
- 12 example projects for every integration
- 18 comprehensive integration guides

Version updates:
- pyproject.toml: 2.9.0 -> 3.0.0
- _version.py: 2.8.0 -> 3.0.0
- CHANGELOG.md: Added v3.0.0 section
- README.md: Updated badges and messaging
2026-02-08 14:24:58 +03:00
yusyus
a82cf6967a fix: Strip anchor fragments in URL conversion to prevent 404 errors (fixes #277)
Critical bug fix for llms.txt URL parsing:

Problem:
- URLs with anchor fragments (e.g., #synchronous-initialization) were
  malformed when converting to .md format
- Example: https://example.com/api#method → https://example.com/api#method/index.html.md
- Caused 404 errors and duplicate requests for same page with different anchors

Solution:
1. Parse URLs with urllib.parse.urlparse() to extract fragments
2. Strip anchor fragments before appending /index.html.md
3. Deduplicate base URLs (multiple anchors → single request)
4. Fix .md detection: '.md' in url → url.endswith('.md')
   - Prevents false matches on URLs like /cmd-line or /AMD-processors

Changes:
- src/skill_seekers/cli/doc_scraper.py (_convert_to_md_urls)
  - Added URL parsing to remove fragments
  - Added deduplication with seen_base_urls set
  - Fixed .md extension detection
  - Updated log message to show deduplicated count
- tests/test_url_conversion.py (NEW)
  - 12 comprehensive tests covering all edge cases
  - Real-world MikroORM case validation
  - 54/54 tests passing (42 existing + 12 new)
- CHANGELOG.md
  - Documented bug fix and solution

Reported-by: @devjones <https://github.com/yusufkaraaslan/Skill_Seekers/issues/277>
2026-02-04 21:16:13 +03:00
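The solution steps listed in a82cf6967a (strip fragments, deduplicate, use `endswith('.md')`) can be sketched as follows. This is an illustration rather than the actual `_convert_to_md_urls` code: the standalone function name `convert_to_md_urls` and the trailing-slash handling are assumptions, while `seen_base_urls` and the `/index.html.md` suffix come from the commit message.

```python
from urllib.parse import urlparse, urlunparse


def convert_to_md_urls(urls):
    """Strip anchor fragments, deduplicate, and append /index.html.md where needed."""
    seen_base_urls = set()
    md_urls = []
    for url in urls:
        # Drop the #fragment so several anchors on one page collapse to one URL.
        base_url = urlunparse(urlparse(url)._replace(fragment=""))
        if base_url in seen_base_urls:
            continue
        seen_base_urls.add(base_url)
        # endswith avoids false matches such as /cmd-line or /AMD-processors.
        if base_url.endswith(".md"):
            md_urls.append(base_url)
        else:
            md_urls.append(base_url.rstrip("/") + "/index.html.md")
    return md_urls


sample = [
    "https://example.com/api#method",
    "https://example.com/api#other",
    "https://example.com/cmd-line",
    "https://example.com/guide.md#intro",
]
for converted in convert_to_md_urls(sample):
    print(converted)
```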
yusyus
8f99ed0003 docs: Add documentation for 7 new programming languages
Update documentation for PR #275 extended language detection:
- CHANGELOG.md: Add comprehensive section for new languages
- language_detector.py: Update docstrings from 20+ to 27+ languages

New languages:
- Dart (Flutter framework)
- Scala (pattern matching, case classes)
- SCSS/SASS (CSS preprocessors)
- Elixir (functional, pipe operator)
- Lua (game scripting)
- Perl (text processing)

70 regex patterns with confidence scoring (0.6-0.8+ thresholds)
7 new tests, 30/30 passing (100%)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 21:01:40 +03:00
yusyus
2b104dc021 docs: Add multi-agent support documentation
Update documentation for PR #270 multi-agent enhancement feature:
- CHANGELOG.md: Add comprehensive section for multi-agent support
- README.md: Update LOCAL Enhancement section with agent options
- ENHANCEMENT_MODES.md: Add multi-agent guide with security details

Includes:
- Agent selection (claude, codex, copilot, opencode, custom)
- CLI flags and environment variables
- Security validation details
- Agent aliases and normalization
- Usage examples for all modes

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 20:52:46 +03:00
yusyus
2d64a2be48 docs: Mark C3.10 as NEW feature in CHANGELOG
2026-02-02 23:16:40 +03:00
yusyus
809f00cb2c Merge feature/fix-csharp-and-config-type-bugs: C3.10 Signal Flow + Complete Godot Support
Features:
- C3.10: Signal Flow Analysis for Godot projects (208 signals, 634 connections)
- Complete Godot game engine support (.gd, .tscn, .tres, .gdshader)
- GDScript dependency extraction with preload/load/extends patterns
- GDScript test extraction (GUT, gdUnit4, WAT frameworks)
- Signal-based how-to guides generation

Fixes:
- GDScript dependency extraction (265+ syntax errors eliminated)
- Framework detection false positive (Unity → Godot)
- Circular dependency detection (self-loops filtered)
- GDScript test discovery (32 test files found)
- Config extractor array handling (JSON/YAML root arrays)
- Progress indicators for small batches

Tests:
- Added comprehensive GDScript test extraction test case
- 396 test cases extracted from 20 GUT test files
2026-02-02 23:10:51 +03:00
yusyus
174ce0a8fd docs: Update CHANGELOG with C3.10 Signal Flow Analysis and Godot features
2026-02-02 23:10:00 +03:00
yusyus
5292a79ad1 chore: Release v2.8.0
Major feature release with enhanced code analysis and documentation.

Features:
- C3.9: Project documentation extraction
- Granular AI enhancement control (--enhance-level 0-3)
- C# language support for test extraction
- 6-12x faster parallel LOCAL mode AI enhancement
- Auto-enhancement and LOCAL mode fallbacks
- GLM-4.7 and custom Claude-compatible API support

Bug Fixes:
- Fixed C# test extraction language errors
- Fixed config type field mismatch
- Fixed LocalSkillEnhancer import issues
- Fixed critical linter errors

Contributors:
- @xuintl - Chinese README improvements
- @Zhichang Yu - GLM-4.7 support and PDF fixes
- @YusufKaraaslanSpyke - Core features and maintenance

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:03:33 +03:00
yusyus
86e77e2a30 chore: Post-merge cleanup - remove client docs and fix linter errors
- Remove SPYKE-related client documentation files
- Fix critical ruff linter errors:
  - Remove unused 'os' import in test_analyze_e2e.py
  - Remove unused 'setups' variable in test_test_example_extractor.py
  - Prefix unused output_dir parameter in codebase_scraper.py
  - Fix import sorting in test_integration.py
- Update CHANGELOG.md with comprehensive PR #272 feature documentation

These changes were part of PR #272 cleanup but didn't make it into the squash merge.
2026-01-31 14:58:09 +03:00
yusyus
03ac78173b chore: Remove client-specific docs, fix linter errors, update documentation
- Remove SPYKE-related client documentation files
- Fix critical ruff linter errors:
  - Remove unused 'os' import in test_analyze_e2e.py
  - Remove unused 'setups' variable in test_test_example_extractor.py
  - Prefix unused output_dir parameter with underscore in codebase_scraper.py
  - Fix import sorting in test_integration.py
- Update CHANGELOG.md with comprehensive C3.9 and enhancement features
- Update CLAUDE.md with --enhance-level documentation

All critical code quality issues resolved.
2026-01-31 14:38:15 +03:00
yusyus
5a78522dbc docs: Update all documentation to use new 'analyze' command
- Update Chinese README (README.zh-CN.md) with new preset flags
- Update docs/features/*.md (PATTERN_DETECTION, HOW_TO_GUIDES, BOOTSTRAP_SKILL_TECHNICAL)
- Update scripts/bootstrap_skill.sh to use 'skill-seekers analyze'
- Update scripts/skill_header.md command examples
- Update tests/test_bootstrap_skill.py assertions
- Fix CHANGELOG.md historical entry with correct command name

All references to 'skill-seekers-codebase' updated to 'skill-seekers analyze'
except where needed for backward compatibility (pyproject.toml, E2E tests).

Related to Phase 1 implementation from previous commits.
2026-01-29 22:56:33 +03:00
Zhichang Yu
9435d2911d feat: Add GLM-4.7 support and fix PDF scraper issues (#266)
Merging with admin override due to known issues:

**What Works**:
- GLM-4.7 Claude-compatible API support (correctly implemented)
- PDF scraper improvements (content truncation fixed, page traceability added)
- Documentation updates comprehensive

⚠️ **Known Issues (will be fixed in next commit)**:
1. Import bugs in 3 files causing UnboundLocalError (30 tests failing)
2. PDF scraper test expectations need updating for new behavior (5 tests failing)
3. test_godot_config failure (pre-existing, not caused by this PR - 1 test failing)

**Action Plan**:
Fixes for issues #1 and #2 are ready and will be committed immediately after merge.
Issue #3 requires separate investigation as it's a pre-existing problem.

Total: 36 failing tests; 35 will be fixed in the next commit.
2026-01-27 21:10:40 +03:00
yusyus
2855b59165 chore: Bump version to 2.7.4 for language link fix
This patch release fixes the broken Chinese language selector link
on PyPI by using absolute GitHub URLs instead of relative paths.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-22 00:12:08 +03:00
yusyus
02ce5b4a33 chore: Bump version to 2.7.3 for i18n documentation release
This patch release focuses on internationalization and making Skill Seekers
accessible to the Chinese developer community.

Key updates:
- Complete Chinese (简体中文) README translation
- PyPI metadata updated with i18n support
- Natural Language classifiers added
- Community engagement issue created

See CHANGELOG.md for complete release notes.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-22 00:00:27 +03:00
yusyus
ac53017ec8 chore: Bump version to 2.7.2 for hotfix release
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-21 23:23:33 +03:00
yusyus
dc6b82f06d chore: Bump version to 2.7.1 for hotfix release
Version Bump:
- pyproject.toml: 2.8.0-dev → 2.7.1
- src/skill_seekers/__init__.py: 2.8.0-dev → 2.7.1
- src/skill_seekers/cli/__init__.py: 2.8.0-dev → 2.7.1
- src/skill_seekers/mcp/__init__.py: 2.8.0-dev → 2.7.1
- src/skill_seekers/mcp/tools/__init__.py: 2.8.0-dev → 2.7.1

CHANGELOG:
- Added v2.7.1 entry documenting critical config download bug fix
- Root cause, solution, files fixed, impact, and testing documented

This hotfix resolves the critical 404 error bug when downloading configs
from the skillseekersweb.com API.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 22:39:34 +03:00
yusyus
b13a5c67af docs: Add missing items to v2.7.0 CHANGELOG
Added:
- Git Submodules for Configuration Management
- Config Discovery Enhancements (--all flag)
2026-01-18 14:49:45 +03:00
yusyus
84f0f99595 docs: Update CHANGELOG.md for v2.7.0 release
Added all changes from 2026-01-18 to v2.7.0 section:

### Added
- Documentation overhaul (7 new files, 10 updated files)

### Fixed
- Code quality improvements (21 ruff errors fixed)
- Version synchronization (Issue #248)
- Case-insensitive regex (Issue #236)
- Test fixture error
- MCP setup modernization (PR #252)

Updated release date to 2026-01-18.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 14:15:52 +03:00