Commit Graph

612 Commits

Author SHA1 Message Date
yusyus
b81d55fda0 feat(B2): add Microsoft Word (.docx) support
Implements ROADMAP task B2 — full .docx scraping support via mammoth +
python-docx, producing SKILL.md + references/ output identical to other
source types.

New files:
- src/skill_seekers/cli/word_scraper.py — WordToSkillConverter class +
  main() entry point (~600 lines); mammoth → BeautifulSoup pipeline;
  handles headings, code detection (incl. monospace <p><br> blocks),
  tables, images, metadata extraction
- src/skill_seekers/cli/arguments/word.py — add_word_arguments() +
  WORD_ARGUMENTS dict
- src/skill_seekers/cli/parsers/word_parser.py — WordParser for unified
  CLI parser registry
- tests/test_word_scraper.py — comprehensive test suite (~300 lines)

Modified files:
- src/skill_seekers/cli/main.py — registered "word" command module
- src/skill_seekers/cli/source_detector.py — .docx auto-detection +
  _detect_word() classmethod
- src/skill_seekers/cli/create_command.py — _route_word() + --help-word
- src/skill_seekers/cli/arguments/create.py — WORD_ARGUMENTS + routing
- src/skill_seekers/cli/arguments/__init__.py — export word args
- src/skill_seekers/cli/parsers/__init__.py — register WordParser
- src/skill_seekers/cli/unified_scraper.py — _scrape_word() integration
- src/skill_seekers/cli/pdf_scraper.py — fix: real enhancement instead
  of stub; remove [:3] reference file limit; capture run_workflows return
- src/skill_seekers/cli/github_scraper.py — fix: remove arbitrary
  open_issues[:20] / closed_issues[:10] reference file limits
- pyproject.toml — skill-seekers-word entry point + docx optional dep
- tests/test_cli_parsers.py — update parser count 21→22

Bug fixes applied during real-world testing:
- Code detection: detect monospace <p><br> blocks as code (mammoth
  renders Courier paragraphs this way, not as <pre>/<code>)
- Language detector: fix wrong method name detect_from_text →
  detect_from_code
- Description inference: pass None from main() so extract_docx() can
  infer description from Word document subject/title metadata
- Bullet-point guard: exclude prose starting with •/-/* from code scoring
- Enhancement: implement real API/LOCAL enhancement (was stub)
- pip install message: add quotes around skill-seekers[docx]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 21:47:30 +03:00
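Note on b81d55fda0: the bullet-point guard listed under the bug fixes can be pictured with a minimal sketch. The helper name `is_bullet_prose` and the marker set are assumptions made for illustration; the actual check in word_scraper.py is not shown in this log and may differ.

```python
# Hypothetical helper, not taken from word_scraper.py.
BULLET_MARKERS = ("•", "-", "*")


def is_bullet_prose(text: str) -> bool:
    """Return True when a paragraph starts with a bullet marker plus whitespace."""
    stripped = text.lstrip()
    if len(stripped) < 2:
        return False
    # Require whitespace after the marker so that "*args" or "-1" is not
    # mistaken for a bullet item.
    return stripped[0] in BULLET_MARKERS and stripped[1].isspace()
```

A paragraph flagged this way would be skipped before any code scoring runs, so list items written as prose are not misclassified as code.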
yusyus
e42aade992 style: auto-format 6 files with ruff format (CI formatting check)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:28:11 +03:00
yusyus
91d6340c3c chore: bump version to 3.1.3
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:24:03 +03:00
yusyus
bbc1674f77 docs: complete changelog for unreleased session work
Add missing entries to [Unreleased]:
- Issue #299 fix (package --target claude argument crash)
- package_skill.py argparser refactor (105-line inline → add_package_arguments())
- Expand setup_logging() entry to include doc_scraper.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:17:24 +03:00
yusyus
73adda0b17 docs: update all chunk flag names to match renamed CLI flags
Replace all occurrences of old ambiguous flag names with the new explicit ones:
  --chunk-size (tokens)  → --chunk-tokens
  --chunk-overlap        → --chunk-overlap-tokens
  --chunk                → --chunk-for-rag
  --streaming-chunk-size → --streaming-chunk-chars
  --streaming-overlap    → --streaming-overlap-chars
  --chunk-size (pages)   → --pdf-pages-per-chunk

Updated: CLI_REFERENCE (EN+ZH), user-guide (EN+ZH), integrations (Haystack,
Chroma, Weaviate, FAISS, Qdrant), features/PDF_CHUNKING, examples/haystack-pipeline,
strategy docs, archive docs, and CHANGELOG.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:15:14 +03:00
yusyus
7a2ffb286c refactor: rename all chunk flags to include explicit units
Replace ambiguous --chunk-size / --chunk-overlap names that meant different
things in different contexts (tokens vs characters) with fully explicit names:

- --chunk-size (RAG tokens)     → --chunk-tokens
- --chunk-overlap (RAG tokens)  → --chunk-overlap-tokens
- --chunk (enable RAG chunking) → --chunk-for-rag
- --streaming-chunk-size (chars) → --streaming-chunk-chars
- --streaming-overlap (chars)    → --streaming-overlap-chars
- --chunk-size (PDF pages)       → --pdf-pages-per-chunk (poc file)

Also aligns stream_parser.py help with streaming_ingest.py standalone parser.
All 2167 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:07:56 +03:00
yusyus
b636a0a292 fix: resolve issue #299 and Phase 1 cleanup
- Fix #299: rename --chunk-size/--chunk-overlap to --streaming-chunk-size/
  --streaming-overlap in arguments/package.py to avoid collision with the
  RAG --chunk-size flag from arguments/common.py
- Phase 1a: make package_skill.py import args via add_package_arguments()
  instead of a 105-line inline duplicate argparse block; fixes the root
  cause of _reconstruct_argv() passing unrecognised flag names
- Phase 1b: centralise setup_logging() into utils.py and remove 4
  duplicate module-level logging.basicConfig() calls from doc_scraper.py,
  github_scraper.py, codebase_scraper.py, and unified_scraper.py
- Fix test_package_structure.py / test_cli_paths.py version strings
  (3.1.1 → 3.1.2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 21:22:05 +03:00
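Note on b636a0a292: one way a flag-name collision can surface is argparse rejecting a second registration of the same option string. The sketch below shows that behaviour with a stand-in parser; the parser name and help texts are assumptions, and the log does not state that issue #299 failed in exactly this way.

```python
import argparse


def build_parser(streaming_flag: str) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="package-demo")
    # Stand-in for the shared RAG flag from arguments/common.py.
    parser.add_argument("--chunk-size", type=int, help="RAG chunk size")
    # Stand-in for the streaming flag from arguments/package.py.
    parser.add_argument(streaming_flag, type=int, help="streaming chunk size")
    return parser


def collides(streaming_flag: str) -> bool:
    """Return True if registering the flag conflicts with --chunk-size."""
    try:
        build_parser(streaming_flag)
    except argparse.ArgumentError:
        # argparse raises this for a conflicting option string.
        return True
    return False
```

With the renamed flag, both options can be registered on one parser and parsed independently.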
yusyus
90e5e8f557 Merge pull request #298 from yusufkaraaslan/development
hotfix: v3.1.2 — Gemini model fix, enhance dispatcher, arg forwarding
2026-02-24 07:09:39 +03:00
yusyus
93ed5c79a8 chore: bump version to 3.1.2 and update CHANGELOG
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 07:09:22 +03:00
yusyus
1229ff2baf style: auto-format enhance_skill_local.py and test with ruff
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 07:05:50 +03:00
yusyus
5ae57d192a fix: update Gemini model to 2.5-flash and add API auto-detection in enhance
Fix 1 — gemini.py: replace deprecated gemini-2.0-flash-exp (404 errors)
with gemini-2.5-flash (stable, GA, Google's recommended replacement).
Closes #290.

Fix 2 — enhance dispatcher: implement the documented auto-detection that
was missing from the code. skill-seekers enhance now correctly routes:
  - ANTHROPIC_API_KEY set → Claude API mode (enhance_skill.py)
  - GOOGLE_API_KEY set    → Gemini API mode
  - OPENAI_API_KEY set    → OpenAI API mode
  - No API keys           → LOCAL mode (Claude Code Max, free)

Use --mode LOCAL to force local mode even when an API key is present.

9 new tests cover _detect_api_target() priority logic and main()
routing (API delegation, --mode LOCAL override, no-key fallback).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 06:52:55 +03:00
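Note on 5ae57d192a: the routing order described above can be sketched as a small lookup over environment variables. The function name `detect_api_target`, its parameters, and the returned strings are assumptions for illustration; the real `_detect_api_target()` is not reproduced in this log.

```python
import os
from typing import Mapping, Optional

# Order mirrors the routing list in the commit message.
_KEY_TO_TARGET = (
    ("ANTHROPIC_API_KEY", "claude"),
    ("GOOGLE_API_KEY", "gemini"),
    ("OPENAI_API_KEY", "openai"),
)


def detect_api_target(
    env: Optional[Mapping[str, str]] = None,
    mode: Optional[str] = None,
) -> str:
    """Pick an enhancement target; "local" is the fallback (assumed label)."""
    env = os.environ if env is None else env
    if mode == "LOCAL":
        # --mode LOCAL wins even when an API key is present.
        return "local"
    for key, target in _KEY_TO_TARGET:
        if env.get(key):
            return target
    return "local"
```

Passing the environment as a mapping keeps the priority logic testable without touching the real process environment.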
yusyus
63968c05f6 Merge pull request #297 from YusufKaraaslanSpyke/feature/fix-create-arg-forwarding
fix: unify scraper argument interface and fix create command forwarding
2026-02-24 06:14:36 +03:00
YusufKaraaslanSpyke
3adc5a8c1d fix: unify scraper argument interface and fix create command forwarding
All scrapers (scrape, github, analyze, pdf) now share a common argument
contract via add_all_standard_arguments() in arguments/common.py.
Universal flags (--dry-run, --verbose, --quiet, --name, --description,
workflow args) work consistently across all source types.

Previously, `create <url> --dry-run`, `create owner/repo --dry-run`,
and `create ./path --dry-run` would crash because sub-scrapers didn't
accept those flags. Also fixes main.py _handle_analyze_command() not
forwarding --dry-run, --preset, --quiet, --name, --description to
codebase_scraper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:56:13 +03:00
yusyus
022b8a440c Merge pull request #296 from yusufkaraaslan/development
Max page hot-fix
2026-02-23 12:09:09 +03:00
yusyus
c725a6421e Merge pull request #295 from yusufkaraaslan/claude/hotfix-merge-release-u1ZB8
hotfix: v3.1.1 — fix create command max_pages AttributeError
2026-02-23 10:15:25 +03:00
Claude
40cec4dffd hotfix: v3.1.1 — fix create command max_pages AttributeError
Merge fix from development (#293, #294) and bump version to 3.1.1.
Fixes crash when max_pages argument was not provided in web source routing.

https://claude.ai/code/session_01HS5q7ghjfEUravNPZRCGux
2026-02-23 06:37:39 +00:00
yusyus
1ebf797b34 Merge pull request #294 from YusufKaraaslanSpyke/fix/293-create-command-max-pages-attribute-error
fix: use getattr for max_pages in create command web routing
2026-02-23 08:59:48 +03:00
YusufKaraaslanSpyke
2e273b214f fix: use getattr for max_pages in create command web routing (#293)
The create command crashed with 'Namespace' object has no attribute
'max_pages' because it accessed args.max_pages directly instead of
using getattr() like all other source-specific attributes in the
same method.

Closes #293

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 08:58:06 +03:00
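Note on 2e273b214f: the failure mode and the fix can be reproduced with a bare `argparse.Namespace`. The helper names and the example URL below are assumptions for illustration only.

```python
import argparse


def read_max_pages(ns: argparse.Namespace, default=None):
    """Read max_pages defensively; the attribute may not exist on the namespace."""
    return getattr(ns, "max_pages", default)


def direct_access_fails(ns: argparse.Namespace) -> bool:
    """Return True if plain attribute access raises AttributeError."""
    try:
        ns.max_pages
    except AttributeError:
        return True
    return False


# A namespace built by a parser that never registered --max-pages.
args = argparse.Namespace(url="https://example.com/docs")
```

Here `direct_access_fails(args)` is true while `read_max_pages(args)` quietly returns the default, which matches the getattr pattern the commit message says the rest of the method already used.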
yusyus
1456e8be6b docs: move MseeP security badge to bottom of README
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 01:54:49 +03:00
yusyus
b9b82f6e4d feat: add new Skill Seekers logo to repo and README
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 01:53:45 +03:00
yusyus
d799a8d8c8 chore: update CHANGELOG for v3.1.0 release — add configs work, correct test count
- Update date to 2026-02-23
- Update test count: 2115 → 2280+ (2158 non-MCP + ~122 MCP)
- Add "Config Repository" section documenting all 178 configs reviewed,
  max_pages removed, URL fixes, structural fixes, doc/script alignment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 01:35:34 +03:00
yusyus
dbf0e949c0 chore: update configs submodule — docs/scripts review and cleanup
- README, CONTRIBUTING, QUALITY_GUIDELINES, AGENTS.md all aligned with
  production best practices (accurate counts, no max_pages, unified format)
- validate-config.py: fix two bugs (unified config categories lookup,
  max_pages warning logic)
- Delete old submit-config.md (duplicate of submit-config.yml with
  outdated content)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 01:25:24 +03:00
yusyus
c5ad4f7d0e chore: update configs submodule — web-frameworks review complete
- 21 web-framework configs reviewed: angular, astro, django, express, fastapi, flask, hono, htmx, httpx, laravel, nestjs, nextjs, nuxt, react, react-query, ruby-on-rails, solidjs, svelte, sveltekit, vue, zod
- All bumped to v1.1.0; functional fixes for astro, htmx, httpx, laravel, react-query, solidjs, zod
2026-02-23 01:14:21 +03:00
yusyus
f472bf0708 chore: update configs submodule — testing category (11 configs)
2026-02-23 01:07:22 +03:00
yusyus
89dada4a82 chore: update configs submodule — mobile, payments, search, security
2026-02-23 01:06:01 +03:00
yusyus
5c81032eb4 chore: update configs submodule — messaging category (kafka, rabbitmq)
2026-02-23 01:04:07 +03:00
yusyus
ab13fcee78 chore: update configs submodule — languages/typescript enhanced to v1.1.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 01:01:13 +03:00
yusyus
05dbf90283 chore: update configs submodule — all 2 graphics configs enhanced to v1.1.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 00:59:50 +03:00
yusyus
510f72a7b5 chore: update configs submodule — gaming/steam-economy-complete enhanced to v1.1.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 00:58:12 +03:00
yusyus
0c0ea1eadd chore: update configs submodule — all 35 game-engines configs enhanced to v1.1.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 00:56:16 +03:00
yusyus
d6af2a24b0 chore: update configs submodule — development-tools and devops enhancements
- development-tools (8 configs): claude-code URL fix, docker-compose
  cleanup, eslint flat config, git cleanup, prettier editors update,
  storybook writing-tests, vscode copilot category, zod v4 packages
- devops (9 configs): ansible start_urls + metadata, docker base_url fix,
  github-actions exclude fix + reusable_workflows, grafana simplify,
  helm leading-slash fix, kubernetes setup start_urls + scheduling,
  prometheus alertmanager, terraform base_url fix, vault shamir space bug

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 00:40:54 +03:00
yusyus
fab07ce9d8 chore: update configs submodule — enhance databases + 6 more categories
- databases (16 configs): fix DynamoDB path, MySQL 8.4 LTS, PostgreSQL remove stale /docs/15/, Redis /docs/latest/, add GDS for Neo4j, vector/AI categories for Supabase/Redis, TimescaleDB actions/tiering, Prisma /docs/orm/ structure
- development-tools (8 configs): v1.0.0 → v1.1.0
- devops (9 configs): v1.0.0 → v1.1.0
- game-engines + gaming (36 configs): v1.0.0 → v1.1.0
- graphics + languages + messaging + mobile + payments + search (9 configs): v1.0.0 → v1.1.0
- security + test-examples + testing (17 configs): v1.0.0 → v1.1.0
- web-frameworks (20 configs): v1.0.0 → v1.1.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 00:31:10 +03:00
yusyus
c1841b69bb chore: update configs submodule — data-science enhancements
numpy, pandas, pytorch, tensorflow: start_urls, categories, fixes
2026-02-23 00:18:40 +03:00
yusyus
ff43c708e7 chore: update configs submodule — css-frameworks category enhanced (6 configs)
2026-02-23 00:09:49 +03:00
yusyus
3fd1ce6c69 chore: update configs submodule — cms category enhanced (3 configs)
2026-02-23 00:06:08 +03:00
yusyus
b241f839be chore: update configs submodule — cloud category enhanced (9 configs)
2026-02-23 00:04:08 +03:00
yusyus
2c5b288b53 chore: update configs submodule — build-tools category review
Review and update all 7 configs in build-tools:
esbuild, rollup, storybook, swc, turborepo, vite, webpack — all v1.1.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 23:57:35 +03:00
yusyus
281e3e455a chore: update configs submodule — api-tech category review
Review and update all 2 configs in api-tech:
- graphql.json: add mutations/subscriptions/variables categories,
  more start_urls, v1.1.0
- trpc.json: update for tRPC v11, TanStack Query, more start_urls,
  data_transformers category, v1.1.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 23:52:50 +03:00
yusyus
f62f00fd2d chore: update configs submodule to merged main (ai-ml review)
Points submodule to merged main commit (bf9b0ff) after ai-ml
category review and enhancement was merged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 23:46:25 +03:00
yusyus
ed120992a8 chore: update configs submodule — ai-ml category review and enhancement
Review and update all 34 configs in the ai-ml category:
- Remove max_pages from all configs
- Rewrite anthropic, openai-api, langchain, ollama for current state
- Fix URL patterns in chroma, seaborn, nltk, keras, deepspeed
- All configs pass dry-run validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 23:41:06 +03:00
yusyus
cb97e8ed1f chore: update configs submodule — deepspeed fix (bad URL, category bugs)
2026-02-22 23:09:25 +03:00
yusyus
39c4362d85 fix: update configs submodule to latest (14 → 178 configs) and fix categorization
The api/configs_repo git submodule was pinned to commit d4c0710 which only
had 14 configs. Updated to latest main (4275d6f) which has 178 configs across
21 categories (web-frameworks, ai-ml, game-engines, databases, devops, etc.)

Also fixed ConfigAnalyzer._categorize_config() to use directory structure
(official/{category}/{name}.json) as authoritative category instead of
keyword matching, which was classifying most new configs as "uncategorized".

Result: API /api/configs now returns 178 configs (was 14).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 22:51:55 +03:00
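Note on 39c4362d85: treating the directory layout `official/{category}/{name}.json` as the authoritative category could look roughly like the sketch below. The function name `category_from_path` and the "uncategorized" default are assumptions; the real `ConfigAnalyzer._categorize_config()` is not shown in this log.

```python
from pathlib import PurePosixPath


def category_from_path(config_path: str, default: str = "uncategorized") -> str:
    """Derive the category from an official/<category>/<name>.json path."""
    parts = PurePosixPath(config_path).parts
    if "official" in parts:
        idx = parts.index("official")
        # The category directory must be followed by a file name.
        if idx + 2 < len(parts):
            return parts[idx + 1]
    return default
```

Because the category comes from the path rather than from keywords in the config body, a new config is classified correctly as soon as it is placed in the right directory.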
yusyus
ef14fd4b5d style: auto-format 12 files with ruff format (CI formatting check)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 22:32:31 +03:00
yusyus
efc722eeed fix: resolve all CI ruff linting errors (F401, F821, ARG001, SIM117, SIM105, C408)
- Remove unused imports (F401): os/Path/json/threading in tests; os in estimate_pages;
  Path in install_skill; pytest in test_unified_scraper_orchestration
- Fix F821 undefined 'args' in unified_scraper._scrape_local() by storing
  self._cli_args = args in run() and reading via getattr in _scrape_local()
- Fix ARG001/ARG005 unused lambda/function arguments with _ prefix or # noqa:ARG001
  where parameter names must be preserved for keyword-argument compatibility
- Fix C408 unnecessary dict() calls → dict literals in test_enhance_command
- Fix F841 unused variable 'stub' in test_enhance_command
- Fix SIM117 nested with statements → single with in test_unified_scraper_orchestration
- Fix SIM105 try/except/pass → contextlib.suppress in test_unified_scraper_orchestration
- Rewrite TestScrapeLocal to test fixed behavior (not the NameError bug)

All 2267 tests pass, 11 skipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 22:30:52 +03:00
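Note on efc722eeed: three of the ruff rules named above (SIM105, SIM117, C408) correspond to small mechanical rewrites. The functions below are illustrative stand-ins and are not taken from the test files mentioned in the commit.

```python
import contextlib
import os


def remove_if_present(path: str) -> None:
    # SIM105: replaces a try / except FileNotFoundError / pass block.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


def copy_text(src: str, dst: str) -> None:
    # SIM117: one with statement instead of two nested ones.
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        fout.write(fin.read())


def make_options() -> dict:
    # C408: a dict literal instead of dict(dry_run=True, verbose=False).
    return {"dry_run": True, "verbose": False}
```

Each rewrite keeps the behaviour unchanged and only removes nesting or an unnecessary call.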
yusyus
f7117c35a9 chore: bump version to 3.1.0 and update CHANGELOG
- pyproject.toml: version 3.0.0 → 3.1.0
- src/skill_seekers/_version.py: update hardcoded fallback to 3.1.0
- CHANGELOG.md: comprehensive [3.1.0] release notes covering all
  features and fixes since v3.0.0 (unified create command, workflow
  presets, RST parser, smart enhance dispatcher, CLI flag parity,
  60 new workflow YAMLs, test suite improvements)
- Deprecation messages: update "removed in v3.0.0" → "v4.0.0" across
  analyze_presets.py, codebase_scraper.py, mcp/server.py
- tests/test_cli_paths.py: update version assertion to 3.1.0
- tests/test_package_structure.py: update __version__ assertions to 3.1.0
- tests/test_preset_system.py: update deprecation message version to v4.0.0

All 2267 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 21:52:04 +03:00
yusyus
db63e67986 fix: resolve all test failures — 2115 passing, 0 failures
Fixes several categories of test failures to achieve a clean test suite:

**Python 3.14 / chromadb compatibility**
- chroma.py: broaden except clause to catch pydantic ConfigError on Python 3.14
- test_adaptors_e2e.py, test_integration_adaptors.py: skip on (ImportError, Exception)

**sys.modules corruption (test isolation)**
- test_swift_detection.py: save/restore all skill_seekers.cli modules AND parent
  package attributes in test_empty_swift_patterns_handled_gracefully; prevents
  @patch decorators in downstream test files from targeting stale module objects

**Removed unnecessary @unittest.skip decorators**
- test_claude_adaptor.py, test_gemini_adaptor.py, test_openai_adaptor.py: remove
  skip from tests that already had pass-body or were compatible once deps installed

**Fixed openai import guard for installed package**
- test_openai_adaptor.py: use patch.dict(sys.modules, {"openai": None}) for
  test_upload_missing_library since openai is now a transitive dep

**langchain import path update**
- test_rag_chunker.py: fix from langchain.schema → langchain_core.documents

**config_extractor tomllib fallback**
- config_extractor.py: use stdlib tomllib (Python 3.11+) as fallback when
  tomli/toml packages are not installed

**Remove redundant sys.path.insert() calls**
- codebase_scraper.py, doc_scraper.py, enhance_skill.py, enhance_skill_local.py,
  estimate_pages.py, install_skill.py: remove legacy path manipulation no longer
  needed with pip install -e . (src/ layout)

**Test fixes: removed @requires_github from fully-mocked tests**
- test_unified_analyzer.py: 5 tests that mock GitHubThreeStreamFetcher don't
  need a real token; remove decorator so they always run

**macOS-specific test improvements**
- test_terminal_detection.py: use @patch("sys.platform", "darwin") instead of
  runtime skipTest() so tests run on all platforms

**Dependency updates**
- pyproject.toml, uv.lock: add langchain and llama-index as core dependencies

**New workflow presets and tests**
- src/skill_seekers/workflows/: add 60 new domain-specific workflow YAML presets
- tests/test_mcp_workflow_tools.py: tests for MCP workflow tool implementations
- tests/test_unified_scraper_orchestration.py: tests for UnifiedScraper methods

Result: 2115 passed, 158 skipped (external services/long-running), 0 failures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 20:43:17 +03:00
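Note on db63e67986: the openai import guard relies on standard import-system behaviour, where a `None` entry in `sys.modules` makes importing that name raise ImportError even if the package is installed. The sketch below demonstrates this with hypothetical helper names; it is not the code from test_openai_adaptor.py.

```python
import importlib
import sys
from unittest.mock import patch


def library_available(name: str) -> bool:
    """Return True if the named module can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def available_when_blocked(name: str) -> bool:
    """Check availability while the module is masked in sys.modules."""
    # patch.dict restores the original sys.modules entry on exit.
    with patch.dict(sys.modules, {name: None}):
        return library_available(name)
```

This lets a "missing library" test run deterministically regardless of which optional dependencies happen to be installed.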
yusyus
fee89d5897 fix: smart enhancement dispatcher — Gemini/API mode + root/Docker detection
Fixes issues #289 and #286 (agent switching and Docker/root failures).

enhance_command.py (new smart dispatcher):
- Routes skill-seekers enhance to API mode (Gemini/OpenAI/Claude API)
  when an API key is available, or LOCAL mode (Claude Code CLI) otherwise
- Decision priority: --target flag > config default_agent > auto-detect
  from env vars (ANTHROPIC_API_KEY → claude, GOOGLE_API_KEY → gemini,
  OPENAI_API_KEY → openai) > LOCAL fallback
- Blocks LOCAL mode when running as root (Docker/VPS) with clear error
  message + API mode instructions
- Supports --dry-run, --target, --api-key as first-class flags

arguments/enhance.py:
- Added --target, --api-key, --dry-run, --interactive-enhancement to
  ENHANCE_ARGUMENTS (shared by unified CLI parser and standalone entry point)

enhance_skill_local.py:
- Error output no longer truncated at 200 chars (shows up to 20 lines)
- Detects root/permission errors in stderr and prints actionable hint

config_manager.py:
- Added default_agent field to DEFAULT_CONFIG ai_enhancement section
- Added get_default_agent() and set_default_agent() methods

main.py:
- enhance command routed to enhance_command (was enhance_skill_local)
- _handle_analyze_command uses smart dispatcher for post-analysis enhancement

pyproject.toml:
- skill-seekers-enhance entry point updated to enhance_command:main

Tests: 1977 passed, 0 failed (28 new tests in test_enhance_command.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:26:19 +03:00
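Note on fee89d5897: blocking LOCAL mode for root could be sketched as below. How the real dispatcher detects root is not stated in this log; the use of `os.geteuid()`, the function names, and the error text are all assumptions.

```python
import os
from typing import Optional


def running_as_root(euid: Optional[int] = None) -> bool:
    """Return True for effective UID 0; the euid parameter exists for testing."""
    if euid is None:
        # os.geteuid is unavailable on Windows; treat that as non-root.
        geteuid = getattr(os, "geteuid", None)
        if geteuid is None:
            return False
        euid = geteuid()
    return euid == 0


def choose_mode(has_api_key: bool, euid: Optional[int] = None) -> str:
    """Prefer API mode when a key exists; refuse LOCAL mode as root."""
    if has_api_key:
        return "api"
    if running_as_root(euid):
        raise RuntimeError(
            "LOCAL mode is blocked when running as root; "
            "set an API key to use API mode instead."
        )
    return "local"
```

Failing early with an actionable message avoids a less readable permission error later in the local enhancement run.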
yusyus
2e2941e0d4 chore: remove planning/analysis artifacts from repo
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:03:54 +03:00
yusyus
ba9a8ff8b5 docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations
Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/

Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)

Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config

Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:01:51 +03:00
yusyus
22bdd4f5f6 fix: sync CLI flags across analyze/pdf/unified commands and fix workflow JSON config
Flag/option synchronization fixes:
- analyze: add --dry-run, --api-key, and all workflow flags (--enhance-workflow,
  --enhance-stage, --var, --workflow-dry-run) via WORKFLOW_ARGUMENTS merge
- pdf: add --api-key to PDF_ARGUMENTS; replace 5 hardcoded add_argument() calls
  in pdf_scraper.py:main() with add_pdf_arguments() to activate all defined args
- unified: add --api-key and --enhance-level (global override) to UNIFIED_ARGUMENTS
  and standalone parser; wire enhance_level CLI override into run() per-source loop
- codebase_scraper: fix --enhance-workflow to use action="append" (was type=str),
  enabling multiple workflow chaining instead of silently dropping all but last

ConfigManager test isolation fix:
- __init__ now reads self.CONFIG_DIR/CONFIG_FILE/PROGRESS_DIR class variables
  instead of calling _get_config_dir()/_get_progress_dir() directly, enabling
  monkeypatching in tests (fixes pre-existing test_add_and_retrieve_github_profile)

Workflow JSON config support in unified_scraper:
- Phase 5 now reads workflows/workflow_stages/workflow_vars from top-level JSON
  config and merges them with CLI args (CLI-first ordering); supports running
  workflows even when unified scraper is called without CLI args (args=None)

Tests: 1,949 passed, 0 failed (added 18 new tests across 3 test files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 00:44:02 +03:00
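Note on 22bdd4f5f6: the difference between `type=str` and `action="append"` for a repeated flag can be seen with a small stand-in parser. The parser name and the workflow names in `argv` are assumptions for illustration.

```python
import argparse


def build_parser(use_append: bool) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="analyze-demo")
    if use_append:
        # Each occurrence is collected into a list.
        parser.add_argument("--enhance-workflow", action="append", default=None)
    else:
        # Later occurrences overwrite earlier ones.
        parser.add_argument("--enhance-workflow", type=str, default=None)
    return parser


# Hypothetical workflow names.
argv = ["--enhance-workflow", "security", "--enhance-workflow", "performance"]
```

With `type=str` only the last value survives parsing, while `action="append"` yields a list that preserves every workflow in the order given.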