fix: Resolve PDF processing (#267), How-To Guide (#242), Chinese README (#260) + code quality (#273)

Thanks @franklegolasyoung for the excellent work on the core fixes for issues #267, #242, and #260! 🙏 Your thorough approach to fixing PDF processing, expanding workflow detection, and improving the Chinese README is much appreciated. I've added code quality fixes (#273) and tests so that everything passes CI. All 1266+ tests now pass, and the issues are resolved! 🎉
2026-01-31 21:30:00 +03:00
parent f726a9abc5
commit 91bd2184e5
19 changed files with 622 additions and 174 deletions
--- a/src/skill_seekers/cli/pdf_extractor_poc.py
+++ b/src/skill_seekers/cli/pdf_extractor_poc.py
@@ -792,8 +792,9 @@ class PDFExtractor:
        # Use "text" format with layout info for PyMuPDF 1.24+
        try:
            markdown = page.get_text("markdown")
-        except (AssertionError, ValueError):
-            # Fallback to text format for older/newer PyMuDF versions
+        except (AssertionError, ValueError, RuntimeError, TypeError, AttributeError):
+            # Fallback to text format for incompatible PyMuPDF versions
+            # Some versions don't support "markdown" format or have internal errors
            markdown = page.get_text(
                "text",
                flags=fitz.TEXT_PRESERVE_WHITESPACE