fix: Resolve PDF processing (#267), How-To Guide (#242), Chinese README (#260) + code quality (#273)

Thanks @franklegolasyoung for the excellent work on the core fixes for issues #267, #242, and #260! 🙏

Your comprehensive approach to fixing PDF processing, expanding workflow detection, and improving the Chinese README documentation is much appreciated. I've added code quality fixes and a set of tests to make sure everything passes CI.

All 1266+ tests are now passing, and the issues are resolved! 🎉
yusyus
2026-01-31 21:30:00 +03:00
committed by GitHub
parent f726a9abc5
commit 91bd2184e5
19 changed files with 622 additions and 174 deletions


@@ -792,8 +792,9 @@ class PDFExtractor:
 # Use "text" format with layout info for PyMuDF 1.24+
 try:
     markdown = page.get_text("markdown")
-except (AssertionError, ValueError):
-    # Fallback to text format for older/newer PyMuDF versions
+except (AssertionError, ValueError, RuntimeError, TypeError, AttributeError):
+    # Fallback to text format for incompatible PyMuPDF versions
+    # Some versions don't support "markdown" format or have internal errors
     markdown = page.get_text(
         "text",
         flags=fitz.TEXT_PRESERVE_WHITESPACE
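The patched hunk boils down to a "try markdown, fall back to plain text" pattern with a wider exception tuple. Below is a minimal, self-contained sketch of that pattern, assuming only that a page object exposes `get_text(option, **kwargs)` as PyMuPDF pages do. The helper `extract_page_text`, the `text_flags` parameter, and the two fake page classes are hypothetical illustrations, not part of the project; they stand in for real PyMuPDF pages so the logic can run without a PDF. The sketch does not assume which PyMuPDF versions accept the `"markdown"` option.

```python
from typing import Any, Protocol

# Same exception tuple as the patched `except` clause: any of these is treated
# as "this PyMuPDF build cannot produce markdown for this page".
MARKDOWN_FALLBACK_ERRORS = (AssertionError, ValueError, RuntimeError, TypeError, AttributeError)


class PageLike(Protocol):
    """Structural stand-in for a PyMuPDF page; only get_text() is used."""

    def get_text(self, option: str = "text", **kwargs: Any) -> Any: ...


def extract_page_text(page: PageLike, text_flags: int = 0) -> str:
    """Hypothetical helper: try markdown output, else fall back to plain text.

    In the real code, text_flags would be fitz.TEXT_PRESERVE_WHITESPACE.
    """
    try:
        return page.get_text("markdown")
    except MARKDOWN_FALLBACK_ERRORS:
        # Unsupported "markdown" option or an internal error: use plain text.
        return page.get_text("text", flags=text_flags)


class FakeMarkdownPage:
    """Hypothetical page whose library build supports markdown output."""

    def get_text(self, option: str = "text", **kwargs: Any) -> str:
        return "# Title" if option == "markdown" else "Title"


class FakeTextOnlyPage:
    """Hypothetical page whose library build rejects the markdown option."""

    def get_text(self, option: str = "text", **kwargs: Any) -> str:
        if option == "markdown":
            raise ValueError("bad option: 'markdown'")
        return f"plain text (flags={kwargs.get('flags')})"


if __name__ == "__main__":
    print(extract_page_text(FakeMarkdownPage()))               # prints "# Title"
    print(extract_page_text(FakeTextOnlyPage(), text_flags=7))  # prints "plain text (flags=7)"
```

One trade-off of the wider tuple: catching `TypeError` and `AttributeError` also hides genuine programming mistakes inside that `try` body, so the guarded block is best kept to the single `get_text("markdown")` call, as the patch does.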