fix: Resolve PDF processing (#267), How-To Guide (#242), Chinese README (#260) + code quality (#273)
Thanks @franklegolasyoung for the excellent work on the core fixes for issues #267, #242, and #260! 🙏 Your comprehensive approach to fixing PDF processing, expanding workflow detection, and improving the Chinese README documentation is much appreciated. I've added code quality fixes and comprehensive tests to ensure everything passes CI. All 1266+ tests are now passing, and the issues are resolved! 🎉
@@ -792,8 +792,9 @@ class PDFExtractor:
 # Use "text" format with layout info for PyMuDF 1.24+
 try:
     markdown = page.get_text("markdown")
-except (AssertionError, ValueError):
-    # Fallback to text format for older/newer PyMuDF versions
+except (AssertionError, ValueError, RuntimeError, TypeError, AttributeError):
+    # Fallback to text format for incompatible PyMuPDF versions
+    # Some versions don't support "markdown" format or have internal errors
     markdown = page.get_text(
         "text",
         flags=fitz.TEXT_PRESERVE_WHITESPACE
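The fallback in this hunk can be exercised without a real PDF. The sketch below is a minimal illustration, not the project's code: `extract_page_text`, `FALLBACK_ERRORS`, and `FakePage` are hypothetical names. It assumes only that the page object exposes a `get_text(fmt, flags=...)` method like the one called in the diff.

```python
# Exception types the commit treats as "markdown output is unavailable".
FALLBACK_ERRORS = (AssertionError, ValueError, RuntimeError, TypeError, AttributeError)


def extract_page_text(page, text_flags=0):
    """Return (text, format_used) for a page-like object.

    `page` only needs a get_text(fmt, flags=...) method; `text_flags`
    stands in for the flags value passed in the real extractor.
    """
    try:
        return page.get_text("markdown"), "markdown"
    except FALLBACK_ERRORS:
        # Plain-text fallback, mirroring the except branch in the diff.
        return page.get_text("text", flags=text_flags), "text"


class FakePage:
    """Hypothetical stand-in for a page that rejects the "markdown" format."""

    def get_text(self, fmt, flags=0):
        if fmt == "markdown":
            raise ValueError("unsupported format: markdown")
        return "plain text"


print(extract_page_text(FakePage()))  # ('plain text', 'text')
```

With a real document one would pass `fitz.TEXT_PRESERVE_WHITESPACE` as `text_flags`, as the diff does. Returning the format alongside the text is an addition made here so that callers and tests can see which path ran. The widened exception tuple is a trade-off: it keeps extraction going across library versions, but `TypeError` and `AttributeError` can also hide genuine bugs inside the `try` body, so that body is best kept to the single `get_text` call.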