mirror of
https://github.com/docling-project/docling.git
synced 2026-05-17 13:10:38 +00:00
1f914826bb
* fix: add failed pages to DoclingDocument for page break consistency

When some PDF pages fail to parse, they were not added to DoclingDocument.pages, causing page break markers to be incorrect during export. This adds failed/skipped pages with their size info (if available) to maintain correct page numbering and structure.

- Add _add_failed_pages_to_document() method in StandardPdfPipeline
- Add test cases for failed page handling
- Add test cases for normal page handling (regression test)
- Add test PDF files

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>

* fix: ensure resource cleanup and simplify type hints

- Wrap page_backend usage in try-finally to guarantee unload (prevents resource leaks).
- Simplify redundant 'float | None | None' type hint.

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>

* fix: add groundtruth for normal_4pages.pdf and exclude failing PDFs from e2e test

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>

* fix: ensure correct status assertion for failed pages in tests

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>

---------

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
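
The behavior described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the actual `_add_failed_pages_to_document()` implementation: `Size`, `PageItem`, `Document`, `FakePageBackend`, and `add_failed_pages` are hypothetical stand-ins invented for illustration and do not reflect docling's real classes or signatures. The sketch shows two ideas from the message: pages missing from the document get a placeholder entry (with size when it can be read), and the page backend is unloaded in a `finally` clause so it is released even if reading the size fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Size:
    width: float
    height: float


@dataclass
class PageItem:
    page_no: int
    size: Optional[Size] = None  # None when the size could not be read


@dataclass
class Document:
    # Hypothetical stand-in for a document's page map, keyed by 1-based page number.
    pages: Dict[int, PageItem] = field(default_factory=dict)


class FakePageBackend:
    """Hypothetical per-page backend; not docling's real backend API."""

    def __init__(self, size: Optional[Size], valid: bool = True) -> None:
        self._size = size
        self._valid = valid
        self.unloaded = False

    def is_valid(self) -> bool:
        return self._valid

    def get_size(self) -> Size:
        if self._size is None:
            raise ValueError("page size unavailable")
        return self._size

    def unload(self) -> None:
        self.unloaded = True


def add_failed_pages(
    doc: Document,
    page_count: int,
    load_page: Callable[[int], FakePageBackend],
) -> None:
    """Add a placeholder entry for every page missing from doc.pages."""
    for page_no in range(1, page_count + 1):
        if page_no in doc.pages:
            continue  # page parsed successfully, nothing to do
        size: Optional[Size] = None
        backend = load_page(page_no)
        try:
            if backend.is_valid():
                size = backend.get_size()
        except Exception:
            size = None  # keep the page, just without size info
        finally:
            backend.unload()  # always release the backend
        doc.pages[page_no] = PageItem(page_no=page_no, size=size)
    # Keep pages in page-number order so exported page breaks line up.
    doc.pages = dict(sorted(doc.pages.items()))
```

With a four-page input where only pages 1 and 3 parsed, calling `add_failed_pages(doc, 4, load_page)` leaves `doc.pages` with keys 1 through 4, so an exporter that emits one page break per entry sees the same page count as the source PDF.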