Commit Graph

  • f53ab21558 perf: update perf scripts (#271) main Peter W. J. Staar 2026-05-12 15:12:25 +02:00
  • 603467c7bc chore: bump version to 6.0.0 [skip ci] v6.0.0 github-actions[bot] 2026-05-11 14:26:20 +00:00
  • b066b26215 feat!: Public threaded PDF parser and rendering API (#265) Christoph Auer 2026-05-11 15:37:22 +02:00
  • 15157952d3 fix: upgrade packages for vulnerabilities (#270) Peter W. J. Staar 2026-05-11 11:03:42 +02:00
  • a41ef14bb2 fix: upgraded pillow, requests, pygments, cryptography (#269) Peter W. J. Staar 2026-05-11 09:57:08 +02:00
  • 7c8dd3d33d chore: bump version to 5.11.0 [skip ci] v5.11.0 github-actions[bot] 2026-05-08 11:26:25 +00:00
  • ac0a361a4f feat: rendering of math and latex symbols (#264) Peter W. J. Staar 2026-05-08 12:14:22 +02:00
  • f1316fa14b docs(security): Document security processes (#268) Michele Dolfi 2026-05-08 10:00:21 +02:00
  • e56632d962 fix: locale-independent float parsing (fixes docling#1455) (#243) Eric Van Boxsom 2026-05-06 19:11:31 +02:00
  • 1ef8e22aca ci: add concurrency control to cancel outdated workflow runs (#266) Michele Dolfi 2026-04-29 07:53:57 +02:00
  • a9bad4c65a chore: bump version to 5.10.1 [skip ci] v5.10.1 github-actions[bot] 2026-04-24 14:26:45 +00:00
  • db84017ca7 fix: memory management for docling upstream (#263) Peter W. J. Staar 2026-04-24 15:18:25 +02:00
  • 94046d5642 added a std::back_inserter for src/parse/utils/string.h fix/try-to-understand-bad_alloc Peter Staar 2026-04-24 14:28:49 +02:00
  • 6f7191b2a2 added the list_loaded_keys_with_pages() Peter Staar 2026-04-24 12:41:56 +02:00
  • d3ab595c1c fix to unload the correct page Peter Staar 2026-04-24 11:37:57 +02:00
  • 6a3c782f59 adding some extra logging to see what is being deleted Peter Staar 2026-04-24 11:18:02 +02:00
  • d98abe2a33 updates to reduce logging and remove glyphs Peter Staar 2026-04-23 17:09:47 +02:00
  • c5d9e6755f fix: work on trying to find the bad-alloc Peter Staar 2026-04-23 08:08:49 +02:00
  • b463cc6669 Merge branch 'main' of github.com:docling-project/docling-parse Peter Staar 2026-04-23 06:52:33 +02:00
  • 4a8f7cf550 chore: bump version to 5.10.0 [skip ci] v5.10.0 github-actions[bot] 2026-04-22 07:50:54 +00:00
  • 31a1862afc feat: upgrade the fonts resolution with differences and cmap dev/fix-differences-versus-cmap Peter Staar 2026-04-22 09:32:06 +02:00
  • c011a7eef1 Merge branch 'main' of github.com:docling-project/docling-parse Peter Staar 2026-04-22 08:49:18 +02:00
  • 8546560474 feat: add jpeg2000 pixel data (#259) Peter W. J. Staar 2026-04-22 08:47:15 +02:00
  • c06ce3860d moved to 1 thread for windows compilation dev/add-jbig2-decoder Peter Staar 2026-04-21 15:14:35 +02:00
  • a59a0f0134 fixed the jpeg-utils Peter Staar 2026-04-21 09:25:42 +02:00
  • 7e1b3a92ec almost all full-page images are rendered correctly Peter Staar 2026-04-21 08:58:46 +02:00
  • d988e11aa9 Merge branch 'main' of github.com:docling-project/docling-parse Peter Staar 2026-04-18 05:58:11 +02:00
  • b5804c1654 fix: refactored the black to ruff (#258) Peter W. J. Staar 2026-04-18 05:56:59 +02:00
  • 76d251ca10 refactored the black to ruff Peter Staar 2026-04-17 17:02:26 +02:00
  • 7be5d62336 feat: add jbig2 decoder (#252) Peter W. J. Staar 2026-04-17 15:46:44 +02:00
  • cc53ae378d make the renderer more robust against weird fonts Peter Staar 2026-04-17 14:48:39 +02:00
  • b8935787c3 ran pre-commit Peter Staar 2026-04-17 13:11:31 +02:00
  • b4afb77784 ran pre-commit Peter Staar 2026-04-17 12:49:26 +02:00
  • c529b9c4c2 fixing the asserts to give clean assert errors Peter Staar 2026-04-17 12:34:15 +02:00
  • 42274119b0 refactored the unit-tests with using huggingface-hub to store regression documents and data Peter Staar 2026-04-17 10:33:00 +02:00
  • 59a8d5427a fixed the ccitt k=0 case Peter Staar 2026-04-16 15:35:27 +02:00
  • dc62f30fc0 fixed the orientation of the bitmaps Peter Staar 2026-04-16 14:19:40 +02:00
  • e30475a0a1 fixed the cmyk -> rgb Peter Staar 2026-04-16 13:41:18 +02:00
  • 35a194cdf5 Adding masks to bitmap_instructions Peter Staar 2026-04-16 11:22:22 +02:00
  • b940efcc8c updated the complex_jbig2_overlays Peter Staar 2026-04-16 10:34:17 +02:00
  • fb05a3b22b updated the regression data Peter Staar 2026-04-16 10:32:30 +02:00
  • f60c12cd88 Updated the tests and DoclingPdfRenderer Peter Staar 2026-04-16 10:08:43 +02:00
  • 0288f6ea85 added test renderer Peter Staar 2026-04-16 08:01:40 +02:00
  • 746d4e3a9a added the src/third_party/pdfium_jbig2/build Peter Staar 2026-04-16 05:46:46 +02:00
  • cda82e10bc added regression doc Peter Staar 2026-04-15 18:24:03 +02:00
  • 6db8850be6 fixed the jbig2 renderer with masks Peter Staar 2026-04-15 17:56:35 +02:00
  • 120055108c leveraging the mask to get the correct bitmaps for jbig2 Peter Staar 2026-04-15 17:50:55 +02:00
  • e0e2e78ac3 fixed the cmake such that we have no linking warnings Peter Staar 2026-04-15 16:41:38 +02:00
  • 9c992dbecf chore: bump version to 5.9.0 [skip ci] v5.9.0 github-actions[bot] 2026-04-15 14:10:04 +00:00
  • 6bf52d54ee adding the bitmap_exporter Peter Staar 2026-04-15 16:07:37 +02:00
  • e99914bab1 fixed the JBig2 with Peter Staar 2026-04-15 09:48:04 +02:00
  • 4e26fbeaf8 feat: adding a jbig2 decoder derived from pypdfium Peter Staar 2026-04-15 07:54:02 +02:00
  • 70fa30054e feat: adding the cpp analysis script and enhancing the extraction of bitmap types (fix for rotated images). (#250) Peter W. J. Staar 2026-04-15 05:30:14 +02:00
  • 8b287466d4 chore: bump version to 5.8.0 [skip ci] v5.8.0 github-actions[bot] 2026-04-08 08:14:48 +00:00
  • 9ea4099e82 fix: boolean conversion (#248) Attila Oláh 2026-04-07 14:44:32 +02:00
  • c3c1e85da3 feat: improve extraction from fillable fields (#247) Peter W. J. Staar 2026-04-07 14:42:32 +02:00
  • 130672acce ci(mergify): upgrade configuration to current format (#249) mergify[bot] 2026-04-06 09:15:02 +02:00
  • efebb9f4c8 chore: bump version to 5.7.0 [skip ci] v5.7.0 github-actions[bot] 2026-04-01 08:06:48 +00:00
  • e7ef57fbf6 feat: extend the renderer (#245) Peter W. J. Staar 2026-04-01 06:48:09 +02:00
  • 7db8af7f83 chore: bump version to 5.6.2 [skip ci] v5.6.2 github-actions[bot] 2026-03-29 08:33:20 +00:00
  • 092d1b8aa2 fix: prevent infinite loop in TOC extraction with circular PDF refererences (#246) LDD19 2026-03-29 03:21:06 -04:00
  • d804d13436 chore: bump version to 5.6.1 [skip ci] v5.6.1 github-actions[bot] 2026-03-24 10:26:10 +00:00
  • 1f650dd412 fix: bo10k document failures (#244) Peter W. J. Staar 2026-03-23 19:49:33 +01:00
  • 2a518ac6f8 chore: bump version to 5.6.0 [skip ci] v5.6.0 github-actions[bot] 2026-03-20 15:53:28 +00:00
  • 70be865db8 feat: adding rudimentary renderer (#206) Peter W. J. Staar 2026-03-20 16:37:28 +01:00
  • e54775c80e perf: optimize stream decoding with regexp fast path (#242) Roman Yurchak 2026-03-20 16:08:14 +01:00
  • 0a2147ef35 added the case-18 fix/broken-page-count Peter Staar 2026-03-09 06:03:07 +01:00
  • a2feb2e3c7 fix: robustify page-count Peter Staar 2026-03-08 11:54:06 +01:00
  • ebd24d78f8 fix: correct page-count and add new unit test Peter Staar 2026-03-07 11:26:50 +01:00
  • 93fb7b3ca7 chore: core functionality renaming (#236) Peter W. J. Staar 2026-03-05 11:40:54 +01:00
  • e64d753c51 feat: make everything by default thread-safe (#235) Peter W. J. Staar 2026-03-05 09:43:44 +01:00
  • aafdc36878 chore: bump version to 5.5.0 [skip ci] v5.5.0 github-actions[bot] 2026-03-04 10:06:20 +00:00
  • ae66f6ddf0 feat: add parallelization for parsing (#216) Peter W. J. Staar 2026-03-04 10:42:04 +01:00
  • e783362692 chore: bump version to 5.4.2 [skip ci] v5.4.2 github-actions[bot] 2026-03-03 14:57:50 +00:00
  • 856c0fedb9 fix: ligatures and unicode chars in Differences (#234) Peter W. J. Staar 2026-03-03 15:40:39 +01:00
  • 84fcd8cc29 chore: bump version to 5.4.1 [skip ci] v5.4.1 github-actions[bot] 2026-03-03 05:14:37 +00:00
  • 0316060f2c fix: Map characters into the proper chars (#233) Peter W. J. Staar 2026-03-02 18:14:57 +01:00
  • a4fecd1e06 fix: robustify the page number count (#232) Peter W. J. Staar 2026-03-02 18:03:46 +01:00
  • 2fffb14513 chore: bump version to 5.4.0 [skip ci] v5.4.0 github-actions[bot] 2026-02-24 11:17:32 +00:00
  • 96570232f6 feat: add config option to remove glyph output (#231) Peter W. J. Staar 2026-02-24 09:36:17 +01:00
  • 36eb3928fd fix: updated the debug log (#229) Peter W. J. Staar 2026-02-23 17:54:59 +01:00
  • 6e45c74e53 chore: bump version to 5.3.4 [skip ci] v5.3.4 github-actions[bot] 2026-02-23 09:42:19 +00:00
  • e0264dd22d fix: robustify parse of broken pdfs (#228) Peter W. J. Staar 2026-02-23 10:27:55 +01:00
  • 3eb7241696 fix: use only development groups and not extras (#225) Michele Dolfi 2026-02-20 13:18:59 +01:00
  • f6067edc17 chore: bump version to 5.3.3 [skip ci] v5.3.3 github-actions[bot] 2026-02-20 06:56:15 +00:00
  • 237cef698a fix: replace fixed-size utf8::append buffers with std::back_inserter to prevent segfaults (#224) Peter W. J. Staar 2026-02-20 07:26:30 +01:00
  • b0817dbac1 fix: bridge PointerHolder<T> to std::shared_ptr<T> for qpdf 10.x + (#221) Lalatendu Mohanty 2026-02-19 11:15:38 -05:00
  • c5877e5d67 chore: update lock file (#222) Michele Dolfi 2026-02-18 09:50:25 +01:00
  • d1fd46b3cd chore: bump version to 5.3.2 [skip ci] v5.3.2 github-actions[bot] 2026-02-17 16:05:33 +00:00
  • 6d984796a9 fix: rotated pages (missing commits) (#219) Peter W. J. Staar 2026-02-17 16:50:28 +01:00
  • aeca2496c9 chore: Exclude pypy wheels for windows builds (#218) Christoph Auer 2026-02-17 12:46:41 +01:00
  • 283ef6c2f2 chore: bump version to 5.3.1 [skip ci] v5.3.1 github-actions[bot] 2026-02-17 09:35:49 +00:00
  • 0b592f6d09 fix: deal with image containing rotated pages (#217) Peter W. J. Staar 2026-02-17 10:13:21 +01:00
  • d619967dae chore: bump version to 5.3.0 [skip ci] v5.3.0 github-actions[bot] 2026-02-16 13:46:23 +00:00
  • e7812a122a feat: Refactor pdf resources to pdf page item (#215) Peter W. J. Staar 2026-02-13 17:25:41 +01:00
  • 3dd83fcc60 chore: refactored some pdf-resources to page-items (#214) Peter W. J. Staar 2026-02-12 20:56:52 +01:00
  • 67d2922913 feat: refactored the code and removed a lot of extra json parameters (#213) Peter W. J. Staar 2026-02-12 18:23:20 +01:00
  • 2fd79a05c5 perf: improve recursive form xobject (#212) Peter W. J. Staar 2026-02-11 14:49:17 +01:00
  • 3272dd8d0b feat: removing the json from the pdf-parser (#210) Peter W. J. Staar 2026-02-11 07:30:12 +01:00
  • ea5f1d8d7b feat: renaming lines to shapes and enriching with graphics (color, filling and stroking) (#209) Peter W. J. Staar 2026-02-10 05:35:19 +01:00