Commit Graph

256 Commits

Author SHA1 Message Date
Peter Staar 31a1862afc feat: upgrade the fonts resolution with differences and cmap
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-22 09:32:06 +02:00
Peter Staar c011a7eef1 Merge branch 'main' of github.com:docling-project/docling-parse 2026-04-22 08:49:18 +02:00
Peter W. J. Staar 8546560474 feat: add jpeg2000 pixel data (#259)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-22 08:47:15 +02:00
Peter Staar d988e11aa9 Merge branch 'main' of github.com:docling-project/docling-parse 2026-04-18 05:58:11 +02:00
Peter W. J. Staar b5804c1654 fix: refactored the black to ruff (#258)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-18 05:56:59 +02:00
Peter Staar 76d251ca10 refactored the black to ruff
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-17 17:02:26 +02:00
Peter W. J. Staar 7be5d62336 feat: add jbig2 decoder (#252)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-17 15:46:44 +02:00
github-actions[bot] 9c992dbecf chore: bump version to 5.9.0 [skip ci] v5.9.0 2026-04-15 14:10:04 +00:00
Peter W. J. Staar 70fa30054e feat: adding the cpp analysis script and enhancing the extraction of bitmap types (fix for rotated images). (#250)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-15 05:30:14 +02:00
github-actions[bot] 8b287466d4 chore: bump version to 5.8.0 [skip ci] v5.8.0 2026-04-08 08:14:48 +00:00
Attila Oláh 9ea4099e82 fix: boolean conversion (#248)
Signed-off-by: Attila Oláh <attila@dorn.haus>
2026-04-07 14:44:32 +02:00
Peter W. J. Staar c3c1e85da3 feat: improve extraction from fillable fields (#247)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-07 14:42:32 +02:00
mergify[bot] 130672acce ci(mergify): upgrade configuration to current format (#249)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-06 09:15:02 +02:00
github-actions[bot] efebb9f4c8 chore: bump version to 5.7.0 [skip ci] v5.7.0 2026-04-01 08:06:48 +00:00
Peter W. J. Staar e7ef57fbf6 feat: extend the renderer (#245)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-01 06:48:09 +02:00
github-actions[bot] 7db8af7f83 chore: bump version to 5.6.2 [skip ci] v5.6.2 2026-03-29 08:33:20 +00:00
LDD19 092d1b8aa2 fix: prevent infinite loop in TOC extraction with circular PDF references (#246)
Signed-off-by: LDD19 <25827839+LDD19@users.noreply.github.com>
2026-03-29 09:21:06 +02:00
github-actions[bot] d804d13436 chore: bump version to 5.6.1 [skip ci] v5.6.1 2026-03-24 10:26:10 +00:00
Peter W. J. Staar 1f650dd412 fix: bo10k document failures (#244)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-23 19:49:33 +01:00
github-actions[bot] 2a518ac6f8 chore: bump version to 5.6.0 [skip ci] v5.6.0 2026-03-20 15:53:28 +00:00
Peter W. J. Staar 70be865db8 feat: adding rudimentary renderer (#206)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-20 16:37:28 +01:00
Roman Yurchak e54775c80e perf: optimize stream decoding with regexp fast path (#242)
Signed-off-by: Roman Yurchak <rth.yurchak@gmail.com>
2026-03-20 16:08:14 +01:00
Peter W. J. Staar 93fb7b3ca7 chore: core functionality renaming (#236)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-05 11:40:54 +01:00
Peter W. J. Staar e64d753c51 feat: make everything by default thread-safe (#235)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-05 09:43:44 +01:00
github-actions[bot] aafdc36878 chore: bump version to 5.5.0 [skip ci] v5.5.0 2026-03-04 10:06:20 +00:00
Peter W. J. Staar ae66f6ddf0 feat: add parallelization for parsing (#216)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-04 10:42:04 +01:00
github-actions[bot] e783362692 chore: bump version to 5.4.2 [skip ci] v5.4.2 2026-03-03 14:57:50 +00:00
Peter W. J. Staar 856c0fedb9 fix: ligatures and unicode chars in Differences (#234)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-03 15:40:39 +01:00
github-actions[bot] 84fcd8cc29 chore: bump version to 5.4.1 [skip ci] v5.4.1 2026-03-03 05:14:37 +00:00
Peter W. J. Staar 0316060f2c fix: Map characters into the proper chars (#233)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-02 18:14:57 +01:00
Peter W. J. Staar a4fecd1e06 fix: robustify the page number count (#232)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-02 18:03:46 +01:00
github-actions[bot] 2fffb14513 chore: bump version to 5.4.0 [skip ci] v5.4.0 2026-02-24 11:17:32 +00:00
Peter W. J. Staar 96570232f6 feat: add config option to remove glyph output (#231)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-24 09:36:17 +01:00
Peter W. J. Staar 36eb3928fd fix: updated the debug log (#229)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-23 17:54:59 +01:00
github-actions[bot] 6e45c74e53 chore: bump version to 5.3.4 [skip ci] v5.3.4 2026-02-23 09:42:19 +00:00
Peter W. J. Staar e0264dd22d fix: robustify parse of broken pdfs (#228)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-23 10:27:55 +01:00
Michele Dolfi 3eb7241696 fix: use only development groups and not extras (#225)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-02-20 13:18:59 +01:00
github-actions[bot] f6067edc17 chore: bump version to 5.3.3 [skip ci] v5.3.3 2026-02-20 06:56:15 +00:00
Peter W. J. Staar 237cef698a fix: replace fixed-size utf8::append buffers with std::back_inserter to prevent segfaults (#224)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2026-02-20 07:26:30 +01:00
Lalatendu Mohanty b0817dbac1 fix: bridge PointerHolder<T> to std::shared_ptr<T> for qpdf 10.x + (#221)
Signed-off-by: Lalatendu Mohanty <lmohanty@redhat.com>
2026-02-19 17:15:38 +01:00
Michele Dolfi c5877e5d67 chore: update lock file (#222)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-02-18 09:50:25 +01:00
github-actions[bot] d1fd46b3cd chore: bump version to 5.3.2 [skip ci] v5.3.2 2026-02-17 16:05:33 +00:00
Peter W. J. Staar 6d984796a9 fix: rotated pages (missing commits) (#219)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-17 16:50:28 +01:00
Christoph Auer aeca2496c9 chore: Exclude pypy wheels for windows builds (#218)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2026-02-17 12:46:41 +01:00
github-actions[bot] 283ef6c2f2 chore: bump version to 5.3.1 [skip ci] v5.3.1 2026-02-17 09:35:49 +00:00
Peter W. J. Staar 0b592f6d09 fix: deal with image containing rotated pages (#217)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-17 10:13:21 +01:00
github-actions[bot] d619967dae chore: bump version to 5.3.0 [skip ci] v5.3.0 2026-02-16 13:46:23 +00:00
Peter W. J. Staar e7812a122a feat: Refactor pdf resources to pdf page item (#215)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-13 17:25:41 +01:00
Peter W. J. Staar 3dd83fcc60 chore: refactored some pdf-resources to page-items (#214)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-12 20:56:52 +01:00
Peter W. J. Staar 67d2922913 feat: refactored the code and removed a lot of extra json parameters (#213)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-12 18:23:20 +01:00