Christoph Auer
b066b26215
feat!: Public threaded PDF parser and rendering API (#265)
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2026-05-11 15:37:22 +02:00
Peter W. J. Staar
ac0a361a4f
feat: rendering of math and latex symbols (#264)
2026-05-08 12:14:22 +02:00
Eric Van Boxsom
e56632d962
fix: locale-independent float parsing (fixes docling#1455) (#243)
...
Signed-off-by: Eric Van Boxsom <14831976+evb87-tech@users.noreply.github.com>
2026-05-06 13:11:31 -04:00
Peter W. J. Staar
8546560474
feat: add jpeg2000 pixel data (#259)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-22 08:47:15 +02:00
Peter W. J. Staar
b5804c1654
fix: refactored the black to ruff (#258)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-18 05:56:59 +02:00
Peter W. J. Staar
7be5d62336
feat: add jbig2 decoder (#252)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-17 15:46:44 +02:00
Peter W. J. Staar
70fa30054e
feat: adding the cpp analysis script and enhancing the extraction of bitmap types (fix for rotated images). (#250)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-15 05:30:14 +02:00
Peter W. J. Staar
c3c1e85da3
feat: improve extraction from fillable fields (#247)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-07 14:42:32 +02:00
Peter W. J. Staar
e7ef57fbf6
feat: extend the renderer (#245)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-04-01 06:48:09 +02:00
Peter W. J. Staar
1f650dd412
fix: bo10k document failures (#244)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-23 19:49:33 +01:00
Peter W. J. Staar
ae66f6ddf0
feat: add parallelization for parsing (#216)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-04 10:42:04 +01:00
Peter W. J. Staar
856c0fedb9
fix: ligatures and unicode chars in Differences (#234)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-03 15:40:39 +01:00
Peter W. J. Staar
96570232f6
feat: add config option to remove glyph output (#231)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-24 09:36:17 +01:00
Peter W. J. Staar
237cef698a
fix: replace fixed-size utf8::append buffers with std::back_inserter to prevent segfaults (#224)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2026-02-20 07:26:30 +01:00
Peter W. J. Staar
6d984796a9
fix: rotated pages (missing commits) (#219)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-17 16:50:28 +01:00
Peter W. J. Staar
e7812a122a
feat: Refactor pdf resources to pdf page item (#215)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-13 17:25:41 +01:00
Peter W. J. Staar
67d2922913
feat: refactored the code and removed a lot of extra json parameters (#213)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-12 18:23:20 +01:00
Peter W. J. Staar
3272dd8d0b
feat: removing the json from the pdf-parser (#210)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-11 07:30:12 +01:00
Peter W. J. Staar
ea5f1d8d7b
feat: renaming lines to shapes and enriching with graphics (color, filling and stroking) (#209)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-10 05:35:19 +01:00
Peter W. J. Staar
f01ce848aa
feat: add decoding config to decode_page (#208)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-06 15:32:39 +01:00
Peter W. J. Staar
25672da1e8
feat: add-image-extraction (#207)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-04 17:35:00 +01:00
Peter W. J. Staar
fe25ac9854
chore: added the constrained test groundtruth (#205)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-30 11:22:47 +01:00
Sam Quigley
bb0b4ef0b1
fix: recursively traverse parent chain for inherited MediaBox (#204)
...
Signed-off-by: Sam Quigley <quigley@emerose.com>
2026-01-30 08:46:13 +01:00
Peter W. J. Staar
23c7fb8e8f
feat: add typed serialization (#201)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-28 17:51:51 +01:00
Peter W. J. Staar
a98871e9e3
chore: removed the v2 naming in the code (#198)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-26 13:36:37 +01:00
Peter W. J. Staar
adcb9b00e5
feat!: Remove deprecated v1 api (#189)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-19 17:16:32 +01:00
Peter W. J. Staar
ec6149ecd7
fix: updated the font-parsing (#193)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-13 09:10:16 +01:00
Peter W. J. Staar
365a7175ce
chore: remove timings from test_parse_v2 regression files (#192)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-12 16:51:25 +01:00
Peter W. J. Staar
a6facf09ec
chore: updating test-parse-v2 regression data (#190)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-12 14:46:16 +01:00
Michele Dolfi
1d3f78e514
fix: "could not find the page-dimensions" error solved restoring the parent mediabox (#181)
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-12-02 06:01:13 -08:00
Peter W. J. Staar
327dc4ba13
fix: 360 rotated pages (#177)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-11-05 16:52:32 +01:00
Peter W. J. Staar
f8d53ee481
feat: add perf tools (#165)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-09-16 16:53:46 +02:00
Peter W. J. Staar
5ded3b8f7f
fix: media box (#157)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-08-22 12:26:29 +02:00
Peter W. J. Staar
fe3482f7d7
feat: add page unloading (#150)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-08-19 08:49:18 +02:00
Michele Dolfi
4a578a165c
ci: switch to windows 2025 (#149)
...
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
2025-08-04 17:00:57 +02:00
Rui Dias Gomes
29d62f58be
chore: switch to uv (#135)
...
Signed-off-by: rmdg88 <rmdg88@gmail.com>
2025-06-24 13:00:03 +02:00
Peter W. J. Staar
8872e736bf
feat: Fixed char ordering in text lines (#138)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-06-24 12:43:01 +02:00
Peter W. J. Staar
63972876e8
fix: glyph issue with encodings (#129)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-06-20 10:40:09 +02:00
David Huggins-Daines
38ddbb5256
fix: Use FontMatrix to scale Type3 font metrics (#113)
...
Signed-off-by: David Huggins-Daines <dhd@ecolingui.ca>
2025-04-09 05:18:15 +02:00
Christoph Auer
ca7d584fa3
feat!: Update API, naming, and tests. Move data model to docling-core (#107)
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-03-14 13:00:24 +01:00
Peter W. J. Staar
c2f9741a5b
feat: Establish char_cells, word_cells and line_cells, other fixes (#101)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-18 09:54:17 +01:00
Peter W. J. Staar
25b1e64846
feat: add support for RtL (#94)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-06 07:11:19 +01:00
Peter W. J. Staar
9718762209
feat: Added the pure chars and fixed the duplicate text (#91)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-02 13:57:45 +01:00
Peter W. J. Staar
d663eec5fd
fix: added the fix for rotated pages (#90)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-01-30 13:28:37 +01:00
Peter W. J. Staar
de18986f03
fix: added more updates to better font-parsing (#87)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2025-01-27 08:24:52 +01:00
Peter W. J. Staar
525ed8e380
feat: Update for complex fonts, rendering, and experimental high-level API (#82)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-01-17 18:46:16 +01:00
Peter W. J. Staar
1fccb29d3f
feat!: Massive quality improvements to v2 parser and new sanitize_cells API (#73)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-12-09 10:20:52 +01:00
Peter W. J. Staar
22cf280b1f
feat: add the export of annotations and ToC (#58)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-11-20 14:32:36 +01:00
Christoph Auer
6fdd74870d
feat!: Upgrade to v2.0.0 (#48)
...
* feat!: Upgrade to v2.0.0
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Dummy change
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* rename old parser as pdf_parser_v1
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-23 14:15:52 +02:00
Peter W. J. Staar
48451ad095
feat: fixed the v2 parser to only return the pages that are requested (#47)
...
* fixed the v2 parser to only return the pages that are requested
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the visualize script
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the default args for compilation
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* put std::make_pair to avoid warnings
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-10-23 10:14:39 +02:00