Peter W. J. Staar
f8d53ee481
feat: add perf tools (#165)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-09-16 16:53:46 +02:00
Peter W. J. Staar
5ded3b8f7f
fix: media box (#157)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-08-22 12:26:29 +02:00
Peter W. J. Staar
fe3482f7d7
feat: add page unloading (#150)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-08-19 08:49:18 +02:00
Michele Dolfi
4a578a165c
ci: switch to windows 2025 (#149)
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
2025-08-04 17:00:57 +02:00
Rui Dias Gomes
29d62f58be
chore: switch to uv (#135)
Signed-off-by: rmdg88 <rmdg88@gmail.com>
2025-06-24 13:00:03 +02:00
Peter W. J. Staar
8872e736bf
feat: Fixed char ordering in text lines (#138)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-06-24 12:43:01 +02:00
Peter W. J. Staar
63972876e8
fix: glyph issue with encodings (#129)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-06-20 10:40:09 +02:00
David Huggins-Daines
38ddbb5256
fix: Use FontMatrix to scale Type3 font metrics (#113)
Signed-off-by: David Huggins-Daines <dhd@ecolingui.ca>
2025-04-09 05:18:15 +02:00
Christoph Auer
ca7d584fa3
feat!: Update API, naming, and tests. Move data model to docling-core (#107)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-03-14 13:00:24 +01:00
Peter W. J. Staar
c2f9741a5b
feat: Establish char_cells, word_cells and line_cells, other fixes (#101)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-18 09:54:17 +01:00
Peter W. J. Staar
25b1e64846
feat: add support for RtL (#94)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-06 07:11:19 +01:00
Peter W. J. Staar
9718762209
feat: Added the pure chars and fixed the duplicate text (#91)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-02 13:57:45 +01:00
Peter W. J. Staar
d663eec5fd
fix: added the fix for rotated pages (#90)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-01-30 13:28:37 +01:00
Peter W. J. Staar
de18986f03
fix: added more updates to better font-parsing (#87)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2025-01-27 08:24:52 +01:00
Peter W. J. Staar
525ed8e380
feat: Update for complex fonts, rendering, and experimental high-level API (#82)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-01-17 18:46:16 +01:00
Peter W. J. Staar
1fccb29d3f
feat!: Massive quality improvements to v2 parser and new sanitize_cells API (#73)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-12-09 10:20:52 +01:00
Peter W. J. Staar
22cf280b1f
feat: add the export of annotations and ToC (#58)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-11-20 14:32:36 +01:00
Christoph Auer
6fdd74870d
feat!: Upgrade to v2.0.0 (#48)
* feat!: Upgrade to v2.0.0
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Dummy change
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* rename old parser as pdf_parser_v1
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-23 14:15:52 +02:00
Peter W. J. Staar
48451ad095
feat: fixed the v2 parser to only return the pages that are requested (#47)
* fixed the v2 parser to only return the pages that are requested
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the visualize script
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the default args for compilation
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* put std::make_pair to avoid warnings
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-10-23 10:14:39 +02:00
Peter Staar
e5856f009a
feat: add an experimental v2 parser to improve performance (#29)
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: rmdg88 <rmdg88@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rui Dias Gomes <66125272+rmdg88@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: rmdg88 <rmdg88@gmail.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 11:52:33 +02:00
Peter W. J. Staar
dd122d0c93
feat!: adding load/unload from key (#9)
* adding load/unload from key
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all fixed, still need to clean all commented out code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ran pre-commit hooks
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* allow more tabulate versions
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* renamed some key functions
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ran pre-commit hooks (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-22 12:31:19 +02:00
Peter W. J. Staar
92e02ec4c1
feat: read page by page (#7)
* first working version to parse page-by-page
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the read page-by-page using bytesio
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-21 08:50:57 +02:00
Peter W. J. Staar
195777b656
feat: add reading from BytesIO (#6)
* added reading from BytesIO
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* run pre-commit hooks
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-13 14:36:09 +02:00
Peter W. J. Staar
fa7bef7f35
fix: unit-tests (#3)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-06 13:19:45 +02:00
Peter Staar
68ef8904a6
removed the PDF_DATA_DIR and replaced it with the resources
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-05 14:12:37 +02:00
Peter Staar
4d88cf3a60
added the resources
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-05 10:46:29 +02:00
Peter Staar
ebb3736fe6
removed assembler stuff
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-07-31 13:58:29 +02:00
Peter Staar
705868302d
refactored the src-folder
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-07-31 12:32:44 +02:00