Commit Graph

21 Commits

Author SHA1 Message Date
David Huggins-Daines 38ddbb5256 fix: Use FontMatrix to scale Type3 font metrics (#113)
Signed-off-by: David Huggins-Daines <dhd@ecolingui.ca>
2025-04-09 05:18:15 +02:00
Christoph Auer ca7d584fa3 feat!: Update API, naming, and tests. Move data model to docling-core (#107)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-03-14 13:00:24 +01:00
Peter W. J. Staar c2f9741a5b feat: Establish char_cells, word_cells and line_cells, other fixes (#101)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-18 09:54:17 +01:00
Peter W. J. Staar 25b1e64846 feat: add support for RtL (#94)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-06 07:11:19 +01:00
Peter W. J. Staar 9718762209 feat: Added the pure chars and fixed the duplicate text (#91)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-02 13:57:45 +01:00
Peter W. J. Staar d663eec5fd fix: added the fix for rotated pages (#90)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-01-30 13:28:37 +01:00
Peter W. J. Staar de18986f03 fix: added more updates to better font-parsing (#87)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2025-01-27 08:24:52 +01:00
Peter W. J. Staar 525ed8e380 feat: Update for complex fonts, rendering, and experimental high-level API (#82)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-01-17 18:46:16 +01:00
Peter W. J. Staar 1fccb29d3f feat!: Massive quality improvements to v2 parser and new sanitize_cells API (#73)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-12-09 10:20:52 +01:00
Peter W. J. Staar 22cf280b1f feat: add the export of annotations and ToC (#58)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-11-20 14:32:36 +01:00
Christoph Auer 6fdd74870d feat!: Upgrade to v2.0.0 (#48)
* feat!: Upgrade to v2.0.0

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Dummy change

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* rename old parser as pdf_parser_v1

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-23 14:15:52 +02:00
Peter W. J. Staar 48451ad095 feat: fixed the v2 parser to only return the pages that are requested (#47)
* fixed the v2 parser to only return the pages that are requested

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the visualize script

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the default args for compilation

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* put std::make_pair to avoid warnings

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-10-23 10:14:39 +02:00
Peter Staar e5856f009a feat: add an experimental v2 parser to improve performance (#29)
---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: rmdg88 <rmdg88@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rui Dias Gomes <66125272+rmdg88@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: rmdg88 <rmdg88@gmail.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 11:52:33 +02:00
Peter W. J. Staar dd122d0c93 feat!: adding load/unload from key (#9)
* adding load/unload from key

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* all fixed, still need to clean all commented out code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran pre-commit hooks

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* allow more tabulate versions

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* renamed some key functions

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran pre-commit hooks (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-22 12:31:19 +02:00
Peter W. J. Staar 92e02ec4c1 feat: read page by page (#7)
* first working version to parse page-by-page

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the read page-by-page using bytesio

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-21 08:50:57 +02:00
Peter W. J. Staar 195777b656 feat: add reading from BytesIO (#6)
* added reading from BytesIO

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* run pre-commit hooks

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-13 14:36:09 +02:00
Peter W. J. Staar fa7bef7f35 fix: unit-tests (#3)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-06 13:19:45 +02:00
Peter Staar 68ef8904a6 removed the PDF_DATA_DIR and replaced it with the resources
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-05 14:12:37 +02:00
Peter Staar 4d88cf3a60 added the resources
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-05 10:46:29 +02:00
Peter Staar ebb3736fe6 removed assembler stuff
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-07-31 13:58:29 +02:00
Peter Staar 705868302d refactored the src-folder
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-07-31 12:32:44 +02:00