28 Commits

Author SHA1 Message Date
Peter W. J. Staar 93fb7b3ca7 chore: core functionality renaming (#236)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-05 11:40:54 +01:00
Peter W. J. Staar ae66f6ddf0 feat: add parallelization for parsing (#216)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-03-04 10:42:04 +01:00
Peter W. J. Staar 96570232f6 feat: add config option to remove glyph output (#231)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-24 09:36:17 +01:00
Peter W. J. Staar e7812a122a feat: Refactor pdf resources to pdf page item (#215)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-13 17:25:41 +01:00
Peter W. J. Staar 3dd83fcc60 chore: refactored some pdf-resources to page-items (#214)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-12 20:56:52 +01:00
Peter W. J. Staar 2fd79a05c5 perf: improve recursive form xobject (#212)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-11 14:49:17 +01:00
Peter W. J. Staar 3272dd8d0b feat: removing the json from the pdf-parser (#210)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-11 07:30:12 +01:00
Peter W. J. Staar ea5f1d8d7b feat: renaming lines to shapes and enriching with graphics (color, filling and stroking) (#209)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-10 05:35:19 +01:00
Peter W. J. Staar f01ce848aa feat: add decoding config to decode_page (#208)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-06 15:32:39 +01:00
Peter W. J. Staar 25672da1e8 feat: add-image-extraction (#207)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-04 17:35:00 +01:00
Peter W. J. Staar f86ff926c8 perf: move map to unordered_map (#202)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-29 08:41:15 +01:00
Peter W. J. Staar 23c7fb8e8f feat: add typed serialization (#201)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-28 17:51:51 +01:00
Peter W. J. Staar a98871e9e3 chore: removed the v2 naming in the code (#198)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-26 13:36:37 +01:00
Peter W. J. Staar adcb9b00e5 feat!: Remove deprecated v1 api (#189)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-19 17:16:32 +01:00
Nimalan 0c64402ddb feat: Support reading password protected PDF (#169)
Signed-off-by: Nimalan <nimalan.m@protonmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-20 13:56:18 +02:00
Peter W. J. Staar 0402b3f0a3 feat: reset to the old parameters in sanitation (#163)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-09-04 12:39:59 +02:00
Peter W. J. Staar 1466548476 feat: accelerate docling parse (#161)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-09-03 07:46:24 +02:00
Peter W. J. Staar fe3482f7d7 feat: add page unloading (#150)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-08-19 08:49:18 +02:00
Peter W. J. Staar c2f9741a5b feat: Establish char_cells, word_cells and line_cells, other fixes (#101)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-18 09:54:17 +01:00
Peter W. J. Staar 1fccb29d3f feat!: Massive quality improvements to v2 parser and new sanitize_cells API (#73)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-12-09 10:20:52 +01:00
Peter W. J. Staar 22cf280b1f feat: add the export of annotations and ToC (#58)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-11-20 14:32:36 +01:00
Christoph Auer 6fdd74870d feat!: Upgrade to v2.0.0 (#48)
* feat!: Upgrade to v2.0.0

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Dummy change

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* rename old parser as pdf_parser_v1

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-23 14:15:52 +02:00
Peter Staar e5856f009a feat: add an experimental v2 parser to improve performance (#29)
---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: rmdg88 <rmdg88@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rui Dias Gomes <66125272+rmdg88@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: rmdg88 <rmdg88@gmail.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 11:52:33 +02:00
Peter W. J. Staar dd122d0c93 feat!: adding load/unload from key (#9)
* adding load/unload from key

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* all fixed, still need to clean all commented out code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran pre-commit hooks

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* allow more tabulate versions

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* renamed some key functions

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran pre-commit hooks (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-22 12:31:19 +02:00
Peter W. J. Staar 92e02ec4c1 feat: read page by page (#7)
* first working version to parse page-by-page

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the read page-by-page using bytesio

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-21 08:50:57 +02:00
Peter W. J. Staar 195777b656 feat: add reading from BytesIO (#6)
* added reading from BytesIO

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* run pre-commit hooks

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-13 14:36:09 +02:00
Peter Staar 7ec41810d8 python works
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-07-30 17:52:42 +02:00
Peter Staar 6559eb4eda initial commit
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-07-30 13:31:03 +02:00