Peter W. J. Staar
96570232f6
feat: add config option to remove glyph output (#231)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-24 09:36:17 +01:00
Peter W. J. Staar
e7812a122a
feat: Refactor pdf resources to pdf page item (#215)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-13 17:25:41 +01:00
Peter W. J. Staar
3dd83fcc60
chore: refactored some pdf-resources to page-items (#214)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-12 20:56:52 +01:00
Peter W. J. Staar
2fd79a05c5
perf: improve recursive form xobject (#212)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-11 14:49:17 +01:00
Peter W. J. Staar
3272dd8d0b
feat: removing the json from the pdf-parser (#210)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-11 07:30:12 +01:00
Peter W. J. Staar
ea5f1d8d7b
feat: renaming lines to shapes and enriching with graphics (color, filling and stroking) (#209)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-10 05:35:19 +01:00
Peter W. J. Staar
f01ce848aa
feat: add decoding config to decode_page (#208)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-06 15:32:39 +01:00
Peter W. J. Staar
25672da1e8
feat: add-image-extraction (#207)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-02-04 17:35:00 +01:00
Peter W. J. Staar
f86ff926c8
perf: move map to unordered_map (#202)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-29 08:41:15 +01:00
Peter W. J. Staar
23c7fb8e8f
feat: add typed serialization (#201)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-28 17:51:51 +01:00
Peter W. J. Staar
a98871e9e3
chore: removed the v2 naming in the code (#198)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-26 13:36:37 +01:00
Peter W. J. Staar
adcb9b00e5
feat!: Remove deprecated v1 api (#189)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2026-01-19 17:16:32 +01:00
Nimalan
0c64402ddb
feat: Support reading password protected PDF (#169)
...
Signed-off-by: Nimalan <nimalan.m@protonmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-20 13:56:18 +02:00
Peter W. J. Staar
0402b3f0a3
feat: reset to the old parameters in sanitation (#163)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-09-04 12:39:59 +02:00
Peter W. J. Staar
1466548476
feat: accelerate docling parse (#161)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-09-03 07:46:24 +02:00
Peter W. J. Staar
fe3482f7d7
feat: add page unloading (#150)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-08-19 08:49:18 +02:00
Peter W. J. Staar
c2f9741a5b
feat: Establish char_cells, word_cells and line_cells, other fixes (#101)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-02-18 09:54:17 +01:00
Peter W. J. Staar
1fccb29d3f
feat!: Massive quality improvements to v2 parser and new sanitize_cells API (#73)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-12-09 10:20:52 +01:00
Peter W. J. Staar
22cf280b1f
feat: add the export of annotations and ToC (#58)
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-11-20 14:32:36 +01:00
Christoph Auer
6fdd74870d
feat!: Upgrade to v2.0.0 (#48)
...
* feat!: Upgrade to v2.0.0
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Dummy change
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* rename old parser as pdf_parser_v1
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-23 14:15:52 +02:00
Peter Staar
e5856f009a
feat: add an experimental v2 parser to improve performance (#29)
...
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: rmdg88 <rmdg88@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rui Dias Gomes <66125272+rmdg88@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: rmdg88 <rmdg88@gmail.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 11:52:33 +02:00
Peter W. J. Staar
dd122d0c93
feat!: adding load/unload from key (#9)
...
* adding load/unload from key
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all fixed, still need to clean all commented out code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ran pre-commit hooks
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* allow more tabulate versions
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* renamed some key functions
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ran pre-commit hooks (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-22 12:31:19 +02:00
Peter W. J. Staar
92e02ec4c1
feat: read page by page (#7)
...
* first working version to parse page-by-page
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the read page-by-page using bytesio
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-21 08:50:57 +02:00
Peter W. J. Staar
195777b656
feat: add reading from BytesIO (#6)
...
* added reading from BytesIO
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* run pre-commit hooks
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-13 14:36:09 +02:00
Peter Staar
7ec41810d8
python works
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-07-30 17:52:42 +02:00
Peter Staar
6559eb4eda
initial commit
...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-07-30 13:31:03 +02:00