mirror of
https://github.com/docling-project/docling-eval.git
synced 2026-05-17 13:10:47 +00:00
5c9f3fadf2
* fix: Close PIL Image objects to prevent memory leaks.

* prevent PIL Image memory leaks

* serialize images to bytes and add EasyOCR test script

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* style: fix isort import ordering

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: c05b85e42a

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* feat: support omnidocbench parquet format

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: c05b85e42a
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 36044e5eb5

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: format files

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: tags check

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: update pixel evaluator with NumPy 1.x compatible bit counting.

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: prevent memory leaks

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* reformat and fix type errors

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: build error and import tableformer provider optionally

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* remove comment

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: reset streams and adding image refs

* fix TableStructureModel import path

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: c05b85e42a
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 36044e5eb5
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: b83a1f0b7d

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: c05b85e42a
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 36044e5eb5
  I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: b83a1f0b7d

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* Update dependencies and uv.lock

  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* address review comments

  Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* dependency upgrades

  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: unify group registry to prevent duplicate group creation

  Merge created_groups and ListHierarchyManager.group_containers into a
  single shared all_groups registry. This eliminates a class of duplicate
  DocItem refs that arose when list and non-list code paths each created
  separate groups for the same path_id.

  Also adds _validate_unique_crefs() as a post-conversion assertion that
  fires before coordinate scaling.

  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: add flat-layout campaign tools

  Adds three scripts for workflows using flat delivery directories (as
  opposed to the submission-style layout the existing tools expect):

  - convert_flat_cvat_deliveries_to_docling.py: batch-converts flat CVAT
    XML deliveries into DoclingDocument JSON + HTML visualizations, with
    worker parallelism, dry-run, and a structured run report.
  - flat_cvat_deliveries_to_hf.py: aggregates flat-batch DoclingDocument
    exports into a HuggingFace-ready parquet dataset.
  - cvat_task_batch_uploader_flat.py: uploads flat folder layouts
    (folder/images + folder/annotations.xml) as tasks to a CVAT project.

  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* upgrade lock and deps

  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix import sorting

  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: samiuc <sami.ullah.chat@gmail.com>
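
Illustration of the group-registry fix described in the commit message above. This is a minimal sketch, not the project's code: only the names all_groups, path_id, and the idea behind _validate_unique_crefs() come from the message. The Group class, GroupRegistry, get_or_create, the "#/groups/N" ref format, and the example path_id are hypothetical stand-ins, assumed here to show why a single shared registry avoids duplicate refs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Group:
    """Hypothetical stand-in for a document group (not the docling type)."""

    cref: str
    path_id: str
    children: List[str] = field(default_factory=list)


class GroupRegistry:
    """One registry shared by list and non-list code paths (assumed design)."""

    def __init__(self) -> None:
        # Keyed by path_id, so each path can own at most one group.
        self.all_groups: Dict[str, Group] = {}

    def get_or_create(self, path_id: str) -> Group:
        group = self.all_groups.get(path_id)
        if group is None:
            # The ref format is illustrative only.
            group = Group(cref=f"#/groups/{len(self.all_groups)}", path_id=path_id)
            self.all_groups[path_id] = group
        return group


def validate_unique_crefs(crefs: List[str]) -> None:
    """Raise if any ref occurs more than once (sketch of a post-conversion check)."""
    duplicates = sorted(c for c, n in Counter(crefs).items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate refs: {duplicates}")


registry = GroupRegistry()
# Both code paths ask for the same path_id and receive the same object.
from_list_path = registry.get_or_create("page1/list0")
from_other_path = registry.get_or_create("page1/list0")
assert from_list_path is from_other_path
validate_unique_crefs([g.cref for g in registry.all_groups.values()])
```

With two separate registries, each lookup above would miss and create its own group for the same path_id, which is the duplication the commit message describes. Running the uniqueness check before any later transformation (the message mentions coordinate scaling) would surface such a bug close to where it is introduced.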