Christoph Auer 5c9f3fadf2 feat: flat-layout CVAT campaign tools and resilient shard writing (#206)
* fix: close PIL Image objects to prevent memory leaks

* prevent PIL Image memory leaks

* serialize images to bytes and add EasyOCR test script

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* style: fix isort import ordering

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: c05b85e42a

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* feat: support omnidocbench parquet format

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: c05b85e42a
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 36044e5eb5

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: format files

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: tags check

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: update pixel evaluator with NumPy 1.x-compatible bit counting

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: prevent memory leaks

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* reformat and fix type errors

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: build error and import tableformer provider optionally

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* remove comment

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* fix: reset streams and add image refs

* fix TableStructureModel import path

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: c05b85e42a
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 36044e5eb5
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: b83a1f0b7d

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* DCO Remediation Commit for samiuc <sami.ullah.chat@gmail.com>

I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: a592458dce
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: ac622cce15
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 3648bd972b
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: c05b85e42a
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: 36044e5eb5
I, samiuc <sami.ullah.chat@gmail.com>, hereby add my Signed-off-by to this commit: b83a1f0b7d

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* Update dependencies and uv.lock

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* address review comments

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>

* dependency upgrades

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: unify group registry to prevent duplicate group creation

Merge created_groups and ListHierarchyManager.group_containers into a
single shared all_groups registry. This eliminates a class of duplicate
DocItem refs that arose when list and non-list code paths each created
separate groups for the same path_id.

Also adds _validate_unique_crefs() as a post-conversion assertion that
fires before coordinate scaling.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: add flat-layout campaign tools

Adds three scripts for workflows using flat delivery directories
(as opposed to the submission-style layout the existing tools expect):

- convert_flat_cvat_deliveries_to_docling.py: batch-converts flat CVAT
  XML deliveries into DoclingDocument JSON + HTML visualizations, with
  worker parallelism, dry-run, and a structured run report.
- flat_cvat_deliveries_to_hf.py: aggregates flat-batch DoclingDocument
  exports into a HuggingFace-ready parquet dataset.
- cvat_task_batch_uploader_flat.py: uploads flat folder layouts
  (folder/images + folder/annotations.xml) as tasks to a CVAT project.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* upgrade lock and deps

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix import sorting

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: samiuc <sami.ullah.chat@gmail.com>
2026-03-31 13:27:23 +02:00