1 Commits

Author SHA1 Message Date
Yusik Kim 3eb4b49361 DocLayNet V1 (#19)
* doclaynet v1 create.py

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* added benchmark example for doclaynet v1

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* parameterize input data split

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix typo

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* use test set for benchmark

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix benchmark script path

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix ground truth bbox

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* add MARKDOWN_TEXT eval to doclaynet benchmark

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* follow benchmark output dir convention

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* doclaynet v1 eval results

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix last shard id

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* add tables and debug viz

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* use pdf as conversion source

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* small fix

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* updated the create script for doclaynet-v1

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* chore: Update the version of docling-core in pyproject.toml

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Make the visualization of the reading order during the creation of the datasets optional. WIP

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Introduce CLI parameter to pass the max-items. Fix the visualization on DLNv1 to colorise the
clusters and remove the reading-order.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Correctly visualize the true and pred documents.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce parameter artifacts_path in CLI to allow passing external model files.
The parameter is propagated to docling PdfPipelineOptions.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix issues when passing custom artifact paths

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Bring back the drawing of the reading order as an optional feature during the creation of the dataset
Add DLNv1 create modality in the CLI main.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Use the HF repo to download DLNv1. Remove the main from the `create()` methods.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix a split issue in the DLNv1 create

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix bug in DLNv1 create.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Rename docling-DocLayNet-v1.1 to DocLayNet-v1.2

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Yusik Kim <kmyusk@gmail.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-10 17:29:43 +01:00