* doclaynet v1 create.py
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* added benchmark example for doclaynet v1
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* parameterize input data split
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* fix typo
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* use test set for benchmark
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* fix benchmark script path
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* fix ground truth bbox
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* add MARKDOWN_TEXT eval to doclaynet benchmark
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* follow benchmark output dir convention
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* doclaynet v1 eval results
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* fix last shard id
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* add tables and debug viz
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* use pdf as conversion source
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* small fix
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
* updated the create script for doclaynet-v1
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* chore: Update the version of docling-core in pyproject.toml
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* feat: Make the visualization of the reading order during the creation of the datasets optional. WIP
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Introduce CLI parameter to pass the max-items. Fix the visualization on DLNv1 to colorise the
clusters and remove the reading-order.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Correctly visualize the true and predicted documents.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* feat: Introduce parameter artifacts_path in CLI to allow passing external model files.
The parameter is propagated to docling PdfPipelineOptions.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Fix issues when passing custom artifact paths
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* feat: Bring back the drawing of the reading order as an optional feature during the creation of the dataset.
Add DLNv1 create modality in the CLI main.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Use the HF repo to download DLNv1. Remove the main from the `create()` methods.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Fix a split issue in the DLNv1 create
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Fix bug in DLNv1 create.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Rename docling-DocLayNet-v1.1 to DocLayNet-v1.2
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
---------
Signed-off-by: Yusik Kim <kmyusk@gmail.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>