47 Commits

Author SHA1 Message Date
Nikos Livathinos a850784b4f feat: Improvements in user experience: Performance, error handling, logging (#189)
* feat: Extend evaluate_dpbench_on_external_predictions.sh to include visualisations of the evaluations

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Improve error checking in main.py:visualize()

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Improve logging

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Parallelize the computation of PixelLayoutEvaluator at the level of page

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Make DatasetPixelLayoutEvaluation a subclass of DatasetEvaluation

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Parallelize the MarkdownTextEvaluator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Improve logging

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-12-16 11:25:55 +01:00
Christoph Auer 373f959633 feat: Visualizer tool and command for datasets (#186)
* chore: Move the teds.py inside the subdir evaluators/table

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce the external_predictions_path in BaseEvaluator and dummy entries in all evaluators.
Extend the CLI to support the --external-predictions-path

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend test_dataset_builder.py to save document predictions in various formats

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend MarkdownTextEvaluator to support external_predictions_path. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend LayoutEvaluator to support external_predictions_path. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Add missing pytest dependencies in tests

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix loading the external predictions in LayoutEvaluator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce external predictions in DocStructureEvaluator. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the TableEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the KeyValueEvaluator to support external predictions. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the PixelLayoutEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the BboxTextEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Disable the OCREvaluator when using the external predictions

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fixing guard for external predictions in TimingsEvaluator, ReadingOrderEvaluator. Fix main

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Export the doctag files with the correct file extension

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor the ExternalDoclingDocumentLoader to properly load a DoclingDocument from doctags and
the GT image.
- Introduce the staticmethod load_doctags() which covers all cases on page image loading.
- Refactor the FilePredictionProvider to use the load_doctags() from ExternalDoclingDocumentLoader.
- Refactor all evaluators to use the new ExternalDoclingDocumentLoader.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Rename code file as external_docling_document_loader.py

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix typo

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce examples of how to evaluate with external predictions via the API and the CLI.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Prediction visualizer

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update docling_eval/utils/external_predictions_visualizer.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* feat: Update examples bash script to demonstrate visualisations on external predictions

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-09 14:47:43 +01:00
Nikos Livathinos 53dbd955ae feat: Extend the evaluators to support external predictions stored in files (#185)
* chore: Move the teds.py inside the subdir evaluators/table

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce the external_predictions_path in BaseEvaluator and dummy entries in all evaluators.
Extend the CLI to support the --external-predictions-path

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend test_dataset_builder.py to save document predictions in various formats

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend MarkdownTextEvaluator to support external_predictions_path. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend LayoutEvaluator to support external_predictions_path. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Add missing pytest dependencies in tests

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix loading the external predictions in LayoutEvaluator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce external predictions in DocStructureEvaluator. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the TableEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the KeyValueEvaluator to support external predictions. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the PixelLayoutEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the BboxTextEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Disable the OCREvaluator when using the external predictions

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fixing guard for external predictions in TimingsEvaluator, ReadingOrderEvaluator. Fix main

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Export the doctag files with the correct file extension

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor the ExternalDoclingDocumentLoader to properly load a DoclingDocument from doctags and
the GT image.
- Introduce the staticmethod load_doctags() which covers all cases on page image loading.
- Refactor the FilePredictionProvider to use the load_doctags() from ExternalDoclingDocumentLoader.
- Refactor all evaluators to use the new ExternalDoclingDocumentLoader.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Rename code file as external_docling_document_loader.py

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix typo

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce examples of how to evaluate with external predictions via the API and the CLI.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-12-08 16:51:45 +01:00
Nikos Livathinos 409bf9f27a feat: Extend the Consolidator to export Latex files alongside the excel report (#143)
* feat: Extend the consolidator to also produce Latex files. Fix the MultiEvaluator to accept loading
json evaluations with the old format.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Style the generated Latex code to have & symbols vertically aligned

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-08-14 15:26:02 +02:00
Christoph Auer c08950b496 perf: Improve parquet writing with plain pyarrow (#134)
* perf: Improve parquet writing with plain pyarrow

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Smaller fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add pyarrow dep

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix circular import

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-07-02 10:17:47 +02:00
Christoph Auer 629a451d7b feat: Layout evaluation fixes, mode control and cleanup (#133)
* Misc fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Make DatasetRecord tolerant to old parquet files

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Make DatasetRecord tolerant to old parquet files (2)

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix docvqa test, more cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Important fixes for layout mAP computation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Adding modes for missing_prediction_strategy and label_filtering_strategy

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for mismatched docs

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add F1 no_picture metrics to layout evaluator

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixed commands on all READMEs

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove extract_images ambiguity, use utility and fix errors on visualizer

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Upgrade to latest docling_core

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix ocrmac dep, upgrade uv.lock

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix for tableformer provider

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove code redundancy

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-07-01 10:02:59 +02:00
Christoph Auer 518e1ba342 fix: Misc fixes (#131)
* Misc fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Make DatasetRecord tolerant to old parquet files

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-06-25 17:30:54 +02:00
samiuc 17e9fde84f feat: Update OCREvaluator with additional metrics (#78)
* Add README for Docling-DPBench

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: Update OCREvaluator with additional metrics

* fix: bug fix

* add edit-distance lib

* update pure ocr metrics

* Establish SegmentedPage support in DatasetRecord and DatasetRecordWithPrediction

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add SegmentedPage usage to PixParse dataset provider

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* add pure ocr metrics

* refactor: update dependencies

* fix dependencies and build errors

* feat: add optype and scipy-stubs packages

* fix: fix type error

* fix package name

* fix bugs and add funsd ocr test

* fix type error

* finalize changes

* fix build errors

* fix: ignore edit_distance missing import

* Add functionality to merge cells in Google OCR prediction (#103)

* feat: add global_merge function in google prediction provider for word cell merging

* address review comment

* remove unused imports

* address review comments and remove dictionary conversions

---------

Co-authored-by: samiullahchattha <Sami.Ullah1@ibm.com>

* refactor and address review comments

* fix regression bug

* refactor code and reduce metrics to three

* make ocr classes private

* fix type error

* refactor: update geometry utils to use BoundingBox and TextCell

Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com>

* refactor: rename metrics variables for consistency and clarity

Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com>

* Update lock for docling-core

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: samiullahchattha <Sami.Ullah1@ibm.com>
2025-06-02 14:48:32 +02:00
Michele Dolfi a469279ee3 ci: Refactor using uv for dependencies and add package CD (#113)
* refactor for using uv

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix deprecated classifier

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing uv.lock

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move xmltodict to deps

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-05-28 11:01:27 +02:00
Nikos Livathinos 42e16152c5 feat: Extend the FileProvider and the CLI to accept parameters that control the source of the prediction images (#111)
* feat: Extend the FileProvider and the CLI to accept parameters that control the source of the
prediction images. This is used in case of DocTags:
- By default the images from GT will be used.
- The user can provide an external image path.
- Add a documentation example of how to evaluate doctag files.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: FileProvider. Fix loading of the image file.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-05-27 10:13:27 +02:00
Nikos Livathinos 04fe2d916f feat: Improvements for the MultiEvaluator (#95)
* fix: MultiEvaluator fix minor logging issue

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Improve code comments

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor the MultiEvaluator to allow arbitrary experiment names for the benchmark subdirs.
- In case there is no eval dataset, the experiment name must match a provider's name and this
  will be used to run the predictions.
- In case there is an eval dataset, the experiment name is just a tag and the information about the
  prediction provider will be extracted from the corresponding column of the parquet.
- If there is no eval dataset and the experiment name does not match any prediction provider,
  an exception is raised.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: MultiEvaluator: rename the GT_LEAF_DIR and introduce the EVALUATIONS_DIR to make the dir structure
created/used by MultiEvaluator the same as the one created by the CLI

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Change the pipeline settings of Docling to use 16 CPU threads.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: MultiEvaluator improve logging

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix the MultiEvaluator.load_multi_evaluation() to properly scan the multi-evaluation dir structure

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Ensure to use all CPU cores for the DoclingPredictionProvider

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-05-15 17:00:33 +02:00
Christoph Auer 7903b6a1d9 feat: Add extra args for docling-provider and default annotations for CVAT (#98)
* Add README for Docling-DPBench

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add CVAT annotation features, fix DatasetRecord.features usage

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* dev: Updates for CVAT and docling provider args

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* documentation for SmolDocling, fix artifacts_path

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update lock

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-05-14 17:47:15 +02:00
Peter W. J. Staar 54d013bc5e feat: add area level f1 (#86)
* added the area-level precision, recall and f1

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* WIP: adding timing modality

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the code with timings

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the timings modality

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran the test_run_dpbench_tables with success

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* commented out test_run_dpbench_tables

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* found potential bug in base_prediction_provider

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* found potential bug in base_prediction_provider (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the timings in base-predictor

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* removed prints and added logging-level for matplotlib

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* found bug in stats

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the logging

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-04-29 12:27:29 +02:00
Peter W. J. Staar 1e2040a629 fix: propagate cvat parameters (#82)
* propagated the CVAT parameters in the cli and updated the documentation

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the formatting

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fix the export in PDF_Docling

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the PDF_Docling to parquet

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the visualisations and leveraged the new docling-core visualization capability

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* cleaned up the visualisation code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* cleaned up the code and reformatted

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-04-25 15:32:28 +02:00
Nikos Livathinos dee40e8f7d feat: Consolidate multiple evaluation results and generate a comparison matrix (#64)
* feat: Introduce the pred_modalities parameter in the BasePredictionProvider and its implementations

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor main:get_prediction_provider() to add parameter that controls the visualizations.
Refactor the evaluate() to return the DatasetEvaluation as object.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce the MultiEvaluator that can generate ground truth and prediction datasets and also
compute the evaluation across multiple providers and modalities. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update toml dependencies to include pandas, openpyxl

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce staticmethod MultiEvaluator.load_multi_evaluation() to load multi-evaluations from
the disk. Update unit tests.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Allow PENDING in the _accepted_status of BaseEvaluator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Modifications in the unit test of MultiEvaluator. Code clean up.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce the Consolidator class that collects evaluation results and generates one excel report

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Improve the header names of excel export

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the DatasetLayoutEvaluation with DatasetStatistics for all metrics

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the Consolidator to include the standard deviation for each metric

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update the pyproject.toml to pin to the  docling branch that supports the RT-DETRv2 model

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the DatasetEvaluation to contain the evaluated and rejected samples. The rejected ones
are itemized per rejection type.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the Consolidator to include the samples (evaluated, rejected) in the generated excel

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor the directory structure for MultiEvaluator and Consolidator classes.
- Refactor the generated excel matrix to include the experiment and provider columns.
- Refactor the BasePredictionProvider and all providers to have class attributes for the
  prediction_provider_type and prediction_modalities.
- Introduce CLI in the examples for the generation of the consolidation matrix.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Remove ConversionStatus.PENDING accepted status from BaseEvaluator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: MultiEvaluator fix the load_multi_evaluation()

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Add the class attributes for supported modalities in all prediction providers

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Fix code typos

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Address predictor_info TODOs

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Add the test_multi_evaluator as a pytest dependency for test_consolidator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Repin to docling release

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Regenerate lock file

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Use defaultdict for rejected_samples

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-04-22 12:59:21 +02:00
Christoph Auer 14a038e05f fix: Add CLI option for FileDatasetBuilder (#76)
* Add README for Docling-DPBench

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add FileDatasetBuilder

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add test and fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add FileDatasetBuilder to CLI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-04-22 11:30:34 +02:00
laurachiticariu 732433b75a chore: Corrected PubTabNet_benchmarks.md (#75)
Add split val to the other commands

Signed-off-by: laurachiticariu <chiti@us.ibm.com>
2025-04-22 11:29:56 +02:00
laurachiticariu c458dc5de1 Update PubTabNet_benchmarks.md (#74)
* Update PubTabNet_benchmarks.md

With the default `--split test`, the create-gt method throws an exception `Error creating dataset builder: Unknown split "test". Should be one of ['train', 'val'].`. Indeed, https://huggingface.co/datasets/ds4sd/PubTabNet_OTSL only has train and val splits. Given this, I believe using the `val` split is more suitable for this dataset.

Signed-off-by: laurachiticariu <chiti@us.ibm.com>

* Small typo

Signed-off-by: laurachiticariu <chiti@us.ibm.com>

* Update PubTabNet_benchmarks.md

Signed-off-by: laurachiticariu <chiti@us.ibm.com>

---------

Signed-off-by: laurachiticariu <chiti@us.ibm.com>
2025-04-21 07:08:15 -07:00
laurachiticariu fe52432f84 Update FinTabNet_benchmarks.md (#72)
Small typo

Signed-off-by: laurachiticariu <chiti@us.ibm.com>
2025-04-18 07:02:41 -07:00
Christoph Auer e3debd61d7 fix: Address missing conversion status (PENDING), add artifacts path, remove unused CLI args (#69)
* Add README for Docling-DPBench

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Restructured CVAT builder (WIP)

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* CVAT preannotation and dataset builders, with test cases

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add CLI, merge from main

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update README for CVAT

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add artifacts path option to CLI, several fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove raise

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-04-17 14:14:58 +02:00
Christoph Auer 28c2e1887b feat: Refactor CVAT builder (#68)
* Add README for Docling-DPBench

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Restructured CVAT builder (WIP)

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* CVAT preannotation and dataset builders, with test cases

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add CLI, merge from main

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update README for CVAT

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-04-17 12:30:30 +02:00
Christoph Auer ddf40241a9 Add README for Docling-DPBench (#60)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-04-07 14:50:14 +02:00
Christoph Auer a3d99b9f13 feat: Establish new API encapsulation for dataset creation and prediction providers (#30)
* correct mpy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatting

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* adding the script to make an initial dataset from pdf's

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* before switching to specific docling-core branch

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* rebased on kv-items and updated the create script in CVAT

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the cvat

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the annotation description on CVAT

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the annotation description on CVAT (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the annotation description on CVAT (3)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* [WIP] Crafting new dataset builder and prediction provider API

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Restructure to docling_eval_next

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix mypy

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix f-strings

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Changes for prediction_provider interface, to support all cases.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add omnidocbench DatasetBuilder

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add doclaynet v1, funsd

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add XFUND, more fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* update the kv cell creation to prevent false positives

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

* chore: Fixing imports

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update docling-core version

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce new design for Evaluators based on BaseEvaluator that accept external predictions.
And utility adapters.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Factor PredictionProvider out of dataset builder, many fixes on DatasetRecord

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Sketch example for file-directory prediction provider

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: Fix typing hints

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update poetry to docling-core 2.24.0

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: WIP: Introduce the FilePredictionProvider that reads files with predictions from the disk
- It currently supports doctags, markdown, json, yaml formats.
- We still need to improve the returned type so that it allows for no DoclingDocument but only for
  the source data (e.g. in case of markdown).

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Add DocLayNetV2DatasetBuilder

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Added TableDatasetBuilder and test, update TableFormerPredictionProvider

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: Update MyPy configuration in toml

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor the BasePredictionProvider.predict() to return DatasetRecordWithPrediction

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Fix the FilePredictionProvider. Return None in the predicted document in case of Markdown.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Remove the kwargs from all PredictionProvider classes and introduce provider-specific
initialization arguments

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce the parameter "ignore_missing_files" in FilePredictionProvider

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Add do_visualization to PredictionProvider

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Move next-gen API to main source tree, re-organize module paths

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup, change path handling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup, change path handling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* More module removal and renaming

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Small test fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Add the "prediction_format" in the serialization of DatasetRecordWithPrediction

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor the MarkdownTextEvaluator to support the new classes design. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Improve the new design of MarkdownEvaluator to move common functionalities into the base class

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor the LayoutEvaluator to use the new class design. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Clean up LayoutEvaluator code

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Implementation cleanup and fixes for new class design (#52)

* More module removal and renaming

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Small test fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Small test fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup of tests and more fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add visualization for tables

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add visualization for all tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for test files, FilePredictionProvider changes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Put new CLI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Rename CLI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update all README with new commands.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove old examples

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Several Fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* README updates

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add gt_dir arg to create-eval, README fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes, pass tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: Refactor the TableEvaluator to use the new class design.
Move common evaluator code to BaseEvaluator.
Add more unit tests. Introduce pytest dependencies.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Update lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Make pytest CI output more verbose

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: Refactor the ReadingOrderEvaluator to use the new class design.
Remove the BaseReadingOrderEvaluator. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Optimize GT downloading behaviour

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add file sources

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Allow pytest output on CI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Disable tests in CI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Reenable tests in CI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add correct @pytest.mark.dependency()

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: Introduce TypeVars for the UnitEvaluation and DatasetEvaluation used by the BaseEvaluator.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Minimize tests in CI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: Refactor BboxTestEvaluator to use the new design. Introduce unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Remove streaming in DocLayNet v1

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add back test dependency

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Saidgurbuz <said.gurbuz@epfl.ch>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-04-01 13:04:03 +02:00
Maxim Lysak ff2c9c5936 chore: Readme picture (#49)
* Added picture to the README

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Adjusted picture size in readme

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Slightly rotated picture for better alignment

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-03-24 13:07:41 +01:00
Nikos Livathinos dd4920a34d feat: Externally provided predictions for LayoutEvaluator, TableEvaluator. Support --begin-index, --end-index, --debug parameters. Code clean up. (#39)
* feat: Refactor main to remove --max-items and replace with --begin-index, --end-index. Add --debug.
Provide implementation for create_dlnv1_e2e_dataset()

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Update Readme

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: DLNv1 create: Reduce shard size to 100. Fix debug error when not in SmolDocling. Improve logging

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Delete unused file

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Working on logging

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Code clean up

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the LayoutEvaluator to accept external dictionary with DoclingDocuments

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: DLNv1 create: Introduce hardcoded list of blacklisted doc_ids

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the TableEvaluator to accept an optional dict with predicted DoclingDocument objects keyed
by the doc_id

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix DatasetStatistics:to_table() in case there is no data, to avoid division by zero.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix bug in the visualizations:save_comparison_html_with_clusters()

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Add label_mapping feature, penalize empty predictions instead of skipping.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: Pin docling to dev/new-tf-weights

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Update the FTN-OTSL and P1M-OTSL to v1.1 from our HF datasets

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Ensure that the table bboxes are not negative and within the page size

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: In case pred_dict is given to TableEvaluator remove the ".png", ".jpg" suffixes from doc_id
before trying to match

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Protect the prediction in TableEvaluator within try..except

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce the parameter structure_only in TableEvaluator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Add optional parameter intermediate_results_dir in TableEvaluator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: TableEvaluator: Fixes in save_table_evaluations()

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update to the latest docling 2.26.0

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Ensure that all create methods support the `begin_index`, `end_index` parameters.
- DPBench.
- OmniDocBench.
- DocLayNetv1.
- Table datasets (FinTabNet, PubTabNet, Pub1M).

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-03-18 13:22:56 +01:00
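
Two of the fixes in the commit above (keeping table bboxes non-negative and within the page size, and removing ".png"/".jpg" suffixes from doc_id before matching) can be illustrated with a minimal sketch. The function names `clamp_bbox` and `strip_image_suffix`, and the (left, top, right, bottom) tuple layout, are assumptions made for illustration; they are not taken from the repository.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (left, top, right, bottom)


def clamp_bbox(bbox: BBox, page_width: float, page_height: float) -> BBox:
    """Clamp each coordinate into [0, page_width] or [0, page_height]."""
    left, top, right, bottom = bbox
    left = min(max(left, 0.0), page_width)
    right = min(max(right, 0.0), page_width)
    top = min(max(top, 0.0), page_height)
    bottom = min(max(bottom, 0.0), page_height)
    return (left, top, right, bottom)


def strip_image_suffix(doc_id: str) -> str:
    """Drop a trailing ".png" or ".jpg" so ids match regardless of image extension."""
    for suffix in (".png", ".jpg"):
        if doc_id.endswith(suffix):
            return doc_id[: -len(suffix)]
    return doc_id
```

Under these assumptions, `clamp_bbox((-5, 10, 120, 90), 100, 80)` pulls the box back onto a 100 x 80 page, and `strip_image_suffix("page_001.png")` yields a key without the extension.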
Saidgurbuz b6f7d0b02f feat: XFUND Dataset Creation (#44)
* add XFUND dataset creation script

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

* add create_xfund.py

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

* update implementation with split

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

* fix the bug in download

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

* fix download path

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

---------

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
2025-03-17 14:53:26 +01:00
Yusik Kim cfd91ac10f feat: Add DocLayNet v2 dataset creation (#12)
* doclaynet v2

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* True doc bounding boxes to bottom left origin

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* add reading order to benchmark script

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* make true and pred doc names the same

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* bound mem usage through limiting single shard size

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* add key-value items to the DoclingDocument

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* adopt v1 format to v2

Signed-off-by: Yusik Kim <yusik.kim@ibm.com>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* update implementation of kv items based on structure change, and add prov item

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix version and format issues

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* handle empty prov, update benchmark script

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* adapt create.py to KeyValue structure change

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* add rule-based method, classify_cells, to assign labels to GraphCells

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* add visualize_docling_document method to generate highlighted image for all item types with options

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* Revert "add visualize_docling_document method to generate highlighted image for all item types with options"

This reverts commit 267918b59f892513ca633bbd1e7a31a0a45acc60.

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* update method name

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix: max_items loop break

Signed-off-by: Yusik Kim <yusik.kim@ibm.com>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix tqdm total

Signed-off-by: Yusik Kim <yusik.kim@ibm.com>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* rebase main utils.py

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* chore: rebased and refactored

Signed-off-by: Yusik Kim <yusik.kim@ibm.com>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix: modalities for DLN

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* remove unrelated code

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix: ocr options, use image converter

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* change max_items back to 1000

Signed-off-by: Yusik Kim <yusik.kim@ibm.com>

* precommit black

Signed-off-by: Yusik Kim <yusik.kim@ibm.com>

---------

Signed-off-by: Yusik Kim <kmyusk@gmail.com>
Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
Signed-off-by: Yusik Kim <yusik.kim@ibm.com>
Co-authored-by: Saidgurbuz <said.gurbuz@epfl.ch>
Co-authored-by: Yusik Kim <yusik.kim@ibm.com>
2025-03-17 14:53:08 +01:00
Nikos Livathinos ddae1ec966 fix: Fix the modalities for DPBench, OmniDocBench, DLNv1. Switch to new settings in SmolDocling API. Improve the documentation. (#37)
* chore: Change the pinning of docling

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix the modalities supported for DPBench, OmniDocBench, DLNv1. Clean up code.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Update documentation to have all benchmarks in separate md files and place links in Readme.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Change the initialization of the create_smol_docling_converter() to allow flash-attn

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: List benchmarks in the main readme with short description. Fix broken links in the documentation.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Fix broken link in Readme.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update lock file

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Add debug code to dump the predicted text in create_dlnv1_e2e_dataset()

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update toml to pin docling with branch and extras

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Disable the generation of VLM text debugging files for DLNv1 benchmark

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update toml to docling v2.25.0 with vlm extra

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-26 15:50:02 +01:00
Nikos Livathinos df7e403b86 fix: Refactor docling-eval to improve the overall codebase structure (#36)
* chore: Rename `docling/` dir as converters. Introduce `visualization/` dir.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Remove unused imports and other code formatting

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Remove the `utils/` dir, delete unused files and move used code in appropriate locations

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Introduce the file visualisation/visualisations.py and move there functions from benchmarks/utils.py

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update MyPy configuration in toml to override tqdm module

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Clean up commented code

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Add CONVERTER_TYPE and MODALITIES columns to all produced datasets

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update pinning of docling

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Code refactoring:
- Move converters/teds.py into evaluators/teds.py
- Move all functions from converters/utils.py into benchmarks/utils.py.
- Rename create_xxx_converter() functions.
- Rename BenchMarkColumns.DOCLING_VERSION as BenchMarkColumns.CONVERTER_VERSION

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-25 12:45:51 +01:00
Nikos Livathinos 71789ac20c feat: Enable the usage of SmolDocling VLM document converter. Introduce CLI parameter --converter_type (#34)
* feat: Introduce create_vln_converter() that uses SmolDocling.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce the converter_type parameter in the CLI and initialize the converter appropriately
to use Docling or SmolDocling.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Add the column CONVERTER_TYPE in the produced dataset.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fixing the benchmarks/utils/save_comparison_html_with_clusters() to skip docitems without provenances

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Rename conversion/create_converter() as create_docling_converter()

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Replace prints with logger in doclaynet_v1/create.py

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update pinning of docling to a certain commit for smoldocling

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fixing missing imports. Code formatting.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-24 15:54:19 +01:00
Christoph Auer 8a6d2fce9d feat: Add ReadingOrderEvaluator for new reading-order model (#29)
* correct mypy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatting

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* adding the script to make an initial dataset from PDFs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* before switching to specific docling-core branch

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* rebased on kv-items and updated the create script in CVAT

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the cvat

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the annotation description on CVAT

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the annotation description on CVAT (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the annotation description on CVAT (3)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* feat: Add new reading-order model evaluator, re-factor to BaseReadingOrderEvaluator

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Rename New

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Fix calling DoclingDocument methods. Pin latest versions of docling and docling-core.
Remove commented out code, remove unused imports, code formatting.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Re-run the evaluations for DPBench and update the docs

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Re-run the evaluations for OmniDocBench and update the docs

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Re-run the evaluations for DPBench and update the docs

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Re-run the evaluations for OmniDocBench and update the docs

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Minor improvements in DPBench OmniDocBench documentation

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-21 17:18:33 +01:00
Saidgurbuz a12b47ad11 feat: Funsd Dataset Creation (#31)
* add doclingdocument version of funsd dataset creation script

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

* add base create script for funsd dataset

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

* update BenchmarkNames and format

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

* fix styling

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>

---------

Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>
2025-02-19 14:32:04 +01:00
Christoph Auer d3c8a13c2c Fix old usage of EvaluationModality.TABLEFORMER (#32)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-02-19 11:49:36 +01:00
Peter W. J. Staar 68ea3cd2ca feat: Improve CVAT dataset builder and add end-to-end documentation (#25)
* correct mypy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatting

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* adding the script to make an initial dataset from PDFs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* before switching to specific docling-core branch

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* rebased on kv-items and updated the create script in CVAT

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the cvat

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the annotation description on CVAT

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the annotation description on CVAT (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the annotation description on CVAT (3)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Update lock

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for CVAT

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Disable Examples in CI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-02-19 10:30:45 +01:00
Nikos Livathinos cba0e30f69 feat: Extend CLI to allow loading the full split with the -n parameter set to -1. (#26)
Add the MODALITIES field in constants.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-13 09:58:46 +01:00
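
The "-1 means load the full split" convention from the commit above can be sketched as follows. This is only an illustration: the helper name `limit_items` and its signature are hypothetical and are not the CLI's actual implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def limit_items(items: Sequence[T], max_items: int) -> List[T]:
    """Return the first max_items entries, or every entry when max_items is -1.

    Hypothetical helper: -1 is treated as a sentinel for "no limit".
    """
    if max_items == -1:
        return list(items)
    if max_items < 0:
        raise ValueError("max_items must be -1 or a non-negative integer")
    return list(items[:max_items])
```

Rejecting other negative values avoids the surprising behaviour of a raw negative slice, which would silently drop items from the end instead of lifting the limit.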
Yusik Kim 3eb4b49361 DocLayNet V1 (#19)
* doclaynet v1 create.py

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* added benchmark example for doclaynet v1

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* parameterize input data split

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix typo

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* use test set for benchmark

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix benchmark script path

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix ground truth bbox

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* add MARKDOWN_TEXT eval to doclaynet benchmark

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* follow benchmark output dir convention

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* doclaynet v1 eval results

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* fix last shard id

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* add tables and debug viz

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* use pdf as conversion source

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* small fix

Signed-off-by: Yusik Kim <kmyusk@gmail.com>

* updated the create script for doclaynet-v1

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* chore: Update the version of docling-core in pyproject.toml

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Make the visualization of the reading order during the creation of the datasets optional. WIP

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Introduce CLI parameter to pass the max-items. Fix the visualization on DLNv1 to colorise the
clusters and remove the reading-order.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Visualize correctly the true and pred document.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce parameter artifacts_path in CLI to allow passing external files of models.
The parameter is propagated to docling PdfPipelineOptions

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix issues when passing custom artifact paths

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Bring back the drawing of the reading order as an optional feature during the creation of the dataset
Add DLNv1 create modality in the CLI main.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Use the HF repo to download DLNv1. Remove the main from the `create()` methods.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix a split issue in the DLNv1 create

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix bug in DLNv1 create.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Rename docling-DocLayNet-v1.1 to DocLayNet-v1.2

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Yusik Kim <kmyusk@gmail.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-10 17:29:43 +01:00
Nikos Livathinos 8dee74c09f fix: Refactor code to support the TableFormerMode as an input parameter for TableFormerUpdater (#22)
* fix: Many code refactorings to support TableFormerMode (by default ACCURATE). Additionally:
- Clean up the overall implementation of TableFormerUpdater and set the AcceleratorOptions.
- Extend CLI to allow the creation of table datasets (PTN, FTN, P1M).

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: WIP: Updating the Readme with benchmarks for the FTN, PTN, P1M.
Move the DP-Bench and OmniDocBench in separate files inside docs/ and provide links in Readme

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Add evaluation files inside docs/ with json/png files for FTN, PTN, P1M

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Fix the images links for DP-Bench, OmniDocBench docs

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Refactor the Readme to have evaluation data and results for the table datasets FTN, PTN, Pub1M
Add code snippets showing how to run the evaluations and visualizations for all datasets.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* docs: Fix broken links in Readme

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-03 09:17:57 +01:00
Nikos Livathinos a08741e80b feat: Ensure that the split is respected when loading/evaluating the dataset:
- Introduce the 'split' optional argument in CLI.
- Refactor main.evaluate, main.visualize to receive the split.
- Refactor the LayoutEvaluator, TableEvaluator to load the given split.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-01-29 15:07:28 +01:00
Nikos Livathinos c65238d7ef fix: Fix the docs/examples that create/evaluate/visualize the tableformer datasets for PTN, FTN, P1M
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-01-29 10:53:51 +01:00
Peter W. J. Staar d72fa112f8 feat: allow-for-buckets-in-cvat (#11)
* allow-for-buckets-in-cvat

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactored the CVAT create script

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the CVAT pre-annotate and create

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added eval for cvat-annotations

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-01-22 09:36:20 +01:00
Nikos Livathinos d04d016843 feat: Refactor the reading order evaluation to skip document items that have multiple provenances (#10)
- fix: Refactor/improve the code to save log files with evaluation tables and png files with the plots
and ensure to produce all the evaluations/visualizations in the docs/examples/benchmark_xxx.py files
- Introduce optional parameter in create methods for DP-Bench and OmniDocBench to generate
visualizations.
- Update the evaluation files (json/txt/png) in docs/evaluations per dataset. Update Readme.
- Update Readme with the OmniDocBench evaluation/visualization files
- Poetry: Move to docling 2.15.1

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-01-21 14:58:17 +01:00
Nikos Livathinos 3569e75b32 feat: Add ReadingOrder and Markdown text evaluation (#8)
* chore: Add tqdm in the dependencies

* chore: Move the DatasetStats outside of table_evaluator into the utils/stats.py

* feat: ReadingOrderEvaluator: Full implementation with Average Relative Distance metric

* fix: Add reading_order in the visualise() method of main

* fix: utils/stats.py: Add the metric name as a parameter. Clean up code

* chore: Add reading_order evaluation and visualization in the examples for dp-bench and omnidocbench
Add the doc_id in the evaluation report

* feat: MarkdownTextEvaluator: Introduce text evaluation based on markdown export of DoclingDocument.
Use BLEU metric

* feat: Add ReadingOrderVisualizer and use it in the main

* chore: Add pillow lib to the poetry

* fix: ReadingOrderEvaluator: Convert the bboxes in bottom-left origin before calling the reading-order

* chore: Update poetry lock

* fix: Refactor to move the evaluator statistics in a separate file evaluators/stats.py.
Decouple the code to draw arrows in a separate function inside utils.py
Delete unused code.
Fix mypy issues.

* chore: Update Readme to include the evaluations and visualizations for the "reading-order" and the
"markdown-text" modalities.

* fix: Refactor the stats.py:save_historgram() to receive generic name for the plot
Generate histogram plots for the reading_order and markdown_text visualizations
Update Readme statistics for the reading_order and markdown_text modalities.

* feat: ReadingOrder: Implement weighted ARD where the weight is based on the bbox size

* chore: Update Readme with ARD and weighted ARD and histograms

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-01-17 09:45:14 +01:00
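
The commit above mentions a weighted ARD in which each weight is based on the bbox size. The exact metric definition is not given in this log, so the sketch below shows only the generic idea: a weighted mean of per-item distances, with bbox areas as weights. The names `bbox_area` and `weighted_mean`, and the (left, top, right, bottom) layout, are assumptions for illustration.

```python
from typing import Sequence, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (left, top, right, bottom)


def bbox_area(bbox: BBox) -> float:
    """Area of an axis-aligned box; abs() keeps it valid for either y-axis origin."""
    left, top, right, bottom = bbox
    return abs(right - left) * abs(bottom - top)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of values; returns 0.0 when the total weight is zero."""
    total = sum(weights)
    if total == 0:
        return 0.0  # guard against division by zero on empty or degenerate input
    return sum(v * w for v, w in zip(values, weights)) / total
```

With this weighting, a misordered large block (for example a full-width paragraph) influences the score more than a misordered small one such as a page number.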
Peter W. J. Staar 4bf28de7ed feat: Adding script to prepopulate CVAT and create GT annotations from CVAT annotation files (#9)
* adding script to prepopulate CVAT and create GT annotations from CVAT annotation files

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* it works

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Fixed the DPBench with the refactoring

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactored the cvat annotations in preannotate and create script

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the code to export layout after re-annotating the DP-Bench dataset

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed some nasty bugs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* adding documentation files for CVAT annotation

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* major updates

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* work-in-progress

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on reformatting and getting mypy alignment

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the mypy errors

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* moved the code to cvat_annotation

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the png packaging

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-01-15 15:43:56 +01:00
Peter W. J. Staar 6bc9140325 Add omnidocbench, many optimizations (#4)
* adding the omnidocbench benchmark

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the table-parsing in omnidocbench

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* finished the OmniDocBench implementation

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the README

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the README and the cli

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* clean up the DP-Bench example

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* made the DPBench and OmniDocBench follow the same example code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* cleaned up the dp-bench create script

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the ability to see the clusters and reading order for layout

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on making datasets from pdf collections

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the package_pdfs example

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the FinTabNet-OTSL benchmark

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the fintabnet example evaluation

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the README for FinTabNet

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the README

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactored the table evaluations

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the text inclusion in the table prediction

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the header of the HTML

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fix: Formatting and unused code cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: Extend the CLI to create the OMNIDOCBENCH datasets for the layout and tableformer modalities

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Added exit to benchmark end-to-end scripts in case git-lfs is not installed (#5)

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>

* fix: Use TableStructureModel from docling, use backends, fix boundingbox coordinates

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Reinstate layout test on dpbench

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Comments

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Comments

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove unused code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove more unused code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for Omnidoc

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for layout eval bounding boxes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* More fixes for OmniDoc, README updates

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* More fixes for OmniDoc, README updates

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Replace git-lfs with HF snapshot_download

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-01-07 12:52:29 +01:00
Peter Staar 6337b29baa forgot the end-2-end script
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-12-21 11:29:25 +01:00
Peter Staar 05c42f175d updated the README (3)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-12-20 14:13:00 +01:00
Peter Staar 0a6829d2ee updated the README
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-12-20 14:09:20 +01:00