1 Commit

Author SHA1 Message Date
Nikos Livathinos 53dbd955ae feat: Extend the evaluators to support external predictions stored in files (#185)
* chore: Move the teds.py inside the subdir evaluators/table

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce the external_predictions_path in BaseEvaluator and dummy entries in all evaluators.
Extend the CLI to support the --external-predictions-path

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend test_dataset_builder.py to save document predictions in various formats

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend MarkDownTextEvaluator to support external_predictions_path. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend LayoutEvaluator to support external_predictions_path. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Add missing pytest dependencies in tests

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix loading the external predictions in LayoutEvaluator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce external predictions in DocStructureEvaluator. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the TableEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the KeyValueEvaluator to support external predictions. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the PixelLayoutEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the BboxTextEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Disable the OCREvaluator when using the external predictions

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix the guard for external predictions in TimingsEvaluator and ReadingOrderEvaluator. Fix main

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Export the doctag files with the correct file extension

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor the ExternalDoclingDocumentLoader to properly load a DoclingDocument from doctags and the GT image.
- Introduce the static method load_doctags(), which covers all cases of page image loading.
- Refactor the FilePredictionProvider to use the load_doctags() from ExternalDoclingDocumentLoader.
- Refactor all evaluators to use the new ExternalDoclingDocumentLoader.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Rename code file as external_docling_document_loader.py

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix typo

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce examples of how to evaluate with external predictions via the API and the CLI.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-12-08 16:51:45 +01:00