docling-eval/docs/examples/evaluate_external_predictions.py
Nikos Livathinos 53dbd955ae feat: Extend the evaluators to support external predictions stored in files (#185)
* chore: Move the teds.py inside the subdir evaluators/table

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce the external_predictions_path in BaseEvaluator and dummy entries in all evaluators.
Extend the CLI to support the --external-predictions-path

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend test_dataset_builder.py to save document predictions in various formats

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend MarkDownTextEvaluator to support external_predictions_path. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend LayoutEvaluator to support external_predictions_path. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Add missing pytest dependencies in tests

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix loading the external predictions in LayoutEvaluator

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce external predictions in DocStructureEvaluator. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the TableEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the KeyValueEvaluator to support external predictions. Add unit test.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the PixelLayoutEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Extend the BboxTextEvaluator to support external predictions. Add unit test

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Disable the OCREvaluator when using the external predictions

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fixing guard for external predictions in TimingsEvaluator, ReadingOrderEvaluator. Fix main

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Export the doctag files with the correct file extension

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Refactor the ExternalDoclingDocumentLoader to properly load a DoclingDocument from doctags and
the GT image.
- Introduce the staticmethod load_doctags() which covers all cases on page image loading.
- Refactor the FilePredictionProvider to use the load_doctags() from ExternalDoclingDocumentLoader.
- Refactor all evaluators to use the new ExternalDoclingDocumentLoader.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Rename code file as external_docling_document_loader.py

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Fix typo

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* feat: Introduce examples how to evaluate using external predictions using the API and the CLI.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-12-08 16:51:45 +01:00


import argparse
import logging
from pathlib import Path

from docling_eval.cli.main import evaluate
from docling_eval.datamodels.types import BenchMarkNames, EvaluationModality

_log = logging.getLogger(__name__)


def evaluate_external_predictions(
    benchmark: BenchMarkNames,
    modality: EvaluationModality,
    gt_path: Path,
    predictions_dir: Path,
    save_dir: Path,
):
    r"""
    Evaluate the external predictions in predictions_dir against the GT dataset
    in gt_path and save the evaluation files in save_dir.
    """
    # evaluate() is called with the modality first, then the benchmark.
    evaluate(
        modality,
        benchmark,
        gt_path,
        save_dir,
        external_predictions_path=predictions_dir,
    )


def main():
    r"""Parse the command-line arguments and run the evaluation."""
    parser = argparse.ArgumentParser(
        description="Example of how to use GT from parquet and predictions from externally provided prediction files",
        formatter_class=argparse.RawTextHelpFormatter,
    )
    parser.add_argument(
        "-b",
        "--benchmark",
        required=True,
        type=BenchMarkNames,
        help="Benchmark name",
    )
    parser.add_argument(
        "-m",
        "--modality",
        required=True,
        type=EvaluationModality,
        help="Evaluation modality",
    )
    parser.add_argument(
        "-g",
        "--gt_parquet_dir",
        required=True,
        type=Path,
        help="Path to the parquet GT dataset",
    )
    parser.add_argument(
        "-p",
        "--predictions_dir",
        required=True,
        type=Path,
        help="Dir with the external prediction files (json, dt, yaml)",
    )
    parser.add_argument(
        "-s",
        "--save_dir",
        required=False,
        type=Path,
        help="Path to save the produced evaluation files",
    )
    args = parser.parse_args()

    # Configure logger
    log_format = "%(asctime)s - %(levelname)s - %(message)s"
    logging.basicConfig(level=logging.INFO, format=log_format)

    evaluate_external_predictions(
        args.benchmark,
        args.modality,
        args.gt_parquet_dir,
        args.predictions_dir,
        args.save_dir,
    )


if __name__ == "__main__":
    main()
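
The script above relies on argparse to turn the raw `-b` / `-m` strings into enum members and the directory arguments into `Path` objects by passing the class itself as `type=`. The following is a minimal sketch of that conversion in isolation. `DemoModality` and its values are hypothetical stand-ins invented for this illustration, not the real `EvaluationModality` members, and `build_parser` is likewise an assumed helper name.

```python
import argparse
from enum import Enum
from pathlib import Path


class DemoModality(str, Enum):
    """Hypothetical stand-in for EvaluationModality; the values are illustrative."""

    LAYOUT = "layout"
    MARKDOWN_TEXT = "markdown_text"


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced parser that mirrors the argument style of the script."""
    parser = argparse.ArgumentParser(description="Enum and Path conversion demo")
    # argparse calls DemoModality("<cli string>"), which looks the member up by value.
    # An unknown value raises ValueError, which argparse reports as a usage error.
    parser.add_argument("-m", "--modality", required=True, type=DemoModality)
    # Path("<cli string>") is called for the directory arguments.
    parser.add_argument("-p", "--predictions_dir", required=True, type=Path)
    # Optional argument without a default: it is None when the flag is omitted.
    parser.add_argument("-s", "--save_dir", required=False, type=Path)
    return parser


# Parse an explicit argument list instead of sys.argv to keep the demo self-contained.
args = build_parser().parse_args(["-m", "layout", "-p", "preds"])
print(repr(args.modality), args.predictions_dir, args.save_dir)
```

Note that `save_dir` is `None` here because `-s` was omitted. The example script forwards that value unchanged to `evaluate()`, so passing `-s` explicitly is the safer choice unless `evaluate()` is known to accept a missing output directory.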