Christoph Auer 629a451d7b feat: Layout evaluation fixes, mode control and cleanup (#133)
2025-07-01 10:02:59 +02:00


# DocLayNet V1.2 Benchmarks
Create and evaluate DocLayNet v1.2 datasets using the following commands. They download the [DocLayNetv1.2_OTSL](https://huggingface.co/datasets/ds4sd/DocLayNet-v1.2) dataset from HuggingFace and run the evaluations using the Docling PDF converter for all supported modalities.
## Create evaluation datasets
```sh
# Make the ground-truth
docling-eval create-gt --benchmark DocLayNetV1 --output-dir ./benchmarks/DocLayNetV1/
# Make predictions for different modalities.
docling-eval create-eval \
--benchmark DocLayNetV1 \
--output-dir ./benchmarks/DocLayNetV1/ \
--prediction-provider Docling # use full-document predictions from docling
docling-eval create-eval \
--benchmark DocLayNetV1 \
--output-dir ./benchmarks/DocLayNetV1/ \
--prediction-provider TableFormer # use tableformer predictions only
```
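The dataset-creation steps can also be wrapped in a small POSIX shell function so that a failing step stops the run. This is a sketch, not part of docling-eval: `run`, `create_datasets` and the `DRY_RUN` variable are hypothetical names, and the docling-eval subcommands and flags mirror the commands above for the DocLayNetV1 benchmark. With `DRY_RUN=1` the commands are only printed.

```sh
#!/bin/sh
# Sketch only: run, create_datasets and DRY_RUN are hypothetical names,
# not part of docling-eval. The docling-eval flags mirror the commands above.

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the ground truth, then predictions for each provider.
# Stops at the first failing step.
create_datasets() {
  benchmark="${1:-DocLayNetV1}"
  out_dir="./benchmarks/${benchmark}/"
  run docling-eval create-gt --benchmark "$benchmark" --output-dir "$out_dir" || return 1
  for provider in Docling TableFormer; do
    run docling-eval create-eval \
      --benchmark "$benchmark" \
      --output-dir "$out_dir" \
      --prediction-provider "$provider" || return 1
  done
}
```

For example, `DRY_RUN=1 create_datasets` prints the three commands in order, and `create_datasets` runs them.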
## Layout Evaluation
Create the evaluation report:
```sh
docling-eval evaluate \
--modality layout \
--benchmark DocLayNetV1 \
--output-dir ./benchmarks/DocLayNetV1/
```
[Layout evaluation JSON](evaluations/DocLayNetV1/evaluation_DocLayNetV1_layout.json)
Visualize the report:
```sh
docling-eval visualize \
--modality layout \
--benchmark DocLayNetV1 \
--output-dir ./benchmarks/DocLayNetV1/
```
[mAP[0.5:0.95] report](evaluations/DocLayNetV1/evaluation_DocLayNetV1_layout_mAP_0.5_0.95.txt)
![mAP[0.5:0.95] plot](evaluations/DocLayNetV1/evaluation_DocLayNetV1_layout_mAP_0.5_0.95.png)
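The `mAP[0.5:0.95]` label matches the COCO convention, in which average precision (AP) is computed per class at ten IoU thresholds (0.50, 0.55, up to 0.95) and then averaged over thresholds and classes. Assuming docling-eval follows that convention (this page does not spell out the exact computation), the reported value is:

```math
\mathrm{mAP}_{[0.5:0.95]} = \frac{1}{|C|} \sum_{c \in C} \frac{1}{10} \sum_{k=0}^{9} \mathrm{AP}_c\left(0.5 + 0.05\,k\right),
\qquad
\mathrm{IoU}(B_p, B_g) = \frac{|B_p \cap B_g|}{|B_p \cup B_g|}
```

Here C is the set of layout classes, and a predicted box B_p counts as a true positive at threshold t when it overlaps a not-yet-matched ground-truth box B_g of the same class with IoU of at least t. Averaging over the higher thresholds rewards tightly fitting boxes, not just boxes in roughly the right place.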
## Markdown Text Evaluation
Create the report:
```sh
docling-eval evaluate \
--modality markdown_text \
--benchmark DocLayNetV1 \
--output-dir ./benchmarks/DocLayNetV1/
```
[Markdown text JSON](evaluations/DocLayNetV1/evaluation_DocLayNetV1_markdown_text.json)
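The metrics plotted below include precision, recall and F1. Assuming the standard definitions over tokens shared between the predicted and ground-truth Markdown text (the tokenization docling-eval applies is not described here), with TP, FP and FN counting matched, spurious and missed tokens:

```math
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}
```

F1 is the harmonic mean of P and R, so it stays close to the weaker of the two: P = 0.9 and R = 0.5 give F1 = 0.9 / 1.4 ≈ 0.643.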
Visualize the report:
```sh
docling-eval visualize \
--modality markdown_text \
--benchmark DocLayNetV1 \
--output-dir ./benchmarks/DocLayNetV1/
```
[Markdown text report](evaluations/DocLayNetV1/evaluation_DocLayNetV1_markdown_text.txt)
![BLEU plot](evaluations/DocLayNetV1/evaluation_DocLayNetV1_markdown_text_BLEU.png)
![Edit distance plot](evaluations/DocLayNetV1/evaluation_DocLayNetV1_markdown_text_edit_distance.png)
![F1 plot](evaluations/DocLayNetV1/evaluation_DocLayNetV1_markdown_text_F1.png)
![Meteor plot](evaluations/DocLayNetV1/evaluation_DocLayNetV1_markdown_text_meteor.png)
![Precision plot](evaluations/DocLayNetV1/evaluation_DocLayNetV1_markdown_text_precision.png)
![Recall plot](evaluations/DocLayNetV1/evaluation_DocLayNetV1_markdown_text_recall.png)
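Both modalities follow the same evaluate-then-visualize pattern, so the four commands can be run in a loop. A minimal POSIX shell sketch, assuming the datasets have already been created; `run`, `evaluate_and_visualize` and `DRY_RUN` are hypothetical names, and only the docling-eval subcommands and flags shown on this page are used.

```sh
#!/bin/sh
# Sketch only: run, evaluate_and_visualize and DRY_RUN are hypothetical names,
# not part of docling-eval. The docling-eval flags mirror the commands above.

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# For each modality on this page, build the report and then visualize it.
# Stops at the first failing step.
evaluate_and_visualize() {
  benchmark="${1:-DocLayNetV1}"
  out_dir="./benchmarks/${benchmark}/"
  for modality in layout markdown_text; do
    for step in evaluate visualize; do
      run docling-eval "$step" \
        --modality "$modality" \
        --benchmark "$benchmark" \
        --output-dir "$out_dir" || return 1
    done
  done
}
```

`DRY_RUN=1 evaluate_and_visualize` prints the four commands in order (layout first, then markdown_text). Passing another benchmark name as the first argument changes both `--benchmark` and the output directory.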