DP-Bench Benchmarks
Create DPBench evaluation datasets:
# Make the ground-truth
docling-eval create-gt --benchmark DPBench --output-dir ./benchmarks/DPBench-gt/
# Make predictions for different modalities.
docling-eval create-eval \
--benchmark DPBench \
--gt-dir ./benchmarks/DPBench-gt/gt_dataset/ \
--output-dir ./benchmarks/DPBench-e2e/ \
--prediction-provider Docling # use full-document predictions from Docling
docling-eval create-eval \
--benchmark DPBench \
--gt-dir ./benchmarks/DPBench-gt/gt_dataset/ \
--output-dir ./benchmarks/DPBench-tables/ \
--prediction-provider TableFormer # use TableFormer predictions only
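The dataset-creation steps above can be wrapped in a small script so that each prediction provider is always paired with its output directory. This is only a sketch: the `run` and `create_eval` helpers and the `DRY_RUN` and `GT_ROOT` variables are hypothetical conveniences, not part of `docling-eval`, and the script uses only the flags already shown above. By default it prints the commands instead of executing them.

```shell
# Sketch: build the ground truth once, then one prediction dataset per provider.
# DRY_RUN, GT_ROOT, run and create_eval are illustrative names, not docling-eval features.
set -eu

DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands only, 0 = execute them
GT_ROOT=./benchmarks/DPBench-gt/

run() {
  # Print the command in dry-run mode, otherwise execute it unchanged.
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

create_eval() {
  # $1 = prediction provider, $2 = output directory
  run docling-eval create-eval \
    --benchmark DPBench \
    --gt-dir "${GT_ROOT}gt_dataset/" \
    --output-dir "$2" \
    --prediction-provider "$1"
}

run docling-eval create-gt --benchmark DPBench --output-dir "$GT_ROOT"
create_eval Docling     ./benchmarks/DPBench-e2e/
create_eval TableFormer ./benchmarks/DPBench-tables/
```

Run it with `DRY_RUN=0` once the printed commands look right.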
Layout Evaluation
Create the evaluation report:
docling-eval evaluate \
--modality layout \
--benchmark DPBench \
--output-dir ./benchmarks/DPBench-e2e/
Visualize the report:
docling-eval visualize \
--modality layout \
--benchmark DPBench \
--output-dir ./benchmarks/DPBench-e2e/
TableFormer Evaluation
Create the evaluation report:
docling-eval evaluate \
--modality table_structure \
--benchmark DPBench \
--output-dir ./benchmarks/DPBench-tables/
Visualize the report:
docling-eval visualize \
--modality table_structure \
--benchmark DPBench \
--output-dir ./benchmarks/DPBench-tables/
Reading Order Evaluation
Create the evaluation report:
docling-eval evaluate \
--modality reading_order \
--benchmark DPBench \
--output-dir ./benchmarks/DPBench-e2e/
Visualize the report:
docling-eval visualize \
--modality reading_order \
--benchmark DPBench \
--output-dir ./benchmarks/DPBench-e2e/
Markdown Text Evaluation
Create the evaluation report:
docling-eval evaluate \
--modality markdown_text \
--benchmark DPBench \
--output-dir ./benchmarks/DPBench-e2e/
Visualize the report:
docling-eval visualize \
--modality markdown_text \
--benchmark DPBench \
--output-dir ./benchmarks/DPBench-e2e/
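Every modality above follows the same evaluate-then-visualize pattern and differs only in the `--modality` value and the output directory. A loop can express that, as in the sketch below. The `run` and `eval_modality` helpers and the `DRY_RUN` variable are hypothetical conveniences rather than `docling-eval` features, and by default the script prints the commands instead of executing them.

```shell
# Sketch: run evaluate + visualize for each modality against its prediction dataset.
# DRY_RUN, run and eval_modality are illustrative names, not docling-eval features.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
  # Print the command in dry-run mode, otherwise execute it unchanged.
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

eval_modality() {
  # $1 = modality, $2 = output directory holding the predictions
  for step in evaluate visualize; do
    run docling-eval "$step" \
      --modality "$1" \
      --benchmark DPBench \
      --output-dir "$2"
  done
}

# Modalities evaluated on the full-document (Docling) predictions.
for modality in layout reading_order markdown_text; do
  eval_modality "$modality" ./benchmarks/DPBench-e2e/
done

# Table structure is evaluated on the TableFormer predictions.
eval_modality table_structure ./benchmarks/DPBench-tables/
```

Keeping the modality-to-directory pairing in one place avoids pointing `table_structure` at the end-to-end predictions by mistake.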
![mAP[0.5:0.95] plot](/docling-project/docling-eval/media/commit/cb28df734e5a2fa965fbbabecf0ec8a920ecf0dd/docs/evaluations/DPBench/evaluation_DPBench_layout_mAP_0.5_0.95.png)










