docling-eval

mirror of https://github.com/docling-project/docling-eval.git synced 2026-05-17 13:10:47 +00:00

Author	SHA1	Message	Date
Nikos Livathinos	a850784b4f	feat: Improvements in user experience: Performance, error handling, logging (#189 ) * feat: Extend evaluate_dpbench_on_external_predictions.sh to include visualisations of the evaluations Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Improve error checking in main.py:visualize() Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Improve logging Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Parallelize the computation of PixelLayoutEvaluator at the level of page Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Make DatasetPixelLayoutEvaluation a subclass of DatasetEvaluation Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Parallelize the MarkdownTextEvaluator Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Improve logging Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-12-16 11:25:55 +01:00
Christoph Auer	373f959633	feat: Visualizer tool and command for datasets (#186 ) * chore: Move the teds.py inside the subdir evaluators/table Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce the external_predictions_path in BaseEvaluator and dummy entries in all evaluators. Extend the CLI to support the --external-predictions-path Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend test_dataset_builder.py to save document predictions in various formats Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend MarkDownTextEvaluator to support external_predictions_path. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend LayoutEvaluator to support external_predictions_path. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Add missing pytest dependencies in tests Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix loading the external predictions in LayoutEvaluator Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce external predictions in DocStructureEvaluator. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the TableEvaluator to support external predictions. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the KeyValueEvaluator to support external predictions. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the PixelLayoutEvaluator to support external predictions. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the BboxTextEvaluator to support external predictions. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Disable the OCREvaluator when using the external predictions Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fixing guard for external predictions in TimingsEvaluator, ReadingOrderEvaluator. Fix main Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Export the doctag files with the correct file extension Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Refactor the ExternalDoclingDocumentLoader to properly load a DoclingDocument from doctags and the GT image. - Introduce the staticmethod load_doctags() which covers all cases on page image loading. - Refactor the FilePredictionProvider to use the load_doctags() from ExternalDoclingDocumentLoader. - Refactor all evaluators to use the new ExternalDoclingDocumentLoader. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Rename code file as external_docling_document_loader.py Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix typo Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce examples how to evaluate using external predictions using the API and the CLI. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Prediction vizualizer Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update docling_eval/utils/external_predictions_visualizer.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com> * feat: Update examples bash script to demonstrate visualisations on external predictions Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com> Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-09 14:47:43 +01:00
Nikos Livathinos	53dbd955ae	feat: Extend the evaluators to support external predictions stored in files (#185 ) * chore: Move the teds.py inside the subdir evaluators/table Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce the external_predictions_path in BaseEvaluator and dummy entries in all evaluators. Extend the CLI to support the --external-predictions-path Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend test_dataset_builder.py to save document predictions in various formats Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend MarkDownTextEvaluator to support external_predictions_path. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend LayoutEvaluator to support external_predictions_path. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Add missing pytest dependencies in tests Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix loading the external predictions in LayoutEvaluator Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce external predictions in DocStructureEvaluator. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the TableEvaluator to support external predictions. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the KeyValueEvaluator to support external predictions. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the PixelLayoutEvaluator to support external predictions. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the BboxTextEvaluator to support external predictions. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Disable the OCREvaluator when using the external predictions Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fixing guard for external predictions in TimingsEvaluator, ReadingOrderEvaluator. Fix main Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Export the doctag files with the correct file extension Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Refactor the ExternalDoclingDocumentLoader to properly load a DoclingDocument from doctags and the GT image. - Introduce the staticmethod load_doctags() which covers all cases on page image loading. - Refactor the FilePredictionProvider to use the load_doctags() from ExternalDoclingDocumentLoader. - Refactor all evaluators to use the new ExternalDoclingDocumentLoader. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Rename code file as external_docling_document_loader.py Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix typo Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce examples how to evaluate using external predictions using the API and the CLI. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-12-08 16:51:45 +01:00
Nikos Livathinos	409bf9f27a	feat: Extend the Consolidator to export Latex files alongside the excel report (#143 ) * feat: Extend the consolidator to produce Latex files also. Fix the MultiEvaluator to accept loading json evalutions with the old format. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Style the generated Latex code to have & symbols vertically aligned Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-08-14 15:26:02 +02:00
Christoph Auer	c08950b496	perf: Improve parquet writing with plain pyarrow (#134 ) * perf: Improve parquet writing with plain pyarrow Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Smaller fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add pyarrow dep Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix circular import Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-07-02 10:17:47 +02:00
Christoph Auer	629a451d7b	feat: Layout evaluation fixes, mode control and cleanup (#133 ) * Misc fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Make DatasetRecord tolerant to old parquet files Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Make DatasetRecord tolerant to old parquet files (2) Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix docvqa test, more cleanup Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Important fixes for layout mAP computation Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Adding modes for missing_prediction_strategy and label_filtering_strategy Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes for mismatched docs Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add F1 no_picture metrics to layout evaluator Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixed commands on all READMEs Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Remove extract_images ambiguity, use utility and fix errors on visualizer Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Upgrade to latest docling_core Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix ocrmac dep, upgrade uv.lock Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix for tableformer provider Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Remove code redundancy Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-07-01 10:02:59 +02:00
Christoph Auer	518e1ba342	fix: Misc fixes (#131 ) * Misc fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Make DatasetRecord tolerant to old parquet files Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-06-25 17:30:54 +02:00
samiuc	17e9fde84f	feat: Update OCREvaluator with additional metrics (#78 ) * Add README for Docling-DPBench Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * feat: Update OCREvaluator with additional metrics * fix: bug fix * add edit-distance lib * update pure ocr metrics * Establish SegmentedPage support in DatasetRecord and DatasetRecordWithPrediction Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add SegmentedPage usage to PixParse dataset provider Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * add pure ocr metrics * refactor: update dependencies * fix dependencies and build errors * feat: add optype and scipy-stubs packages * fix: fix type error * fix package name * fix bugs and add funsd ocr test * fix type error * finalize changes * fix build errors * fix: ignore edit_distance missing import * Add functionality to merge cells in Google OCR prediction (#103) * feat: add global_merge function in google prediction provider for word cell merging * address review comment * remove unused imports * address review comments and remove dictionary conversions --------- Co-authored-by: samiullahchattha <Sami.Ullah1@ibm.com> * refactor and address review comments * fix regression bug * refactor code and reduce metrics to three * make ocr classes private * fix type error * refactor: update geometry utils to use BoundingBox and TextCell Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com> * refactor: rename metrics variables for consistency and clarity Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com> * Update lock for docling-core Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Signed-off-by: samiuc <sami.ullah.chat@gmail.com> Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: samiullahchattha <Sami.Ullah1@ibm.com>	2025-06-02 14:48:32 +02:00
Michele Dolfi	a469279ee3	ci: Refactor using uv for dependencies and add package CD (#113 ) * refactor for using uv Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix deprecated classifier Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * missing uv.lock Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * move xmltodict to deps Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-28 11:01:27 +02:00
Nikos Livathinos	42e16152c5	feat: Extend the FileProvider and the CLI to accept parameters that control the source of the prediction images (#111 ) * feat: Extend the FileProvider and the CLI to accept parameters that control the source of the prediction images. This is used in case of DocTags: - By default the images from GT will be used. - The user can provide an external image path. - Add documentation example how to evaluate doctag files. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: FileProvider. Fix loading of the image file. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-05-27 10:13:27 +02:00
Nikos Livathinos	04fe2d916f	feat: Improvements for the MultiEvaluator (#95 ) * fix: MultiEvaluator fix minor logging issue Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Improve code comments Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Refactor the MultiEvaluator to allow arbitrary experiment names for the benchmark subdirs. - In case there is no eval dataset, the experiment name must match a provider's name and this will be used to run the predictions. - In case there is eval dataset, the experiment name is just a tag and the information about the prediction provider will be extracted by the corresponding column of the parquet. - If there is not eval dataset and the experiment name does not match any prediction provider, an exception is raised. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: MultiEvalutor rename the GT_LEAF_DIR and introduce the EVALUATIONS_DIR to make the dir structure created/used by MultiEvaluator the same with the ones created by the CLI Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Change the pipeline settings of Docling to use 16 CPU threads. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: MultiEvaluator improve logging Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix the MultiEvaluator.load_multi_evaluation() to properly scan the multi evalution dir structure Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Ensure to use all CPU cores for the DoclingPredictionProvider Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-05-15 17:00:33 +02:00
Christoph Auer	7903b6a1d9	feat: Add extra args for docling-provider and default annotations for CVAT (#98 ) * Add README for Docling-DPBench Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add CVAT annotation features, fix DatasetRecord.features usage Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * dev: Updates for CVAT and docling provider args Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * documentation for SmolDocling, fix artifacts_path Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update lock Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-05-14 17:47:15 +02:00
Peter W. J. Staar	54d013bc5e	feat: add area level f1 (#86 ) * added the area-level precision, recall and f1 Signed-off-by: Peter Staar <taa@zurich.ibm.com> * WIP: adding timing modality Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the code with timings Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the timings modality Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the test Signed-off-by: Peter Staar <taa@zurich.ibm.com> * ran the test_run_dpbench_tables with success Signed-off-by: Peter Staar <taa@zurich.ibm.com> * commented out test_run_dpbench_tables Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * found potential bug in base_prediction_provider Signed-off-by: Peter Staar <taa@zurich.ibm.com> * found potential bug in base_prediction_provider (2) Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the timings in base-predictor Signed-off-by: Peter Staar <taa@zurich.ibm.com> * removed prints and added logging-level for matplotlib Signed-off-by: Peter Staar <taa@zurich.ibm.com> * found bug in stats Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the logging Signed-off-by: Peter Staar <taa@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2025-04-29 12:27:29 +02:00
Peter W. J. Staar	1e2040a629	fix: propagate cvat parameters (#82 ) * propagated the CVAT parameters in the cli and updated the documentation Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the formatting Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fix the export in PDF_Docling Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the PDF_Docling to parquet Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the visualisations and leveraged the new docling-core visualization capability Signed-off-by: Peter Staar <taa@zurich.ibm.com> * cleaned up the visualisation code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * cleaned up the code and reformatted Signed-off-by: Peter Staar <taa@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2025-04-25 15:32:28 +02:00
Nikos Livathinos	dee40e8f7d	feat: Consolidate multiple evaluation results and generate a comparison matrix (#64 ) * feat: Introduce the pred_modalities parameter in the BasePredictionProvider and its implementations Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Refactor main:get_prediction_provider() to add parameter that controls the visualizations. Refactor the evaluate() to return the DatasetEvaluation as object. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce the MultiEvaluator that can generate ground truth and prediction datasets and also compute the evalution across multiple providers and modalities. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update toml dependencies to include pandas, openpyxl Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce staticmethod MultiEvaluator.load_multi_evaluation() to load multi-evaluations from the disk. Update unit tests. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Allow PENDING in the _accepted_status of BaseEvaluator Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Modifications in the unit test of MultiEvaluator. Code clean up. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce the Consolidator class that collects evaluation results and generates one excel report Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Improve the header names of excel export Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the DatasetLayoutEvaluation with DatasetStatistics for all metrics Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the Consolidator to include the standard deviation for each metric Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update the pyproject.toml to pin to the docling branch that supports the RT-DETRv2 model Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the DatasetEvaluation to contain the evaluated and rejected samples. The rejected ones are itemized per rejection type. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the Consolidator to include the samples (evaluated, rejected) in the generated excel Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Refactor the directory structure for MultiEvaluator and Consolidator classes. - Refactor the generated excel matrix to include the experiment and provider columns. - Refactor the BasePredictionProvider and all providers to have class attributes for the prediction_provider_type and prediction_modalities. - Introduce CLI in the examples for the generation of the consolidation matrix. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Remove ConversionStatus.PENDING accepted status from BaseEvaluator Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: MultiEvaluator fix the load_multi_evaluation() Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Add the class attributes for supported modalities in all prediction providers Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Fix code typos Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Address predictor_info TODOs Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: Add the test_multi_evaluator as a pytest dependency for test_consolidator Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Repin to docling release Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Regenerate lock file Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: Use defaultdict for rejected_samples Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-04-22 12:59:21 +02:00
Christoph Auer	14a038e05f	fix: Add CLI option for FileDatasetBuilder (#76 ) * Add README for Docling-DPBench Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add FileDatasetBuilder Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add test and fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add FileDatasetBuilder to CLI Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-04-22 11:30:34 +02:00
laurachiticariu	732433b75a	chore: Corrected PubTabNet_benchmarks.md (#75 ) Add split vall to the other commands Signed-off-by: laurachiticariu <chiti@us.ibm.com>	2025-04-22 11:29:56 +02:00
laurachiticariu	c458dc5de1	Update PubTabNet_benchmarks.md (#74 ) * Update PubTabNet_benchmarks.md With the default `--split test`, the create-gt method throws an exception `Error creating dataset builder: Unknown split "test". Should be one of ['train', 'val'].`. Indeed, https://huggingface.co/datasets/ds4sd/PubTabNet_OTSL only has train and val splits. Given this, I believe using the `val` split is more suitable for this dataset. Signed-off-by: laurachiticariu <chiti@us.ibm.com> * Small typo Signed-off-by: laurachiticariu <chiti@us.ibm.com> * Update PubTabNet_benchmarks.md Signed-off-by: laurachiticariu <chiti@us.ibm.com> --------- Signed-off-by: laurachiticariu <chiti@us.ibm.com>	2025-04-21 07:08:15 -07:00
laurachiticariu	fe52432f84	Update FinTabNet_benchmarks.md (#72 ) Small typo Signed-off-by: laurachiticariu <chiti@us.ibm.com>	2025-04-18 07:02:41 -07:00
Christoph Auer	e3debd61d7	fix: Address missing conversion status (PENDING), add artifacts path, remove unused CLI args (#69 ) * Add README for Docling-DPBench Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Restructured CVAT builder (WIP) Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * CVAT preannotation and dataset builders, with test cases Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add CLI, merge from main Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update README for CVAT Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add artifacts path option to CLI, several fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Remove raise Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-04-17 14:14:58 +02:00
Christoph Auer	28c2e1887b	feat: Refactor CVAT builder (#68 ) * Add README for Docling-DPBench Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Restructured CVAT builder (WIP) Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * CVAT preannotation and dataset builders, with test cases Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add CLI, merge from main Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update README for CVAT Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-04-17 12:30:30 +02:00
Christoph Auer	ddf40241a9	Add README for Docling-DPBench (#60 ) Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-04-07 14:50:14 +02:00
Christoph Auer	a3d99b9f13	feat: Establish new API encapsulation for dataset creation and prediction providers (#30 ) * correct mpy Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatting Signed-off-by: Peter Staar <taa@zurich.ibm.com> * adding the script to make an initial dataset from pdf's Signed-off-by: Peter Staar <taa@zurich.ibm.com> * before switching to specific docling-core branch Signed-off-by: Peter Staar <taa@zurich.ibm.com> * rebased on kv-items and updated the create script in CVAT Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the cvat Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the annotation description on CVAT Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the annotation description on CVAT (2) Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the annotation description on CVAT (3) Signed-off-by: Peter Staar <taa@zurich.ibm.com> * [WIP] Crafting new dataset builder and prediction provider API Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Restructure to docling_eval_next Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix mypy Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix f-strings Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Changes for prediction_provider interface, to support all cases. Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add omnidocbench DatasetBuilder Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add doclaynet v1, funsd Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add XFUND, more fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * update the kv cell creation to prevent false positives Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> * chore: Fixing imports Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update docling-core version Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce new design for Evaluators based on BaseEvaluator that accept external predictions. And utility adapters. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Factor PredictionProvider out of dataset builder, many fixes on DatasetRecord Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Sketch example for file-directory prediction provider Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * chore: Fix typing hints Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update poetry to doclign-core 2.24.0 Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: WIP: Introduce the FilePredictionProvider that reads files with predictions from the disk - It currently supports doctags, markdown, json, yaml formats. - We still need to improve the returned type so that it allows for no DoclingDocument but only for the source data (e.g. in case of markdown). Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Add DocLayNetV2DatasetBuilder Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Added TableDatasetBuilder and test, update TableFormerPredictionProvider Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * chore: Update MyPy configuration in toml Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Refactor the BasePredictionProvider.predict() to return DatasetRecordWithPrediction Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: Fix the FilePredictionProvider. Return None in the predicted document in case of Markdown. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Remove the kwargs from all PredictonProvider classes and introduce provider specific initialization arguments Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce the parameter "ignore_missing_files" in FilePredictionProvider Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Add do_visualization to PredictionProvider Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Move next-gen API to main source tree, re-organize module paths Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Cleanup, change path handling Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Cleanup, change path handling Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * More module removal and renaming Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Small test fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: Add the "prediction_format" in the serialization of DatasetRecordWithPrediction Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Refactor the MarkdownTextEvaluator to support the new classes design. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Improve the new design of MarkdownEvaluator to move common functionalities into the base class Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Refactor the LayoutEvaluator to use the new class design. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Clean up LayoutEvaluator code Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Implementation cleanup and fixes for new class design (#52) * More module removal and renaming Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Small test fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Small test fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Cleanup of tests and more fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add visualization for tables Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add visualization for all tests Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes for test files, FilePredictionProvider changes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Put new CLI Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Cleanup Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Rename CLI Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update all README with new commands. Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Remove old examples Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Several Fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * README updates Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add gt_dir arg to create-eval, README fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes, pass tests Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * feat: Refactor the TableEvaluator to use the new class design. Move common evaluator code to BaseEvaluator. Add more unit tests. Introduce pytest dependencies. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Update lockfile Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update lockfile Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Make pytest CI output more verbose Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * feat: Refactor the ReadingOrderEvaluator to use the new class design. Remove the BaseReadingOrderEvaluator. Add unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Optimize GT downloading behaviour Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add file sources Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Allow pytest output on CI Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Disable tests in CI Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Reenable tests in CI Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add correct @pytest.mark.dependency() Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * feat: Introduce TypeVars for the UnitEvaluation and DatasetEvaluation used by the BaseEvaluator. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Minimize tests in CI Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * feat: Refactor BboxTestEvaluator to use the new design. Introduce unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Remove streaming in DocLaynet v1 Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add back test dependency Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> Co-authored-by: Peter Staar <taa@zurich.ibm.com> Co-authored-by: Saidgurbuz <said.gurbuz@epfl.ch> Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-04-01 13:04:03 +02:00
Maxim Lysak	ff2c9c5936	chore: Readme picture (#49 ) * Added picture to the README Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Adjusted picture size in readme Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Slightly rotated picture for better alignment Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> --------- Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>	2025-03-24 13:07:41 +01:00
Nikos Livathinos	dd4920a34d	feat: Externally provided predictions for LayoutEvaluator, TableEvaluator. Support --begin-index, --end-index, --debug parameters. Code clean up. (#39 ) * feat: Refactor main to remove --max-items and replace with --begin-index, --end-index. Add --debug. Provide implementation for create_dlnv1_e2e_dataset() Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Update Readme Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: DLNv1 create: Reduce shard size to 100. Fix debug error when not in SmolDocling. Improve logging Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Delete unused file Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Working on logging Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Code clean up Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the LayoutEvaluator to accept external dictionary with DoclingDocuments Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: DLNv1 create: Introduce hardcoded list of blacklisted doc_ids Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Extend the TabelEvalautor to accept optional dict with predicted DoclingDocument objects keyed by the doc_id Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix DatsetStatistics:to_table() in case there is no data, to avoid division by zero. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix bug in the visualizations:save_comparison_html_with_clusters() Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Add label_mapping feature, penalize empty predictions instead of skipping. 
Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * chore: Pin docling to dev/new-tf-weights Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Update the FTN-OTSL and P1M-OTSL to v1.1 from our HF datasets Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Ensure that the table bboxes are not negative and within the page size Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: In case pred_dict is given to TableEvaluator remove the ".png", ".jpg" suffixes from doc_id before trying to match Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Protect the prediction in TableEvaluator within try..except Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce the parameter structure_only in TableEvaluator Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Add optional parameter intermediate_results_dir in TableEvaluator Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: TableEvaluator: Fixs in save_table_evaluations() Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update to the latest docling 2.26.0 Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Ensure that all create methods support the `begin_index`, `end_index` parameters. - DPBench. - OmniDocBench. - DocLayNetv1. - Table datasets (FinTabNet, PubTabNet, Pub1M). Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-03-18 13:22:56 +01:00
Saidgurbuz	b6f7d0b02f	feat: XFUND Dataset Creation (#44 ) * add XFUND dataset creation script Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> * add create_xfund.py Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> * update implementation with split Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> * fix the bug in download Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> * fix download path Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> --------- Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>	2025-03-17 14:53:26 +01:00
Yusik Kim	cfd91ac10f	feat: Add Doclaynet v2 dataset creation (#12 ) * doclaynet v2 Signed-off-by: Yusik Kim <kmyusk@gmail.com> * True doc bounding boxes to bottom left origin Signed-off-by: Yusik Kim <kmyusk@gmail.com> * add reading order to benchmark script Signed-off-by: Yusik Kim <kmyusk@gmail.com> * make true and pred doc names the same Signed-off-by: Yusik Kim <kmyusk@gmail.com> * bound mem usage through limiting single shard size Signed-off-by: Yusik Kim <kmyusk@gmail.com> * add key-value items to the DoclingDocument Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * adopt v1 format to v2 Signed-off-by: Yusik Kim <yusik.kim@ibm.com> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * update implementation of kv items based on structure change, and add prov item Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * fix version and format issues Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * handle empty prov, update benchmark script Signed-off-by: Yusik Kim <kmyusk@gmail.com> * adapt create.py to KeyValue structure change Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * add rule-based method, classify_cells, to assign labels to GraphCells Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * add visualize_docling_document method to generate highlighted image for all item types with options Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * Revert "add visualize_docling_document method to generate highlighted image for all item types with options" This reverts commit 267918b59f892513ca633bbd1e7a31a0a45acc60. 
Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * update method name Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * fix: max_items loop break Signed-off-by: Yusik Kim <yusik.kim@ibm.com> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * fix tqdm total Signed-off-by: Yusik Kim <yusik.kim@ibm.com> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * rebase main utils.py Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * chore: rebased and refactored Signed-off-by: Yusik Kim <yusik.kim@ibm.com> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * fix: modalities for DLN Signed-off-by: Yusik Kim <kmyusk@gmail.com> * remove unrelated code Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <kmyusk@gmail.com> * fix: ocr options, use image converter Signed-off-by: Yusik Kim <kmyusk@gmail.com> * change max_items back to 1000 Signed-off-by: Yusik Kim <yusik.kim@ibm.com> * precommit black Signed-off-by: Yusik Kim <yusik.kim@ibm.com> --------- Signed-off-by: Yusik Kim <kmyusk@gmail.com> Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> Signed-off-by: Yusik Kim <yusik.kim@ibm.com> Co-authored-by: Saidgurbuz <said.gurbuz@epfl.ch> Co-authored-by: Yusik Kim <yusik.kim@ibm.com>	2025-03-17 14:53:08 +01:00
Nikos Livathinos	ddae1ec966	fix: Fix the modalities for DPBench, OmniDocBench, DLNv1. Switch to new settings in SmolDocling API. Improve the documentation. (#37 ) * chore: Change the pinning of docling Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix the modalities supported for DPBench, OmniDocBench, DLNv1. Clean up code. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Update documentation to have all benchmarks in separate md files and place links in Readme. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Change the initialization of the create_smol_docling_converter() to allow flash-attn Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: List benchmarks in the main readme with short description. Fix broken links in the documentation. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Fix broken link in Readme. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update lock file Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Add debug code to dump the predicted text in create_dlnv1_e2e_dataset() Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update toml to pin docling with branch and extras Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Disable the generation of VLM text debugging files for DLNv1 benchmark Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update toml to docling v2.25.0 with vln extra Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-02-26 15:50:02 +01:00
Nikos Livathinos	df7e403b86	fix: Refactor docling-eval to improve the overall codebase structure (#36 ) * chore: Rename `docling/` dir as converters. Introduce `visualization/` dir. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Remove unused imports and other code formatting Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Remove the `utils/` dir, delete unused files and move used code in appropriate locations Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Introduce the file visualisation/visualisations.py and move there functions from benchmarks/utils.py Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update MyPy configuration in toml to override tqdm module Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Clean up commented code Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Add CONVERTER_TYPE and MODALITIES columns to all produced datasets Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update pinning of docling Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Code refactoring: - Move converters/teds.py into evaluators/teds.py - Move all functions from converters/utils.py into benchmarks/utils.py. - Rename create_xxx_converter() functions. - Rename BenchMarkColumns.DOCLING_VERSION as BenchMarkColumns.CONVERTER_VERSION Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-02-25 12:45:51 +01:00
Nikos Livathinos	71789ac20c	feat: Enable the usage of SmolDocling VLM document converter. Introduce CLI parameter --converter_type (#34 ) * feat: Introduce create_vln_converter() that uses SmolDocling. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce the converter_type parameter in the CLI and initialize the converter appropriately to use Docling or SmolDocling. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Add the column CONVERTER_TYPE in the produced dataset. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fixing the benchmarks/utils/save_comparison_html_with_clusters() to skip docitems without provenances Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Rename conversion/create_converter() as create_docling_converter() Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Replace prints with logger in doclaynet_v1/create.py Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Update pinning of docling to a certain commit for smoldocling Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fixing missing imports. Code formatting. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-02-24 15:54:19 +01:00
Christoph Auer	8a6d2fce9d	feat: Add ReadingOrderEvaluator for new reading-order model (#29 ) * correct mpy Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatting Signed-off-by: Peter Staar <taa@zurich.ibm.com> * adding the script to make an initial dataset from pdf's Signed-off-by: Peter Staar <taa@zurich.ibm.com> * before switching to specific docling-core branch Signed-off-by: Peter Staar <taa@zurich.ibm.com> * rebased on kv-items and updated the create script in CVAT Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the cvat Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the annotation description on CVAT Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the annotation description on CVAT (2) Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the annotation description on CVAT (3) Signed-off-by: Peter Staar <taa@zurich.ibm.com> * feat: Add new reading-order model evaluator, re-factor to BaseReadingOrderEvaluator Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Rename New Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: Fix calling DoclingDocument methods. Pin latest versions of docling and docling-core. Remove commented out code, remove unused imports, code formatting. 
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Re-run the evaluations for DPBench and update the docs Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Re-run the evaluations for OmniDocBench and update the docs Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Re-run the evaluations for DPBench and update the docs Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Re-run the evaluations for OmniDocBench and update the docs Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Minor improvements in DPBench OmniDocBench documentation Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> Co-authored-by: Peter Staar <taa@zurich.ibm.com> Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-02-21 17:18:33 +01:00
Saidgurbuz	a12b47ad11	feat: Funsd Dataset Creation (#31 ) * add doclingdocument version of funsd dataset creation script Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> * add base create script for funsd dataset Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> * update BenchmarkNames and format Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> * fix styling Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch> --------- Signed-off-by: Saidgurbuz <said.gurbuz@epfl.ch>	2025-02-19 14:32:04 +01:00
Christoph Auer	d3c8a13c2c	Fix old usage of EvaluationModality.TABLEFORMER (#32 ) Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-02-19 11:49:36 +01:00
Peter W. J. Staar	68ea3cd2ca	feat: Improve CVAT dataset builder and add end-to-end documentation (#25 ) * correct mpy Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatting Signed-off-by: Peter Staar <taa@zurich.ibm.com> * adding the script to make an initial dataset from pdf's Signed-off-by: Peter Staar <taa@zurich.ibm.com> * before switching to specific docling-core branch Signed-off-by: Peter Staar <taa@zurich.ibm.com> * rebased on kv-items and updated the create script in CVAT Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the cvat Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the annotation description on CVAT Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the annotation description on CVAT (2) Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the annotation description on CVAT (3) Signed-off-by: Peter Staar <taa@zurich.ibm.com> * Update lock Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes for CVAT Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Disable Examples in CI Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-02-19 10:30:45 +01:00
Nikos Livathinos	cba0e30f69	feat: Extend CLI to allow loading the full split with the -n parameter set to -1. (#26 ) Add the MODALITIES field in constants. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-02-13 09:58:46 +01:00
Yusik Kim	3eb4b49361	DocLayNet V1 (#19 ) * doclaynet v1 create.py Signed-off-by: Yusik Kim <kmyusk@gmail.com> * added benchmark example for doclaynet v1 Signed-off-by: Yusik Kim <kmyusk@gmail.com> * parameterize input data split Signed-off-by: Yusik Kim <kmyusk@gmail.com> * fix typo Signed-off-by: Yusik Kim <kmyusk@gmail.com> * use test set for benchmark Signed-off-by: Yusik Kim <kmyusk@gmail.com> * fix benchmark script path Signed-off-by: Yusik Kim <kmyusk@gmail.com> * fix ground truth bbox Signed-off-by: Yusik Kim <kmyusk@gmail.com> * add MARKDOWN_TEXT eval to doclaynet benchmark Signed-off-by: Yusik Kim <kmyusk@gmail.com> * follow benchmark output dir convention Signed-off-by: Yusik Kim <kmyusk@gmail.com> * doclaynet v1 eval results Signed-off-by: Yusik Kim <kmyusk@gmail.com> * fix last shard id Signed-off-by: Yusik Kim <kmyusk@gmail.com> * add tables and debug viz Signed-off-by: Yusik Kim <kmyusk@gmail.com> * use pdf as conversion source Signed-off-by: Yusik Kim <kmyusk@gmail.com> * small fix Signed-off-by: Yusik Kim <kmyusk@gmail.com> * updated the create script for doclaynet-v1 Signed-off-by: Peter Staar <taa@zurich.ibm.com> * chore: Update the version of docling-core in pyproject.toml Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Make the visualization of the reading order during the creation of the datasets optional. WIP Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Introduce CLI parameter to pass the max-items. Fix the visualizaton on DLNv1 to colorise the clusters and remove the reading-order. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Visualize correctly the true and pred document. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Introduce parameter artifacts_path in CLI to allow passing external files of models. 
The parameter is propagated to docling PdfPipelineOptions Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix issues when passing custom artifact paths Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * feat: Bring back the drawing of the reading order as an optional feature during the creation of the dataset Add DLNv1 create modality in the CLI main. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Use the HF repo to download DLNv1. Remove the main from the `create()` methods. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix a split issue in the DLNv1 create Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Fix bug in DLNv1 create. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * fix: Rename docling-DocLayNet-v1.1 to DocLayNet-v1.2 Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Yusik Kim <kmyusk@gmail.com> Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> Co-authored-by: Peter Staar <taa@zurich.ibm.com> Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-02-10 17:29:43 +01:00
Nikos Livathinos	8dee74c09f	fix: Refactor code to support the TableFormerMode as an input parameter for TableFormerUpdater (#22 ) * fix: Many code refactorings to support TableFormerMode (by default ACCURATE). Additonally: - Clean up the overall implementation of TableFormerUpdater and set the AcceleratorOptions. - Extend CLI to allow the creation of table datasets (PTN, FTN, P1M). Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: WIP: Updating the Readme with benchmarks for the FTN, PTN, P1M. Move the DP-Bench and OmniDocBench in separate files inside docs/ and provide links in Readme Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Add evaluation files inside docs/ with json/png files for FTN, PTN, P1M Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Fix the images links for DP-Bench, OmniDocBench docs Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Refactor the Readme to have evaluation data and results for the table datasets FTN, PTN, Pub1M Add code snippets how to run the evalutions and visualizations for all datasets. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * docs: Fix broken links in Readme Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-02-03 09:17:57 +01:00
Nikos Livathinos	a08741e80b	feat: Ensure that the split is respected when loading/evaluating the dataset: - Introduce the 'split' optional argument in CLI. - Refactor main.evaluate, main.visualize to receive the split. - Refactor the LayoutEvaluator, TableEvaluator to load the given split. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-01-29 15:07:28 +01:00
Nikos Livathinos	c65238d7ef	fix: Fix the docs/examples that create/evaluate/visualize the tableformer datasets for PTN, FTN, P1M Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-01-29 10:53:51 +01:00
Peter W. J. Staar	d72fa112f8	feat: allow-for-buckets-in-cvat (#11 ) * allow-for-buckets-in-cvat Signed-off-by: Peter Staar <taa@zurich.ibm.com> * refactored the CVAT create script Signed-off-by: Peter Staar <taa@zurich.ibm.com> * refactoring the CVAT pre-annotate and create Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added eval for cvat-annotations Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2025-01-22 09:36:20 +01:00
Nikos Livathinos	d04d016843	feat: Refactor the reading order evaluation to skip document items that have multiple provenances (#10 ) - fix: Refactor/improve the code to save log files with evaluation tables and png files with the plots and ensure to produce all the evaluations/visualizations in the docs/examples/benchmark_xxx.py files - Introduce optional parameter in create methods for DP-Bench and OmniDocBench to generate visualizations. - Update the evaluation files (json/txt/png) in docs/evaluations per dataset. Update Readme. - Update Readme with the OmniDocBench evaluation/visualization files - Poetry: Move to docling 2.15.1 --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-01-21 14:58:17 +01:00
Nikos Livathinos	3569e75b32	feat: Add ReadingOrder and Markdown text evaluation (#8 ) * chore: Add tqdm in the dependencies * chore: Move the DatasetStats outside of table_evaluator into the utils/stats.py * feat: ReadingOrderEvaluator: Full implementation with Average Relative Distance metric * fix: Add reading_order in the visualise() method of main * fix: utils/stats.py: Add the metric name as a parameter. Clean up code * chore: Add reading_order evaluation and visualization in the examples for dp-bench and omnidocbench Add the doc_id in the evaluation report * feat: MarkdownTextEvaluator: Introduce text evaluation based on markdown export of DoclingDocument. Use BLEU metric * feat: Add ReadingOrderVisualizer and use it in the main * chore: Add pillow lib to the poetry * fix: ReadingOrderEvaluator: Convert the bboxes in bottom-left origin before calling the reading-order * chore: Update poetry lock * fix: Refactor to move the evaluator statistics in a separate file evaluators/stats.py. Decouple the code to draw arrows in a separate function inside utils.py Delete unused code. Fix mypy issues. * chore: Update Readme to include the evaluations and visualizations for the "reading-order" and the "markdown-text" modalities. * fix: Refactor the stats.py:save_historgram() to receive generic name for the plot Generate histogram plots for the reading_order and markdown_text visualizations Update Readme statistics for the reading_order and markdown_text modalities. * feat: ReadingOrder: Implement weighted ARD where the weight is based on the bbox size * chore: Update Readme with ARD and weighted ARD and histograms --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-01-17 09:45:14 +01:00
Peter W. J. Staar	4bf28de7ed	feat: Adding script to prepopulate CVAT and create GT annotations from CVAT annotation files (#9 ) * adding script to prepopulate CVAT and create GT annotations from CVAT annotation files Signed-off-by: Peter Staar <taa@zurich.ibm.com> * it works Signed-off-by: Peter Staar <taa@zurich.ibm.com> * Fixed the DPBench with the refactoring Signed-off-by: Peter Staar <taa@zurich.ibm.com> * refactored the cvat annotations in preannotate and create script Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the code to export layout after re-annotated the DP-Bench dataset Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed some nasty bugs Signed-off-by: Peter Staar <taa@zurich.ibm.com> * adding documentation files for CVAT annotation Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * major updates Signed-off-by: Peter Staar <taa@zurich.ibm.com> * work-in-progress Signed-off-by: Peter Staar <taa@zurich.ibm.com> * working on reformatting and getting mypy alignment Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the mypy errors Signed-off-by: Peter Staar <taa@zurich.ibm.com> * moved the code to cvat_annotation Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the png packaging Signed-off-by: Peter Staar <taa@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2025-01-15 15:43:56 +01:00
Peter W. J. Staar	6bc9140325	Add omnidocbench, many optimizations (#4 ) * adding the omnidocbench benchmarkl Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the table-parsing in omnidocbench Signed-off-by: Peter Staar <taa@zurich.ibm.com> * finished the OmniDocBench implementation Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the README Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the README and the cli Signed-off-by: Peter Staar <taa@zurich.ibm.com> * clean up the DP-Bench example Signed-off-by: Peter Staar <taa@zurich.ibm.com> * made the DPBench and OmniDocBench follow the same example code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * cleaned up the dp-bench create script Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the ability to see the clusters and reading order for layout Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * working on making datasets from pdf collections Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the package_pdfs example Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the FinTabNet-OTSL benchmark Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the fintabnet example evaluation Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the README fort FinTabNet Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the README Signed-off-by: Peter Staar <taa@zurich.ibm.com> * refactored the table evaluations Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the text inclusion in the table prediction Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the header of the HTML Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fix: Formatting and unused code cleanup Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * feat: Extend the CLI to create the OMNIDOCBENCH datasets for the layout and tableformer modalities Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * Added exit to benchmark end-to-end scripts in case git-lfs is not installed (#5) Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com> * fix: Use TableStructureModel from docling, use backends, fix boundingbox coordinates Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Reinstate layout test on dpbench Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Comments Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Comments Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Remove unused code Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Remove more unused code Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes for Omnidoc Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes for layout eval bounding boxes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * More fixes for OmniDoc, README updates Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * More fixes for OmniDoc, README updates Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Replace git-lsf with HF snapshot_download Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com> Co-authored-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>	2025-01-07 12:52:29 +01:00
Peter Staar	6337b29baa	forgot the end-2-end script Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2024-12-21 11:29:25 +01:00
Peter Staar	05c42f175d	updated the README (3) Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2024-12-20 14:13:00 +01:00
Peter Staar	0a6829d2ee	updated the README Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2024-12-20 14:09:20 +01:00

47 Commits