Peter W. J. Staar 6bc9140325 Add omnidocbench, many optimizations (#4)
* adding the omnidocbench benchmark

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the table-parsing in omnidocbench

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* finished the OmniDocBench implementation

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the README

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the README and the cli

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* clean up the DP-Bench example

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* made the DPBench and OmniDocBench follow the same example code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* cleaned up the dp-bench create script

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the ability to see the clusters and reading order for layout

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on making datasets from pdf collections

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the package_pdfs example

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the FinTabNet-OTSL benchmark

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the fintabnet example evaluation

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the README for FinTabNet

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the README

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactored the table evaluations

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the text inclusion in the table prediction

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the header of the HTML

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fix: Formatting and unused code cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: Extend the CLI to create the OMNIDOCBENCH datasets for the layout and tableformer modalities

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Added exit to benchmark end-to-end scripts in case git-lfs is not installed (#5)

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>

* fix: Use TableStructureModel from docling, use backends, fix boundingbox coordinates

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Reinstate layout test on dpbench

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Comments

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Comments

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove unused code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove more unused code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for Omnidoc

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for layout eval bounding boxes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* More fixes for OmniDoc, README updates

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* More fixes for OmniDoc, README updates

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Replace git-lfs with HF snapshot_download

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-01-07 12:52:29 +01:00

0 lines
0 B
Python