Mirror of https://github.com/docling-project/docling-eval.git (synced 2026-05-17 13:10:47 +00:00, commit ddae1ec966)
DocLayNetV1 size: 4999

DocLayNetV1 markdown_text BLEU: mean=0.71 median=0.84 std=0.30

| BLEU           |   prob [%] |   acc [%] |   1-acc [%] |   total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] |       7.76 |         0 |         100 |     388 |
| (0.050, 0.100] |        1.1 |      7.76 |       92.24 |      55 |
| (0.100, 0.150] |        1.3 |      8.86 |       91.14 |      65 |
| (0.150, 0.200] |       1.08 |     10.16 |       89.84 |      54 |
| (0.200, 0.250] |       1.48 |     11.24 |       88.76 |      74 |
| (0.250, 0.300] |        1.4 |     12.72 |       87.28 |      70 |
| (0.300, 0.350] |       1.22 |     14.12 |       85.88 |      61 |
| (0.350, 0.400] |       1.34 |     15.34 |       84.66 |      67 |
| (0.400, 0.450] |       2.04 |     16.68 |       83.32 |     102 |
| (0.450, 0.500] |       2.76 |     18.72 |       81.28 |     138 |
| (0.500, 0.550] |        2.3 |     21.48 |       78.52 |     115 |
| (0.550, 0.600] |       2.56 |     23.78 |       76.22 |     128 |
| (0.600, 0.650] |       3.08 |     26.35 |       73.65 |     154 |
| (0.650, 0.700] |       3.62 |     29.43 |       70.57 |     181 |
| (0.700, 0.750] |        4.8 |     33.05 |       66.95 |     240 |
| (0.750, 0.800] |       6.38 |     37.85 |       62.15 |     319 |
| (0.800, 0.850] |       8.62 |     44.23 |       55.77 |     431 |
| (0.850, 0.900] |      13.36 |     52.85 |       47.15 |     668 |
| (0.900, 0.950] |      17.22 |     66.21 |       33.79 |     861 |
| (0.950, 1.000] |      16.56 |     83.44 |       16.56 |     828 |

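The derived columns in these histograms appear to follow simple arithmetic on the per-bin counts: `prob [%]` is the bin's share of all samples, `acc [%]` is the cumulative share of the bins strictly below it, and `1-acc [%]` is the remainder. This reading is inferred from the numbers in the BLEU table, not taken from the evaluation code, and the function name `histogram_columns` is hypothetical. A minimal sketch under that assumption:

```python
from itertools import accumulate

# Per-bin sample counts copied from the "total" column of the BLEU table.
bleu_counts = [388, 55, 65, 54, 74, 70, 61, 67, 102, 138,
               115, 128, 154, 181, 240, 319, 431, 668, 861, 828]


def histogram_columns(counts):
    """Return (prob, acc, one_minus_acc) as lists of percentages.

    Assumed definitions, inferred from the table values:
      prob[i]          = share of samples that fall in bin i
      acc[i]           = cumulative share of bins 0 .. i-1 (exclusive of bin i)
      one_minus_acc[i] = 100 - acc[i]
    """
    total = sum(counts)
    prob = [100.0 * c / total for c in counts]
    cumulative = list(accumulate(prob))
    # Shift by one bin so that the first row starts at 0.
    acc = [0.0] + cumulative[:-1]
    one_minus_acc = [100.0 - a for a in acc]
    return prob, acc, one_minus_acc


prob, acc, one_minus_acc = histogram_columns(bleu_counts)
for p, a, r in zip(prob, acc, one_minus_acc):
    print(f"{p:6.2f} {a:6.2f} {r:6.2f}")
```

Read this way, `1-acc [%]` in a given row is the share of pages whose score lies above the lower edge of that bin.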
DocLayNetV1 markdown_text F1: mean=0.91 median=0.96 std=0.15

| F1             |   prob [%] |   acc [%] |   1-acc [%] |   total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] |       0.36 |         0 |         100 |      18 |
| (0.050, 0.100] |       0.38 |      0.36 |       99.64 |      19 |
| (0.100, 0.150] |       0.36 |      0.74 |       99.26 |      18 |
| (0.150, 0.200] |       0.34 |       1.1 |        98.9 |      17 |
| (0.200, 0.250] |       0.46 |      1.44 |       98.56 |      23 |
| (0.250, 0.300] |       0.26 |       1.9 |        98.1 |      13 |
| (0.300, 0.350] |        0.2 |      2.16 |       97.84 |      10 |
| (0.350, 0.400] |        0.4 |      2.36 |       97.64 |      20 |
| (0.400, 0.450] |        0.4 |      2.76 |       97.24 |      20 |
| (0.450, 0.500] |       0.48 |      3.16 |       96.84 |      24 |
| (0.500, 0.550] |       0.66 |      3.64 |       96.36 |      33 |
| (0.550, 0.600] |       0.76 |       4.3 |        95.7 |      38 |
| (0.600, 0.650] |       0.78 |      5.06 |       94.94 |      39 |
| (0.650, 0.700] |       1.16 |      5.84 |       94.16 |      58 |
| (0.700, 0.750] |        1.1 |         7 |          93 |      55 |
| (0.750, 0.800] |        2.1 |       8.1 |        91.9 |     105 |
| (0.800, 0.850] |        4.2 |      10.2 |        89.8 |     210 |
| (0.850, 0.900] |       8.08 |      14.4 |        85.6 |     404 |
| (0.900, 0.950] |         21 |     22.48 |       77.52 |    1050 |
| (0.950, 1.000] |      56.51 |     43.49 |       56.51 |    2825 |

DocLayNetV1 markdown_text precision: mean=0.92 median=0.97 std=0.16

| precision      |   prob [%] |   acc [%] |   1-acc [%] |   total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] |       0.58 |         0 |         100 |      29 |
| (0.050, 0.100] |       0.52 |      0.58 |       99.42 |      26 |
| (0.100, 0.150] |       0.58 |       1.1 |        98.9 |      29 |
| (0.150, 0.200] |       0.18 |      1.68 |       98.32 |       9 |
| (0.200, 0.250] |       0.42 |      1.86 |       98.14 |      21 |
| (0.250, 0.300] |       0.28 |      2.28 |       97.72 |      14 |
| (0.300, 0.350] |       0.46 |      2.56 |       97.44 |      23 |
| (0.350, 0.400] |       0.28 |      3.02 |       96.98 |      14 |
| (0.400, 0.450] |       0.34 |       3.3 |        96.7 |      17 |
| (0.450, 0.500] |        0.2 |      3.64 |       96.36 |      10 |
| (0.500, 0.550] |       0.66 |      3.84 |       96.16 |      33 |
| (0.550, 0.600] |       0.44 |       4.5 |        95.5 |      22 |
| (0.600, 0.650] |       0.44 |      4.94 |       95.06 |      22 |
| (0.650, 0.700] |        0.7 |      5.38 |       94.62 |      35 |
| (0.700, 0.750] |       0.68 |      6.08 |       93.92 |      34 |
| (0.750, 0.800] |       1.28 |      6.76 |       93.24 |      64 |
| (0.800, 0.850] |          2 |      8.04 |       91.96 |     100 |
| (0.850, 0.900] |       4.98 |     10.04 |       89.96 |     249 |
| (0.900, 0.950] |      15.78 |     15.02 |       84.98 |     789 |
| (0.950, 1.000] |      69.19 |     30.81 |       69.19 |    3459 |

DocLayNetV1 markdown_text recall: mean=0.92 median=0.96 std=0.12

| recall         |   prob [%] |   acc [%] |   1-acc [%] |   total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] |       0.14 |         0 |         100 |       7 |
| (0.050, 0.100] |          0 |      0.14 |       99.86 |       0 |
| (0.100, 0.150] |       0.04 |      0.14 |       99.86 |       2 |
| (0.150, 0.200] |       0.08 |      0.18 |       99.82 |       4 |
| (0.200, 0.250] |        0.2 |      0.26 |       99.74 |      10 |
| (0.250, 0.300] |       0.14 |      0.46 |       99.54 |       7 |
| (0.300, 0.350] |       0.22 |       0.6 |        99.4 |      11 |
| (0.350, 0.400] |       0.28 |      0.82 |       99.18 |      14 |
| (0.400, 0.450] |       0.36 |       1.1 |        98.9 |      18 |
| (0.450, 0.500] |       0.56 |      1.46 |       98.54 |      28 |
| (0.500, 0.550] |       0.74 |      2.02 |       97.98 |      37 |
| (0.550, 0.600] |       0.68 |      2.76 |       97.24 |      34 |
| (0.600, 0.650] |        0.6 |      3.44 |       96.56 |      30 |
| (0.650, 0.700] |       1.02 |      4.04 |       95.96 |      51 |
| (0.700, 0.750] |       1.28 |      5.06 |       94.94 |      64 |
| (0.750, 0.800] |       2.26 |      6.34 |       93.66 |     113 |
| (0.800, 0.850] |       4.66 |       8.6 |        91.4 |     233 |
| (0.850, 0.900] |       9.68 |     13.26 |       86.74 |     484 |
| (0.900, 0.950] |      20.82 |     22.94 |       77.06 |    1041 |
| (0.950, 1.000] |      56.23 |     43.77 |       56.23 |    2811 |

DocLayNetV1 markdown_text edit_distance: mean=0.42 median=0.40 std=0.29

| edit_distance   |   prob [%] |   acc [%] |   1-acc [%] |   total |
|-----------------|------------|-----------|-------------|---------|
| (0.000, 0.050]  |      10.98 |         0 |         100 |     549 |
| (0.050, 0.100]  |       7.14 |     10.98 |       89.02 |     357 |
| (0.100, 0.150]  |          7 |     18.12 |       81.88 |     350 |
| (0.150, 0.200]  |       5.44 |     25.13 |       74.87 |     272 |
| (0.200, 0.250]  |       5.08 |     30.57 |       69.43 |     254 |
| (0.250, 0.300]  |       4.76 |     35.65 |       64.35 |     238 |
| (0.300, 0.350]  |       4.78 |     40.41 |       59.59 |     239 |
| (0.350, 0.400]  |       4.28 |     45.19 |       54.81 |     214 |
| (0.400, 0.450]  |       5.36 |     49.47 |       50.53 |     268 |
| (0.450, 0.500]  |       4.54 |     54.83 |       45.17 |     227 |
| (0.500, 0.550]  |       5.16 |     59.37 |       40.63 |     258 |
| (0.550, 0.600]  |        4.6 |     64.53 |       35.47 |     230 |
| (0.600, 0.650]  |       4.08 |     69.13 |       30.87 |     204 |
| (0.650, 0.700]  |        4.4 |     73.21 |       26.79 |     220 |
| (0.700, 0.750]  |        4.3 |     77.62 |       22.38 |     215 |
| (0.750, 0.800]  |        6.3 |     81.92 |       18.08 |     315 |
| (0.800, 0.850]  |        5.1 |     88.22 |       11.78 |     255 |
| (0.850, 0.900]  |       3.22 |     93.32 |        6.68 |     161 |
| (0.900, 0.950]  |       1.76 |     96.54 |        3.46 |      88 |
| (0.950, 1.000]  |        1.7 |      98.3 |         1.7 |      85 |

DocLayNetV1 markdown_text meteor: mean=0.80 median=0.87 std=0.21

| meteor         |   prob [%] |   acc [%] |   1-acc [%] |   total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] |       0.68 |         0 |         100 |      34 |
| (0.050, 0.100] |        0.5 |      0.68 |       99.32 |      25 |
| (0.100, 0.150] |       0.62 |      1.18 |       98.82 |      31 |
| (0.150, 0.200] |       0.96 |       1.8 |        98.2 |      48 |
| (0.200, 0.250] |          1 |      2.76 |       97.24 |      50 |
| (0.250, 0.300] |       1.04 |      3.76 |       96.24 |      52 |
| (0.300, 0.350] |       0.96 |       4.8 |        95.2 |      48 |
| (0.350, 0.400] |        1.5 |      5.76 |       94.24 |      75 |
| (0.400, 0.450] |        1.8 |      7.26 |       92.74 |      90 |
| (0.450, 0.500] |       1.58 |      9.06 |       90.94 |      79 |
| (0.500, 0.550] |       2.08 |     10.64 |       89.36 |     104 |
| (0.550, 0.600] |       2.44 |     12.72 |       87.28 |     122 |
| (0.600, 0.650] |       3.42 |     15.16 |       84.84 |     171 |
| (0.650, 0.700] |       5.26 |     18.58 |       81.42 |     263 |
| (0.700, 0.750] |       6.22 |     23.84 |       76.16 |     311 |
| (0.750, 0.800] |       7.68 |     30.07 |       69.93 |     384 |
| (0.800, 0.850] |       9.18 |     37.75 |       62.25 |     459 |
| (0.850, 0.900] |      10.14 |     46.93 |       53.07 |     507 |
| (0.900, 0.950] |      15.76 |     57.07 |       42.93 |     788 |
| (0.950, 1.000] |      27.17 |     72.83 |       27.17 |    1358 |
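
The summary line of each metric can be cross-checked against its histogram. The sketch below estimates the mean from the bin midpoints, using the meteor counts as input. It is only an approximation: the per-page scores are not part of this report, so every page is assumed to sit at the centre of its 0.05-wide bin. The name `midpoint_mean` is hypothetical and not part of docling-eval.

```python
# Per-bin sample counts copied from the "total" column of the meteor table.
meteor_counts = [34, 25, 31, 48, 50, 52, 48, 75, 90, 79,
                 104, 122, 171, 263, 311, 384, 459, 507, 788, 1358]

BIN_WIDTH = 0.05  # every bin in the report spans 0.05


def midpoint_mean(counts, bin_width=BIN_WIDTH):
    """Estimate the mean score by placing each sample at its bin midpoint."""
    total = sum(counts)
    # Bin i covers (i * bin_width, (i + 1) * bin_width], so its midpoint
    # is (i + 0.5) * bin_width.
    weighted = sum(c * (i + 0.5) * bin_width for i, c in enumerate(counts))
    return weighted / total


estimate = midpoint_mean(meteor_counts)
print(f"midpoint estimate of the meteor mean: {estimate:.3f}")
```

Because each score lies at most half a bin width from its midpoint, the estimate should stay within about 0.025 of the reported mean, plus the rounding of the summary line.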