docling-eval/docs/evaluations/DocLayNetV1/evaluation_DocLayNetV1_markdown_text.txt
Nikos Livathinos ddae1ec966 fix: Fix the modalities for DPBench, OmniDocBench, DLNv1. Switch to new settings in SmolDocling API. Improve the documentation. (#37)
* chore: Change the pinning of docling
* fix: Fix the modalities supported for DPBench, OmniDocBench, DLNv1. Clean up code.
* docs: Update documentation to have all benchmarks in separate md files and place links in Readme.
* fix: Change the initialization of the create_smol_docling_converter() to allow flash-attn
* docs: List benchmarks in the main readme with short description. Fix broken links in the documentation.
* docs: Fix broken link in Readme.
* chore: Update lock file
* fix: Add debug code to dump the predicted text in create_dlnv1_e2e_dataset()
* chore: Update toml to pin docling with branch and extras
* fix: Disable the generation of VLM text debugging files for DLNv1 benchmark
* chore: Update toml to docling v2.25.0 with vln extra

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-26 15:50:02 +01:00

DocLayNetV1 size: 4999
DocLayNetV1 markdown_text BLEU: mean=0.71 median=0.84 std=0.30
| BLEU | prob [%] | acc [%] | 1-acc [%] | total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] | 7.76 | 0 | 100 | 388 |
| (0.050, 0.100] | 1.1 | 7.76 | 92.24 | 55 |
| (0.100, 0.150] | 1.3 | 8.86 | 91.14 | 65 |
| (0.150, 0.200] | 1.08 | 10.16 | 89.84 | 54 |
| (0.200, 0.250] | 1.48 | 11.24 | 88.76 | 74 |
| (0.250, 0.300] | 1.4 | 12.72 | 87.28 | 70 |
| (0.300, 0.350] | 1.22 | 14.12 | 85.88 | 61 |
| (0.350, 0.400] | 1.34 | 15.34 | 84.66 | 67 |
| (0.400, 0.450] | 2.04 | 16.68 | 83.32 | 102 |
| (0.450, 0.500] | 2.76 | 18.72 | 81.28 | 138 |
| (0.500, 0.550] | 2.3 | 21.48 | 78.52 | 115 |
| (0.550, 0.600] | 2.56 | 23.78 | 76.22 | 128 |
| (0.600, 0.650] | 3.08 | 26.35 | 73.65 | 154 |
| (0.650, 0.700] | 3.62 | 29.43 | 70.57 | 181 |
| (0.700, 0.750] | 4.8 | 33.05 | 66.95 | 240 |
| (0.750, 0.800] | 6.38 | 37.85 | 62.15 | 319 |
| (0.800, 0.850] | 8.62 | 44.23 | 55.77 | 431 |
| (0.850, 0.900] | 13.36 | 52.85 | 47.15 | 668 |
| (0.900, 0.950] | 17.22 | 66.21 | 33.79 | 861 |
| (0.950, 1.000] | 16.56 | 83.44 | 16.56 | 828 |
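
The columns of this table, and of the tables for the other metrics, appear to be related as follows: `prob [%]` is the share of the 4999 samples whose score falls in the bin, `acc [%]` is the cumulative share of samples in all lower bins, and `1-acc [%]` is its complement to 100. The sketch below recomputes those columns from the `total` counts under that reading. The function name `histogram_columns` is hypothetical and is not taken from docling-eval.

```python
from itertools import accumulate


def histogram_columns(totals):
    """Derive (prob, acc, 1-acc) percentages from per-bin sample counts.

    Assumption: acc is the cumulative share of samples in all *lower* bins,
    so the first bin always has acc = 0.
    """
    n = sum(totals)
    # Cumulative counts shifted by one bin: [0, t0, t0 + t1, ...]
    lower_counts = [0] + list(accumulate(totals))[:-1]
    rows = []
    for count, lower in zip(totals, lower_counts):
        prob = 100.0 * count / n
        acc = 100.0 * lower / n
        rows.append((round(prob, 2), round(acc, 2), round(100.0 - acc, 2)))
    return rows


# BLEU bin counts copied from the "total" column of the table above.
bleu_totals = [388, 55, 65, 54, 74, 70, 61, 67, 102, 138,
               115, 128, 154, 181, 240, 319, 431, 668, 861, 828]

rows = histogram_columns(bleu_totals)
print(sum(bleu_totals))  # 4999, the reported dataset size
print(rows[0])           # (7.76, 0.0, 100.0)
print(rows[-1])          # (16.56, 83.44, 16.56)
```

Computing `acc` from cumulative counts rather than by summing the rounded `prob` values avoids accumulating rounding error across the 20 bins.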
DocLayNetV1 markdown_text F1: mean=0.91 median=0.96 std=0.15
| F1 | prob [%] | acc [%] | 1-acc [%] | total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] | 0.36 | 0 | 100 | 18 |
| (0.050, 0.100] | 0.38 | 0.36 | 99.64 | 19 |
| (0.100, 0.150] | 0.36 | 0.74 | 99.26 | 18 |
| (0.150, 0.200] | 0.34 | 1.1 | 98.9 | 17 |
| (0.200, 0.250] | 0.46 | 1.44 | 98.56 | 23 |
| (0.250, 0.300] | 0.26 | 1.9 | 98.1 | 13 |
| (0.300, 0.350] | 0.2 | 2.16 | 97.84 | 10 |
| (0.350, 0.400] | 0.4 | 2.36 | 97.64 | 20 |
| (0.400, 0.450] | 0.4 | 2.76 | 97.24 | 20 |
| (0.450, 0.500] | 0.48 | 3.16 | 96.84 | 24 |
| (0.500, 0.550] | 0.66 | 3.64 | 96.36 | 33 |
| (0.550, 0.600] | 0.76 | 4.3 | 95.7 | 38 |
| (0.600, 0.650] | 0.78 | 5.06 | 94.94 | 39 |
| (0.650, 0.700] | 1.16 | 5.84 | 94.16 | 58 |
| (0.700, 0.750] | 1.1 | 7 | 93 | 55 |
| (0.750, 0.800] | 2.1 | 8.1 | 91.9 | 105 |
| (0.800, 0.850] | 4.2 | 10.2 | 89.8 | 210 |
| (0.850, 0.900] | 8.08 | 14.4 | 85.6 | 404 |
| (0.900, 0.950] | 21 | 22.48 | 77.52 | 1050 |
| (0.950, 1.000] | 56.51 | 43.49 | 56.51 | 2825 |
DocLayNetV1 markdown_text precision: mean=0.92 median=0.97 std=0.16
| precision | prob [%] | acc [%] | 1-acc [%] | total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] | 0.58 | 0 | 100 | 29 |
| (0.050, 0.100] | 0.52 | 0.58 | 99.42 | 26 |
| (0.100, 0.150] | 0.58 | 1.1 | 98.9 | 29 |
| (0.150, 0.200] | 0.18 | 1.68 | 98.32 | 9 |
| (0.200, 0.250] | 0.42 | 1.86 | 98.14 | 21 |
| (0.250, 0.300] | 0.28 | 2.28 | 97.72 | 14 |
| (0.300, 0.350] | 0.46 | 2.56 | 97.44 | 23 |
| (0.350, 0.400] | 0.28 | 3.02 | 96.98 | 14 |
| (0.400, 0.450] | 0.34 | 3.3 | 96.7 | 17 |
| (0.450, 0.500] | 0.2 | 3.64 | 96.36 | 10 |
| (0.500, 0.550] | 0.66 | 3.84 | 96.16 | 33 |
| (0.550, 0.600] | 0.44 | 4.5 | 95.5 | 22 |
| (0.600, 0.650] | 0.44 | 4.94 | 95.06 | 22 |
| (0.650, 0.700] | 0.7 | 5.38 | 94.62 | 35 |
| (0.700, 0.750] | 0.68 | 6.08 | 93.92 | 34 |
| (0.750, 0.800] | 1.28 | 6.76 | 93.24 | 64 |
| (0.800, 0.850] | 2 | 8.04 | 91.96 | 100 |
| (0.850, 0.900] | 4.98 | 10.04 | 89.96 | 249 |
| (0.900, 0.950] | 15.78 | 15.02 | 84.98 | 789 |
| (0.950, 1.000] | 69.19 | 30.81 | 69.19 | 3459 |
DocLayNetV1 markdown_text recall: mean=0.92 median=0.96 std=0.12
| recall | prob [%] | acc [%] | 1-acc [%] | total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] | 0.14 | 0 | 100 | 7 |
| (0.050, 0.100] | 0 | 0.14 | 99.86 | 0 |
| (0.100, 0.150] | 0.04 | 0.14 | 99.86 | 2 |
| (0.150, 0.200] | 0.08 | 0.18 | 99.82 | 4 |
| (0.200, 0.250] | 0.2 | 0.26 | 99.74 | 10 |
| (0.250, 0.300] | 0.14 | 0.46 | 99.54 | 7 |
| (0.300, 0.350] | 0.22 | 0.6 | 99.4 | 11 |
| (0.350, 0.400] | 0.28 | 0.82 | 99.18 | 14 |
| (0.400, 0.450] | 0.36 | 1.1 | 98.9 | 18 |
| (0.450, 0.500] | 0.56 | 1.46 | 98.54 | 28 |
| (0.500, 0.550] | 0.74 | 2.02 | 97.98 | 37 |
| (0.550, 0.600] | 0.68 | 2.76 | 97.24 | 34 |
| (0.600, 0.650] | 0.6 | 3.44 | 96.56 | 30 |
| (0.650, 0.700] | 1.02 | 4.04 | 95.96 | 51 |
| (0.700, 0.750] | 1.28 | 5.06 | 94.94 | 64 |
| (0.750, 0.800] | 2.26 | 6.34 | 93.66 | 113 |
| (0.800, 0.850] | 4.66 | 8.6 | 91.4 | 233 |
| (0.850, 0.900] | 9.68 | 13.26 | 86.74 | 484 |
| (0.900, 0.950] | 20.82 | 22.94 | 77.06 | 1041 |
| (0.950, 1.000] | 56.23 | 43.77 | 56.23 | 2811 |
DocLayNetV1 markdown_text edit_distance: mean=0.42 median=0.40 std=0.29
| edit_distance | prob [%] | acc [%] | 1-acc [%] | total |
|-----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] | 10.98 | 0 | 100 | 549 |
| (0.050, 0.100] | 7.14 | 10.98 | 89.02 | 357 |
| (0.100, 0.150] | 7 | 18.12 | 81.88 | 350 |
| (0.150, 0.200] | 5.44 | 25.13 | 74.87 | 272 |
| (0.200, 0.250] | 5.08 | 30.57 | 69.43 | 254 |
| (0.250, 0.300] | 4.76 | 35.65 | 64.35 | 238 |
| (0.300, 0.350] | 4.78 | 40.41 | 59.59 | 239 |
| (0.350, 0.400] | 4.28 | 45.19 | 54.81 | 214 |
| (0.400, 0.450] | 5.36 | 49.47 | 50.53 | 268 |
| (0.450, 0.500] | 4.54 | 54.83 | 45.17 | 227 |
| (0.500, 0.550] | 5.16 | 59.37 | 40.63 | 258 |
| (0.550, 0.600] | 4.6 | 64.53 | 35.47 | 230 |
| (0.600, 0.650] | 4.08 | 69.13 | 30.87 | 204 |
| (0.650, 0.700] | 4.4 | 73.21 | 26.79 | 220 |
| (0.700, 0.750] | 4.3 | 77.62 | 22.38 | 215 |
| (0.750, 0.800] | 6.3 | 81.92 | 18.08 | 315 |
| (0.800, 0.850] | 5.1 | 88.22 | 11.78 | 255 |
| (0.850, 0.900] | 3.22 | 93.32 | 6.68 | 161 |
| (0.900, 0.950] | 1.76 | 96.54 | 3.46 | 88 |
| (0.950, 1.000] | 1.7 | 98.3 | 1.7 | 85 |
DocLayNetV1 markdown_text meteor: mean=0.80 median=0.87 std=0.21
| meteor | prob [%] | acc [%] | 1-acc [%] | total |
|----------------|------------|-----------|-------------|---------|
| (0.000, 0.050] | 0.68 | 0 | 100 | 34 |
| (0.050, 0.100] | 0.5 | 0.68 | 99.32 | 25 |
| (0.100, 0.150] | 0.62 | 1.18 | 98.82 | 31 |
| (0.150, 0.200] | 0.96 | 1.8 | 98.2 | 48 |
| (0.200, 0.250] | 1 | 2.76 | 97.24 | 50 |
| (0.250, 0.300] | 1.04 | 3.76 | 96.24 | 52 |
| (0.300, 0.350] | 0.96 | 4.8 | 95.2 | 48 |
| (0.350, 0.400] | 1.5 | 5.76 | 94.24 | 75 |
| (0.400, 0.450] | 1.8 | 7.26 | 92.74 | 90 |
| (0.450, 0.500] | 1.58 | 9.06 | 90.94 | 79 |
| (0.500, 0.550] | 2.08 | 10.64 | 89.36 | 104 |
| (0.550, 0.600] | 2.44 | 12.72 | 87.28 | 122 |
| (0.600, 0.650] | 3.42 | 15.16 | 84.84 | 171 |
| (0.650, 0.700] | 5.26 | 18.58 | 81.42 | 263 |
| (0.700, 0.750] | 6.22 | 23.84 | 76.16 | 311 |
| (0.750, 0.800] | 7.68 | 30.07 | 69.93 | 384 |
| (0.800, 0.850] | 9.18 | 37.75 | 62.25 | 459 |
| (0.850, 0.900] | 10.14 | 46.93 | 53.07 | 507 |
| (0.900, 0.950] | 15.76 | 57.07 | 42.93 | 788 |
| (0.950, 1.000] | 27.17 | 72.83 | 27.17 | 1358 |
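
To compare runs, it can help to pull the per-metric summary lines of this report into a structured form. The sketch below assumes every summary line has the shape `<benchmark> <modality> <metric>: mean=<x> median=<y> std=<z>`, as the six lines in this file do. The names `SUMMARY_RE` and `parse_summary` are hypothetical helpers, not part of docling-eval.

```python
import re

# Assumed line shape, inferred from this report only:
#   "<benchmark> <modality> <metric>: mean=<x> median=<y> std=<z>"
SUMMARY_RE = re.compile(
    r"^(?P<benchmark>\S+) (?P<modality>\S+) (?P<metric>\S+): "
    r"mean=(?P<mean>[\d.]+) median=(?P<median>[\d.]+) std=(?P<std>[\d.]+)$"
)


def parse_summary(line):
    """Return a dict for a summary line, or None for any other line."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("mean", "median", "std"):
        fields[key] = float(fields[key])
    return fields


example = parse_summary(
    "DocLayNetV1 markdown_text BLEU: mean=0.71 median=0.84 std=0.30"
)
print(example["metric"], example["mean"], example["median"], example["std"])

# Table rows and the size line do not match and yield None.
print(parse_summary("DocLayNetV1 size: 4999"))
```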