docling/tests/test_pdf_password.py
Peter W. J. Staar bf417e6d26 feat: Introduce docling-parse v5 and deprecate old docling-parse backends (#2872)
* feat: simplifying towards docling-parse v5

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on integrating docling-parse v5

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the test_backend_docling_parse

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Updated the docling-parse to 5.3.0

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran the pre-commit

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the backend_docling_parse

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran pre-commit

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the groundtruth to deal with rounding errors

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated comments for later docling-parse integrations

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ran pre-commit

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Make DoclingParseV2 and DoclingParseV4 backend stubs that route to new backend, emit warning.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* lock docling-parse

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* updated to 3.5.2

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2026-02-17 20:27:56 +01:00
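
The commit message above mentions turning the `DoclingParseV2` and `DoclingParseV4` backends into stubs that route to the new backend and emit a warning. A minimal sketch of that general pattern, using only the standard library, might look like the following. The names `NewBackend`, `LegacyBackendV2`, and `build_legacy` are hypothetical stand-ins, and this is not the actual docling implementation.

```python
import warnings


class NewBackend:
    """Hypothetical stand-in for the current backend implementation."""

    def __init__(self, source: str) -> None:
        self.source = source

    def describe(self) -> str:
        return f"NewBackend({self.source})"


class LegacyBackendV2(NewBackend):
    """Hypothetical deprecated name kept only for backward compatibility."""

    def __init__(self, source: str) -> None:
        # Warn the caller, then defer entirely to the new implementation.
        warnings.warn(
            "LegacyBackendV2 is deprecated; use NewBackend instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(source)


def build_legacy(source: str) -> tuple[LegacyBackendV2, list[str]]:
    """Instantiate the stub and return it with the warning messages it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        backend = LegacyBackendV2(source)
    return backend, [str(w.message) for w in caught]
```

Because the stub subclasses the new class in this sketch, existing `isinstance` checks and call sites keep working while callers still see the deprecation warning.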


from dataclasses import dataclass
from pathlib import Path
from typing import Iterable

import pytest

from docling.backend.docling_parse_backend import DoclingParseDocumentBackend
from docling.backend.pypdfium2_backend import (
    PyPdfiumDocumentBackend,
)
from docling.datamodel.backend_options import PdfBackendOptions
from docling.datamodel.base_models import ConversionStatus, InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
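

@pytest.fixture
def test_doc_path():
    # Sample PDF from the pdf_password test data; the path is relative to
    # the directory pytest is run from.
    return Path("./tests/data/pdf_password/2206.01062_pg3.pdf")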


@dataclass
class TestOption:
    options: PdfFormatOption
    name: str


def converter_opts_gen() -> Iterable[TestOption]:
    # OCR and table structure are disabled; the test only checks conversion status.
    pipeline_options = PdfPipelineOptions(
        do_ocr=False,
        do_table_structure=False,
    )

    # Both backends receive the same password through the backend options.
    backend_options = PdfBackendOptions(password="1234")

    yield TestOption(
        options=PdfFormatOption(
            pipeline_options=pipeline_options,
            backend=PyPdfiumDocumentBackend,
            backend_options=backend_options,
        ),
        name="PyPdfium",
    )

    yield TestOption(
        options=PdfFormatOption(
            pipeline_options=pipeline_options,
            backend=DoclingParseDocumentBackend,
            backend_options=backend_options,
        ),
        name="DoclingParse",
    )


@pytest.mark.asyncio
@pytest.mark.parametrize("test_options", converter_opts_gen(), ids=lambda o: o.name)
def test_get_text_from_rect(test_doc_path: Path, test_options: TestOption):
    converter = DocumentConverter(
        format_options={InputFormat.PDF: test_options.options}
    )

    res = converter.convert(test_doc_path)

    assert res.status == ConversionStatus.SUCCESS
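
The parametrization above relies on a generator of named option objects, with `ids=lambda o: o.name` supplying readable test ids. The following standard-library sketch mimics that mechanism without pytest or docling. `FakeOption`, `option_gen`, and `collect_ids` are hypothetical names used only for illustration; the real id handling lives inside pytest.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class FakeOption:
    """Hypothetical stand-in for the TestOption dataclass."""

    backend: str
    password: str
    name: str


def option_gen() -> Iterator[FakeOption]:
    # One shared password, reused for every backend variant.
    password = "1234"
    for backend in ("PyPdfium", "DoclingParse"):
        yield FakeOption(backend=backend, password=password, name=backend)


def collect_ids(
    values: Iterable[FakeOption], idfn: Callable[[FakeOption], str]
) -> list[tuple[str, FakeOption]]:
    # Materialize the iterable once, then label each value with idfn.
    return [(idfn(v), v) for v in list(values)]


cases = collect_ids(option_gen(), idfn=lambda o: o.name)
for case_id, option in cases:
    print(f"test_case[{case_id}] -> backend={option.backend}")
```

In the real test, ids of this kind make it possible to run a single backend variant, for example with `pytest -k DoclingParse`.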