fix(cli): clarify help text
Fix confusing CLI help output by correcting wording and aligning the source argument name.
Signed-off-by: Jefsky <hwj3344@hotmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
The `_parse_otsl_output` function, which translates VLM OTSL output
into structured table data, incorrectly merges `ucel` cells to the left
if they are not in the first column. This causes column-spanning
cells to be incorrectly reported as row-spanning.
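For illustration only, a minimal sketch of how spans can be derived from an OTSL token grid. It assumes the usual OTSL token meanings (`fcel`/`ecel` start a cell, `lcel` continues the cell to its left, `ucel` continues the cell above); `otsl_spans` is a hypothetical helper, not the code in `_parse_otsl_output`.

```python
def otsl_spans(rows):
    """Map each anchor cell (row, col) to its (row_span, col_span).

    Assumed token meanings: "fcel"/"ecel" start a cell, "lcel" extends
    the cell on its left, "ucel" extends the cell above it.
    """
    spans = {}
    for r, row in enumerate(rows):
        for c, tok in enumerate(row):
            if tok not in ("fcel", "ecel"):
                continue
            # Column span: count consecutive "lcel" tokens to the right.
            col_span = 1
            while c + col_span < len(row) and row[c + col_span] == "lcel":
                col_span += 1
            # Row span: count consecutive "ucel" tokens below, in any column.
            row_span = 1
            while (
                r + row_span < len(rows)
                and c < len(rows[r + row_span])
                and rows[r + row_span][c] == "ucel"
            ):
                row_span += 1
            spans[(r, c)] = (row_span, col_span)
    return spans
```

Note that the `ucel` lookup uses the same column index `c` for every column, so a `ucel` outside the first column is attributed to the cell above it rather than to a neighbour on the left.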
Signed-off-by: Sunny He <sunny_he@apple.com>
* Enable retry on HTTP 502
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Retry on transport failures
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Re-establish websocket channel after transient connectivity failures
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
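A rough sketch of the HTTP side of the retry behaviour listed above, assuming exponential backoff. `TransportError`, `call_with_retry`, and the backoff values are hypothetical names and defaults chosen for this example; the websocket re-establishment is not covered.

```python
import time

RETRYABLE_STATUS = {502}


class TransportError(Exception):
    """Hypothetical stand-in for a client's connection-level error."""


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status.

    send() must return a (status, body) tuple or raise TransportError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send()
        except TransportError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the transport failure
        else:
            if status not in RETRYABLE_STATUS or attempt == max_attempts:
                return status, body
        # Exponential backoff before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real delays.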
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
It looks as though the `task_status` of a `TaskStatusResponse` should be the same as the `status` of a `ConversionResult`, at least when the task is a conversion. The other type of task is a chunking task; would that have different statuses?
If this `task_status` field can be narrowed from str to an enum, the Docling-Serve API docs will be more useful.
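A small sketch of what the narrowing could look like. The enum name `TaskStatus` and its members are placeholders picked for this example; the real set of values would have to come from the existing conversion status type.

```python
from enum import Enum


class TaskStatus(str, Enum):
    # Placeholder members; the real values are an open question above.
    PENDING = "pending"
    STARTED = "started"
    SUCCESS = "success"
    FAILURE = "failure"


def parse_task_status(raw: str) -> TaskStatus:
    """Reject anything outside the enum instead of passing a bare str on."""
    return TaskStatus(raw)
```

Because the enum subclasses `str`, existing comparisons against plain strings keep working while the schema gains a closed set of values.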
Signed-off-by: Phil Nash <philnash@gmail.com>
ci: unify Python version to 3.10 across CI lanes
Several CI jobs were hardcoded to Python 3.12 while the default matrix
and the project's minimum supported version are both 3.10. This caused PR
runs (e.g. #3414) to mix 3.10 and 3.12, hiding 3.10-specific issues
behind 3.12-only checks and adding noise to the CI summary.
Switch the single-version CI lanes (lint, tach, pr-fast-checks,
windows/macOS smoke tests, build-package, test-package) to 3.10 so
they match the default matrix. Release/publishing flows (cd.yml,
pypi.yml) keep their existing 3.12 pin since they target the
release toolchain rather than the CI matrix.
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce a unified TransformersExtractionModel that supports multiple
prompt styles via an ExtractionPromptStyle enum. This replaces the
need for separate model classes per VLM.
- Add ExtractionPromptStyle enum (NUEXTRACT, GRANITE_VISION)
- Add prompt_utils.py with style-specific prompt builders
- Add TransformersExtractionModel with prompt-style dispatch
- Add GRANITE_VISION_4_1_TRANSFORMERS model spec
- Add extraction_prompt_style field to VlmExtractionPipelineOptions
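The prompt-style dispatch can be pictured roughly as follows. Only the enum name and its two members come from this change; the enum values, the builder functions, and the prompt wording are placeholders invented for the sketch.

```python
import json
from enum import Enum


class ExtractionPromptStyle(str, Enum):
    NUEXTRACT = "nuextract"            # value is an assumption
    GRANITE_VISION = "granite_vision"  # value is an assumption


def build_nuextract_prompt(template: dict) -> str:
    # Placeholder wording, not the real NuExtract prompt.
    return "# Template:\n" + json.dumps(template, indent=2)


def build_granite_vision_prompt(template: dict) -> str:
    # Placeholder wording, not the real Granite Vision prompt.
    return "Extract the following fields as JSON: " + ", ".join(template)


PROMPT_BUILDERS = {
    ExtractionPromptStyle.NUEXTRACT: build_nuextract_prompt,
    ExtractionPromptStyle.GRANITE_VISION: build_granite_vision_prompt,
}


def build_prompt(style: ExtractionPromptStyle, template: dict) -> str:
    """Pick the builder for the style; one model class serves all styles."""
    return PROMPT_BUILDERS[style](template)
```

Adding another VLM then means adding an enum member and a builder rather than a new model class.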
Signed-off-by: Ben Wiesel <benwiesel@ibm.com>
Co-authored-by: Ben Wiesel <benwiesel@ibm.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Remove eager materialization from docling-service batch submission
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update lock
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make convert_all evaluate Iterable input lazily, remove raises_on_error
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make convert_all use async generator like submit_and_retrieve_many
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix mypy fast check
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* upgrade packages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* update test GT data
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update test GT from linux machine
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reset all GT test data and uv.lock
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
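To show what lazy evaluation of the `Iterable` input means here, a simplified sketch. This `convert_all` is a stand-in written for the example: it processes sources one at a time and ignores the batching and concurrency a real client would have; `convert_one` is a hypothetical coroutine.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable


async def convert_all(
    sources: Iterable[str],
    convert_one: Callable[[str], Awaitable[str]],
) -> AsyncIterator[str]:
    """Yield results as they complete, pulling sources only on demand."""
    for source in sources:  # no list(sources): nothing is materialized
        yield await convert_one(source)
```

A caller that stops early never forces the remaining items of the input iterable to be produced.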
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix(docx): handle missing chr attribute in groupChr OMML elements
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* fix(docx): escape spaces in OMML limit text for proper LaTeX rendering
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* fix(docx): fix inline equation reconstruction to prevent tag corruption
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore(docx): add type hints and docstrings to OMML module
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* fix(docx): fix genfrac formatting and eliminate grouping function warnings
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* fix(docx): handle unmapped characters in OMML % formatting
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* feat(table-structure): swap VLM model to granite-vision-4.1-4b
Updates GraniteVisionTableStructureModel to use the 4.1 model. The 4.1
weights are pre-merged, so merge_lora_adapters() is now hasattr-guarded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* feat(chart-extraction): swap V4 VLM model to granite-vision-4.1-4b
Updates ChartExtractionModelGraniteVisionV4 to use the 4.1 model.
hasattr-guards the merge_lora_adapters() call since 4.1 weights are
pre-merged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* docs(example): mention granite-vision-4.1-4b in table-structure example
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* docs(catalog): update Granite Vision entry to 4.1-4b
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* feat(chart-extraction): honor cuda_use_flash_attention2 in V4 loader
Mirrors the table-structure loader so ChartExtractionModelGraniteVisionV4
also passes _attn_implementation based on AcceleratorOptions. Without this
the chart model falls back to the transformers SDPA default, which can
hit cuDNN backend failures on some torch/cuDNN stacks while the table
model (which already passed the flag) runs cleanly.
Stores accelerator_options on the base class so subclasses can read it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* fix(model-downloader): update Granite Vision log message to 4.1
The log message in download_models still mentioned "Granite Vision 4.0"
after the model swap. Correct it to match the current model version.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* fix(chart-extraction): fall back to bare CSV when V4 model omits ```csv``` fence
granite-vision-4.1-4b sometimes emits raw CSV without a ```csv``` code fence
for the <chart2csv> prompt, which caused _extract_csv_to_dataframe to raise
ValueError and drop the chart's tabular_chart metadata. Mirror the tolerant
parsing already used by the v3 class: prefer a fenced block, otherwise strip
any stray backtick prefix/suffix and parse the text as-is.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
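The tolerant CSV parsing from the last fix above can be sketched like this. It uses the standard `csv` module instead of a DataFrame to stay dependency-free, and `extract_csv_rows` is a hypothetical name, not `_extract_csv_to_dataframe`.

```python
import csv
import io
import re

_FENCE = re.compile(r"```csv\s*\n(.*?)```", re.DOTALL)


def extract_csv_rows(text: str) -> list:
    """Prefer a fenced csv block; otherwise parse the text as bare CSV."""
    match = _FENCE.search(text)
    if match:
        payload = match.group(1)
    else:
        # No fence: drop stray backticks around the payload.
        payload = text.strip().strip("`")
    payload = payload.strip()
    if not payload:
        raise ValueError("no CSV content found")
    return list(csv.reader(io.StringIO(payload)))
```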
---------
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
Co-authored-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: prototype tach-based modular skipping
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: modularize ubuntu setup and refine gating
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: adopt metaxy-inspired governance helpers
- replace custom aggregate check with re-actors/alls-green
- set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 on every workflow
- keep PR concurrency alive when the graphite:merge label is present
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: tune checks and pin action versions
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: split CI suites and heavy examples
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ecaa4777886157d5c2a7b3893c3a820983089dbf
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: d15416f3ca94ac97af2a8317cd6404208db9d896
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: sharpen tach graph and per-suite path filters
- Split docling.pipeline into per-pipeline tach modules
(asr, vlm, standard_pdf, threaded_standard_pdf, legacy_standard_pdf,
extraction_vlm, base, base_extraction, simple) so pytest --tach-base
impact analysis can attribute changes to a specific pipeline rather
than the whole package.
- Split the asr- and vlm-specific docling.datamodel option files
(asr_model_specs, pipeline_options_asr_model, vlm_engine_options,
vlm_model_specs, pipeline_options_vlm_model, layout_model_specs,
stage_model_specs, backend_options) into their own tach modules so
a narrow spec/options change no longer marks the full datamodel as
impacted.
- Narrow the per-suite pipeline path filters in checks.yml to the
concrete pipeline files relevant to each suite, so editing
vlm_pipeline.py only triggers the vlm matrix cell and editing
asr_pipeline.py only the asr one.
- Rekey the model cache in setup-ubuntu-ci to include runner.os and
hashFiles(uv.lock, pyproject.toml), with ordered restore-keys
fallbacks so a lockfile bump no longer silently stales the cache.
Metaxy parity note: layered tach enforcement (layer = "...") is
blocked by existing backend<->datamodel and utils<->stages cycles;
depot runners, nox dynamic matrices, devenv/nix, dprint and ty are
not applicable to docling's stack. All pinned action SHAs are on
their latest release as of this commit.
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: introduce pipeline and orchestration tach layers
Earlier notes claimed layers were blocked. That was only true for the
cyclic core (backend<->datamodel, utils<->stages). The boundary
*above* core is clean:
- No module under docling/backend, docling/datamodel, docling/models,
docling/utils, docling/exceptions, or docling/chunking imports
anything from docling.pipeline (verified by grep).
- No module anywhere in docling/ imports from docling.cli,
docling.document_converter, docling.document_extractor, or
docling.service_client (also verified).
So we can introduce two real layers on top of the cyclic core:
- "pipeline" — docling.pipeline and all nine concrete pipelines
(base, simple, base_extraction, asr, vlm,
extraction_vlm, standard_pdf,
threaded_standard_pdf, legacy_standard_pdf).
- "orchestration" — docling.cli, docling.document_converter,
docling.document_extractor, and
docling.experimental.pipeline.
Unlayered modules stay "below" both layers (tach allows them to be
depended on freely) and continue to carry the declared-but-cyclic
backend<->datamodel and utils<->stages edges.
A VLM-only layer was explored but rejected: only
docling.pipeline.vlm_pipeline and docling.pipeline.extraction_vlm_pipeline
could be cleanly layered as "vlm", because the matching datamodel
options (pipeline_options_vlm_model, vlm_engine_options,
vlm_model_specs) and model stages (vlm_convert, vlm_pipeline_models)
sit inside the datamodel/models cycle and cannot be promoted to a
higher layer without first breaking that cycle. Layering only the
two pipeline files is not worth the extra config.
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: expand tach layers to entrypoints/pipeline/models/core
Follow-up to the two-layer attempt. After verifying via grep that
nothing in datamodel/utils/backend imports from
docling.models.{extraction,factories,plugins,vlm_pipeline_models}
or from the "upper" stages (page_assemble, page_preprocessing,
reading_order, picture_description, vlm_convert), those nine
modules can be promoted out of the cyclic core into a dedicated
"models" layer.
The resulting order (highest first):
- entrypoints — cli, document_converter, document_extractor,
experimental.pipeline
- pipeline — docling.pipeline + the nine concrete pipelines
- models — model factories, extraction, plugins,
vlm_pipeline_models, and the five "upper" stages
- core — datamodel*, backend*, utils, exceptions, chunking,
models (base), models.utils, inference_engines.*,
the six "core stages" that utils cycles with
(chart_extraction, code_formula, layout, ocr,
picture_classifier, table_structure), and the
experimental.* and service_client modules
Rename the previous "orchestration" layer to "entrypoints" to
match the common docling vocabulary. Every module now carries an
explicit layer tag instead of relying on implicit unlayered
behaviour, so future additions must pick a layer deliberately.
A VLM layer, a stand-alone inference-engines layer, and separating
datamodel from backend all remain blocked by the bidirectional
backend<->datamodel and utils<->core-stages edges; those need a
code-level refactor first.
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: refine tach client and foundation layers
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: add optional windows and macos smoke lanes
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: normalize reusable workflow boolean inputs
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: replace external all-green action
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: use org-allowed setup-uv action
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: install compiler toolchain for ML tests
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: bb714afb42cd1b29ab073a7f59cc72874ff2fdcd
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a1f2761da8f72bfed636bd571ebf77b42c8771b6
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: cc6551b54c5bf4815ae9cd57cf43a98928a74be0
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: b21b0e7ca12b552dbdd54fac1bda113719c286f1
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: simplify ML pytest suite patterns
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: gate heavy examples on label, add job timeouts
- ci-heavy-examples: run only on main push, schedule, workflow_dispatch,
or when a PR is labeled tests:full / tests:heavy-examples. Drops the
path-based auto-trigger so that common edits to pyproject.toml,
uv.lock, or .github/actions do not kick off the 45-60min matrix on
every PR push. Collapses the changes job into a job-level if gate and
adds timeout-minutes: 90.
- checks.yml: add timeout-minutes to every job so stuck runners cannot
burn the full 6h default.
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: tolerate cancelled allowed-skip jobs in check aggregator
Intentional cancellations (manual cancel, concurrency replacement) on
jobs that are already in ALLOWED_SKIPS should not mark the overall
workflow red. Treat `cancelled` the same as `skipped` when the job is
listed as an allowed skip; any unexpected cancellation of a required
job still fails.
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* docs: make minimal vlm example portable
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 2135051da3ed73d4b8a9130f584f40b56155af1a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 4f6d1d7960f7418d0cde6425ae61538da84fda40
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: install workspace packages in CI syncs
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 492fa9883d4de6d98ebcb40fa863eafe2facff3c
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 3eefae71643f9ca3df0264690c0c6eb1f67f06f1
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: fe8c9689a0ee94f36eb826da8e2177ef87404f5e
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: eabdd24a6734ec873cdaac857718aef2473677e7
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: remove unused graphite concurrency exception
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: document test labels and gate cross-platform lanes
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: select ml tests with pytest markers
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: fix marker selector typing
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: simplify ml suite scheduling
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: mark cross-platform smoke tests
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: reuse test trigger for ml matrix
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: tighten full ci aggregation
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: share required job result check
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
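The allowed-skip rule from the check aggregator commit above, written out as a small Python sketch. The function name and job names are made up for the example; the result strings are the ones GitHub Actions reports for `needs.<job>.result`.

```python
def all_required_green(results: dict, allowed_skips: set) -> bool:
    """Return True when no required job failed, was skipped, or was cancelled.

    results maps job name to "success", "failure", "cancelled" or "skipped".
    """
    for job, result in results.items():
        if result == "success":
            continue
        # Skips and cancellations are tolerated only for allowed-skip jobs.
        if result in ("skipped", "cancelled") and job in allowed_skips:
            continue
        return False
    return True
```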
---------
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: add phase 1 path-based workflow skipping
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: add fast pull_request_target lint checks
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: keep pr fast checks cheap
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: expand full matrix triggers
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: enable same-repo and merge queue checks
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: harden pull_request_target fetch inputs
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: address phase 1 workflow review
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: grant reusable checks permissions
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: temporarily enable pr fast checks validation
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: allow first run of pr fast checks
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: load pr fast check script for first validation
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: format pr fast check script
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: guard temporary pr fast check script fallback
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: use pr metadata for temporary fast check validation
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: remove temporary pr fast checks trigger
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: disable duplicate pull request runs
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: run fast pr checks without path trigger filter
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: add job timeouts in checks.yml
Cap every job so a stuck runner cannot burn the 6h default. Limits:
changes=5, lint=20, run-tests-1/2=45, run-examples=60,
test-pip-install-*=30, build/test-package=15.
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: restore pull request workflow triggers
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* ci: run lint on pull requests
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
---------
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* fix(pptx): skip malformed picture shapes instead of aborting conversion
MsPowerpointDocumentBackend._handle_pictures reads embedded image bytes via python-pptx's shape.image accessor. On PPTX files with slightly malformed <p:pic> shapes, shape.image raises three exceptions that the existing (UnidentifiedImageError, OSError, ValueError) clause does not catch, so one bad picture aborts conversion of the entire presentation:
- InvalidXmlError when <p:blipFill> is missing
- KeyError when <a:blip r:embed> points to an unknown relationship
- AttributeError when the embedded part's content-type isn't an image
These files open normally in Keynote and Google Drive, so the backend should handle them as gracefully as it already handles truncated or unreadable image payloads.
This follows the same pattern as #2914, which extended the same except tuple with ValueError to handle linked (external) image references. The three cases above are the remaining shape.image failure modes that still escape.
Extend the except tuple to cover the three cases and log the same warning used for other unreadable images, leaving the rest of the presentation to convert normally. Add a regression fixture with one malformed picture per failure mode plus a focused test.
Fixes #3371
Signed-off-by: pateltejas <tejas226@hotmail.com>
* refactor(pptx): use warnings.warn for malformed picture skips
Address PR review feedback: use Python's warnings module with UserWarning to signal the skip to callers instead of logging.Logger.warning, matching the pattern used in msword_backend for "Skipping external image reference". This makes the skip visible via standard warning filters and catchable in tests.
Update the regression test to assert the warning is emitted via pytest.warns, which also suppresses the message during the test run so it doesn't clutter suite output.
Signed-off-by: pateltejas <tejas226@hotmail.com>
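A stripped-down sketch of the skip-and-warn pattern described in the two commits above. It does not import python-pptx: `InvalidXmlError` is a local stand-in for the library's exception, and `collect_images` is a hypothetical helper, not `_handle_pictures`.

```python
import warnings


class InvalidXmlError(Exception):
    """Local stand-in for python-pptx's exception of the same name."""


# Failure modes that should skip one picture instead of aborting.
_PICTURE_ERRORS = (OSError, ValueError, InvalidXmlError, KeyError, AttributeError)


def collect_images(shapes):
    """Read shape.image for each shape, warning on malformed pictures."""
    images = []
    for shape in shapes:
        try:
            images.append(shape.image)
        except _PICTURE_ERRORS as exc:
            warnings.warn(
                f"Skipping malformed picture shape: {exc!r}",
                UserWarning,
                stacklevel=2,
            )
    return images
```

Using `warnings.warn` lets callers filter the message and lets tests assert on it with `pytest.warns`.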
---------
Signed-off-by: pateltejas <tejas226@hotmail.com>
* chore: Update .gitignore with local dirs of AI agents
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* feat: Extend KserveV2OcrModel and kserve_v2_grpc.py to support the new version of Triton-RapidOCR
model where the language is the first input parameter:
- The gRPC client has been extended to encode BYTE input, needed for String types.
- An additional test verifies proper BYTE encoding/decoding.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* feat: Add test for the KServe-Triton integration: WIP
- The test currently supports only the gRPC KServe client
- Extend the ground-truth test data.
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Simplify code in kserve test
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* chore: Rename test file
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* feat: Extend the kserve_v2 implementation to support binary data in the HTTP interface.
- Decouple functions for binary encoding/decoding inside the kserve_v2_utils.py and share for both HTTP and gRPC.
- Introduce use_binary_data init parameter in KserveV2OptionsMixin
- Improve tests
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix: Put back the field grpc_use_binary_data of KserveV2OptionsMixin as a deprecated alias to use_binary_data
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
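For reference, a sketch of the string encoding mentioned above. It assumes the length-prefixed layout used for BYTES tensors in the KServe v2 binary representation (a 4-byte little-endian length before each element); the function names are made up and are not the helpers in kserve_v2_utils.py.

```python
import struct


def encode_bytes_tensor(items) -> bytes:
    """Serialize strings/bytes as <uint32 little-endian length><payload>..."""
    out = bytearray()
    for item in items:
        data = item.encode("utf-8") if isinstance(item, str) else bytes(item)
        out += struct.pack("<I", len(data)) + data
    return bytes(out)


def decode_bytes_tensor(buf: bytes) -> list:
    """Inverse of encode_bytes_tensor; raises ValueError on truncation."""
    items, offset = [], 0
    while offset < len(buf):
        if offset + 4 > len(buf):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        if offset + length > len(buf):
            raise ValueError("truncated element payload")
        items.append(buf[offset:offset + length])
        offset += length
    return items
```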
---------
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix(docx): handle unsupported limit functions gracefully in OMML conversion
Replace RuntimeError with graceful fallback for unknown limit functions in do_limlow().
Add argmax and argmin to LIM_FUNC dictionary for proper LaTeX rendering.
Fixes conversion failures when Word documents contain mathematical operators
not previously supported in the limit function dictionary.
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* test(docx): regenerate ground truth files
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
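A sketch of the lookup-with-fallback idea from the fix above. The table contents and the `\underset` fallback are assumptions made for this example and may differ from what `do_limlow()` actually emits.

```python
# Illustrative table; the templates are assumptions, not the module's own.
LIM_FUNC = {
    "lim": r"\lim_{{{lim}}}",
    "max": r"\max_{{{lim}}}",
    "min": r"\min_{{{lim}}}",
    "argmax": r"\operatorname*{{argmax}}_{{{lim}}}",
    "argmin": r"\operatorname*{{argmin}}_{{{lim}}}",
}


def render_limlow(func: str, lim: str) -> str:
    """Render a lower-limit construct, degrading gracefully if unknown."""
    template = LIM_FUNC.get(func)
    if template is None:
        # Unknown operator: keep the text instead of raising RuntimeError.
        return f"\\underset{{{lim}}}{{{func}}}"
    return template.format(lim=lim)
```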
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* feat(docx): add checkbox parsing support to MsWordDocumentBackend
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor(docx): remove duplicate code in text element handling
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs(docx): update checkbox method docstrings
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor(docx): use self._BLIP_NAMESPACES for w14 namespace in checkbox methods
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Replace mutable default arguments (e.g., `options=SomeOptions()`) with
`None` and initialize inside functions to prevent shared state issues.
Affects 15 backend classes across the codebase.
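The pattern in miniature, with hypothetical `BackendOptions` and `Backend` classes standing in for the real ones:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BackendOptions:
    tags: list = field(default_factory=list)


class Backend:
    # Before: def __init__(self, options=BackendOptions()) evaluated the
    # default once, so every instance shared the same options object.
    def __init__(self, options: Optional[BackendOptions] = None):
        # After: build a fresh default per call.
        self.options = options if options is not None else BackendOptions()
```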
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Co-authored-by: Panos Vagenas <pva@zurich.ibm.com>
fix(html): preserve fragment-only anchor links during path resolution
Fragment-only hrefs (e.g. href="#section1") were resolved as filesystem
paths when source_uri was set, breaking internal document navigation.
Add '#' to the skip-resolution prefixes in _resolve_relative_path() so
fragment links pass through unchanged.
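Roughly, the behaviour looks like the sketch below. The prefix tuple other than '#' and the use of `urljoin` are assumptions for illustration; `resolve_relative_path` here is a simplified stand-in for `_resolve_relative_path()`.

```python
from urllib.parse import urljoin

# Hypothetical prefix list; only the '#' entry is taken from this change.
_SKIP_PREFIXES = ("http://", "https://", "data:", "file://", "/", "#")


def resolve_relative_path(href: str, base: str) -> str:
    """Resolve href against base unless it must pass through unchanged."""
    if href.startswith(_SKIP_PREFIXES):
        return href
    return urljoin(base, href)
```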
Partially addresses #2929
Signed-off-by: aatrey56 <aatrey.sahay@gmail.com>
* Add ResponseFormat.DOCLANG and parsing branch in VLM pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: Remove bogus preamble from VLM chat template
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: Add include_stop_str_in_output in allowed VLLM sampling params
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* allow vllm model_impl to be defined
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* make VLLM model_impl default to auto
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: prevent XXE and decompression bomb in METS-GBS processing
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor: enforce resource limits for METS-GBS tar extraction
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
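One way to picture resource limits for tar extraction, using only the standard library. The function name and the limit values are hypothetical and not those used in the METS-GBS backend; XML hardening against XXE is not shown.

```python
import io
import tarfile


def read_members_bounded(fileobj, max_members=1000, max_total_bytes=50_000_000):
    """Read regular files from a tar archive, enforcing simple limits."""
    total = 0
    contents = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for count, member in enumerate(tar, start=1):
            if count > max_members:
                raise ValueError("archive has too many members")
            if not member.isfile():
                continue  # ignore links, devices and directories
            # Check the declared size before reading any payload.
            total += member.size
            if total > max_total_bytes:
                raise ValueError("archive exceeds the size limit")
            contents[member.name] = tar.extractfile(member).read()
    return contents
```

Checking the size before reading means an oversized archive is rejected without its payload ever being loaded into memory.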
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>