Commit Graph

36 Commits

Author SHA1 Message Date
EliSchwartz 24f2d148d9 feat(vlm): upgrade Granite Vision model to 4.1 for table + chart extraction (#3382)
* feat(table-structure): swap VLM model to granite-vision-4.1-4b

Updates GraniteVisionTableStructureModel to use the 4.1 model. The 4.1
weights are pre-merged, so merge_lora_adapters() is now hasattr-guarded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>

* feat(chart-extraction): swap V4 VLM model to granite-vision-4.1-4b

Updates ChartExtractionModelGraniteVisionV4 to use the 4.1 model.
hasattr-guards the merge_lora_adapters() call since 4.1 weights are
pre-merged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>

* docs(example): mention granite-vision-4.1-4b in table-structure example

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>

* docs(catalog): update Granite Vision entry to 4.1-4b

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>

* feat(chart-extraction): honor cuda_use_flash_attention2 in V4 loader

Mirrors the table-structure loader so ChartExtractionModelGraniteVisionV4
also passes _attn_implementation based on AcceleratorOptions. Without this
the chart model falls back to the transformers SDPA default, which can
hit cuDNN backend failures on some torch/cuDNN stacks while the table
model (which already passed the flag) runs cleanly.

Stores accelerator_options on the base class so subclasses can read it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>

* fix(model-downloader): update Granite Vision log message to 4.1

The log message in download_models still mentioned "Granite Vision 4.0"
after the model swap. Correct it to match the current model version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>

* fix(chart-extraction): fall back to bare CSV when V4 model omits ```csv``` fence

granite-vision-4.1-4b sometimes emits raw CSV without a ```csv``` code fence
for the <chart2csv> prompt, which caused _extract_csv_to_dataframe to raise
ValueError and drop the chart's tabular_chart metadata. Mirror the tolerant
parsing already used by the v3 class: prefer a fenced block, otherwise strip
any stray backtick prefix/suffix and parse the text as-is.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>

---------

Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
Co-authored-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 08:36:08 +02:00
EliSchwartz 1569e42f84 feat: implement GraniteVisionTableStructureModel for VLM-based table extraction (#3323)
Add a new table structure model using IBM Granite Vision to extract table
structure from document images via OTSL token generation.

Changes:
- Add `GraniteVisionTableStructureOptions` with configurable model repo,
  device, batch size, and crop padding options
- Implement `GraniteVisionTableStructureModel` that uses a VLM pipeline to
  generate OTSL tokens from cropped table images, then parses them into
  `TableData` with cells, rows, and columns
- Register the model in `table_structure_engines` alongside existing engines
- Add example script `docs/examples/granite_vision_table_structure.py`
- Add tests covering options, model enable/disable, OTSL parsing (including
  self-closing tags xcel/srow/ecel), and invalid-backend error handling
- Update model catalog docs and CI workflow accordingly

Signed-off-by: Eli Schwartz <eli.shw@gmail.com>
2026-04-17 11:02:20 +02:00
geoHeil 8ec14f2c6f docs: fix nanonets_ocr2 runtime support matrix (#3317)
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
2026-04-17 06:24:53 +02:00
geoHeil 9970d1ef94 feat(vlm): add Nanonets OCR2 onboarding (#3274)
* feat(vlm): add nanonets ocr2 onboarding

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* feat(vlm): add vLLM and API runtimes for Nanonets-OCR2

Extend the Nanonets-OCR2 preset with vLLM + remote API paths so all
standard docling runtimes (Transformers, MLX, vLLM, API, LM Studio,
OpenAI-compatible) work out of the box. Drop the restricted
supported_engines set to match the GLM-OCR / LightOnOCR / Falcon-OCR
pattern, add top-level torch_dtype on the Transformers override, and
register NANONETS_OCR2_VLLM / NANONETS_OCR2_VLLM_API /
NANONETS_OCR2_LMSTUDIO_API legacy specs plus VlmModelType enum entries.

Folds in the remote-API scope that was on the superseded PR #3275.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

---------

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 06:51:35 +02:00
Viktor Kuropiatnyk d046390bf4 feat: Switch to the latest version of DocumentFigureClassifier model v2.5 (#3171)
* Switch to the latest version of DocumentFigureClassifier model v2.5

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

* CI trigger

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

---------

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
2026-04-01 11:49:55 +02:00
Tejas Kumar 1321b39cd8 docs: add audio & video processing guide (#3038)
* Update docs for media

* DCO Remediation Commit for Tejas Kumar <tejas.kumar@datastax.com>

I, Tejas Kumar <tejas.kumar@datastax.com>, hereby add my Signed-off-by to this commit: 33089ccd73

Signed-off-by: Tejas Kumar <tejas.kumar@datastax.com>

---------

Signed-off-by: Tejas Kumar <tejas.kumar@datastax.com>
2026-03-01 09:00:48 +01:00
Cesar Berrospi Ramis d276e60561 feat: export to WebVTT format (#3036)
* style(cli): apply python 3.10+ syntax, remove unnecessary imports

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* feat(vtt): export of DoclingDocument to WebVTT format

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* build: pin docling-core version 2.66.0

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2026-02-27 14:22:52 +01:00
Cesar Berrospi Ramis 334ba6e51f feat: create a backend parser for XBRL instance reports (#3017)
* build(xbrl): add Arelle as open-source library for XBRL

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* feat(xbrl): design and implement a backend parser for XBRL documents

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* test: remove print statements to reduce verbosity

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* style(XBRL): apply PEP8 naming convention for acronyms

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* refactor(XBRL): set XBRL dependencies as optional

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2026-02-24 16:52:02 +01:00
Christoph Auer 03532938b5 feat: Unified model-family inference engines (including image-classification) and KServe v2 API support (#2979)
* feat: Inference engines abstraction for image classification model family with HF Transformers and ONNX runtime

Implements runtime abstraction for image classification models with support for both ONNX Runtime and HuggingFace Transformers engines. Users can switch between engines without model retraining, similar to the object detection abstraction (#2959).

Key components:
- BaseImageClassificationEngine with factory pattern
- OnnxRuntimeImageClassificationEngine and TransformersImageClassificationEngine implementations
- Shared HfVisionModelMixin for common HF model utilities
- Engine-specific configuration options
- Test suite and example demonstrating runtime engine switching

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add missing files and re-export for backward compat

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Don't run with OCR in the example.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove excess onnxruntime related options for inuts and outputs

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: centralize torch compile defaults with DOCLING_INFERENCE_COMPILE_TORCH_MODELS

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat: Add Kserve2 API engine for image classifier and object detection models (#2999)

* fix: add failed pages to DoclingDocument for page break consistency (#2939)

* fix: add failed pages to DoclingDocument for page break consistency

When some PDF pages fail to parse, they were not added to
DoclingDocument.pages, causing page break markers to be incorrect
during export. This adds failed/skipped pages with their size info
(if available) to maintain correct page numbering and structure.

- Add _add_failed_pages_to_document() method in StandardPdfPipeline
- Add test cases for failed page handling
- Add test cases for normal page handling (regression test)
- Add test PDF files

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>

* fix: ensure resource cleanup and simplify type hints

- Wrap page_backend usage in try-finally to guarantee unload (prevents resource leaks).
- Simplify redundant 'float | None | None' type hint.

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>

* fix: add groundtruth for normal_4pages.pdf and exclude failing PDFs from e2e test

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>

* fix: ensure correct status assertion for failed pages in tests

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>

---------

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>

* fix: Use timezone-aware datetime (#2947)

* Use timezone-aware datetime for profiling timestamps

Updated timestamp recording to use timezone-aware datetime.

Signed-off-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>

* run formatter

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>

* fix(asciidoc): handle commas in image alt text (#2983)

* Fix: Handle commas in AsciiDoc image alt text

  - Modified _parse_picture() to gracefully handle alt text containing commas
  - Commas in alt text are now preserved instead of causing ValueError
  - Added test case with realistic auto-generated alt text
  - split('=', 1) prevents issues when values contain '=' characters

* DCO Remediation Commit for n0rdp0l <n90.w135@gmail.com>

I, n0rdp0l <n90.w135@gmail.com>, hereby add my Signed-off-by to this commit: ee752491fc

Signed-off-by: n0rdp0l <n90.w135@gmail.com>

* style: fix ruff formatting in test_backend_asciidoc.py

Signed-off-by: n0rdp0l <n90.w135@gmail.com>

---------

Signed-off-by: n0rdp0l <n90.w135@gmail.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>

* chore: bump version to 2.73.1 [skip ci]

* First attempt at establishing API Kserve2 facet

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* refactor: improve KServe v2 engine implementation after code review

- Add comprehensive error handling to KserveV2HttpClient
  - Catch and wrap Timeout, ConnectionError, HTTPError with context
  - Validate response formats with clear error messages

- Refactor URL building to eliminate duplication
  - Extract _build_model_url() helper method
  - Single source of truth for infer_url and model_metadata_url

- Make URL required parameter (remove default localhost:8000)
  - Update ApiKserveV2*EngineOptions to require explicit URL
  - Add preset validation with helpful error messages

- Rename constants for clarity: TRITON_* → KSERVE_V2_*
  - Add comment explaining KServe v2 uses Triton type system

- Improve error messages with actual values
  - Show counts, shapes, and supported types in validation errors

- Document official KServe Python SDK alternative
  - Note async-only requirement and alpha status

- Update tests for required URL parameter

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup in kserve http helper and options

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Further cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix for remote-services on tablemodel

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: improved deserialization of engine_options (#3008)

* add registry of discriminated subclasses

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix detection of engine_type value

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Add options serialization improvements

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
Signed-off-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: n0rdp0l <n90.w135@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: jhchoi1182 <jhchoi1182@gmail.com>
Co-authored-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Felix Wente <63914035+n0rdp0l@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

* Fixes from review

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* DCO Remediation Commit for Christoph Auer <cau@zurich.ibm.com>

I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: 4cdb01e6d3

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* DCO Remediation Commit for Christoph Auer <60343111+cau-git@users.noreply.github.com>

I, Christoph Auer <60343111+cau-git@users.noreply.github.com>, hereby add my Signed-off-by to this commit: e293ba3270

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add fallback for API variants

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Recreate uv.lock

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
Signed-off-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: n0rdp0l <n90.w135@gmail.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: jhchoi1182 <jhchoi1182@gmail.com>
Co-authored-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Felix Wente <63914035+n0rdp0l@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2026-02-18 10:49:19 +01:00
Aditya Sasidhar e6ccb8b2c1 feat: added support for parsing LaTeX (.tex) documents (#2890)
* feat: added support for parsing LaTeX (.tex) documents

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* feat: implement PR #2890 feedback for LaTeX backend

- Add text formatting options (bold, italic, underline) for LaTeX macros
- Enhance image embedding with PIL and ImageRef.from_pil()
- Refactor list processing to use GroupItem structure
- Refactor bibliography to use GroupItem structure
- Add nested list test coverage
- All tests passing (39/39), all linters passing

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* DCO Remediation Commit for Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

I, Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>, hereby add my Signed-off-by to this commit: f19f135b431d489cd8bf3982524505a0bbd8696d

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* DCO Remediation Commit for Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

I, Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>, hereby add my Signed-off-by to this commit: f19f135b431d489cd8bf3982524505a0bbd8696d

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* feat: enhance latex backend with robustness fixes and ground truth

- Add custom macro expansion for improved text quality
- Fix preamble filtering to remove metadata garbage
- Support recursive \input{} and \include{} file loading
- Organize test data into subdirectories for complex papers
- Add full end-to-end ground truth for 4 major arXiv papers (Attention, Mistral, DeepSeek, OTSL)
- Pass all 41 unit tests and pre-commit checks

Addresses @cau-git feedback for ground-truth data.

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* fix: minor formatting in test file

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* feat: enhance LaTeX backend with robust math and figure support

- Fixed re.error: bad escape in macro expansion by using lambda in re.sub
- Fixed sentences breaking at inline math ($) by preserving it within paragraphs
- Improved figure environment with proper grouping and structured representation
- Fixed crashes on documents starting with % comments
- Added comprehensive unit tests and updated all ground truth data

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* WIP: saving work for laptop migration

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* got rid of the line breaking issues, still some do exist

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* fix: generalized LaTeX macro parsing and robustness improvements

This commit addresses several issues with LaTeX parsing:
- Correctly handle unknown macros (like \ion{N}{2}) inline to avoid line breaks.
- Fix extraction of structural macros (section, caption, etc.) vs text-only groups.
- Address PR feedback regarding inline math spacing and splitting.
- Regenerate ground truth files reflecting these improvements.

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* style: apply automatic formatting fixes

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* style: fix ruff linter and formatter errors

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* fix: typing issues identified by mypy

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* style: apply formatting fixes to tests

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* fix: update groundtruth files for latex backend

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* fixed the ackward line breaking issue, turns out im stupid at considering text buffer

* i forgot to add the groundtruth so here it is

* DCO Remediation Commit for Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

I, Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>, hereby add my Signed-off-by to this commit: 7e032635ef
I, Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>, hereby add my Signed-off-by to this commit: aeba688384

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* Ran the precommit as requested

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

---------

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
2026-02-10 15:13:09 +01:00
Michele Dolfi d4c87133f3 feat: Introduce pluggable VLM runtime system with preset-based configuration (#2919)
* model runtime refactoring

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix code formula preset

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* batch prediction

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use presets and new vlm options in CLI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use new model settings by default

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* running

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fixes for running examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* keep old stage

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update model

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use granite 3.3 and set options

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* revisit init logic and propagate the proper options to the runtimes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update all stages with original setup

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* per stage registry

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use chat template

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove duplicated predict() and factor out some utils

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* working picture description examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add granite docling as code formula model

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename code formula presets

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix running minimal_vlm example

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add all models to presets and run compare_vlm

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove unused repo_id

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update vlm api model example

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix legacy examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add another legacy example

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* avoid automatic fallback to mlx and fix end_of_utterance in codeformula

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move vlm_convert_model

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use new vlm runtime class

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* flasg for CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename runtimes to explicit vlm_runtimes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* renaming from runtime to inference engine and model families

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs with stages

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update docs catalog page

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename runtime to inference engine

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-02-04 17:29:17 +01:00
jspast 2b83fdd0de feat: Add XPU device support for Intel GPUs (#2809)
* feat: Add XPU device support

* feat: Add XPU as supported device on some models

* docs: Add XPU usage to examples

* DCO Remediation Commit for jspast <140563347+jspast@users.noreply.github.com>

I, jspast <140563347+jspast@users.noreply.github.com>, hereby add my Signed-off-by to this commit: f26e8b8c3a
I, jspast <140563347+jspast@users.noreply.github.com>, hereby add my Signed-off-by to this commit: a4a2bf90fa
I, jspast <140563347+jspast@users.noreply.github.com>, hereby add my Signed-off-by to this commit: a2d5dac2e1

Signed-off-by: jspast <140563347+jspast@users.noreply.github.com>

---------

Signed-off-by: jspast <140563347+jspast@users.noreply.github.com>
2026-01-05 17:35:26 +01:00
Michele Dolfi d03439ccc5 docs(gpu): Add benchmarks of standard pipeline with OCR (#2764)
* add results for standard + OCR and more Windows timings

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix runtime selection for py 3.14 in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-12-10 20:43:20 +01:00
Michele Dolfi b75c6461f4 docs: More GPU results and improvements in the example docs (#2674)
* add more results and improve the example docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* 5070 windows timing

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add reference for cpu-only

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-11-24 15:26:08 +01:00
Muhammad Ali Hasan 146b4f0535 docs: fix typo on jobkit page (#2671)
* *kubeflow

* DCO Remediation Commit for Muhammad Ali Hasan <alihxn23@gmail.com>

I, Muhammad Ali Hasan <alihxn23@gmail.com>, hereby add my Signed-off-by to this commit: a007cf07af

Signed-off-by: Muhammad Ali Hasan <alihxn23@gmail.com>

---------

Signed-off-by: Muhammad Ali Hasan <alihxn23@gmail.com>
2025-11-24 09:35:45 +01:00
Michele Dolfi 463a3fd474 fix: Enable GPU for RapidOCR when available (#2659)
* add setting for using gpu

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-11-19 17:12:00 +01:00
Yasir Ali 741c44fa45 docs: fix typos (#2546)
docs: fix typos in enrichments.md ('analize' -> 'analyze', 'consise' -> 'concise')

Signed-off-by: Yasir Ali <engr23002@gmail.com>
2025-10-31 10:29:34 +01:00
Michele Dolfi 97aa06bfbc docs: Add details and examples on optimal GPU setup (#2531)
* docs for GPU optimizations

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* improve time reporting and improve execution

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix standard pipeline

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* tune examples with batch size 64

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add benchmark results

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* improve docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* typo in excluded tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* explicit pipeline in table

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-30 13:22:05 +01:00
Cesar Berrospi Ramis 9a6fdf936b docs: update opensearch notebook and backend documentation (#2519)
* docs(opensearch): update the example notebook RAG with OpenSearch

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs(uspto): remove direct usage of the backend class for conversion

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: remove direct usage of backends from documentation

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-10-27 10:02:50 +01:00
McGuireMark 86556d8367 docs: fix typo in mcp.md (#2502)
Update mcp.md

Typo fix

Signed-off-by: McGuireMark <mark.mcguire@nimblegravity.com>
2025-10-21 17:31:28 +02:00
Imad Saddik 2a0f56390a docs: fixed a few typos (#2441)
Signed-off-by: Imad Saddik <79410781+ImadSaddik@users.noreply.github.com>
2025-10-13 09:04:50 +02:00
Lucas Morin e6c3b05e63 docs: Jobkit and connectors (#2357)
* feat: create documentation for docling-jobkit

Signed-off-by: Lucas Morin <lucas.morin222@gmail.com>

* small text fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Lucas Morin <lucas.morin222@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-02 13:46:56 +02:00
Cesar Berrospi Ramis 46efaaefee feat: add a backend parser for WebVTT files (#2288)
* feat: add a backend parser for WebVTT files

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: update README with VTT support

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: add description to supported formats

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore: upgrade docling-core to unescape WebVTT in markdown

Pin the new release of docling-core 2.48.2.
Do not escape HTML reserved characters when exporting WebVTT documents to markdown.

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* test: add missing copyright notice

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-09-22 15:24:34 +02:00
Christoph Auer 17afb664d0 feat: Add granite-docling model (#2272)
* adding granite-docling preview

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the model specs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* typo

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use granite-docling and add to the model downloader

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update docs and README

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update final repo_ids for GraniteDocling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update final repo_ids for GraniteDocling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix model name in CLI usage example

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Fix VLM model name in README.md

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-09-17 15:15:49 +02:00
Roy Derks e5cd7020bd docs: Add instructions for using Docling with MCP to README (#2219)
* docs: Add instructions for using Docling with MCP to README

* DCO Remediation Commit for Roy Derks <10717410+royderks@users.noreply.github.com>

Signed-off-by: Roy Derks <roy.derks@ibm.com>

* DCO Remediation Commit for Roy Derks <10717410+royderks@users.noreply.github.com>

I, Roy Derks <10717410+royderks@users.noreply.github.com>, hereby add my Signed-off-by to this commit: 4b9ba1d0ef

Signed-off-by: Roy Derks <roy.derks@ibm.com>

* docs: reorganize documentation on MCP server

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: align README with documentation index page

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Roy Derks <roy.derks@ibm.com>
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Co-authored-by: Roy Derks <roy.derks@ibm.com>
Co-authored-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-09-10 10:02:28 +02:00
VIktor Kuropiantnyk cdf079dd06 feat(CLI): Option to download arbitrary HuggingFace model (#2123)
* Added option to docling-tools to download arbitrary HuggingFace model

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

* Added note in documentation

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

* Removed note on custom artifact path usage from HF download option

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

* Fixed typo

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

---------

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
2025-08-22 15:23:29 +02:00
Panos Vagenas 8996d612aa docs: add Getting Started page (#2113)
* docs: add Getting Started page

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* refactor usage

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* minor renaming

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-08-21 08:44:53 +02:00
stephencox-ict d6d2dbe2f9 docs: Fix typos (#1943)
Fix typos

Signed-off-by: stephencox-ict <scox@ict.co>
2025-07-15 09:51:56 +02:00
Peter W. J. Staar cfdf4cea25 feat: new vlm-models support (#1570)
* feat: adding new vlm-models support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* got microsoft/Phi-4-multimodal-instruct to work

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on vlm's

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the VLM part

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* all working, now serious refacgtoring necessary

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the download_model

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the formulate_prompt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* pixtral 12b runs via MLX and native transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the VlmPredictionToken

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring minimal_vlm_pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the MyPy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added pipeline_model_specializations file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* need to get Phi4 working again ...

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* finalising last points for vlms support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the pipeline for Phi4

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* streamlining all code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixing the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the html backend to the VLM pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the static load_from_doctags

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restore stable imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use AutoModelForVision2Seq for Pixtral and review example (including rename)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove unused value

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* refactor instances of VLM models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* skip compare example in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use lowercase and uppercase only

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename pipeline_vlm_model_spec

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move more argument to options and simplify model init

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add supported_devices

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove not-needed function

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* exclude minimal_vlm

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add message for transformers version

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename to specs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use module import and remove MLX from non-darwin

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove hf_vlm_model and add extra_generation_args

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use single HF VLM model class

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove torch type

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for vision models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-02 17:01:06 +02:00
Elwin 12dab0a1e8 feat: support image/webp file type (#1415)
* support image/webp file type

Signed-off-by: Elwin <61868295+hzhaoy@users.noreply.github.com>
Signed-off-by: Elwin <hzywong@gmail.com>

* docs: add webp image format in supported_formats.md

Signed-off-by: Elwin <61868295+hzhaoy@users.noreply.github.com>
Signed-off-by: Elwin <hzywong@gmail.com>

* test: add a test case for `image/webp` file

Signed-off-by: Elwin <hzywong@gmail.com>

* style: apply styling

Signed-off-by: Elwin <hzywong@gmail.com>

* test: update test case of converting `image/webp` file with more ocr engines

Signed-off-by: Elwin <hzywong@gmail.com>

* style: apply styling

Signed-off-by: Elwin <hzywong@gmail.com>

* rename test file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Elwin <61868295+hzhaoy@users.noreply.github.com>
Signed-off-by: Elwin <hzywong@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-05-14 09:47:28 +02:00
Emmanuel Ferdman 3afbe6c969 docs: update supported formats guide (#1463)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-04-28 08:51:54 +02:00
Maxim Lysak 1c26769785 feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199)
* Initial implementation to support MLX for VLM pipeline and SmolDocling

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* mlx_model unit

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Add CLI choices for VLM pipeline and model

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Initial implementation to support MLX for VLM pipeline and SmolDocling

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* mlx_model unit

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Add CLI choices for VLM pipeline and model

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updated minimal vlm pipeline example

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* make vlm_pipeline python3.9 compatible

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fixed extract_text_from_backend definition

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated README

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated example

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated documentation

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* corrections in the documentation

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Consmetic changes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-03-19 15:38:54 +01:00
serced 7e01798417 docs: fix spelling of picture in usage (#1165)
Signed-off-by: serced <52759935+serced@users.noreply.github.com>
2025-03-17 09:33:51 +01:00
Christoph Auer eb97357b05 feat: Use new TableFormer model weights and default to accurate model version (#1100)
* feat: New tableformer model weights [WIP]

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Updated TF version

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated tests, after merging with Main, Switched to Accurate TF model by default

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-03-11 10:53:49 +01:00
Michele Dolfi e1c49ad727 docs: add description of DOCLING_ARTIFACTS_PATH env var (#1124)
add env var in docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-03-06 07:30:07 +01:00
Michele Dolfi 357d41cc47 docs: Enrichment models (#1097)
* warning for develop examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for enrichment models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* minor reorg of top-level docs (#1098)

* minor reorg of top-level docs

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* fix typo [no ci]

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* trigger ci

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-03-04 14:24:38 +01:00