docling

mirror of https://github.com/docling-project/docling.git synced 2026-05-17 13:10:38 +00:00

Author	SHA1	Message	Date
Maksym Lysak	38354b7d13	Added support of "row_section" semantics of HTML_backend. Improvements on complex rendering example. Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>	2026-05-12 17:08:27 +02:00
geoHeil	5b1df788ef	ci: tighten pre-commit guardrails (#3346) * ci: tighten pre-commit guardrails Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: validate pre-commit guardrail changes Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: switch hook validation to prek Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: exempt active slim plan from max-lines Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: move max-lines config under github Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: fail on uncovered tach modules Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: ignore generated docs in max-lines check Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: clarify local validation tasks Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * docs: refine agent instructions Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: replace mypy with ty (cherry picked from commit 382afbde8f00abfaeba95ea9c8e9cc603f27a2d9) Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: replace justfile with makefile Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> --------- Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>	2026-05-08 15:07:11 +02:00
Michele Dolfi	24af7f6249	docs(security): Add GitHub Private Vulnerability Reporting (#3416) docs: update security processes Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2026-05-08 10:00:29 +02:00
geoHeil	a4d6683d98	ci: run heavy examples only manually (#3392) * ci: run heavy examples only manually Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: keep heavy examples label trigger Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> --------- Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>	2026-05-06 10:53:22 +02:00
geoHeil	885873ea36	ci: avoid mutable PR merge refs in fast checks (#3397) * ci: build stable PR fast-check merge tree Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * test: skip PR fast-check tree test on Windows Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> --------- Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>	2026-05-06 10:33:16 +02:00
geoHeil	fdca54caf7	ci: clarify Codecov coverage reporting (#3389) Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>	2026-05-06 10:00:00 +02:00
geoHeil	eb4724ee4c	ci: prototype tach-based modular skipping (#3333 ) * ci: prototype tach-based modular skipping Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: modularize ubuntu setup and refine gating Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: adopt metaxy-inspired governance helpers - replace custom aggregate check with re-actors/alls-green - set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 on every workflow - keep PR concurrency alive when the graphite:merge label is present Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: tune checks and pin action versions Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: split CI suites and heavy examples Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com> I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ecaa4777886157d5c2a7b3893c3a820983089dbf I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: d15416f3ca94ac97af2a8317cd6404208db9d896 Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: sharpen tach graph and per-suite path filters - Split docling.pipeline into per-pipeline tach modules (asr, vlm, standard_pdf, threaded_standard_pdf, legacy_standard_pdf, extraction_vlm, base, base_extraction, simple) so pytest --tach-base impact analysis can attribute changes to a specific pipeline rather than the whole package. - Split the asr- and vlm-specific docling.datamodel option files (asr_model_specs, pipeline_options_asr_model, vlm_engine_options, vlm_model_specs, pipeline_options_vlm_model, layout_model_specs, stage_model_specs, backend_options) into their own tach modules so a narrow spec/options change no longer marks the full datamodel as impacted. 
- Narrow the per-suite pipeline path filters in checks.yml to the concrete pipeline files relevant to each suite, so editing vlm_pipeline.py only triggers the vlm matrix cell and editing asr_pipeline.py only the asr one. - Rekey the model cache in setup-ubuntu-ci to include runner.os and hashFiles(uv.lock, pyproject.toml), with ordered restore-keys fallbacks so a lockfile bump no longer silently stales the cache. Metaxy parity note: layered tach enforcement (layer = "...") is blocked by existing backend<->datamodel and utils<->stages cycles; depot runners, nox dynamic matrices, devenv/nix, dprint and ty are not applicable to docling's stack. All pinned action SHAs are on their latest release as of this commit. Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: introduce pipeline and orchestration tach layers Earlier notes claimed layers were blocked. That was only true for the cyclic core (backend<->datamodel, utils<->stages). The boundary above core is clean: - No module under docling/backend, docling/datamodel, docling/models, docling/utils, docling/exceptions, or docling/chunking imports anything from docling.pipeline (verified by grep). - No module anywhere in docling/ imports from docling.cli, docling.document_converter, docling.document_extractor, or docling.service_client (also verified). So we can introduce two real layers on top of the cyclic core: - "pipeline" — docling.pipeline and all nine concrete pipelines (base, simple, base_extraction, asr, vlm, extraction_vlm, standard_pdf, threaded_standard_pdf, legacy_standard_pdf). - "orchestration" — docling.cli, docling.document_converter, docling.document_extractor, and docling.experimental.pipeline. Unlayered modules stay "below" both layers (tach allows them to be depended on freely) and continue to carry the declared-but-cyclic backend<->datamodel and utils<->stages edges. 
A VLM-only layer was explored but rejected: only docling.pipeline.vlm_pipeline and docling.pipeline.extraction_vlm_pipeline could be cleanly layered as "vlm", because the matching datamodel options (pipeline_options_vlm_model, vlm_engine_options, vlm_model_specs) and model stages (vlm_convert, vlm_pipeline_models) sit inside the datamodel/models cycle and cannot be promoted to a higher layer without first breaking that cycle. Layering only the two pipeline files is not worth the extra config. Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: expand tach layers to entrypoints/pipeline/models/core Follow-up to the two-layer attempt. After verifying via grep that nothing in datamodel/utils/backend imports from docling.models.{extraction,factories,plugins,vlm_pipeline_models} or from the "upper" stages (page_assemble, page_preprocessing, reading_order, picture_description, vlm_convert), those nine modules can be promoted out of the cyclic core into a dedicated "models" layer. The resulting order (highest first): - entrypoints — cli, document_converter, document_extractor, experimental.pipeline - pipeline — docling.pipeline + the nine concrete pipelines - models — model factories, extraction, plugins, vlm_pipeline_models, and the five "upper" stages - core — datamodel, backend, utils, exceptions, chunking, models (base), models.utils, inference_engines., the six "core stages" that utils cycles with (chart_extraction, code_formula, layout, ocr, picture_classifier, table_structure), and the experimental. and service_client modules Rename the previous "orchestration" layer to "entrypoints" to match the common docling vocabulary. Every module now carries an explicit layer tag instead of relying on implicit unlayered behaviour, so future additions must pick a layer deliberately. 
A VLM layer, a stand-alone inference-engines layer, and separating datamodel from backend all remain blocked by the bidirectional backend<->datamodel and utils<->core-stages edges; those need a code-level refactor first. Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: refine tach client and foundation layers Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: add optional windows and macos smoke lanes Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: normalize reusable workflow boolean inputs Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: replace external all-green action Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: use org-allowed setup-uv action Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: install compiler toolchain for ML tests Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com> I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: bb714afb42cd1b29ab073a7f59cc72874ff2fdcd I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a1f2761da8f72bfed636bd571ebf77b42c8771b6 Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com> I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: cc6551b54c5bf4815ae9cd57cf43a98928a74be0 I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: b21b0e7ca12b552dbdd54fac1bda113719c286f1 Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: simplify ML pytest suite patterns Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: gate heavy examples on label, add job timeouts - ci-heavy-examples: run only on main push, schedule, workflow_dispatch, or when a PR is labeled tests:full / tests:heavy-examples. 
Drops the path-based auto-trigger so that common edits to pyproject.toml, uv.lock, or .github/actions do not kick off the 45-60min matrix on every PR push. Collapses the changes job into a job-level if gate and adds timeout-minutes: 90. - checks.yml: add timeout-minutes to every job so stuck runners cannot burn the full 6h default. Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: tolerate cancelled allowed-skip jobs in check aggregator Intentional cancellations (manual cancel, concurrency replacement) on jobs that are already in ALLOWED_SKIPS should not mark the overall workflow red. Treat `cancelled` the same as `skipped` when the job is listed as an allowed skip; any unexpected cancellation of a required job still fails. Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * docs: make minimal vlm example portable Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com> I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 2135051da3ed73d4b8a9130f584f40b56155af1a I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 4f6d1d7960f7418d0cde6425ae61538da84fda40 Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: install workspace packages in CI syncs Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com> I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 492fa9883d4de6d98ebcb40fa863eafe2facff3c I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 3eefae71643f9ca3df0264690c0c6eb1f67f06f1 Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com> I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: fe8c9689a0ee94f36eb826da8e2177ef87404f5e I, Georg Heiler 
<georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: eabdd24a6734ec873cdaac857718aef2473677e7 Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: remove unused graphite concurrency exception Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: document test labels and gate cross-platform lanes Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: select ml tests with pytest markers Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: fix marker selector typing Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: simplify ml suite scheduling Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: mark cross-platform smoke tests Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: reuse test trigger for ml matrix Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: tighten full ci aggregation Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: share required job result check Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> --------- Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 14:15:35 +02:00
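The aggregation rule described in the "ci: tolerate cancelled allowed-skip jobs in check aggregator" entry above can be sketched in Python. This is a minimal illustration under stated assumptions, not the script used in the workflow: the function name `all_green` and the job names in the usage lines are hypothetical, and the result strings (`success`, `skipped`, `cancelled`, `failure`) are assumed to follow GitHub Actions job results.

```python
def all_green(results: dict[str, str], allowed_skips: set[str]) -> bool:
    """Return True when every job succeeded or was an allowed skip.

    `results` maps a job name to its result string. A job listed in
    `allowed_skips` may be `skipped` or `cancelled` without failing the
    aggregate; any other non-success result fails it.
    """
    for job, result in results.items():
        if result == "success":
            continue
        # Treat `cancelled` the same as `skipped`, but only for allowed skips.
        if result in {"skipped", "cancelled"} and job in allowed_skips:
            continue
        return False
    return True


# Hypothetical job names, for illustration only.
ALLOWED_SKIPS = {"heavy-examples"}
ok = all_green({"lint": "success", "heavy-examples": "cancelled"}, ALLOWED_SKIPS)
bad = all_green({"lint": "cancelled", "heavy-examples": "skipped"}, ALLOWED_SKIPS)
```

With these inputs, `ok` is true because the cancelled job is an allowed skip, while `bad` is false because a required job was cancelled.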
geoHeil	05e0a4daa4	ci: add stable required status checks (#3387) * ci: add stable required status checks Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: simplify status sentinel checks Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> --------- Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>	2026-04-30 12:13:41 +02:00
geoHeil	0c85938e12	ci: diff PR fast checks against merge ref (#3383) Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>	2026-04-29 15:17:51 +02:00
geoHeil	41e9fa7886	ci: implement phase 1 path-based workflow skipping (#3332 ) * ci: add phase 1 path-based workflow skipping Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: add fast pull_request_target lint checks Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: keep pr fast checks cheap Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: expand full matrix triggers Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: enable same-repo and merge queue checks Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: harden pull_request_target fetch inputs Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: address phase 1 workflow review Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: grant reusable checks permissions Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: temporarily enable pr fast checks validation Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: allow first run of pr fast checks Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: load pr fast check script for first validation Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: format pr fast check script Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: guard temporary pr fast check script fallback Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: use pr metadata for temporary fast check validation Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: remove temporary pr fast checks trigger Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: disable duplicate pull request runs Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: run fast pr checks without path trigger filter Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: add job timeouts in checks.yml Cap every job so a stuck runner cannot burn the 6h default. Limits: changes=5, lint=20, run-tests-1/2=45, run-examples=60, test-pip-install-=30, build/test-package=15. 
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> ci: restore pull request workflow triggers Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * ci: run lint on pull requests Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> --------- Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>	2026-04-29 10:55:27 +02:00
Michele Dolfi	2be2c38be9	chore: refactor release script to use Python regex for dependency updates (#3379) * chore: fix and make release script more generic Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * re-enable git operations Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2026-04-29 08:28:28 +02:00
Michele Dolfi	ed32c5e993	feat: Introduce modular docling-slim package (#3285 ) * plans folder structure Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * initial plan Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * updated plan Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * restructure repo for docling and docling-slim Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * transpose package structures Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add all-packages Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * updated lock and deps Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * align deps Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * more lock like main Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * more locked pinning Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename extras Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add simple README for docling-slim Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix scikit-image issue Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add readme placeholder Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add all extras in package test Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * cli in docling-slim Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * apply formatting Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix testing package Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * override grpcio in no-header test Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update lock Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update package description Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * updated extras Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix publish scripts Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update package test Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2026-04-24 15:14:57 +02:00
Michele Dolfi	d6e0f881bf	chore: breaking release guards (#3347) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2026-04-22 12:28:35 +02:00
EliSchwartz	1569e42f84	feat: implement GraniteVisionTableStructureModel for VLM-based table extraction (#3323 ) Add a new table structure model using IBM Granite Vision to extract table structure from document images via OTSL token generation. Changes: - Add `GraniteVisionTableStructureOptions` with configurable model repo, device, batch size, and crop padding options - Implement `GraniteVisionTableStructureModel` that uses a VLM pipeline to generate OTSL tokens from cropped table images, then parses them into `TableData` with cells, rows, and columns - Register the model in `table_structure_engines` alongside existing engines - Add example script `docs/examples/granite_vision_table_structure.py` - Add tests covering options, model enable/disable, OTSL parsing (including self-closing tags xcel/srow/ecel), and invalid-backend error handling - Update model catalog docs and CI workflow accordingly Signed-off-by: Eli Schwartz <eli.shw@gmail.com>	2026-04-17 11:02:20 +02:00
Christoph Auer	02837e7ffd	chore: Skip chart_extraction in CI (#3268) Skip chart_extraction in CI Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2026-04-11 06:03:45 +02:00
mergify[bot]	e8fb5eaf63	ci(mergify): upgrade configuration to current format (#3228) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2026-04-06 09:13:20 +02:00
Maxim Lysak	1c74a9b9c7	feat: Implementation of HTML backend with headless browser (#2969) - Implementation of HTML backend that (optionally) uses headless browser (via Playwright) to materialize HTML pages into images, and add provenances with bboxes to all elements in the converted docling document. - Conversion preserves reading order given by HTML DOM tree - Added support for HTML "input" fields: checkboxes, radiobuttons, text inputs, etc. - Added support to Key-Value convention in HTML (i.e. elements with id "key1" and "key1_value1" will be paired as key-values, see test cases as examples) - Heuristic that glues independent inline HTML elements with single-character text in them into larger text blocks - Support for inline styling (bold, italic, etc.) Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>	2026-03-24 14:28:57 +01:00
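The key-value id convention mentioned in the entry above (an element with id "key1" paired with one whose id is "key1_value1") can be illustrated with a short Python sketch. It is not the backend's implementation: the function `pair_key_values`, the `<key id>_value<N>` pattern generalised from the single example in the commit message, and the sample ids and texts are all assumptions made for illustration.

```python
import re

# Assumed pattern: a value element's id is "<key id>_value<N>".
VALUE_ID = re.compile(r"^(?P<key_id>.+)_value\d+$")


def pair_key_values(elements: dict[str, str]) -> dict[str, list[str]]:
    """Pair key texts with value texts based on element ids.

    `elements` maps an element id to its text. An element is treated as a
    value when its id matches VALUE_ID and the referenced key id exists.
    """
    pairs: dict[str, list[str]] = {}
    for elem_id, text in elements.items():
        match = VALUE_ID.match(elem_id)
        if match and match.group("key_id") in elements:
            key_text = elements[match.group("key_id")]
            pairs.setdefault(key_text, []).append(text)
    return pairs


# Hypothetical sample: ids and texts are made up.
sample = {"key1": "Name", "key1_value1": "Ada", "title": "Form"}
paired = pair_key_values(sample)
```

Here `paired` maps "Name" to the list containing "Ada"; the "title" element is left unpaired because its id does not follow the assumed pattern.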
Cesar Berrospi Ramis	1eb5c21dab	docs: add XBRL conversion example notebook and update feature listings (#3039) docs(xbrl): add notebook for XBRL parsing Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>	2026-02-27 16:09:19 +01:00
Pádraic Slattery	6f1f2a9ffb	chore: Update outdated GitHub Actions versions (#2918) Signed-off-by: Padraic Slattery <pgoslatara@gmail.com>	2026-02-18 08:03:24 +01:00
Michele Dolfi	f739c1e81f	chore: Add CI tests for pip install without lock file (#2909 ) * Add CI tests for pip install without lock file Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * Refactor CI tests to run sequentially instead of parallel matrix Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * Add PyTorch CPU extra-index-url and remove --only-binary flag Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove python 3.9 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use torch-backend option in uv Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add check that compilation from sources would fail Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove dev headers explicitly Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix for system python like python 3.12 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2026-01-26 17:00:36 +01:00
Michele Dolfi	7f386587ed	feat: Drop support for Python 3.9 (#2905 ) * chore: drop support for Python 3.9 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * disable CI for python 3.9 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix: test bump version Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add chore to the changelog but without bumping the version Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * force newer langchain-core Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix linter for 3.10 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * Add python 3.9 removal notice Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * avoid upgrading docling-core Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * restore semantic release settings Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2026-01-23 10:15:58 +01:00
Maxim Lysak	fa21128138	docs: Example on how to apply external OCR as post processing (#2517 ) * Example on how to apply to Docling Document OCR as a post-processing with "nanonets-ocr2-3b" via LM Studio Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Added support of elements with multiple provenances Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * cleaning up Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * improved prompt for nanonets-ocr2-3b Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * cleaning up Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * excluded example from CI Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * updated class name Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Improved usability of the example, added simple cli, and some helper functions Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Fix api_image_request usage Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix pydantic errors Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Improvements and corrections Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Added string sanitation, removing break lines from remote OCR, also preserving original text from json Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Added quick and reliable detection of empty image crops (elements, table cells, form items), these are not sent to OCR Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Example respects ocr_documents.txt, tuned empty crop detection Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * cleaning api_image_request Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> --------- Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-11-27 11:04:40 +01:00
Harry Ho	b216ad848d	docs: Added documentation to use SuryaOCR via plugin docling-surya (#2533 ) * docs: Added documentation to use SuryaOCR via plugin `docling-surya` Signed-off-by: Harry Ho <kho7@student.umgc.edu> * Add PyPI link for docling-surya package Added a link to the PyPI page for docling-surya. Signed-off-by: Harry Ho <4719770+harrykhh@users.noreply.github.com> * Add licensing note for SuryaOCR integration Added important licensing note regarding SuryaOCR integration. Signed-off-by: Harry Ho <4719770+harrykhh@users.noreply.github.com> * Ran linter to reformat Signed-off-by: Harry Ho <4719770+harrykhh@users.noreply.github.com> --------- Signed-off-by: Harry Ho <kho7@student.umgc.edu> Signed-off-by: Harry Ho <4719770+harrykhh@users.noreply.github.com> Co-authored-by: Harry Ho <kho7@student.umgc.edu>	2025-11-19 15:27:24 +01:00
Christoph Auer	4852d8b4f2	feat(experimental): Layout + VLM model with layout prompt (#2244 ) * adding granite-docling preview Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated the model specs Signed-off-by: Peter Staar <taa@zurich.ibm.com> * Add Layout+VLM pipeline with prompt injection, ApiVlmModel updates Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update layout injection, move to experimental Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Adjust defaults Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Map Layout+VLM pipeline to GraniteDoclign Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Remove base_prompt from layout injection prompt Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Reinstate custom prompt Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * add demo_layout file that produces with vs without layout injection Signed-off-by: Peter El Hachem <peter.el.hachem@ibm.com> Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> * feat: wrap vlm_inference around process_images Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> * feat: carry input prompt + number of input tokens Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> * fix: adapt example to run on local test file Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> * fix: example now expects single document Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> * feat: add layout example to EXAMPLES_TO_SKIP Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> * feat: address comments on git Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> * feat: add inference wrapper for hf_transformers + carry input prompt Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> * Feat: add track_input_prompt to ApiVlmOptions, and track input prompt as part of api vlm Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> * fix: Ensure backward-compatible build_prompt by adding _internal_page ag Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: Ensure 
backward-compatible build_prompt by adding _internal_page ag Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes for demo Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Typing fixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Restoring lost changes in vllm_model Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Restoring vlm_pipeline_api_model example Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Signed-off-by: Peter El Hachem <peter.el.hachem@ibm.com> Signed-off-by: ElHachem02 <peterelhachem02@gmail.com> Co-authored-by: Peter Staar <taa@zurich.ibm.com> Co-authored-by: ElHachem02 <peterelhachem02@gmail.com>	2025-11-12 13:42:09 +01:00
Michele Dolfi	97aa06bfbc	docs: Add details and examples on optimal GPU setup (#2531 ) * docs for GPU optimizations Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * improve time reporting and improve execution Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix standard pipeline Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * tune examples with batch size 64 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add benchmark results Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * improve docs Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * typo in excluded tests Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * explicit pipeline in table Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-30 13:22:05 +01:00
Michele Dolfi	cdffb47b9a	feat: Support for Python 3.14 (#2530 ) * fix dependencies for py314 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add metadata and CI tests Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add back gliner Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update error message about python 3.14 availability Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * skip tests which cannot run on py 3.14 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix lint Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove vllm from py 3.14 deps Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * safe import for vllm Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update lock Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove torch.compile() Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update checkbox results after docling-core changes Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * cannot run mlx example in CI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add test for rapidocr backends and skip onnxruntime on py3.14 Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix other occurances of torch.compile() Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * allow torch.compile for Python <3.14. proper support will be introduced with new torch releases Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-28 14:32:15 +01:00
Ken Steele	657ce8b01c	feat(ASR): MLX Whisper Support for Apple Silicon (#2366 ) * add mlx-whisper support * added mlx-whisper example and test. update docling cli to use MLX automatically if present. * fix pre-commit checks and added proper type safety * fixed linter issue * DCO Remediation Commit for Ken Steele <ksteele@gmail.com> I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: a979a680e1dc2fee8461401335cfb5dda8cfdd98 I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 9827068382ca946fe1387ed83f747ae509fcf229 I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: ebbeb45c7dc266260e1fad6bdb54a7041f8aeed4 I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 2f6fd3cf46c8ca0bb98810191578278f1df87aa3 Signed-off-by: Ken Steele <ksteele@gmail.com> * fix unit tests and code coverage for CI * DCO Remediation Commit for Ken Steele <ksteele@gmail.com> I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 5e61bf11139a2133978db2c8d306be6289aed732 Signed-off-by: Ken Steele <ksteele@gmail.com> * fix CI example test - mlx_whisper_example.py defaults to tests/data/audio/sample_10s.mp3 if no args specified. Signed-off-by: Ken Steele <ksteele@gmail.com> * refactor: centralize audio file extensions and MIME types in base_models.py - Move audio file extensions from CLI hardcoded set to FormatToExtensions[InputFormat.AUDIO] - Add support for additional audio formats: m4a, aac, ogg, flac, mp4, avi, mov - Update FormatToMimeType mapping to include MIME types for all audio formats - Update CLI auto-detection to use centralized FormatToExtensions mapping - Add comprehensive tests for audio file auto-detection and pipeline selection - Ensure explicit pipeline choices are not overridden by auto-detection Fixes issue where only .mp3 and .wav files were processed as audio despite CLI auto-detection working for all formats. 
The document converter now properly recognizes all audio formats through MIME type detection. Addresses review comments: - Centralizes audio extensions in base_models.py as suggested - Maintains existing auto-detection behavior while using centralized data - Adds proper test coverage for the audio detection functionality All examples and tests pass with the new centralized approach. All audio formats (mp3, wav, m4a, aac, ogg, flac, mp4, avi, mov) now work correctly. Signed-off-by: Ken Steele <ksteele@gmail.com> * feat: address reviewer feedback - improve CLI auto-detection and add explicit model options Review feedback addressed: 1. Fix CLI auto-detection to only switch to ASR pipeline when ALL files are audio - Previously switched if ANY file was audio, now requires ALL files to be audio - Added warning for mixed file types with guidance to use --pipeline asr 2. Add explicit WHISPER_X_MLX and WHISPER_X_NATIVE model options - Users can now force specific implementations if desired - Auto-selecting models (WHISPER_BASE, etc.) still choose best for hardware - Added 12 new explicit model options: _MLX and _NATIVE variants for each size CLI now supports: - Auto-selecting: whisper_tiny, whisper_base, etc. (choose best for hardware) - Explicit MLX: whisper_tiny_mlx, whisper_base_mlx, etc. (force MLX) - Explicit Native: whisper_tiny_native, whisper_base_native, etc. 
(force native) Addresses reviewer comments from @dolfim-ibm Signed-off-by: Ken Steele <ksteele@gmail.com> * DCO Remediation Commit for Ken Steele <ksteele@gmail.com> I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: `c60e72d2b5` I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: `94803317a3` I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: `21905e8ace` I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: `96c669d155` I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: `8371c060ea` Signed-off-by: Ken Steele <ksteele@gmail.com> * test(asr): add coverage for MLX options, pipeline helpers, and VLM prompts - tests/test_asr_mlx_whisper.py: verify explicit MLX options (framework, repo ids) - tests/test_asr_pipeline.py: cover _has_text/_determine_status and backend support with proper InputDocument/NoOpBackend wiring - tests/test_interfaces.py: add BaseVlmPageModel.formulate_prompt tests (RAW/NONE/CHAT, invalid style), with minimal InlineVlmOptions scaffold Improves reliability of ASR and VLM components by validating configuration paths and helper logic. 
Signed-off-by: Ken Steele <ksteele@gmail.com> * test(asr): broaden coverage for model selection, pipeline flows, and VLM prompts - tests/test_asr_mlx_whisper.py - Add MLX/native selector coverage across all Whisper sizes - Validate repo_id choices under MLX and Native paths - Cover fallback path when MPS unavailable and mlx_whisper missing - tests/test_asr_pipeline.py - Relax silent-audio assertion to accept PARTIAL_SUCCESS or SUCCESS - Force CPU native path in helper tests to avoid torch in device selection - Add language handling tests for native/MLX transcribe - Cover native run success (BytesIO) and failure (exception) branches - Cover MLX run success/failure branches with mocked transcribe - Add init path coverage with artifacts_path - tests/test_interfaces.py - Add focused VLM prompt tests (NONE/CHAT variants) Result: all tests passing with significantly improved coverage for ASR model selectors, pipeline execution paths, and VLM prompt formulation. Signed-off-by: Ken Steele <ksteele@gmail.com> * simplify ASR model settings (no pipeline detection needed) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * clean up disk space in runners Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Ken Steele <ksteele@gmail.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-21 08:05:59 +02:00
Michele Dolfi	a5af082d82	chore: fix parsing of release body message (#2498 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-20 13:41:35 +02:00
Michele Dolfi	5be856fbc0	chore: add action posting to discord (#2486 ) * add action posting to discord Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * test on push Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * with icon Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove testing Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-17 16:31:57 +02:00
Rafael Teixeira de Lima	16829939cf	feat(docx): Process drawingml objects in docx (#2453 ) * Export of DrawingML figures into docling document * Adding libreoffice env var and libreoffice to checks image Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com> * DCO Remediation Commit for Rafael Teixeira de Lima <Rafael.td.lima@gmail.com> I, Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>, hereby add my Signed-off-by to this commit: `9518fffcad` Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com> * Enforcing apt get update Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com> * Only display drawingml warning once per document Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com> * add util to test libreoffice and exclude files from test when not found Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * check libreoffice only once Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * Only initialise converter if needed Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com> --------- Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-15 10:58:08 +02:00
Rui Dias Gomes	68230fe7e5	ci: split workflow to speedup CI runtime (#2313 ) * split workflow Signed-off-by: rmdg88 <rmdg88@gmail.com> * split workflow Signed-off-by: rmdg88 <rmdg88@gmail.com> * enable test_e2e_pdfs_conversions Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Signed-off-by: Rui Dias Gomes <66125272+rmdg88@users.noreply.github.com> * split workflow Signed-off-by: rmdg88 <rmdg88@gmail.com> * split workflow Signed-off-by: rmdg88 <rmdg88@gmail.com> * split workflow Signed-off-by: rmdg88 <rmdg88@gmail.com> * split workflow Signed-off-by: rmdg88 <rmdg88@gmail.com> * split workflow Signed-off-by: rmdg88 <rmdg88@gmail.com> * fix conflict files Signed-off-by: rmdg88 <rmdg88@gmail.com> --------- Signed-off-by: rmdg88 <rmdg88@gmail.com> Signed-off-by: Rui Dias Gomes <66125272+rmdg88@users.noreply.github.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-03 11:16:38 +02:00
Christoph Auer	1e9dc43b72	feat: Repetition-based StoppingCriteria for GraniteDocling (#2323 ) * Experimental code for repetition detection, VLLM Streaming Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update VLLM Streaming Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update VLLM inference code, CLI and VLM specs Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix generation and decoder args for HF model Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix vllm device args Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Cleanup Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Bugfixes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Remove streaming VLLM for the moment Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add repetition StoppingCriteria for GraniteDocling/SmolDocling Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Make GenerationStopper base class and port for MLX Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add streaming support and custom GenerationStopper support for ApiVlmModel Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes for ApiVlmModel Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fixes for ApiVlmModel Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix api_image_request_streaming when GenerationStopper triggers. Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Move DocTagsRepetitionStopper to utility unit, update examples Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-09-30 15:26:09 +02:00
Michele Dolfi	d32d2c97e1	chore: PR approval reminder (#2132 ) PR approval reminder Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-08-25 15:08:37 +02:00
Peter W. J. Staar	f3ae3029b8	docs: update readme and add ASR example (#1836 ) * updated the README Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added minimal_asr_pipeline Signed-off-by: Peter Staar <taa@zurich.ibm.com> * Updated README and added ASR example Signed-off-by: Peter Staar <taa@zurich.ibm.com> * Updated docs.index.md Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated CI and mkdocs Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added link tp existing audio file Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added link tp existing audio file Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatting Signed-off-by: Peter Staar <taa@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2025-06-23 18:55:16 +02:00
Peter W. J. Staar	1557e7ce3e	feat: Support audio input (#1763 ) * scaffolding in place Signed-off-by: Peter Staar <taa@zurich.ibm.com> * doing scaffolding for audio pipeline Signed-off-by: Peter Staar <taa@zurich.ibm.com> * WIP: got first transcription working Signed-off-by: Peter Staar <taa@zurich.ibm.com> * all working, time to start cleaning up Signed-off-by: Peter Staar <taa@zurich.ibm.com> * first working ASR pipeline Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added openai-whisper as a first transcription model Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updating with asr_options Signed-off-by: Peter Staar <taa@zurich.ibm.com> * finalised the first working ASR pipeline with Whisper Signed-off-by: Peter Staar <taa@zurich.ibm.com> * use whisper from the latest git commit Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * Update docling/datamodel/pipeline_options.py Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com> * Update docling/datamodel/pipeline_options.py Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com> * updated comment Signed-off-by: Peter Staar <taa@zurich.ibm.com> * AudioBackend -> DummyBackend Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * file rename Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Rename to NoOpBackend, add test for ASR pipeline Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Support every format in NoOpBackend Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add missing audio file and test Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Install ffmpeg system dependency for ASR test Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Signed-off-by: Peter W. J. 
Staar <91719829+PeterStaar-IBM@users.noreply.github.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-06-23 14:47:26 +02:00
Michele Dolfi	c2ef69718a	chore: dco advisor (#1795 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-17 09:45:56 +02:00
Michele Dolfi	cdd401847a	feat: simplify dependencies, switch to uv (#1700 ) * refactor with uv Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * constraints for onnxruntime Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * more constraints Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-03 15:18:54 +02:00
Peter W. J. Staar	cfdf4cea25	feat: new vlm-models support (#1570 ) * feat: adding new vlm-models support Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the transformers Signed-off-by: Peter Staar <taa@zurich.ibm.com> * got microsoft/Phi-4-multimodal-instruct to work Signed-off-by: Peter Staar <taa@zurich.ibm.com> * working on vlm's Signed-off-by: Peter Staar <taa@zurich.ibm.com> * refactoring the VLM part Signed-off-by: Peter Staar <taa@zurich.ibm.com> * all working, now serious refacgtoring necessary Signed-off-by: Peter Staar <taa@zurich.ibm.com> * refactoring the download_model Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the formulate_prompt Signed-off-by: Peter Staar <taa@zurich.ibm.com> * pixtral 12b runs via MLX and native transformers Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the VlmPredictionToken Signed-off-by: Peter Staar <taa@zurich.ibm.com> * refactoring minimal_vlm_pipeline Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the MyPy Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added pipeline_model_specializations file Signed-off-by: Peter Staar <taa@zurich.ibm.com> * need to get Phi4 working again ... 
Signed-off-by: Peter Staar <taa@zurich.ibm.com> * finalising last points for vlms support Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the pipeline for Phi4 Signed-off-by: Peter Staar <taa@zurich.ibm.com> * streamlining all code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixing the tests Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added the html backend to the VLM pipeline Signed-off-by: Peter Staar <taa@zurich.ibm.com> * fixed the static load_from_doctags Signed-off-by: Peter Staar <taa@zurich.ibm.com> * restore stable imports Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use AutoModelForVision2Seq for Pixtral and review example (including rename) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove unused value Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * refactor instances of VLM models Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * skip compare example in CI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use lowercase and uppercase only Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename pipeline_vlm_model_spec Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * move more argument to options and simplify model init Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add supported_devices Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove not-needed function Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * exclude minimal_vlm Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * missing file Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add message for transformers version Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename to specs Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use module import and remove MLX from non-darwin Signed-off-by: Michele Dolfi 
<dol@zurich.ibm.com> * remove hf_vlm_model and add extra_generation_args Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use single HF VLM model class Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove torch type Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add docs for vision models Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-02 17:01:06 +02:00
Cesar Berrospi Ramis	fa7fc9e63d	fix(codecov): fix codecov argument and yaml file (#1399 ) * fix(codecov): fix codecov argument and yaml file Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> * ci: set the codecov status to success even if the CI fails Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> --------- Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-04-15 18:12:57 +02:00
Michele Dolfi	06227e9970	ci: sign pypi packages (#1392 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-15 08:59:16 +02:00
Michele Dolfi	5458a88464	ci: add coverage and ruff (#1383 ) * add coverage calculation and push Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * new codecov version and usage of token Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * enable ruff formatter instead of black and isort Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * apply ruff lint fixes Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * apply ruff unsafe fixes Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add removed imports Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * runs 1 on linter issues Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * finalize linter fixes Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * Update pyproject.toml Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-04-14 18:01:26 +02:00
Michele Dolfi	293c28ca7c	docs(security): more statements about secure development (#1381 ) docs: more statement about secure development Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-14 13:53:26 +02:00
Gabe Goodhart	c605edd8e9	feat: OllamaVlmModel for Granite Vision 3.2 (#1337 ) * build: Add ollama sdk dependency Branch: OllamaVlmModel Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Add option plumbing for OllamaVlmOptions in pipeline_options Branch: OllamaVlmModel Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Full implementation of OllamaVlmModel Branch: OllamaVlmModel Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Connect "granite_vision_ollama" pipeline option to CLI Branch: OllamaVlmModel Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * Revert "build: Add ollama sdk dependency" After consideration, we're going to use the generic OpenAI API instead of the Ollama-specific API to avoid duplicate work. This reverts commit bc6b366468cdd66b52540aac9c7d8b584ab48ad0. Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Move OpenAI API call logic into utils.utils This will allow reuse of this logic in a generic VLM model NOTE: There is a subtle change here in the ordering of the text prompt and the image in the call to the OpenAI API. When run against Ollama, this ordering makes a big difference. If the prompt comes before the image, the result is terse and not usable whereas the prompt coming after the image works as expected and matches the non-OpenAI chat API. Branch: OllamaVlmModel Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Refactor from Ollama SDK to generic OpenAI API Branch: OllamaVlmModel Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Linting, formatting, and bug fixes The one bug fix was in the timeout arg to openai_image_request. Otherwise, this is all style changes to get MyPy and black passing cleanly. 
Branch: OllamaVlmModel Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * remove model from download enum Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * generalize input args for other API providers Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename and refactor Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add example Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * require flag for remote services Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * disable example from CI Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add examples to docs Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-10 18:03:04 +02:00
Michele Dolfi	fa16b12316	chore: move to docling-project org (#1160 ) * chore: rename org Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * Update docs/faq/index.md Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> * update github pages Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * revert test content Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-03-14 12:35:29 +01:00
Michele Dolfi	0c1e9391de	chore: use gh cache for huggingface models (#1096 ) * use gh cache for huggingface models Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * increase hf timeout Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * more timeout Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use different cache key in each job Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-03 00:13:47 +01:00
Christoph Auer	3c9fe76b70	feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054 ) * Skeleton for SmolDocling model and VLM Pipeline Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * wip smolDocling inference and vlm pipeline Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included. Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Fixes to preserve page image and demo export to html Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Enabled figure support in vlm_pipeline Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Fix for table span compute in vlm_pipeline Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Added tokens/sec measurement, improved example Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Added capability for vlm_pipeline to grab text from preconfigured backend Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Exposed "force_backend_text" as pipeline parameter Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Flipped keep_backend to True for vlm_pipeline assembly to work Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Updated vlm pipeline assembly and smol docling model code to support updated doctags Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Fixing doctags starting tag, that broke elements on first line during assembly Signed-off-by: Maksym 
Lysak <mly@zurich.ibm.com> * Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models. Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Updated example of Smol Docling usage Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Update minimal smoldocling example Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix repo id Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Cleaned up unnecessary logging Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * More elegant solution in removing the input prompt Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * removed minimal_smol_docling example from CI checks Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Removed special html code wrapping when exporting to docling document, cleaned up comments Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Moved keep_backend = True to vlm pipeline Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines) Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Added example on how to get original predicted doctags in minimal_smol_docling Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * removing changes from base_pipeline Signed-off-by: 
Maksym Lysak <mly@zurich.ibm.com> * Replaced remaining strings to appropriate enums Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Updated poetry.lock Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * re-built poetry.lock Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> * Generalize and refactor VLM pipeline and models Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Rename example Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Move imports Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Expose control over using flash_attention_2 Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix VLM example exclusion in CI Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add back device_map and accelerate Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Make drawing code resilient against bad bboxes Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * chore: clean up code and comments Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * chore: more cleanup Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * chore: fix leftover .to(device) Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix: add proper table provenance Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>	2025-02-26 14:43:26 +01:00
Michele Dolfi	4cc6e3ea5e	feat: Describe pictures using vision models (#259 ) * draft for picture description models Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * vlm description using AutoModelForVision2Seq Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add generation options Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update vlm API Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * allow only localhost traffic Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename model Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * do not run with vlm api Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * more renaming Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix examples path Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * apply CLI download login Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * fix name of cli argument Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * use with_smolvlm in models download Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-07 16:30:42 +01:00
Michele Dolfi	ed74fe2ec0	feat: new artifacts path and CLI utility (#876 ) * fix artifacts path Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * add docling-models utility Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * missing formatting Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename utility to docling-tools Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * rename download methods and deprecation warnings Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * propagate artifacts path usage for ocr models Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * move function to utils Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * remove unused file Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * update docs Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * simplify downloading specific model(s) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> * minor refactor Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> --------- Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com> Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-02-06 15:46:32 +01:00
Panos Vagenas	17448163e7	chore: fix docs search (#880 ) Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>	2025-02-04 11:35:34 +01:00
Nikos Livathinos	6d3fea0196	docs: Introduce example with custom models for RapidOCR (#874 ) * docs: Introduce example with custom models for RapidOCR Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> * chore: Exclude the example with custom RapidOCR models from the examples to run in github actions Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com> --------- Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-02-04 10:07:00 +01:00

75 Commits