* docs: add agent skill bundle with convert/evaluate helpers
- Add docs/examples/agent_skill/docling-document-intelligence/ with
SKILL.md, pipelines.md, EXAMPLE.md, improvement-log template, and
scripts/docling-convert.py + docling-evaluate.py (standard/vlm-local/vlm-api).
- Document InputFormat.PDF + PdfFormatOption for explicit PdfPipelineOptions.
- Link from examples index and mkdocs nav.
Made-with: Cursor
* docs: align agent skill README and EXAMPLE with Cursor bundle
- Document both ~/.cursor/skills and docs/examples paths.
- README notes repo parity for PRs and local installs.
Made-with: Cursor
* DCO Remediation Commit for jehlum11 <jehlum11@gmail.com>
I, jehlum11 <jehlum11@gmail.com>, hereby add my Signed-off-by to this commit: 2d268ffb6f
I, jehlum11 <jehlum11@gmail.com>, hereby add my Signed-off-by to this commit: 041e709c66
Signed-off-by: jehlum11 <jehlum11@gmail.com>
Made-with: Cursor
* docs: refactor agent skill to use docling CLI for conversion
Address maintainer feedback: the custom docling-convert.py script was
largely redundant with the existing docling CLI. This commit:
- Removes scripts/docling-convert.py (redundant with `docling` CLI)
- Refactors SKILL.md (v1.4 → v2.0) to use `docling` CLI for all
conversion tasks, reserving the Python API only for features the
CLI does not expose (chunking, VLM API endpoint config,
force_backend_text hybrid mode)
- Updates docling-evaluate.py recommended_actions to reference
`docling` CLI flags instead of the removed script
- Updates README.md, EXAMPLE.md, pipelines.md to use `docling` CLI
examples throughout
- Simplifies requirements.txt (removes packaging dependency)
The only custom script retained is docling-evaluate.py, which provides
heuristic quality evaluation — functionality the CLI does not cover.
Signed-off-by: jehlum11 <jehlum11@gmail.com>
Made-with: Cursor
* docs: fix ruff format on docling-evaluate.py
Signed-off-by: jehlum11 <jehlum11@gmail.com>
Made-with: Cursor
---------
Signed-off-by: jehlum11 <jehlum11@gmail.com>
* docs: add docstrings for DocumentConverter
Signed-off-by: Julia Pap <papjuli@gmail.com>
* Apply suggestions from code review
Improve docstrings in DocumentConverter
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Julia Pap <papjuli@gmail.com>
* docs: improve docstring formatting and wording in DocumentConverter
* docs: show init method in document converter reference
* docs: change back indents to 4x in DocumentConverter docstrings
griffe was issuing warnings of confusing indentation
* docs: clarify `max_num_pages` and `page_range` args in `DocumentConverter` methods
* docs: fix some Yields and Returns in DocumentConverter docstrings
* DCO Remediation Commit for Julia Pap <papjuli@gmail.com>
I, Julia Pap <papjuli@gmail.com>, hereby add my Signed-off-by to this commit: cf2ea4e0f0
I, Julia Pap <papjuli@gmail.com>, hereby add my Signed-off-by to this commit: 57446af168
I, Julia Pap <papjuli@gmail.com>, hereby add my Signed-off-by to this commit: 5d613edb8c
I, Julia Pap <papjuli@gmail.com>, hereby add my Signed-off-by to this commit: b195281f56
I, Julia Pap <papjuli@gmail.com>, hereby add my Signed-off-by to this commit: 5d4a3af5d5
Signed-off-by: Julia Pap <papjuli@gmail.com>
* docs: ignore init description, rephrased docstrings
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Julia Pap <papjuli@gmail.com>
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* add example processing parquet file of images
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* vlm using vllm api
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use openvino and add more docs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add default input file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* change default to standard for running in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use simple rapidocr without openvino in the CI example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* docs: add an example of RAG with OpeanSearch
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore: pin latest docling-core and update uv.lock
Pin latest version release of docling-core in pyproject.toml
Update the dependencies in uv.lock file
Run the notebook rag_opensearch.ipynb to pick up changes from docling-core
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* Notebook showing example on how to use docling transforms in DPK
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* fix HF Token name
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* use %pip instead of pip install jupyter lab
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* run formatter
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* add example to mkdocs and fix typo
Signed-off-by: Maroun Touma <touma@us.ibm.com>
---------
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* updated the README
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added minimal_asr_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Updated README and added ASR example
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Updated docs.index.md
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated CI and mkdocs
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added link tp existing audio file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added link tp existing audio file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatting
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* feat: adding new vlm-models support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* got microsoft/Phi-4-multimodal-instruct to work
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* working on vlm's
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the VLM part
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, now serious refacgtoring necessary
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the download_model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the formulate_prompt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* pixtral 12b runs via MLX and native transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the VlmPredictionToken
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring minimal_vlm_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the MyPy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added pipeline_model_specializations file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to get Phi4 working again ...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalising last points for vlms support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the pipeline for Phi4
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* streamlining all code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixing the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the html backend to the VLM pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the static load_from_doctags
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restore stable imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use AutoModelForVision2Seq for Pixtral and review example (including rename)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove unused value
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* refactor instances of VLM models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip compare example in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use lowercase and uppercase only
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename pipeline_vlm_model_spec
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move more argument to options and simplify model init
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add supported_devices
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove not-needed function
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* exclude minimal_vlm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* missing file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add message for transformers version
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename to specs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use module import and remove MLX from non-darwin
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove hf_vlm_model and add extra_generation_args
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use single HF VLM model class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove torch type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add docs for vision models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* build: Add ollama sdk dependency
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add option plumbing for OllamaVlmOptions in pipeline_options
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Full implementation of OllamaVlmModel
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Connect "granite_vision_ollama" pipeline option to CLI
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* Revert "build: Add ollama sdk dependency"
After consideration, we're going to use the generic OpenAI API instead
of the Ollama-specific API to avoid duplicate work.
This reverts commit bc6b366468cdd66b52540aac9c7d8b584ab48ad0.
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Move OpenAI API call logic into utils.utils
This will allow reuse of this logic in a generic VLM model
NOTE: There is a subtle change here in the ordering of the text prompt and
the image in the call to the OpenAI API. When run against Ollama, this
ordering makes a big difference. If the prompt comes before the image, the
result is terse and not usable whereas the prompt coming after the image
works as expected and matches the non-OpenAI chat API.
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Refactor from Ollama SDK to generic OpenAI API
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Linting, formatting, and bug fixes
The one bug fix was in the timeout arg to openai_image_request. Otherwise,
this is all style changes to get MyPy and black passing cleanly.
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* remove model from download enum
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* generalize input args for other API providers
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename and refactor
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* require flag for remote services
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* disable example from CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add examples to docs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>