* feat(table-structure): swap VLM model to granite-vision-4.1-4b
Updates GraniteVisionTableStructureModel to use the 4.1 model. The 4.1
weights are pre-merged, so merge_lora_adapters() is now hasattr-guarded.
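A minimal sketch of the guard, for illustration only. `DummyLora` and `DummyMerged` are hypothetical stand-ins for a loaded model; only the `merge_lora_adapters()` name comes from this change.

```python
class DummyLora:
    """Hypothetical stand-in for a checkpoint that still ships LoRA adapters."""

    def __init__(self) -> None:
        self.merged = False

    def merge_lora_adapters(self) -> None:
        self.merged = True


class DummyMerged:
    """Hypothetical stand-in for a pre-merged checkpoint (no merge method)."""


def maybe_merge(model: object) -> bool:
    """Call merge_lora_adapters() only when the model exposes it."""
    if hasattr(model, "merge_lora_adapters"):
        model.merge_lora_adapters()
        return True
    return False
```

With this shape the same loader code can serve both older adapter-based weights and the pre-merged 4.1 weights.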
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* feat(chart-extraction): swap V4 VLM model to granite-vision-4.1-4b
Updates ChartExtractionModelGraniteVisionV4 to use the 4.1 model.
hasattr-guards the merge_lora_adapters() call since 4.1 weights are
pre-merged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* docs(example): mention granite-vision-4.1-4b in table-structure example
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* docs(catalog): update Granite Vision entry to 4.1-4b
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* feat(chart-extraction): honor cuda_use_flash_attention2 in V4 loader
Mirrors the table-structure loader so ChartExtractionModelGraniteVisionV4
also passes _attn_implementation based on AcceleratorOptions. Without this
the chart model falls back to the transformers SDPA default, which can
hit cuDNN backend failures on some torch/cuDNN stacks while the table
model (which already passed the flag) runs cleanly.
Stores accelerator_options on the base class so subclasses can read it.
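A rough sketch of how the flag could map to an attention implementation. `AcceleratorOptionsSketch` and `pick_attn_implementation` are hypothetical names, and the `fallback` value is an assumption: this message does not state what the loaders pass when the flag is off.

```python
from dataclasses import dataclass


@dataclass
class AcceleratorOptionsSketch:
    """Hypothetical stand-in carrying only the two fields used here."""

    device: str = "cpu"
    cuda_use_flash_attention2: bool = False


def pick_attn_implementation(
    opts: AcceleratorOptionsSketch, fallback: str = "sdpa"
) -> str:
    """Return the attention implementation name to hand to the model loader."""
    # Flash Attention 2 only applies on CUDA devices and only when requested.
    if opts.device.startswith("cuda") and opts.cuda_use_flash_attention2:
        return "flash_attention_2"
    # Assumption: anything else uses the caller-supplied fallback.
    return fallback
```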
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* fix(model-downloader): update Granite Vision log message to 4.1
The log message in download_models still mentioned "Granite Vision 4.0"
after the model swap. Correct it to match the current model version.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
* fix(chart-extraction): fall back to bare CSV when V4 model omits ```csv``` fence
granite-vision-4.1-4b sometimes emits raw CSV without a ```csv``` code fence
for the <chart2csv> prompt, which caused _extract_csv_to_dataframe to raise
ValueError and drop the chart's tabular_chart metadata. Mirror the tolerant
parsing already used by the v3 class: prefer a fenced block, otherwise strip
any stray backtick prefix/suffix and parse the text as-is.
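A minimal sketch of the tolerant parsing, assuming stdlib `csv` in place of the DataFrame conversion done by `_extract_csv_to_dataframe`. The name `extract_csv_rows` and the regular expression are hypothetical.

```python
import csv
import io
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Hypothetical pattern: a fenced block tagged "csv".
_FENCED_CSV = re.compile(FENCE + r"csv\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_csv_rows(text: str) -> list[list[str]]:
    """Prefer a fenced csv block; otherwise parse the text as bare CSV."""
    match = _FENCED_CSV.search(text)
    if match:
        payload = match.group(1)
    else:
        # Strip stray backticks and whitespace left at either end.
        payload = text.strip().strip("`").strip()
    rows = [row for row in csv.reader(io.StringIO(payload)) if row]
    if not rows:
        raise ValueError("no CSV content found")
    return rows
```

Only genuinely empty output still raises, so a missing fence no longer drops the chart data.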
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
---------
Signed-off-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
Co-authored-by: Eli Schwartz <eliyahu.schwartz@ibm.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a new table structure model using IBM Granite Vision to extract table
structure from document images via OTSL token generation.
Changes:
- Add `GraniteVisionTableStructureOptions` with configurable model repo,
device, batch size, and crop padding options
- Implement `GraniteVisionTableStructureModel` that uses a VLM pipeline to
generate OTSL tokens from cropped table images, then parses them into
`TableData` with cells, rows, and columns
- Register the model in `table_structure_engines` alongside existing engines
- Add example script `docs/examples/granite_vision_table_structure.py`
- Add tests covering options, model enable/disable, OTSL parsing (including
self-closing tags xcel/srow/ecel), and invalid-backend error handling
- Update model catalog docs and CI workflow accordingly
Signed-off-by: Eli Schwartz <eli.shw@gmail.com>
* feat(vlm): add nanonets ocr2 onboarding
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* feat(vlm): add vLLM and API runtimes for Nanonets-OCR2
Extend the Nanonets-OCR2 preset with vLLM + remote API paths so all
standard docling runtimes (Transformers, MLX, vLLM, API, LM Studio,
OpenAI-compatible) work out of the box. Drop the restricted
supported_engines set to match the GLM-OCR / LightOnOCR / Falcon-OCR
pattern, add top-level torch_dtype on the Transformers override, and
register NANONETS_OCR2_VLLM / NANONETS_OCR2_VLLM_API /
NANONETS_OCR2_LMSTUDIO_API legacy specs plus VlmModelType enum entries.
Folds in the remote-API scope that was on the superseded PR #3275.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
---------
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Switch to the latest version of DocumentFigureClassifier model v2.5
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* CI trigger
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
---------
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* build(xbrl): add Arelle as open-source library for XBRL
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* feat(xbrl): design and implement a backend parser for XBRL documents
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* test: remove print statements to reduce verbosity
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* style(XBRL): apply PEP8 naming convention for acronyms
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor(XBRL): set XBRL dependencies as optional
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* feat: Inference engines abstraction for image classification model family with HF Transformers and ONNX runtime
Implements runtime abstraction for image classification models with support for both ONNX Runtime and HuggingFace Transformers engines. Users can switch between engines without model retraining, similar to the object detection abstraction (#2959).
Key components:
- BaseImageClassificationEngine with factory pattern
- OnnxRuntimeImageClassificationEngine and TransformersImageClassificationEngine implementations
- Shared HfVisionModelMixin for common HF model utilities
- Engine-specific configuration options
- Test suite and example demonstrating runtime engine switching
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add missing files and re-export for backward compat
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Don't run with OCR in the example.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove excess onnxruntime-related options for inputs and outputs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* feat: centralize torch compile defaults with DOCLING_INFERENCE_COMPILE_TORCH_MODELS
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* feat: Add Kserve2 API engine for image classifier and object detection models (#2999)
* fix: add failed pages to DoclingDocument for page break consistency (#2939)
* fix: add failed pages to DoclingDocument for page break consistency
When some PDF pages fail to parse, they were not added to
DoclingDocument.pages, causing page break markers to be incorrect
during export. This adds failed/skipped pages with their size info
(if available) to maintain correct page numbering and structure.
- Add _add_failed_pages_to_document() method in StandardPdfPipeline
- Add test cases for failed page handling
- Add test cases for normal page handling (regression test)
- Add test PDF files
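The idea can be sketched as follows. `PageEntry` and `add_failed_pages` are hypothetical and do not reflect the actual `_add_failed_pages_to_document()` signature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageEntry:
    """Hypothetical page record; size is None when it could not be read."""

    page_no: int
    size: Optional[tuple[float, float]]
    failed: bool = False


def add_failed_pages(
    pages: dict[int, PageEntry],
    page_range: range,
    known_sizes: dict[int, tuple[float, float]],
) -> dict[int, PageEntry]:
    """Insert a placeholder for every page in the range that did not parse."""
    for page_no in page_range:
        if page_no not in pages:
            pages[page_no] = PageEntry(page_no, known_sizes.get(page_no), failed=True)
    # Return the pages in page order so page breaks line up on export.
    return dict(sorted(pages.items()))
```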
Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
* fix: ensure resource cleanup and simplify type hints
- Wrap page_backend usage in try-finally to guarantee unload (prevents resource leaks).
- Simplify redundant 'float | None | None' type hint.
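A small sketch of the try-finally pattern. `FakePageBackend` and `read_page_size` are hypothetical stand-ins, not the pipeline's real classes.

```python
from typing import Optional


class FakePageBackend:
    """Hypothetical stand-in for a page backend that must be unloaded."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.unloaded = False

    def get_size(self) -> tuple[float, float]:
        if self.fail:
            raise RuntimeError("page could not be parsed")
        return (612.0, 792.0)

    def unload(self) -> None:
        self.unloaded = True


def read_page_size(backend: FakePageBackend) -> Optional[tuple[float, float]]:
    """Read the page size, always unloading the backend afterwards."""
    try:
        return backend.get_size()
    except RuntimeError:
        return None
    finally:
        # Runs on success and on failure, so the backend is never leaked.
        backend.unload()
```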
Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
* fix: add groundtruth for normal_4pages.pdf and exclude failing PDFs from e2e test
Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
* fix: ensure correct status assertion for failed pages in tests
Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
---------
Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
* fix: Use timezone-aware datetime (#2947)
* Use timezone-aware datetime for profiling timestamps
Updated timestamp recording to use timezone-aware datetime.
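For reference, a timezone-aware timestamp can be taken like this. Using UTC is an assumption here; the message only says "timezone-aware".

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time with tzinfo attached (UTC assumed)."""
    return datetime.now(timezone.utc)
```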
Signed-off-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
* run formatter
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* fix(asciidoc): handle commas in image alt text (#2983)
* Fix: Handle commas in AsciiDoc image alt text
- Modified _parse_picture() to gracefully handle alt text containing commas
- Commas in alt text are now preserved instead of causing ValueError
- Added test case with realistic auto-generated alt text
- split('=', 1) prevents issues when values contain '=' characters
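One way to sketch the approach, not the actual `_parse_picture()` code: tokens without `=` are rejoined as alt text, and `key=value` tokens are split only once. The function name and the `alt` key are hypothetical.

```python
def parse_image_attrs(attr_text: str) -> dict[str, str]:
    """Parse an AsciiDoc image attribute list, keeping commas in alt text."""
    attrs: dict[str, str] = {}
    alt_parts: list[str] = []
    for part in attr_text.split(","):
        token = part.strip()
        if "=" in token:
            # Split once so values such as URLs may contain '=' themselves.
            key, value = token.split("=", 1)
            attrs[key.strip()] = value.strip()
        elif token:
            alt_parts.append(token)
    if alt_parts:
        attrs["alt"] = ", ".join(alt_parts)
    return attrs
```

A known limit of this sketch: alt text that itself contains `=` would be read as a key/value pair.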
* DCO Remediation Commit for n0rdp0l <n90.w135@gmail.com>
I, n0rdp0l <n90.w135@gmail.com>, hereby add my Signed-off-by to this commit: ee752491fc
Signed-off-by: n0rdp0l <n90.w135@gmail.com>
* style: fix ruff formatting in test_backend_asciidoc.py
Signed-off-by: n0rdp0l <n90.w135@gmail.com>
---------
Signed-off-by: n0rdp0l <n90.w135@gmail.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* chore: bump version to 2.73.1 [skip ci]
* First attempt at establishing API Kserve2 facet
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* refactor: improve KServe v2 engine implementation after code review
- Add comprehensive error handling to KserveV2HttpClient
- Catch and wrap Timeout, ConnectionError, HTTPError with context
- Validate response formats with clear error messages
- Refactor URL building to eliminate duplication
- Extract _build_model_url() helper method
- Single source of truth for infer_url and model_metadata_url
- Make URL required parameter (remove default localhost:8000)
- Update ApiKserveV2*EngineOptions to require explicit URL
- Add preset validation with helpful error messages
- Rename constants for clarity: TRITON_* → KSERVE_V2_*
- Add comment explaining KServe v2 uses Triton type system
- Improve error messages with actual values
- Show counts, shapes, and supported types in validation errors
- Document official KServe Python SDK alternative
- Note async-only requirement and alpha status
- Update tests for required URL parameter
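The URL refactoring can be illustrated with the sketch below. The path layout follows the KServe v2 REST protocol; the function names and parameters are hypothetical and do not mirror `KserveV2HttpClient`.

```python
from typing import Optional


def build_model_url(base_url: str, model_name: str, version: Optional[str] = None) -> str:
    """Build the model resource URL; this is also the metadata endpoint."""
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}"
    if version:
        url += f"/versions/{version}"
    return url


def build_infer_url(base_url: str, model_name: str, version: Optional[str] = None) -> str:
    """Derive the inference endpoint from the single model URL helper."""
    return build_model_url(base_url, model_name, version) + "/infer"
```

Because `base_url` has no default, callers must pass the server address explicitly, matching the "URL required" change above.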
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup in kserve http helper and options
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Further cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix for remote-services on tablemodel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: improved deserialization of engine_options (#3008)
* add registry of discriminated subclasses
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix detection of engine_type value
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Add options serialization improvements
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
Signed-off-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: n0rdp0l <n90.w135@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: jhchoi1182 <jhchoi1182@gmail.com>
Co-authored-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Felix Wente <63914035+n0rdp0l@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
* Fixes from review
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* DCO Remediation Commit for Christoph Auer <cau@zurich.ibm.com>
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: 4cdb01e6d3
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* DCO Remediation Commit for Christoph Auer <60343111+cau-git@users.noreply.github.com>
I, Christoph Auer <60343111+cau-git@users.noreply.github.com>, hereby add my Signed-off-by to this commit: e293ba3270
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add fallback for API variants
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Recreate uv.lock
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: jhchoi1182 <jhchoi1182@gmail.com>
Signed-off-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: n0rdp0l <n90.w135@gmail.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: jhchoi1182 <jhchoi1182@gmail.com>
Co-authored-by: Nikhil Singh <124866156+Ritinikhil@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Felix Wente <63914035+n0rdp0l@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
* feat: added support for parsing LaTeX (.tex) documents
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* feat: implement PR #2890 feedback for LaTeX backend
- Add text formatting options (bold, italic, underline) for LaTeX macros
- Enhance image embedding with PIL and ImageRef.from_pil()
- Refactor list processing to use GroupItem structure
- Refactor bibliography to use GroupItem structure
- Add nested list test coverage
- All tests passing (39/39), all linters passing
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* DCO Remediation Commit for Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
I, Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>, hereby add my Signed-off-by to this commit: f19f135b431d489cd8bf3982524505a0bbd8696d
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* feat: enhance latex backend with robustness fixes and ground truth
- Add custom macro expansion for improved text quality
- Fix preamble filtering to remove metadata garbage
- Support recursive \input{} and \include{} file loading
- Organize test data into subdirectories for complex papers
- Add full end-to-end ground truth for 4 major arXiv papers (Attention, Mistral, DeepSeek, OTSL)
- Pass all 41 unit tests and pre-commit checks
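A sketch of recursive `\input{}` / `\include{}` expansion, assuming an in-memory `load` callback in place of file access. `expand_includes` and its cycle guard are hypothetical, not the backend's implementation.

```python
import re
from typing import Callable, Optional

_INCLUDE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def expand_includes(
    text: str,
    load: Callable[[str], Optional[str]],
    _seen: Optional[frozenset] = None,
) -> str:
    """Inline included files recursively; leave unresolved includes as-is."""
    seen = frozenset() if _seen is None else _seen

    def replace(match: re.Match) -> str:
        name = match.group(1).strip()
        if name in seen:
            return ""  # break include cycles
        content = load(name)
        if content is None:
            return match.group(0)  # file not found: keep the macro
        return expand_includes(content, load, seen | {name})

    # A function replacement keeps backslashes in the content literal.
    return _INCLUDE.sub(replace, text)
```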
Addresses @cau-git feedback for ground-truth data.
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* fix: minor formatting in test file
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* feat: enhance LaTeX backend with robust math and figure support
- Fixed re.error: bad escape in macro expansion by using lambda in re.sub
- Fixed sentences breaking at inline math ($) by preserving it within paragraphs
- Improved figure environment with proper grouping and structured representation
- Fixed crashes on documents starting with % comments
- Added comprehensive unit tests and updated all ground truth data
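The `re.sub` fix can be shown in a few lines. A replacement string such as `\mathbb{R}` is scanned for escapes and raises `re.error` (bad escape `\m`), whereas a function replacement is inserted literally. `expand_macro` is a hypothetical helper for illustration.

```python
import re


def expand_macro(text: str, name: str, replacement: str) -> str:
    r"""Replace each \name occurrence with the replacement, taken literally."""
    # Negative lookahead avoids matching a longer macro such as \Rightarrow.
    pattern = re.compile(r"\\" + re.escape(name) + r"(?![A-Za-z])")
    # The lambda bypasses escape processing of the replacement text.
    return pattern.sub(lambda _match: replacement, text)
```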
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* WIP: saving work for laptop migration
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* fix most of the line-breaking issues; some remain
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* fix: generalized LaTeX macro parsing and robustness improvements
This commit addresses several issues with LaTeX parsing:
- Correctly handle unknown macros (like \ion{N}{2}) inline to avoid line breaks.
- Fix extraction of structural macros (section, caption, etc.) vs text-only groups.
- Address PR feedback regarding inline math spacing and splitting.
- Regenerate ground truth files reflecting these improvements.
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* style: apply automatic formatting fixes
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* style: fix ruff linter and formatter errors
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* fix: typing issues identified by mypy
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* style: apply formatting fixes to tests
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* fix: update groundtruth files for latex backend
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* fix the awkward line-breaking issue caused by not accounting for the text buffer
* add the groundtruth files missing from the previous commit
* DCO Remediation Commit for Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
I, Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>, hereby add my Signed-off-by to this commit: 7e032635ef
I, Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>, hereby add my Signed-off-by to this commit: aeba688384
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* Ran the precommit as requested
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
---------
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
* add results for standard + OCR and more Windows timings
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix runtime selection for py 3.14 in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* docs(opensearch): update the example notebook RAG with OpenSearch
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs(uspto): remove direct usage of the backend class for conversion
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: remove direct usage of backends from documentation
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* feat: add a backend parser for WebVTT files
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: update README with VTT support
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: add description to supported formats
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore: upgrade docling-core to unescape WebVTT in markdown
Pin the new release of docling-core 2.48.2.
Do not escape HTML reserved characters when exporting WebVTT documents to markdown.
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* test: add missing copyright notice
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* Added option to docling-tools to download an arbitrary HuggingFace model
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Added note in documentation
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Removed note on custom artifact path usage from HF download option
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Fixed typo
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
---------
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* feat: adding new vlm-models support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* got microsoft/Phi-4-multimodal-instruct to work
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* working on VLMs
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the VLM part
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, now serious refactoring necessary
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the download_model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the formulate_prompt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* pixtral 12b runs via MLX and native transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the VlmPredictionToken
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring minimal_vlm_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the MyPy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added pipeline_model_specializations file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to get Phi4 working again ...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalising last points for vlms support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the pipeline for Phi4
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* streamlining all code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixing the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the html backend to the VLM pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the static load_from_doctags
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restore stable imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use AutoModelForVision2Seq for Pixtral and review example (including rename)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove unused value
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* refactor instances of VLM models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip compare example in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use lowercase and uppercase only
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename pipeline_vlm_model_spec
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move more arguments to options and simplify model init
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add supported_devices
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove not-needed function
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* exclude minimal_vlm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* missing file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add message for transformers version
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename to specs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use module import and remove MLX from non-darwin
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove hf_vlm_model and add extra_generation_args
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use single HF VLM model class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove torch type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add docs for vision models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>