62 Commits

Author SHA1 Message Date
github-actions[bot] e5273a9bff chore: bump version to 1.18.1 [skip ci] 2026-05-07 11:51:30 +00:00
Michele Dolfi cb14b21246 chore: update lock (#129)
update lock

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-04-24 15:20:08 +02:00
github-actions[bot] bedb47a8bd chore: bump version to 1.18.0 [skip ci] 2026-04-24 11:51:48 +00:00
github-actions[bot] fa70bf3cbb chore: bump version to 1.17.1 [skip ci] 2026-04-17 12:51:17 +00:00
Christoph Auer f147cc0941 fix: Harden Ray dispatcher initialization, recovery, and execution leases (#122)
* Move client SDK to docling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup and test shims

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update to released docling version

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: harden ray dispatcher durability and recovery

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove dispatcher_handoff_timeout

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* More cleanup, recover comments

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Bound Ray dispatcher recovery RPCs for liveness

Add dispatcher_rpc_timeout and liveness_fail_after to RayOrchestratorConfig.
Bound both dispatcher health checks and runtime refresh RPCs with
asyncio.wait_for so head-loss cannot wedge the supervisor on an
unbounded await. Track continuous dispatcher unhealthiness and expose
is_liveness_healthy() for bounded liveness decisions.

Also extend Ray hardening tests to cover both get_health and
refresh_runtime timeout paths.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Make Ray runtime initialization lazy in RayOrchestrator

RayOrchestrator.__init__() previously called ray.init(), serve.start(),
and deploy_processor() synchronously, coupling API pod startup to Ray
head availability. This caused crash-loop restarts when the Ray head
was unavailable and forced compensatory workarounds in docling-serve
(/ready shallow bypass, /livez liveness logic).

Move all Ray init calls into a new _initialize_ray_runtime() async
method invoked from process_queue(), so construction is Ray-free and
the pod can start serving requests before a Ray session is established.
Use asyncio.to_thread for the blocking Ray calls. Wrap the method body
in BaseException (re-raising CancelledError) so any failure, including
SystemExit(15) from Ray internals, raises DispatcherUnavailableError
rather than escaping as an unhandled exception.

Apply the same BaseException / CancelledError pattern to
_refresh_dispatcher_runtime() and ensure_dispatcher_ready(), which
previously used except Exception and therefore missed SystemExit(15)
from Ray, returning HTTP 500 instead of the intended HTTP 503.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix(ray): keep processing key cleanup in complete_task_atomic only

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* ray: replace task heartbeat WATCH loop with Lua and derive stale cutoff from heartbeat interval

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat(ray): add replica-owned execution lease helpers to RedisStateManager

Adds write_task_execution_lease(), update_task_execution_heartbeat(), and
get_task_execution_lease() to RedisStateManager. Extends finalize_task_*_atomic()
and complete_task_atomic() to delete task:{id}:execution at terminalization.
These methods are used by the next task (replica heartbeat in serve_deployment.py).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ray): execution lease cleanup pass 2

- Delete update_task_processing_heartbeat() and its Lua script constant;
  only caller (_maintain_processing_heartbeat) was removed in D2
- Remove heartbeat_at from mark_task_processing() mapping; the dispatch
  key is now a pure admission record, not a heartbeat carrier
- Remove mark_task_processing() call from serve_deployment.py; the
  execution lease is the authoritative "execution has begun" signal
- Rename Redis key task:{id}:processing → task:{id}:dispatch and method
  get_task_processing_state() → get_task_dispatch_state_hash() to reflect
  that this is a dispatcher-written dispatch record, not a replica state
- Wrap _process_convert and _process_chunk with asyncio.to_thread so the
  replica event loop stays free during conversion, allowing the execution
  lease heartbeat to fire throughout long-running tasks (D1b)

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix(ray): stringify serve replica id before writing execution lease

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* refactor(ray): remove dead dispatch-state cleanup code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Move Ray runtime init under continuous supervision

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* DCO Remediation Commit for Christoph Auer <cau@zurich.ibm.com>

I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: bd82fa1b7d
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: 9c8e1a5a86

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Skip integration tests in CI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Small cleanups

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Harden test_local_orchestrator

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:01:24 +02:00
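
The "Bound Ray dispatcher recovery RPCs for liveness" item in #122 describes wrapping dispatcher RPCs in asyncio.wait_for and tracking continuous unhealthiness. A minimal sketch of that idea follows. It is not the code from the commit: the LivenessTracker class and its check() method are hypothetical, and only the names dispatcher_rpc_timeout, liveness_fail_after, is_liveness_healthy and DispatcherUnavailableError come from the commit message.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class DispatcherUnavailableError(RuntimeError):
    """Stand-in for the error type named in the commit message."""


class LivenessTracker:
    """Hypothetical helper, not the actual RayOrchestrator implementation."""

    def __init__(self, dispatcher_rpc_timeout: float, liveness_fail_after: float) -> None:
        self.dispatcher_rpc_timeout = dispatcher_rpc_timeout
        self.liveness_fail_after = liveness_fail_after
        self._unhealthy_since: Optional[float] = None

    async def check(self, rpc: Callable[[], Awaitable[object]]) -> bool:
        """Run one RPC with a bounded wait and record the outcome."""
        try:
            await asyncio.wait_for(rpc(), timeout=self.dispatcher_rpc_timeout)
        except asyncio.CancelledError:
            raise  # cancellation must always propagate
        except BaseException:
            # BaseException rather than Exception, as in the commit message,
            # so failures outside the Exception hierarchy are counted too.
            if self._unhealthy_since is None:
                self._unhealthy_since = time.monotonic()
            return False
        self._unhealthy_since = None
        return True

    def is_liveness_healthy(self) -> bool:
        """True until the dispatcher has been unhealthy for too long."""
        if self._unhealthy_since is None:
            return True
        elapsed = time.monotonic() - self._unhealthy_since
        return elapsed < self.liveness_fail_after

    def require_healthy(self) -> None:
        """Raise the error a caller could translate into HTTP 503."""
        if self._unhealthy_since is not None:
            raise DispatcherUnavailableError("dispatcher is not responding")
```

A supervisor loop could call check() on each tick and let a liveness probe read is_liveness_healthy(), so that a single slow RPC does not restart the pod while a sustained outage does.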
github-actions[bot] bf7fd1fbca chore: bump version to 1.17.0 [skip ci] 2026-04-14 08:43:19 +00:00
Christoph Auer 2398dba341 feat: Move types required by client SDK to docling (#121)
* Move client SDK to docling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup and test shims

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update to released docling version

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2026-04-14 10:24:16 +02:00
github-actions[bot] 8f32aaef48 chore: bump version to 1.16.0 [skip ci] 2026-04-07 19:52:30 +00:00
github-actions[bot] d5b096a485 chore: bump version to 1.15.0 [skip ci] 2026-04-01 08:49:02 +00:00
Michele Dolfi 65eca34d16 feat: add preset and custom_config for layout, picture classifier and ocr (#114)
* add preset and custom_config for layout, picture classifier and ocr

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add allowed kinds and don't filter default ocr from presets

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix: ocr_custom_config ignored when ocr_preset defaults to "auto"

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add tests which evaluate proper translation of custom ConvertDocumentsOptions

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* apply linter and formatter

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update deps

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix assert

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2026-03-31 19:16:16 +02:00
Michele Dolfi bc27836811 feat: Control maximum concurrent redis requests to avoid pool exhaustion (#112)
* feat: Control maximum concurrent redis requests to avoid pool exhaustion (#6)

* add initial ray_fair orchestrator

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* implementation with ray serve

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix serialization

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more serialization fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cannot msgpack the DocumentStream

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* hardening notifier

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cleanup raydata param and add log level

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cleanup params and implement object store memory

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add mtls

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more logging

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more logging

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* launch all tasks

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename params

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix creation of redis pools

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix: Watchdog: update the RQ job status to FAILED and remove it from StartedJobRegistry (#107)

* fix: Watchdog: update the RQ job status to FAILED and remove it from StartedJobRegistry

Signed-off-by: Pawel Rein <pawel.rein@prezi.com>

* fix formatter/linter

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Pawel Rein <pawel.rein@prezi.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>

* add metadata for orchestrator

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add dispatch state

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename workers to actors

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename fair_ray to ray

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more rename

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix dispatch vs running

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add redis manager to the actors

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix running metrics

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix setting running

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* actor cleanup

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* feat: Expose classification filters for picture description (#105)

Preserve legacy picture description filters

Signed-off-by: drk <drukpa1455@gmail.com>

* feat: add on_result_fetched() no-op lifecycle hook to BaseOrchestrator

* feat: add consumed_ttl and on_result_fetched() to RQOrchestrator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add consumed_ttl and on_result_fetched() to LocalOrchestrator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add expire_result() to RedisStateManager

This method sets a TTL on an existing result key in Redis, enabling
crash-safe single-use deletion of results after they are fetched.
Implements test-driven development with unit test verification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add consumed_ttl and on_result_fetched() to RayOrchestrator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add connect() guard to expire_result matching peer methods

All 20+ other methods in RedisStateManager check `if not self.redis`
before using the client. expire_result was missing this guard and would
raise RuntimeError if called before connection establishment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ensure no asyncio.task can be GCed early

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* apply re-formatting

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* add ray actor logging to jobkit

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: run RQ Job.fetch/get_status/get_position in thread pool to avoid blocking the event loop

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Ensure control over max ongoing requests per ray replica

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* refactor: rename consumed_ttl back to result_removal_delay

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* upgrade uv.lock

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* move Redis gating and RQ durable status into jobkit

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Pawel Rein <pawel.rein@prezi.com>
Signed-off-by: drk <drukpa1455@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Paweł Rein <pawel.rein@prezi.com>
Co-authored-by: drk <136856552+drukpa1455@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix on python 3.14

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Pawel Rein <pawel.rein@prezi.com>
Signed-off-by: drk <drukpa1455@gmail.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <CAU@zurich.ibm.com>
Co-authored-by: Paweł Rein <pawel.rein@prezi.com>
Co-authored-by: drk <136856552+drukpa1455@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 16:05:21 +01:00
Michele Dolfi bd2decd360 feat: Add Ray orchestrator with fair scheduling (#110)
* feat: ray orchestrator with fair workload dispatching (#2)

* add initial ray_fair orchestrator

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* implementation with ray serve

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix serialization

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more serialization fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cannot msgpack the DocumentStream

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* hardening notifier

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cleanup raydata param and add log level

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cleanup params and implement object store memory

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add mtls

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more logging

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more logging

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* launch all tasks

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename params

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix creation of redis pools

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix: Watchdog: update the RQ job status to FAILED and remove it from StartedJobRegistry (#107)

* fix: Watchdog: update the RQ job status to FAILED and remove it from StartedJobRegistry

Signed-off-by: Pawel Rein <pawel.rein@prezi.com>

* fix formatter/linter

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Pawel Rein <pawel.rein@prezi.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>

* add metadata for orchestrator

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add dispatch state

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename workers to actors

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename fair_ray to ray

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more rename

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix dispatch vs running

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add redis manager to the actors

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix running metrics

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix setting running

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* actor cleanup

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* feat: Expose classification filters for picture description (#105)

Preserve legacy picture description filters

Signed-off-by: drk <drukpa1455@gmail.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Pawel Rein <pawel.rein@prezi.com>
Signed-off-by: drk <drukpa1455@gmail.com>
Co-authored-by: Paweł Rein <pawel.rein@prezi.com>
Co-authored-by: drk <136856552+drukpa1455@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>

* no ray in 3.14

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move skip before imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* don't package venv for testing

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* temp disable test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Pawel Rein <pawel.rein@prezi.com>
Signed-off-by: drk <drukpa1455@gmail.com>
Co-authored-by: Paweł Rein <pawel.rein@prezi.com>
Co-authored-by: drk <136856552+drukpa1455@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2026-03-24 11:07:29 +01:00
github-actions[bot] 064464b582 chore: bump version to 1.14.0 [skip ci] 2026-03-23 15:28:25 +00:00
github-actions[bot] bc157da058 chore: bump version to 1.13.0 [skip ci] 2026-03-03 14:11:17 +00:00
Michele Dolfi d0d9c737aa feat: Callback system for document conversion progress tracking (#103)
* add callbacks for live task updates

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing test file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add dev deps

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-03-03 14:30:46 +01:00
github-actions[bot] 3120bc76ce chore: bump version to 1.12.1 [skip ci] 2026-02-24 21:33:33 +00:00
github-actions[bot] a4edcc3231 chore: bump version to 1.12.0 [skip ci] 2026-02-24 10:30:23 +00:00
github-actions[bot] bc426e0e64 chore: bump version to 1.11.0 [skip ci] 2026-02-18 12:19:10 +00:00
Michele Dolfi 2835f8cc05 feat: expose layout and table kind options (#95)
* add layout and table kind options

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* run object detection only on py < 3.14

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use PR of docling and fix custom config for object detection test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* lock docling release

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-02-18 11:40:51 +01:00
Michele Dolfi 091767ff9d feat: Add preset-based configuration system for VLM pipelines with admin controls (#92)
* update lock

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use new presets for vlm models in various stages

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* simplify redundant getter and validator

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* improve typing

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use new presets in the tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* enable custom config

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix usage of custom config

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* test deprecation on creation

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add validation on assignment

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix old test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add deprecation in the docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-02-16 11:36:40 +01:00
github-actions[bot] 7cb252f813 chore: bump version to 1.10.2 [skip ci] 2026-02-13 11:31:32 +00:00
github-actions[bot] ddc38594bc chore: bump version to 1.10.1 [skip ci] 2026-02-06 11:25:57 +00:00
github-actions[bot] 0e8a53b0f3 chore: bump version to 1.10.0 [skip ci] 2026-02-05 12:47:40 +00:00
Michele Dolfi 605d6ee4e9 feat: add charts extraction (#87)
add option for charts extraction

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-02-05 13:37:33 +01:00
github-actions[bot] e33ad00886 chore: bump version to 1.9.1 [skip ci] 2026-01-30 15:42:37 +00:00
Michele Dolfi 097b581fd1 chore: lock updated deps in CI (#84)
lock updated deps in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-01-28 14:55:32 +01:00
github-actions[bot] 43a4c831ca chore: bump version to 1.9.0 [skip ci] 2026-01-28 13:05:55 +00:00
Michele Dolfi 5e257484a2 feat: Python 3.14 support (#71)
* python 3.14 support

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* no ray on python 3.14

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix 3.14 deps

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* block import of ray module for python 3.14

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* make ray checks optional

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-01-19 12:19:08 +01:00
Michele Dolfi f109e5362b feat: Batching Support for Source Processors (#79)
* batch source processors

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* type and import fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cleanup

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-01-15 12:54:13 +01:00
github-actions[bot] 7fbae327fe chore: bump version to 1.8.1 [skip ci] 2026-01-06 10:56:17 +00:00
Michele Dolfi 81828442a4 chore: update lock (#72)
update lock

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-11-21 09:54:56 +01:00
github-actions[bot] 281fda1187 chore: bump version to 1.8.0 [skip ci] 2025-10-31 16:06:48 +00:00
Michele Dolfi b72f7f8579 feat: Expose new standard pipeline with threads and its parameters (#70)
use features of the new standard pipeline with threads

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-31 16:18:52 +01:00
github-actions[bot] 358704b5c6 chore: bump version to 1.7.1 [skip ci] 2025-10-30 12:57:43 +00:00
github-actions[bot] e0bc4eadb8 chore: bump version to 1.7.0 [skip ci] 2025-10-21 09:43:02 +00:00
Michele Dolfi 6d7a4d23a5 feat: use new docling auto-ocr (#64)
use new docling auto-ocr

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-13 15:58:24 +02:00
github-actions[bot] a7cb321dde chore: bump version to 1.6.0 [skip ci] 2025-10-03 09:34:18 +00:00
Lucas Morin 08cf0768e9 feat: create connectors to import/export documents from/to Google Drive (#62)
feat: create processors to import/export documents from/to Google Drive

Signed-off-by: Lucas Morin <lucas.morin222@gmail.com>
2025-10-01 17:41:56 +02:00
Michele Dolfi 291b757f50 feat(docling): update docling version with support for GraniteDocling (#63)
update docling version with support for granitedocling

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-09-29 09:43:16 +02:00
Michele Dolfi 69e5085655 refactor: source and target processors to allow more connectors (#60)
* refactor source and target processors to allow more connectors

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing files

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove unused public interfaces

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-09-26 11:05:54 +02:00
github-actions[bot] 73ef4cef27 chore: bump version to 1.5.0 [skip ci] 2025-09-08 15:41:36 +00:00
Michele Dolfi 3b9b11cf9f feat: add chunking task (#54)
* feat: implement document chunking functionality for RAG workflows

Signed-off-by: Magnus Samuelsen <97634880+MagnusS0@users.noreply.github.com>

* feat: improved cache key generation and separate md table module

Also improves documentation

Signed-off-by: Magnus Samuelsen <97634880+MagnusS0@users.noreply.github.com>

* add tasktype and options for chunking

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix error propagation

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix chunking cache management

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix test_chunking

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* test for chunking in RQ

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use proper chunk typing and contextualize

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add HierarchicalChunker

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* discriminate chunkers by options

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix RQ test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use explicit union (for the moment)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* allow for auto-detecting the max_tokens and remove redundant setting parameter

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename and move result classes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add type alias

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add doc_items and captions

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use annotations

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add option for including documents in the output

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Magnus Samuelsen <97634880+MagnusS0@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Magnus Samuelsen <97634880+MagnusS0@users.noreply.github.com>
2025-09-08 17:33:04 +02:00
github-actions[bot] 11edd0543a chore: bump version to 1.4.1 [skip ci] 2025-08-19 08:32:56 +00:00
github-actions[bot] d8216cd1e4 chore: bump version to 1.4.0 [skip ci] 2025-08-13 14:56:16 +00:00
Tiago Santana d7b1c40943 feat: add rq orchestrator (#44)
* add rq engine

Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com>

* refactor RQ and use SimpleWorker

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move process_results from docling-serve and use a redis key to transfer data

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing scratch_dir

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* pubsub updates

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix serialization and add notifier events

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add redis service

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add task_result to kfp

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* launch params

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cleanup print()

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* run in background

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add delete_task

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use redis_url as config (allows for password and tls)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-08-13 16:49:12 +02:00
github-actions[bot] 0c302863c5 chore: bump version to 1.3.1 [skip ci] 2025-08-12 12:33:37 +00:00
github-actions[bot] c9c8edba27 chore: bump version to 1.3.0 [skip ci] 2025-08-06 16:43:58 +00:00
Michele Dolfi 7d652a53df feat: option to disable shared models between workers (#46)
* option to disable shared models between workers

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add artifacts_path as fixture

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* limit CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-08-06 09:02:26 +02:00
github-actions[bot] 82f789c63d chore: bump version to 1.2.0 [skip ci] 2025-07-24 08:41:37 +00:00
github-actions[bot] 2bb087f0b1 chore: bump version to 1.1.1 [skip ci] 2025-07-18 15:23:03 +00:00