Cesar Berrospi Ramis 0602a7cdab feat: webvtt and source tracker (#2787)
* refactor(provenance): account for provenance as union of ProvenanceItem and ProvenanceTrack

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* refactor(webvtt): update WebVTTDocumentBackend with new docling-core classes

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* refactor(webvtt): preserve new lines and add helper handlers

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* refactor(webvtt): set ProvenanceTrack timings as float type

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* style(asr): remove unnecessary imports

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* refactor(asr): use ProvenanceTrack in ASR pipeline

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* tests(webvtt): add additional tests

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore(webvtt): parse the title of the WEBVTT file

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* refactor(webvtt): apply refactoring of TrackProvenance from docling-core

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* style(webvtt): apply X | Y annotation instead of Optional, Union

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* refactor(webvtt): drop cue span classes, 'lang' and 'c' tags

Drop WebVTT formatting features not covered by Docling across formats.
Only the 'u', 'b', 'i', and 'v' tags are supported, and only without classes.
Align with docling-core v2.62.0.

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* build: pin docling-core 2.62.0

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2026-01-30 17:44:03 +01:00