* refactor(provenance): account for provenance as union of ProvenanceItem and ProvenanceTrack
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor(webvtt): update WebVTTDocumentBackend with new docling-core classes
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor(webvtt): preserve new lines and add helper handlers
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor(webvtt): set ProvenanceTrack timings as float type
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* style(asr): remove unnecessary imports
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor(asr): use ProvenanceTrack in ASR pipeline
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* tests(webvtt): add additional tests
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore(webvtt): parse the title of the WEBVTT file
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor(webvtt): apply refactoring of TrackProvenance from docling-core
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* style(webvtt): apply X | Y annotation instead of Optional, Union
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* refactor(webvtt): drop cue span classes, 'lang' and 'c' tags
Drop WebVTT formatting features not covered by Docling across formats.
Only the 'u', 'b', 'i', and 'v' tags are supported, and only without classes.
Align with docling-core v2.62.0.
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* build: pin docling-core 2.62.0
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* feat: add a backend parser for WebVTT files
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: update README with VTT support
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: add description to supported formats
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore: upgrade docling-core to unescape WebVTT in markdown
Pin the new release of docling-core 2.48.2.
Do not escape HTML reserved characters when exporting WebVTT documents to markdown.
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* test: add missing copyright notice
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>