Commit Graph

15 Commits

Author SHA1 Message Date
vwe-ibm 9b4b67b23e feat: add signature/stamp html block to DC document (#3251) 2026-04-09 06:13:22 +02:00
Cesar Berrospi Ramis 86eaef5b45 fix(md): handle pipe symbols that are not table markers (#2904)
* fix(md): handle pipe symbols that are not table markers

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore: update uv.lock with latest docling-core 2.60.2

Update uv.lock file with the latest release of docling-core (2.60.2).
Update (fix) ground truth files for testing markdown serialization to be
in sync with the serialization fix (issue 2880).

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2026-01-23 15:19:09 +01:00
Matvei Smirnov ee73ffae15 fix(markdown): Setext heading support (#2359)
Signed-off-by: Matvei Smirnov <vdalekesmirnov@gmail.com>
Co-authored-by: Matvei Smirnov <matvei.smirnov@vkteam.ru>
2025-10-03 10:32:53 +02:00
Lucas Morin 9d67bb9ed6 fix: support escaped characters in markdown backend (#2304)
fix: improve markdown backend to support input documents with escaped characters

Signed-off-by: Lucas Morin <lucas.morin222@gmail.com>
2025-09-23 18:00:16 +02:00
Cesar Berrospi Ramis aec29a7315 fix(markdown): ensure correct parsing of nested lists (#1995)
* fix(markdown): ensure correct parsing of nested lists

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* chore: update dependencies in uv.lock file

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-07-25 15:17:57 +02:00
Michael Honaker e79e4f0ab6 fix(markdown): make parsing of rich table cells valid (#1821)
* fix: update md table classification

Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>

* Fix ground truth header changes

Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>

* Fix merge issues

Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>

* Fix minor ground truth errors

Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>

---------

Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>
2025-06-26 19:50:45 +02:00
Panos Vagenas 7c5614a37a fix(markdown): fix single-formatted headings & list items (#1820)
* fix(markdown): fix formatting & inline edge cases (show behavior before change)

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* add change and updated test data

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* update lock

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* improve test case

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-06-25 13:05:06 +02:00
Panos Vagenas 861abcdcb0 feat(markdown): add formatting & improve inline support (#1804)
feat(markdown): support formatting & hyperlinks

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-06-18 15:57:57 +02:00
Panos Vagenas 9210812bfa fix: improve HTML layer detection, various MD fixes (#1241)
Markdown fixes:
- properly propagate section header levels
- improve handling of list subroots without text

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-03-26 16:07:14 +01:00
Panos Vagenas 90b766e2ae fix(markdown): handle nested lists (#910)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-02-07 12:55:12 +01:00
Panos Vagenas 5ac2887e4a fix(markdown): fix parsing if doc ending with table (#873)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-02-03 14:38:38 +01:00
Panos Vagenas 94751a78f4 fix(markdown): add support for HTML content (#855)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-02-03 12:21:05 +01:00
Panos Vagenas bccb022fc8 fix(markdown): fix empty block handling (#843)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-30 16:22:29 +01:00
Panos Vagenas 5aed9f8aeb fix: fix single newline handling in MD backend (#824)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-28 19:05:55 +01:00
Panos Vagenas c8ecdd987e feat: expose new hybrid chunker, update docs (#384)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-12-09 08:28:29 +01:00