docling/CHANGELOG.md at dev/complex_html_renderer_example

docling-project/docling

mirror of https://github.com/docling-project/docling.git synced 2026-05-17 13:10:38 +00:00

github-actions[bot] 61c37a23a9 chore: bump version to 2.93.0 [skip ci]

2026-05-05 19:53:32 +00:00

v2.93.0 - 2026-05-05

Feature

vlm: Upgrade Granite Vision model to 4.1 for table + chart extraction (#3382) (24f2d14)

Fix

docx: Fix OMML equation handling and improve type safety (#3381) (e00735d)

v2.92.0 - 2026-04-29

Feature

Extend the kserve-triton OCR model to have multi-lingual support (#3368) (8b67fae)
docx: Add checkbox parsing support (#3349) (c455a65)
Introduce modular docling-slim package (#3285) (ed32c5e)
Add ResponseFormat.DOCLANG and parsing branch in VLM pipeline (#3350) (0f6f8d0)

Fix

pptx: Skip malformed picture shapes instead of aborting conversion (#3372) (7294248)
docx: OMML conversion failures for unsupported limit functions (#3359) (3df80e7)
Make VLLM model_impl configurable (#3358) (a6a37ca)

v2.91.0 - 2026-04-23

Feature

docx: Extract VML images with v:imagedata elements (#3343) (2ddaa3b)

Fix

Strengthen input validation for METS-GBS processing (#3336) (c1dbac2)
EasyOCR model downloading (#3339) (5e161ac)
vlm: Remove bogus preamble from VLM chat template (#3351) (c190ba2)
html: Refine image URL and size handling (#3348) (cd0cb69)
Fixes to html_backend (#3342) (9813190)

v2.90.0 - 2026-04-17

Feature

Implement GraniteVisionTableStructureModel for VLM-based table extraction (#3323) (1569e42)

Fix

latex: Fully unwrap deeply nested formatting macros (#3249) (101233e)
docx: Handle inline formulas in list items (#3304) (c761512)
format: Add MD fallback for .txt files in _guess_from_content (#3311) (3bab6b4)
Strip soft hyphen when joining merged text elements (#3232) (8274892)
pptx: Handle NotImplementedError from shape.shape_type (#3309) (043ed2d)

Documentation

Fix nanonets_ocr2 runtime support matrix (#3317) (8ec14f2)

v2.89.0 - 2026-04-16

Feature

Explicit TikZ environment handling in LaTeX backend (#3187) (a15c16e)

Fix

ocr: Align RapidOCR english assets with 3.8 mobile models (#3291) (251c8b2)
docx: Isolate list state in table cells (#3294) (740c386)
pipeline: Prevent cache miss due to pipeline options mutation during chart extraction (#3300) (5b84911)

Documentation

Add indexed picture placeholder example to serialization notebook (#3293) (cd2e5b6)

Performance

markdown: Avoid eager string formatting in Markdown backend debug logs (#3301) (a64c378)

v2.88.0 - 2026-04-13

Feature

service: Establish client SDK for docling serve (#3264) (42157a3)

Fix

ocr: Support rapidocr 3.8 mobile model naming (#3277) (6b257ec)

Documentation

Add agent skill bundle for coding assistants (SKILL.md, pipelines, convert/evaluate) (#3174) (c23622f)

v2.87.0 - 2026-04-13

Feature

vlm: Add Nanonets OCR2 onboarding (#3274) (9970d1e)

Fix

Transformers v5 compatibility for AUTOMODEL_CAUSALLM VLMs (#3276) (d431224)
vlm: Add explicit MLX support for OCR presets (#3272) (27d3cf4)
markdown: Normalize repeated leading dash markers (#3286) (a6aeddf)
docx: Preserve inline SDT references (#3280) (6cb1bc0)
pptx: Respect page_range during conversion (#3282) (e4fd937)
vlm: Support tool-calling API responses (#3271) (9c3ab93)
pdf: Extend ligature map with Dutch IJ and PUA glyph U+F0A0 (#3254) (ab5254d)

Documentation

Add AG2 multi-agent document analysis example (#3261) (1fed840)

v2.86.0 - 2026-04-10

Feature

Support for GraniteVision v4 (#3217) (fd83420)
Add signature/stamp html block to DC document (#3251) (9b4b67b)
vlm: Add PARTIAL_SUCCESS status for VLM pipeline pages (#3215) (6699642)

Fix

latex: Discard arguments of filtered spacing commands (#3245) (6180925)

Documentation

Chart understanding in README (#3253) (d5af473)

v2.85.0 - 2026-04-07

Feature

Add support for Falcon-OCR (#3237) (d0e19be)
Add support for LightOnOCR-2-1B (#3213) (f2affd7)

Fix

latex: Expand custom macro parameters (#3223) (77a2505)

v2.84.0 - 2026-04-01

Feature

Glm ocr (#3146) (a9265d8)
Switch to the latest version of DocumentFigureClassifier model v2.5 (#3171) (d046390)
Remove the deprecation of extraction (#3220) (e9a39e8)

v2.83.0 - 2026-03-31

Feature

Upgrade to transformers v5 (#3200) (d2c6357)
OCR model for remote KServe v2 API (#3189) (8522b00)

Fix

pdf: Propagate hyperlinks to DoclingDocument text items (#3131) (524edcc)
xlsx: Guard last-row bounds in Excel table scan (#3197) (85ac377)
Parse LaTeX macros in multicolumn/multirow table cells (#3204) (89c68f8)
Handle empty CSV file without crashing (#3196) (f283484)

Documentation

Add line-based chunker documentation and examples (#3210) (3a64f41)

v2.82.0 - 2026-03-25

Feature

Implementation of HTML backend with headless browser (#2969) (1c74a9b)

Fix

omml: Correct LaTeX output for fractions, math operators, and functions (#3122) (e36125b)
Manage PDFium backend resource lifecycles to avoid SIGSEGV/SIGTRAP crashes (#3180) (a0fc3c9)
docx: Split multiple OMML equations into separate formula items (#3123) (90d6dd4)
Let user params override engine defaults in API VLM engine (#3116) (fdf5e20)
vlm: Handle content_filter finish reason in API responses (#3051) (f0e3d1d)
cli: Avoid generating images for non-image exports (#3127) (5473e07)
Honor picture description batching and scale options (#3132) (9abf0fd)

Documentation

Fix Erroneous vLLM VLM pipeline engine option params causing empty/bad responses (#3167) (fffd445)

v2.81.0 - 2026-03-20

Feature

Route plain-text and Quarto/R Markdown files to the Markdown backend (#3161) (96d7c7e)

Fix

docx: Missing list items after numbered header (#2665) (#2678) (2f7c09e)
Avoid thread-unsafe close of pypdfium backend (#3160) (afb4bb6)
Handle external image relationships in MsWordDocumentBackend (#3114) (8ae0974)
Handle PermissionError for directory input on Windows CLI (#3149) (a39317a)
Avoid in-place mutation of pipeline options breaking cache key (#3115) (412af62)
Preserve torch_dtype in get_engine_config and add it to CodeFormulaV2 (#3117) (53a5f80)
Release image backend resources after frame extraction (#3134) (1e841eb)

v2.80.0 - 2026-03-14

Feature

Add the VllmCudaGraphMode (#3125) (f950679)

v2.79.0 - 2026-03-12

Feature

Add fact metadata and linkbase relationships for XBRL (#3084) (7952efe)

Fix

Use OCR cells with TableFormer v2 (#3107) (93f6fee)
Add self-consistency check in the table-structure model (#3105) (2a0e11f)
Correct typos in log messages and add missing error log (#3097) (198d0af)
Don't force cast to float32 in API Kserve v2 inputs (#3101) (fef01f8)

v2.78.0 - 2026-03-10

Feature

Add support for TableFormer v2 (#3013) (4ccd1d4)
Add gRPC transport for KServe v2 API engine (#3074) (3d90778)

Fix

html: Fix broken document tree and quadratic complexity in rich table cells (#3025) (80f75b8)
Loosen dependency for pandas3 (#3095) (5188180)
Add parse timeout to legacy LaTeX documents (#3019) (1192714)
msword: Skip GroupItem targets without comments attribute (#3080) (ee16285)

Documentation

Fix code in rag langchain chunker tokenizer (#2993) (d113e61)
Update code snippet to use modern pipeline options syntax (#3087) (95b759e)
Set HuggingFaceEndpoint task for Mixtral examples (#2945) (5d3ac38)

v2.77.0 - 2026-03-06

Feature

Track vlm_inference time for mlx_model pipeline (#3060) (38c4bb2)
Add configurable graph_optimization_level for ONNX Runtime engines (#3071) (cfc6636)

Fix

docx: Preserve URL fragments and query params in hyperlinks (#3050) (cd9dd10)
Detect Office Open XML formats from ZIP contents when filename has no extension (#3073) (56f06fe)
readingorder: Assign FURNITURE content_layer to footer/header in container groups (#3044) (f7cb304)
docx: Handle list items immediately after numbered headings (#3070) (56eb127)
rapidocr: ORT thread configuration for RapidOCR backend (#3062) (68336c2)

Documentation

Add examples and fix docstring bug in DocumentConverter (#3064) (653940e)
Add docstrings to PipelineOptions classes (#3065) (8b99085)

v2.76.0 - 2026-03-02

Feature

Export to WebVTT format (#3036) (d276e60)

Fix

xlsx: Handle OneCellAnchor images in Excel backend (#3045) (859c302)
Normalize Unicode ligatures in PDF text extraction (#3057) (6198e69)
ocr: Update RapidOCR torch GPU config key (#3049) (477359b)
Convert PIL images to RGB before picture description (#3014) (90ce93d)
msword: Use outlineLvl for heading levels and clamp to minimum 1 (#2916) (a3d2b4b)

Documentation

Add metaxy integration (#3058) (7aacc6c)
Removes merge conflict artifacts (#3055) (672125c)
Add audio & video processing guide (#3038) (1321b39)
Add XBRL conversion example notebook and update feature listings (#3039) (1eb5c21)

v2.75.0 - 2026-02-24

Feature

Create a backend parser for XBRL instance reports (#3017) (334ba6e)
Unified model-family inference engines (including image-classification) and KServe v2 API support (#2979) (0353293)

Fix

Skip ASR segments when length is zero (#2998) (6b824f8)
docx: Guard against None hyperlink address in _get_paragraph_elements (#2367) (#3022) (236216e)

v2.74.0 - 2026-02-17

Feature

Introduce docling-parse v5 and deprecate old docling-parse backends (#2872) (bf417e6)

Fix

Security vulnerabilities with XML External Entity and related attacks (#3009) (576bada)
csv: Set default delimiter by default (#3005) (a1b0e3f)
Improved deserialization of engine_options (#3008) (dbba6ea)

v2.73.1 - 2026-02-13

Fix

asciidoc: Handle commas in image alt text (#2983) (86b6912)
Use timezone-aware datetime (#2947) (e2870f9)
Add failed pages to DoclingDocument for page break consistency (#2939) (1f91482)

v2.73.0 - 2026-02-11

Feature

Inference engines abstraction for object detection model family with HF Transformers and ONNX runtime (#2959) (14e474c)
Added support for parsing LaTeX (.tex) documents (#2890) (e6ccb8b)
Introduce pluggable VLM runtime system with preset-based configuration (#2919) (d4c8713)

Fix

Restore expected behavior for artifacts_path and accelerator_options in VLM engines (#2961) (9721321)
Allow offline chart extraction model artifacts (#2957) (ae4fdbb)

Documentation

Add LaTeX and WebVTT as supported types (#2974) (704ef0a)

v2.72.0 - 2026-02-03

Feature

Add chart extraction models (#2848) (fe45c71)

Fix

backend: Improve Excel table bounds detection and flatten merged cells (#2778) (3110c43)
pptx: Handle picture shapes with external image references (#2914) (5e452a2)

Documentation

Add granite vision for charts (#2946) (a5ad8f2)

v2.71.0 - 2026-01-30

Feature

Webvtt and source tracker (#2787) (0602a7c)
Add support for Word document comments extraction (#2834) (b6ca094)

Fix

Allow newer typer versions (#2930) (6f205ae)
rapidocr: Use new model links for RapidOCR (#2928) (82b7982)
Presets for ollama (#2926) (4a269de)

v2.70.0 - 2026-01-23

Feature

Drop support for Python 3.9 (#2905) (7f38658)

Fix

md: Handle pipe symbols that are not table markers (#2904) (86eaef5)
Remove direct vllm dependency (#2910) (7a1952a)
PPTX parsing: bullet points not grouped correctly under subheadings (#2663) (#2855) (999dbb2)

Documentation

Add comprehensive docstrings to PdfPipelineOptions (#2827) (ab91786)

v2.69.1 - 2026-01-21

Fix

Off-by-one error for page indexing in vlm_pipeline (#2902) (08f49e2)

v2.69.0 - 2026-01-20

Feature

New picture classifier v2.0 (#2889) (43badc3)
Add classification filters for picture description (#2836) (ac16a26)

Fix

Torch compatibility for xpu (#2894) (00273f6)
Standardize page_no to 1-based indexing (#2847) (1b4d82d)
Usage of direct logging (#2884) (2fe9def)
Relax pypdfium2 version constraint, support 5.x (#2873) (daf2bc6)

Documentation

Correct broken link to supported formats (#2878) (16e88d5)

v2.68.0 - 2026-01-13

Feature

Support for DeepSeek-OCR in VLM pipeline (#2798) (19af03f)

Fix

logging: Include page numbers in preprocess error messages (#2858) (89bea24)
docx: Handle grouped pictures (#2861) (5c1f8f0)

Documentation

Fix Colab badge links and Weaviate typo in docs examples (#2871) (72851cc)
example: Fix update sample image path to be relative (#2864) (211c759)
Add Semantica integration (#2860) (bf80e32)

v2.67.0 - 2026-01-09

Feature

Enrichment annotations in the new meta format (#2859) (aab3ff5)
Add XPU device support for Intel GPUs (#2809) (2b83fdd)
Add option to report timings details (#2772) (cbc6537)

Fix

Lock new deps and update python 3.14 warnings (#2844) (d9295df)
Correct type hint for table_structure_options usage (#2823) (a0530a2)
Transformers models lazy-loaded (#2826) (3ef4525)
Font download by passing font_path to RapidOcr (#2822) (ffafe58)
cli: Add Layout and Table models to --show-external-plugins (#2832) (ed57089)

v2.66.0 - 2025-12-24

Feature

Add preset for using granite-docling via vllm and other apis (#2792) (241d19e)

Fix

docx: Handle tables with merged cells causing IndexError (#2813) (faff935)
markdown: Allow text before headers also in mixed markdown and html (#2801) (595115d)

Documentation

RTX: Guidelines for best performance on RTX GPUs (#2765) (be085c0)
Add docstrings to DocumentConverter #2748 (#2782) (cc5e3ce)
style: Fix link visibility in dark mode (#2804) (150fe90)

v2.65.0 - 2025-12-15

Feature

Add YAML output format to CLI (#2768) (da7678a)

Fix

rapidocr: Use correct parameter name for rec_keys_path (#2762) (1d78418)
docx: Handle missing value in paragraph style name (#2761) (a97d950)

Documentation

Add Pydantic field documentation for PipelineOptions (#2771) (7c24b01)
gpu: Add benchmarks of standard pipeline with OCR (#2764) (d03439c)

v2.64.1 - 2025-12-09

Fix

Clear word/char cells when force_full_page_ocr is used (#2738) (1df0560)
Add missing font download in the rapidocr artifacts (#2735) (edbabfc)
Ensure proper image_scale for generated page images in VLM pipelines (#2728) (609069d)
html: Tackle paragraphs with block-level elements (#2720) (d007ba0)
html: Prevent hierarchy reset in rich table cells (#2716) (aebe25c)
docx: Parse integrals as n-ary objects without chr element (#2712) (c97715f)

v2.64.0 - 2025-12-02

Feature

experimental: Add experimental TableCropsLayoutModel (#2669) (1344362)
Factory and plugin-capability for Layout and Table models (#2637) (ad97e52)

Fix

InputFormat.IMAGE must have correct pipeline (#2707) (6ef4ffd)
Do not consider singleton cells in xlsx as TableItems but rather TextItems (#2589) (54cd6d7)
docx: Missing list items after numbered header (#2665) (e580554)

Documentation

Example on how to apply external OCR as post processing (#2517) (fa21128)
More GPU results and improvements in the example docs (#2674) (b75c646)
Fix typo on jobkit page (#2671) (146b4f0)

v2.63.0 - 2025-11-20

Feature

Add save and load for conversion result (#2648) (b559813)

Fix

Respect document_timeout in new threaded StandardPdfPipeline (#2653) (2087c6b)
In DocumentConverter.convert_string() make nullable name parameter optional (#2660) (6fb9a5f)
Enable GPU for RapidOCR when available (#2659) (463a3fd)
Remove py3.14 requirement for default rapidocr (#2639) (da4c2e9)

Documentation

Add Hector as compatible AI agent platform integration (#2662) (ce5a099)
Added documentation to use SuryaOCR via plugin docling-surya (#2533) (b216ad8)
Fix broken homepage links (#2651) (03e7c7d)
examples: Processing parquet file of images (#2641) (8af228f)
Move Installation and Quickstart (Usage) under Getting started (#2644) (d549445)
Add redirection from getting started page (#2640) (ac9fc58)
examples: Remove deprecation warnings with export_to_dataframe (#2638) (f552862)

v2.62.0 - 2025-11-17

Feature

Add the Image backend (#2627) (3495b73)
experimental: Layout + VLM model with layout prompt (#2244) (4852d8b)

Fix

Correct the model-repo name (#2624) (14b436d)
docx: Parse page headers and footers (#2599) (054c4a6)

Documentation

Combine Home and Getting Started pages (#2600) (ae30373)

v2.61.2 - 2025-11-10

Fix

Default to EasyOCR in Python 3.14 (#2605) (5c27567)

v2.61.1 - 2025-11-06

Fix

docx: Slow table parsing (#2553) (ef623ff)
html: Slow table parsing (#2582) (0ba8d5d)

Documentation

Make navigation menus collapse and expand (#2573) (8da3d28)

v2.61.0 - 2025-11-06

Feature

vlm: Track generated tokens and stop reasons for VLM models (#2543) (6a04e27)

Fix

Temporarily pin NuExtract to working revision (#2588) (fa92574)
ocr: Use PSM integer values directly instead of constructor (#2578) (1a5146a)

v2.60.1 - 2025-11-04

Fix

Extract response from api_image_request in picture description (#2571) (8360aa5)

v2.60.0 - 2025-10-31

Feature

Use threading in the standard pipeline and move old behavior to legacy (#2452) (268d027)

Fix

pdf: Threadsafe for pypdfium2 backend (#2527) (a51275d)

Documentation

Update link to Open WebUI docs (#2549) (01577e9)
Update installation options with extras and review FAQ (#2548) (cb10043)
Fix typos (#2546) (741c44f)

v2.59.0 - 2025-10-30

Feature

vlm: Add num_tokens as attribute for VlmPrediction (#2489) (b6c892b)
Support for Python 3.14 (#2530) (cdffb47)

Fix

Xlsx cell parsing, now returning values instead of formulas (#2520) (d9c90eb)

Documentation

Add details and examples on optimal GPU setup (#2531) (97aa06b)
Update opensearch notebook and backend documentation (#2519) (9a6fdf9)

v2.58.0 - 2025-10-22

Feature

pdf: Support for password-protected PDF documents (#2499) (bbe82a6)
backend: Add generic options support and HTML image handling modes (#2011) (a30e6a7)
ASR: MLX Whisper Support for Apple Silicon (#2366) (657ce8b)

Fix

markdown: Set the correct discriminator in md backend options (#2501) (4227fcc)
xlsx: Speed up by detecting the true last non-empty row/column (#2404) (b66624b)

Documentation

Fix typo in mcp.md (#2502) (86556d8)
Discord badge with join link (#2473) (dd03b53)

Performance

Use docling-parse-v4 as default (#2503) (89820d0)

v2.57.0 - 2025-10-15

Feature

docx: Process drawingml objects in docx (#2453) (1682993)

Fix

Use proper page concatenation in VLM pipeline MD/HTML conversion (#2458) (cd7f7ba)

Documentation

Example on PII obfuscation (#2459) (3e6da2c)

v2.56.1 - 2025-10-13

Fix

Avoid downloading easyocr models by default (#2454) (688a7df)

v2.56.0 - 2025-10-13

Feature

AutoOCR model selecting the best OCR model available and deprecating the usage of EasyOCR (#2391) (f7244a4)
Add Tesseract PSM options support (#2411) (f11f8c0)

Fix

asr: Implement robust status check in AsrPipeline (#2442) (db985bb)
Deal with chartsheets in workbooks (#2433) (cce18b2)
Skip temporary docx files (#2413) (ee55013)
AsrPipeline to handle absolute paths and BytesIO streams correctly (#2407) (b5f7fef)
Enrichment of documents without pages metadata (pptx and xlsx) (#2401) (0610d01)
Proper heading support in rich tables for HTML backend (#2394) (9705f40)

Documentation

Remove deprecated call in custom_convert.py (#2447) (9020044)
Fixed a few typos (#2441) (2a0f563)
Add MongoDB + VoyageAI (#2382) (f2854b2)
Add RAG example with MongoDB Atlas Vector Search and VoyageAI embeddings (#2341) (8a4b946)

v2.55.1 - 2025-10-03

Fix

markdown: Setext heading support (#2359) (ee73ffa)
docs: Fixed the color scheme (#2371) (246de77)
Empty table handling (#2365) (ca2be7f)
Add table raw content when no table structure model is used (#1815) (4f295ed)

Documentation

Example using Hashicorp Vault PII transform (#2373) (a975a79)
Jobkit and connectors (#2357) (e6c3b05)

v2.55.0 - 2025-09-30

Feature

Repetition-based StoppingCriteria for GraniteDocling (#2323) (1e9dc43)
Rich tables support for HTML backend (#2324) (c803abe)

Fix

Pin wider range of typer (#2309) (68ae7cc)
Update Transformers & VLLM inference code, CLI and VLM specs (#2322) (654c70f)
Support escaped characters in markdown backend (#2304) (9d67bb9)

Documentation

styling: Update color scheme (#2154) (325877a)
vlm: Update SmolDocling to GraniteDocling references (#2315) (a873200)

v2.54.0 - 2025-09-22

Feature

Rich tables for MSWord backend (#2291) (e2482a2)
Add a backend parser for WebVTT files (#2288) (46efaae)

Fix

Correct y-axis scaling in draw_table_cells (#2287) (b5628f1)

Documentation

Update API VLM example with granite-docling (#2294) (8b7e83a)
Fix examples rendering (#2281) (8322c2e)

v2.53.0 - 2025-09-17

Feature

Add granite-docling model (#2272) (17afb66)
RapidOcr: Support generic extra arguments for RapidOcr (#2266) (0e95171)

Fix

Handle empty result from RapidOCR to avoid crash (#2264) (609d902)

Documentation

Describe examples (#2262) (ff351fd)

v2.52.0 - 2025-09-11

Feature

Enrichment steps on all convert pipelines (incl docx, html, etc) (#2251) (2c91234)

Fix

Add missing features in ThreadedStandardPdfPipeline (#2252) (0700af2)
Address deprecation warnings of dependencies (#2237) (c696549)

Documentation

Add an example of RAG with OpenSearch (#2238) (f8cc545)
Add instructions for using Docling with MCP to README (#2219) (e5cd702)
Document VLM support requirement in extraction example (#2231) (55f5f37)

v2.51.0 - 2025-09-05

Feature

Updating default parameters to get better performance with docling-parse (#2208) (b49d1ad)
Updated the backend for new docling-parse (#2187) (b3d7542)

Documentation

Add information extraction example (#2199) (a9f41b0)

v2.50.0 - 2025-09-03

Feature

Heron layout model as new default (#1971) (e38aa0f)

Fix

html: Access to variable not yet declared (#2171) (293e81b)

v2.49.0 - 2025-09-01

Feature

[Beta] Extraction with schema (#2138) (9f4bc5b)
msexcel: Set ContentLayer.INVISIBLE for invisible sheet (#1876) (a283ccf)

Fix

pypdfium2: Fix OCR bounding box misalignment caused by mismatched rotation metadata (#2039) (4d94e38)
Translation example (#2166) (9f0286b)
Extend offline mode for rapidocr fonts (#2155) (9904d14)

Documentation

Enrich landing pages (#2165) (96cab6b)

v2.48.0 - 2025-08-26

Feature

Upgrade to RapidOCR 3.x (#2088) (3f60a0f)

Fix

html: Preserve code blocks in list items (#2131) (fa3327e)

v2.47.1 - 2025-08-23

Fix

Vllm extra only for linux x86_64 (#2126) (488f6cd)

v2.47.0 - 2025-08-22

Feature

CLI: Option to download arbitrary HuggingFace model (#2123) (cdf079d)
Batching support for VLMs in transformers backend, add initial VLLM backend (#2094) (3c660c0)
html: Support formatting tags in HTML texts (#2111) (94fcc46)

Fix

Improve numbered list detection for msword docs (#2100) (3f03709)

Documentation

DPK pipeline example using docling library (#2112) (e76298c)
Add Getting Started page (#2113) (8996d61)

v2.46.0 - 2025-08-20

Feature

New code formula model (#2042) (d2494da)

Fix

HTML: Parse footer tag as a group in furniture content layer (#2106) (c5f2e2f)

Performance

Clean up resources with docling-parse v4, no parsed_page output by default (#2105) (5f57ff2)
Speed up function _parse_orientation (#1934) (8820b55)

v2.45.0 - 2025-08-18

Feature

Add backend for METS with Google Books profile (#1989) (31087f3)
html: Support in-line anchor tags in HTML texts (#1659) (9687297)
vlm: Ability to preprocess VLM response (#1907) (5f050f9)

Documentation

Add docling Quarkus integration (#2083) (76c1fbd)

v2.44.0 - 2025-08-12

Feature

Add convert_string to document-converter (#2069) (b09033c)

Fix

html: Parse rowspan and colspan when they include non-numerical values (#2048) (ed56f2d)
Support new mlx-vlm module (#2001) (0130e3a)
Extend error reporting when verbose logging is enabled (#2017) (2eb760d)
HTML: Replace non-standard Unicode characters (#2006) (86f7012)

Documentation

Add Langflow integration (#2068) (e2cca93)
Add Arconia integration (#2061) (bfda6d3)

v2.43.0 - 2025-07-28

Feature

Threaded PDF pipeline (#1951) (aed772a)

Fix

markdown: Ensure correct parsing of nested lists (#1995) (aec29a7)
HTML: Remove an unnecessary print command (#1988) (945721a)

v2.42.2 - 2025-07-24

Fix

HTML: Concatenation of child strings in table cells and list items (#1981) (5132f06)
docx: Adding plain latex equations to table cells (#1986) (0b83609)
Preserve PARTIAL_SUCCESS status when document timeout hits (#1975) (98e2fcf)
Multi-page image support (tiff) (#1928) (8d50a59)

Documentation

Add chat with dosu (#1984) (7b5f860)

v2.42.1 - 2025-07-22

Fix

Keep formula clusters also when empty (#1970) (67441ca)

Documentation

Enrich existing DoclingDocument (#1969) (90a7cc4)
Add documentation for confidence scores (#1912) (5d98bce)

v2.42.0 - 2025-07-18

Feature

Add option to control empty clusters in layout postprocessing (#1940) (a436be7)

Fix

Safe pipeline init, use device_map in transformers models (#1917) (cca05c4)
Fix HTML table parser and JATS backend bugs (#1948) (e1e3053)
KeyError: 'fPr' when processing latex fractions in DOCX files (#1926) (95e7096)
Change granite vision model URL from preview to stable version (#1925) (c5fb353)

Documentation

Fix typos (#1943) (d6d2dbe)

v2.41.0 - 2025-07-10

Feature

Layout model specification and multiple choices (#1910) (2b8616d)
Enable precision control in float serialization (#1914) (ec588df)
Add image-text-to-text models in transformers (#1772) (a07ba86)
vlm: Dynamic prompts (#1808) (b8813ee)

Fix

ocr-utils: Unit test and fix the rotate_bounding_box function (#1897) (931eb55)
Docs are missing osd packages for tesseract on RHEL (#1905) (e25873d)
Use only backend for picture classifier (#1904) (edd4356)
Typo in asr options (#1902) (dd8fde7)

v2.40.0 - 2025-07-04

Feature

Introduce LayoutOptions to control layout postprocessing behaviour (#1870) (ec6cf6f)
Integrate ListItemMarkerProcessor into document assembly (#1825) (56a0e10)

Fix

Secure torch model inits with global locks (#1884) (598c9c5)
Ensure that TesseractOcrModel does not crash in case OSD is not installed (#1866) (ae39a94)

Performance

msexcel: _find_table_bounds use iter_rows/iter_cols instead of Worksheet.cell (#1875) (13865c0)
Move expensive imports closer to usage (#1863) (3089cf2)

v2.39.0 - 2025-06-27

Feature

Leverage new list modeling, capture default markers (#1856) (0533da1)

Fix

markdown: Make parsing of rich table cells valid (#1821) (e79e4f0)

v2.38.1 - 2025-06-25

Fix

Updated granite vision model version for picture description (#1852) (d337825)
markdown: Fix single-formatted headings & list items (#1820) (7c5614a)
Fix response type of ollama (#1850) (41e8cae)
Handle missing runs to avoid out of range exception (#1844) (4002de1)

v2.38.0 - 2025-06-23

Feature

Support audio input (#1763) (1557e7c)
markdown: Add formatting & improve inline support (#1804) (861abcd)
Maximum image size for Vlm models (#1802) (215b540)

Fix

docx: Ensure list items have a list parent (#1827) (d26dac6)
msword_backend: Identify text in the same line after an image #1425 (#1610) (1350a8d)
Ensure uninitialized pages are removed before assembling document (#1812) (dd7f64f)
Formula conversion with page_range param set (#1791) (dbab30e)

Documentation

Update readme and add ASR example (#1836) (f3ae302)
Support running examples from root or subfolder (#1816) (64ac043)

v2.37.0 - 2025-06-16

Feature

Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) (7d3302c)
Support xlsm files (#1520) (df14022)

Fix

Pptx line break and space handling (#1664) (f28d23c)
asciidoc: Set default size when missing in image directive (#1769) (b886e4d)
Handle NoneType error in MsPowerpointDocumentBackend (#1747) (7a275c7)
Prov for merged-elems (#1728) (6613b9e)
tesseract: Initialize df_osd to avoid uninitialized variable error (#1718) (e979750)
Allow custom torch_dtype in vlm models (#1735) (f7f3113)
Improve extraction from textboxes in Word docs (#1701) (9dbcb3d)
Add WEBP to the list of image file extensions (#1711) (a2b83fe)

Documentation

Update vlm models api examples with LM Studio (#1759) (0432a31)
Add open webui (#1734) (49b10e7)

v2.36.1 - 2025-06-04

Fix

Remove typer and click constraints (#1707) (8846f1a)

Documentation

Flash-attn usage and install (#1706) (be42b03)

v2.36.0 - 2025-06-03

Feature

Simplify dependencies, switch to uv (#1700) (cdd4018)
New vlm-models support (#1570) (cfdf4ce)

v2.35.0 - 2025-06-02

Feature

Add visualization of bbox on page with html export (#1663) (b356b33)

Fix

Guess HTML content starting with script tag (#1673) (984cb13)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte (#1665) (51d3450)

Documentation

Fix typo in index.md (#1676) (11ca4f7)

v2.34.0 - 2025-05-22

Feature

ocr: Auto-detect rotated pages in Tesseract (#1167) (45265bf)
Establish confidence estimation for document and pages (#1313) (9087524)

Fix

Fix ZeroDivisionError for cell_bbox.area() (#1636) (c2f595d)
integration: Update the Apify Actor integration (#1619) (14d4f5b)

v2.33.0 - 2025-05-20

Feature

Add textbox content extraction in msword_backend (#1538) (12a0e64)

Fix

Fix issue with detecting docx files, and files with upper case extensions (#1609) (f4d9d41)
Load_from_doctags static usage (#1617) (0e00a26)
Incorrect force_backend_text behaviour for VLM DocTag pipelines (#1371) (f2e9c07)
pypdfium: Resolve overlapping text when merging bounding boxes (#1549) (98b5eeb)

v2.32.0 - 2025-05-14

Feature

Improve parallelization for remote services API calls (#1548) (3a04f2a)
Support image/webp file type (#1415) (12dab0a)

Fix

ocr: Orig field in TesseractOcrCliModel as str (#1553) (9f8b479)
settings: Fix nested settings load via environment variables (#1551) (2efb7a7)

Documentation

Add advanced chunking & serialization example (#1589) (9f28abf)

v2.31.2 - 2025-05-13

Fix

AsciiDoc header identification (#1562) (#1563) (4046d0b)
Restrict click version and update lock file (#1582) (8baa85a)

v2.31.1 - 2025-05-12

Fix

Add smoldocling in download utils (#1577) (127e386)
HTML: Handle row spans in header rows (#1536) (776e7ec)
Mime error in document streams (#1523) (f1658ed)
Usage of hashlib for FIPS (#1512) (7c70573)
Guard against attribute errors in TesseractOcrModel del (#1494) (4ab7e9d)
Enable cuda_use_flash_attention2 for PictureDescriptionVlmModel (#1496) (cc45396)
Updated the time-recorder label for reading order (#1490) (976e92e)
Incorrect scaling of TableModel bboxes when do_cell_matching is False (#1459) (94d66a0)

Documentation

Update links in data_prep_kit (#1559) (844babb)
Add serialization docs, update chunking docs (#1556) (3220a59)
Update supported formats guide (#1463) (3afbe6c)

v2.31.0 - 2025-04-25

Feature

Add tutorial using Milvus and Docling for RAG pipeline (#1449) (a2fbbba)

Fix

html: Handle address, details, and summary tags (#1436) (ed20124)
Treat overflowing -v flags as DEBUG (#1419) (8012a3e)
codecov: Fix codecov argument and yaml file (#1399) (fa7fc9e)

Documentation

Fix wrong output format in example code (#1427) (c2470ed)
Add OpenSSF Best Practices badge (#1430) (64918a8)
Typo fixes in docling_document.md (#1400) (995b3b0)
Updated the [Usage] link in architecture.md (#1416) (88948b0)
ocr: Add docs entry for OnnxTR OCR plugin (#1382) (a7dd59c)
security: More statements about secure development (#1381) (293c28c)
Add testing in the docs (#1379) (01fbfd5)
Add Notes for Installing in Intel macOS (#1377) (a026b4e)

v2.30.0 - 2025-04-14

Feature

cli: Add option for html with split-page mode (#1355) (c0ba88e)
xlsx: Create a page for each worksheet in XLSX backend (#1332) (eef2bde)
OllamaVlmModel for Granite Vision 3.2 (#1337) (c605edd)

Fix

deps: Widen typer upper bound (#1375) (7e40ad3)
Auto-recognize .xlsx, .docx and .pptx files (#1340) (0de70e7)
docx: Declare image_data variable when handling pictures (#1359) (415b877)
Implement PictureDescriptionApiOptions.bitmap_area_threshold (#1248) (2503999)
Properly address page in pipeline _assemble_document when page_range is provided (#1334) (6b696b5)

v2.29.0 - 2025-04-10

Feature

Handle <code> tags as code blocks (#1320) (0499cd1)
docx: Add text formatting and hyperlink support (#630) (bfcab3d)

Fix

docx: Adding new latex symbols, simplifying how equations are added to text (#1295) (14e9c0c)
pptx: Check if picture shape has an image attached (#1316) (dc3bf9c)
docx: Improve text parsing (#1268) (d2d6874)
Tesseract OCR CLI can't process images composed with numbers only (#1201) (b3d111a)

Documentation

Add plugins docs (#1319) (2e99e5a)
Add visual grounding example (#1270) (71148eb)

v2.28.4 - 2025-03-29
Fix

Fixes tables when using OCR (#1261) (7afad7e)

v2.28.3 - 2025-03-28
Fix

Word-level pdf cells for tables (#1238) (8bd71e8)

v2.28.2 - 2025-03-26
Fix

Improve HTML layer detection, various MD fixes (#1241) (9210812)
html: Fix HTML parsed heading level (#1244) (85c4df8)

v2.28.1 - 2025-03-25
Fix

converter: Cache same pipeline class with different options (#1152) (825b226)
debug: Missing translation of bbox to to_bounding_box (#1220) (6df8827)
docx: Identifying numbered headers (#1231) (f739d0e)

Documentation

examples: Batch conversion doc raises_on_error (#1147) (0974ba4)

v2.28.0 - 2025-03-19
Feature

SmolDocling: Support MLX acceleration in VLM pipeline (#1199) (1c26769)
Add PPTX notes slides (#474) (b454aa1)
Updated vlm pipeline (with latest changes from docling-core) (#1158) (2f72167)

Fix

Determine correct page size in DoclingParseV4Backend (#1196) (f5adfb9)
msword: Fixing function return in equations handling (#1194) (0b707d0)

Documentation

Linux Foundation AI & Data (#1183) (1d680b0)
Move apify to docs (#1182) (54a78c3)

v2.27.0 - 2025-03-18
Feature

Add factory for ocr engines via plugins (#1010) (6eaae3c)
Add DoclingParseV4 backend, using high-level docling-parse API (#905) (3960b19)
actor: Docling Actor on Apify infrastructure (#875) (772487f)
Equations to latex in MSWord backend (with inline groups) (#1114) (6eb718f)

Fix

html: Handle nested empty lists (#1154) (f94da44)
Use first table row as col headers (#1156) (0945973)
Pass tests, update docling-core to 2.22.0 (#1150) (aa92a57)

Documentation

Fix spelling of picture in usage (#1165) (7e01798)

v2.26.0 - 2025-03-11
Feature

Use new TableFormer model weights and default to accurate model version (#1100) (eb97357)

Fix

CLI: Fix help message for abort options (#1130) (4d64c4c)

Documentation

Add description of DOCLING_ARTIFACTS_PATH env var (#1124) (e1c49ad)

Performance

New revision code formula model and document picture classifier (#1140) (5e30381)

v2.25.2 - 2025-03-05
Fix

Proper handling of orphan IDs in layout postprocessing (#1118) (c56ab3a)

Documentation

Enrichment models (#1097) (357d41c)

v2.25.1 - 2025-03-03
Fix

Enable locks for threadsafe pdfium (#1052) (8dc0562)
html: Use 'start' attribute when parsing ordered lists from HTML docs (#1062) (de7b963)

Documentation

Improve docs on token limit warning triggered by HybridChunker (#1077) (db3ceef)

v2.25.0 - 2025-02-26
Feature

[Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054) (3c9fe76)
cli: Add option for downloading all models, refine help messages (#1061) (ab683e4)

Fix

Vlm using artifacts path (#1057) (e197225)
html: Parse text in div elements as TextItem (#1041) (1b0ead6)

Documentation

Extend chunking docs, add FAQ on token limit (#1053) (c84b973)

v2.24.0 - 2025-02-20
Feature

Implement new reading-order model (#916) (c93e369)

v2.23.1 - 2025-02-20
Fix

Runtime error when Pandas Series is not always of string type (#1024) (6796f0a)

Documentation

Revamp picture description example (#1015) (27c0400)

v2.23.0 - 2025-02-17
Feature

Support cuda:n GPU device allocation (#694) (77eb77b)
xml-jats: Parse XML JATS documents (#967) (428b656)

Fix

Revise DocTags, fix iterate_items to output content_layer in items (#965) (6e75f0b)

v2.22.0 - 2025-02-14
Feature

Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) (00d9405)
Introduce the enable_remote_services option to allow remote connections while processing (#941) (2716c7d)
Allow artifacts_path to be defined as ENV (#940) (5101e25)

Fix

Update Pillow constraints (#958) (af19c03)
Fix the initialization of the TesseractOcrModel (#935) (c47ae70)

Documentation

Update example Dockerfile with download CLI (#929) (7493d5b)
Examples for picture descriptions (#951) (2d66e99)

v2.21.0 - 2025-02-10
Feature

Add content_layer property to items to address body, furniture and other roles (#735) (cf78d5b)

v2.20.0 - 2025-02-07
Feature

Describe pictures using vision models (#259) (4cc6e3e)

Fix

Remove unused httpx (#919) (c18f47c)

v2.19.0 - 2025-02-07
Feature

New artifacts path and CLI utility (#876) (ed74fe2)

Fix

markdown: Handle nested lists (#910) (90b766e)
Test cases for RTL programmatic PDFs and fixes for the formula model (#903) (9114ada)
msword_backend: Handle conversion error in label parsing (#896) (722a6eb)
Enrichment models batch size and expose picture classifier (#878) (5ad6de0)

Documentation

Introduce example with custom models for RapidOCR (#874) (6d3fea0)

v2.18.0 - 2025-02-03
Feature

Expose equation exports (#869) (6a76b49)
Add option to define page range (#852) (70d68b6)
docx: Support of SDTs in docx backend (#853) (d727b04)
Python 3.13 support (#841) (4df085a)

Fix

markdown: Fix parsing if doc ending with table (#873) (5ac2887)
markdown: Add support for HTML content (#855) (94751a7)
docx: Merged table cells not properly converted (#857) (0cd81a8)
Processing of placeholder shapes in pptx that have text but no bbox (#868) (eff16b6)
KeyError in tableformer prediction (#854) (b1cf796)
Fixed docx import with headers that are also lists (#842) (2c037ae)
Use new add_code in html backend and add more typing hints (#850) (2a1f8af)
markdown: Fix empty block handling (#843) (bccb022)
Fix for the crash when encountering WMF images in pptx and docx (#837) (fea0a99)

Documentation

Updated the readme with upcoming features (#831) (d7c0828)
Add example for inspection of picture content (#624) (f9144f2)

v2.17.0 - 2025-01-28
Feature

CLI: Expose code and formula models in the CLI (#820) (6882e6c)
Add platform info to CLI version printout (#816) (95b293a)
ocr: Expose rec_keys_path in RapidOcrOptions to support custom dictionaries (#786) (5332755)
Introduce automatic language detection in TesseractOcrCliModel (#800) (3be2fb5)

Fix

Fix single newline handling in MD backend (#824) (5aed9f8)
Use file extension if filetype fails with PDF (#827) (adf6353)
Parse html with omitted body tag (#818) (a112d7a)

Documentation

Document Docling JSON parsing (#819) (6875913)
Add SSL verification error mitigation (#821) (5139b48)
backend XML: Do not delete temp file in notebook (#817) (4d41db3)
Typo (#814) (8a4ec77)
Added markdown headings to enable TOC in github pages (#808) (b885b2f)
Description of supported formats and backends (#788) (c2ae1cc)

v2.16.0 - 2025-01-24
Feature

New document picture classifier (#805) (16a218d)
Add Docling JSON ingestion (#783) (88a0e66)
Code and equation model for PDF and code blocks in markdown (#752) (3213b24)
Add "auto" language for TesseractOcr (#759) (8543c22)

Fix

Added extraction of byte-images in excel (#804) (a458e29)
Update docling-parse-v2 backend version with new parsing fixes (#769) (670a08b)

Documentation

Fix minor typos (#801) (c58f75d)
Add Azure RAG example (#675) (9020a93)
Fix links between docs pages (#697) (c49b352)
Fix correct Accelerator pipeline options in docs/examples/custom_convert.py (#733) (7686083)
Example to translate documents (#739) (f7e1cbf)

v2.15.1 - 2025-01-10
Fix

Improve OCR results, stricten criteria before dropping bitmap areas (#719) (5a060f2)
Allow earlier requests versions (#716) (e64b5a2)

Documentation

Add pointers to LangChain-side docs (#718) (9a6b5c8)
Add LangChain docs (#717) (4fa8028)

v2.15.0 - 2025-01-08
Feature

Added http header support for document converter and cli (#642) (0ee849e)

Fix

Correct scaling of debug visualizations, tune OCR (#700) (5cb4cf6)
Let BeautifulSoup detect the HTML encoding (#695) (42856fd)
mspowerpoint: Handle invalid images in PowerPoint slides (#650) (d49650c)

Documentation

Specify docstring types (#702) (ead396a)
Add link to rag with granite (#698) (6701f34)
Add integrations, revamp docs (#693) (2d24fae)
Add OpenContracts as an integration (#679) (569038d)
Add Weaviate RAG recipe notebook (#451) (2b591f9)
Document Haystack & Vectara support (#628) (fc645ea)

v2.14.0 - 2024-12-18
Feature

Create a backend to transform PubMed XML files to DoclingDocument (#557) (fd03480)

v2.13.0 - 2024-12-17
Feature

Updated Layout processing with forms and key-value areas (#530) (60dc852)
Create a backend to parse USPTO patents into DoclingDocument (#606) (4e08750)
Add Easyocr parameter recog_network (#613) (3b53bd3)

Documentation

Add Haystack RAG example (#615) (3e599c7)
Fix the path to the run_with_accelerator.py example (#608) (3bb3bf5)

v2.12.0 - 2024-12-13
Feature

Introduce support for GPU Accelerators (#593) (19fad92)

v2.11.0 - 2024-12-12
Feature

Add timeout limit to document parsing job. DS4SD#270 (#552) (3da166e)

Fix

Do not import python modules from deepsearch-glm (#569) (aee9c0b)
Handle no result from RapidOcr reader (#558) (f45499c)
Make enum serializable with human-readable value (#555) (a7df337)

Documentation

Update chunking usage docs, minor reorg (#550) (d0c9e8e)

v2.10.0 - 2024-12-09
Feature

Docling-parse v2 as default PDF backend (#549) (aca57f0)

Fix

Call into docling-core for legacy document transform (#551) (7972d47)
Introduce Image format options in CLI. Silence the tqdm downloading messages. (#544) (78f61a8)

v2.9.0 - 2024-12-09
Feature

Expose new hybrid chunker, update docs (#384) (c8ecdd9)
MS Word backend: Make detection of headers and other styles localization agnostic (#534) (3e073df)

Fix

Correcting DefaultText ID for MS Word backend (#537) (eb7ffcd)
Add py.typed marker file (#531) (9102fe1)
Enable HTML export in CLI and add options for image mode (#513) (0d11e30)
Missing text in docx (t tag) when embedded in a table (#528) (b730b2d)
Restore pydantic version pin after fixes (#512) (c830b92)
Folder input in cli (#511) (8ada0bc)

Documentation

Document new integrations (#532) (e780333)

v2.8.3 - 2024-12-03
Fix

Improve handling of disallowed formats (#429) (34c7c79)

v2.8.2 - 2024-12-03
Fix

ParserError EOF inside string (#470) (#472) (c90c41c)
PermissionError when using tesseract_ocr_cli_model (#496) (d3f84b2)

Documentation

Add styling for faq (#502) (5ba3807)
Typo in faq (#484) (33cff98)
Add automatic api reference (#475) (d487210)
Introduce faq section (#468) (8ccb3c6)

Performance

Prevent temp file leftovers, reuse core type (#487) (051789d)

v2.8.1 - 2024-11-29
Fix

cli: Expose debug options (#467) (dd8de46)
Remove unused deps (#466) (af63818)

Documentation

Extend integration docs & README (#456) (84c46fd)

v2.8.0 - 2024-11-27
Feature

ocr: Added support for RapidOCR engine (#415) (85b2999)

Fix

Use correct image index in word backend (#442) (767563b)
Update tests and examples for docling-core 2.5.1 (#449) (29807a2)

v2.7.1 - 2024-11-26
Fix

Fixes for wordx (#432) (d0a1180)
Force pydantic < 2.10.0 (#407) (d7072b4)

Documentation

Add DocETL, Kotaemon, spaCy integrations; minor docs improvements (#408) (7a45b92)

v2.7.0 - 2024-11-20
Feature

Add support for ocrmac OCR engine on macOS (#276) (6efa96c)

Fix

Python3.9 support (#396) (7b013ab)
Propagate document limits to converter (#388) (32ebf55)

v2.6.0 - 2024-11-19
Feature

Added support for exporting DocItem to an image when page image is available (#379) (3f91e7d)
Expose ocr-lang in CLI (#375) (ed785ea)
Added excel backend (#334) (926dfd2)
Extracting picture data for raster images found in PPTX (#349) (7a97d71)

Fix

Fixing images in the input Word files (#330) (8533039)
Reduce logging by keeping option for more verbose (#323) (8b437ad)

Documentation

Fixed typo in v2 example v2 (#378) (911c3bd)
Add automatic generation of CLI reference (#325) (ca8524e)
Add architecture outline (#341) (25fd149)
Fix parameter in usage.md (#332) (835e077)

v2.5.2 - 2024-11-13
Fix

Skip glm model downloads (#322) (c9341bf)

v2.5.1 - 2024-11-12
Fix

Handling of single-cell tables in DOCX backend (#314) (fb8ba86)

Documentation

Hybrid RAG with Qdrant (#312) (7f5d35e)
Add Data Prep Kit integration (#316) (93fc1be)

v2.5.0 - 2024-11-12
Feature

OCR: Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning (#290) (c6b3763)

Fix

Configure env prefix for docling settings (#315) (5d4a10b)
Added handling of grouped elements in pptx backend (#307) (81c8243)
Allow mps usage for easyocr (#286) (97f214e)

Documentation

Add navigation indices (#305) (1239ade)

v2.4.2 - 2024-11-08
Fix

EasyOcrModel: Support the use_gpu pipeline parameter in EasyOcrModel. Initialize easyocr (#282) (0eb065e)

v2.4.1 - 2024-11-08
Fix

tesserocr: Raise Exception if tesserocr has not loaded any languages (#279) (704d792)
Dockerfile example copy command (#234) (90836db)

Documentation

Update badges & credits (#248) (a84ec27)
Add coming-soon section (#235) (5ce02c5)
Add artifacts-path param to CLI (#233) (d5e65ae)

v2.4.0 - 2024-11-04
Feature

Pdf backend, table mode as options and artifacts path (#203) (40ad987)

Documentation

Add explicit artifacts path example (#224) (eeee3b4)
Update custom convert and dockerfile (#226) (5f5fea9)
Correct spelling of 'individual' (#219) (41acaa9)
Update LlamaIndex docs (#196) (244ca69)

v2.3.1 - 2024-10-30
Fix

Simplify torch dependencies and update pinned docling deps (#190) (eb679cc)
Allow to explicitly initialize the pipeline (#189) (904d24d)

v2.3.0 - 2024-10-30
Feature

Add pipeline timings and toggle visualization, establish debug settings (#183) (2a2c65b)

Fix

Fix duplicate title and heading + add e2e tests for html and docx (#186) (f542460)

v2.2.1 - 2024-10-28
Fix

Fix header levels for DOCX & HTML (#184) (b9f5c74)
Handling of long sequence of unescaped underscore chars in markdown (#173) (94d0729)
HTML backend, fixes for Lists and nested texts (#180) (7d19418)
MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) (88c1673)

Documentation

Update LlamaIndex docs for Docling v2 (#182) (2cece27)
Fix batch convert (#177) (189d3c2)
Add export with embedded images (#175) (8d356aa)

v2.2.0 - 2024-10-23
Feature

Update to docling-parse v2 without history (#170) (4116819)
Support AsciiDoc and Markdown input format (#168) (3023f18)

Fix

Set valid=false for invalid backends (#171) (3496b48)

v2.1.0 - 2024-10-18
Feature

Add coverage_threshold to skip OCR for small images (#161) (b346faf)

Fix

Fix legacy doc ref (#162) (63bef59)

Documentation

Typo fix (#155) (f799e77)
Add graphical band in readme (#154) (034a411)
Add use docling (#150) (61c092f)

v2.0.0 - 2024-10-16
Feature

Docling v2 (#117) (7d3be0e)

Breaking

Docling v2 (#117) (7d3be0e)

Documentation

Introduce docs site (#141) (d504432)

v1.20.0 - 2024-10-11
Feature

New experimental docling-parse v2 backend (#131) (5e4944f)

v1.19.1 - 2024-10-11
Fix

Remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138) (dae2a3b)

Documentation

Simplify LlamaIndex example using Docling extension (#135) (5f1bd9e)

v1.19.0 - 2024-10-08
Feature

Add options for choosing OCR engines (#118) (f96ea86)

v1.18.0 - 2024-10-03
Feature

New torch-based docling models (#120) (2422f70)

v1.17.0 - 2024-10-03
Feature

Windows support (#122) (d44c62d)

v1.16.1 - 2024-09-27
Fix

Allow usage of opencv 4.6.x (#110) (34bd887)

Documentation

Document chunking (#111) (c05b692)

v1.16.0 - 2024-09-27
Feature

Support tableformer model choice (#90) (d6df76f)

v1.15.0 - 2024-09-24
Feature

Add figure in markdown (#98) (6a03c20)

v1.14.0 - 2024-09-24
Feature

Add URL support to CLI (#99) (3c46e42)

Fix

Fix OCR setting for pypdfium, minor refactor (#102) (d96b96c)

Documentation

Document CLI, minor README revamp (#100) (f8f2303)

v1.13.1 - 2024-09-23
Fix

Updated the render_as_doctags with the new arguments from docling-core (#93) (4794ce4)

v1.13.0 - 2024-09-18
Feature

Add table exports (#86) (f19bd43)

Fix

Bumped the glm version and adjusted the tests (#83) (442443a)

Documentation

Updated Docling logo.png with transparent background (#88) (0da7519)

v1.12.2 - 2024-09-17
Fix

tests: Adjust the test data to match the new version of LayoutPredictor (#82) (fa9699f)

v1.12.1 - 2024-09-16
Fix

CLI compatibility with python 3.10 and 3.11 (#79) (2870fdc)

v1.12.0 - 2024-09-13
Feature

Add docling cli (#75) (9899078)

Documentation

Showcase RAG with LlamaIndex and LangChain (#71) (53569a1)

v1.11.0 - 2024-09-10
Feature

Adding txt and doctags output (#68) (bdfdfbf)

v1.10.0 - 2024-09-10
Feature

Linux arm64 support and reducing dependencies (#69) (27a7a15)

v1.9.0 - 2024-09-03
Feature

Export document pages as multimodal output (#54) (1de2e4f)

Documentation

Update MAINTAINERS.md (#59) (69e5d95)
Mention quackling on README (#58) (85b7348)

v1.8.5 - 2024-08-30
Fix

Add unit tests (#51) (48f4d1b)

v1.8.4 - 2024-08-30
Fix

Propagate row_section in tables (#57) (de85e46)

Documentation

Add instructions for cpu-only installation (#56) (a8a60d5)

v1.8.3 - 2024-08-28
Fix

Table cells overlap and model warnings (#53) (f49ee82)

v1.8.2 - 2024-08-27
Fix

Refine conversion result (#52) (e46a66a)

Documentation

Update interface in README (#50) (fe817b1)

v1.8.1 - 2024-08-26
Fix

Align output formats (#49) (8cc147b)

v1.8.0 - 2024-08-23
Feature

Page-level error reporting from PDF backend, introduce PARTIAL_SUCCESS status (#47) (a294b7e)

v1.7.1 - 2024-08-23
Fix

Better raise exception when a page fails to parse (#46) (8808463)
Upgrade docling-parse to 1.1.1, safety checks for failed parse on pages (#45) (7e84533)

v1.7.0 - 2024-08-22
Feature

Upgrade docling-parse PDF backend and interface to use page-by-page parsing (#44) (a8c6b29)

v1.6.3 - 2024-08-22
Fix

Usage of bytesio with docling-parse (#43) (fac5745)

v1.6.2 - 2024-08-22
Fix

Remove [ocr] extra to fix wheel install (#42) (6995268)

v1.6.1 - 2024-08-21
Fix

Add scipy as dependency (#40) (f19871a)

v1.6.0 - 2024-08-20
Feature

Add adaptive OCR, factor out treatment of OCR areas and cell filtering (#38) (e94d317)

v1.5.0 - 2024-08-20
Feature

Allow computing page images on-demand with scale and cache them (#36) (78347bf)

Documentation

Add technical paper ref (#37) (a13114b)

v1.4.0 - 2024-08-14
Feature

Update parser with bytesio interface and set as new default backend (#32) (90dd676)

Fix

Allow newer torch versions (#34) (349b0e9)

v1.3.0 - 2024-08-12
Feature

Output page images and extracted bbox (#31) (63d80ed)

v1.2.1 - 2024-08-07
Fix

Update (vuln) deps (#29) (79ef8d2)
Type of path_or_stream in PdfDocumentBackend (#28) (794b20a)

Documentation

Improve examples (#27) (9550db8)

v1.2.0 - 2024-08-07
Feature

Introducing docling_backend (#26) (b8f5e38)

v1.1.2 - 2024-07-31
Fix

Set page number using 1-based indexing (#22) (d2d9543)

v1.1.1 - 2024-07-30
Fix

Correct text extraction for table cells (#21) (f4bf3d2)

v1.1.0 - 2024-07-26
Feature

Add simplified single-doc conversion (#20) (d603137)

v1.0.2 - 2024-07-24
Fix

Add easyocr to main deps for valid extra (#19) (54b3dda)

v1.0.1 - 2024-07-24
Fix

Expose ocr as extra (#18) (b0725e0)

v1.0.0 - 2024-07-18
Feature

V1.0.0 release (#16) (71c3a9c)

Breaking

v1.0.0 release (#16) (71c3a9c)

v0.4.0 - 2024-07-17
Feature

Optimize table extraction quality, add configuration options (#11) (e9526bb)

v0.3.1 - 2024-07-17
Fix

Missing type for default values (#12) (d1d1724)

Documentation

Reflect supported Python versions, add badges (#10) (2baa35c)

v0.3.0 - 2024-07-17
Feature

Enable python 3.12 support by updating glm (#8) (fb72688)

Documentation

Add setup with pypi to Readme (#7) (2803222)

v0.2.0 - 2024-07-16
Feature

Build with ci (#6) (b1479cf)
