v2.75.0 - 2026-05-12

Feature

Fix

  • DocLang: Fix chemistry serialization (#607) (48c5b97)

Documentation

  • security: Document security processes (#606) (6a6512e)

v2.74.1 - 2026-04-22

Fix

v2.74.0 - 2026-04-17

Feature

  • serializer: Add MsExcelMarkdownDocSerializer for sheet-name headings (#587) (9dc882d)
  • DocChunk expansion (#549) (f2a6186)

Fix

  • DocLang: Fix chemistry serialization (#584) (b72af12)
  • Prevent numeric precision loss in Markdown table serialization (#588) (6cbdee9)

v2.73.0 - 2026-04-09

Feature

  • outline: Extend OutlineDocSerializer with filtering capabilities (#580) (18f5738)
  • Add LaTeX and TikZ as codelabels (#579) (46a9b5a)

Documentation

v2.72.0 - 2026-04-07

Feature

v2.71.0 - 2026-03-30

Feature

Fix

  • Doclang: Improve checkbox serialization & deserialization (#570) (c9b5152)
  • Doclang: Fix serialization order in text items (#571) (a1535bc)
  • Extend validation to address duplicate refs (#565) (0cfb663)
  • Doclang: Fix group serialization (#566) (159eb8f)
  • Repair table children when rich table cells break hierarchy (#563) (b65dd24)

v2.70.2 - 2026-03-20

Fix

  • Doclang: Suppress empty elements in Doclang serialization (#554) (91ee7e2)
  • Expose traverse_pictures in export_to_markdown and export_to_text (#557) (3e030ed)
  • Sync picture classification enums with DocumentFigureClassifier-v2.0 model (#529) (f97ec83)

v2.70.1 - 2026-03-17

Fix

  • markdown: Remove assert statements to support Python optimization mode (#548) (0a3b278)
  • Improve rich table cell validation (#550) (c57e50a)

v2.70.0 - 2026-03-13

Feature

  • Introduce field data model incl. Doclang serialization (#519) (b93d5a3)
  • Make an experimental outline serializer (#415) (8d7859e)
  • Profile a document or collection (#511) (af50f1c)
  • Split html table to headers and body (#532) (b435090)
  • Handle wide table outliers with LineBasedTokenChunker (#536) (e00125c)

v2.69.0 - 2026-03-09

Feature

  • Loosen dependency version constraints (#534) (4eb0d20)

v2.68.0 - 2026-03-07

Feature

Fix

  • Prevent infinite loop in LineBasedTokenChunker with unbreakable tokens (#533) (a661bb1)

v2.67.1 - 2026-03-05

Fix

  • Prevent hang in export_to_markdown() on nested RichTableCells (#525) (2debe08)

v2.67.0 - 2026-03-04

Feature

v2.66.0 - 2026-02-26

Feature

  • Add WebVTT export and save functionality (#523) (b8ef7ba)

Fix

  • Rich table triplet serialization (#425) (c566268)
  • Support single-column table default serialization (#526) (73b0757)

v2.65.2 - 2026-02-23

Fix

  • Accept relative URIs in PdfHyperlink without validation failure (#520) (6032c7c)
  • Shift KV/Form graph cell page numbers during DoclingDocument.concatenate (#521) (6a04db7)
  • chunker: Propagate 'traverse_pictures' parameter to chunker (#518) (a3b6e3f)

v2.65.1 - 2026-02-13

Fix

v2.65.0 - 2026-02-13

Feature

Fix

v2.64.0 - 2026-02-09

Feature

Fix

  • Doclang: Fix image URI serialization (#504) (193c25f)
  • DocTags: Fix deserialization to populate picture meta fields (#505) (8005892)

v2.63.0 - 2026-02-03

Feature

Fix

  • serialization: Add 'traverse_pictures' parameter to serializers (#501) (04cf44b)
  • DocTags: Fix picture classification deserialization (#500) (de2b729)
  • Doclang: Fix checkbox serialization (#503) (1d8b78c)

v2.62.0 - 2026-01-30

Feature

Fix

  • html: Visualize picture meta as html collapsible (#497) (fd27df1)
  • markdown: Add an option to compact table serialization (#495) (3b0b909)
  • IDocTags: Fix default location resolution handling (#492) (549a2f1)

v2.61.0 - 2026-01-26

Feature

  • Added parameter to get_row_bounding_boxes and get_column_bounding_boxes (#490) (577a1a7)
  • IDocTags: Add content wrapping for handling whitespace (#489) (fdcdfd1)

Fix

  • IDocTags: Align code labels with Linguist (#484) (e5c0015)

v2.60.2 - 2026-01-23

Fix

  • Drop Python 3.9 and pin built tree-sitter versions (#487) (01373c4)

v2.60.1 - 2026-01-22

Fix

  • serialization: Escape pipe symbol in single cell md serialization (#485) (334575c)

v2.60.0 - 2026-01-20

Feature

  • IDocTags: Add fine-grained content serialization filtering (#476) (17bd21e)

Fix

  • Fix transparency rendering in all visualizers (#481) (a6d2be6)
  • IDocTags: Fix InlineGroup serialization and deserialization (#477) (d9e8d37)

v2.59.0 - 2026-01-12

Feature

Fix

  • Make tree-sitter-java-orchard optional (#475) (980abab)

v2.58.1 - 2026-01-09

Fix

  • deps: Switch to tree-sitter-java-orchard and expand typer compatibility (#473) (8c330ab)

v2.58.0 - 2026-01-08

Feature

  • DocItem: Add comments field for linking annotations to document… (#465) (1d2e0c7)

Fix

  • Skip annotation migration if respective meta present (#469) (51a01a6)

v2.57.0 - 2025-12-18

Feature

  • Enable heading-only chunks for empty-section headings (#461) (8b082e9)

v2.56.0 - 2025-12-17

Feature

  • Idoctags serialization and deserialization matching the iso proposal (#457) (dda9c88)

v2.55.0 - 2025-12-10

Feature

v2.54.1 - 2025-12-08

Fix

  • Switch meta migration logging to info (#452) (f8e09ec)

Documentation

  • Minor update to add extras to reflect main readme. (#448) (b522061)

v2.54.0 - 2025-11-29

Feature

v2.53.0 - 2025-11-27

Feature

  • experimental: Extend IDocTags tokens (#439) (aa5c668)
  • Added the Azure Document Intelligence (#395) (92d60b0)

Fix

v2.52.0 - 2025-11-20

Feature

v2.51.1 - 2025-11-14

Fix

  • Improve meta migration (#422) (bc0e96b)
  • DoclingDocument model validator should deal with any raw input (#419) (56b3c42)

v2.51.0 - 2025-11-12

Feature

Fix

  • Improve meta migration and warning handling (#417) (3d13b02)
  • Fix import handling of extra dependencies for chunking (#418) (567d3ad)

v2.50.1 - 2025-11-04

Fix

v2.50.0 - 2025-10-30

Feature

v2.49.0 - 2025-10-16

Feature

v2.48.4 - 2025-10-01

Fix

v2.48.3 - 2025-09-29

Fix

v2.48.2 - 2025-09-22

Fix

  • Expose escape_html param to DoclingDocument md serialization (#388) (dd6ebc3)

v2.48.1 - 2025-09-11

Fix

  • markdown: Fix single-row table serialization (#385) (9df7208)

v2.48.0 - 2025-09-09

Feature

  • Introduction of fillable TableCell (#384) (b13267f)
  • Add support for heading with inline in HTML & DocTags (#379) (b60ac19)

Fix

  • Add doc param to all export_to_dataframe() calls (#380) (0512f44)
  • Fix handling of generic groups in rich table cells (#383) (2dc57c1)

v2.47.0 - 2025-09-02

Feature

  • Add page filtering to DoclingDocument (#378) (9dc526d)

v2.46.0 - 2025-09-01

Feature

Fix

Performance

  • Cache grid property in HTMLTableSerializer (#373) (339bbd4)

v2.45.0 - 2025-08-20

Feature

Fix

  • Add forward slashes to singleton tags (#369) (23badf2)

v2.44.2 - 2025-08-14

Fix

  • HTML: Fix nested list serialization edge cases (#367) (807d972)

v2.44.1 - 2025-07-30

Fix

  • Referenced artifacts relative to the document location (#361) (5afa99e)

v2.44.0 - 2025-07-28

Feature

v2.43.1 - 2025-07-23

Fix

  • LayoutVisualizer should traverse pictures (#358) (f9b3b49)
  • HTML serialization of nested lists (#359) (5a7883c)

v2.43.0 - 2025-07-16

Feature

Fix

v2.42.0 - 2025-07-09

Feature

  • Extend and expose float serialization control (#353) (c339171)
  • Additional DoclingDocument methods for use in MCP document manipulation (#344) (cb59fd3)

v2.41.0 - 2025-07-09

Feature

  • Enable precision control in float serialization (#352) (baa2cc3)

v2.40.0 - 2025-07-02

Feature

Fix

  • BoundingRectangle angle computation when in CoordOrigin.TOPLEFT (#347) (9fa0c9f)

v2.39.0 - 2025-06-27

Feature

  • Remodel lists, add MD & HTML ser. params, enable unset marker (#339) (14a4fde)
  • Download Google docs and drive files via export url (#335) (3eeb259)

v2.38.2 - 2025-06-25

Fix

  • Add missing mimetypes for asr inputs (#341) (c2fd20f)
  • Add text direction to export_to_textlines (#338) (425b191)

v2.38.1 - 2025-06-20

Fix

  • markdown: Add heading formatting, fix code & formula formatting (#336) (c9374e8)

v2.38.0 - 2025-06-18

Feature

  • viz: Add reading order branch numbering, fix cross-page lists (#334) (78b7962)
  • Add parameter to choose which pages to export as doctags (#290) (0fd3c1c)

Fix

  • Expose base types consistently (#332) (2e14a74)
  • HybridChunker: Improve long heading handling (#333) (5c99722)

v2.37.0 - 2025-06-13

Feature

  • Add improved table serializer and visualizer (#328) (3b99879)

v2.36.0 - 2025-06-11

Feature

v2.35.0 - 2025-06-11

Feature

v2.34.2 - 2025-06-10

Fix

v2.34.1 - 2025-06-08

Fix

  • Warn when adding misplaced ListItem via API (#321) (01b27b5)

v2.34.0 - 2025-06-06

Feature

Fix

v2.33.1 - 2025-06-04

Fix

  • New typer version with new click (#315) (e17eabf)
  • Support section_header levels in doctags deserialization (#313) (defd49e)

v2.33.0 - 2025-06-02

Feature

  • Add BoundingBox methods for overlap and union calculations (#311) (c521766)

v2.32.0 - 2025-05-27

Feature

  • Add annotations in MD & HTML serialization (#295) (f067c51)

Fix

  • HybridChunker: Refine max_tokens auto-detection (#306) (87b72d6)

v2.31.2 - 2025-05-22

Fix

v2.31.1 - 2025-05-20

Fix

  • markdown: Fix case of empty page break string (#298) (c49a50e)

v2.31.0 - 2025-05-18

Feature

  • Provide visualizer option in HTML split view (#294) (6a7eb53)

v2.30.1 - 2025-05-14

Fix

  • Updates for labels and methods to support document GT annotation (#293) (aa957cf)

v2.30.0 - 2025-05-06

Feature

Fix

  • Add unit flags to SegmentedPage (#286) (ad88ecf)
  • Update deserialization for better recovery (#282) (511fb98)
  • Include captions regardless of traverse_pictures flag (#278) (7eb9fa9)
  • Hashlib usage for FIPS (#280) (4b967ab)

v2.29.0 - 2025-05-01

Feature

Fix

  • Fix multi-provenance item visualization (#277) (8677d6e)
  • Added return value for crop_text method in segmentedPdfPage Class (#275) (591fe59)
  • Make load_from_doctags method static (#273) (8f85d05)

v2.28.1 - 2025-04-25

Fix

v2.28.0 - 2025-04-23

Feature

v2.27.0 - 2025-04-16

Feature

  • Chart tabular data serialization for HTML serializer (#258) (caa8aee)

Fix

v2.26.4 - 2025-04-14

Fix

  • Fix page breaking in case page starts with group (#253) (928e5c5)

v2.26.3 - 2025-04-14

Fix

v2.26.2 - 2025-04-14

Fix

  • Fix code handling in HTML serialization (#251) (15d2f2c)

v2.26.1 - 2025-04-11

Performance

v2.26.0 - 2025-04-11

Feature

  • Add HTML serializer (#232) (5d40600)
  • Add serializer provider to chunkers (#239) (23036e1)
  • Integrate serialization API into chunkers (#221) (5e4c0fd)
  • Expose page number in Serialization API (#238) (73b9941)
  • Markdown chart serializer (picture+table) (#235) (0482bac)
  • Support of DocTags charts (serialization and deserialization) (#229) (e9259a5)
  • Added initial delete and insert methods in DoclingDocument (#220) (f2fe1c1)

Fix

  • Fix page filtering issue (#247) (ab78e0b)
  • Propagate HTMLOutputStyle properly through (#246) (587e67f)
  • Better BoundingRectangle.angle and BoundingRectangle.angle_360 computation (#237) (055742c)
  • DocTags import location fix for tables, pictures, captions (#227) (a055e1a)

Performance

v2.25.0 - 2025-03-31

Feature

  • Allow images in doctags deserializer to be optional and support multipage (#225) (e0943d2)

Fix

v2.24.1 - 2025-03-28

Fix

  • Automatic transformation of output cells bbox coord origin defined by input in get_cells_in_bbox (#219) (8e0e9b7)

v2.24.0 - 2025-03-25

Feature

  • Expose MD page break & DocTags minification (#213) (ff13a93)
  • Add document tokens from key value items (#170) (db119f4)
  • Add DocTags serializers (#192) (1f4d57e)
  • Add kv_item support for doctag to docling_document (#188) (2371c11)

Fix

  • Enable caption serialization for all floating items (#216) (e1d0597)
  • Allow captions without holding item (#215) (2efb71a)
  • Add 'text/csv' mimetype to _extra_mimetypes type list (#210) (bc3f5d5)
  • Add handling for str filenames in save/load methods (#205) (75d94ab)
  • Markdown picture item export (#207) (510649e)
  • DocTags support of furniture (#209) (337ff74)

Performance

  • serialization: Cache excluded references (#214) (bcace5d)

v2.23.3 - 2025-03-19

Fix

  • markdown: Fix ordered list numbering (#200) (7ed4d22)

v2.23.2 - 2025-03-18

Fix

  • Add caption to the table in load_from_doctags (#197) (5cee486)

v2.23.1 - 2025-03-17

Fix

v2.23.0 - 2025-03-13

Feature

  • Add serializers, text formatting, update Markdown export (#182) (a7cdc87)
  • Add data model types from docling-parse (#186) (a86a4a3)

v2.22.0 - 2025-03-12

Feature

  • Add DoclingDocument.load_from_doctags method and DocTags data models (#187) (c065c4c)
  • Add document tokens for SMILES (#176) (32398b8)

v2.21.2 - 2025-03-06

Fix

  • Suppress warning for missing fallback case (#184) (ccde54a)
  • doctags: Fix code export (#181) (53f6d09)
  • markdown: Fix escaping in case of nesting (#180) (834db4b)
  • HybridChunker: Remove max_length from tokenization (#178) (419252c)

v2.21.1 - 2025-02-28

Fix

  • markdown: Fix handling of ordered lists (#175) (349f7da)

v2.21.0 - 2025-02-27

Feature

  • Add inline groups, revamp Markdown export incl. list groups (#156) (2abaf9b)

Fix

  • markdown: Fix case of leading list (#174) (c77c59b)
  • Properly handle missing page image case for export_to_html (#166) (4708f93)

v2.20.0 - 2025-02-19

Feature

v2.19.1 - 2025-02-17

Fix

  • Expose included_content_layers arg in export/save methods for MD+HTML (#164) (c46995b)

v2.19.0 - 2025-02-17

Feature

  • Redefine CodeItem as floating object with captions (#160) (916323f)
  • Implementation of doc tags (#138) (f751b45)

Fix

  • Document Tokens (doc tags) clean up, fix iterate_items for content_layer (#161) (58ed6c8)
  • Fix inheritance of CodeItem for backward compatibility (#162) (7267c3f)

v2.18.1 - 2025-02-13

Fix

v2.18.0 - 2025-02-10

Feature

  • Add ContentLayer attribute to designate items to body or furniture (#148) (786f0c6)

v2.17.2 - 2025-02-06

Fix

  • Define LTR/RTL text direction in HTML export (#152) (3cf31cb)

v2.17.1 - 2025-02-03

Fix

  • Image fallback for malformed equations (#149) (eb9b4b3)

v2.17.0 - 2025-02-03

Feature

  • HTML: Fallback showing formulas as images (#146) (23477f7)
  • HTML: Export formulas with mathml (#144) (ed36437)

Fix

  • Add html escape in md export and fix formula escapes (#143) (c6590e8)

v2.16.1 - 2025-01-30

Fix

v2.16.0 - 2025-01-29

Feature

  • Escape underscores that are within latex equations (#137) (0d5cd11)
  • Add escaping_underscores option to markdown export (#135) (c9739b2)
  • Added the geometric operations to BoundingBox (#136) (f02bbae)

v2.15.1 - 2025-01-21

Fix

v2.15.0 - 2025-01-21

Feature

  • Add CodeItem as pydantic type, update export methods and APIs (#129) (c940aa5)

Fix

  • Fix hybrid chunker token constraint (#131) (b741eea)
  • Always return a new bbox when changing origin (#128) (841668f)

v2.14.0 - 2025-01-10

Feature

v2.13.1 - 2025-01-08

Fix

  • Restore proper string serialization of DocItemLabel (#124) (a52bb88)

v2.13.0 - 2025-01-08

Feature

  • Add mapping to colors into DocItemLabel (#123) (639f122)

Fix

  • Quote referenced URIs in markdown and html (#122) (127dd2f)

v2.12.1 - 2024-12-17

Fix

v2.12.0 - 2024-12-17

Feature

  • Added the new label comment_section in the groups (#114) (5101dd8)

Fix

  • Skip labels not included in the allow-list (#113) (d147c25)
  • Always write with utf8 encoding (#111) (268c294)

v2.11.0 - 2024-12-16

Feature

  • Add group labels for form and key-value areas (#110) (aeaf89d)

v2.10.0 - 2024-12-13

Feature

  • Add legacy to DoclingDocument utility (#108) (b31e0a3)
  • Add DoclingDocument viewer to CLI (#99) (9628d19)
  • Add default tokenizer to HybridChunker (#107) (2591c70)

Fix

v2.9.0 - 2024-12-09

Feature

  • Utilities converting document formats (#91) (437c498)

Fix

  • markdown: Preserve underscores in image URLs during markdown export (#98) (fd7529f)

v2.8.0 - 2024-12-06

Feature

v2.7.1 - 2024-12-06

Fix

v2.7.0 - 2024-12-04

Feature

  • Export to OTSL method for docling doc tables (#86) (180e294)

v2.6.1 - 2024-12-02

Fix

v2.6.0 - 2024-12-02

Feature

  • Extend source resolution with streams and workdir (#79) (9a74d13)
  • Simple method to load DoclingDocument from .json files (#71) (fc1cfb0)

Fix

  • Allow all url types in referenced exports (#82) (3bd83bc)
  • Even better style for HTML export (#78) (8422ad4)

v2.5.1 - 2024-11-27

Fix

  • Hotfix for TableItem.export_to_html args (#76) (ae2f131)
  • Artifacts dir double stem (#75) (f93332b)

v2.5.0 - 2024-11-27

Feature

  • Adding HTML export to DoclingDocument, adding export of images in png with links to Markdown & HTML (#69) (ef49fd3)

v2.4.1 - 2024-11-21

Fix

  • Temporarily force pydantic < 2.10 (#70) (289b629)

v2.4.0 - 2024-11-18

Feature

  • Add get_image for all DocItem (#67) (9d7e831)
  • Allow exporting a specific page to md. (#63) (1a201bc)

v2.3.2 - 2024-11-11

Fix

  • Fixed selection logic for a slice of the document (#66) (dfdc76b)

v2.3.1 - 2024-11-01

Fix

  • Include titles to chunk heading metadata (#62) (bfeb2db)

v2.3.0 - 2024-10-29

Feature

  • Added pydantic models to store charts data (pie, bar, stacked bar, line, scatter) (#52) (36b7bea)

v2.2.3 - 2024-10-29

Fix

  • Str representation of enum across Python versions (#60) (8528918)
  • Title for export to markdown and add text_width parameter (#59) (4993c34)

v2.2.2 - 2024-10-26

Fix

  • Fix non-string table cell handling in chunker (#58) (b5d07b2)

v2.2.1 - 2024-10-25

Fix

  • Escaping underscore characters in md export (#57) (c344d0f)

v2.2.0 - 2024-10-24

Feature

  • Add headers argument and a custom user-agents for http requests (#53) (44941b5)

Fix

  • Fix resolution in case of URL without path (#55) (2c88e56)

v2.1.0 - 2024-10-22

Feature

  • Improve markdown export of DoclingDocument (#50) (328778e)
  • Extend chunk meta with schema, version, origin (#49) (d09fe7e)

v2.0.1 - 2024-10-18

Fix

v2.0.0 - 2024-10-16

Feature

  • Expose DoclingDocument as main type, move old typing to legacy (#41) (03df97f)

Breaking

  • Expose DoclingDocument as main type, move old typing to legacy (#41) (03df97f)

v1.7.2 - 2024-10-09

Fix

v1.7.1 - 2024-10-07

Fix

  • Make doc metadata keys pure strings (#38) (246627f)
  • Align chunk ref format with one used in Document (#37) (b5592ad)

v1.7.0 - 2024-10-01

Feature

  • (experimental) introduce new document format (#21) (688789e)
  • Add doc metadata extractor and ID generator classes (#34) (b76780c)
  • Support heading as chunk metadata (#36) (4bde515)

v1.6.3 - 2024-09-26

Fix

  • Change order of JSON Schema to search mapper transformations (#32) (a4ddd14)

v1.6.2 - 2024-09-24

Fix

  • Remove duplicate captions in markdown (#31) (a334b9f)

v1.6.1 - 2024-09-24

Fix

  • Remove unnecessary package dependency (#30) (e706d68)

v1.6.0 - 2024-09-23

Feature

v1.5.0 - 2024-09-20

Feature

  • Add export to doctags for document components (#25) (891530f)
  • Add file source resolution utility (#22) (752cbc3)

v1.4.1 - 2024-09-18

Fix

v1.4.0 - 2024-09-18

Feature

v1.3.0 - 2024-09-11

Feature

v1.2.0 - 2024-09-10

Feature

v1.1.4 - 2024-09-06

Fix

  • Validate_model() could be called with other types rather than dict (#14) (235b2cd)

Documentation

v1.1.3 - 2024-08-28

Fix

  • Use same base type for all components (#10) (f450c8c)

v1.1.2 - 2024-07-31

Fix

  • Make page number strictly positive (#8) (ec3cff9)

v1.1.1 - 2024-07-23

Fix

Documentation

  • Revamp installation instructions (#6) (3f77b2e)

v1.1.0 - 2024-07-18

Feature

  • Add document Markdown export (#4) (d0ffc85)

v1.0.0 - 2024-07-17

Feature

Breaking

v0.0.1 - 2024-07-17

Fix

  • Fix definition issues in record type (#2) (656f563)