mirror of
https://github.com/docling-project/docling.git
synced 2026-05-17 13:10:38 +00:00
eceedc2f40
* feat(latex): add asynchronous TikZ rendering via Tectonic engine

  This commit introduces a high-performance, asynchronous pipeline for rendering TikZ diagrams into images during LaTeX document conversion.

  Key Changes:
  - Tectonic Integration (`TectonicEngine`): Compiles `tikzpicture` environments into PDFs using Tectonic, auto-downloading the binary if missing. Rasterizes the PDF to 300 DPI images.
  - Asynchronous Processing: Utilizes a dynamic `ThreadPoolExecutor` (scaled to `os.cpu_count() - 1`) to render multiple diagrams concurrently without blocking the main document conversion pipeline.
  - Preamble Extraction: Dynamically parses the main document's preamble and injects it into standalone diagrams to ensure compatibility with complex libraries (e.g., `pgfgantt`, `tikz-cd`, `tkz-euclide`).
  - Graceful Fallbacks: If Tectonic compilation fails due to LaTeX syntax errors or incompatible packages, the engine gracefully falls back to preserving the raw TikZ source code as a `CodeMetaField` to prevent data loss.
  - CLI Support: Added `--tikz-engine tectonic` option to enable the backend configuration.

  Resolves pre-commit hooks (MyPy, Ruff linter/formatter).

  Signed-off-by: Aditya Sasidhar <arctic@arctic>
  Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* feat(latex): add optional Tectonic TikZ rendering with isolated dependency staging

  Add opt-in TikZ image rendering for the LaTeX backend using Tectonic, while preserving stable fallback behavior when rendering fails.

  What this changes:
  - add optional `tikz_engine="tectonic"` backend support for TikZ diagrams
  - render `tikzpicture` environments asynchronously during LaTeX parsing
  - preserve raw TikZ code as `PictureMeta.code` whenever rendering fails, times out, or rasterization cannot complete
  - add Tectonic engine options for:
    - automatic binary download
    - per-diagram timeout
    - shell escape control
  - make shell escape explicit opt-in via CLI/backend config
  - sanitize known pdfTeX-only assignment lines in preambles for better Tectonic/XeTeX compatibility
  - restore file-backed relative TikZ compatibility by staging only explicit local dependencies (`\input`, `\include`, `\includegraphics`) into the temp render directory
  - block dependency path traversal and avoid ambient source-directory search
  - rasterize generated PDFs with locking and crop whitespace from output

  CLI / config updates:
  - add `--tikz-engine` / `-T`
  - add `--no-tikz-engine-download`
  - add `--tikz-engine-timeout`
  - add `--tikz-shell-escape`

  Tests:
  - add focused Tectonic engine tests for download behavior, timeout, preamble sanitization, shell escape toggling, dependency staging, and path traversal blocking
  - add backend tests for TikZ fallback behavior and file-backed source-root handling

  Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* docs: add documentation for pipeline options

  Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* fix(latex): keep Tectonic engine within module boundaries

  Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* fix(latex): fix mypy error by narrowing backend option types

  Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* revert: drop unintended pyproject and uv lock updates

  Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* fix(latex): Got rid of the niche latex tikz based tectonic control flags

  Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

---------

Signed-off-by: Aditya Sasidhar <arctic@arctic>
Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
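
The dependency staging and path traversal blocking described in the second commit can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this change: the function name `stage_dependencies`, the regex `_DEP_RE`, and the parameters `source_root` and `render_dir` are hypothetical, and it assumes references are written with their full file names (no implicit `.tex` extension lookup).

```python
import re
import shutil
from pathlib import Path

# Hypothetical pattern: captures the path argument of \input{...},
# \include{...}, and \includegraphics[...]{...}.
_DEP_RE = re.compile(
    r"\\(?:input|include|includegraphics)\s*(?:\[[^\]]*\])?\s*\{([^}]+)\}"
)


def stage_dependencies(
    tikz_source: str, source_root: Path, render_dir: Path
) -> list[Path]:
    """Copy explicitly referenced local files into render_dir.

    Only files named in the TikZ source are copied, and any reference
    that resolves outside source_root is skipped.
    """
    root = source_root.resolve()
    staged: list[Path] = []
    for match in _DEP_RE.finditer(tikz_source):
        rel = match.group(1).strip()
        candidate = (root / rel).resolve()
        # Block traversal: "../x" and absolute paths resolve outside root.
        if not candidate.is_relative_to(root):
            continue
        # Assumption: missing files are ignored rather than raising.
        if not candidate.is_file():
            continue
        # Preserve the relative layout so the original reference still works.
        target = render_dir / candidate.relative_to(root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(candidate, target)
        staged.append(target)
    return staged
```

Because only the matched files are copied into the temporary render directory, the compiler never sees the rest of the source directory. Rejected or missing references are skipped silently in this sketch; the commit message states only that traversal is blocked, so how the actual backend reports such cases is not shown here.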