mirror of
https://github.com/docling-project/docling.git
synced 2026-05-17 13:10:38 +00:00
1192714b53
* Add a quick 30-second timeout for each file. This does seem exorbitant, but I'll have to go over the average parse time of large files and decide on an upper limit and an average limit. The next commit needs to ignore individual nodes instead of the whole file.

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* fix: bypass mypy attr-defined for parse_timeout in late options

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* feat(latex): SOTA improvements from pandoc (theorems, preamble metadata, math envs, bugfixes)

Features:
- Theorem/proof/lemma/corollary/definition/remark/example/conjecture environments
- Proof environment with conditional QED ◻ symbol
- \paragraph and \subparagraph as headings (levels 4, 5)
- \author, \date, \title extracted from preamble
- \href preserves URL as [text](url)
- \renewcommand and \providecommand macro extraction
- dmath/dgroup/darray/subequations math environments
- \input cycle detection with depth limit of 10
- quote/quotation/verse environment handling

Bugfixes:
- Fixed UnboundLocalError in _extract_custom_macros
- Fixed _extract_verbatim_content regex stealing content
- Fixed is_valid() rejecting preamble-only fragments
- Removed unused deepcopy import
- Unified recursion depth limits to 10

Tests:
- 7 new tests, 1 updated, ground-truth regenerated

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* Added some more dangerous macros to the ignore list; the is_valid() function now accepts \documentstyle too; added some essential and primitive layout passes

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* Removed the restrictive nature of the is_valid() function

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

* Added test coverage for the added features and removed the time-formatted parsing test, which caused a hang on Python 3.10 in the CI/CD testing pipeline

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>

---------

Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
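
Illustration of the "\input cycle detection with depth limit of 10" item above. This is only a minimal sketch of the general technique, not the code in this commit: the helper name `expand_inputs`, the in-memory `files` mapping, and the choice to silently drop a rejected reference are all assumptions made for the example.

```python
import re

MAX_INPUT_DEPTH = 10  # the limit named in the commit message
INPUT_RE = re.compile(r"\\input\{([^}]+)\}")


def expand_inputs(name, files, depth=0, active=()):
    """Inline \\input{...} references from an in-memory dict of sources.

    Hypothetical helper: `files` maps file names to LaTeX text. A reference
    is replaced with an empty string when it would re-enter a file that is
    already being expanded (a cycle), when the nesting depth exceeds
    MAX_INPUT_DEPTH, or when the file is unknown.
    """
    if depth > MAX_INPUT_DEPTH or name in active or name not in files:
        return ""
    # Track the chain of files currently being expanded, not every file ever
    # seen, so the same file may still be included twice in separate places.
    active = active + (name,)
    return INPUT_RE.sub(
        lambda m: expand_inputs(m.group(1).strip(), files, depth + 1, active),
        files[name],
    )
```

Tracking only the active inclusion chain means a shared file can be pulled in from several places, while a file that includes itself (directly or indirectly) is cut off. Whether the limit of 10 is inclusive is also an assumption here.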