% Mirror of https://github.com/docling-project/docling.git (synced 2026-05-17 13:10:38 +00:00, commit e6ccb8b2c1)
% Commit: feat: added support for parsing LaTeX (.tex) documents. Signed-off-by: Aditya Sasidhar <telikicherlaadityasasidhar@gmail.com>
\documentclass{article}
\begin{document}

\section{Math Examples}\label{sec:math}

Inline math: $E = mc^2$

Display math:
$$
\int_0^\infty e^{-x^2} dx = \frac{\sqrt{\pi}}{2}
$$
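As a brief sketch of why this value holds (assuming the standard polar-coordinates argument, which this example does not itself state), write $I = \int_0^\infty e^{-x^2}\,dx$ and square it:
\[
I^2 = \int_0^\infty\!\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy
    = \int_0^{\pi/2}\!\int_0^\infty e^{-r^2}\,r\,dr\,d\theta
    = \frac{\pi}{2}\cdot\frac{1}{2}
    = \frac{\pi}{4},
\]
so $I = \sqrt{\pi}/2$, taking the positive root because the integrand is positive.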
\begin{equation}
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}
\end{equation}

\section{Table Example}

\begin{table}[h]
\begin{tabular}{|c|c|c|}
\hline
Name & Age & City \\
\hline
Alice & 25 & Boston \\
Bob & 30 & Seattle \\
\hline
\end{tabular}
\caption{Sample table}
\end{table}

\section{References}
See~\cite{smith2020} for more details. Also refer to Section~\ref{sec:math}.

\end{document}