Smeet Agrawal 101233ebe2 fix(latex): fully unwrap deeply nested formatting macros (#3249)
* fix(latex): fully unwrap deeply nested formatting macros

Two related bugs when formatting macros are nested:

1. `\textcolor{color}{...}` extracted the color name alongside the
   text content because `_nodes_to_text` fell through to the generic
   else branch, which concatenates all arguments. E.g.
   `\section{\textcolor{blue}{\textbf{[SEP]}}}` produced heading text
   "blue [SEP]" instead of "[SEP]".

2. `\textsc`, `\textsf`, `\textrm`, `\textnormal`, `\mbox` and
   `\textcolor`/`\colorbox` are listed in MACROS_STRUCTURAL, so when one
   appeared mid-sentence, `_process_macro_node_inline` flushed the
   text buffer and called `_process_macro`, which creates a new doc
   node. This split inline paragraphs into fragments.

Fix:
- Add explicit handlers for MACROS_TEXT_STYLE and textcolor/colorbox
  in `_process_macro_node_inline` (before the MACROS_STRUCTURAL flush
  path) so they are accumulated inline like MACROS_TEXT_FORMATTING.
- Add matching handlers in `_nodes_to_text` so colour names are
  skipped and only the text-content argument is returned.
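
Illustration of bug 1 and its fix, as a minimal sketch rather than the
backend code. `Text`, `Macro`, `nodes_to_text`, `skip_color_name` and the
two sets are hypothetical stand-ins for the parser's node types and for
the real constants, whose exact contents are not shown here; joining
arguments with a space is likewise an assumption chosen to reproduce the
"blue [SEP]" symptom.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical stand-ins for the parser's node types (not the real classes).
@dataclass
class Text:
    chars: str


@dataclass
class Macro:
    name: str
    # One list of child nodes per macro argument.
    args: List[List["Node"]] = field(default_factory=list)


Node = Union[Text, Macro]

# Illustrative sets only; the real classifications live in constants.py.
TEXT_WRAPPERS = {"textbf", "textit", "textsc", "textsf", "textrm", "textnormal", "mbox"}
COLOR_WRAPPERS = {"textcolor", "colorbox"}


def nodes_to_text(nodes: List[Node], skip_color_name: bool = True) -> str:
    """Flatten nodes to plain text, unwrapping nested formatting macros."""
    parts: List[str] = []
    for node in nodes:
        if isinstance(node, Text):
            parts.append(node.chars)
        elif skip_color_name and node.name in COLOR_WRAPPERS:
            # The colour name is the first argument; only the last one is text.
            if node.args:
                parts.append(nodes_to_text(node.args[-1], skip_color_name))
        else:
            # Generic path (also used for TEXT_WRAPPERS): join every argument.
            texts = [nodes_to_text(a, skip_color_name) for a in node.args]
            parts.append(" ".join(t for t in texts if t))
    return "".join(parts)


# \section{\textcolor{blue}{\textbf{[SEP]}}}
heading = Macro(
    "section",
    [[Macro("textcolor", [[Text("blue")], [Macro("textbf", [[Text("[SEP]")]])]])]],
)
fixed = nodes_to_text([heading])
buggy = nodes_to_text([heading], skip_color_name=False)
```

With the colour branch disabled, `\textcolor` falls through to the generic
path and the colour name leaks into the heading text.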

Fixes #3207

Signed-off-by: Smeet Agrawal <smeetagrawal23@gmail.com>
Signed-off-by: Smeet23 <smeetagrawal2003@gmail.com>

* fix(latex): fully unwrap deeply nested formatting macros

Two related bugs when formatting macros are nested inside each other:

1. `\textcolor{color}{...}` extracted the color name alongside the
   text content because `_nodes_to_text` fell through to the generic
   else branch, which concatenates all arguments. E.g.
   `\section{\textcolor{blue}{\textbf{[SEP]}}}` produced heading text
   "blue [SEP]" instead of "[SEP]".

2. `\textsc`, `\textsf`, `\textrm`, `\textnormal`, `\mbox` and
   `\textcolor`/`\colorbox` are listed in MACROS_STRUCTURAL, so when one
   appeared mid-sentence, `_process_macro_node_inline` flushed the
   text buffer and called `_process_macro`, which creates a new doc
   node. This split inline paragraphs into fragments.

Fix:
- Add MACROS_COLOR_INLINE constant for textcolor/colorbox to keep
  all macro classifications in one place (constants.py).
- Add explicit handlers for MACROS_TEXT_STYLE and MACROS_COLOR_INLINE
  in `_process_macro_node_inline` (before the MACROS_STRUCTURAL flush
  path) so they are accumulated inline like MACROS_TEXT_FORMATTING.
- Merge the identical MACROS_TEXT_FORMATTING and MACROS_TEXT_STYLE
  branches in `_nodes_to_text` into a single branch.
- Use argnlist[-1] instead of reversed() iteration for
  MACROS_COLOR_INLINE since the text content is always the last arg,
  consistent with _extract_macro_arg.
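
Why the handler order matters for bug 2, as a rough sketch under stated
assumptions. The set contents, `process_paragraph`, `inline_text` and the
`inline_first` switch are hypothetical, and nodes are modelled as plain
strings or `(name, args)` tuples rather than the parser's real node types.

```python
# Illustrative contents only; the real sets are defined in constants.py.
MACROS_TEXT_STYLE = {"textsc", "textsf", "textrm", "textnormal", "mbox"}
MACROS_COLOR_INLINE = {"textcolor", "colorbox"}
MACROS_STRUCTURAL = {"section"} | MACROS_TEXT_STYLE | MACROS_COLOR_INLINE


def inline_text(node):
    """Return the text of a node, taking only the last macro argument."""
    if isinstance(node, str):
        return node
    _name, args = node
    # The text content is assumed to be the last argument (args[-1]).
    return "".join(inline_text(n) for n in args[-1]) if args else ""


def process_paragraph(nodes, inline_first=True):
    """Turn a node list into (label, text) items, buffering inline text."""
    items, buffer = [], []

    def flush():
        if buffer:
            items.append(("text", "".join(buffer)))
            buffer.clear()

    for node in nodes:
        if isinstance(node, str):
            buffer.append(node)
            continue
        name, _args = node
        if inline_first and name in MACROS_TEXT_STYLE | MACROS_COLOR_INLINE:
            # Checked before the structural path: stay in the current paragraph.
            buffer.append(inline_text(node))
        elif name in MACROS_STRUCTURAL:
            # Structural path: flush pending text and emit a separate item.
            flush()
            items.append((name, inline_text(node)))
        else:
            buffer.append(inline_text(node))
    flush()
    return items


# See \textcolor{red}{\textsc{Note}} for details.
sentence = ["See ", ("textcolor", [["red"], [("textsc", [["Note"]])]]), " for details."]
after_fix = process_paragraph(sentence)
before_fix = process_paragraph(sentence, inline_first=False)
```

Because these macro names also appear in the structural set, the inline
check has to run first; otherwise the sentence is split into three items.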

Fixes #3207

Signed-off-by: Smeet Agrawal <smeetagrawal23@gmail.com>
Signed-off-by: Smeet23 <smeetagrawal2003@gmail.com>

* refactor(latex): extract _macro_node_to_text to reduce complexity

Split the macro-handling branch of `_nodes_to_text` into a dedicated
`_macro_node_to_text` helper so that cyclomatic complexity stays within
the ruff C901 limit (was 31, now < 30 for both methods).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Smeet23 <smeetagrawal2003@gmail.com>

* fix(latex): migrate nested-formatting test to tests/test_latex/

Upstream reorganised all latex tests from tests/test_backend_latex.py
into tests/test_latex/. Move test_latex_nested_formatting_macros to
tests/test_latex/test_macros.py and fix ruff-reported style nits.

Signed-off-by: Smeet23 <smeetagrawal2003@gmail.com>

---------

Signed-off-by: Smeet Agrawal <smeetagrawal23@gmail.com>
Signed-off-by: Smeet23 <smeetagrawal2003@gmail.com>
Co-authored-by: Smeet Agrawal <smeetagrawal23@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 09:21:44 +02:00