* fix(pptx): skip malformed picture shapes instead of aborting conversion
MsPowerpointDocumentBackend._handle_pictures reads embedded image bytes via python-pptx's shape.image accessor. On PPTX files with slightly malformed <p:pic> shapes, shape.image can raise three exception types that the existing except (UnidentifiedImageError, OSError, ValueError) clause does not catch, so a single bad picture aborts conversion of the entire presentation:
- InvalidXmlError when <p:blipFill> is missing
- KeyError when <a:blip r:embed> points to an unknown relationship
- AttributeError when the embedded part's content-type isn't an image
These files open normally in Keynote and Google Drive, so the backend should handle them as gracefully as it already handles truncated or unreadable image payloads.
This follows the same pattern as #2914, which extended the same except tuple with ValueError to handle linked (external) image references. The three cases above are the remaining shape.image failure modes that still escape.
Extend the except tuple to cover the three cases and log the same warning used for other unreadable images, leaving the rest of the presentation to convert normally. Add a regression fixture with one malformed picture per failure mode plus a focused test.
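
A minimal sketch of the widened except tuple, not the backend's actual code. `FakePicture`, `extract_images`, and `SKIPPABLE` are hypothetical names, and `InvalidXmlError` is a local stand-in for `pptx.exc.InvalidXmlError` so the example runs without python-pptx installed. Pillow's UnidentifiedImageError is left out of the tuple for the same reason.

```python
import logging

_log = logging.getLogger(__name__)


class InvalidXmlError(Exception):
    """Local stand-in for pptx.exc.InvalidXmlError (assumption for this sketch)."""


class FakePicture:
    """Hypothetical picture shape whose `image` accessor may raise."""

    def __init__(self, name, blob=None, error=None):
        self.name = name
        self._blob = blob
        self._error = error

    @property
    def image(self):
        if self._error is not None:
            raise self._error
        return self._blob


# Previously handled cases plus the three failure modes listed above.
SKIPPABLE = (OSError, ValueError, InvalidXmlError, KeyError, AttributeError)


def extract_images(shapes):
    """Return readable image payloads, skipping shapes whose image cannot be read."""
    images = []
    for shape in shapes:
        try:
            images.append(shape.image)
        except SKIPPABLE as exc:
            # One bad picture is logged and skipped; the loop keeps going.
            _log.warning("Skipping unreadable image in %s: %r", shape.name, exc)
    return images
```

With one malformed shape per failure mode mixed in among valid ones, only the valid payloads are returned and the loop never aborts.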
Fixes #3371
Signed-off-by: pateltejas <tejas226@hotmail.com>
* refactor(pptx): use warnings.warn for malformed picture skips
Address PR review feedback: use Python's warnings module with UserWarning to signal the skip to callers instead of logging.Logger.warning, matching the pattern used in msword_backend for "Skipping external image reference". This makes the skip visible via standard warning filters and catchable in tests.
Update the regression test to assert the warning is emitted via pytest.warns, which also suppresses the message during the test run so it doesn't clutter suite output.
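
A rough illustration of the warnings-based skip and how a test can observe it. `read_image_or_warn` and `broken_reader` are hypothetical names, and the message text is illustrative. The sketch uses the standard library's `warnings.catch_warnings(record=True)` to stay dependency-free; in a pytest suite, `pytest.warns(UserWarning)` plays the same role.

```python
import warnings


def read_image_or_warn(read_image, shape_name):
    """Call read_image(); on a known failure emit a UserWarning and return None."""
    try:
        return read_image()
    except (OSError, ValueError, KeyError, AttributeError) as exc:
        warnings.warn(
            f"Skipping unreadable image in {shape_name}: {exc!r}",
            UserWarning,
            stacklevel=2,
        )
        return None


def broken_reader():
    # Simulates <a:blip r:embed> pointing to an unknown relationship.
    raise KeyError("rId9")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record the warning even if seen before
    result = read_image_or_warn(broken_reader, "Picture 3")

skip_messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
```

Because the skip is a regular Python warning, callers can silence it or turn it into an error with the usual warning filters.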
Signed-off-by: pateltejas <tejas226@hotmail.com>
---------
Signed-off-by: pateltejas <tejas226@hotmail.com>
* fix(pptx): handle NotImplementedError from shape.shape_type
python-pptx raises NotImplementedError from Shape.shape_type for
<p:sp> elements that aren't placeholders, autoshapes, textboxes, or
freeforms (e.g. shapes with empty <p:spPr> from Google Slides exports,
LibreOffice, or Keynote). handle_groups() and handle_shapes() access
shape_type without catching this, crashing the entire conversion.
Add a _safe_shape_type() helper that returns None on
NotImplementedError, so unrecognized shapes skip only the GROUP
recursion and PICTURE extraction while text and table extraction
proceed normally.
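
The helper described above could look roughly like this. This is a sketch under assumptions: `ShapeType` and `FakeShape` are stand-ins so the example runs without python-pptx, and `classify` is a hypothetical caller, not the backend's real dispatch logic.

```python
from enum import Enum


class ShapeType(Enum):
    """Stand-in for python-pptx's MSO_SHAPE_TYPE (values are illustrative)."""

    GROUP = 1
    PICTURE = 2


class FakeShape:
    """Hypothetical shape whose shape_type raises for unrecognized elements."""

    def __init__(self, shape_type=None, text=""):
        self._shape_type = shape_type
        self.text = text

    @property
    def shape_type(self):
        if self._shape_type is None:
            raise NotImplementedError("unrecognized shape type")
        return self._shape_type


def safe_shape_type(shape):
    """Return shape.shape_type, or None if the library cannot classify the shape."""
    try:
        return shape.shape_type
    except NotImplementedError:
        return None


def classify(shape):
    """Pick a handling path; unrecognized shapes fall through to text handling."""
    kind = safe_shape_type(shape)
    if kind is ShapeType.GROUP:
        return "group"
    if kind is ShapeType.PICTURE:
        return "picture"
    return "text"
```

Returning None rather than re-raising means comparisons against GROUP and PICTURE are simply false, so an unrecognized shape still reaches the text and table paths.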
Fixes #3308
Signed-off-by: Tejas Patel <tejas226@hotmail.com>
* Fix lint
Signed-off-by: Tejas Patel <tejas226@hotmail.com>
---------
Signed-off-by: Tejas Patel <tejas226@hotmail.com>
* fix(pptx): handle picture shapes with external image references
When processing PowerPoint files containing picture shapes that reference
external images (rather than embedded images), the python-pptx library
raises a ValueError("no embedded image") when accessing the `image`
property.
Previously, this caused the entire document conversion to fail because:
1. The `hasattr(shape, "image")` check at line 690 would trigger the
property getter, which raises ValueError (hasattr only catches
AttributeError, not ValueError)
2. The exception handler in `_handle_pictures()` only caught
UnidentifiedImageError and OSError, not ValueError
This fix:
- Removes the unnecessary hasattr check since we already verify the
shape type is MSO_SHAPE_TYPE.PICTURE
- Adds ValueError to the exception handler in `_handle_pictures()` so
that picture shapes with external references are gracefully skipped
with a warning instead of crashing the pipeline
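
The hasattr behavior described in point 1 can be reproduced in isolation. `LinkedPicture`, `NoImage`, and `probe` are hypothetical names for this demonstration and do not use python-pptx.

```python
class LinkedPicture:
    """Mimics a picture shape whose image is linked rather than embedded."""

    @property
    def image(self):
        raise ValueError("no embedded image")


class NoImage:
    """Mimics an object whose property getter raises AttributeError."""

    @property
    def image(self):
        raise AttributeError("image")


def probe(shape):
    """Report what hasattr(shape, 'image') does for the given shape."""
    try:
        # hasattr calls the getter and swallows only AttributeError.
        return hasattr(shape, "image")
    except ValueError as exc:
        return f"raised {exc}"
```

Here `probe(NoImage())` returns False, while `probe(LinkedPicture())` reports the ValueError that escaped hasattr. This is why the check could not act as a guard for linked images.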
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* DCO Remediation Commit for Sam Quigley <quigley@emerose.com>
I, Sam Quigley <quigley@emerose.com>, hereby add my Signed-off-by to this commit: e69779e07b
Signed-off-by: Sam Quigley <quigley@emerose.com>
* tests(pptx): add a linked image to test the fix on e69779e
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Sam Quigley <quigley@emerose.com>
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* feat: Add PPTX notes slides
Presenter notes may have useful information and should also be extracted.
Signed-off-by: Maciej Wieczorek <maciej@wieczorek.co>
* feat: Move presenter notes into furniture
Signed-off-by: Maciej Wieczorek <maciej@wieczorek.co>
---------
Signed-off-by: Maciej Wieczorek <maciej@wieczorek.co>