Bug 64876 - Unable to convert pptx to pdf
Summary: Unable to convert pptx to pdf
Status: REOPENED
Alias: None
Product: POI
Classification: Unclassified
Component: XSLF
Version: 4.1.2-FINAL
Hardware: PC All
Importance: P2 critical
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-05 16:41 UTC by Suhail Zamir
Modified: 2020-11-30 21:14 UTC
CC List: 0 users



Attachments
sample pptx file with issue (25.73 KB, application/vnd.openxmlformats-officedocument.presentationml.presentation)
2020-11-06 08:45 UTC, Suhail Zamir
Converted using 5.x (10.99 KB, image/png)
2020-11-10 10:44 UTC, Suhail Zamir
PDF checker result (15.70 KB, application/zip)
2020-11-11 22:17 UTC, Andreas Beeker
Issue with acrobat reader (423.43 KB, application/pdf)
2020-11-13 09:34 UTC, Suhail Zamir
pdf checker output of test4.pdf (4.34 KB, application/json)
2020-11-13 09:35 UTC, Suhail Zamir

Description Suhail Zamir 2020-11-05 16:41:48 UTC
When we try to convert a PPTX file to PDF, we get the following error:

java.lang.RuntimeException: invalid wmf file - window records are incomplete.
	at org.apache.poi.hwmf.usermodel.HwmfPicture.getBounds(HwmfPicture.java:182)
	at org.apache.poi.hwmf.usermodel.HwmfPicture.draw(HwmfPicture.java:134)
	at org.apache.poi.hwmf.draw.HwmfImageRenderer.drawImage(HwmfImageRenderer.java:129)
	at org.apache.poi.sl.draw.DrawPictureShape.drawContent(DrawPictureShape.java:64)
	at org.apache.poi.sl.draw.DrawSimpleShape.draw(DrawSimpleShape.java:107)
	at org.apache.poi.sl.draw.DrawSheet.draw(DrawSheet.java:71)
	at org.apache.poi.sl.draw.DrawSheet.draw(DrawSheet.java:50)
	at org.apache.poi.sl.draw.DrawSheet.draw(DrawSheet.java:50)
	at org.apache.poi.sl.draw.DrawSlide.draw(DrawSlide.java:41)
	at org.apache.poi.xslf.usermodel.XSLFSlide.draw(XSLFSlide.java:373)
	at com.test.PPTxConverter.convertToPDFOld(PPTxConverter.java:75)
	at com.test.PPTxConverter.main(PPTxConverter.java:161)

The same implementation works fine with versions 3.15 and 3.12.

Our implementation is similar to https://github.com/yeokm1/docs-to-pdf-converter/blob/master/docs-to-pdf-converter/src/com/yeokhengmeng/docstopdfconverter/PptxToPDFConverter.java

We need to upgrade to 4.x because the older versions have a few security vulnerabilities.
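The stack trace shows the converter calling `XSLFSlide.draw` with a `Graphics2D`, and converters like the one linked above typically rasterize each slide that way before handing the image to a PDF library. A minimal sketch of that render-to-image step, assuming such a raster pipeline: `DrawableSlide` and `SlideRasterizer` are hypothetical names, and the interface only stands in for a POI slide so the example compiles without POI on the classpath.

```java
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

/** Hypothetical stand-in for a slide; XSLFSlide exposes a comparable draw(Graphics2D) method. */
interface DrawableSlide {
    void draw(Graphics2D graphics);
}

/** Hypothetical helper: renders one slide onto a white RGB image at the given zoom. */
final class SlideRasterizer {
    private SlideRasterizer() {}

    static BufferedImage render(DrawableSlide slide, Dimension pageSize, double zoom) {
        int width = (int) Math.ceil(pageSize.width * zoom);
        int height = (int) Math.ceil(pageSize.height * zoom);
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = image.createGraphics();
        try {
            graphics.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
            graphics.setRenderingHint(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
            graphics.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            // TYPE_INT_RGB starts out black, so paint a white background first.
            graphics.setPaint(Color.WHITE);
            graphics.fill(new Rectangle2D.Double(0, 0, width, height));
            // The slide draws in page coordinates; the transform scales them to pixels.
            graphics.scale(zoom, zoom);
            slide.draw(graphics);
        } finally {
            graphics.dispose();
        }
        return image;
    }
}
```

With POI on the classpath, an `XSLFSlide` could presumably be passed as `slide::draw` together with `XMLSlideShow.getPageSize()`. Note that the WMF exception in this report is thrown inside that `draw` call, so it surfaces regardless of how the resulting image is written to PDF.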
Comment 1 Andreas Beeker 2020-11-05 16:51:49 UTC
Please attach the failing .pptx or send it to my Apache email address. I need to check the embedded WMF.
Comment 2 Suhail Zamir 2020-11-05 19:07:01 UTC
Thanks for your prompt response, Andreas.
Those PPTX files are internal and confidential. I have now verified that PPTX files without our organization's WMF images convert fine.

I will try to get approval to share the file with you, or try to recreate one with a similar WMF but without any confidential data and share that.

Thanks again.
Comment 3 Suhail Zamir 2020-11-06 08:45:49 UTC
Created attachment 37546
sample pptx file with issue

Hi Andreas,

Attaching the sample pptx file with the issue for your reference.

Thanks,
Suhail Zamir
Comment 4 Andreas Beeker 2020-11-08 22:52:38 UTC
If you use the trunk version, it will recognize the WMF file as EMF.
Apart from that, it's one of those images where the inner bounds don't match the outer bounds. Therefore you need to use the -emfHeaderBounds option (currently only in trunk) with PPTX2PNG, or call graphics.setRenderingHint(Drawable.EMF_FORCE_HEADER_BOUNDS, true).
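Recognizing a picture declared as WMF as an EMF implies sniffing its bytes rather than trusting the declared type. A minimal sketch of such a check, based on the signatures published in the [MS-EMF] and [MS-WMF] specifications: an EMF starts with an EMR_HEADER record (type 1) carrying the " EMF" signature at byte offset 40, and a placeable WMF starts with the key 0x9AC6CDD7. `MetafileSniffer` is a hypothetical helper, not POI's own detection code.

```java
/** Hypothetical helper: classifies metafile bytes by their leading signatures. */
final class MetafileSniffer {
    enum Kind { EMF, WMF_PLACEABLE, WMF, UNKNOWN }

    private MetafileSniffer() {}

    static Kind detect(byte[] data) {
        // EMF: first record is EMR_HEADER (type 1); bytes 40..43 hold the signature " EMF".
        if (data.length >= 44 && readIntLE(data, 0) == 1
                && data[40] == ' ' && data[41] == 'E' && data[42] == 'M' && data[43] == 'F') {
            return Kind.EMF;
        }
        // Placeable WMF: starts with the key 0x9AC6CDD7 (little-endian).
        if (data.length >= 4 && readIntLE(data, 0) == 0x9AC6CDD7) {
            return Kind.WMF_PLACEABLE;
        }
        // Plain WMF header: mtType is 1 (memory) or 2 (disk), mtHeaderSize is 9 (16-bit words).
        if (data.length >= 4) {
            int type = readShortLE(data, 0);
            int headerSize = readShortLE(data, 2);
            if ((type == 1 || type == 2) && headerSize == 9) {
                return Kind.WMF;
            }
        }
        return Kind.UNKNOWN;
    }

    private static int readIntLE(byte[] d, int off) {
        return (d[off] & 0xFF) | (d[off + 1] & 0xFF) << 8 | (d[off + 2] & 0xFF) << 16 | (d[off + 3] & 0xFF) << 24;
    }

    private static int readShortLE(byte[] d, int off) {
        return (d[off] & 0xFF) | (d[off + 1] & 0xFF) << 8;
    }
}
```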


To verify the bounds issue above, you can use PPTX2PNG with the -dump <output.json> option. The view bounds are (0,0,1879,357)...

>},{   /* setViewportOrgEx - index: 4 */
>  "origin": { "x": 0.0, "y": 0.0 }
>},{   /* setWindowExtEx - index: 5 */
>  "size": { "width": 1879.0, "height": 357.0 }
>},{   /* setViewportExtEx - index: 6 */
>  "extents": { "width": 1879.0, "height": 357.0 }

But later on you see something like this, which is outside that view:

> { "type": "move", "x": 2154.0, "y": 638.0 }
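The containment check described above can be reproduced with plain `java.awt.geom` types. A minimal sketch, assuming the window origin is (0,0) as in the quoted view bounds; `BoundsCheck` is a hypothetical helper, and in practice the points would come from the -dump output.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper comparing a metafile's window bounds with the points it actually draws. */
final class BoundsCheck {
    private BoundsCheck() {}

    /** Returns the drawn points that fall outside the window set up by the header records. */
    static List<Point2D> outside(Rectangle2D window, List<Point2D> points) {
        List<Point2D> result = new ArrayList<>();
        for (Point2D p : points) {
            if (!window.contains(p)) {
                result.add(p);
            }
        }
        return result;
    }

    /** Smallest rectangle covering the window and every drawn point; the window is not modified. */
    static Rectangle2D union(Rectangle2D window, List<Point2D> points) {
        Rectangle2D bounds = (Rectangle2D) window.clone();
        for (Point2D p : points) {
            bounds.add(p);
        }
        return bounds;
    }
}
```

For the values quoted above, the move to (2154, 638) is reported as outside, and the union grows from 1879 x 357 to 2154 x 638, which is the inner/outer bounds mismatch described in this comment.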


Currently I can't distinguish between images whose header bounds merely describe the smallest bounds around the graphics and images whose header bounds correctly specify the placement of the image. Especially with EMFs nested in EMFs, the header bounds seem to be ignored.
Comment 5 Suhail Zamir 2020-11-09 08:29:56 UTC
Thanks, Andreas,

In which future version can we expect this issue to be addressed?
All the older versions in which this issue does not occur have security vulnerabilities, so downgrading is not an option here.
Comment 6 Andreas Beeker 2020-11-09 10:09:41 UTC
(In reply to Suhail Zamir from comment #5)
> In which future version can we expect this issue to be addressed?

POI 5.0.0 will be released in December, I guess.
Please test-drive the nightly build as described in [1] ("instructions to run") and send us/me feedback if you have further issues.


[1] http://poi.apache.org/components/slideshow/ppt-wmf-emf-renderer.html
Comment 7 Suhail Zamir 2020-11-09 10:12:03 UTC
Sure, Andreas,

We will verify it and provide feedback soon.

Thanks,
Suhail Zamir
Comment 8 Suhail Zamir 2020-11-10 10:44:29 UTC
Created attachment 37553
Converted using 5.x

Hi Andreas,

Now we are able to convert those PPTX files. The WMF images that caused issues during conversion are also rendered correctly when the -emfHeaderBounds flag is added.

However, we are not able to open the converted files in Acrobat Reader. I have attached a screenshot of the error.

Thanks,
Suhail Zamir
Comment 9 Andreas Beeker 2020-11-10 22:25:01 UTC
I've only tested with the typical tools on Linux.
Is Chrome also complaining about that file?

Are you saving the file straight to disk, or is there some web service in between that prematurely closes the stream?

Can you provide the input PPTX to me privately?
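A prematurely closed stream usually leaves a truncated PDF, and a complete PDF file is expected to start with a "%PDF-" header and end with the "%%EOF" marker. A quick sanity check along those lines, sketched as a hypothetical `PdfTrailerCheck` helper; it is a heuristic, not a validator, so a file can pass it and still fail in a reader.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: cheap check for a truncated PDF. */
final class PdfTrailerCheck {
    private PdfTrailerCheck() {}

    /** True if the bytes start with "%PDF-" and end with "%%EOF", ignoring trailing whitespace. */
    static boolean looksComplete(byte[] pdf) {
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        byte[] eof = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        if (pdf.length < header.length + eof.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (pdf[i] != header[i]) {
                return false;
            }
        }
        // Skip trailing PDF whitespace (NUL, TAB, LF, FF, CR, SP) after the marker.
        int end = pdf.length;
        while (end > 0 && isWhitespace(pdf[end - 1])) {
            end--;
        }
        if (end < eof.length) {
            return false;
        }
        for (int i = 0; i < eof.length; i++) {
            if (pdf[end - eof.length + i] != eof[i]) {
                return false;
            }
        }
        return true;
    }

    private static boolean isWhitespace(byte b) {
        return b == 0 || b == '\t' || b == '\n' || b == '\f' || b == '\r' || b == ' ';
    }
}
```

The saved file can be read with `java.nio.file.Files.readAllBytes(path)` and passed to `looksComplete`; a `false` result points at the output stream rather than at the rendering.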
Comment 10 Suhail Zamir 2020-11-11 04:58:00 UTC
Hi Andreas,

I am using the PPTX2PNG tool to convert it, and Chrome doesn't show any error.

You can try the same with the PPTX I shared earlier.

Thanks,
Suhail Zamir
Comment 11 Andreas Beeker 2020-11-11 22:17:18 UTC
Created attachment 37559
PDF checker result

PDF checker [1] didn't find any issues.
You can compare your output with mine, and you can upload your file to PDF Checker to see whether it reports any issues.

If no issues are reported and it still doesn't work, I'll try to install Adobe Reader and see for myself.

[1] https://www.datalogics.com/products/pdf-tools/pdf-checker/
Comment 12 Suhail Zamir 2020-11-13 09:34:39 UTC
Created attachment 37562
Issue with acrobat reader

Attaching the PDF file, converted using the PPTX2PNG tool, that shows the issue.
The same file opens fine in Chrome.
Comment 13 Suhail Zamir 2020-11-13 09:35:32 UTC
Created attachment 37563
pdf checker output of test4.pdf

PDF Checker doesn't report any issues with this PDF either. Attaching the PDF Checker output.
Comment 14 Suhail Zamir 2020-11-18 07:04:52 UTC
Hi Andreas,

Did you get a chance to verify this in Acrobat Reader?

Thanks,
Suhail Zamir
Comment 15 Andreas Beeker 2020-11-18 07:19:08 UTC
Yes, I did.
I've opened it in Acrobat 6.0 and it complained about Type 3 fonts, i.e. it doesn't like the embedded PS fonts. I haven't tried to fix it yet, as I'm busy fixing EMF/WMF-related issues.
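Type 3 fonts appear in a PDF as font dictionaries with `/Subtype /Type3`. A rough way to check whether a generated file contains them is a byte-level scan, sketched below as a hypothetical `Type3FontScan` helper. It assumes the font dictionaries are not stored inside compressed object streams (PDF 1.5+), which such a scan cannot see; a proper PDF parser is the reliable option.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts uncompressed "/Subtype /Type3" entries in raw PDF bytes. */
final class Type3FontScan {
    // "/Subtype" followed by optional whitespace and the name "/Type3".
    private static final Pattern TYPE3 = Pattern.compile("/Subtype\\s*/Type3\\b");

    private Type3FontScan() {}

    static int countType3Fonts(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so binary stream data cannot break decoding.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher matcher = TYPE3.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```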
Comment 16 Andreas Beeker 2020-11-30 21:14:41 UTC
I've opened an issue on the pdfbox-graphics2d site [1] and tried to write an MCVE (minimal, complete, verifiable example), but couldn't reproduce the Type 3 error.

So the error seems to be triggered somehow by the shapes.

[1] https://github.com/rototor/pdfbox-graphics2d/issues/29