Bug 64693 - POI HwmfGraphics cannot read the embedded document title
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSLF
Version: 4.1.2-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-28 08:23 UTC by Lee
Modified: 2020-09-03 19:59 UTC

Attachments
the original document (860.77 KB, application/vnd.openxmlformats-officedocument.presentationml.presentation)
2020-08-29 00:58 UTC, Lee
the converted img file (3.51 KB, image/png)
2020-09-03 07:08 UTC, Lee

Description Lee 2020-08-28 08:23:10 UTC
I use org.apache.poi.xslf.util.PPTX2PNG to convert a PPT file into a PNG file, but XSLFSlide's draw method does not render Chinese characters correctly when the presentation contains an embedded file whose title contains Chinese characters. The embedded file's title is garbled in the converted image file.
Comment 1 Andreas Beeker 2020-08-28 21:03:15 UTC
Please attach the PPT file and, optionally, a PNG/JPG file rendered with Office.
If the PPT/WMF uses a special font that I can't find and download online, please attach it as well.
Comment 2 Lee 2020-08-29 00:58:39 UTC
Created attachment 37411
the original document
Comment 3 Andreas Beeker 2020-08-30 11:33:55 UTC
Fixed via r1881322

The problem with WMF is that the bytes in the ExtTextOut records depend on the font in use, which is usually configured to use the default charset, and that in turn depends on the system locale. [1]
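
To illustrate the locale dependence, here is a minimal, self-contained Java sketch. It is not POI code: the class and method names are made up for illustration, and it assumes the JDK provides the GBK and windows-1252 charsets. It encodes a Chinese title as GBK, as a WMF written on a Chinese-locale system might store it, and then decodes the same bytes twice.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of POI.
public class WmfCharsetDemo {
    // Decode raw text bytes (as they might appear in an ExtTextOut record)
    // with the charset of the given name.
    public static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String title = "\u6587\u6863"; // the Chinese word for "document"
        byte[] raw = title.getBytes(Charset.forName("GBK"));

        // Wrong guess: decoding with a Western charset does not restore the title.
        System.out.println(decode(raw, "windows-1252"));
        // Matching charset: the original title comes back.
        System.out.println(decode(raw, "GBK"));
    }
}
```

The bytes themselves carry no charset information, which is why the renderer has to be told which one to use.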

Therefore I've introduced a charset option, which can be set either ...
- via a rendering hint:
graphics.setRenderingHint(Drawable.DEFAULT_CHARSET, Charset.forName("GBK"));
- or directly on the HwmfPicture/HemfPicture via:
HwmfPicture.setDefaultCharset(Charset.forName("GBK"))

The rendering hint is needed to pass the option when rendering slides, where no direct access to the Hwmf/Hemf classes is possible.
Direct access to Hwmf/Hemf can be used when extracting text from the records.
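
For readers unfamiliar with the rendering-hint mechanism, the following sketch shows how a custom java.awt.RenderingHints.Key can carry an option through a Graphics2D context to code that never sees the caller. It uses only the JDK; the CharsetKey class, the DEFAULT_CHARSET constant defined here, and the charsetFor helper are hypothetical stand-ins and not POI's implementation.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of POI.
public class CharsetHintDemo {
    // Custom hint key that only accepts Charset values.
    static final class CharsetKey extends RenderingHints.Key {
        CharsetKey(int id) { super(id); }

        @Override
        public boolean isCompatibleValue(Object val) {
            return val instanceof Charset;
        }
    }

    // Stand-in for a key such as Drawable.DEFAULT_CHARSET.
    public static final RenderingHints.Key DEFAULT_CHARSET = new CharsetKey(1);

    // Stand-in for a nested renderer that only receives the Graphics2D context.
    public static Charset charsetFor(Graphics2D g) {
        Object hint = g.getRenderingHint(DEFAULT_CHARSET);
        return (hint instanceof Charset) ? (Charset) hint : Charset.defaultCharset();
    }

    // Set the hint on a fresh context and read it back the way a renderer would.
    public static Charset roundTrip(String charsetName) {
        BufferedImage img = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setRenderingHint(DEFAULT_CHARSET, Charset.forName(charsetName));
            return charsetFor(g);
        } finally {
            g.dispose();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("GBK"));
    }
}
```

The caller sets the hint once on the graphics context, and any renderer further down the call chain can look it up, falling back to the platform default when the hint is absent.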

To simplify the handling, I've added a "-charset" option to PPTX2PNG, so in your case you need to add "-charset GBK".

Apart from that, I've fixed some deprecated API usage, made the GenericRecordJsonWriter output for BufferedImages more detailed, and fixed some of the image composition raster operations, i.e. the icon now has a transparent background instead of a black one.



[1] https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-wmf/0d0b32ac-a836-4bd2-a112-b6000a1b4fc9
Comment 4 Lee 2020-09-03 06:36:01 UTC
Hi, thank you for your reply!

I tried the newest code and used the PPTX2PNG -charset option, but the embedded file is not displayed: I cannot see the icon or the title in the saved image file.



Comment 5 Lee 2020-09-03 07:08:41 UTC
Created attachment 37420
the converted img file
Comment 6 Andreas Beeker 2020-09-03 18:37:33 UTC
My guess is that you forgot to include the scratchpad and the "provided" jars, which contain the WMF renderer.

While trying to describe the steps below, I found out that the PPTX2PNG argument handling is case-sensitive and so doesn't match the docs, but apart from that it works for me. I think we should also add the "provided" jars to the binary bundle, as this will be less painful for users, and nobody cares nowadays about an increase in the bundle size.


- Download and unzip the nightly:
https://ci-builds.apache.org/job/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/

- Download the "provided" jars and put them in the directory "provided":
https://search.maven.org/artifact/org.apache.xmlgraphics/batik-all/1.13/pom
https://search.maven.org/artifact/xml-apis/xml-apis-ext/1.3.04/jar
https://search.maven.org/artifact/org.apache.xmlgraphics/xmlgraphics-commons/2.4/jar

- Execute the java command (Unix paths need to be replaced for Windows):
java -cp poi-5.0.0-SNAPSHOT.jar:poi-ooxml-5.0.0-SNAPSHOT.jar:poi-ooxml-schemas-5.0.0-SNAPSHOT.jar:poi-scratchpad-5.0.0-SNAPSHOT.jar:lib/*:ooxml-lib/*:provided/* org.apache.poi.xslf.util.PPTX2PNG -format png -fixside long -scale 1000 -charset GBK -outdir . ttt.pptx
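
As a rough aid for adapting the classpath to Windows, here is a small Java sketch that joins the same entries with the platform's own separator (":" on Unix, ";" on Windows, as reported by File.pathSeparator). The ClasspathDemo class and joinClasspath helper are hypothetical names used only for this illustration, and the sketch assumes the jars sit in the current directory as in the command above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of POI.
public class ClasspathDemo {
    // Join classpath entries with the given separator.
    public static String joinClasspath(List<String> entries, String separator) {
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
            "poi-5.0.0-SNAPSHOT.jar",
            "poi-ooxml-5.0.0-SNAPSHOT.jar",
            "poi-ooxml-schemas-5.0.0-SNAPSHOT.jar",
            "poi-scratchpad-5.0.0-SNAPSHOT.jar",
            "lib/*", "ooxml-lib/*", "provided/*");

        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        System.out.println(joinClasspath(entries, File.pathSeparator));
    }
}
```

The printed string can be passed to "java -cp" in place of the colon-separated list.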
Comment 7 Andreas Beeker 2020-09-03 19:59:38 UTC
Added the "provided" jars to the binary bundle.
PPTX2PNG argument handling now ignores case.
Added instructions to the renderer docs.