Bug 61169 - Text with Japanese characters overflows textbox
Alias: None
Product: POI
Classification: Unclassified
Component: SL Common
Version: 3.16-FINAL
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Blocks: 45140
Reported: 2017-06-09 09:27 UTC by François Beaune
Modified: 2017-07-08 22:26 UTC
0 users

Java repro case (1.45 KB, text/plain)
2017-06-09 09:27 UTC, François Beaune
PowerPoint file generated by repro case (24.97 KB, application/vnd.openxmlformats-officedocument.presentationml.presentation)
2017-06-09 09:28 UTC, François Beaune
Test class with registered font (2.97 KB, text/x-java)
2017-06-17 00:02 UTC, Andreas Beeker
Result when using Apache POI (commit a753adb84805ff0f7b7385905780b07e5fe9e4ab on GitHub) (35.05 KB, image/png)
2017-06-26 13:52 UTC, François Beaune

Description François Beaune 2017-06-09 09:27:54 UTC
Created attachment 35041
Java repro case

When using the XSLF API, text with Japanese characters (left-to-right) overflows the textbox, even when using default styling (default font family, size and style).
Comment 1 François Beaune 2017-06-09 09:28:24 UTC
Created attachment 35042
PowerPoint file generated by repro case
Comment 2 Andreas Beeker 2017-06-14 22:17:27 UTC
tl;dr: the textbox is too short because of an undefined/unregistered font, and because POI miscalculates the text height/width.
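A minimal JDK-only sketch of the "register a font" part, assuming you have a TrueType file that contains Japanese glyphs (the path passed to `main` is a placeholder, and the class and method names are hypothetical). `Font.canDisplayUpTo` returns the index of the first character the font has no glyph for, or -1 if it covers the whole string, which is a quick way to check whether measurements will be based on the glyphs you expect.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.awt.GraphicsEnvironment;
import java.io.File;
import java.io.IOException;

public class FontRegistrationSketch {
    /** Loads a TrueType file and registers it so new Font(family, ...) can find it. */
    static Font register(File ttf) throws IOException, FontFormatException {
        // createFont returns a font with a point size of 1.
        Font font = Font.createFont(Font.TRUETYPE_FONT, ttf);
        // registerFont returns false if a font with the same name is already registered.
        GraphicsEnvironment.getLocalGraphicsEnvironment().registerFont(font);
        return font;
    }

    /** True if the font has a glyph for every character of the text. */
    static boolean coversText(Font font, String text) {
        return font.canDisplayUpTo(text) == -1;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: args[0] is the path to a Japanese TrueType font file.
        Font sized = register(new File(args[0])).deriveFont(18f);
        String japanese = "\u65E5\u672C\u8A9E"; // "nihongo" (Japanese), as Unicode escapes
        System.out.println(sized.getFontName() + " covers text: " + coversText(sized, japanese));
    }
}
```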

There are a few issues with the current rendering code, which also apply to calculating the text height:
- the textbox indents are ignored when the text height is calculated
- you need to register a font that contains the Japanese glyphs
- my test font (Mona) has a TextLayout leading of 0, hence the leading needs to be fixed somehow
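The first and third points can be sketched with plain JDK classes: wrap against the box width minus the left/right indents (insets), then add the top/bottom insets, and substitute a fallback when a font reports a leading of 0. This is an illustration under stated assumptions, not POI's code: the 20% fallback ratio is made up for the example, the class name is hypothetical, and the inset values in `main` are the DrawingML defaults (0.1 in left/right, 0.05 in top/bottom).

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.LineBreakMeasurer;
import java.awt.font.TextAttribute;
import java.awt.font.TextLayout;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;

public class TextHeightSketch {
    // Hypothetical fallback for fonts whose TextLayout leading is 0:
    // 20% of the font size (an assumption, not PowerPoint's actual rule).
    static final float FALLBACK_LEADING_RATIO = 0.2f;

    /** Height of one line, substituting the fallback when the leading is 0. */
    static float lineHeight(float ascent, float descent, float leading, float fontSize) {
        float effectiveLeading = leading > 0f ? leading : fontSize * FALLBACK_LEADING_RATIO;
        return ascent + descent + effectiveLeading;
    }

    /** Estimated height of the text wrapped inside a box of the given width. */
    static float textHeight(String text, Font font, float boxWidth,
                            float leftInset, float rightInset,
                            float topInset, float bottomInset) {
        float height = topInset + bottomInset;
        // Wrap against the width minus the left/right insets, not the full box width.
        float wrapWidth = boxWidth - leftInset - rightInset;
        if (text.isEmpty() || wrapWidth <= 0f) {
            return height; // nothing to lay out (or no room): only the insets count
        }
        AttributedString as = new AttributedString(text);
        as.addAttribute(TextAttribute.FONT, font);
        AttributedCharacterIterator it = as.getIterator();
        // Identity transform, antialiased, fractional metrics.
        FontRenderContext frc = new FontRenderContext(null, true, true);
        LineBreakMeasurer measurer = new LineBreakMeasurer(it, frc);
        while (measurer.getPosition() < it.getEndIndex()) {
            TextLayout line = measurer.nextLayout(wrapWidth);
            height += lineHeight(line.getAscent(), line.getDescent(),
                                 line.getLeading(), font.getSize2D());
        }
        return height;
    }

    public static void main(String[] args) {
        Font font = new Font(Font.DIALOG, Font.PLAIN, 18);
        // Japanese sample text, written as Unicode escapes.
        String text = "\u65E5\u672C\u8A9E\u306E\u30C6\u30AD\u30B9\u30C8";
        System.out.println(textHeight(text, font, 200f, 7.2f, 7.2f, 3.6f, 3.6f));
    }
}
```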

The rendering in LibreOffice seems to use some kind of tracking (i.e. extra uniform spacing between all characters, as opposed to kerning, which adjusts specific character pairs). Although the TRACKING attribute can be added to the AttributedString, it is ignored when the text is broken into lines. The alternative of modifying the registered font [1] doesn't work either.

[1] https://stackoverflow.com/questions/13229725
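To see the tracking effect on its own, the sketch below puts family, size and `TextAttribute.TRACKING` on an `AttributedString` and measures a single-line `TextLayout`. It deliberately avoids `TextAttribute.FONT`, because a FONT attribute overrides the other font attributes in the map. According to the `TRACKING` javadoc, the value is multiplied by the point size and added to the advance of each glyph cluster, so the tracked line should be wider. The 0.1 value is an arbitrary example, not what LibreOffice uses; the class name is hypothetical.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.TextAttribute;
import java.awt.font.TextLayout;
import java.text.AttributedString;

public class TrackingSketch {
    private static final FontRenderContext FRC = new FontRenderContext(null, true, true);

    /** Advance of non-empty text on one line; tracking 0 means no extra spacing. */
    static float advance(String text, String family, float size, float tracking) {
        AttributedString as = new AttributedString(text);
        as.addAttribute(TextAttribute.FAMILY, family);
        as.addAttribute(TextAttribute.SIZE, size);
        as.addAttribute(TextAttribute.TRACKING, tracking);
        return new TextLayout(as.getIterator(), FRC).getAdvance();
    }

    public static void main(String[] args) {
        String text = "\u30C6\u30AD\u30B9\u30C8"; // "tekisuto" in katakana, as Unicode escapes
        System.out.println("plain:   " + advance(text, Font.DIALOG, 18f, 0f));
        System.out.println("tracked: " + advance(text, Font.DIALOG, 18f, 0.1f));
    }
}
```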
Comment 3 Andreas Beeker 2017-06-14 22:23:59 UTC
For the record, the corresponding SO issue:
Comment 4 Andreas Beeker 2017-06-15 16:21:42 UTC
"LineBreakMeasurer does not measure correctly if TextAttribute.TRACKING is set."
(Affects Version/s: 6.0, 7, 8, 8u102, 9)

To recap: LibreOffice uses more lines to display the text, because its glyphs are spread wider than in the Java rendering. Although the Java rendering can be adjusted with the TRACKING attribute, the LineBreakMeasurer doesn't take it into account.

Maybe it's possible to copy and adapt the standard LineBreakMeasurer...
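One rough way to get tracking-aware line breaks without touching JDK internals is to skip `LineBreakMeasurer` and measure each candidate line with a full `TextLayout`, which does apply the font's tracking. This is a sketch under simplifying assumptions, not an adaptation of the JDK measurer itself: it breaks between any two characters (acceptable for Japanese text without spaces, wrong for Latin words), ignores kinsoku line-breaking rules and surrogate pairs, and is quadratic in the line length. The class and method names are hypothetical.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.TextAttribute;
import java.awt.font.TextLayout;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrackingAwareBreaker {
    private static final FontRenderContext FRC = new FontRenderContext(null, true, true);

    /** Returns a copy of the font with the given tracking applied. */
    static Font withTracking(Font font, float tracking) {
        Map<TextAttribute, Object> attrs = new HashMap<>();
        attrs.put(TextAttribute.TRACKING, tracking);
        return font.deriveFont(attrs);
    }

    /** Advance of a non-empty single-line TextLayout, including the font's tracking. */
    static float advance(String s, Font font) {
        return new TextLayout(s, font, FRC).getAdvance();
    }

    /**
     * Greedy per-character breaking: a line grows while the whole candidate line,
     * measured with TextLayout, still fits into wrapWidth. Every line gets at
     * least one character, so the outer loop always makes progress.
     */
    static List<String> breakLines(String text, Font font, float wrapWidth) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = start + 1;
            while (end < text.length()
                    && advance(text.substring(start, end + 1), font) <= wrapWidth) {
                end++;
            }
            lines.add(text.substring(start, end));
            start = end;
        }
        return lines;
    }

    public static void main(String[] args) {
        Font tracked = withTracking(new Font(Font.DIALOG, Font.PLAIN, 18), 0.1f);
        // Japanese sample text, written as Unicode escapes.
        String text = "\u65E5\u672C\u8A9E\u306E\u30C6\u30AD\u30B9\u30C8\u304C\u306F\u307F\u51FA\u3057\u307E\u3059";
        for (String line : breakLines(text, tracked, 100f)) {
            System.out.println(line);
        }
    }
}
```

Because each candidate line is measured the same way it is later drawn, the line count grows with the tracking value instead of overflowing the box.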
Comment 5 Andreas Beeker 2017-06-17 00:02:33 UTC
Created attachment 35059
Test class with registered font
Comment 6 Andreas Beeker 2017-06-17 00:06:14 UTC
Added a partial fix (*) via r1798986

Let's forget about the tracking issue mentioned above: you also need to specify the "ea" (East Asian) font attribute for Asian fonts, see my test class.
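In DrawingML, a run's properties can name separate typefaces for Latin (`<a:latin>`), East Asian (`<a:ea>`) and complex-script (`<a:cs>`) text, and East Asian characters are typically drawn with the `ea` typeface; if only the Latin one is set, the application picks some other font for the Japanese characters. The sketch below illustrates splitting a string into Latin and East Asian runs so that each run could be measured with its own font. The classification (a handful of Unicode scripts and blocks) is a simplified assumption, not the rule set used by POI or Office, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class FontGroupSketch {
    enum Group { LATIN, EAST_ASIAN }

    // Simplified assumption: characters in these scripts use the "ea" typeface.
    private static final Set<Character.UnicodeScript> EA_SCRIPTS = EnumSet.of(
            Character.UnicodeScript.HAN, Character.UnicodeScript.HIRAGANA,
            Character.UnicodeScript.KATAKANA, Character.UnicodeScript.HANGUL,
            Character.UnicodeScript.BOPOMOFO);

    static Group classify(int codePoint) {
        if (EA_SCRIPTS.contains(Character.UnicodeScript.of(codePoint))) {
            return Group.EAST_ASIAN;
        }
        // Punctuation such as U+3002 (ideographic full stop) has script COMMON,
        // so fall back to the Unicode block.
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        if (block == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
                || block == Character.UnicodeBlock.HIRAGANA
                || block == Character.UnicodeBlock.KATAKANA
                || block == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS) {
            return Group.EAST_ASIAN;
        }
        return Group.LATIN;
    }

    /** Splits text into maximal runs whose characters share the same group. */
    static List<String> splitRuns(String text) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        Group current = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Group g = classify(cp);
            if (current != null && g != current) {
                runs.add(text.substring(start, i));
                start = i;
            }
            current = g;
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    public static void main(String[] args) {
        // "POI " followed by "nihongo" (Japanese), as Unicode escapes.
        for (String run : splitRuns("POI \u65E5\u672C\u8A9E")) {
            System.out.println(classify(run.codePointAt(0)) + ": " + run);
        }
    }
}
```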

(*) At least for the Mona font, the rendering output is similar to the LibreOffice dimensions, so I'm closing this now.
Comment 7 François Beaune 2017-06-26 13:49:34 UTC
Thanks for the updates and the fix, Andreas. We just tried our repro case with the latest Apache POI cloned from GitHub.

Unfortunately, it looks like it doesn't entirely fix our problem. On Windows, there is practically no difference between Apache POI 3.16 Final and Git master as of today (with your fix). On Linux, the box is indeed taller, but it still doesn't enclose all the text; see my latest attachment.
Comment 8 François Beaune 2017-06-26 13:52:54 UTC
Created attachment 35076
Result when using Apache POI (commit a753adb84805ff0f7b7385905780b07e5fe9e4ab on GitHub)
Comment 9 Andreas Beeker 2017-07-08 22:26:58 UTC
Added resize methods with a Graphics argument via r1801329.
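The new POI overloads are not reproduced here; the sketch below only shows, at the JDK level, why a Graphics argument matters for measuring. `Graphics2D.getFontRenderContext()` reflects the graphics' transform and its text rendering hints (antialiasing, fractional metrics), so measuring with the same context that will later draw the text keeps the computed box size consistent with the rendered output. The offscreen image and the helper names are assumptions for illustration.

```java
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.font.FontRenderContext;
import java.awt.font.TextLayout;
import java.awt.image.BufferedImage;

public class GraphicsMeasureSketch {
    /** An offscreen Graphics2D configured with the hints the text will be drawn with. */
    static Graphics2D createGraphics() {
        BufferedImage img = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING,
                           RenderingHints.VALUE_TEXT_ANTIALIAS_ON);
        g.setRenderingHint(RenderingHints.KEY_FRACTIONALMETRICS,
                           RenderingHints.VALUE_FRACTIONALMETRICS_ON);
        return g;
    }

    /** Measures non-empty text with the Graphics' own FontRenderContext. */
    static float measure(Graphics2D g, String text, Font font) {
        FontRenderContext frc = g.getFontRenderContext();
        return new TextLayout(text, font, frc).getAdvance();
    }

    public static void main(String[] args) {
        Graphics2D g = createGraphics();
        try {
            Font font = new Font(Font.DIALOG, Font.PLAIN, 18);
            // "nihongo" (Japanese), as Unicode escapes.
            System.out.println(measure(g, "\u65E5\u672C\u8A9E", font));
        } finally {
            g.dispose();
        }
    }
}
```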

I still need to provide new methods to specify the charset for East Asian and complex-script fonts; otherwise LibreOffice (and probably Office too) doesn't use the specified font family but falls back to another one, which makes any textbox size calculation futile.