Bug 55802 - [PATCH] Wrong encoding used for non-ASCII characters in text runs
Summary: [PATCH] Wrong encoding used for non-ASCII characters in text runs
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: 3.9-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2013-11-20 14:53 UTC by tp
Modified: 2017-08-06 19:42 UTC

Bug (76.31 KB, image/jpeg)
2013-12-12 08:04 UTC, tp
[PATCH] add default font ranges (8.87 KB, patch)
2014-01-29 23:40 UTC, Andreas Beeker

Description tp 2013-11-20 14:53:01 UTC
I am creating a document like this:

XWPFRun aktRun = aktpara.createRun();

My textA string contains German words with 'ä', 'ü', 'ß' or 'ö' in them. When the document is written, these letters get the wrong font: Word shows them in Calibri instead of the font used for the other letters of the word. The size and the other attributes are all correct; only the font is wrong.
Comment 1 Nick Burch 2013-11-20 17:19:59 UTC
I'm not quite understanding the problem. Any chance you can create a short JUnit test that shows the problem?
Comment 2 tp 2013-12-12 08:04:59 UTC
Created attachment 31105
Comment 3 tp 2013-12-12 08:08:56 UTC
Sorry for the late reply. Solving this problem is still very important for me. The attachment shows the problem.
When I open my generated file with Word (OpenOffice should behave the same), everything is in the right font (the font I set in the Java project).
But all the special letters like ü, ö, ä, ß use the font 'Calibri'. What's wrong there? I know these letters also exist in the other fonts. Everything else works (underline, italic, bold, font size); only the font is different. I hope this information is enough; otherwise I can show you more source code, but I don't think the problem comes from there. Thanks in advance :)
Comment 4 Andreas Beeker 2013-12-22 00:39:48 UTC
You probably need to show more of your code and/or explain how you get the input data; with a simple test, this can't be reproduced:

public void testUmlaut() throws Exception {
	XWPFDocument doc = new XWPFDocument();
	XWPFRun run = doc.createParagraph().createRun();
	run.setFontFamily("Arial");
	run.setFontSize(11); // stored as w:sz="22" (half-points)
	run.setText("Ort, Datum der Erstellung: Kornelimünster, am 8. November 2013");
	try (OutputStream os = new FileOutputStream("umlaut.docx")) {
		doc.write(os);
	}
}

This doesn't generate any additional text run elements in the docx:

<w:p><w:r><w:rPr><w:rFonts w:ascii="Arial"/><w:sz w:val="22"/></w:rPr><w:t>Ort, Datum der Erstellung: Kornelimünster, am 8. November 2013</w:t></w:r></w:p>

My guess is that you are converting from the old .doc format to .docx and are already receiving broken text runs while reading the input file.
Comment 5 tp 2014-01-06 07:55:14 UTC
Thank you for the answer. I used your code to create a test document.
Exactly the same error happens: the letter 'ü' is in Calibri, the rest is in Arial.

I wonder whether it looks correct for you?!
Comment 6 tp 2014-01-06 07:57:41 UTC
And yes, the file contains:

<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body><w:p><w:r><w:rPr><w:rFonts w:ascii="Arial"/><w:sz w:val="22"/></w:rPr><w:t>Ort, Datum der Erstellung: Kornelimünster, am 8. November 2013</w:t></w:r></w:p></w:body></w:document>

So it seems to be something else that is saved incorrectly.
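One way to check which font attributes were actually saved, without opening Word, is to read the main document part straight out of the .docx, which is a ZIP archive. This is a minimal sketch, assuming the conventional part name word/document.xml and the w: namespace prefix seen in the XML above; DocxFontCheck and its methods are hypothetical helpers, and the substring test is a crude heuristic rather than real XML parsing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Hypothetical diagnostic: reads word/document.xml from a .docx (ZIP) file
 * and reports whether any run font declaration carries a w:hAnsi attribute.
 */
public class DocxFontCheck {

    /** Returns the raw XML of the main document part (assumed to be word/document.xml). */
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("no word/document.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    /** Crude text check, assuming the "w" prefix: true if some element sets w:hAnsi. */
    static boolean declaresHAnsi(String documentXml) {
        return documentXml.contains("w:hAnsi=");
    }

    public static void main(String[] args) throws IOException {
        String xml = readDocumentXml(args.length > 0 ? args[0] : "umlaut.docx");
        System.out.println(declaresHAnsi(xml)
                ? "w:hAnsi is set"
                : "w:hAnsi missing: non-ASCII letters may fall back to another font");
    }
}
```

Run against the file described above, the check would report that w:hAnsi is missing, since the saved rFonts element only has w:ascii.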
Comment 7 tp 2014-01-29 09:33:06 UTC
May I get some help? It is very important for my project that this works.
Thank you in advance.
Comment 8 Andreas Beeker 2014-01-29 22:02:29 UTC
The above example works OK in LibreOffice Writer and Windows WordPad, but MS Word (Viewer) seems to need the hAnsi attribute, which covers non-ASCII characters such as ü, to be set: [1]

So as a temporary workaround, you'll need to write to the XMLBeans directly:

run.setText("Ort, Datum der Erstellung: Kornelimünster, am 8. November 2013");
run.setFontFamily("Times New Roman"); // in POI 3.9 this sets only w:ascii
// also set w:hAnsi, which Word uses for non-ASCII Latin letters such as ü
run.getCTR().getRPr().getRFonts().setHAnsi("Times New Roman");

[1] http://officeopenxml.com/WPtextFonts.php
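The effect described in Comment 8 can be modeled in a few lines. Following [1], w:ascii covers characters U+0000 to U+007F and w:hAnsi covers other characters outside the East Asian and complex-script ranges. This sketch assumes Latin-only text (it ignores w:eastAsia, w:cs and w:hint) and assumes that an unset slot falls back to a document default, here Calibri as observed in the report; FontSlot is a hypothetical name, not POI or Word code.

```java
/**
 * Simplified, hypothetical model of which w:rFonts attribute applies to a
 * character. Assumes Latin-only text: eastAsia, cs and w:hint are ignored.
 */
public class FontSlot {

    enum Slot { ASCII, H_ANSI }

    /** U+0000..U+007F use w:ascii; other (Latin) characters use w:hAnsi. */
    static Slot slotFor(char c) {
        return c <= 0x7F ? Slot.ASCII : Slot.H_ANSI;
    }

    /** Font used for c; an unset slot (null) falls back to the assumed document default. */
    static String effectiveFont(char c, String ascii, String hAnsi, String documentDefault) {
        String chosen = slotFor(c) == Slot.ASCII ? ascii : hAnsi;
        return chosen != null ? chosen : documentDefault;
    }

    public static void main(String[] args) {
        // rFonts as saved in the report: w:ascii="Arial", no w:hAnsi.
        String word = "Kornelim\u00fcnster"; // "Kornelimünster"
        for (char c : word.toCharArray()) {
            System.out.println(c + " -> " + effectiveFont(c, "Arial", null, "Calibri"));
        }
    }
}
```

Under these assumptions every letter maps to Arial except 'ü', which maps to the Calibri fallback; setting hAnsi as in the workaround closes that gap.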
Comment 9 Andreas Beeker 2014-01-29 23:40:37 UTC
Created attachment 31271
[PATCH] add default font ranges

This patch sets the font family of the other font ranges (hAnsi, cs, eastAsia) to a default value if they aren't specified explicitly.

(... to be applied when POI 3.10 final is released ...)
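The defaulting rule described for this patch can be sketched independently of POI. This is a model of the description above, not the patch's code: a plain Map stands in for the w:rFonts element, assuming that setting a family always overwrites ascii and fills hAnsi, cs and eastAsia only when they are not already set. DefaultFontRanges and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical model of "default font ranges": a Map of rFonts attribute
 * name to font family stands in for the w:rFonts element.
 */
public class DefaultFontRanges {

    static final String[] OTHER_RANGES = {"hAnsi", "cs", "eastAsia"};

    static void setFontFamily(Map<String, String> rFonts, String family) {
        rFonts.put("ascii", family);              // always overwritten
        for (String range : OTHER_RANGES) {
            rFonts.putIfAbsent(range, family);    // default only if not set explicitly
        }
    }

    /** Serializes the map as a w:rFonts element (no XML escaping; sketch only). */
    static String toXml(Map<String, String> rFonts) {
        StringBuilder sb = new StringBuilder("<w:rFonts");
        rFonts.forEach((k, v) -> sb.append(" w:").append(k).append("=\"").append(v).append('"'));
        return sb.append("/>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> rFonts = new LinkedHashMap<>();
        rFonts.put("eastAsia", "MS Mincho");      // explicitly set beforehand, kept as-is
        setFontFamily(rFonts, "Arial");
        System.out.println(toXml(rFonts));
    }
}
```

The design choice being modeled is that an explicit per-range font is never clobbered, while a run that only sets one family still gets a consistent font for umlauts and other non-ASCII text.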
Comment 10 tp 2014-01-30 09:22:21 UTC
Many thanks, Andreas,
it finally works.
Comment 11 Andreas Beeker 2014-01-30 09:55:01 UTC
I haven't committed the patch yet, so I'm marking the bug as reopened.
Comment 12 Andreas Beeker 2014-02-01 22:27:19 UTC
Patch applied with r1563496.