Apache OpenOffice (AOO) Bugzilla – Issue 86288
TXT: Urdu (Unicode 16) encoded text file not recognized correctly anymore
Last modified: 2017-05-20 11:15:20 UTC
I have a collection of text files containing text in various languages, e.g. Arabic, Pashto, Urdu, Farsi, Russian, Chinese etc. The documents are encoded in either UTF-8 or UTF-16. I need to write a Basic macro that loads a file and exports it to PDF. A default load call such as
    oDoc = StarDesktop.loadComponentFromURL( cInURL, "_blank", 0, _
        Array( MakePropertyValue( "Hidden", True ) ) )
works for some files but not for others. For instance, it seems to load Arabic files correctly but not Urdu ones. This seems to be related to issue #63077 (http://qa.openoffice.org/issues/show_bug.cgi?id=63077), which was reported a while ago. Has it been resolved in release 2.3.X? In other words, is it possible to correctly load a UTF-8 or UTF-16 encoded text file with a BOM without explicitly specifying the encoding? Thank you.
Created attachment 51617 [details] Sample UTF-16 file that is loaded incorrectly
It is quite difficult to recognize the correct encoding of a txt file's content automatically. In OO 2.2 the attached file was displayed correctly, but in this case that seems to have happened more or less by chance. MRU->MBA: a good time to think about the txt handling when NO filter is preselected.
I had problems opening some files that had the standard BOM headers ("EF-BB-BF" for UTF-8 and "FF-FE" for UTF-16). I will attach the PDF file that was generated from the document "Urdu_sample_UTF16.txt" that I attached earlier. As you will see, the formatting is incorrect. A similar PDF generated from an Arabic text file in UTF-16 was fine. The bad formatting may have more to do with the actual Urdu Unicode characters than with the encoding itself, but the question remains whether a UTF-8/-16 text file with a BOM can be loaded correctly regardless of which characters appear in the file.
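For reference, the BOM check described above can be sketched outside the office suite. The following is a minimal Python illustration, not the detection code OpenOffice uses. The function names sniff_bom and decode_with_bom are hypothetical, and the UTF-8 fallback for files without a BOM is an assumption made only for this example.

```python
import codecs

# BOM signatures mentioned in this issue: EF BB BF (UTF-8) and FF FE (UTF-16 LE),
# plus FE FF for big-endian UTF-16.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),   # "utf-8-sig" strips the BOM while decoding
    (codecs.BOM_UTF16_LE, "utf-16"),  # "utf-16" reads the BOM to pick endianness
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return a Python codec name if data starts with a known BOM, else None."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return None


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data using its BOM; the fallback codec is an assumption of this sketch."""
    return data.decode(sniff_bom(data) or fallback)
```

Because the check looks only at the first bytes, its result does not depend on which characters follow, so an Urdu file and an Arabic file with the same BOM are classified identically. If a file with a valid BOM still renders badly, that would point at a later stage (filter selection, font, or layout) rather than at BOM detection.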
Created attachment 51637 [details] PDF file from UTF-16 Urdu document -- bad formatting
target 3.x.
Reset assignee to the default "issues@openoffice.apache.org".