86288 – TXT: Urdu (Unicode 16) encoded text file not recognized correctly anymore

Status:	CONFIRMED

Alias:	None

Product:	Writer
Classification:	Application
Component:	programming
Version:	OOo 2.2.1
Hardware:	PC Windows XP

Importance:	P3 Trivial
Target Milestone:	---
Assignee:	AOO issues mailing list

Reported:	2008-02-20 22:59 UTC by izavorin
Modified:	2017-05-20 11:15 UTC
CC List:	1 user

Issue Type:	DEFECT
Latest Confirmation in:	---
Developer Difficulty:	---

Attachments
Sample UTF-16 file that is loaded incorrectly (5.23 KB, text/plain) 2008-02-20 23:02 UTC, izavorin
PDF file from UTF-16 Urdu document -- bad formatting (43.37 KB, application/pdf) 2008-02-21 17:09 UTC, izavorin

Description izavorin 2008-02-20 22:59:15 UTC

I have a collection of text files that contain text in various languages, e.g.
Arabic, Pashto, Urdu, Farsi, Russian, Chinese etc. These documents are either in
the UTF-8 or UTF-16 encoding. I need to write a Basic macro that will load a
file and export it into PDF. The default loading operation such as this

oDoc = StarDesktop.loadComponentFromURL( cInURL, "_blank", 0, _
       Array(MakePropertyValue( "Hidden", True )))

works for some files but not for others. For instance, it seems to load Arabic
files correctly but not Urdu ones. This seems to be related to issue #63077
(http://qa.openoffice.org/issues/show_bug.cgi?id=63077), which was reported a
while ago. Has it been resolved in release 2.3.x? In other words, is it possible
to correctly load a UTF-8 or UTF-16 encoded text file with a BOM without
explicitly specifying the encoding? Thank you.

Comment 1 izavorin 2008-02-20 23:02:54 UTC

Created attachment 51617
Sample UTF-16 file that is loaded incorrectly

Comment 2 michael.ruess 2008-02-21 11:00:32 UTC

It is quite difficult to automatically recognize the correct encoding of a txt
file's content. In OOo 2.2 the attached file was displayed correctly, but in
this case that seems to have happened more or less by chance.
MRU->MBA: a good time to think about the txt handling when NO filter is preselected.

Comment 3 izavorin 2008-02-21 17:08:20 UTC

I had problems opening some files that had the standard BOM headers ("EF-BB-BF"
for UTF-8 and "FF-FE" for UTF-16). I will attach the PDF file that was generated
from the document "Urdu_sample_UTF16.txt" that I attached earlier. As you will
see, the formatting is incorrect. A similar PDF generated from an Arabic text
file in UTF-16 was fine. The bad formatting may have more to do with the actual
Urdu Unicode characters than with the encoding itself, but the question remains
whether it is possible to load a UTF-8 or UTF-16 text file with a BOM correctly
regardless of which characters appear in the file.
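
For reference, the BOM check described above can be sketched outside of
OpenOffice. The following is a minimal illustration in Python (not the Basic
macro language used earlier in this report, and not how the Writer text filter
is implemented). The names detect_bom and decode_with_bom are hypothetical
helpers introduced only for this example, and the UTF-8 fallback for files
without a BOM is an assumption.

```python
import codecs

# Longer BOMs are listed first because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"), # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"), # FE FF
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no known BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data using its BOM; use the assumed fallback if there is none."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding or fallback)
```

With such a check the encoding is decided by the first bytes alone, so the
result does not depend on whether the remaining text is Arabic, Urdu, or any
other script. One caveat of this sketch: a UTF-16 LE file whose first character
after the BOM is U+0000 would be mistaken for UTF-32 LE.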

Comment 4 izavorin 2008-02-21 17:09:23 UTC

Created attachment 51637
PDF file from UTF-16 Urdu document -- bad formatting

Comment 5 Mathias_Bauer 2008-04-21 17:18:04 UTC

target 3.x.

Comment 6 Marcus 2017-05-20 11:15:20 UTC

Reset assignee to the default "issues@openoffice.apache.org".