Apache OpenOffice (AOO) Bugzilla – Issue 107254
Incorrect ToUnicode table mapping for ligatures in PDF
Last modified: 2010-06-15 08:22:01 UTC
When a font with ligatures is used in Writer and the document is exported to PDF, the second letter of the ligature is effectively dropped from the file. For example, the word 'description' appears in the PDF as 'descripton'. This is because the “ToUnicode” table associated with the font (Calibri in my example) explicitly maps the “ti” ligature to just “t” rather than to the proper “ti”, as InDesign does. This affects copy/paste from the PDF as well as screen reader users.
MRU->HDU: maybe same problem as issue 95057?
@pl: The root cause seems to be that PDFWriterImpl::createToUnicodeCMap() doesn't handle 1:n glyph->unicode mappings yet. PDFWriterImpl::drawLayout() has a similar problem. It gets the indices into the text string but it just stores the first char at the index instead of all text until the next used index.
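To illustrate the 1:n mapping described above, here is a minimal sketch, not the actual PDFWriterImpl code. The helper names glyphTexts and bfcharEntry are hypothetical, and the sketch assumes a purely LTR run whose glyph start indices are ascending and within the text. It collects all UTF-16 code units from each glyph's index up to the next used index, then formats one ToUnicode CMap bfchar line whose destination string holds every code unit of the ligature.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not OOo code): for an LTR run, give each glyph all
// UTF-16 code units from its start index up to the next glyph's start index,
// instead of only the first code unit.
// Assumes glyphStarts is ascending and every index is < text.size().
std::vector<std::u16string> glyphTexts(const std::u16string& text,
                                       const std::vector<std::size_t>& glyphStarts)
{
    std::vector<std::u16string> result;
    for (std::size_t i = 0; i < glyphStarts.size(); ++i)
    {
        const std::size_t begin = glyphStarts[i];
        const std::size_t end =
            (i + 1 < glyphStarts.size()) ? glyphStarts[i + 1] : text.size();
        result.push_back(text.substr(begin, end - begin));
    }
    return result;
}

// Hypothetical helper: format one bfchar line of a ToUnicode CMap.
// The destination is a UTF-16BE hex string, so it may hold several code units.
std::string bfcharEntry(std::uint16_t glyphCode, const std::u16string& unicodes)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "<%04X> <", static_cast<unsigned>(glyphCode));
    std::string line(buf);
    for (char16_t unit : unicodes)
    {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(unit));
        line += buf;
    }
    line += ">";
    return line;
}
```

For the text "option" laid out with a "ti" ligature, the glyph start indices would be 0, 1, 2, 4, 5, so the third glyph receives both "t" and "i", and its bfchar line carries two code units rather than one.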
@hdu: committed a naive LTR solution to CWS vcl108; please prepare SalLayout::GetNextGlyphs for the following cases:
- ligature at the end of a GetNextGlyphs run: the two or more Unicodes of a ligature could be split over GetNextGlyphs runs; that needs to be avoided
- RTL/LTR switches
Removing accessibility keyword. This is simply a broken PDF export, nothing specific to accessibility. Targeted for OOo 3.3 anyway.
Fixed in CWS vcl108 which got into DEV300_m71. For ligatures in RTL runs there is the followup issue 112382.
Got into DEV300_m71 -> closing