Issue 107254 - Incorrect ToUnicode table mapping for ligatures in PDF
Summary: Incorrect ToUnicode table mapping for ligatures in PDF
Alias: None
Product: gsl
Classification: Code
Component: code
Version: OOO310m11
Hardware: All All
Importance: P3 Trivial with 2 votes
Target Milestone: OOo 3.3
QA Contact: issues@gsl
Depends on:
Blocks: 112382 112263
Reported: 2009-11-27 06:17 UTC by awkawk
Modified: 2010-06-15 08:22 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description awkawk 2009-11-27 06:17:42 UTC
When a font with ligatures is used in Writer and the document is exported to
PDF, the second letter of the ligature is effectively dropped from the file.
For example, the word 'description' appears in the PDF file as 'descripton'.

This is because the “ToUnicode” table associated with the (in my example) 
Calibri font explicitly maps the “ti” ligature to just “t” rather than to the 
proper “ti”, as InDesign does.

This affects copy/paste from the PDF as well as screen reader users.
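
For illustration, a minimal sketch of what a correct ToUnicode entry for such a ligature could look like. It formats one bfchar line of a PDF ToUnicode CMap, where the destination is the glyph's text as UTF-16BE hex. The helper name bfcharEntry, the one-byte character code, and the code value 0x2A are assumptions made for this example; this is not the OOo implementation.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not OOo code): format one bfchar entry of a PDF
// ToUnicode CMap. `code` is the character code used in the content stream;
// a one-byte code is assumed here, other fonts may use two-byte codes.
// `text` is the UTF-16 text that the glyph stands for.
std::string bfcharEntry(unsigned code, const std::u16string& text)
{
    assert(code <= 0xFF && !text.empty());
    char buf[8];
    std::snprintf(buf, sizeof buf, "<%02X> <", code);
    std::string out(buf);
    // Emit every UTF-16 code unit, not just the first one.
    for (char16_t unit : text) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(unit));
        out += buf;
    }
    out += ">";
    return out;
}
```

With these assumptions, bfcharEntry(0x2A, u"ti") yields the two-character mapping, while bfcharEntry(0x2A, u"t") yields the truncated mapping described in this report.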
Comment 1 michael.ruess 2009-11-27 10:54:54 UTC
MRU->HDU: maybe same problem as issue 95057?
Comment 2 2009-11-27 11:08:51 UTC
@pl: The root cause seems to be that PDFWriterImpl::createToUnicodeCMap() doesn't handle 1:n
glyph->unicode mappings yet. PDFWriterImpl::drawLayout() has a similar problem: it gets the indices into the
text string, but it stores only the first char at each index instead of all text up to the next used index.
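
The "all text up to the next used index" idea can be sketched as follows. This is only an illustration for left-to-right text; the function name textPerGlyph and the charPos vector are hypothetical and do not reflect the actual PDFWriterImpl or SalLayout interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, LTR only (not OOo code). charPos[i] is the index in
// `text` of the first character covered by glyph i; the indices are assumed
// to be strictly increasing.
std::vector<std::u16string> textPerGlyph(const std::u16string& text,
                                         const std::vector<std::size_t>& charPos)
{
    std::vector<std::u16string> result;
    for (std::size_t i = 0; i < charPos.size(); ++i) {
        const std::size_t begin = charPos[i];
        // A glyph owns all characters up to the next used index,
        // or up to the end of the text for the last glyph.
        const std::size_t end =
            (i + 1 < charPos.size()) ? charPos[i + 1] : text.size();
        assert(begin < end && end <= text.size());
        result.push_back(text.substr(begin, end - begin));
    }
    return result;
}
```

For the text "option" laid out as the glyphs o, p, ti, o, n, the indices would be {0, 1, 2, 4, 5}, and the third glyph then receives "ti" rather than just "t".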
Comment 3 philipp.lohmann 2009-11-27 11:52:00 UTC
Comment 4 philipp.lohmann 2009-12-17 16:28:29 UTC
@hdu: committed a naive LTR solution to CWS vcl108; please prepare
SalLayout::GetNextGlyphs for the following cases:

- ligature at end of a GetNextGlyphs run: the two or more Unicodes of a ligature
could be split over GetNextGlyphs runs; that needs to be avoided
- RTL/LTR switches.
Comment 5 2009-12-18 07:48:03 UTC
Comment 6 malte_timmermann 2010-01-13 13:34:42 UTC
Removing accessibility keyword.
This is simply a broken PDF export, nothing specific to accessibility.
Targeted for OOo 3.3 anyway.
Comment 7 2010-06-14 14:12:34 UTC
Fixed in CWS vcl108 which got into DEV300_m71.
For ligatures in RTL runs there is the followup issue 112382.
Comment 8 2010-06-15 08:22:01 UTC
Got into DEV300_m71 -> closing