Issue 101327 - Extra spaces in PDF import of Hebrew documents
Summary: Extra spaces in PDF import of Hebrew documents
Status: CLOSED FIXED
Alias: None
Product: extensions
Classification: Extensions
Component: pdfimport
Version: OOO310m1
Hardware: Unknown All
Importance: P3 Trivial
Target Milestone: milestone 1
Assignee: michael.ruess
QA Contact: wolframgarten
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-04-24 15:47 UTC by alan
Modified: 2010-04-30 11:15 UTC
CC List: 4 users

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Sample Hebrew PDF document (136.83 KB, text/plain)
2009-04-24 15:48 UTC, alan
proposed patch (803 bytes, text/plain)
2009-04-24 15:49 UTC, alan

Description alan 2009-04-24 15:47:29 UTC
When importing the attached Hebrew PDF document, I saw extra spaces in the
text. The cause was that the PDF importer did not treat a non-breaking space
(U+00A0, decimal 160) like a regular breaking space (U+0020, decimal 32). I
changed PDFIProcessor::drawGlyphLine to treat a non-breaking space like a
breaking space, which solved the problem.

Note: the words are now split correctly, but the spaces between them are too
wide. This will be reported in a separate issue.
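
The idea behind the change can be illustrated with a minimal, self-contained sketch. This is not the attached patch and does not reproduce the actual PDFIProcessor::drawGlyphLine code; the names isWordSeparator and splitWords are hypothetical and exist only for illustration. It assumes the text is available as a sequence of Unicode code points and shows how counting U+00A0 as a separator changes where words are split.

```cpp
#include <string>
#include <vector>

// Hypothetical constants for illustration; not taken from the OpenOffice.org source.
constexpr char32_t kSpace        = 0x0020; // regular (breaking) space, decimal 32
constexpr char32_t kNoBreakSpace = 0x00A0; // non-breaking space, decimal 160

// Hypothetical helper: treat a non-breaking space exactly like a regular space.
bool isWordSeparator(char32_t c)
{
    return c == kSpace || c == kNoBreakSpace;
}

// Hypothetical helper: split text into words at every separator,
// skipping empty pieces produced by consecutive separators.
std::vector<std::u32string> splitWords(const std::u32string& text)
{
    std::vector<std::u32string> words;
    std::u32string current;
    for (char32_t c : text)
    {
        if (isWordSeparator(c))
        {
            if (!current.empty())
            {
                words.push_back(current);
                current.clear();
            }
        }
        else
        {
            current.push_back(c);
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}
```

With this sketch, two Hebrew words joined by U+00A0 come out as two separate words, the same result as if they had been joined by U+0020. If the check tested only for U+0020, the whole run would be treated as a single word.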
Comment 1 alan 2009-04-24 15:48:19 UTC
Created attachment 61795
Sample Hebrew PDF document
Comment 2 alan 2009-04-24 15:49:59 UTC
Created attachment 61796
proposed patch
Comment 3 philipp.lohmann 2009-04-24 16:28:04 UTC
reassign
Comment 4 philipp.lohmann 2009-05-08 10:42:35 UTC
committed in CWS pdfextfix02
Comment 5 philipp.lohmann 2009-05-10 10:48:04 UTC
please verify in CWS pdfextfix02
Comment 6 philipp.lohmann 2009-05-11 09:40:35 UTC
@mru: thanks for taking over
Comment 7 michael.ruess 2009-05-11 12:38:47 UTC
Verified in CWS pdfextfix02.
Comment 8 michael.ruess 2010-04-30 11:15:50 UTC
Closed.