Issue 125171

Summary: Linked JPG image not preserved when exported to PDF with loss-less compression
Product: Writer
Reporter: Ariel Constenla-Haile <arielch>
Component: save-export
Assignee: AOO issues mailing list <issues>
Status: CLOSED FIXED
QA Contact:
Severity: Normal
Priority: P3
CC: Armin.Le.Grand, fanyuzhen, jsc
Version: 3.3.0 or older (OOo)
Flags: jsc: 4.1.1_release_blocker+
Target Milestone: 4.1.1
Hardware: All
OS: All
Issue Type: DEFECT
Latest Confirmation in: 4.2.0-dev
Developer Difficulty: ---
Attachments:
Description Flags
reduced bugdoc none

Description Ariel Constenla-Haile 2014-06-27 22:05:08 UTC
- Download the ZIP file from attachment 83619
- Unzip it, and open Hvordan-man-systematisk-forgifter.html with OpenOffice
- from OpenOffice Writer/Web, export to PDF with Lossless Compression checked

The exported PDF file is 4.3 MB

- Save the file as ODT
- Open the ODT and break all links
- Export to PDF

The exported PDF file is 673 KB, and all the JPG images are the same as the ones embedded in the ODT zip container, which are in turn the same as those on the internet.

Breaking the links to JPG images preserves their format.
When exporting to PDF, linked JPG images do not keep their original format.
Comment 1 Armin Le Grand 2014-06-30 08:46:37 UTC
This is because the current PDF export is more or less a recorded paint/print to a Metafile. Metafiles only know Bitmap/BitmapEx internally, which represent the bitmap data but have no information about, or access to, the original files or their formats. The rough future direction (long term) would be to change all apps to primitive rendering and then write a primitive renderer for PDF export.

Historically it would also have helped if the PDF export were an exporter working on the UNO API and the app models, but it's not.

Simpler in-between solutions would be possible, but would mean adding extra data to the existing stuff in a possibly unclean way.
Comment 2 Armin Le Grand 2014-07-02 09:17:14 UTC
Checked again, there is some code to keep access to the original image in PDF export (using the PDFMetaData parallel to the metafile). Tried the following in AOO410:

- new writer
- insert jpeg linked
- export with loss less compression -> FileA
- break link
- export with loss less compression -> FileB
-> FileA and FileB are identical, no change visible (as it should be, PDF does not support linked graphics)

Tried the same on trunk version, same result.
@Ariel: What version are you using? I remember to have fixed something before AOO410 in that context...
Comment 3 Ariel Constenla-Haile 2014-07-02 10:29:52 UTC
(In reply to Armin Le Grand from comment #2)
> @Ariel: What version are you using? I remember to have fixed something
> before AOO410 in that context...

See bug 105243 comment 28, OOo 3.0.0, OOo 3.1.1, and 4.1.0
But the document from attachment 83619 is an HTML file; maybe the bug is related to Writer/Web only, or to that particular file.
Comment 4 Armin Le Grand 2014-07-03 09:15:36 UTC
Okay, tried the original description

> - from OpenOffice Writer/Web, export to PDF with Lossless Compression checked
My created pdf is 10.552.779 bytes

> - Save the file as ODT
'Save as' does not offer odt, how was this done? 'Export' allows odt, using this.

> - Open the ODT and break all links
This implies closing and reopening the created odt. Doing this. Annotation: Is comparing the pdf created from HTML with the one from this exported odf intended? Okay, breaking all links.

> - Export to PDF
Also using 'loss less' -> This pdf has 2.305.285 bytes

Exporting the created odf with existing links creates the same pdf as the original file (10.552.779 bytes).

Checked the odt with links (call it A): as expected, it contains no graphics (not even a 'Pictures' folder).

Checked the test odt file with broken links (call it B): it contains all used graphics in the 'Pictures' folder as expected. All graphics are there with their original type and size -> breaking links works flawlessly.
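
The check on files A and B can be repeated without unpacking anything by hand, since an ODT is a ZIP container. The following is only a minimal sketch: `list_embedded_pictures` and `make_sample_odt` are hypothetical helper names, and the in-memory packages are stand-ins for the real A and B files, which would be passed as file paths instead.

```python
import io
import zipfile


def list_embedded_pictures(odt_file):
    """Return the names of files stored under Pictures/ in an ODT package."""
    with zipfile.ZipFile(odt_file) as package:
        return [name for name in package.namelist()
                if name.startswith("Pictures/") and not name.endswith("/")]


def make_sample_odt(pictures):
    """Build an in-memory stand-in for an ODT package (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", "<office:document-content/>")
        for name, data in pictures.items():
            package.writestr("Pictures/" + name, data)
    buffer.seek(0)
    return buffer


# Stand-in for file A (links intact): no Pictures/ entries at all.
print(list_embedded_pictures(make_sample_odt({})))
# Stand-in for file B (links broken): the graphic is embedded.
print(list_embedded_pictures(make_sample_odt({"photo.jpg": b"\xff\xd8\xff\xe0"})))
```

An empty list corresponds to the "not even a 'Pictures' folder" case seen for A.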

Checked export to PDF again
A -> 10.553.042 bytes
B -> 2.305.285 bytes

There definitely is a difference and it seems to be connected to the odf being created from an HTML file. As could be seen, this does not happen with newly created odt files. Maybe it has to do with the links being URLs to the web; checking that.
Comment 5 Armin Le Grand 2014-07-03 09:29:07 UTC
Could create the same effect by creating a new odt file and adding http://www.transformation.dk/chemtrails/Hvordan-man-systematisk-forgifter_html_m71cb576b.jpg (the 1st graphic from the bugdoc) as a linked graphic -> it has to do with linked graphics

The pdf from the linked graphic contains:
% PDFWriterImpl::writeBitmapObject
<</Type/XObject/Subtype/Image/Width 500/Height 370/BitsPerComponent 8/Length 5 0 R

The pdf from the broken link graphic contains:
% PDFWriterImpl::writeJPG
<</Type/XObject/Subtype/Image/Width 500 /Height 370 /BitsPerComponent 8 /ColorSpace/DeviceRGB/Filter/DCTDecode/Length 14645>>

Thus the basic error seems to be: linked graphics with URLs to locations on the web are not exported as the original graphic.
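
To repeat this comparison on other exported files, the image dictionaries can be counted by filter. This is a rough sketch, not a PDF parser: it assumes the image dictionaries are stored uncompressed and contain no nested `<<...>>` dictionaries, and the function name `classify_images` and the sample bytes are made up for illustration. The `/FlateDecode` filter in the first sample dictionary is also an assumption, since the excerpt above is cut off before any filter entry.

```python
import re

# Matches a flat dictionary that declares /Subtype /Image.
IMAGE_DICT = re.compile(rb"<<[^<>]*?/Subtype\s*/Image[^<>]*?>>", re.DOTALL)


def classify_images(pdf_bytes):
    """Count image XObjects that keep JPEG data (/DCTDecode) versus all others."""
    counts = {"jpeg": 0, "other": 0}
    for match in IMAGE_DICT.finditer(pdf_bytes):
        if b"/DCTDecode" in match.group(0):
            counts["jpeg"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical sample modelled on the two excerpts above.
sample = (
    b"4 0 obj\n<</Type/XObject/Subtype/Image/Width 500/Height 370"
    b"/BitsPerComponent 8/Filter/FlateDecode/Length 5 0 R>>\nendobj\n"
    b"7 0 obj\n<</Type/XObject/Subtype/Image/Width 500 /Height 370 "
    b"/BitsPerComponent 8 /ColorSpace/DeviceRGB/Filter/DCTDecode/Length 14645>>\nendobj\n"
)
print(classify_images(sample))
```

For a real file, the bytes would come from `open(path, "rb").read()`. A PDF exported from the linked graphic would be expected to report images only under "other".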
Comment 6 Armin Le Grand 2014-07-03 12:29:20 UTC
The whole mechanism to have jpegs as lossless compression in the pdf export works based on GfxLink. This is *not* about the graphic being defined using a url or being linked as the name implies, but about the helper struct which is used to buffer the graphic in a temp file or locally allocated buffer.
When doing pdf export in draw/impress/calc (aka DrawingLayer GraphicObject) this is set and gets used. In Writer it's not set and not used, probably since Writer is more intelligent in handling external graphic links and thus not buffering them in temp files on the local machine. Due to that the original jpeg is not available and cannot be used.
Central points are:
vclmetafileprocessor2d.cxx:756, pdfextoutdevdata.cxx:404, and (for reference, no longer used) notxtfrm.cxx:828.
Checked again with AOO410, already there -> no regression.
To solve: Hard to say, somehow Writer should probably locally buffer linked graphics...
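
The dependency described in this comment can be modelled in a few lines. This is a conceptual sketch only and not the AOO code: `Graphic`, `original_jpeg` (standing in for the data buffered by GfxLink) and `encode_image_for_pdf` are hypothetical names, and using Flate compression for the fallback path is an assumption.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Graphic:
    pixels: bytes                          # decoded bitmap data, always available
    original_jpeg: Optional[bytes] = None  # buffered original file data, if any


def encode_image_for_pdf(graphic: Graphic) -> Tuple[str, bytes]:
    """Pick the PDF stream filter and payload for one image."""
    if graphic.original_jpeg is not None:
        # Original data is buffered: pass the JPEG through unchanged.
        return "DCTDecode", graphic.original_jpeg
    # No buffered original: only the decoded pixels can be written.
    return "FlateDecode", zlib.compress(graphic.pixels)


pixels = bytes(500 * 370 * 3)  # dummy RGB pixel data
print(encode_image_for_pdf(Graphic(pixels, original_jpeg=b"\xff\xd8\xff\xd9"))[0])
print(encode_image_for_pdf(Graphic(pixels))[0])
```

In this model, the Writer case corresponds to `original_jpeg` being left unset, so the export can only fall back to the decoded pixels.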
Comment 7 Armin Le Grand 2014-07-03 14:00:31 UTC
Added a workaround to Writer code to support this.

Requesting AOO411 flag: It's a safe fix and can reduce the size of exported PDFs by a factor of 10 for linked jpegs in Writer.
Comment 8 Armin Le Grand 2014-07-03 14:04:11 UTC
Created attachment 83634
reduced bugdoc

To make reviewing easier, I am adding this reduced test document. With the fix, exporting it with lossless compression creates a PDF of about 16 KB. Without the fix, the PDF is about 150 KB.
Comment 9 SVN Robot 2014-07-03 14:07:32 UTC
"alg" committed SVN revision 1607649 into trunk:
i125171 support lossless embedding of linked jpegs in writer for PDF export
Comment 10 jsc 2014-07-03 15:18:09 UTC
grant showstopper flag
Comment 11 SVN Robot 2014-07-04 08:44:20 UTC
"alg" committed SVN revision 1607796 into branches/AOO410:
i125171 support lossless embedding of linked jpegs in writer for PDF export
Comment 12 Armin Le Grand 2014-07-04 09:38:36 UTC
Added to AOO411, done
Comment 13 SVN Robot 2014-07-07 08:04:20 UTC
"hdu" committed SVN revision 1608363 into branches/AOO410:
#i125171# fix build breaker in Writer's notxtfrm.cxx
Comment 14 fanyuzhen 2014-07-19 19:36:45 UTC
Sigmund Vestergaard has checked this on his 64-bit Red Hat system. I am changing the bug status to Verified/Fixed based on his check result below:
" I installed the 4.1.1 M2 and used the document attached to the Bugzilla issue to test PDF export with lossless compression of images.

All the images were preserved fine in the PDF, so it looks like the bug has been fixed. "