Issue 105243 - PDF file size increased between OOo 3.1 and 3.1.1 when using "Lossless compression"
Status: CLOSED FIXED
Alias: None
Product: Writer
Classification: Application
Component: save-export
Version: OOO310m19
Hardware: All All
Importance: P3 Trivial with 11 votes
Target Milestone: ---
Assignee: h.ilter
QA Contact: issues@sw
URL:
Keywords:
Duplicates: 105970
Depends on:
Blocks:
 
Reported: 2009-09-21 15:16 UTC by simonst
Modified: 2014-06-29 20:32 UTC
CC List: 5 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Export this file to pdf with jpeg- & lossless-compression (56.11 KB, application/vnd.oasis.opendocument.text)
2009-09-23 14:51 UTC, h.ilter

compare 3.1.0 (or earlier) with 3.1.1 (description inside; Linux or Windows - no matter) (31.50 KB, text/plain)
2009-09-25 14:24 UTC, fyva

ZIP file with PDFs and JPGs (383.92 KB, application/x-zip-compressed)
2014-06-27 18:41 UTC, Ariel Constenla-Haile

Try exporting the HTML-doc in this ZIP-archive to PDF (501.59 KB, application/x-zip-compressed)
2014-06-27 19:32 UTC, henrik_roseno

Cleaner version of the HTML-doc from: "Try exporting the HTML-doc in this ZIP-archive to PDF" (90.41 KB, text/html)
2014-06-28 16:07 UTC, henrik_roseno

Description simonst 2009-09-21 15:16:40 UTC
When I insert a 46 kB JPG image into an ODT document and export this document
as a PDF file, the size of the resulting PDF increases by 605 kB. This only
happens when using "Lossless compression" in OOo 3.1.1; with OOo 3.1 the file
size increases only by the size of the JPG.
Comment 1 h.ilter 2009-09-23 14:49:36 UTC
I'll attach a bugdoc.
Comment 2 h.ilter 2009-09-23 14:51:52 UTC
Created attachment 64920 [details]
Export this file to pdf with jpeg- & lossless-compression
Comment 3 fyva 2009-09-25 12:35:59 UTC
Please, can you fix this issue in 3.2? It is a regression and thus deserves a
higher priority than P3. I always use "without compression" when I need to
combine a few scanned images into a PDF. Now when I paste 4 images of about
500 kB each, the PDF becomes 7.5 MB in size. Version 3.1.0 handled this case
well.

When you paste a picture as a link, both 3.1.0 and 3.1.1 enlarge the file
size. When you change the dimensions of a picture, it grows in file size too,
because the image is saved as a new picture without compression. But when you
paste the picture not as a link and do nothing with it, 3.1.1 should behave at
least as well as 3.1.0 and not change the file size. (This is a very high
priority issue for me.)
Comment 4 fyva 2009-09-25 12:54:54 UTC
In addition, pictures are sometimes resized automatically to fit the page
margins, for example when a scanned image is larger than the page without
margins. In this case, I usually set all page margins to zero before pasting
the picture (then you can move the picture to the center and export it to PDF
in 3.1.0). In the attachment above, the picture seems to have been resized
automatically, and 3.1.0 would behave the same way as 3.1.1 in this case.
Comment 5 fyva 2009-09-25 14:24:54 UTC
Created attachment 64969 [details]
compare 3.1.0 (or earlier) with 3.1.1 (description inside; Linux or Windows - no matter)
Comment 6 keithcu 2009-09-27 20:34:39 UTC
I ran into this problem as well. I publish my book using OpenOffice, and the
file size went from 17 MB to over 68 MB. The bug appeared when I upgraded from
OO 3.1 to 3.1.1.

Thanks for looking into this! I live in OO, probably more than the developers! ;-)
Comment 7 hnmcc 2009-09-28 22:22:48 UTC
In my business, this fault means that we have to stop using PDF Export and
start using a third-party PDF writer to achieve the file sizes we need.

As this is a regression - and an important one, because it torpedoes the
credibility and usefulness of the PDF Export function - it should be rated
higher than P3.
Comment 8 fyva 2009-10-07 12:02:31 UTC
This issue was introduced between OOO310m13 and OOO310m15. It is present in
OOO310m16, OOO310m17 and OOO310m18.
Comment 9 jbf.faure 2009-10-19 05:34:08 UTC
*** Issue 105970 has been marked as a duplicate of this issue. ***
Comment 10 simonst 2009-10-20 11:02:15 UTC
Is there a chance that this issue will be fixed in the upcoming OOo 3.2?
Comment 11 philipp.lohmann 2009-10-20 12:12:36 UTC
Yes, if you convince the good people on the releases@openoffice.org mailing
list that this is a showstopper.
Comment 12 philipp.lohmann 2009-10-21 18:32:35 UTC
pl->aw: Graphics that were originally JPEG graphics used to be exported as the
original JPEG data, even in the lossless compression case (it makes little
sense to put them into a bitmap and export that merely zlib-compressed). This
works by using PDFExtOutDevData::BeginGroup and PDFExtOutDevData::EndGroup(
Graphic&, ... ), which "bypass" the corresponding DrawBitmap actions in the
metafile. I see that in vclmetafileprocessor2d.cxx these should get called;
however, the debugger says this is not the case.

Could you please have a look?
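The behaviour described in this comment boils down to a per-image decision. The sketch below is a minimal illustration under stated assumptions, not OpenOffice's code: `ImageInfo`, `PdfEncoding`, `encodingForLossless` and `rawRgbBytes` are hypothetical names, and the `modified` flag is an assumption standing in for edits such as resizing that, according to other comments in this thread, force a new bitmap. It also shows why the fallback is costly: a decoded 24-bit RGB bitmap occupies width * height * 3 bytes before zlib compression.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one image on its way into the PDF; these are
// illustrative types, not OpenOffice's real classes.
struct ImageInfo {
    bool hasOriginalJpeg;  // the original JPEG byte stream is still available
    bool modified;         // assumption: e.g. resized, so a new bitmap is needed
};

enum class PdfEncoding {
    PassThroughJpeg,  // copy the original JPEG bytes (DCTDecode stream)
    FlateBitmap       // decode to a bitmap and zlib-compress it (FlateDecode)
};

// Decision for the "Lossless compression" setting: an unmodified JPEG keeps
// its original data; anything else is written as a zlib-compressed bitmap.
PdfEncoding encodingForLossless(const ImageInfo& img) {
    if (img.hasOriginalJpeg && !img.modified) {
        return PdfEncoding::PassThroughJpeg;
    }
    return PdfEncoding::FlateBitmap;
}

// Size of an uncompressed 24-bit RGB bitmap before zlib compression.
std::uint64_t rawRgbBytes(std::uint32_t width, std::uint32_t height) {
    return static_cast<std::uint64_t>(width) * height * 3;
}
```

For a hypothetical 1000 x 800 photo, `rawRgbBytes(1000, 800)` is 2,400,000 bytes of raw pixel data, which zlib then has to shrink without the lossy tricks that made the original JPEG small.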
Comment 13 Armin Le Grand 2009-10-28 13:11:39 UTC
AW: In the MetaFileRenderer (VclMetafileProcessor2D::processBasePrimitive2D in
drawinglayer), the case PRIMITIVE2D_ID_GRAPHICPRIMITIVE2D handles this as before
in goodies/source/graphic/grfmgr.cxx: PDFExtOutDevDataSupport is created when
the graphic is linked (rGraphic.IsLink(), with rGraphic being a const Graphic&).
This is not the case here.
AW->SJ: You know better how a graphic indicates that it is a linked graphic.
Could you have a look at why rGraphic.IsLink() is false for graphics that
ought to be linked graphics?
Comment 14 fyva 2009-10-29 13:26:39 UTC
From a wider perspective, this issue is connected with i15508. When you insert a
picture as a link and export it to PDF, the file size increases because the
picture is converted to PNG. In this case, 3.1.0 and 3.1.1 behave the same way.
Comment 15 sven.jacobi 2009-11-02 10:27:05 UTC
Yes, this seems to be a Writer-only problem; in Impress, JPG graphics remain
unchanged. The PDFExtOutDevDataSync::BeginGroup ...
PDFExtOutDevDataSync::EndGroupGfxLink parentheses are missing.
Comment 16 sven.jacobi 2009-11-03 13:58:00 UTC
sj->od: This is a Writer-only problem. Impress and Calc "bypass" the necessary
information with mpPDFExtOutDevData->BeginGroup() and

    mpPDFExtOutDevData->EndGroup(rGraphicPrimitive.getGraphicObject().GetGraphic(),
                                 rAttr.GetTransparency(),
                                 aCurrentRect,
                                 aCropRect);

in drawinglayer/source/processor2d/vclmetafileprocessor2d.cxx.

To preserve JPG graphics when creating a PDF, Writer should do the same.
Comment 17 Oliver-Rainer Wittmann 2009-12-04 15:43:29 UTC
The fix for issue 101545 removed the needed "PDFExtOutDevDataSync::BeginGroup()
and PDFExtOutDevDataSync::EndGroup(..) parentheses" from method
<GraphicObject::Draw(..)>.
This causes this defect.

Proposed solution:
Undo the change from the fix for issue 101545 and remove the
"PDFExtOutDevDataSync::BeginGroup() and PDFExtOutDevDataSync::EndGroup(..)
parentheses" from method <VclMetafileProcessor2D::processBasePrimitive2D(..)>.

OD->AW: Please review the proposed solution. Thx in advance.
Comment 18 Armin Le Grand 2009-12-04 18:00:30 UTC
AW->OD: No, this is not the correct solution from my POV. The
BeginGroup/EndGroup parentheses for PDF export are needed (at least as long as
this is done using a Metafile and not primitives), but GraphicObject::Draw(..)
was and is not the correct place for them.

We have two different graphic objects: one in SW and one in the rest of the
office. It may be that SW uses only GraphicObject::Draw(..) to draw its
graphic, but this is not the case for the DrawingLayer graphics. Depending on
whether the transformation of the graphic uses a rotation or even a shear,
there are three versions (see VclProcessor2D::RenderBitmapPrimitive2D):

(1) RenderBitmapPrimitive2D_BitmapEx uses OutputDevice::DrawBitmapEx(..)
(2) RenderBitmapPrimitive2D_GraphicManager uses GraphicObject::Draw(..)
(3) RenderBitmapPrimitive2D_self creates its own pixel transformation to paint
a created BitmapEx

Method (3), for example, is used for sheared graphics, which are possible at
any time with primitives; method (2) is used when the graphic is only rotated,
and method (1) when it is neither sheared nor rotated.
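The selection rule in the paragraph above can be written as a small function. This is a simplified sketch, assuming the transformation has already been decomposed into a rotation angle and a shear value in radians; the real VclProcessor2D code works on a full transformation matrix, and the names `RenderPath` and `chooseRenderPath` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names for the three paths listed in the comment.
enum class RenderPath {
    BitmapEx,        // (1) OutputDevice::DrawBitmapEx(..): no rotation, no shear
    GraphicManager,  // (2) GraphicObject::Draw(..): rotated only
    Self             // (3) own pixel transformation: sheared
};

// Simplified selection; assumes rotation and shear were already extracted
// from the transformation (angles in radians).
RenderPath chooseRenderPath(double rotate, double shear) {
    const double eps = 1e-9;  // treat values this close to zero as zero
    if (std::fabs(shear) > eps) {
        return RenderPath::Self;  // sheared: method (3)
    }
    if (std::fabs(rotate) > eps) {
        return RenderPath::GraphicManager;  // only rotated: method (2)
    }
    return RenderPath::BitmapEx;  // neither: method (1)
}
```

Shear is checked first because, per the comment, any sheared graphic goes through method (3) regardless of rotation. Only path (2) passes through GraphicObject::Draw(..), which is why a PDF bracket placed inside that method cannot cover the other two.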

The embedding itself is not even based on BitmapPrimitive2D but on
GraphicPrimitive2D (which holds a Graphic). The GraphicPrimitive2D is embedded
into these parentheses, and inside them not the GraphicPrimitive2D itself but
its decomposition is rendered, with whatever output method that decomposition
uses (see 'case PRIMITIVE2D_ID_GRAPHICPRIMITIVE2D' in
VclMetafileProcessor2D::processBasePrimitive2D).

In short: creating those parentheses should not be part of the Paint method but
should happen around its call, i.e. before and after the call to
GraphicObject::Draw(..). The idea behind this is that there are other ways (not
currently used in SW) to actually paint the GraphicObject's content.

If the fix is done as proposed above, the DrawingLayer GraphicObjects (all
except SW's) will be double-embedded in these parentheses, since the
BeginGroup/EndGroup cannot simply be removed in 'case
PRIMITIVE2D_ID_GRAPHICPRIMITIVE2D': any of the three visualisations (and NOT
only GraphicObject::Draw(..)) may be chosen in the decomposition.

If you want, I could add that embedding in SW, but you know better where to do
it. The pattern is the same as in 'case PRIMITIVE2D_ID_GRAPHICPRIMITIVE2D'.

HTH!
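The "around the call, not inside Paint" placement can be sketched with a scope guard. This is a hypothetical model, not the VCL API: `PdfExtData` only records calls in place of PDFExtOutDevData (whose real EndGroup(..) also takes the Graphic, transparency and rectangles), `drawGraphic` stands in for GraphicObject::Draw(..), and the RAII guard is just one possible way to guarantee that every BeginGroup gets its EndGroup.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for PDFExtOutDevData: it only records calls in a log.
class PdfExtData {
public:
    explicit PdfExtData(std::vector<std::string>& log) : log_(log) {}
    void beginGroup() { log_.push_back("BeginGroup"); }
    void endGroup() { log_.push_back("EndGroup"); }
private:
    std::vector<std::string>& log_;
};

// Scope guard: opens the group on construction and closes it on destruction,
// so the bracket stays balanced on every exit path. A null pointer means
// "not exporting to PDF" and makes the guard a no-op.
class PdfGroupGuard {
public:
    explicit PdfGroupGuard(PdfExtData* pdf) : pdf_(pdf) {
        if (pdf_) pdf_->beginGroup();
    }
    ~PdfGroupGuard() {
        if (pdf_) pdf_->endGroup();
    }
    PdfGroupGuard(const PdfGroupGuard&) = delete;
    PdfGroupGuard& operator=(const PdfGroupGuard&) = delete;
private:
    PdfExtData* pdf_;
};

// Stand-in for GraphicObject::Draw(..): it paints and knows nothing about PDF.
void drawGraphic(std::vector<std::string>& log) { log.push_back("Draw"); }

// Caller side: the bracket is placed around the draw call, not inside it.
void paintGraphic(std::vector<std::string>& log, PdfExtData* pdf) {
    PdfGroupGuard guard(pdf);
    drawGraphic(log);
}
```

Keeping the bracket out of the paint routine means each caller adds it exactly once, which avoids the double embedding described above for callers (like the drawinglayer) that already open a group themselves.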
Comment 19 Oliver-Rainer Wittmann 2010-01-04 14:06:25 UTC
fixed in cws sw33bf02 - changed files:
/goodies/inc/grfmgr.hxx,
/goodies/source/graphic/grfmgr.cxx,
/sw/source/core/doc/notxtfrm.cxx,
/sw/source/core/layout/paintfrm.cxx,
change set 8b6a4bda72ef
Comment 20 keithcu 2010-01-04 21:51:08 UTC
Sweet -- thanks od.
Comment 21 Oliver-Rainer Wittmann 2010-02-04 13:39:18 UTC
OD->HI: Checked in internal installation set of cws sw33bf02 - please verify.
Comment 22 michael.ruess 2010-02-16 14:25:57 UTC
Verified fix in CWS sw33bf02.
Comment 23 henrik_roseno 2014-06-26 16:13:12 UTC
This bug is supposed to have been fixed in 2010 (Status: VERIFIED FIXED). Then why do I have the same problem now in June 2014, using Apache OO 4.1.0?
Comment 24 Marcus 2014-06-26 19:13:58 UTC
"Verified fix in CWS sw33bf02" means it was fixed and verified in the code branch that a developer created. Maybe it was never integrated into the TRUNK and therefore never really into the OpenOffice code base. However, only a developer can check this, not me.

@Armin: Please can you help here? Thanks.
Comment 25 Ariel Constenla-Haile 2014-06-26 21:32:59 UTC
Tip: search for the CWS in MarkMail with list:org.openoffice.cws-announce

http://markmail.org/search/?q=list%3Aorg.openoffice.cws-announce+sw33bf02

The CWS was integrated in DEV300_m76 http://markmail.org/message/l3bjbtpmwoetm377

]$ hg log --rev 8b6a4bda72ef
changeset:   266571:8b6a4bda72ef
user:        Oliver-Rainer Wittmann <od@openoffice.org>
date:        Mon Jan 04 13:30:37 2010 +0100
files:       goodies/inc/grfmgr.hxx goodies/source/graphic/grfmgr.cxx sw/source/core/doc/notxtfrm.cxx sw/source/core/layout/paintfrm.cxx
description:
sw33bf02: #i105243# Output of graphic in Writer with corresponding PDF handling

If the bug is reproducible in AOO 4, then it is a regression; feel free to reopen.
Comment 26 henrik_roseno 2014-06-27 08:38:31 UTC
Hi

Thanks for the response! 

Yes, I can reproduce the problem in OO 4.1.0:

I have a document with lots of jpg-photos. When I export it to PDF using the default "Lossless compression" the generated PDF-file is 4.5MB. But if I choose "JPEG-compression" and "Quality 100%", then the resulting PDF-file is 1MB. 

But I can't see a way to reopen this issue?

-Henrik
Comment 27 Marcus 2014-06-27 17:46:05 UTC
It seems I don't have permission to reopen this issue.

@Ariel: Do you?
Comment 28 Ariel Constenla-Haile 2014-06-27 18:41:05 UTC
Created attachment 83618 [details]
ZIP file with PDFs and JPGs

The ZIP file contains:

- the original ODT from attachment 64969 [details]
- the file exported to PDF with lossless compression, using OOo 3.0.0, OOo 3.1.1 and AOO 4.1.0
- JPG files extracted from the respective files: from the ODT file, just unzipped; from the PDF files, using pdfimages:

pdfimages -j issue105243-aoo410.pdf issue105243-aoo410
pdfimages -j issue105243-ooo300.pdf issue105243-ooo300
pdfimages -j issue105243-ooo311.pdf issue105243-ooo311

AOO 4.1.0 keeps the original JPG file, just as OOo 3.0.0 does.
The bug in OOo 3.1.1 cannot be reproduced in 4.1.0.
This bug is fixed.
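The check in this comment can be automated. `pdfimages -j` writes DCT-encoded (JPEG) images out as .jpg files, so if the export passed the original JPEG through unchanged, each extracted file should be byte-identical to the corresponding picture unzipped from the ODT (usually under Pictures/). A minimal sketch of that comparison, assuming the exporter copies the stream verbatim; the function names are hypothetical and the caller supplies the paths, since the file names pdfimages produces differ between versions.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file as raw bytes; returns false if it cannot be opened.
bool readAll(const std::string& path, std::vector<char>& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// True if both files can be read and are byte-for-byte identical, e.g. a
// picture from the unzipped ODT and the .jpg that pdfimages -j extracted
// from the exported PDF.
bool sameBytes(const std::string& a, const std::string& b) {
    std::vector<char> bytesA, bytesB;
    return readAll(a, bytesA) && readAll(b, bytesB) && bytesA == bytesB;
}
```

A missing file counts as "not the same", so an export that stored the picture as a zlib-compressed bitmap (and therefore yields no extracted .jpg) also fails the check.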
Comment 29 Ariel Constenla-Haile 2014-06-27 18:44:36 UTC
(In reply to Marcus from comment #27)
> It seems I don't have permission to reopen this issue.

Strange. IIRC, using your apache.org address should give you those rights.

> @Ariel: Do you?

Yes, but the bug is fixed, at least tested with the two documents attached here.

@henrik_roseno: can you extract the images from your PDF file and compare them with the ones in the OpenDocument file? Aren't they the same JPGs?
Comment 30 henrik_roseno 2014-06-27 19:32:52 UTC
Created attachment 83619 [details]
Try exporting the HTML-doc in this ZIP-archive to PDF

Hi

Right now I can only reproduce the error when working with an HTML-file (in OOo 4.1.0). So I have attached the HTML-file and its image-files. All in a ZIP-archive. When I PDF-export it with "Lossless compression" the result is 4.5MB. When I use "JPEG compression - quality 100%" the result is 1MB. 

BUT there is an additional problem: in both PDF files, the images have been reduced to 'thumbnail size'.

If I load an ODT-document that contains image-files 'inside' the file itself and export it with both the default "Lossless compression" and "JPEG compression - quality 100%", then suddenly it works. The "Lossless compression" PDF-file is even considerably smaller than the "JPEG compression - quality 100%" PDF-file!

Best regards,
Henrik Rosenø
Comment 31 Ariel Constenla-Haile 2014-06-27 21:54:58 UTC
(In reply to henrik_roseno from comment #30)
> Try exporting the HTML-doc in this ZIP-archive to PDF
> 
> Hi
> 
> Right now I can only reproduce the error when working with an HTML-file (in
> OOo 4.1.0). 

I can reproduce the bug with OOo 3.0.0 and the current nightly build from the build bot.

> So I have attached the HTML-file and its image-files. All in a
> ZIP-archive. 

Note that your HTML file has

<base href="http://www.transformation.dk/chemtrails/">

With this, OpenOffice fetches all the images from that site rather than relative to the HTML file.

> When I PDF-export it with "Lossless compression" the result is
> 4.5MB.

I get the same result.
The bug concerns linked images: when they are exported to PDF, the original JPG image format is not preserved.

As a workaround, save the HTML file as ODT.
With the ODT, go to menu Edit - Links, and break all the links.

I get an ODT of 565 KB; this file, exported to PDF, generates a 673 KB PDF with all the original JPG files retained.

In short, this is another bug, and it is not a regression (it happens the same way in 3.0.0 and 4.1.0).

> The "Lossless compression" PDF-file
> is even considerably smaller than the "JPEG compression - quality 100%"
> PDF-file!

This might be yet another bug.
Comment 32 Ariel Constenla-Haile 2014-06-27 22:06:07 UTC
(In reply to Ariel Constenla-Haile from comment #31)
> In short, this is another bug, and it is not a regression (it happens the same way in
> 3.0.0 and 4.1.0).

Now filed as Issue 125171 - Linked JPG image not preserved when exported to PDF with loss-less compression
Comment 33 henrik_roseno 2014-06-28 16:07:31 UTC
Created attachment 83621 [details]
Cleaner version of the HTML-doc from: "Try exporting the HTML-doc in this ZIP-archive to PDF"

Here is a cleaner version of the HTML-document, i.e. no 'base'-tag etc., but still 4.5MB PDF-file.
Comment 34 Ariel Constenla-Haile 2014-06-29 20:32:50 UTC
(In reply to henrik_roseno from comment #33)
> Created attachment 83621 [details]
> Cleaner version of the HTML-doc from: "Try exporting the HTML-doc in this
> ZIP-archive to PDF"
> 
> Here is a cleaner version of the HTML-document, i.e. no 'base'-tag etc., but
> still 4.5MB PDF-file.

Yes, it does not make any difference; the problem is with *linked* graphics, it does not matter if they are pointing to the www or relative to the document.