Apache OpenOffice (AOO) Bugzilla – Issue 105243
PDF file size increased between OOo 3.1 and 3.1.1 when using "Lossless compression"
Last modified: 2014-06-29 20:32:50 UTC
When I insert a 46 kB JPG image into an ODT document and export this document as a PDF file, the size of the resulting PDF increases by 605 kB. This only happens when using "Lossless compression" in OOo 3.1.1; with OOo 3.1, the file size increases only by the size of the JPG.
I'll attach a bugdoc.
Created attachment 64920: Export this file to pdf with jpeg- & lossless-compression
Please, can you fix this issue in 3.2? This is a regression and thus has a higher priority than P3. I always use "without compression" when I need to combine a few scanned images into a PDF. Now when I paste 4 images, each about 500 kB in size, the PDF becomes 7.5 MB. Version 3.1.0 worked well in this case. When you paste a picture as a link, both 3.1.0 and 3.1.1 enlarge the file size. When you change the dimensions of a picture, the picture grows in file size too, because it is saved as a new picture without compression. But when you paste the picture not as a link and do nothing with it, 3.1.1 should behave at least as well as 3.1.0 and not change the file size. (This is a very high priority issue for me.)
In addition, pictures are sometimes resized automatically to fit the page margins. For example, this happens when a scanned image is larger than the page area inside the margins. In this case, I usually set all page margins to zero before pasting a picture (then you can move the picture to the center and export it to PDF in 3.1.0). In the attachment above, it seems that the picture was automatically resized, so 3.1.0 would behave the same way as 3.1.1 in this case.
Created attachment 64969: compare 3.1.0 (or earlier) with 3.1.1 (description inside; Linux or Windows - no matter)
I ran into this problem as well. I publish my book using OpenOffice, and the file size went from 17 MB to over 68 MB. The bug appeared when I upgraded from OO 3.1 to 3.1.1. Thanks for looking into this! I live in OO, probably more than the developers! ;-)
In my business this fault means that we have to stop using PDF Export and start using a 3rd-party PDF Writer, to achieve the file sizes we need. As this is a regression - and an important one, because it torpedoes the credibility and usefulness of the PDF Export function - it should be rated as higher than a P3.
This issue was introduced between OOO310m13 and OOO310m15. It is present in OOO310m16, OOO310m17 and OOO310m18.
*** Issue 105970 has been marked as a duplicate of this issue. ***
Is there a chance that this issue will be fixed in the upcoming OOo 3.2?
Yes, if you convince the good people on the releases@openoffice.org mailing list that this is a showstopper.
pl->aw: Graphics that were originally JPEG graphics used to be exported as the original JPEG data even in the lossless compression case (it makes little sense to put them into a bitmap and export that simply zlib-compressed). This works by using PDFExtOutDevData::BeginGroup and PDFExtOutDevData::EndGroup( Graphic&, ... ), which "bypasses" the corresponding DrawBitmap actions in the metafile. I see that in vclmetafileprocessor2d.cxx these should get called, but the debugger says this is not the case. Could you please have a look?
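To make the "bypass" mechanism concrete, here is a heavily simplified C++ sketch. `RecordingDevice`, `Action` and their methods are hypothetical stand-ins, not the real VCL or drawinglayer classes; only the idea comes from the comment above: drawing actions recorded between a begin-group and an end-group call are replaced by the original JPEG data when that data is still available, so the decoded bitmap never reaches the PDF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model; not the real VCL classes.
struct Action {
    std::string kind;   // "DrawBitmap" or "OriginalJpeg"
    std::size_t bytes;  // payload size that would end up in the PDF
};

class RecordingDevice {
public:
    // Plays the role of PDFExtOutDevData::BeginGroup(): remember where
    // the group starts in the recorded action list.
    void beginGroup() { groupStarts_.push_back(actions_.size()); }

    // A decoded bitmap drawn while recording (e.g. during metafile output).
    void drawBitmap(std::size_t bytes) { actions_.push_back({"DrawBitmap", bytes}); }

    // Plays the role of EndGroup(Graphic&, ...): if the original JPEG data
    // is still available, the actions recorded since beginGroup() are
    // dropped and replaced by that data ("bypassing" the DrawBitmap).
    void endGroup(bool hasOriginalJpeg, std::size_t jpegBytes) {
        assert(!groupStarts_.empty() && "endGroup() without beginGroup()");
        const std::size_t start = groupStarts_.back();
        groupStarts_.pop_back();
        if (hasOriginalJpeg) {
            actions_.resize(start);  // discard the bitmap actions of this group
            actions_.push_back({"OriginalJpeg", jpegBytes});
        }
    }

    // Sum of all payloads, i.e. a rough proxy for the exported size.
    std::size_t totalBytes() const {
        std::size_t sum = 0;
        for (const Action& a : actions_) sum += a.bytes;
        return sum;
    }

    const std::vector<Action>& actions() const { return actions_; }

private:
    std::vector<Action> actions_;
    std::vector<std::size_t> groupStarts_;
};
```

For illustration only: bracketing a 605000-byte bitmap action with `beginGroup()` and `endGroup(true, 46000)` leaves a single 46000-byte `OriginalJpeg` action, whereas without the brackets (or without the original data) the bitmap bytes stay in the output. That is the size difference reported at the top of this issue.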
AW: In the MetaFileRenderer (VclMetafileProcessor2D::processBasePrimitive2D in drawinglayer) the case PRIMITIVE2D_ID_GRAPHICPRIMITIVE2D handles this as before in goodies/source/graphic/grfmgr.cxx: PDFExtOutDevDataSupport is created when the graphic is linked (rGraphic.IsLink() with rGraphic being a const Graphic&). This is not the case. AW->SJ: You should know better how a graphic represents that it is a linked graphic. Could You have a look why rGraphic.IsLink() is false with graphics which ought to be linked graphics?
From a wider perspective, this issue is connected with i15508. When you insert a picture as a link and export it to PDF, the file size increases because it is converted to PNG. In this case 3.1.0 and 3.1.1 behave the same way.
Yes, this seems to be a Writer-only problem; in Impress, JPG graphics remain unchanged. The PDFExtOutDevDataSync::BeginGroup ... PDFExtOutDevDataSync::EndGroupGfxLink parentheses are missing.
sj->od: This is a Writer-only problem. Impress and Calc are "bypassing" the necessary information with "mpPDFExtOutDevData->BeginGroup()" and "mpPDFExtOutDevData->EndGroup(rGraphicPrimitive.getGraphicObject().GetGraphic(), rAttr.GetTransparency(), aCurrentRect, aCropRect);" in drawinglayer/source/processor2d/vclmetafileprocessor2d.cxx. To preserve JPG graphics when creating a PDF, Writer should do something similar.
The fix for issue 101545 removed the needed "PDFExtOutDevDataSync::BeginGroup() and PDFExtOutDevDataSync::EndGroup(..) parentheses" from method <GraphicObject::Draw(..)>. This causes this defect. Proposed solution: undo the change from the fix for issue 101545 and remove the "PDFExtOutDevDataSync::BeginGroup() and PDFExtOutDevDataSync::EndGroup(..) parentheses" from method <VclMetafileProcessor2D::processBasePrimitive2D(..)>. OD->AW: Please review the proposed solution. Thx in advance.
AW->OD: No, this is not the correct solution from my POV. The BeginGroup/EndGroup parentheses for PDF export are needed (at least as long as this is done using a Metafile and not primitives), but GraphicObject::Draw(..) was and is not the correct place to do so. We have different graphic objects in SW and in the rest of the office. It may be that SW uses only GraphicObject::Draw(..) to draw its graphic, but this is not the case for the DrawingLayer graphics. Depending on whether the transformation of the graphic uses a rotation or even a shear, there are three versions (see VclProcessor2D::RenderBitmapPrimitive2D): (1) RenderBitmapPrimitive2D_BitmapEx uses OutputDevice::DrawBitmapEx(..); (2) RenderBitmapPrimitive2D_GraphicManager uses GraphicObject::Draw(..); (3) RenderBitmapPrimitive2D_self creates its own pixel transformation to paint a created BitmapEx. Method (3) is used especially for sheared graphics, which are possible at any time with primitives; method (2) when the graphic is only rotated; and method (1) when it is neither sheared nor rotated. The embedding itself is not even based on BitmapPrimitive2D, but on GraphicPrimitive2D (which holds a Graphic): it is not the GraphicPrimitive2D itself but its decomposition, with whatever output method that uses, that gets embedded into these parentheses (see 'case PRIMITIVE2D_ID_GRAPHICPRIMITIVE2D' in VclMetafileProcessor2D::processBasePrimitive2D). In short: creating those parentheses should not be part of the Paint method, but should happen around its call, i.e. before and after the call to GraphicObject::Draw(..). The idea behind this is that there is more than one way (not in SW currently) to actually paint that GraphicObject's content.
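The selection rule AW describes can be summarized as a small dispatch function. This is a hypothetical sketch: `Decomposition` reduces the real transformation to a rotation angle and a shear factor, and the enum names only echo the three `RenderBitmapPrimitive2D_*` variants. It shows why brackets placed only inside `GraphicObject::Draw(..)` cannot cover all cases: only the rotated-but-not-sheared case goes through that method.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names echoing the three RenderBitmapPrimitive2D_* variants.
enum class RenderPath {
    BitmapEx,        // (1) OutputDevice::DrawBitmapEx(..): neither rotated nor sheared
    GraphicManager,  // (2) GraphicObject::Draw(..): rotated, not sheared
    Self             // (3) own pixel transformation: sheared
};

// Simplified stand-in for the decomposed object transformation; the real
// code decomposes a full 2D transformation matrix.
struct Decomposition {
    double rotate;  // rotation angle in radians
    double shearX;  // shear factor
};

RenderPath chooseRenderPath(const Decomposition& d, double eps = 1e-9) {
    assert(eps >= 0.0);
    const bool rotated = std::fabs(d.rotate) > eps;
    const bool sheared = std::fabs(d.shearX) > eps;
    if (sheared) return RenderPath::Self;  // shear takes precedence over rotation
    if (rotated) return RenderPath::GraphicManager;
    return RenderPath::BitmapEx;
}
```

For example, an unrotated, unsheared graphic maps to `RenderPath::BitmapEx`, so a group opened inside `GraphicObject::Draw(..)` would never be emitted for it.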
When doing the fix as proposed above, the DrawingLayer GraphicObjects (all except SW's) will be embedded twice in these parentheses, since the BeginGroup/EndGroup cannot simply be removed from 'case PRIMITIVE2D_ID_GRAPHICPRIMITIVE2D': any of the three visualisations (and NOT only GraphicObject::Draw(..)) may be chosen in the decomposition. If you want, I could add that embedding in SW, but you should know better where to do it. The pattern is the same as in 'case PRIMITIVE2D_ID_GRAPHICPRIMITIVE2D'. HTH!
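The double-embedding concern can be illustrated with a simple nesting counter. In this hypothetical sketch, `GroupTracker` stands in for the group state of `PDFExtOutDevData`, `paintPrimitive` plays the role of the `PRIMITIVE2D_ID_GRAPHICPRIMITIVE2D` case (which always brackets), and the two draw functions model `GraphicObject::Draw(..)` with and without its own brackets. It is not the code of the actual fix, only a model of the argument.

```cpp
#include <cassert>

// Hypothetical stand-in for the group state of PDFExtOutDevData.
struct GroupTracker {
    int depth = 0;
    int maxDepth = 0;
    void begin() {
        ++depth;
        if (depth > maxDepth) maxDepth = depth;
    }
    void end() {
        assert(depth > 0 && "end() without begin()");
        --depth;
    }
};

// Draw() that opens and closes its own group, as before the change
// for issue 101545.
void drawWithInternalGroup(GroupTracker& t) {
    t.begin();
    // ... paint the graphic's pixels ...
    t.end();
}

// Draw() that only paints and leaves group handling to its caller.
void drawPlain(GroupTracker&) {
    // ... paint the graphic's pixels ...
}

// DrawingLayer-style caller: always brackets, because the decomposition may
// choose any of the three render paths, not only GraphicObject::Draw(..).
void paintPrimitive(GroupTracker& t, bool usesGraphicObjectDraw,
                    void (*draw)(GroupTracker&)) {
    t.begin();
    if (usesGraphicObjectDraw) draw(t);
    t.end();
}

// Writer-style caller: brackets around its own Draw() call instead.
void paintWriterGraphic(GroupTracker& t) {
    t.begin();
    drawPlain(t);
    t.end();
}
```

With `drawWithInternalGroup`, `maxDepth` reaches 2, i.e. the graphic is embedded twice; with `drawPlain` it stays at 1. This mirrors why AW suggests putting the brackets at the call site in Writer rather than inside `GraphicObject::Draw(..)`.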
fixed in cws sw33bf02 - changed files: /goodies/inc/grfmgr.hxx, /goodies/source/graphic/grfmgr.cxx, /sw/source/core/doc/notxtfrm.cxx, /sw/source/core/layout/paintfrm.cxx, change set 8b6a4bda72ef
Sweet -- thanks od.
OD->HI: Checked in internal installation set of cws sw33bf02 - please verify.
Verified fix in CWS sw33bf02.
This bug is supposed to have been fixed in 2010 (Status: VERIFIED FIXED). Then why do I have the same problem now in June 2014, using Apache OO 4.1.0?
"Verified fix in CWS sw33bf02" means it was fixed and verified in the code branch that a developer has created. Maybe it was never integrated into the TRUNK and therefore never really into the OpenOffice code base. However, only a developer can check this, not me. @Armin: Can you please help here? Thanks.
Tip: search CWS in markmail with list:org.openoffice.cws-announce
http://markmail.org/search/?q=list%3Aorg.openoffice.cws-announce+sw33bf02
The CWS was integrated in DEV300_m76: http://markmail.org/message/l3bjbtpmwoetm377

$ hg log --rev 8b6a4bda72ef
changeset:   266571:8b6a4bda72ef
user:        Oliver-Rainer Wittmann <od@openoffice.org>
date:        Mon Jan 04 13:30:37 2010 +0100
files:       goodies/inc/grfmgr.hxx goodies/source/graphic/grfmgr.cxx sw/source/core/doc/notxtfrm.cxx sw/source/core/layout/paintfrm.cxx
description:
sw33bf02: #i105243# Output of graphic in Writer with corresponding PDF handling

If the bug is reproducible in AOO 4, then it is a regression; feel free to reopen.
Hi, thanks for the response! Yes, I can reproduce the problem in OO 4.1.0: I have a document with lots of JPG photos. When I export it to PDF using the default "Lossless compression", the generated PDF file is 4.5 MB. But if I choose "JPEG compression" and "Quality 100%", the resulting PDF file is 1 MB. But I can't see a way to reopen this issue. -Henrik
It seems I don't have permission to reopen this issue. @Ariel: Do you?
Created attachment 83618: ZIP file with PDFs and JPGs
The ZIP contains:
- the original ODT from attachment 64969
- the file exported to PDF with lossless compression, using OOo 3.0.0, OOo 3.1.1, and AOO 4.1.0
- JPG files extracted from the respective files: from the ODT file, just unzipped; from the PDF files, using pdfimages:
  pdfimages -j issue105243-aoo410.pdf issue105243-aoo410
  pdfimages -j issue105243-ooo300.pdf issue105243-ooo300
  pdfimages -j issue105243-ooo311.pdf issue105243-ooo311
AOO 4.1.0 keeps the original JPG file, just as OOo 3.0.0 does. The bug seen with OOo 3.1.1 cannot be reproduced in 4.1.0. This bug is fixed.
(In reply to Marcus from comment #27)
> It seems I've not the permission to reopen this issue.

Strange. IIRC, using your apache.org address should give you those rights.

> @Ariel: Do you have?

Yes, but the bug is fixed, at least as tested with the two documents attached here. @henrik_roseno: can you extract the images from your PDF file and compare them with the ones in the OpenDocument file? Aren't they the same JPGs?
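One way to answer that question is to extract the images with `pdfimages -j` (which writes DCT-encoded images out as .jpg files) and compare them byte for byte with the JPGs unzipped from the ODT. The following C++ sketch does that comparison; `readAll`, `looksLikeJpeg` and `sameJpeg` are hypothetical helper names, not part of any tool. Identical bytes mean the original JPEG was passed through; a mismatch suggests the image was re-encoded (or that the wrong pair of files was compared).

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into memory; returns false if it cannot be opened.
bool readAll(const std::string& path, std::vector<char>& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}

// True if the data starts with the JPEG SOI marker (bytes FF D8).
bool looksLikeJpeg(const std::vector<char>& d) {
    return d.size() >= 2 && static_cast<unsigned char>(d[0]) == 0xFF &&
           static_cast<unsigned char>(d[1]) == 0xD8;
}

// True if both files exist, the first looks like a JPEG, and the bytes match.
bool sameJpeg(const std::string& extracted, const std::string& original) {
    std::vector<char> a, b;
    if (!readAll(extracted, a) || !readAll(original, b)) return false;
    return looksLikeJpeg(a) && a == b;
}
```

Call `sameJpeg(extractedPath, originalPath)` for each pair of files; it returns false if either file is missing or the extracted file does not start with the JPEG SOI marker.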
Created attachment 83619: Try exporting the HTML-doc in this ZIP-archive to PDF
Hi,
Right now I can only reproduce the error when working with an HTML file (in OOo 4.1.0). So I have attached the HTML file and its image files, all in a ZIP archive. When I PDF-export it with "Lossless compression", the result is 4.5 MB. When I use "JPEG compression - quality 100%", the result is 1 MB. BUT there is an additional problem: in both PDF files, the images have been reduced to thumbnail size. If I load an ODT document that contains the image files inside the file itself and export it with the default "Lossless compression" and with "JPEG compression - quality 100%", then suddenly it works. The "Lossless compression" PDF file is even considerably smaller than the "JPEG compression - quality 100%" PDF file!
Best regards, Henrik Rosenø
(In reply to henrik_roseno from comment #30)
> Try exporting the HTML-doc in this ZIP-archive to PDF
>
> Hi
>
> Right now I can only reproduce the error when working with an HTML-file (in
> OOo 4.1.0).

I can reproduce the bug with OOo 3.0.0 and the current nightly build from the build bot.

> So I have attached the HTML-file and its image-files. All in a
> ZIP-archive.

Note that your HTML file has <base href="http://www.transformation.dk/chemtrails/">; with this, OpenOffice will fetch all the images from that site, not relative to the HTML file.

> When I PDF-export it with "Lossless compression" the result is
> 4.5MB.

I get the same result. The problem is that the images are linked: when exporting to PDF, the original JPG image format is not preserved. As a workaround, save the HTML file as ODT. With the ODT, go to menu Edit - Links, and break all the links. I get an ODT of 565 KB; this file, exported to PDF, generates a 673 KB PDF with all the original JPG files retained. In short, this is another bug, and it is not a regression (the same happens in 3.0.0 and 4.1.0).

> The "Lossless compression" PDF-file
> is even considerably smaller than the "JPEG compression - quality 100%"
> PDF-file!

This might be yet another bug.
(In reply to Ariel Constenla-Haile from comment #31)
> In short, this is another bug, and is not a regression (happens the same in
> 3.0.0 and 4.1.0

This is now Issue 125171 - Linked JPG image not preserved when exported to PDF with loss-less compression
Created attachment 83621: Cleaner version of the HTML-doc from: "Try exporting the HTML-doc in this ZIP-archive to PDF"
Here is a cleaner version of the HTML document (i.e. no 'base' tag etc.), but it still produces a 4.5 MB PDF file.
(In reply to henrik_roseno from comment #33)
> Created attachment 83621
> Cleaner version of the HTML-doc from: "Try exporting the HTML-doc in this
> ZIP-archive to PDF"
>
> Here is a cleaner version of the HTML-document, i.e. no 'base'-tag etc., but
> still 4.5MB PDF-file.

Yes, it does not make any difference; the problem is with *linked* graphics, no matter whether they point to the web or to a location relative to the document.