Apache OpenOffice (AOO) Bugzilla – Issue 77679
export to PDF or HTML does not contain text
Last modified: 2013-02-07 22:39:33 UTC
From http://bugs.debian.org/425521:
--- snip ---
If you make an OO Impress document that contains an image with a superposed text box, then export it either to PDF or to HTML, the resulting document does not contain the text. (pdf2text, grep, etc find nothing, nor does Google.) Also, if you print to file and inspect the .ps, there is no text. (It contains an *image* of the text, so it looks sort of OK.)
Problems:
- The document will not be indexed by Google because there is no text for them to find.
- Text quality is not very good because it's been converted to a bitmap.
--- snip ---
To me this looks like a bug in the Debian OOo build; nevertheless, please send me an (OOo) bugdoc so I can reproduce it.
cgu: So you are telling me that Impress documents exported by the "normal" OOo to HTML or PDF contain real text rather than an image of the text? This bug is *not* about the text not appearing in the export... In any case, you should ask the bug submitter (see URL) for the doc, not *me*. I just forwarded this bug.
cgu: I just tried it with the official 2.2.1rc2:
- add a graphic (any box will do)
- add text (I chose the title and text styles) and enter Foo and Bar
- export to HTML
You get a Table of Contents with that one slide, and on that slide Foo and Bar are images.
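For anyone repeating the steps above, the check of whether Foo and Bar end up as real text in the exported HTML can be scripted. This is only a minimal sketch: the names TextCollector and missing_from_html are made up for illustration, and it assumes the exported slide page has already been read into a string. It uses only Python's standard html.parser module and ignores <title>, <script> and <style> content as well as attributes such as alt, so a slide that is just an <img> counts as containing no text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes, ignoring <script>, <style> and <title> content."""

    SKIPPED = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Attribute values (e.g. alt="...") are deliberately not collected.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def missing_from_html(html, expected):
    """Return the expected strings that do not occur as body text in html."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return [word for word in expected if word not in text]
```

An empty list from missing_from_html(page, ["Foo", "Bar"]) means both strings were exported as text; a non-empty list names the strings that are missing, which is what you would expect when the slide was exported as an image.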
cgu: Indeed, for PDF it works, also in "my" version (tried "my" 2.2.1~rc1). pdftotext properly gives Foo and Bar... It also works with the bug submitter's 2.0.4. But for HTML, see my previous comment. It works when you don't have images on your slide, but not when you do, and the bug submitter wants the text to be text in that case, too. I have no idea how Google tried to index the PDFs, though. Maybe something there is still broken?
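The pdftotext check mentioned above can be automated in the same spirit. Again a sketch only: it assumes the pdftotext tool (from poppler-utils or xpdf) is installed and on the PATH, and the function names and the slides.pdf file name are hypothetical.

```python
import subprocess


def extract_pdf_text(pdf_path):
    """Run `pdftotext <file> -` and return the text it writes to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


def missing_words(extracted_text, expected):
    """Return the expected strings that do not occur in the extracted text."""
    return [word for word in expected if word not in extracted_text]
```

For example, missing_words(extract_pdf_text("slides.pdf"), ["Foo", "Bar"]) should return an empty list if the PDF export kept the text as text.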
"But for HTML, see my previous comment. It works when you don't have images on your slide, but not when you have them on your slide, and the bugs submitter wants the text then to be text, too." correction (hmm..): Tried it again with the "normal" OOo 2.2.1rc2 and even when only text is on the slide the export results in the slide being an image.
Exporting 'text only' pages as text rather than as graphics is an enhancement.