Issue 10384

Summary: filter to import pdf files
Product: Writer Reporter: Unknown <non-migrated>
Component: open-import Assignee: ooo
Status: CLOSED DUPLICATE QA Contact: issues@sw <issues>
Severity: Trivial    
Priority: P3 CC: ahz001, andre.schnabel, daniele, deepcerulean, discoleo, don.troodon, drichard, gleppert, gschintgen, haui, issues, jeanweber, jian.li, kamataki, kami911, kpalagin, mantas, masaya.k, Mathias_Bauer, mfedyk, neteler, norbert.notz, obsazeny.cz, oo, openoffice, pagalmes.lists, petertrypsteen, r0polach, rpolach, stp, sundman, t.c.black, yoshimit
Version: OOo 1.1 Keywords: rfe_eval_ok, usability
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: ENHANCEMENT Latest Confirmation in: ---
Developer Difficulty: ---

Description Unknown 2003-01-01 20:57:09 UTC
I think that a filter to import a PDF file into the word processor would be a 
wonderful feature (or a simple converter from PDF to the native OpenOffice file 
format).
This feature is currently missing in Microsoft Office, so OpenOffice could 
attract more people!
Thanks.
Comment 1 h.ilter 2003-01-07 16:44:35 UTC
Reassigned to BH
Comment 2 eric.savary 2003-04-16 15:33:48 UTC
Set to "NEW"
Comment 3 Unknown 2003-04-22 18:27:22 UTC
I've just read that the new Office 2003 is going to implement this 
feature....

"ScanSoft and Microsoft have teamed to bring you a new plug-in for 
Microsoft Office 2003, one that allows you to instantly convert PDF 
into editable documents directly from within Microsoft Word 2003 - 
complete with the layout of the original. The ScanSoft PDF Converter 
for Microsoft Word unlocks the information trapped in PDF files, 
separating text from graphics, tables and columns. Now you can re-use 
information in PDF documents that you download from the Web or 
receive as email attachments. (No other software is required.)"
Comment 4 blum 2003-05-31 21:32:38 UTC
import text, vector- and pixelimages in draw, so you can use -import-
it in all applications
Comment 5 tamblyne 2003-08-18 21:37:39 UTC
*** Issue 18149 has been marked as a duplicate of this issue. ***
Comment 6 andreschnabel 2003-08-19 21:42:17 UTC
updated some status-infos
Comment 7 andreschnabel 2003-08-19 21:43:53 UTC
*** Issue 15884 has been marked as a duplicate of this issue. ***
Comment 8 bettina.haberer 2003-10-13 11:32:21 UTC
There is no chance to implement this feature in OO.o 2.0.
Comment 9 lohmaier 2003-11-19 19:22:11 UTC
*** Issue 22488 has been marked as a duplicate of this issue. ***
Comment 10 oc 2004-01-08 11:35:49 UTC
*** Issue 24182 has been marked as a duplicate of this issue. ***
Comment 11 rcabane 2004-02-07 09:25:26 UTC
Similar function in Koffice (although rudimentary).
Really attractive.

Comment 12 flibby05 2004-04-30 16:39:34 UTC
*** Issue 28575 has been marked as a duplicate of this issue. ***
Comment 13 sgautier.ooo 2004-09-12 11:40:43 UTC
reassigning & adding keywords according to new RFE process - sophie
Comment 14 lohmaier 2004-09-17 01:24:39 UTC
*** Issue 27834 has been marked as a duplicate of this issue. ***
Comment 15 glu 2004-10-29 04:59:09 UTC
Hi,

  I thought that it's illegal to make PDF files *editable* in software without
Adobe's special permission, because this is already declared in the PDF Reference.

  I think that, for both proprietary and open source software, intellectual
property should be paid enough attention and respected.

-Gavin

http://partners.adobe.com/asn/tech/pdf/specifications.jsp

PDF Reference, version 1.5 by Adobe

1 Introduction

1.4 Intellectual Property

The general idea of using an interchange format for electronic documents is in
the public domain. Anyone is free to devise a set of unique data structures and
operators that define an interchange format for electronic documents. However,
Adobe Systems Incorporated owns the copyright for the particular data structures
and operators and the written specification constituting the interchange format
called the Portable Document Format. Thus, these elements of the Portable
Document Format may not be copied without Adobe’s permission.

Adobe will enforce its copyright. Adobe’s intention is to maintain the integrity of
the Portable Document Format standard. This enables the public to distinguish
between the Portable Document Format and other interchange formats for electronic
documents. However, Adobe desires to promote the use of the Portable
Document Format for information interchange among diverse products and
applications. Accordingly, Adobe gives anyone copyright permission, subject to
the conditions stated below, to:

• Prepare files whose content *conforms* to the Portable Document Format

• Write drivers and applications that *produce* output represented in the Portable
Document Format

• Write software that accepts input in the form of the Portable Document Format
and *displays*, *prints*, or otherwise *interprets* the contents

• Copy Adobe’s copyrighted list of data structures and operators, as well as the
example code and PostScript language function definitions in the written
specification, to the extent necessary to use the Portable Document Format for
the purposes above

The conditions of such copyright permission are:

• Authors of software that accepts input in the form of the Portable Document
Format must make reasonable efforts to ensure that the software they create
respects the access permissions and permissions controls listed in Table 3.20 of
this specification, to the extent that they are used in any particular document.

• Anyone who uses the copyrighted list of data structures and operators, as stated
above, must include an appropriate copyright notice.

• Accessing the document in ways not permitted by the document’s access permissions
is a violation of the document author’s copyright.

This limited right to use the copyrighted list of data structures and operators does
not include the right to copy this book, other copyrighted material from Adobe,
or the software in any of Adobe’s products that use the Portable Document Format,
in whole or in part, nor does it include the right to use any Adobe patents,
except as may be permitted by an official Adobe Patent Clarification Notice (see
the Bibliography).

Acrobat, Acrobat Capture, Adobe Reader, ePaper, the “Get Adobe Reader” Web
logo, the “Adobe PDF” Web logo, and all other trademarks, service marks, and
logos used by Adobe (the “Marks”) are the registered trademarks or trademarks of
Adobe Systems Incorporated in the United States and other countries. Nothing in
this book is intended to grant you any right or license to use the Marks for any
purpose. 
Comment 16 malvineous 2004-10-30 02:57:39 UTC
It shouldn't be 'illegal' to modify PDF files - the excerpt you quoted says you
should respect the access permissions of the file.  This means that a PDF marked
as read-only is not meant to be altered, but a PDF file without this restriction
is fine to edit.
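
To make the point about access permissions concrete, here is a minimal sketch of how an import filter might decode the permission flags of an encrypted PDF before deciding whether to import it. The bit positions (1-based, counted from the low-order bit) are an assumption taken from memory of the PDF Reference's table of user access permissions and should be checked against the specification; `may_import_for_editing` is a hypothetical policy, not anything OOo implements.

```python
# Bit positions are 1-based from the low-order bit. They are assumed here
# from the PDF Reference's "user access permissions" table; verify them
# against the specification before relying on this.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
}


def decode_permissions(p: int) -> dict:
    """Decode the signed integer /P entry of a PDF encryption dictionary."""
    return {
        name: bool(p & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


def may_import_for_editing(p: int) -> bool:
    """Hypothetical policy: importing for editing needs copy and modify rights."""
    perms = decode_permissions(p)
    return perms["copy"] and perms["modify"]
```

A document without any encryption dictionary has no such restrictions, so a filter following this sketch would only need the check for encrypted files.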
Comment 17 michael.ruess 2005-05-08 14:55:25 UTC
*** Issue 48727 has been marked as a duplicate of this issue. ***
Comment 18 michael.ruess 2005-05-12 07:29:22 UTC
*** Issue 49121 has been marked as a duplicate of this issue. ***
Comment 19 steve2005 2005-12-10 07:59:04 UTC
The Adobe quote above by Gavin (didn't find it at that link though) is all
about what a person may do with Adobe code (ie their copyrighted list of data
structures and operators). However nobody can copyright an idea. If someone
comes up with their own code that functionally does the same thing, then that's
not a copy and copyright law (worldwide) does not apply. You can patent an idea
but that is always on a case by case basis and is significantly different from
copyright.

Since Ghostscript (http://en.wikipedia.org/wiki/Ghostscript) can view pdf files,
I think it is extremely likely that Adobe can not stop openoffice from adding
this functionality. Plus if a company prevents interoperability then it will
likely run afoul of "Exclusive Dealing, Tied Selling, Market Restriction and
Abuse of Dominant Position" laws that exist in most countries.

Everything Adobe says above is nonsense*. For the purposes of creating
openoffice functionality everything Adobe says above can be ignored on a legal
basis. Adobe may still be able to prevent whatever they dislike from happening
by throwing lawyers and money at it. (But with any court case, you don't need to
be in the legal right, you just need to force the other guy to go away.)

*  Except circumventing DRM which can be illegal under the DMCA (not copyright)
But most countries don't have a law like the DMCA.

My point: Other open source projects can open pdf files, so why not openoffice?
In the material quoted above Adobe is claiming that copyright does more than it
can do anywhere in the world. All references to copyright in Gavin's Adobe quote
are wrong. That being said legality is always an issue of what laws are relevant
in what jurisdiction, and openoffice won't be used in just one country.
Comment 20 kami911 2006-03-07 22:10:53 UTC
Editing/importing PDF files is an interesting, unique feature that many
users request...
Comment 21 stp 2006-03-10 08:11:36 UTC
*** Issue 45452 has been marked as a duplicate of this issue. ***
Comment 22 lohmaier 2006-04-12 11:07:32 UTC
*** Issue 64318 has been marked as a duplicate of this issue. ***
Comment 23 lohmaier 2006-04-12 11:09:21 UTC
*** Issue 46047 has been marked as a duplicate of this issue. ***
Comment 24 oopser 2006-06-06 17:32:41 UTC
from the link above - http://partners.adobe.com/public/developer/pdf/
index_reference.html , we see the following text:

 The PDF Reference provides a description of the Portable Document Format and is 
intended for application developers wishing to develop applications that create 
PDF files directly, as well as read or modify PDF document content.


So quite clearly there are no issues with Adobe in writing software to import a 
pdf and modifying it, other than that end-users respect the copyright of any 
document they may choose to edit.

Scribus does do this reasonably well but I wish to work with OpenOffice.

In view of the increased use of pdf it would be very nice to have the capability 
to import pdf into OpenOffice.

Thx

Comment 25 aziem 2006-09-24 17:51:31 UTC
*** Issue 69812 has been marked as a duplicate of this issue. ***
Comment 26 drichard 2006-11-08 18:38:58 UTC
Adding myself to the CC list.

This feature absolutely would be wonderful.   I'm requesting that in the short
term, that OOo not allow people to even view the source code of the PDF.  The OS
knows what the file is and should just skip the import process altogether.  We
get a high number of support calls from users that open PDFs with OOo, only to
see what appears to be garbage to them.
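
As a rough illustration of the short-term request above, a file can be recognised as PDF from its first bytes rather than its extension, so an application could refuse to load it as plain text. This is only a sketch: the function names are hypothetical, and it assumes the `%PDF-` marker sits at the very start of the file (some viewers tolerate leading junk before it).

```python
PDF_MAGIC = b"%PDF-"


def is_pdf_header(first_bytes: bytes) -> bool:
    """Return True if the given leading bytes start with the PDF marker."""
    return first_bytes.startswith(PDF_MAGIC)


def looks_like_pdf(path: str) -> bool:
    """Read only the first few bytes of the file and test them."""
    with open(path, "rb") as handle:
        return is_pdf_header(handle.read(len(PDF_MAGIC)))
```

A caller could use `looks_like_pdf` to show a "PDF import is not supported" message instead of opening the raw file contents in the text import.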
Comment 27 norbert2 2006-12-05 19:43:37 UTC
Have you ever tried this feature in WordPerfect or KOffice? The resulting
layouts have really nothing to do with the source PDF. This is due to the
differences of word processors and page description languages (PDF, postscript).

Please do not expect the functionality of a PDF editor!!!

So in my opinion it is wasted time to implement such a feature. The only thing
it could be used for is to have pictures and text extracted of PDFs for usage in
Writer documents.
Comment 28 paul_e_t 2007-01-05 17:16:20 UTC
. . . . .The functionality of a "PDF Editor" is not needed.  What is sorely 
needed is the ability to import a xx.pdf document (into OOo Writer) so that the 
information (text, graphics, formats) may be written/saved/edited as an OOo 
document (not PDF).
. . . . .The Portable Document Format is NOT as portable as one may believe.  
The worst event a writer can run into in a search is to get/hit-on information 
in PDF format.  What happens here is that one can only immediately read or save 
(in PDF since export is limited) the document. This format is bloated, 
restrictive, a thought interruption, portable only between OS file systems, not 
usable (portable) in word-processing programs generally and then only if one 
has the necessary escape to a proprietary image editor/reader.  The format was 
never intended to: ....
. . . . .The single most portable format now is/has been, TEXT !
During the ARPAnet days the DOD needed.... let a contract to .... then 
developed the spec to .... out of .... came ... so that documents could be 
transmitted to other event sites so that information could be distributed in 
real time to needed national defense agencies for ....  Let there be a PDF. Who 
owns the format specified by, purchased and developed by the American 
Taxpayer ???  Modern day computers and OS's are so standard and open (except 
for a few hang in's) that, the best that, could happen is for the unnecessary 
format to just go away and disappear ... PDF quietly just go away. Develop 
better Open Systems Format (OSF), text Word Processing Systems that can handle 
text intermixed with graphics.  Hello OOo good-bye PDF :>).
PaulT


Comment 29 gar37bic 2007-01-19 17:49:20 UTC
We have clients who want certain .pdf files delivered to them as .doc files, so
they can edit them with the tools they are used to.  We plan to support several
other formats as well.  This is planned as a major new feature of our financial
data reporting service, with significant revenue implications.

I need to complete a working PHP-based application to do this within a week.  I
had thought that OOo already supported this, so I'm already committed to finding
a solution.  I had planned to use OOo in headless mode, calling it from a PHP
script.  I've tried pdftohtml, but my build seems to be broken, at least so far
- it publishes essentially empty HTML pages.  pdftk slices and dices PDF but
doesn't output anything else.  pdf2ps and similar programs seem only to produce
images of the pdf file, not at all what I need.
Comment 30 dprina 2007-01-19 18:54:05 UTC
You can use, for example, xpdf or pdftohtml.
Here you can find some interesting links to programs that manage .PDF files
(sorry, comments are Italian language only):

http://linguistico.sf.net/wiki/doku.php?id=software_libero:documenti_formati#visualizzatori_convertitori_documenti_pdf

Otherwise you can use Abiword or Kword (I don't remember which), which can convert a .PDF
and let you modify it.

Ciao
Davide
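
For the scripted use case asked about above, here is a minimal sketch of driving a command-line converter from a script. It assumes the `pdftotext` tool from the xpdf package is installed and on the PATH, and that its `-layout` option is wanted; the helper names are hypothetical. It only extracts text and does not produce .doc files.

```python
import subprocess


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for xpdf's pdftotext (assumed to be installed)."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical layout as far as it can
        command.append("-layout")
    command += [pdf_path, txt_path]
    return command


def extract_text(pdf_path, txt_path):
    """Run the converter; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.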
Comment 32 norbert2 2007-01-21 16:52:18 UTC
If at all, I think PDF import for Draw would make more sense than for Writer
since the structure of PDFs (absolute positioned objects, discontiguous text
areas) should be better suited to Draw.
Comment 33 lohmaier 2007-02-17 14:28:24 UTC
*** Issue 73166 has been marked as a duplicate of this issue. ***
Comment 34 lohmaier 2007-02-17 14:31:32 UTC
*** Issue 71813 has been marked as a duplicate of this issue. ***
Comment 35 snmishra 2007-04-02 20:18:16 UTC
Jarnal ( http://www.dklevine.com/general/software/tc1000/jarnal.htm ) takes an
interesting approach to importing PDF, which gets most of the functionality one
needs. It imports PDF document as a readonly background layer and you can write
on top of it. Even "just" this feature would be neat in that it would allow one
to annotate PDF files using openoffice.
Comment 36 ggs 2007-04-04 10:48:16 UTC
Yes, it would be great if OpenOffice could edit PDFs just like Word documents.
But that would be overly complex. Indeed, what many of the commenters seem to
ignore is the fundamental difference between a page description language (PDF)
which lacks all of the structural information and a word processing document.

What is however realistic is using PDFs as *graphics*. This is a common practice
in the (pdf)LaTeX world. Apparently (I'm basing this on a comment on the scribus
bug tracker), PDF starting at version 1.4 allows for such a placement of PDFs in
a PDF. This would be useful in at least the following two ways:

* use PDF as a very complete and universally compatible (but non-modifiable)
vector graphics format. (It can easily be generated by any program and then
cropped to the right dimensions)

* use full page PDFs as background and modify the contents by overpainting or
adding comments, drawings, etc.

(Of course, this would probably entail the inclusion of a pdf rendering library
for printing and displaying.)
Comment 37 gvsa123 2007-04-29 17:08:46 UTC
Hello, 
I'm no programmer so pardon me.
I think that the issue with PDF's in OpenOffice has been made complicated by 
the technical aspects... I think what end-users only want is to be able to 
read/view PDF's with a program associated with OpenOffice - I guess that would 
suffice. 

From the previous comments, it seems to me that integrating a program to 
view/edit PDF's is not congruent with how OO was written, coming from a 
programmers perspective. I guess that means there is a problem with having an 
interoperability between the proposed PDF viewer/editor and Writer, for 
example, in a way similar to the interoperability of Writer and Impress.

I mentioned above that association is the only need... to put it simply, 
imagine renaming Adobe Acrobat Reader into something like OpenOffice Portable 
and include it in the download and installation of OpenOffice, to be used as an 
application within the suite to view/annotate/highlight/whatever (as opposed 
to "editing") PDF's, but not necessarily written to interoperate with like 
Writer, Impress, and Draw do.

If I understood it correctly, what is feasible is being able to import/convert 
PDF's into Writer "editable" format (odt, doc, etc..). That's when the 
interoperability would come in - when a user wants to edit/modify (different 
from simply making highlights, comments, etc...) the PDF document.

But as far as "Adobe Acrobat Reader functionality" is concerned, I do not see 
how it would be impossible for developers to create an application that is 
associated with OpenOffice that would allow users to view/annotate PDF's like 
how they do with Adobe, Foxit, etc...

Am I coming across?
Comment 38 peetje_berg 2007-06-19 08:32:15 UTC
I think a filter to import a file into the word processor is a great suggestion. 
I completely disagree with gvsa123 that only reading/viewing a PDF is enough (Acrobat 
Reader is free, so what's the point of creating it?).
Comment 39 dudson 2007-06-28 13:42:51 UTC
Importing PDFs is a very good idea. MS Office can import PDF, and Microsoft 
and Adobe are two separate companies. 
Besides, if a PDF is being imported (and therefore converted into OpenDocument 
format), it isn't a PDF anymore. 
When you're modifying imported PDFs, they are no longer PDFs! PDFs carry 
restrictions set by the person who wrote the file; if those 
restrictions are being followed, then there is no problem.

I strongly agree with peetje_berg: viewing PDFs sounds like a job for Adobe 
Reader.
Comment 40 norbert2 2007-08-14 20:22:47 UTC
I have read in the news, that an import filter for draw is in development. So
shouldn't the component of this issue be changed to Draw!?
Comment 41 discoleo 2007-10-07 17:00:43 UTC
There are 3 use cases for a PDF-input filter. To better understand what should
be developed, I will address these use cases:

A.) TEXT-IMPORT
B.) TEXT+LAYOUT IMPORT (non-editable)
C.) TEXT+LAYOUT (editable)

A.) TEXT-IMPORT
    ===========
Sometimes, people want to import the text-stream to edit it in their preferred
program and use it in their own work. In these instances, the exact layout is
not that important, and what the import filter should do is:
 - generate a continuous text stream
   [i.e. NOT just every line terminated by CR/LF,
    like the current Adobe Acrobat select tool]
 - optimally detect some text-structure:
    -- like sub-/super-script
    -- paragraphs
    -- tables

B.) TEXT+LAYOUT
    ===========
Users sometimes need to complete a document/form. Often, governments and other
institutions publish official documents in pdf-format (simple PDFs, NOT
pdf-forms), BUT one cannot add any text to these pdf documents.

The import-filter should therefore:
 - import the pdf (both text-streams and layout) as a background
 - users shall be able to write in the foreground,
   over the background document
  [however, this should be handled better than pasting
   the pdf-document as an image and writing over an image]
   -- it should be possible to position the cursor on the baseline
      of a text-line, so that newly written text fits the existing text
   -- the tool should detect existing text-box boundaries,
      so that one can write new text extending from those boundaries
 - optimally, some minimal 'pdf-editing' features should be possible
   -- move whole sections, e.g. if the new text does NOT fit in the
      existing free space, move the whole section downwards

C.) TEXT+LAYOUT (fully editable)
    ===========
Of course, this would be a nice feature, BUT - considering the pdf format -,
seems a little bit elusive.

However, pdf-documents saved by OOo should contain additional information, that
should allow importing them in OOo in a fully editable state. At least
OOo-generated documents shall allow this editing mode.


CONCLUSIONS
===========
There is a new wiki page documenting progress:
see http://wiki.services.openoffice.org/wiki/Writer/ToDo/PDF_Import

I think, further comments should go to the wiki-page.

PS: added myself to the cc-list
Comment 42 norbert2 2007-10-07 17:51:43 UTC
@discoleo:

(A) TEXT IMPORT:
I think copy-paste from acrobat is enough.


"pdf-documents saved by OOo should contain additional information,"
This is not needed because of the "hybrid" PDFs that are announced to be
introduced with OOo 3.0.
Comment 43 discoleo 2007-10-07 18:24:38 UTC
discoleo->norbert2

> (A) TEXT IMPORT:
> I think copy-paste from acrobat is enough.

I am very disappointed with the Acrobat copy/paste mechanism. There are some
critical problems, at least in Acrobat 7:
 - every individual line is exported with a CR/LF at the line end
   -- therefore, paragraphs are broken in individual lines
      and one cannot import a whole text stream
   -- with a 100+ line document, this is NO longer a trivial task
     [exporting as txt + gawk processing removes every formatting]
 - various formattings, like sub-/super-script and underline
   don't work at all
 - lists (maybe more difficult to implement)
 - other layout issues (paragraphs/tables) need significant improvement
 - import graphical elements, too
   -- e.g. images between text-streams (both graphic + text stream)
   -- while, of course the text stream is more important, there are often
      situations, where one needs the images, too, and here Acrobat fails
      completely
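
The line-break problem described above can be sketched as a small post-processing step: join hard-wrapped lines back into paragraphs and treat blank lines as paragraph boundaries. This is a heuristic under stated assumptions only. It assumes paragraphs are separated by at least one empty line, and the rule for re-joining words hyphenated at a line end is naive and will mis-join genuine hyphenated compounds. The function name is hypothetical.

```python
import re


def reflow(text: str) -> str:
    """Turn hard-wrapped text (one CR/LF per line) into one line per paragraph."""
    text = text.replace("\r\n", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    result = []
    for paragraph in paragraphs:
        lines = [line.strip() for line in paragraph.split("\n") if line.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            hyphen_break = (
                joined.endswith("-")
                and len(joined) > 1
                and joined[-2].isalpha()
                and line[0].islower()
            )
            if hyphen_break:
                # Naive assumption: a trailing hyphen marks a split word
                joined = joined[:-1] + line
            else:
                joined += " " + line
        result.append(joined)
    return "\n\n".join(result)
```

This recovers a continuous text stream but, as noted in the list above, none of the formatting such as sub-/super-script, lists or tables.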
Comment 44 Martin Hollmichel 2008-01-05 16:57:02 UTC
since this is work in progress reassign to a real owner.
Comment 45 sergiocallegari 2008-01-25 13:42:32 UTC
The A, B, C cases are forgetting the situation where one wants to use PDF as a
simple "read-only" vector graphics format, like one does with pdfLaTeX (unless
this is incorporated in case B).

Cases A, B and C add many complications to this (rather simple) goal due to the
need to extract at least some elementary layout information.  Hence, in my
opinion, trying to satisfy the goals of A, B, and C, has the rather high price
of keeping on postponing the much more simplistic usage of PDF import above.

At first sight the functionality might appear unnecessary.  The very PDF-Import
wiki, for instance says:

"For now we uses a workaround making from the PDF-page a EPS with tiff-preview
who can now be placed in a OO-doc. The previeuw is low-resolution but the EPS is
printed in its original resolution" seeming to suggest that "for now" this is
more than acceptable.

In my opinion it is not, for a few reasons:

1) It is not true that the EPS is printed in its original resolution.  At least
not always.  If you try to print it on "virtual paper" via an export to PDF ->
distribution as a PDF -> print from acrobat reader or xpdf, you will see that the
eps image gets in the exported PDF as a low resolution bitmap. Too sad.

2) If you want to import a PDF image with no desire to edit it, but just to use
it in a presentation (impress) then you get an extremely poor presentation.

3) A very easy way to get a hi-quality equation editor in openoffice would be to
use LaTeX as a formatting engine, to produce pdf versions of the formulas.  But
this is not possible now since we have no idea how to import the resulting
PDF. And so the creation of smart packages like eqe or OOoLaTeX is hindered by
the fact that they need to produce bitmaps or at best emf files (that have a
thousand font-related problems, since they embed neither fonts nor
encodings).

4) A very easy way to get a hi-quality image generator in openoffice would be to
use Asymptote as a code-2-image translator.  However asymptote outputs pdf, so
we have the same problem as above (just worsened since Asy cannot output emf).

So please, even before thinking about A, B and C, give us a plain hi-quality
readonly importer for PDF graphics (e.g. store PDF in the opendocument files,
render it on screen via xpdf, translate it into postscript for printing via xpdf
again).
Comment 46 tcblack 2008-03-05 16:31:06 UTC
I'm curious if the PoDoFo library may be of assistance here:
http://podofo.sourceforge.net/  (LGPL and GPL licensing)
I note that Scribus http://rants.scribus.net/2006/12/10/pdf-surgery/ is
interested as well.
Comment 47 Mathias_Bauer 2008-04-21 11:45:50 UTC
Kai, I'm a bit puzzled about how we should handle this issue.
We will have PDF import in 3.0, but as we wanted to weigh layout stability
higher than post editing capabilities in the first version we decided to import
the PDF as drawing documents containing text frames.

I'm sure that this is enough already for most of the users requiring PDF import.
Reading the comments in this issue strongly supports this, only a few scenarios
mentioned here will not be supported by our solution. 

So I opt for closing this issue as "fixed" once the PDF import is available and
then create a new issue for importing/converting the imported Draw file into
Writer. This will show us how high the demand for this might be - the high
number of votes we have in this issue surely is caused by the complete absence
of any PDF import.

Comment 48 marshall 2008-04-21 15:28:52 UTC
Many people will lose touch with the issue when it is closed as fixed and a
new issue is created. Moreover, many viewpoints are already filed here and do not
need to be rediscovered. 

As for the issue itself, I understand that what can be done in time for 3.0, and
what was already decided, is that PDF will be imported as drawing documents
containing text frames. Until this is completed in version 3.0 there is no
need to debate the further future of this function. 
Comment 49 kpalagin 2008-04-23 08:36:26 UTC
Any estimate (like milestone or date) when this capability will get into 
builds? 

WBR,
KP.
Comment 50 Mathias_Bauer 2008-04-24 07:36:13 UTC
The problem with keeping this issue open is that its high vote count is
misleading - it's very probable that many users will be pleased with what we
will have in 3.0. By closing this issue we will get a better and more reliable
feedback about whether people miss import into Writer or not.

And besides that: people asked for pdf import and we have done it. So basically
this issue is fixed. Normal business.
Comment 51 kpalagin 2008-04-24 08:04:36 UTC
I guess I have found wrong issue then.
Which one is correct for tracking implemented pdf import?

Thanks.
Comment 52 marshall 2008-04-24 10:11:10 UTC
http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fpdfimport

mba: so according to the above, the pdfimport is in the latest snapshot and no
further development until 3.0 release is expected. is that a correct assumption?

if so, then lets close this issue and wait for the outcome.
Comment 53 Mathias_Bauer 2008-04-25 12:48:51 UTC
marshall: yes, with the exception of bug fixes the filter that now is available
in the current developer release will be what will be in the final version.
Comment 54 kpalagin 2008-04-25 13:06:05 UTC
I am sorry to ask stupid question, but how do I import pdf? 
Using m10 I can't find way of doing that.

Comment 55 discoleo 2008-04-25 14:12:33 UTC
I would like to test the import first before more extensive comments, but
limiting the import only to Draw is a little bit disappointing.

Considering that there are currently some serious troubles with inserting Draw
objects in Writer documents and the fact that most pdf's are actually textual
information, this is rather very limiting.

But I'll test it first, if it becomes available.
Comment 56 Mathias_Bauer 2008-04-25 14:33:42 UTC
kpalagin: What do you want to track? The first version of the filter is
integrated and I assume that further enhancements or bug fixes will be done with
new issues.

At the moment it's unclear if and when we will have resources to implement an
import into *Writer* and how this should be done. The simplest approach would be
to let Writer import odg files - that would be nice not only for pdf import.
OTOH I was told that a direct import into Writer could create a slightly better
quality. The question remains if the effort to implement that is justified by that.
Perhaps closing this issue and waiting for further feedback could help.
Comment 57 norbert2 2008-04-25 16:16:52 UTC
"The first version of the filter is integrated..."

I have installed DEV300_m10 Linux build. But I cannot find any way to import a
PDF file (PDF is not listed in the file open request window). Could someone
please explain how to access this filter?
Comment 58 clippka 2008-04-25 16:27:07 UTC
the pdf import for impress is an extension, it will be available through 

http://extensions.services.openoffice.org/

I think there is currently one cws still in qa until the extension is usable, it
is planned to have it ready and uploaded for OOo 3.0 beta, which is around the
corner.
Comment 59 Mathias_Bauer 2008-04-25 17:05:12 UTC
cl: thanks for clarifying this, I didn't know about the missing CWS and thought
it was possible to build the filter in m10. So let's wait for the beta.

@discoleo: I don't deny that import into Writer could be useful, I just don't
think that all users asking for the filter and voting here think the same. So by
closing this issue and waiting for new input we can get a better estimation how
strong this requirement is.

BTW: I didn't suggest to *insert* drawing documents into Writer, I wanted to
suggest that Writer directly creates a text document from the odg file. This
could be an alternative and much cheaper solution. And it has the additional
advantage that we can get a text import into Writer for all drawing or impress
documents, including ppt files. Especially the latter was already required here
and there.
Comment 60 malvineous 2008-04-26 00:16:31 UTC
Well FWIW all I'm after is the ability to make minor corrections to PDFs -
fixing typos, adjusting the odd page number, etc.  If the PDF is imported as a
bunch of text boxes and other elements, this will suit me perfectly.

Having said that though, an import into Writer would certainly be nice, but I
always assumed that it would produce a less than perfect result (as some PDF
elements may not be able to be reproduced exactly in Writer), so for minor
alterations to PDFs the import into Draw suits me best, as hopefully it can
produce an identical looking PDF at the other end.
Comment 61 Mathias_Bauer 2008-04-28 10:24:29 UTC
malvineous: you correctly described the use case that the filter is designed
for: import a PDF so that it looks as similar as possible to the original.

This is not possible in Writer as a continuous text flow is in contradiction to
layout preservation. Writer would be the right choice for the use case that only
the content but not the layout of the PDF is important and that the content
should become editable flow text. IMHO this can be achieved by providing an
import filter odg->Writer and this still would be my suggestion for the next step. 

I know that others think that they can tweak the PDF import in Writer directly
to get slightly better layout stability but IMHO this is much effort for a still
not perfect solution, a bad cost-benefit ratio. OTOH an odg/odp import for
Writer would give us more useful capabilities, e.g. ppt import and the like.

But these considerations should be done in a new issue. This one is too old and
too long already. It's hard to keep the overview and it would be better to start
anew with fresh minds and fresh ideas.
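
The tension between layout preservation and flow text described above can be illustrated with a small sketch of the drawing-to-text direction: take absolutely positioned text frames and derive a reading-order text stream from them. Everything here is hypothetical (the `TextFrame` type, the tolerance value, a single-column page with y measured from the top); it is not how any OOo filter works, only an illustration of why positions must be discarded to get editable flow text.

```python
from typing import Iterable, List, NamedTuple


class TextFrame(NamedTuple):
    """Hypothetical positioned text fragment, as a drawing import might hold it."""
    x: float
    y: float  # distance from the top of the page
    text: str


def frames_to_flow_text(frames: Iterable[TextFrame], line_tolerance: float = 2.0) -> str:
    """Group frames into lines by vertical position, then read left to right."""
    ordered = sorted(frames, key=lambda f: (f.y, f.x))
    lines: List[List[TextFrame]] = []
    current: List[TextFrame] = []
    current_y = None
    for frame in ordered:
        if current and abs(frame.y - current_y) > line_tolerance:
            lines.append(current)
            current = []
        if not current:
            current_y = frame.y  # the first frame of a line fixes its baseline
        current.append(frame)
    if current:
        lines.append(current)
    return "\n".join(
        " ".join(f.text for f in sorted(line, key=lambda f: f.x))
        for line in lines
    )
```

Multi-column pages, tables and rotated text would all defeat this simple ordering, which is one reason the result cannot match the original layout.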
Comment 62 norbert2 2008-05-07 21:46:25 UTC
"So let's wait for the beta."

@mba:
I just have installed OOo 3.0.0 Beta and still cannot find any PDF import.
Comment 63 Mathias_Bauer 2008-05-11 17:02:16 UTC
Yes, it is my fault that I "omitted" an important detail (and cl meanwhile
corrected that here): you will need not only the beta but also the filter
*extension* and it seems that it still was not released. According to cl that
shouldn't last that long. I'm waiting too. :-)
Comment 64 joergwartenberg 2008-05-11 21:21:43 UTC
Will this extension be part of the normal OOo 3.0 distribution? Otherwise this
issue should stay in the state New, because OOo doesn't have this feature...
Comment 65 drichard 2008-05-12 14:36:58 UTC
I don't want to start any flames, just giving my viewpoint as the person
responsible for 750 OOo users.  This plugin to me should be included in the main
build.  I'm very nervous about adding and using extensions in general.   Once
you deploy something to users, you cannot ever roll it back or people get upset.
 The problem with this being extension is that if a newer version of OOo comes
out and fails to be tested or fails in the future, I'm going to have some angry
users.  Once deployed it will become a part of normal workflow for them. 
Extensions to me are little addons from users to aid in the use of OOo, and
should not be used for major features.
Comment 66 Mathias_Bauer 2008-05-13 09:31:32 UTC
Extensions can be deployed by an admin also. Users then won't see a difference
to a built-in filter. The advantage of extensions is that users can choose if
they want it or not. OOo was often criticized as "bloated". So we offer new
functionality as an extension if it is not obvious that everybody wants to have
it and if it is technically possible.

IMHO both preconditions are met in the pdf import filter.

In case you don't know it: extensions are installed by an admin and for all
users by using "unopkg add --shared name_of_extension.oxt".
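
For admins who script their deployments, the command quoted above could be wrapped as in the following sketch. It assumes `unopkg` is on the PATH of the account running it and that the account may write to the shared installation; the helper names are hypothetical, and only the `add` subcommand and `--shared` option from the comment above are used.

```python
import subprocess


def build_unopkg_add_command(extension_path, shared=True):
    """Build the argument list for 'unopkg add', optionally for all users."""
    command = ["unopkg", "add"]
    if shared:
        # --shared installs the extension for all users of this installation
        command.append("--shared")
    command.append(extension_path)
    return command


def install_extension(extension_path, shared=True):
    """Run unopkg; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(build_unopkg_add_command(extension_path, shared), check=True)
```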
Comment 67 norbert2 2008-05-14 20:33:52 UTC
"This plugin to me should be included in the main
build."

I guess that Sun wants to avoid license conflicts with Adobe.

Although I have never seen the OOo-PDF extension, it won't turn OOo into a real PDF
editor. It IS an import/export solution. I suspect something like the solution
in the recent build of Inkscape.

I can't imagine that Adobe would have problems with such solutions.
Comment 68 joergwartenberg 2008-05-17 08:59:13 UTC
I couldn't find a specification of this feature. Just some old wiki pages.
Shouldn't a spec be available, 'before' the implementation of a new feature starts?
Comment 69 ooo 2008-06-06 13:18:55 UTC
set to duplicate of #i80285. Added this task number to the description field of
the mentioned reference task, just to preserve the feedback given in this task.

*** This issue has been marked as a duplicate of 80285 ***
Comment 70 ooo 2008-06-06 13:22:37 UTC
closing task