Apache OpenOffice (AOO) Bugzilla – Issue 26733
uncompressed dictionary of words appearing in document
Last modified: 2013-02-07 22:41:32 UTC
"It would be nice" if documents, spreadsheets, etc. were saved with an uncompressed list of the words appearing in the document. Because the list is uncompressed, it would be easier to find all documents containing specific words. Alternatively, the entire user-entered text of a document could be saved uncompressed. This would make it possible to search for phrases, but would increase the size of the document. (Check box options for one, the other, or neither?)
I don't think this makes sense, because it is very easy to look into a compressed archive (.jar), but maybe User Experience has a different view.
In Windows 2000 the XML is stored in some kind of zip that WinZip can open. It could probably be searched with zgrep. If the XML is stored in a .jar on another platform, that is another good reason to keep a portion of the XML containing words or phrases in uncompressed form. It would then be searchable with grep on every platform, without "inside" knowledge of the compression scheme used for the XML format.
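For illustration, here is a minimal sketch of what "looking into the archive" could amount to, assuming the document is an ordinary zip file whose text lives in a member named content.xml. The function name archive_contains_word and the member default are assumptions made for this example, not part of any OpenOffice tool, and the tag stripping is deliberately crude (XML entities are not decoded).

```python
import re
import zipfile


def archive_contains_word(path, word, member="content.xml"):
    """Return True if `word` occurs as a whole word in the text of `member`.

    Hypothetical helper: assumes `path` is a zip archive and that the
    document text is stored in `member` as UTF-8 encoded XML.
    """
    with zipfile.ZipFile(path) as archive:
        xml = archive.read(member).decode("utf-8")
    # Replace markup with spaces so words in adjacent elements stay separate.
    text = re.sub(r"<[^>]+>", " ", xml)
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None
```

Calling archive_contains_word on each file in a directory would give a rough, grep-like search over documents, at the cost of decompressing every file on each search.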
Hi, this wish has been more or less solved by the current generation of file indexers (such as strigi) because they unpack archive files to index their internal files. Thanks!
To make these issues easier to find by searching for "requirements", I am reassigning the issues currently assigned to me to the owner "requirements".