Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||Specify what happens to the meta data of text document entities when the document is edited|
|Product:||Writer||Reporter:||Oliver-Rainer Wittmann <orw>|
|Status:||CLOSED FIXED||QA Contact:||issues@sw <issues>|
|Priority:||P3||CC:||discoleo, issues, mst.ooo, svante.schubert|
|Version:||OOo 3.0 Beta|
|Issue Type:||FEATURE||Latest Confirmation in:||---|
|Issue Depends on:|
Description Oliver-Rainer Wittmann 2008-07-11 07:54:40 UTC
Develop a specification for the handling of the metadata of text document entities represented in the Writer core (paragraphs, headings, bookmarks, lists, list items, etc.) when certain user actions are performed on such entities.
Comment 1 Oliver-Rainer Wittmann 2008-08-08 09:37:10 UTC
Done - please find the information here --> http://wiki.services.openoffice.org/wiki/Writer/Metadata_Support#Handling_of_metadata_on_editing.2Fupdating_an_text_document
Comment 2 discoleo 2008-08-08 09:58:24 UTC
Having briefly read the specification, there are some things that worry me. I might have misunderstood the specification.

How does CUT/COPY/PASTE behave? More precisely, is the meta-data lost? This would be bad. Although the meta-data content will often be accessed/extracted by external applications (scripts, other programs), user actions like CUT/COPY/PASTE need to preserve this meta-data, too. I hope this is kept in mind when implementing this issue.

The point is, the major use of meta-data is in *reusing content*, so there is really no difference between manual reuse (cut/copy/paste) and automated reuse (meta-data extraction by external applications). If the text object loses its meta-data during cut/copy/paste, then the most useful feature gets lost: the meta-data would be good for a single use only.

Sometimes it might be necessary to strip away all/some meta-data, BUT this is a completely different issue (and needs to be addressed within the security framework). This is not an issue of meta-data per se. So, I would keep these 2 issues separate and always preserve the meta-data.
Comment 3 Oliver-Rainer Wittmann 2008-08-08 10:45:01 UTC
ad discoleo: Thanks for the fast feedback. I am not sure whether I understand your concern correctly. The metadata itself is not touched by cut/copy/paste of an entity that has metadata, regardless of whether the action is performed via the user interface or the UNO API. What these actions affect is the metadata reference of the entity.

Think of the following text document: it contains two paragraphs, each having metadata. Each paragraph refers to its metadata via a metadata reference, mainly the value of the ODF property xml:id.

Action 1: The user cuts the first paragraph. The metadata is still in the RDF/RDFa repository and can be accessed, but no entity of the text document references it any longer.
Action 2: The first paragraph is copied and pasted as a new third paragraph. The newly inserted third paragraph will lose its metadata reference, because the referenced metadata is already referenced by the first paragraph.
Action 3: The first paragraph is cut and pasted as the new third paragraph. The newly inserted third paragraph will keep its metadata reference and still has metadata, because this metadata is still in the RDF/RDFa repository of the text document.
Action 4: The first paragraph is copied and pasted as the first paragraph into a newly created text document. The newly inserted first paragraph in the new text document will have a metadata reference, but this reference cannot be resolved inside the new text document, because the RDF/RDFa repository of the new text document does not contain any metadata.

I hope this explanation resolves your concerns.

BTW, what we need to decide is what happens to metadata that is no longer referenced (e.g. after action 1) when the text document is saved. Should the created ODF still contain such metadata, or should it contain only metadata that is referenced by some entity of the ODF? What is your opinion?
What do you mean by "the major use of metadata is in reusing content"?
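The four actions above can be modelled roughly as follows. This is only an illustrative sketch, not the Writer implementation: the names `Paragraph`, `Document`, `cut`, `copy`, `paste`, `resolve` and `unreferenced` are hypothetical, and the RDF/RDFa repository is reduced to a plain dictionary keyed by the xml:id value.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Paragraph:
    text: str
    xml_id: Optional[str] = None  # the metadata reference (hypothetical model)


@dataclass
class Document:
    paragraphs: list = field(default_factory=list)
    # Stand-in for the RDF/RDFa repository: xml:id value -> metadata.
    repository: dict = field(default_factory=dict)

    def cut(self, index):
        # Action 1: the paragraph leaves the document, the repository is untouched.
        return self.paragraphs.pop(index)

    def copy(self, index):
        # The clipboard copy carries the metadata reference, not the metadata.
        source = self.paragraphs[index]
        return Paragraph(source.text, source.xml_id)

    def paste(self, para, index=None):
        # Action 2: drop the reference if another entity already uses it.
        used = {p.xml_id for p in self.paragraphs if p.xml_id is not None}
        xml_id = None if para.xml_id in used else para.xml_id
        pasted = Paragraph(para.text, xml_id)
        if index is None:
            self.paragraphs.append(pasted)
        else:
            self.paragraphs.insert(index, pasted)
        return pasted

    def resolve(self, para):
        # Actions 3 and 4: a reference resolves only against this document's repository.
        if para.xml_id is None:
            return None
        return self.repository.get(para.xml_id)

    def unreferenced(self):
        # Metadata that no entity points to any more (the open question on saving).
        used = {p.xml_id for p in self.paragraphs if p.xml_id is not None}
        return sorted(set(self.repository) - used)
```

With a two-paragraph document whose paragraphs reference `p1` and `p2`, copying and pasting the first paragraph yields a paragraph without a reference, cutting it leaves `p1` in `unreferenced()`, and pasting the cut paragraph back restores a resolvable reference. Pasting into an empty `Document()` keeps the reference, but `resolve` returns `None`.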
Comment 4 discoleo 2008-08-08 11:40:26 UTC
*Reusing Content*

One of the main use cases of meta-data is re-using content. Let's explain this a little, and I will take a dictionary-style example where we define the meaning of a new word using meta-data. [Actual use cases are likely to be well beyond this very simple/trivial example.]

Now, we have the first document where we have this new word and the meta-data. However, I might copy this new word i.) to a new document or ii.) within the same document, and I wish that its meta-data is preserved:
i.) the meta-data should be copied to the new document
ii.) the 2nd copy should point to the existing meta-data

Let's take a more realistic example: we may have a diagnostic study performed, and the lab reports the results. These results may contain a lot of meta-data (like lab-specific parameters, machine type, test conditions, ...). Now, I receive these results and wish to write a new document which contains these results. I will copy/paste the desired data and I would welcome that the meta-data is copied as well.

Basically, copying the meta-data, too, has 2 roles:
i.) it will flank, and therefore identify, the specific data in the 2nd document (so it can be automatically detected in this document, too)
ii.) it conveys additional information not present in the visible text

Now, let's go back to the Actions in the previous e-mail. I still might misunderstand this feature (or meta-data more globally), but this is *my meaning* of meta-data (or the way I find it most useful).

Action 1: CUT
Nothing points to the meta-data, but after paste, the text object should point again at this meta-data (either left in place, or copied to the new document IF it was pasted to a new document).

Action 2: COPY and PASTE as new paragraph
As we can't predict which paragraph is important (will be re-used for its content/meta-data), both paragraphs should continue to point to the same meta-data, especially because manually copying data won't ensure that the right paragraph is copied. [IF the data was handled automatically, the parser could detect the paragraph that still has the meta-data attached to it, but this is not true for manual copy/paste.]

Action 3: CUT and PASTE
(meta-data is preserved)

Action 4: COPY and PASTE in new document
The meta-data stream should be copied to the new document and the paragraph shall point to this new copy.

As I said, the content might traverse different documents:
PROVIDER 1 => generates meta-data [document 1] => COPIED to document 2 [rather than CUT/PASTE, the COPY/PASTE is more likely] => COPIED to document 3 => ...

All users will benefit from the meta-data, so all objects originating from the original object should reference the meta-data (or a copy of the meta-data).

As I said, my expertise in meta-data is very limited, but this is what I understand from meta-data and how I imagine it being most useful.

When nothing points to the meta-data anymore, the best way to handle this is to delete the meta-data (this should be undoable as long as Undo is allowed; the meta-data should be deleted completely once undo is no longer possible).
Comment 5 Oliver-Rainer Wittmann 2008-08-08 12:30:09 UTC
ad discoleo: I think I have understood your request. It is a valid request, but I think that we cannot fully support it currently.

> Now, we have the first document where we have this new word and the meta-data.
> However, I might i.) copy this new word to a new document or ii.) within the
> same document, and I wish that it's meta-data is preserved:
> i.) the meta-data should be copied to the new document

Currently, we cannot support copying the meta-data. This has two reasons:
- a technical one, which would be solvable: the current clipboard implementation lacks an RDF/RDFa repository.
- a complex one, on which MST can give more insight: the RDF data could be quite complex (it is a graph, as I was told). It is not an easy task to identify in general which RDF data has to be copied.

> ii.) the 2nd copy should point to the existing meta-data

This cannot work, because the 2nd copy would have the same xml:id, and an xml:id can be assigned to only one entity. See the ODF 1.2 metadata specification: no two ODF elements can have the same value for their xml:id property.

Your valid use case

> PROVIDER 1 => generates meta-data [document 1] => COPIED to document 2
> [rather then CUT/PASTE, the COPY/PASTE is more likely] =>
> COPIED to document 3 => ...

can be worked around by:
provider 1 => generates meta-data [document 1]
=> open document 1 and save-as document 2
=> open document 2, delete everything which is not needed, and save document 2
=> open document 2 and save-as document 3
=> open document 3, ...
Comment 6 mst.ooo 2008-08-08 13:27:58 UTC
Hi all,

the main problem that I see with copying metadata is identifying which part of the repository should be considered the metadata that is "connected to" some ODF element. This is not trivial because the data model of RDF (which is the standard we use for metadata) is a graph. An ODF element may be mapped to an RDF URI by way of its xml:id attribute. So you can basically construct arbitrary structures with RDF, and all that you know about the relationship to the ODF element that is copied is the mapped URI.

If you, say, only copy the RDF statements that contain the URI of the element, then you will very likely copy too little. If you, say, copy the entire strongly connected component that the URI node is part of, as well as all nodes that are reachable from this SCC, then you will likely copy too much (in the limit case, the entire graph).

The problem is that we (the OO.org application) do not know what the _meaning_ of the metadata is (that is kind of the point of the whole enterprise). Deleting "unused" metadata has similar issues, of course.

> > ii.) the 2nd copy should point to the existing meta-data
> This can not work, because the 2nd copy would have the same xml:id and a xml:id
> can be only assigned once to a certain entity. See the ODF 1.2 metadata
> specification, no two ODF elements can have the same value for its xml:id
> property.

Well, almost; actually, the problem is that a single URI can be mapped only to a single xml:id in the manifest. The mapping between xml:id and URI is 1:1. This is (apparently) intentional.
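The "too little" versus "too much" trade-off can be illustrated with a small sketch. It assumes the repository is just a set of (subject, predicate, object) tuples. The URIs, predicates and function names are made up for illustration and are not part of any OO.org API. The second function collects everything reachable from the element's URI along subject-to-object edges, which covers the strongly connected component of that node and all nodes reachable from it.

```python
# Hypothetical repository content: (subject, predicate, object) statements.
TRIPLES = {
    ("doc#p1", "ex:reports", "ex:result1"),
    ("ex:result1", "ex:value", "4.2"),
    ("ex:result1", "ex:measuredBy", "ex:lab1"),
    ("ex:lab1", "ex:alsoProduced", "ex:result2"),
    ("ex:result2", "ex:describedIn", "doc#p2"),
    ("doc#p2", "ex:author", "ex:someone"),
}


def statements_mentioning(triples, uri):
    """Strategy 1: only statements that contain the element's URI."""
    return {t for t in triples if uri in (t[0], t[2])}


def statements_reachable_from(triples, uri):
    """Strategy 2: every statement reachable from the URI via subject -> object."""
    seen = {uri}
    stack = [uri]
    result = set()
    while stack:
        node = stack.pop()
        for subject, predicate, obj in triples:
            if subject == node:
                result.add((subject, predicate, obj))
                if obj not in seen:
                    seen.add(obj)
                    stack.append(obj)
    return result


too_little = statements_mentioning(TRIPLES, "doc#p1")
too_much = statements_reachable_from(TRIPLES, "doc#p1")
```

In this made-up graph, the first strategy keeps the link from the paragraph to its result but drops the result's value, while the second pulls in every statement in the repository, including those about the unrelated paragraph `doc#p2`. Neither is right without knowing what the statements mean.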
Comment 7 discoleo 2008-08-08 13:49:05 UTC
Hello all,

my view of the real-life scenario is like this: simple content rarely gets passed on to another document as it is. More often, a new document is assembled from various existing documents. Consider this:

LAB 1 => generate document + meta-data
LAB 2 => generate document + meta-data
=> LAB 1 => new data + meta-data
RECEIVER => generate other content + meta-data

RECEIVER will combine his own generated content with contents from the various LABS and will generate a new document containing this data and, hopefully, all the relevant meta-data along with it. This is a more realistic use case.

Then he passes this document along to a new RECEIVER, who again will combine various sources to generate a new document with combined meta-data and passes it along to a new receiver (maybe back to the first one or another one, e.g. a hospital).

With regard to pasting within the same document: I do not have a strong opinion about this, as in the field I work in it won't occur that often. But copying the meta-data to a new document is something that in my opinion is very important and the main utility of meta-data.
Comment 8 Oliver-Rainer Wittmann 2008-08-08 14:41:47 UTC
ad discoleo: I see. In my opinion your use case is a valid one, which should be supported. But I do not think that it can be solved easily and in general. Please consider that the given specification should work in general, especially for metadata which is unknown to the application.

If you have a certain extension installed which manages a certain type of metadata, this extension will typically support the copying issue. At least I would expect that such an extension supports the user on this issue. Such an extension could also forbid editing or copying of the entities referencing its managed metadata.

Here is another workaround for your new use case:
- Create a new text document.
- Use the function "Menu - Insert - File" to include all needed documents.
- Delete unwanted parts from the resulting document.
Comment 9 Oliver-Rainer Wittmann 2008-08-08 14:46:50 UTC
SUS and I adjusted the wording of the specification.
Comment 10 Oliver-Rainer Wittmann 2008-10-28 11:24:19 UTC
Made some adjustments to the handling of metadata on editing - see wiki. Some things needed to be changed to simplify the implementation.
Comment 11 Oliver-Rainer Wittmann 2009-06-10 12:43:32 UTC
OD->SBA: Please check the created wiki page to verify this issue
Comment 12 stefan.baltzer 2009-06-17 09:29:59 UTC
Verified on WIKI page.
Comment 13 stefan.baltzer 2009-06-17 10:41:35 UTC
Adjusting Target (CWS shifted to 3.2).